1 #-----------------------------------------------------------------
3 # BioPerl module Bio::Search::HSP::PsiBlastHSP
5 # (This module was originally called Bio::Tools::Blast::HSP)
7 # Please direct questions and support issues to <bioperl-l@bioperl.org>
9 # Cared for by Steve Chervitz <sac@bioperl.org>
11 # You may distribute this module under the same terms as perl itself
12 #-----------------------------------------------------------------
18 Bio::Search::HSP::PsiBlastHSP - Bioperl BLAST High-Scoring Pair object
22 See L<Bio::Search::Hit::BlastHit>.
26 A Bio::Search::HSP::PsiBlastHSP object provides an interface to data
27 obtained in a single alignment section of a Blast report (known as a
28 "High-scoring Segment Pair"). This is essentially a pairwise
29 alignment with score information.
31 PsiBlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>
32 objects after parsing a BLAST report using the L<Bio::SearchIO>
35 The construction of PsiBlastHSP objects is performed by
36 Bio::Factory::BlastHitFactory in a process that is
37 orchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).
38 The resulting PsiBlastHSPs are then accessed via
39 L<Bio::Search::Hit::BlastHit>). Therefore, you do not need to
40 use L<Bio::Search::HSP::PsiBlastHSP>) directly. If you need to construct
41 PsiBlastHSPs directly, see the new() function for details.
43 For L<Bio::SearchIO> BLAST parsing usage examples, see the
44 C<examples/search-blast> directory of the Bioperl distribution.
47 =head2 Start and End coordinates
49 Sequence endpoints are swapped so that start is always less than
50 end. This affects For TBLASTN/X hits on the minus strand. Strand
51 information can be recovered using the strand() method. This
52 normalization step is standard Bioperl practice. It also facilitates
53 use of range information by methods such as match().
57 =item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.
61 Bio::Search::HSP::PsiBlastHSP.pm has the ability to extract a list of all
62 residue indices for identical and conservative matches along both
63 query and sbjct sequences. Since this degree of detail is not always
64 needed, this behavior does not occur during construction of the PsiBlastHSP
65 object. These data will automatically be collected as necessary as
66 the PsiBlastHSP.pm object is used.
70 Bio::Search::HSP::PsiBlastHSP.pm is a concrete class that inherits from
71 L<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.
72 L<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating
73 sequence and alignment objects, respectively.
75 =head2 Relationship to L<Bio::SimpleAlign> and L<Bio::Seq>
77 PsiBlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>
78 object via the L<seq()|seq> method. The PsiBlastHSP.pm object can also create a
79 two-sequence L<Bio::SimpleAlign> alignment object using the the query
80 and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
81 objects is not automatic when constructing the PsiBlastHSP.pm object since
82 this level of functionality is not always required and would generate
83 a lot of extra overhead when crunching many reports.
90 User feedback is an integral part of the evolution of this and other
91 Bioperl modules. Send your comments and suggestions preferably to one
92 of the Bioperl mailing lists. Your participation is much appreciated.
94 bioperl-l@bioperl.org - General discussion
95 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
99 Please direct usage questions or support issues to the mailing list:
101 I<bioperl-l@bioperl.org>
103 rather than to the module maintainer directly. Many experienced and
104 reponsive experts will be able look at the problem and quickly
105 address it. Please include a thorough description of the problem
106 with code and data examples if at all possible.
108 =head2 Reporting Bugs
110 Report bugs to the Bioperl bug tracking system to help us keep track
111 the bugs and their resolution. Bug reports can be submitted via the
114 https://github.com/bioperl/bioperl-live/issues
118 Steve Chervitz E<lt>sac-at-bioperl.orgE<gt>
120 See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
122 =head1 ACKNOWLEDGEMENTS
124 This software was originally developed in the Department of Genetics
125 at Stanford University. I would also like to acknowledge my
126 colleagues at Affymetrix for useful feedback.
130 Bio::Search::Hit::BlastHit.pm - Blast hit object.
131 Bio::Search::Result::BlastResult.pm - Blast Result object.
132 Bio::Seq.pm - Biosequence object
136 http://bio.perl.org/ - Bioperl Project Homepage
140 Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
144 This software is provided "as is" without warranty of any kind.
149 # END of main POD documentation.
153 The rest of the documentation details each of the object methods.
154 Internal methods are usually preceded with a _
158 # Let the code begin...
160 package Bio
::Search
::HSP
::PsiBlastHSP
;
163 use Bio
::SeqFeature
::Similarity
;
165 use vars
qw($GAP_SYMBOL %STRAND_SYMBOL);
170 use base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);
172 $GAP_SYMBOL = '-'; # Need a more general way to handle gap symbols.
173 %STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );
178 Usage : $hsp = Bio::Search::HSP::PsiBlastHSP->new( %named_params );
179 : Bio::Search::HSP::PsiBlastHSP.pm objects are constructed
180 : automatically by Bio::SearchIO::BlastHitFactory.pm,
181 : so there is no need for direct instantiation.
182 Purpose : Constructs a new PsiBlastHSP object and Initializes key variables
184 Returns : A Bio::Search::HSP::PsiBlastHSP object
185 Argument : Named parameters:
186 : Parameter keys are case-insensitive.
187 : -RAW_DATA => array ref containing raw BLAST report data for
188 : for a single HSP. This includes all lines
189 : of the HSP alignment from a traditional BLAST
190 or PSI-BLAST (non-XML) report,
191 : -RANK => integer (1..n).
192 : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.).
193 : -QUERY_NAME => string, id of query sequence
194 : -HIT_NAME => string, id of hit sequence
196 Comments : Having the raw data allows this object to do lazy parsing of
197 : the raw HSP data (i.e., not parsed until needed).
199 : Note that there is a fair amount of basic parsing that is
200 : currently performed in this module that would be more appropriate
201 : to do within a separate factory object.
202 : This parsing code will likely be relocated and more initialization
203 : parameters will be added to new().
205 See Also : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>
212 my ($class, @args ) = @_;
214 my $self = $class->SUPER::new
( @args );
215 # Initialize placeholders
216 $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
217 my ($raw_data, $qname, $hname, $qlen, $hlen);
219 ($self->{'_prog'}, $self->{'_rank'}, $raw_data,
221 $self->_rearrange([qw( PROGRAM
228 # _set_data() does a fair amount of parsing.
229 # This will likely change (see comment above.)
230 $self->_set_data( @
{$raw_data} );
231 # Store the aligned query as sequence feature
232 my ($qb, $hb) = ($self->start());
233 my ($qe, $he) = ($self->end());
234 my ($qs, $hs) = ($self->strand());
235 my ($qf,$hf) = ($self->query->frame(),
238 $self->query( Bio
::SeqFeature
::Similarity
->new (-start
=>$qb,
242 -score
=>$self->score,
245 -source
=>$self->{'_prog'} ));
247 $self->hit( Bio
::SeqFeature
::Similarity
->new (-start
=>$hb,
251 -score
=>$self->score,
254 -source
=>$self->{'_prog'} ));
257 $self->query->seqlength($qlen); # query
258 $self->hit->seqlength($hlen); # subject
260 $self->query->frac_identical($self->frac_identical('query'));
261 $self->hit->frac_identical($self->frac_identical('hit'));
267 # #print STDERR "--->DESTROYING $self\n";
272 # Purpose : Intended for internal use only to provide a string for use
273 # within exception messages to help users figure out which
274 # query/hit caused the problem.
275 # Returns : Short string with name of query and hit seq
278 if( not defined $self->{'_id_str'}) {
279 my $qname = $self->query->seqname;
280 my $hname = $self->hit->seqname;
281 $self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
283 return $self->{'_id_str'};
286 #=================================================
287 # Begin Bio::Search::HSP::HSPI implementation
288 #=================================================
293 Usage : $alg = $hsp->algorithm();
294 Function: Gets the algorithm specification that was used to obtain the hsp
295 For BLAST, the algorithm denotes what type of sequence was aligned
296 against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated
297 dna-prt, TBLASTN prt-translated dna, TBLASTX translated
299 Returns : a scalar string
307 my ($self,@args) = @_;
308 return $self->{'_prog'};
316 Usage : $hsp_obj->signif()
317 Purpose : Get the P-value or Expect value for the HSP.
318 Returns : Float (0.001 or 1.3e-43)
319 : Returns P-value if it is defined, otherwise, Expect value.
322 Comments : Provided for consistency with BlastHit::signif()
323 : Support for returning the significance data in different
324 : formats (e.g., exponent only), is not provided for HSP objects.
325 : This is only available for the BlastHit or Blast object.
327 See Also : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>
335 my $val ||= defined($self->{'_p'}) ?
$self->{'_p'} : $self->{'_expect'};
343 Usage : $hsp_obj->evalue()
344 Purpose : Get the Expect value for the HSP.
345 Returns : Float (0.001 or 1.3e-43)
348 Comments : Support for returning the expectation data in different
349 : formats (e.g., exponent only), is not provided for HSP objects.
350 : This is only available for the BlastHit or Blast object.
357 sub evalue
{ shift->{'_expect'} }
363 Usage : $hsp_obj->p()
364 Purpose : Get the P-value for the HSP.
365 Returns : Float (0.001 or 1.3e-43) or undef if not defined.
368 Comments : P-value is not defined with NCBI Blast2 reports.
369 : Support for returning the expectation data in different
370 : formats (e.g., exponent only) is not provided for HSP objects.
371 : This is only available for the BlastHit or Blast object.
373 See Also : L</expect>
378 sub p
{ my $self = shift; $self->{'_p'}; }
382 sub pvalue
{ shift->p(@_); }
386 Usage : $hsp->length( [seq_type] )
387 Purpose : Get the length of the aligned portion of the query or sbjct.
388 Example : $hsp->length('query')
390 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' (default = 'total')
391 ('sbjct' is synonymous with 'hit')
393 Comments : 'total' length is the full length of the alignment
394 : as reported in the denominators in the alignment section:
395 : "Identical = 34/120 Positives = 67/120".
404 ## Developer note: when using the built-in length function within
405 ## this module, call it as CORE::length().
406 my( $self, $seqType ) = @_;
407 $seqType ||= 'total';
408 $seqType = 'sbjct' if $seqType eq 'hit';
410 $seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
412 ## Sensitive to member name format.
413 $seqType = "_\L$seqType\E";
414 $self->{$seqType.'Length'};
421 Usage : $hsp->gaps( [seq_type] )
422 Purpose : Get the number of gap characters in the query, sbjct, or total alignment.
423 : Also can return query gap chars and sbjct gap chars as a two-element list
424 : when in array context.
425 Example : $total_gaps = $hsp->gaps();
426 : ($qgaps, $sgaps) = $hsp->gaps();
427 : $qgaps = $hsp->gaps('query');
428 Returns : scalar context: integer
429 : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
430 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
431 : ('sbjct' is synonymous with 'hit')
432 : (default = 'total', scalar context)
433 : Array context can be "induced" by providing an argument of 'list' or 'array'.
436 See Also : L</length>, L</matches>
443 my( $self, $seqType ) = @_;
445 $self->_set_seq_data() unless $self->{'_set_seq_data'};
447 $seqType ||= (wantarray ?
'list' : 'total');
448 $seqType = 'sbjct' if $seqType eq 'hit';
450 if($seqType =~ /list|array/i) {
451 return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
454 if($seqType eq 'total') {
455 return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
457 ## Sensitive to member name format.
458 $seqType = "_\L$seqType\E";
459 return $self->{$seqType.'Gaps'} || 0;
464 =head2 frac_identical
466 Usage : $hsp_object->frac_identical( [seq_type] );
467 Purpose : Get the fraction of identical positions within the given HSP.
468 Example : $frac_iden = $hsp_object->frac_identical('query');
469 Returns : Float (2-decimal precision, e.g., 0.75).
470 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
471 : ('sbjct' is synonymous with 'hit')
472 : default = 'total' (but see comments below).
474 Comments : Different versions of Blast report different values for the total
475 : length of the alignment. This is the number reported in the
476 : denominators in the stats section:
477 : "Identical = 34/120 Positives = 67/120".
478 : NCBI-BLAST uses the total length of the alignment (with gaps)
479 : WU-BLAST uses the length of the query sequence (without gaps).
480 : Therefore, when called without an argument or an argument of 'total',
481 : this method will report different values depending on the
482 : version of BLAST used.
484 : To get the fraction identical among only the aligned residues,
485 : ignoring the gaps, call this method with an argument of 'query'
486 : or 'sbjct' ('sbjct' is synonymous with 'hit').
488 See Also : L</frac_conserved>, L</num_identical>, L</matches>
495 # The value is calculated as opposed to storing it from the parsed results.
496 # This saves storage and also permits flexibility in determining for which
497 # sequence (query or sbjct) the figure is to be calculated.
499 my( $self, $seqType ) = @_;
500 $seqType ||= 'total';
501 $seqType = 'sbjct' if $seqType eq 'hit';
503 if($seqType ne 'total') {
504 $self->_set_seq_data() unless $self->{'_set_seq_data'};
506 ## Sensitive to member name format.
507 $seqType = "_\L$seqType\E";
509 sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
513 =head2 frac_conserved
515 Usage : $hsp_object->frac_conserved( [seq_type] );
516 Purpose : Get the fraction of conserved positions within the given HSP.
517 : (Note: 'conservative' positions are called 'positives' in the
519 Example : $frac_cons = $hsp_object->frac_conserved('query');
520 Returns : Float (2-decimal precision, e.g., 0.75).
521 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
522 : ('sbjct' is synonymous with 'hit')
523 : default = 'total' (but see comments below).
525 Comments : Different versions of Blast report different values for the total
526 : length of the alignment. This is the number reported in the
527 : denominators in the stats section:
528 : "Identical = 34/120 Positives = 67/120".
529 : NCBI-BLAST uses the total length of the alignment (with gaps)
530 : WU-BLAST uses the length of the query sequence (without gaps).
531 : Therefore, when called without an argument or an argument of 'total',
532 : this method will report different values depending on the
533 : version of BLAST used.
535 : To get the fraction conserved among only the aligned residues,
536 : ignoring the gaps, call this method with an argument of 'query'
539 See Also : L</frac_conserved>, L</num_conserved>, L</matches>
543 #--------------------
545 #--------------------
546 # The value is calculated as opposed to storing it from the parsed results.
547 # This saves storage and also permits flexibility in determining for which
548 # sequence (query or sbjct) the figure is to be calculated.
550 my( $self, $seqType ) = @_;
551 $seqType ||= 'total';
552 $seqType = 'sbjct' if $seqType eq 'hit';
554 if($seqType ne 'total') {
555 $self->_set_seq_data() unless $self->{'_set_seq_data'};
558 ## Sensitive to member name format.
559 $seqType = "_\L$seqType\E";
561 sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
567 Usage : my $qseq = $hsp->query_string;
568 Function: Retrieves the query sequence of this HSP as a string
576 sub query_string
{ shift->seq_str('query'); }
582 Usage : my $hseq = $hsp->hit_string;
583 Function: Retrieves the hit sequence of this HSP as a string
591 sub hit_string
{ shift->seq_str('hit'); }
595 =head2 homology_string
597 Title : homology_string
598 Usage : my $homo_string = $hsp->homology_string;
599 Function: Retrieves the homology sequence for this HSP as a string.
600 : The homology sequence is the string of symbols in between the
601 : query and hit sequences in the alignment indicating the degree
602 : of conservation (e.g., identical, similar, not similar).
609 sub homology_string
{ shift->seq_str('match'); }
612 #=================================================
613 # End Bio::Search::HSP::HSPI implementation
614 #=================================================
616 # Older method delegating to method defined in HSPI.
620 See L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>
625 sub expect
{ shift->evalue( @_ ); }
631 Usage : $hsp->rank( [string] );
632 Purpose : Get the rank of the HSP within a given Blast hit.
633 Example : $rank = $hsp->rank;
634 Returns : Integer (1..n) corresponding to the order in which the HSP
635 appears in the BLAST report.
642 sub rank
{ shift->{'_rank'} }
645 # For backward compatibility
647 sub name
{ shift->rank }
653 Usage : print $hsp->to_string;
654 Function: Returns a string representation for the Blast HSP.
655 Primarily intended for debugging purposes.
657 Returns : A string of the form:
669 return "[PsiBlastHSP] " . $self->rank();
675 Usage : called automatically during object construction.
676 Purpose : Parses the raw HSP section from a flat BLAST report and
677 sets the query sequence, sbjct sequence, and the "match" data
678 : which consists of the symbols between the query and sbjct lines
680 Argument : Array (all lines for a single, complete HSP, from a raw,
681 flat (i.e., non-XML) BLAST report)
682 Throws : Propagates any exceptions from the methods called ("See Also")
684 See Also : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>
693 my @queryList = (); # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
694 my @sbjctList = (); # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
696 my $matchLine = 0; # Alternating boolean: when true, load 'match' data.
699 #print STDERR "PsiBlastHSP: set_data()\n";
701 my($line, $aln_row_len, $length_diff);
704 # Collecting data for all lines in the alignment
705 # and then storing the collections for possible processing later.
707 # Note that "match" lines may not be properly padded with spaces.
708 # This loop now properly handles such cases:
709 # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
710 # PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
711 # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200
713 foreach $line( @data ) {
714 next if $line =~ /^\s*$/;
716 if( $line =~ /^ ?Score/ ) {
717 $self->_set_score_stats( $line );
718 } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
719 $self->_set_match_stats( $line );
720 } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
721 # Version 2.0.8 has Frame information on a separate line.
722 # Storing frame according to SeqFeature::Generic::frame()
723 # which does not contain strand info (use strand()).
724 my $frame = abs($1) - 1;
725 $self->frame( $frame );
726 } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
727 push @queryList, $line;
728 $self->{'_match_indent'} = CORE
::length $1;
729 $aln_row_len = (CORE
::length $1) + (CORE
::length $2);
731 } elsif( $matchLine ) {
732 # Pad the match line with spaces if necessary.
733 $length_diff = $aln_row_len - CORE
::length $line;
734 $length_diff and $line .= ' 'x
$length_diff;
735 push @matchList, $line;
737 } elsif( $line =~ /^Sbjct/ ) {
738 push @sbjctList, $line;
741 # Storing the query and sbjct lists in case they are needed later.
742 # We could make this conditional to save memory.
743 $self->{'_queryList'} = \
@queryList;
744 $self->{'_sbjctList'} = \
@sbjctList;
746 # Storing the match list in case it is needed later.
747 $self->{'_matchList'} = \
@matchList;
749 if(not defined ($self->{'_numIdentical'})) {
750 my $id_str = $self->_id_str;
751 $self->throw( -text
=> "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
754 if(!scalar @queryList or !scalar @sbjctList) {
755 my $id_str = $self->_id_str;
756 $self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
761 =head2 _set_score_stats
763 Usage : called automatically by _set_data()
764 Purpose : Sets various score statistics obtained from the HSP listing.
765 Argument : String with any of the following formats:
766 : blast2: Score = 30.1 bits (66), Expect = 9.2
767 : blast2: Score = 158.2 bits (544), Expect(2) = e-110
768 : blast1: Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
769 : blast1: Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
770 Throws : Exception if the stats cannot be parsed, probably due to a change
771 : in the Blast report format.
773 See Also : L</_set_data>
777 #--------------------
778 sub _set_score_stats
{
779 #--------------------
780 my ($self, $data) = @_;
784 if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
785 # blast2 format n = 1
789 } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
790 # blast2 format n > 1
796 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
797 # blast1 format, n = 1
803 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
804 # blast1 format, n > 1
812 my $id_str = $self->_id_str;
813 $self->throw(-class => 'Bio::Root::Exception',
814 -text
=> "Can't parse score statistics: unrecognized format. ($id_str)",
817 $expect = "1$expect" if $expect =~ /^e/i;
818 $p = "1$p" if defined $p and $p=~ /^e/i;
820 $self->{'_expect'} = $expect;
821 $self->{'_p'} = $p || undef;
822 $self->significance( $p || $expect );
826 =head2 _set_match_stats
828 Usage : Private method; called automatically by _set_data()
829 Purpose : Sets various matching statistics obtained from the HSP listing.
830 Argument : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
831 : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)
832 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)
833 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
834 : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
835 Throws : Exception if the stats cannot be parsed, probably due to a change
836 : in the Blast report format.
837 Comments : The "Gaps = " data in the HSP header has a different meaning depending
838 : on the type of Blast: for BLASTP, this number is the total number of
839 : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
840 : query sequence only. Thus, it is safer to collect the data
841 : separately by examining the actual sequence strings as is done
844 See Also : L</_set_data>, L</_set_seq>
848 #--------------------
849 sub _set_match_stats
{
850 #--------------------
851 my ($self, $data) = @_;
853 if($data =~ m!Identities = (\d+)/(\d+)!) {
855 $self->{'_numIdentical'} = $1;
856 $self->{'_totalLength'} = $2;
859 if($data =~ m!Positives = (\d+)/(\d+)!) {
861 $self->{'_numConserved'} = $1;
862 $self->{'_totalLength'} = $2;
865 if($data =~ m!Frame = ([\d+-]+)!) {
869 # Strand data is not always present in this line.
870 # _set_seq() will also set strand information.
871 if($data =~ m!Strand = (\w+) / (\w+)!) {
872 $self->{'_queryStrand'} = $1;
873 $self->{'_sbjctStrand'} = $2;
876 # if($data =~ m!Gaps = (\d+)/(\d+)!) {
877 # $self->{'_totalGaps'} = $1;
879 # $self->{'_totalGaps'} = 0;
887 Usage : called automatically when sequence data is requested.
888 Purpose : Sets the HSP sequence data for both query and sbjct sequences.
889 : Includes: start, stop, length, gaps, and raw sequence.
891 Throws : Propagates any exception thrown by _set_match_seq()
892 Comments : Uses raw data stored by _set_data() during object construction.
893 : These data are not always needed, so it is conditionally
894 : executed only upon demand by methods such as gaps(), _set_residues(),
895 : etc. _set_seq() does the dirty work.
897 See Also : L</_set_seq>
906 $self->_set_seq('query', @
{$self->{'_queryList'}});
907 $self->_set_seq('sbjct', @
{$self->{'_sbjctList'}});
909 # Liberate some memory.
910 @
{$self->{'_queryList'}} = @
{$self->{'_sbjctList'}} = ();
911 undef $self->{'_queryList'};
912 undef $self->{'_sbjctList'};
914 $self->{'_set_seq_data'} = 1;
921 Usage : called automatically by _set_seq_data()
922 : $hsp_obj->($seq_type, @data);
923 Purpose : Sets sequence information for both the query and sbjct sequences.
924 : Directly counts the number of gaps in each sequence (if gapped Blast).
925 Argument : $seq_type = 'query' or 'sbjct'
926 : @data = all seq lines with the form:
927 : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
928 Throws : Exception if data strings cannot be parsed, probably due to a change
929 : in the Blast report format.
930 Comments : Uses first argument to determine which data members to set
931 : making this method sensitive data member name changes.
932 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
933 Warning : Sequence endpoints are normalized so that start < end. This affects HSPs
934 : for TBLASTN/X hits on the minus strand. Normalization facilitates use
935 : of range information by methods such as match().
937 See Also : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>
952 if( m/(\d+) *([^\d\s]+) *(\d+)/) {
953 push @ranges, ( $1, $3 ) ;
955 #print STDERR "_set_seq found sequence \"$2\"\n";
957 $self->warn("Bad sequence data: $_");
961 if( !(scalar(@sequence) and scalar(@ranges))) {
962 my $id_str = $self->_id_str;
963 $self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str)");
966 # Sensitive to member name changes.
967 $seqType = "_\L$seqType\E";
968 $self->{$seqType.'Start'} = $ranges[0];
969 $self->{$seqType.'Stop'} = $ranges[ $#ranges ];
970 $self->{$seqType.'Seq'} = \
@sequence;
972 $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
974 # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
975 # Converting nucl coords to amino acid coords.
977 my $prog = $self->algorithm;
978 if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
979 $self->{$seqType.'Length'} /= 3;
980 } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
981 $self->{$seqType.'Length'} /= 3;
982 } elsif($prog eq 'TBLASTX') {
983 $self->{$seqType.'Length'} /= 3;
986 if( $prog ne 'BLASTP' ) {
987 $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
988 $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
989 # Normalize sequence endpoints so that start < end.
990 # Reverse complement or 'minus strand' HSPs get flipped here.
991 if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
992 ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =
993 ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
994 $self->{$seqType.'Strand'} = 'Minus';
998 ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
999 # if($self->{'_gapped'}) {
1000 my $seqstr = join('', @sequence);
1002 my $num_gaps = CORE
::length($seqstr) - $self->{$seqType.'Length'};
1003 $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
1008 =head2 _set_residues
1010 Usage : called automatically when residue data is requested.
1011 Purpose : Sets the residue numbers representing the identical and
1012 : conserved positions. These data are obtained by analyzing the
1013 : symbols between query and sbjct lines of the alignments.
1015 Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
1016 Comments : These data are not always needed, so it is conditionally
1017 : executed only upon demand by methods such as seq_inds().
1018 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
1020 See Also : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>
1030 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1032 # Using hashes to avoid saving duplicate residue numbers.
1033 my %identicalList_query = ();
1034 my %identicalList_sbjct = ();
1035 my %conservedList_query = ();
1036 my %conservedList_sbjct = ();
1038 my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
1039 $aref ||= $self->{'_matchSeq'};
1040 my $seqString = join('', @
$aref );
1042 my $qseq = join('',@
{$self->{'_querySeq'}});
1043 my $sseq = join('',@
{$self->{'_sbjctSeq'}});
1044 my $resCount_query = $self->{'_queryStop'} || 0;
1045 my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
1047 my $prog = $self->algorithm;
1048 if($prog !~ /^BLASTP|^BLASTN/) {
1049 if($prog eq 'TBLASTN') {
1050 $resCount_sbjct /= 3;
1051 } elsif($prog eq 'BLASTX') {
1052 $resCount_query /= 3;
1053 } elsif($prog eq 'TBLASTX') {
1054 $resCount_query /= 3;
1055 $resCount_sbjct /= 3;
1059 my ($mchar, $schar, $qchar);
1060 while( $mchar = chop($seqString) ) {
1061 ($qchar, $schar) = (chop($qseq), chop($sseq));
1062 if( $mchar eq '+' ) {
1063 $conservedList_query{ $resCount_query } = 1;
1064 $conservedList_sbjct{ $resCount_sbjct } = 1;
1065 } elsif( $mchar ne ' ' ) {
1066 $identicalList_query{ $resCount_query } = 1;
1067 $identicalList_sbjct{ $resCount_sbjct } = 1;
1069 $resCount_query-- if $qchar ne $GAP_SYMBOL;
1070 $resCount_sbjct-- if $schar ne $GAP_SYMBOL;
1072 $self->{'_identicalRes_query'} = \
%identicalList_query;
1073 $self->{'_conservedRes_query'} = \
%conservedList_query;
1074 $self->{'_identicalRes_sbjct'} = \
%identicalList_sbjct;
1075 $self->{'_conservedRes_sbjct'} = \
%conservedList_sbjct;
1082 =head2 _set_match_seq
1084 Usage : $hsp_obj->_set_match_seq()
1085 Purpose : Set the 'match' sequence for the current HSP (symbols in between
1086 : the query and sbjct lines.)
1087 Returns : Array reference holding the match sequences lines.
1089 Throws : Exception if the _matchList field is not set.
1090 Comments : The match information is not always necessary. This method
1091 : allows it to be conditionally prepared.
1092 : Called by _set_residues>() and seq_str().
1094 See Also : L</_set_residues>, L</seq_str>
1098 #-------------------
1099 sub _set_match_seq
{
1100 #-------------------
1103 if( ! ref($self->{'_matchList'}) ) {
1104 my $id_str = $self->_id_str;
1105 $self->throw("Can't set HSP match sequence: No data ($id_str)");
1108 my @data = @
{$self->{'_matchList'}};
1113 ## Remove leading spaces; (note: aln may begin with a space
1114 ## which is why we can't use s/^ +//).
1115 s/^ {$self->{'_match_indent'}}//;
1118 # Liberate some memory.
1119 @
{$self->{'_matchList'}} = undef;
1120 $self->{'_matchList'} = undef;
1122 $self->{'_matchSeq'} = \
@sequence;
1124 return $self->{'_matchSeq'};
1130 Usage : $hsp_obj->n()
1131 Purpose : Get the N value (num HSPs on which P/Expect is based).
1132 : This value is not defined with NCBI Blast2 with gapping.
1133 Returns : Integer or null string if not defined.
1136 Comments : The 'N' value is listed in parenthesis with P/Expect value:
1137 : e.g., P(3) = 1.2e-30 ---> (N = 3).
1138 : Not defined in NCBI Blast2 with gaps.
1139 : This typically is equal to the number of HSPs but not always.
1140 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
1142 See Also : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>
1147 sub n
{ my $self = shift; $self->{'_n'} || ''; }
1153 Usage : $hsp->matches([seq_type], [start], [stop]);
1154 Purpose : Get the total number of identical and conservative matches
1155 : in the query or sbjct sequence for the given HSP. Optionally can
1156 : report data within a defined interval along the seq.
1157 : (Note: 'conservative' matches are called 'positives' in the
1159 Example : ($id,$cons) = $hsp_object->matches('hit');
1160 : ($id,$cons) = $hsp_object->matches('query',300,400);
1161 Returns : 2-element array of integers
1162 Argument : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)
1163 : ('sbjct' is synonymous with 'hit')
1164 : (2) start = Starting coordinate (optional)
1165 : (3) stop = Ending coordinate (optional)
1166 Throws : Exception if the supplied coordinates are out of range.
1167 Comments : Relies on seq_str('match') to get the string of alignment symbols
1168 : between the query and sbjct lines which are used for determining
1169 : the number of identical and conservative matches.
1171 See Also : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>
1178 my( $self, %param ) = @_;
1180 my($seqType, $beg, $end) = ($param{-SEQ
}, $param{-START
}, $param{-STOP
});
1181 $seqType ||= 'query';
1182 $seqType = 'sbjct' if $seqType eq 'hit';
1186 if(!defined $beg && !defined $end) {
1187 ## Get data for the whole alignment.
1188 push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
1190 ## Get the substring representing the desired sub-section of aln.
1193 ($start,$stop) = $self->range($seqType);
1194 if($beg == 0) { $beg = $start; $end = $beg+$end; }
1195 elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
1197 if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
1198 else { $end += 1;} ##ML moved from commented position below, makes
1200 # if($end > $stop) { $end = $stop; }
1201 if($beg < $start) { $beg = $start; }
1202 # else { $end += 1;}
1204 # my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
1206 ## ML: START fix for substr out of range error ------------------
1208 my $prog = $self->algorithm;
1209 if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
1211 $seq = substr($self->seq_str('match'),
1212 int(($beg-$start)/3), int(($end-$beg+1)/3));
1214 } elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
1216 $seq = substr($self->seq_str('match'),
1217 int(($beg-$start)/3), int(($end-$beg+1)/3));
1219 $seq = substr($self->seq_str('match'),
1220 $beg-$start, ($end-$beg));
1222 ## ML: End of fix for substr out of range error -----------------
1225 ## ML: debugging code
1226 ## This is where we get our exception. Try printing out the values going
1230 # qq(*------------MY EXCEPTION --------------------\nSeq: ") ,
1231 # $self->seq_str("$seqType"), qq("\n),$self->rank,",( index:";
1232 # print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:",
1233 # CORE::length $self->seq_str("$seqType");
1234 # print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ),
1235 # ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
1236 ## ML: END DEBUGGING CODE----------
1238 if(!CORE
::length $seq) {
1239 my $id_str = $self->_id_str;
1240 $self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
1242 ## Get data for a substring.
1243 # printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
1244 # printf "Original match seq:\n%s\n",$self->seq_str('match');
1245 $seq =~ s/ //g; # remove space (no info).
1246 my $len_cons = CORE
::length $seq;
1247 $seq =~ s/\+//g; # remove '+' characters (conservative substitutions)
1248 my $len_id = CORE
::length $seq;
1249 push @data, ($len_id, $len_cons);
1250 # printf " HSP = %s\n id = %d; cons = %d\n", $self->rank, $len_id, $len_cons; <STDIN>;
1256 =head2 num_identical
1258 Usage : $hsp_object->num_identical();
1259 Purpose : Get the number of identical positions within the given HSP.
1260 Example : $num_iden = $hsp_object->num_identical();
1265 See Also : L</num_conserved>, L</frac_identical>
1269 #-------------------
1271 #-------------------
1274 $self->{'_numIdentical'};
1278 =head2 num_conserved
1280 Usage : $hsp_object->num_conserved();
1281 Purpose : Get the number of conserved positions within the given HSP.
1282 Example : $num_iden = $hsp_object->num_conserved();
1287 See Also : L</num_identical>, L</frac_conserved>
1291 #-------------------
1293 #-------------------
1296 $self->{'_numConserved'};
1303 Usage : $hsp->range( [seq_type] );
1304 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
1305 : in the HSP alignment.
1306 Example : ($query_beg, $query_end) = $hsp->range('query');
1307 : ($hit_beg, $hit_end) = $hsp->range('hit');
1308 Returns : Two-element array of integers
1309 Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
1310 : ('sbjct' is synonymous with 'hit')
1313 See Also : L</start>, L</end>
1320 my ($self, $seqType) = @_;
1322 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1324 $seqType ||= 'query';
1325 $seqType = 'sbjct' if $seqType eq 'hit';
1327 ## Sensitive to member name changes.
1328 $seqType = "_\L$seqType\E";
1330 return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
1335 Usage : $hsp->start( [seq_type] );
1336 Purpose : Gets the start coordinate for the query, sbjct, or both sequences
1337 : in the HSP alignment.
1338 : NOTE: Start will always be less than end.
1339 : To determine strand, use $hsp->strand()
1340 Example : $query_beg = $hsp->start('query');
1341 : $hit_beg = $hsp->start('hit');
1342 : ($query_beg, $hit_beg) = $hsp->start();
1343 Returns : scalar context: integer
1344 : array context without args: list of two integers
1345 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
1346 : ('sbjct' is synonymous with 'hit')
1347 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1350 See Also : L</end>, L</range>
1357 my ($self, $seqType) = @_;
1359 $seqType ||= (wantarray ?
'list' : 'query');
1360 $seqType = 'sbjct' if $seqType eq 'hit';
1362 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1364 if($seqType =~ /list|array/i) {
1365 return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
1367 ## Sensitive to member name changes.
1368 $seqType = "_\L$seqType\E";
1369 return $self->{$seqType.'Start'};
1375 Usage : $hsp->end( [seq_type] );
1376 Purpose : Gets the end coordinate for the query, sbjct, or both sequences
1377 : in the HSP alignment.
1378 : NOTE: Start will always be less than end.
1379 : To determine strand, use $hsp->strand()
1380 Example : $query_end = $hsp->end('query');
1381 : $hit_end = $hsp->end('hit');
1382 : ($query_end, $hit_end) = $hsp->end();
1383 Returns : scalar context: integer
1384 : array context without args: list of two integers
1385 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
1386 : ('sbjct' is synonymous with 'hit')
1387 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1390 See Also : L</start>, L</range>, L</strand>
1397 my ($self, $seqType) = @_;
1399 $seqType ||= (wantarray ?
'list' : 'query');
1400 $seqType = 'sbjct' if $seqType eq 'hit';
1402 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1404 if($seqType =~ /list|array/i) {
1405 return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
1407 ## Sensitive to member name changes.
1408 $seqType = "_\L$seqType\E";
1409 return $self->{$seqType.'Stop'};
1417 Usage : $hsp_object->strand( [seq_type] )
1418 Purpose : Get the strand of the query or sbjct sequence.
1419 Example : print $hsp->strand('query');
1420 : ($query_strand, $hit_strand) = $hsp->strand();
1421 Returns : -1, 0, or 1
1422 : -1 = Minus strand, +1 = Plus strand
1423 : Returns 0 if strand is not defined, which occurs
1424 : for BLASTP reports, and the query of TBLASTN
1425 : as well as the hit if BLASTX reports.
1426 : In scalar context without arguments, returns queryStrand value.
1427 : In array context without arguments, returns a two-element list
1428 : of strings (queryStrand, sbjctStrand).
1429 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1430 Argument : seq_type: 'query' or 'hit' or 'sbjct' or undef
1431 : ('sbjct' is synonymous with 'hit')
1434 See Also : L</_set_seq>, L</_set_match_stats>
1441 my( $self, $seqType ) = @_;
1443 $seqType ||= (wantarray ?
'list' : 'query');
1444 $seqType = 'sbjct' if $seqType eq 'hit';
1446 ## Sensitive to member name format.
1447 $seqType = "_\L$seqType\E";
1449 # $seqType could be '_list'.
1450 $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
1452 my $prog = $self->algorithm;
1454 if($seqType =~ /list|array/i) {
1456 if( $prog eq 'BLASTP') {
1460 elsif( $prog eq 'TBLASTN') {
1462 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
1464 elsif( $prog eq 'BLASTX') {
1465 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
1469 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
1470 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
1474 return ($qstr, $hstr);
1477 $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
1483 Usage : $hsp->seq( [seq_type] );
1484 Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object.
1485 Example : $seqObj = $hsp->seq('query');
1486 Returns : Object reference for a Bio::Seq.pm object.
1487 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').
1488 : ('sbjct' is synonymous with 'hit')
1489 Throws : Propagates any exception that occurs during construction
1490 : of the Bio::Seq.pm object.
1491 Comments : The sequence is returned in an array of strings corresponding
1492 : to the strings in the original format of the Blast alignment.
1493 : (i.e., same spacing).
1495 See Also : L</seq_str>, L</seq_inds>, L<Bio::Seq>
1502 my($self,$seqType) = @_;
1503 $seqType ||= 'query';
1504 $seqType = 'sbjct' if $seqType eq 'hit';
1505 my $str = $self->seq_str($seqType);
1509 Bio
::Seq
->new(-ID
=> $self->to_string,
1511 -DESC
=> "$seqType sequence",
1517 Usage : $hsp->seq_str( seq_type );
1518 Purpose : Get the full query, sbjct, or 'match' sequence as a string.
1519 : The 'match' sequence is the string of symbols in between the
1520 : query and sbjct sequences.
1521 Example : $str = $hsp->seq_str('query');
1523 Argument : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'
1524 : ('sbjct' is synonymous with 'hit')
1525 Throws : Exception if the argument does not match an accepted seq_type.
1526 Comments : Calls _set_seq_data() to set the 'match' sequence if it has
1527 : not been set already.
1529 See Also : L</seq>, L</seq_inds>, L</_set_match_seq>
1536 my($self,$seqType) = @_;
1538 $seqType ||= 'query';
1539 $seqType = 'sbjct' if $seqType eq 'hit';
1540 ## Sensitive to member name changes.
1541 $seqType = "_\L$seqType\E";
1543 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1545 if($seqType =~ /sbjct|query/) {
1546 my $seq = join('',@
{$self->{$seqType.'Seq'}});
1550 } elsif( $seqType =~ /match/i) {
1551 # Only need to call _set_match_seq() if the match seq is requested.
1552 my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
1553 $aref = $self->{'_matchSeq'};
1555 return join('',@
$aref);
1558 my $id_str = $self->_id_str;
1559 $self->throw(-class => 'Bio::Root::BadParameter',
1560 -text
=> "Invalid or undefined sequence type: $seqType ($id_str)\n" .
1561 "Valid types: query, sbjct, match",
1562 -value
=> $seqType);
1568 Usage : $hsp->seq_inds( seq_type, class, collapse );
1569 Purpose : Get a list of residue positions (indices) for all identical
1570 : or conserved residues in the query or sbjct sequence.
1571 Example : @s_ind = $hsp->seq_inds('query', 'identical');
1572 : @h_ind = $hsp->seq_inds('hit', 'conserved');
1573 : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
1574 Returns : List of integers
1575 : May include ranges if collapse is true.
1576 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
1577 : ('sbjct' is synonymous with 'hit')
1578 : class = 'identical' or 'conserved' (default = identical)
1579 : (can be shortened to 'id' or 'cons')
1580 : (actually, anything not 'id' will evaluate to 'conserved').
1581 : collapse = boolean, if true, consecutive positions are merged
1582 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
1583 : collapses to "1-5 7 9-11". This is useful for
1584 : consolidating long lists. Default = no collapse.
1586 Comments : Calls _set_residues() to set the 'match' sequence if it has
1587 : not been set already.
1589 See Also : L</seq>, L</_set_residues>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>
1596 my ($self, $seqType, $class, $collapse) = @_;
1598 $seqType ||= 'query';
1599 $class ||= 'identical';
1601 $seqType = 'sbjct' if $seqType eq 'hit';
1603 $self->_set_residues() unless defined $self->{'_identicalRes_query'};
1605 $seqType = ($seqType !~ /^q/i ?
'sbjct' : 'query');
1606 $class = ($class !~ /^id/i ?
'conserved' : 'identical');
1608 ## Sensitive to member name changes.
1609 $seqType = "_\L$seqType\E";
1610 $class = "_\L$class\E";
1612 my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};
1614 require Bio
::Search
::BlastUtils
if $collapse;
1616 return $collapse ?
&Bio
::Search
::BlastUtils
::collapse_nums
(@ary) : @ary;
1622 Usage : $hsp->get_aln()
1623 Purpose : Get a Bio::SimpleAlign object constructed from the query + sbjct
1624 : sequences of the present HSP object.
1625 Example : $aln_obj = $hsp->get_aln();
1626 Returns : Object reference for a Bio::SimpleAlign.pm object.
1628 Throws : Propagates any exception ocurring during the construction of
1629 : the Bio::SimpleAlign object.
1630 Comments : Requires Bio::SimpleAlign.
1631 : The Bio::SimpleAlign object is constructed from the query + sbjct
1632 : sequence objects obtained by calling seq().
1633 : Gap residues are included (see $GAP_SYMBOL).
1635 See Also : L</seq>, L<Bio::SimpleAlign>
1644 require Bio
::SimpleAlign
;
1645 require Bio
::LocatableSeq
;
1646 my $qseq = $self->seq('query');
1647 my $sseq = $self->seq('sbjct');
1649 my $type = $self->algorithm =~ /P$|^T/ ?
'amino' : 'dna';
1650 my $aln = Bio
::SimpleAlign
->new();
1651 $aln->add_seq(Bio
::LocatableSeq
->new(-seq
=> $qseq->seq(),
1652 -id
=> 'query_'.$qseq->display_id(),
1654 -end
=> CORE
::length($qseq)));
1656 $aln->add_seq(Bio
::LocatableSeq
->new(-seq
=> $sseq->seq(),
1657 -id
=> 'hit_'.$sseq->display_id(),
1659 -end
=> CORE
::length($sseq)));
1669 =head1 FOR DEVELOPERS ONLY
1673 Information about the various data members of this module is provided for those
1674 wishing to modify or understand the code. Two things to bear in mind:
1678 =item 1 Do NOT rely on these in any code outside of this module.
1680 All data members are prefixed with an underscore to signify that they are private.
1681 Always use accessor methods. If the accessor doesn't exist or is inadequate,
1682 create or modify an accessor (and let me know, too!).
1684 =item 2 This documentation may be incomplete and out of date.
1686 It is easy for these data member descriptions to become obsolete as
1687 this module is still evolving. Always double check this info and search
1688 for members not described here.
1692 An instance of Bio::Search::HSP::PsiBlastHSP.pm is a blessed reference to a hash containing
1693 all or some of the following fields:
1696 --------------------------------------------------------------
1697 (member names are mostly self-explanatory)
1702 _n : Integer. The 'N' value listed in parenthesis with P/Expect value:
1703 : e.g., P(3) = 1.2e-30 ---> (N = 3).
1704 : Not defined in NCBI Blast2 with gaps.
1705 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
1717 _matchSeq : String. Contains the symbols between the query and sbjct lines
1718 which indicate identical (letter) and conserved ('+') matches
1719 or a mismatch (' ').
1722 _identicalRes_query :
1723 _identicalRes_sbjct :
1724 _conservedRes_query :
1725 _conservedRes_sbjct :
1726 _match_indent : The number of leading space characters on each line containing
1727 the match symbols. _match_indent is 13 in this example:
1728 Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
1729 Q +APWGLARIS G+ + Y YD+ AG