1 #-----------------------------------------------------------------
3 # BioPerl module Bio::Search::HSP::BlastHSP
5 # (This module was originally called Bio::Tools::Blast::HSP)
7 # Please direct questions and support issues to <bioperl-l@bioperl.org>
9 # Cared for by Steve Chervitz <sac@bioperl.org>
11 # You may distribute this module under the same terms as perl itself
12 #-----------------------------------------------------------------
18 Bio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object
22 See L<Bio::Search::Hit::BlastHit>.
26 A Bio::Search::HSP::BlastHSP object provides an interface to data
27 obtained in a single alignment section of a Blast report (known as a
28 "High-scoring Segment Pair"). This is essentially a pairwise
29 alignment with score information.
31 BlastHSP objects are accessed via L<Bio::Search::Hit::BlastHit>
32 objects after parsing a BLAST report using the L<Bio::SearchIO>
35 The construction of BlastHSP objects is performed by
36 Bio::Factory::BlastHitFactory in a process that is
37 orchestrated by the Blast parser (L<Bio::SearchIO::psiblast>).
38 The resulting BlastHSPs are then accessed via
39 L<Bio::Search::Hit::BlastHit>). Therefore, you do not need to
40 use L<Bio::Search::HSP::BlastHSP>) directly. If you need to construct
41 BlastHSPs directly, see the new() function for details.
43 For L<Bio::SearchIO> BLAST parsing usage examples, see the
44 C<examples/searchio> directory of the Bioperl distribution.
46 =head2 Start and End coordinates
48 Sequence endpoints are swapped so that start is always less than
49 end. This affects For TBLASTN/X hits on the minus strand. Strand
50 information can be recovered using the strand() method. This
51 normalization step is standard Bioperl practice. It also facilitates
52 use of range information by methods such as match().
56 =item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.
60 Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
61 residue indices for identical and conservative matches along both
62 query and sbjct sequences. Since this degree of detail is not always
63 needed, this behavior does not occur during construction of the BlastHSP
64 object. These data will automatically be collected as necessary as
65 the BlastHSP.pm object is used.
69 Bio::Search::HSP::BlastHSP.pm is a concrete class that inherits from
70 L<Bio::SeqFeature::SimilarityPair> and L<Bio::Search::HSP::HSPI>.
71 L<Bio::Seq> and L<Bio::SimpleAlign> are employed for creating
72 sequence and alignment objects, respectively.
74 =head2 Relationship to SimpleAlign.pm & Seq.pm
76 BlastHSP.pm can provide the query or sbjct sequence as a L<Bio::Seq>
77 object via the L<seq()|seq> method. The BlastHSP.pm object can also create a
78 two-sequence L<Bio::SimpleAlign> alignment object using the the query
79 and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
80 objects is not automatic when constructing the BlastHSP.pm object since
81 this level of functionality is not always required and would generate
82 a lot of extra overhead when crunching many reports.
89 User feedback is an integral part of the evolution of this and other
90 Bioperl modules. Send your comments and suggestions preferably to one
91 of the Bioperl mailing lists. Your participation is much appreciated.
93 bioperl-l@bioperl.org - General discussion
94 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
98 Please direct usage questions or support issues to the mailing list:
100 I<bioperl-l@bioperl.org>
102 rather than to the module maintainer directly. Many experienced and
103 reponsive experts will be able look at the problem and quickly
104 address it. Please include a thorough description of the problem
105 with code and data examples if at all possible.
107 =head2 Reporting Bugs
109 Report bugs to the Bioperl bug tracking system to help us keep track
110 the bugs and their resolution. Bug reports can be submitted via the
113 https://github.com/bioperl/bioperl-live/issues
117 Steve Chervitz E<lt>sac-at-bioperl.orgE<gt>
119 See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
121 =head1 ACKNOWLEDGEMENTS
123 This software was originally developed in the Department of Genetics
124 at Stanford University. I would also like to acknowledge my
125 colleagues at Affymetrix for useful feedback.
129 Bio::Search::Hit::BlastHit.pm - Blast hit object.
130 Bio::Search::Result::BlastResult.pm - Blast Result object.
131 Bio::Seq.pm - Biosequence object
135 http://bio.perl.org/ - Bioperl Project Homepage
139 Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
143 This software is provided "as is" without warranty of any kind.
148 # END of main POD documentation.
152 The rest of the documentation details each of the object methods.
153 Internal methods are usually preceded with a _
157 # Let the code begin...
159 package Bio
::Search
::HSP
::BlastHSP
;
162 use Bio
::SeqFeature
::Similarity
;
164 use vars
qw($GAP_SYMBOL %STRAND_SYMBOL);
169 use base qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);
171 $GAP_SYMBOL = '-'; # Need a more general way to handle gap symbols.
172 %STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );
177 Usage : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );
178 : Bio::Search::HSP::BlastHSP objects are constructed
179 : automatically by Bio::SearchIO::BlastHitFactory,
180 : so there is no need for direct instantiation.
181 Purpose : Constructs a new BlastHSP object and Initializes key variables
183 Returns : A Bio::Search::HSP::BlastHSP object
184 Argument : Named parameters:
185 : Parameter keys are case-insensitive.
186 : -RAW_DATA => array ref containing raw BLAST report data for
187 : for a single HSP. This includes all lines
188 : of the HSP alignment from a traditional BLAST
189 or PSI-BLAST (non-XML) report,
190 : -RANK => integer (1..n).
191 : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.).
192 : -QUERY_NAME => string, id of query sequence
193 : -HIT_NAME => string, id of hit sequence
195 Comments : Having the raw data allows this object to do lazy parsing of
196 : the raw HSP data (i.e., not parsed until needed).
198 : Note that there is a fair amount of basic parsing that is
199 : currently performed in this module that would be more appropriate
200 : to do within a separate factory object.
201 : This parsing code will likely be relocated and more initialization
202 : parameters will be added to new().
204 See Also : L<Bio::SeqFeature::SimilarityPair::new()>, L<Bio::SeqFeature::Similarity::new()>
211 my ($class, @args ) = @_;
213 my $self = $class->SUPER::new
( @args );
214 # Initialize placeholders
215 $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
216 my ($raw_data, $qname, $hname, $qlen, $hlen);
218 ($self->{'_prog'}, $self->{'_rank'}, $raw_data,
220 $self->_rearrange([qw( PROGRAM
227 # _set_data() does a fair amount of parsing.
228 # This will likely change (see comment above.)
229 $self->_set_data( @
{$raw_data} );
230 # Store the aligned query as sequence feature
231 my ($qb, $hb) = ($self->start());
232 my ($qe, $he) = ($self->end());
233 my ($qs, $hs) = ($self->strand());
234 my ($qf,$hf) = ($self->query->frame(),
237 $self->query( Bio
::SeqFeature
::Similarity
->new (-start
=>$qb,
241 -score
=>$self->score,
244 -source
=>$self->{'_prog'} ));
246 $self->hit( Bio
::SeqFeature
::Similarity
->new (-start
=>$hb,
250 -score
=>$self->score,
253 -source
=>$self->{'_prog'} ));
256 $self->query->seqlength($qlen); # query
257 $self->hit->seqlength($hlen); # subject
259 $self->query->frac_identical($self->frac_identical('query'));
260 $self->hit->frac_identical($self->frac_identical('hit'));
266 # Purpose : Intended for internal use only to provide a string for use
267 # within exception messages to help users figure out which
268 # query/hit caused the problem.
269 # Returns : Short string with name of query and hit seq
272 if( not defined $self->{'_id_str'}) {
273 my $qname = $self->query->seq_id;
274 my $hname = $self->hit->seq_id;
275 $self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
277 return $self->{'_id_str'};
280 #=================================================
281 # Begin Bio::Search::HSP::HSPI implementation
282 #=================================================
287 Usage : $alg = $hsp->algorithm();
288 Function: Gets the algorithm specification that was used to obtain the hsp
289 For BLAST, the algorithm denotes what type of sequence was aligned
290 against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated
291 dna-prt, TBLASTN prt-translated dna, TBLASTX translated
293 Returns : a scalar string
301 my ($self,@args) = @_;
302 return $self->{'_prog'};
310 Usage : $hsp_obj->signif()
311 Purpose : Get the P-value or Expect value for the HSP.
312 Returns : Float (0.001 or 1.3e-43)
313 : Returns P-value if it is defined, otherwise, Expect value.
316 Comments : Provided for consistency with BlastHit::signif()
317 : Support for returning the significance data in different
318 : formats (e.g., exponent only), is not provided for HSP objects.
319 : This is only available for the BlastHit or Blast object.
321 See Also : L</p>, L</expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>
329 my $val ||= defined($self->{'_p'}) ?
$self->{'_p'} : $self->{'_expect'};
337 Usage : $hsp_obj->evalue()
338 Purpose : Get the Expect value for the HSP.
339 Returns : Float (0.001 or 1.3e-43)
342 Comments : Support for returning the expectation data in different
343 : formats (e.g., exponent only), is not provided for HSP objects.
344 : This is only available for the BlastHit or Blast object.
351 sub evalue
{ shift->{'_expect'} }
357 Usage : $hsp_obj->p()
358 Purpose : Get the P-value for the HSP.
359 Returns : Float (0.001 or 1.3e-43) or undef if not defined.
362 Comments : P-value is not defined with NCBI Blast2 reports.
363 : Support for returning the expectation data in different
364 : formats (e.g., exponent only) is not provided for HSP objects.
365 : This is only available for the BlastHit or Blast object.
367 See Also : L<expect()|expect>
372 sub p
{ my $self = shift; $self->{'_p'}; }
376 sub pvalue
{ shift->p(@_); }
380 Usage : $hsp->length( [seq_type] )
381 Purpose : Get the length of the aligned portion of the query or sbjct.
382 Example : $hsp->length('query')
384 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' (default = 'total')
385 ('sbjct' is synonymous with 'hit')
387 Comments : 'total' length is the full length of the alignment
388 : as reported in the denominators in the alignment section:
389 : "Identical = 34/120 Positives = 67/120".
398 ## Developer note: when using the built-in length function within
399 ## this module, call it as CORE::length().
400 my( $self, $seqType,$data ) = @_;
401 $seqType ||= 'total';
402 $seqType = 'sbjct' if $seqType eq 'hit';
404 $seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
406 ## Sensitive to member name format.
407 $seqType = "_\L$seqType\E";
408 if( defined $data ) {
409 $self->{$seqType.'Length'} = $data;
411 $self->{$seqType.'Length'};
418 Usage : $hsp->gaps( [seq_type] )
419 Purpose : Get the number of gap characters in the query, sbjct, or total alignment.
420 : Also can return query gap chars and sbjct gap chars as a two-element list
421 : when in array context.
422 Example : $total_gaps = $hsp->gaps();
423 : ($qgaps, $sgaps) = $hsp->gaps();
424 : $qgaps = $hsp->gaps('query');
425 Returns : scalar context: integer
426 : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
427 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
428 : ('sbjct' is synonymous with 'hit')
429 : (default = 'total', scalar context)
430 : Array context can be "induced" by providing an argument of 'list' or 'array'.
433 See Also : L</length>, L</matches>
440 my( $self, $seqType ) = @_;
442 $self->_set_seq_data() unless $self->{'_set_seq_data'};
444 $seqType ||= (wantarray ?
'list' : 'total');
445 $seqType = 'sbjct' if $seqType eq 'hit';
447 if($seqType =~ /list|array/i) {
448 return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
451 if($seqType eq 'total') {
452 return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
454 ## Sensitive to member name format.
455 $seqType = "_\L$seqType\E";
456 return $self->{$seqType.'Gaps'} || 0;
461 =head2 frac_identical
463 Usage : $hsp_object->frac_identical( [seq_type] );
464 Purpose : Get the fraction of identical positions within the given HSP.
465 Example : $frac_iden = $hsp_object->frac_identical('query');
466 Returns : Float (2-decimal precision, e.g., 0.75).
467 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
468 : ('sbjct' is synonymous with 'hit')
469 : default = 'total' (but see comments below).
471 Comments : Different versions of Blast report different values for the total
472 : length of the alignment. This is the number reported in the
473 : denominators in the stats section:
474 : "Identical = 34/120 Positives = 67/120".
475 : NCBI-BLAST uses the total length of the alignment (with gaps)
476 : WU-BLAST uses the length of the query sequence (without gaps).
477 : Therefore, when called without an argument or an argument of 'total',
478 : this method will report different values depending on the
479 : version of BLAST used.
481 : To get the fraction identical among only the aligned residues,
482 : ignoring the gaps, call this method with an argument of 'query'
483 : or 'sbjct' ('sbjct' is synonymous with 'hit').
485 See Also : L</frac_conserved>, L</num_identical>, L</matches>
492 # The value is calculated as opposed to storing it from the parsed results.
493 # This saves storage and also permits flexibility in determining for which
494 # sequence (query or sbjct) the figure is to be calculated.
496 my( $self, $seqType ) = @_;
497 $seqType ||= 'total';
498 $seqType = 'sbjct' if $seqType eq 'hit';
500 if($seqType ne 'total') {
501 $self->_set_seq_data() unless $self->{'_set_seq_data'};
503 ## Sensitive to member name format.
504 $seqType = "_\L$seqType\E";
506 sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
510 =head2 frac_conserved
512 Usage : $hsp_object->frac_conserved( [seq_type] );
513 Purpose : Get the fraction of conserved positions within the given HSP.
514 : (Note: 'conservative' positions are called 'positives' in the
516 Example : $frac_cons = $hsp_object->frac_conserved('query');
517 Returns : Float (2-decimal precision, e.g., 0.75).
518 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
519 : ('sbjct' is synonymous with 'hit')
520 : default = 'total' (but see comments below).
522 Comments : Different versions of Blast report different values for the total
523 : length of the alignment. This is the number reported in the
524 : denominators in the stats section:
525 : "Identical = 34/120 Positives = 67/120".
526 : NCBI-BLAST uses the total length of the alignment (with gaps)
527 : WU-BLAST uses the length of the query sequence (without gaps).
528 : Therefore, when called without an argument or an argument of 'total',
529 : this method will report different values depending on the
530 : version of BLAST used.
532 : To get the fraction conserved among only the aligned residues,
533 : ignoring the gaps, call this method with an argument of 'query'
536 See Also : L</frac_conserved>, L</num_conserved>, L</matches>
540 #--------------------
542 #--------------------
543 # The value is calculated as opposed to storing it from the parsed results.
544 # This saves storage and also permits flexibility in determining for which
545 # sequence (query or sbjct) the figure is to be calculated.
547 my( $self, $seqType ) = @_;
548 $seqType ||= 'total';
549 $seqType = 'sbjct' if $seqType eq 'hit';
551 if($seqType ne 'total') {
552 $self->_set_seq_data() unless $self->{'_set_seq_data'};
555 ## Sensitive to member name format.
556 $seqType = "_\L$seqType\E";
558 sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
564 Usage : my $qseq = $hsp->query_string;
565 Function: Retrieves the query sequence of this HSP as a string
573 sub query_string
{ shift->seq_str('query'); }
579 Usage : my $hseq = $hsp->hit_string;
580 Function: Retrieves the hit sequence of this HSP as a string
588 sub hit_string
{ shift->seq_str('hit'); }
592 =head2 homology_string
594 Title : homology_string
595 Usage : my $homo_string = $hsp->homology_string;
596 Function: Retrieves the homology sequence for this HSP as a string.
597 : The homology sequence is the string of symbols in between the
598 : query and hit sequences in the alignment indicating the degree
599 : of conservation (e.g., identical, similar, not similar).
606 sub homology_string
{ shift->seq_str('match'); }
609 #=================================================
610 # End Bio::Search::HSP::HSPI implementation
611 #=================================================
613 # Older method delegating to method defined in HSPI.
617 See L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>
622 sub expect
{ shift->evalue( @_ ); }
628 Usage : $hsp->rank( [string] );
629 Purpose : Get the rank of the HSP within a given Blast hit.
630 Example : $rank = $hsp->rank;
631 Returns : Integer (1..n) corresponding to the order in which the HSP
632 appears in the BLAST report.
639 sub rank
{ shift->{'_rank'} }
642 # For backward compatibility
644 sub name
{ shift->rank }
650 Usage : print $hsp->to_string;
651 Function: Returns a string representation for the Blast HSP.
652 Primarily intended for debugging purposes.
654 Returns : A string of the form:
666 return "[BlastHSP] " . $self->rank();
672 Usage : called automatically during object construction.
673 Purpose : Parses the raw HSP section from a flat BLAST report and
674 sets the query sequence, sbjct sequence, and the "match" data
675 : which consists of the symbols between the query and sbjct lines
677 Argument : Array (all lines for a single, complete HSP, from a raw,
678 flat (i.e., non-XML) BLAST report)
679 Throws : Propagates any exceptions from the methods called ("See Also")
681 See Also : L</_set_seq>, L</_set_score_stats>, L</_set_match_stats>
690 my @queryList = (); # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
691 my @sbjctList = (); # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
693 my $matchLine = 0; # Alternating boolean: when true, load 'match' data.
696 #print STDERR "BlastHSP: set_data()\n";
698 my($line, $aln_row_len, $length_diff);
701 # Collecting data for all lines in the alignment
702 # and then storing the collections for possible processing later.
704 # Note that "match" lines may not be properly padded with spaces.
705 # This loop now properly handles such cases:
706 # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
707 # PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
708 # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200
710 foreach $line( @data ) {
711 next if $line =~ /^\s*$/;
713 if( $line =~ /^ ?Score/ ) {
714 $self->_set_score_stats( $line );
715 } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
716 $self->_set_match_stats( $line );
717 } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
718 # Version 2.0.8 has Frame information on a separate line.
719 # Storing frame according to SeqFeature::Generic::frame()
720 # which does not contain strand info (use strand()).
721 my $frame = abs($1) - 1;
722 $self->frame( $frame );
723 } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
724 push @queryList, $line;
725 $self->{'_match_indent'} = CORE
::length $1;
726 $aln_row_len = (CORE
::length $1) + (CORE
::length $2);
728 } elsif( $matchLine ) {
729 # Pad the match line with spaces if necessary.
730 $length_diff = $aln_row_len - CORE
::length $line;
731 $length_diff and $line .= ' 'x
$length_diff;
732 push @matchList, $line;
734 } elsif( $line =~ /^Sbjct/ ) {
735 push @sbjctList, $line;
738 # Storing the query and sbjct lists in case they are needed later.
739 # We could make this conditional to save memory.
740 $self->{'_queryList'} = \
@queryList;
741 $self->{'_sbjctList'} = \
@sbjctList;
743 # Storing the match list in case it is needed later.
744 $self->{'_matchList'} = \
@matchList;
746 if(not defined ($self->{'_numIdentical'})) {
747 my $id_str = $self->_id_str;
748 $self->throw( -text
=> "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
751 if(!scalar @queryList or !scalar @sbjctList) {
752 my $id_str = $self->_id_str;
753 $self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
758 =head2 _set_score_stats
760 Usage : called automatically by _set_data()
761 Purpose : Sets various score statistics obtained from the HSP listing.
762 Argument : String with any of the following formats:
763 : blast2: Score = 30.1 bits (66), Expect = 9.2
764 : blast2: Score = 158.2 bits (544), Expect(2) = e-110
765 : blast1: Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
766 : blast1: Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
767 Throws : Exception if the stats cannot be parsed, probably due to a change
768 : in the Blast report format.
770 See Also : L</_set_data>
774 #--------------------
775 sub _set_score_stats
{
776 #--------------------
777 my ($self, $data) = @_;
781 if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
782 # blast2 format n = 1
786 } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
787 # blast2 format n > 1
793 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
794 # blast1 format, n = 1
800 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
801 # blast1 format, n > 1
809 my $id_str = $self->_id_str;
810 $self->throw(-class => 'Bio::Root::Exception',
811 -text
=> "Can't parse score statistics: unrecognized format. ($id_str)",
814 $expect = "1$expect" if $expect =~ /^e/i;
815 $p = "1$p" if defined $p and $p=~ /^e/i;
817 $self->{'_expect'} = $expect;
818 $self->{'_p'} = $p || undef;
819 $self->significance( $p || $expect );
823 =head2 _set_match_stats
825 Usage : Private method; called automatically by _set_data()
826 Purpose : Sets various matching statistics obtained from the HSP listing.
827 Argument : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
828 : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)
829 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)
830 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
831 : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
832 Throws : Exception if the stats cannot be parsed, probably due to a change
833 : in the Blast report format.
834 Comments : The "Gaps = " data in the HSP header has a different meaning depending
835 : on the type of Blast: for BLASTP, this number is the total number of
836 : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
837 : query sequence only. Thus, it is safer to collect the data
838 : separately by examining the actual sequence strings as is done
841 See Also : L</_set_data>, L</_set_seq>
845 #--------------------
846 sub _set_match_stats
{
847 #--------------------
848 my ($self, $data) = @_;
850 if($data =~ m!Identities = (\d+)/(\d+)!) {
852 $self->{'_numIdentical'} = $1;
853 $self->{'_totalLength'} = $2;
856 if($data =~ m!Positives = (\d+)/(\d+)!) {
858 $self->{'_numConserved'} = $1;
859 $self->{'_totalLength'} = $2;
862 if($data =~ m!Frame = ([\d+-]+)!) {
866 # Strand data is not always present in this line.
867 # _set_seq() will also set strand information.
868 if($data =~ m!Strand = (\w+) / (\w+)!) {
869 $self->{'_queryStrand'} = $1;
870 $self->{'_sbjctStrand'} = $2;
873 # if($data =~ m!Gaps = (\d+)/(\d+)!) {
874 # $self->{'_totalGaps'} = $1;
876 # $self->{'_totalGaps'} = 0;
884 Usage : called automatically when sequence data is requested.
885 Purpose : Sets the HSP sequence data for both query and sbjct sequences.
886 : Includes: start, stop, length, gaps, and raw sequence.
888 Throws : Propagates any exception thrown by _set_match_seq()
889 Comments : Uses raw data stored by _set_data() during object construction.
890 : These data are not always needed, so it is conditionally
891 : executed only upon demand by methods such as gaps(), _set_residues(),
892 : etc. _set_seq() does the dirty work.
894 See Also : L</_set_seq>
903 $self->_set_seq('query', @
{$self->{'_queryList'}});
904 $self->_set_seq('sbjct', @
{$self->{'_sbjctList'}});
906 # Liberate some memory.
907 @
{$self->{'_queryList'}} = @
{$self->{'_sbjctList'}} = ();
908 undef $self->{'_queryList'};
909 undef $self->{'_sbjctList'};
911 $self->{'_set_seq_data'} = 1;
918 Usage : called automatically by _set_seq_data()
919 : $hsp_obj->($seq_type, @data);
920 Purpose : Sets sequence information for both the query and sbjct sequences.
921 : Directly counts the number of gaps in each sequence (if gapped Blast).
922 Argument : $seq_type = 'query' or 'sbjct'
923 : @data = all seq lines with the form:
924 : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
925 Throws : Exception if data strings cannot be parsed, probably due to a change
926 : in the Blast report format.
927 Comments : Uses first argument to determine which data members to set
928 : making this method sensitive data member name changes.
929 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
930 Warning : Sequence endpoints are normalized so that start < end. This affects HSPs
931 : for TBLASTN/X hits on the minus strand. Normalization facilitates use
932 : of range information by methods such as match().
934 See Also : L</_set_seq_data>, L</matches>, L</range>, L</start>, L</end>
949 if( m/(\d+) *([^\d\s]+) *(\d+)/) {
950 push @ranges, ( $1, $3 ) ;
952 #print STDERR "_set_seq found sequence \"$2\"\n";
954 $self->warn("Bad sequence data: $_");
958 if( !(scalar(@sequence) and scalar(@ranges))) {
959 my $id_str = $self->_id_str;
960 $self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str) $seqType");
963 # Sensitive to member name changes.
964 $seqType = "_\L$seqType\E";
965 $self->{$seqType.'Start'} = $ranges[0];
966 $self->{$seqType.'Stop'} = $ranges[ $#ranges ];
967 $self->{$seqType.'Seq'} = \
@sequence;
969 $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
971 # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
972 # Converting nucl coords to amino acid coords.
974 my $prog = $self->algorithm;
975 if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
976 $self->{$seqType.'Length'} /= 3;
977 } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
978 $self->{$seqType.'Length'} /= 3;
979 } elsif($prog eq 'TBLASTX') {
980 $self->{$seqType.'Length'} /= 3;
983 if( $prog ne 'BLASTP' ) {
984 $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
985 $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
986 # Normalize sequence endpoints so that start < end.
987 # Reverse complement or 'minus strand' HSPs get flipped here.
988 if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
989 ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =
990 ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
991 $self->{$seqType.'Strand'} = 'Minus';
995 ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
996 # if($self->{'_gapped'}) {
997 my $seqstr = join('', @sequence);
999 my $num_gaps = CORE
::length($seqstr) - $self->{$seqType.'Length'};
1000 $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
1005 =head2 _set_residues
1007 Usage : called automatically when residue data is requested.
1008 Purpose : Sets the residue numbers representing the identical and
1009 : conserved positions. These data are obtained by analyzing the
1010 : symbols between query and sbjct lines of the alignments.
1012 Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
1013 Comments : These data are not always needed, so it is conditionally
1014 : executed only upon demand by methods such as seq_inds().
1015 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
1017 See Also : L</_set_seq_data>, L</_set_match_seq>, L</seq_inds>
1027 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1029 # Using hashes to avoid saving duplicate residue numbers.
1030 my %identicalList_query = ();
1031 my %identicalList_sbjct = ();
1032 my %conservedList_query = ();
1033 my %conservedList_sbjct = ();
1035 my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
1036 $aref ||= $self->{'_matchSeq'};
1037 my $seqString = join('', @
$aref );
1039 my $qseq = join('',@
{$self->{'_querySeq'}});
1040 my $sseq = join('',@
{$self->{'_sbjctSeq'}});
1041 my $resCount_query = $self->{'_queryStop'} || 0;
1042 my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
1044 my $prog = $self->algorithm;
1045 if($prog !~ /^BLASTP|^BLASTN/) {
1046 if($prog eq 'TBLASTN') {
1047 $resCount_sbjct /= 3;
1048 } elsif($prog eq 'BLASTX') {
1049 $resCount_query /= 3;
1050 } elsif($prog eq 'TBLASTX') {
1051 $resCount_query /= 3;
1052 $resCount_sbjct /= 3;
1056 my ($mchar, $schar, $qchar);
1057 while( $mchar = chop($seqString) ) {
1058 ($qchar, $schar) = (chop($qseq), chop($sseq));
1059 if( $mchar eq '+' ) {
1060 $conservedList_query{ $resCount_query } = 1;
1061 $conservedList_sbjct{ $resCount_sbjct } = 1;
1062 } elsif( $mchar ne ' ' ) {
1063 $identicalList_query{ $resCount_query } = 1;
1064 $identicalList_sbjct{ $resCount_sbjct } = 1;
1066 $resCount_query-- if $qchar ne $GAP_SYMBOL;
1067 $resCount_sbjct-- if $schar ne $GAP_SYMBOL;
1069 $self->{'_identicalRes_query'} = \
%identicalList_query;
1070 $self->{'_conservedRes_query'} = \
%conservedList_query;
1071 $self->{'_identicalRes_sbjct'} = \
%identicalList_sbjct;
1072 $self->{'_conservedRes_sbjct'} = \
%conservedList_sbjct;
1076 =head2 _set_match_seq
1078 Usage : $hsp_obj->_set_match_seq()
1079 Purpose : Set the 'match' sequence for the current HSP (symbols in between
1080 : the query and sbjct lines.)
1081 Returns : Array reference holding the match sequences lines.
1083 Throws : Exception if the _matchList field is not set.
1084 Comments : The match information is not always necessary. This method
1085 : allows it to be conditionally prepared.
1086 : Called by _set_residues>() and seq_str().
1088 See Also : L</_set_residues>, L</seq_str>
1092 #-------------------
1093 sub _set_match_seq
{
1094 #-------------------
1097 if( ! ref($self->{'_matchList'}) ) {
1098 my $id_str = $self->_id_str;
1099 $self->throw("Can't set HSP match sequence: No data ($id_str)");
1102 my @data = @
{$self->{'_matchList'}};
1107 ## Remove leading spaces; (note: aln may begin with a space
1108 ## which is why we can't use s/^ +//).
1109 s/^ {$self->{'_match_indent'}}//;
1112 # Liberate some memory.
1113 @
{$self->{'_matchList'}} = undef;
1114 $self->{'_matchList'} = undef;
1116 $self->{'_matchSeq'} = \
@sequence;
1118 return $self->{'_matchSeq'};
1124 Usage : $hsp_obj->n()
1125 Purpose : Get the N value (num HSPs on which P/Expect is based).
1126 : This value is not defined with NCBI Blast2 with gapping.
1127 Returns : Integer or null string if not defined.
1130 Comments : The 'N' value is listed in parenthesis with P/Expect value:
1131 : e.g., P(3) = 1.2e-30 ---> (N = 3).
1132 : Not defined in NCBI Blast2 with gaps.
1133 : This typically is equal to the number of HSPs but not always.
1134 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
1136 See Also : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>
1141 sub n
{ my $self = shift; $self->{'N'} || ''; }
1147 Usage : $hsp->matches([seq_type], [start], [stop]);
1148 Purpose : Get the total number of identical and conservative matches
1149 : in the query or sbjct sequence for the given HSP. Optionally can
1150 : report data within a defined interval along the seq.
1151 : (Note: 'conservative' matches are called 'positives' in the
1153 Example : ($id,$cons) = $hsp_object->matches('hit');
1154 : ($id,$cons) = $hsp_object->matches('query',300,400);
1155 Returns : 2-element array of integers
1156 Argument : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)
1157 : ('sbjct' is synonymous with 'hit')
1158 : (2) start = Starting coordinate (optional)
1159 : (3) stop = Ending coordinate (optional)
1160 Throws : Exception if the supplied coordinates are out of range.
1161 Comments : Relies on seq_str('match') to get the string of alignment symbols
1162 : between the query and sbjct lines which are used for determining
1163 : the number of identical and conservative matches.
1165 See Also : L</length>, L</gaps>, L</seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>
1172 my( $self, %param ) = @_;
1174 my($seqType, $beg, $end) = ($param{-SEQ
}, $param{-START
}, $param{-STOP
});
1175 $seqType ||= 'query';
1176 $seqType = 'sbjct' if $seqType eq 'hit';
1180 if(!defined $beg && !defined $end) {
1181 ## Get data for the whole alignment.
1182 push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
1184 ## Get the substring representing the desired sub-section of aln.
1187 ($start,$stop) = $self->range($seqType);
1188 if($beg == 0) { $beg = $start; $end = $beg+$end; }
1189 elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
1191 if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
1192 else { $end += 1;} ##ML moved from commented position below, makes
1194 # if($end > $stop) { $end = $stop; }
1195 if($beg < $start) { $beg = $start; }
1196 # else { $end += 1;}
1198 # my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
1200 ## ML: START fix for substr out of range error ------------------
1202 my $prog = $self->algorithm;
1203 if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
1205 $seq = substr($self->seq_str('match'),
1206 int(($beg-$start)/3), int(($end-$beg+1)/3));
1208 } elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
1210 $seq = substr($self->seq_str('match'),
1211 int(($beg-$start)/3), int(($end-$beg+1)/3));
1213 $seq = substr($self->seq_str('match'),
1214 $beg-$start, ($end-$beg));
1216 ## ML: End of fix for substr out of range error -----------------
1219 ## ML: debugging code
1220 ## This is where we get our exception. Try printing out the values going
1224 # qq(*------------MY EXCEPTION --------------------\nSeq: ") ,
1225 # $self->seq_str("$seqType"), qq("\n),$self->rank,",( index:";
1226 # print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:",
1227 # CORE::length $self->seq_str("$seqType");
1228 # print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ),
1229 # ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
1230 ## ML: END DEBUGGING CODE----------
1232 if(!CORE
::length $seq) {
1233 my $id_str = $self->_id_str;
1234 $self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
1236 ## Get data for a substring.
1237 # printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
1238 # printf "Original match seq:\n%s\n",$self->seq_str('match');
1239 $seq =~ s/ //g; # remove space (no info).
1240 my $len_cons = CORE
::length $seq;
1241 $seq =~ s/\+//g; # remove '+' characters (conservative substitutions)
1242 my $len_id = CORE
::length $seq;
1243 push @data, ($len_id, $len_cons);
1244 # printf " HSP = %s\n id = %d; cons = %d\n", $self->rank, $len_id, $len_cons; <STDIN>;
1250 =head2 num_identical
1252 Usage : $hsp_object->num_identical();
1253 Purpose : Get the number of identical positions within the given HSP.
1254 Example : $num_iden = $hsp_object->num_identical();
1259 See Also : L</num_conserved>, L</frac_identical>
1263 #-------------------
1265 #-------------------
1268 $self->{'_numIdentical'};
1272 =head2 num_conserved
1274 Usage : $hsp_object->num_conserved();
1275 Purpose : Get the number of conserved positions within the given HSP.
1276 Example : $num_iden = $hsp_object->num_conserved();
1281 See Also : L</num_identical>, L</frac_conserved>
1285 #-------------------
1287 #-------------------
1290 $self->{'_numConserved'};
1297 Usage : $hsp->range( [seq_type] );
1298 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
1299 : in the HSP alignment.
1300 Example : ($query_beg, $query_end) = $hsp->range('query');
1301 : ($hit_beg, $hit_end) = $hsp->range('hit');
1302 Returns : Two-element array of integers
1303 Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
1304 : ('sbjct' is synonymous with 'hit')
1307 See Also : L</start>, L</end>
1314 my ($self, $seqType) = @_;
1316 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1318 $seqType ||= 'query';
1319 $seqType = 'sbjct' if $seqType eq 'hit';
1321 ## Sensitive to member name changes.
1322 $seqType = "_\L$seqType\E";
1324 return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
1329 Usage : $hsp->start( [seq_type] );
1330 Purpose : Gets the start coordinate for the query, sbjct, or both sequences
1331 : in the HSP alignment.
1332 : NOTE: Start will always be less than end.
1333 : To determine strand, use $hsp->strand()
1334 Example : $query_beg = $hsp->start('query');
1335 : $hit_beg = $hsp->start('hit');
1336 : ($query_beg, $hit_beg) = $hsp->start();
1337 Returns : scalar context: integer
1338 : array context without args: list of two integers
1339 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
1340 : ('sbjct' is synonymous with 'hit')
1341 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1344 See Also : L</end>, L</range>
1351 my ($self, $seqType) = @_;
1353 $seqType ||= (wantarray ?
'list' : 'query');
1354 $seqType = 'sbjct' if $seqType eq 'hit';
1356 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1358 if($seqType =~ /list|array/i) {
1359 return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
1361 ## Sensitive to member name changes.
1362 $seqType = "_\L$seqType\E";
1363 return $self->{$seqType.'Start'};
1369 Usage : $hsp->end( [seq_type] );
1370 Purpose : Gets the end coordinate for the query, sbjct, or both sequences
1371 : in the HSP alignment.
1372 : NOTE: Start will always be less than end.
1373 : To determine strand, use $hsp->strand()
1374 Example : $query_end = $hsp->end('query');
1375 : $hit_end = $hsp->end('hit');
1376 : ($query_end, $hit_end) = $hsp->end();
1377 Returns : scalar context: integer
1378 : array context without args: list of two integers
1379 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
1380 : ('sbjct' is synonymous with 'hit')
1381 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1384 See Also : L</start>, L</range>, L</strand>
1391 my ($self, $seqType) = @_;
1393 $seqType ||= (wantarray ?
'list' : 'query');
1394 $seqType = 'sbjct' if $seqType eq 'hit';
1396 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1398 if($seqType =~ /list|array/i) {
1399 return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
1401 ## Sensitive to member name changes.
1402 $seqType = "_\L$seqType\E";
1403 return $self->{$seqType.'Stop'};
1411 Usage : $hsp_object->strand( [seq_type] )
1412 Purpose : Get the strand of the query or sbjct sequence.
1413 Example : print $hsp->strand('query');
1414 : ($query_strand, $hit_strand) = $hsp->strand();
1415 Returns : -1, 0, or 1
1416 : -1 = Minus strand, +1 = Plus strand
1417 : Returns 0 if strand is not defined, which occurs
1418 : for BLASTP reports, and the query of TBLASTN
1419 : as well as the hit if BLASTX reports.
1420 : In scalar context without arguments, returns queryStrand value.
1421 : In array context without arguments, returns a two-element list
1422 : of strings (queryStrand, sbjctStrand).
1423 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1424 Argument : seq_type: 'query' or 'hit' or 'sbjct' or undef
1425 : ('sbjct' is synonymous with 'hit')
1428 See Also : L</_set_seq>, L</_set_match_stats>
1435 my( $self, $seqType ) = @_;
1437 $seqType ||= (wantarray ?
'list' : 'query');
1438 $seqType = 'sbjct' if $seqType eq 'hit';
1440 ## Sensitive to member name format.
1441 $seqType = "_\L$seqType\E";
1443 # $seqType could be '_list'.
1444 $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
1446 my $prog = $self->algorithm;
1448 if($seqType =~ /list|array/i) {
1450 if( $prog eq 'BLASTP') {
1454 elsif( $prog eq 'TBLASTN') {
1456 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
1458 elsif( $prog eq 'BLASTX') {
1459 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
1463 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
1464 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
1468 return ($qstr, $hstr);
1471 $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
1477 Usage : $hsp->seq( [seq_type] );
1478 Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object.
1479 Example : $seqObj = $hsp->seq('query');
1480 Returns : Object reference for a Bio::Seq.pm object.
1481 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').
1482 : ('sbjct' is synonymous with 'hit')
1483 Throws : Propagates any exception that occurs during construction
1484 : of the Bio::Seq.pm object.
1485 Comments : The sequence is returned in an array of strings corresponding
1486 : to the strings in the original format of the Blast alignment.
1487 : (i.e., same spacing).
1489 See Also : L</seq_str>, L</seq_inds>, L<Bio::Seq>
1496 my($self,$seqType) = @_;
1497 $seqType ||= 'query';
1498 $seqType = 'sbjct' if $seqType eq 'hit';
1499 my $str = $self->seq_str($seqType);
1503 Bio
::Seq
->new(-ID
=> $self->to_string,
1505 -DESC
=> "$seqType sequence",
1511 Usage : $hsp->seq_str( seq_type );
1512 Purpose : Get the full query, sbjct, or 'match' sequence as a string.
1513 : The 'match' sequence is the string of symbols in between the
1514 : query and sbjct sequences.
1515 Example : $str = $hsp->seq_str('query');
1517 Argument : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'
1518 : ('sbjct' is synonymous with 'hit')
1519 Throws : Exception if the argument does not match an accepted seq_type.
1520 Comments : Calls _set_seq_data() to set the 'match' sequence if it has
1521 : not been set already.
1523 See Also : L</seq>, L</seq_inds>, L</_set_match_seq>
1530 my($self,$seqType) = @_;
1532 $seqType ||= 'query';
1533 $seqType = 'sbjct' if $seqType eq 'hit';
1534 ## Sensitive to member name changes.
1535 $seqType = "_\L$seqType\E";
1537 $self->_set_seq_data() unless $self->{'_set_seq_data'};
1539 if($seqType =~ /sbjct|query/) {
1540 my $seq = join('',@
{$self->{$seqType.'Seq'}});
1544 } elsif( $seqType =~ /match/i) {
1545 # Only need to call _set_match_seq() if the match seq is requested.
1546 my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
1547 $aref = $self->{'_matchSeq'};
1549 return join('',@
$aref);
1552 my $id_str = $self->_id_str;
1553 $self->throw(-class => 'Bio::Root::BadParameter',
1554 -text
=> "Invalid or undefined sequence type: $seqType ($id_str)\n" .
1555 "Valid types: query, sbjct, match",
1556 -value
=> $seqType);
1562 Usage : $hsp->seq_inds( seq_type, class, collapse );
1563 Purpose : Get a list of residue positions (indices) for all identical
1564 : or conserved residues in the query or sbjct sequence.
1565 Example : @s_ind = $hsp->seq_inds('query', 'identical');
1566 : @h_ind = $hsp->seq_inds('hit', 'conserved');
1567 : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
1568 Returns : List of integers
1569 : May include ranges if collapse is true.
1570 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
1571 : ('sbjct' is synonymous with 'hit')
1572 : class = 'identical' or 'conserved' (default = identical)
1573 : (can be shortened to 'id' or 'cons')
1574 : (actually, anything not 'id' will evaluate to 'conserved').
1575 : collapse = boolean, if true, consecutive positions are merged
1576 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
1577 : collapses to "1-5 7 9-11". This is useful for
1578 : consolidating long lists. Default = no collapse.
1580 Comments : Calls _set_residues() to set the 'match' sequence if it has
1581 : not been set already.
1583 See Also : L</seq>, L</_set_residues>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>
1590 my ($self, $seqType, $class, $collapse) = @_;
1592 $seqType ||= 'query';
1593 $class ||= 'identical';
1595 $seqType = 'sbjct' if $seqType eq 'hit';
1597 $self->_set_residues() unless defined $self->{'_identicalRes_query'};
1599 $seqType = ($seqType !~ /^q/i ?
'sbjct' : 'query');
1600 $class = ($class !~ /^id/i ?
'conserved' : 'identical');
1602 ## Sensitive to member name changes.
1603 $seqType = "_\L$seqType\E";
1604 $class = "_\L$class\E";
1606 my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};
1608 require Bio
::Search
::BlastUtils
if $collapse;
1610 return $collapse ?
&Bio
::Search
::BlastUtils
::collapse_nums
(@ary) : @ary;
1616 Usage : $hsp->get_aln()
1617 Purpose : Get a Bio::SimpleAlign object constructed from the query + sbjct
1618 : sequences of the present HSP object.
1619 Example : $aln_obj = $hsp->get_aln();
1620 Returns : Object reference for a Bio::SimpleAlign.pm object.
1622 Throws : Propagates any exception ocurring during the construction of
1623 : the Bio::SimpleAlign object.
1624 Comments : Requires Bio::SimpleAlign.
1625 : The Bio::SimpleAlign object is constructed from the query + sbjct
1626 : sequence objects obtained by calling seq().
1627 : Gap residues are included (see $GAP_SYMBOL).
1629 See Also : L</seq>, L<Bio::SimpleAlign>
1638 require Bio
::SimpleAlign
;
1639 require Bio
::LocatableSeq
;
1640 my $qseq = $self->seq('query');
1641 my $sseq = $self->seq('sbjct');
1643 my $type = $self->algorithm =~ /P$|^T/ ?
'amino' : 'dna';
1644 my $aln = Bio
::SimpleAlign
->new();
1645 $aln->add_seq(Bio
::LocatableSeq
->new(-seq
=> $qseq->seq(),
1646 -id
=> 'query_'.$qseq->display_id(),
1648 -end
=> CORE
::length($qseq)));
1650 $aln->add_seq(Bio
::LocatableSeq
->new(-seq
=> $sseq->seq(),
1651 -id
=> 'hit_'.$sseq->display_id(),
1653 -end
=> CORE
::length($sseq)));
1663 =head1 FOR DEVELOPERS ONLY
1667 Information about the various data members of this module is provided for those
1668 wishing to modify or understand the code. Two things to bear in mind:
1672 =item 1 Do NOT rely on these in any code outside of this module.
1674 All data members are prefixed with an underscore to signify that they are private.
1675 Always use accessor methods. If the accessor doesn't exist or is inadequate,
1676 create or modify an accessor (and let me know, too!).
1678 =item 2 This documentation may be incomplete and out of date.
1680 It is easy for these data member descriptions to become obsolete as
1681 this module is still evolving. Always double check this info and search
1682 for members not described here.
1686 An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
1687 all or some of the following fields:
1690 --------------------------------------------------------------
1691 (member names are mostly self-explanatory)
1696 _n : Integer. The 'N' value listed in parenthesis with P/Expect value:
1697 : e.g., P(3) = 1.2e-30 ---> (N = 3).
1698 : Not defined in NCBI Blast2 with gaps.
1699 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
1711 _matchSeq : String. Contains the symbols between the query and sbjct lines
1712 which indicate identical (letter) and conserved ('+') matches
1713 or a mismatch (' ').
1716 _identicalRes_query :
1717 _identicalRes_sbjct :
1718 _conservedRes_query :
1719 _conservedRes_sbjct :
1720 _match_indent : The number of leading space characters on each line containing
1721 the match symbols. _match_indent is 13 in this example:
1722 Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
1723 Q +APWGLARIS G+ + Y YD+ AG