2 # BioPerl module for Bio::Search::Tiling::MapTiling
4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
6 # Cared for by Mark A. Jensen <maj@fortinbras.us>
8 # Copyright Mark A. Jensen
10 # You may distribute this module under the same terms as perl itself
12 # POD documentation - main docs before the code
16 Bio::Search::Tiling::MapTiling - An implementation of an HSP tiling
17 algorithm, with methods to obtain frequently-requested statistics
21 # get a BLAST $hit from somewhere, then
22 $tiling = Bio::Search::Tiling::MapTiling->new($hit);
25 $numID = $tiling->identities();
26 $numCons = $tiling->conserved();
27 $query_length = $tiling->length('query');
28 $subject_length = $tiling->length('subject'); # or...
29 $subject_length = $tiling->length('hit');
31 # get a visual on the coverage map
32 print $tiling->coverage_map_as_text('query',$context,'LEGEND');
35 $context = $tiling->_context( -type => 'subject', -strand=> 1, -frame=>1);
36 @covering_hsps_for_subject = $tiling->next_tiling('subject',$context);
37 $context = $tiling->_context( -type => 'query', -strand=> -1, -frame=>0);
38 @covering_hsps_for_query = $tiling->next_tiling('query', $context);
42 Frequently, users want to use a set of high-scoring pairs (HSPs)
43 obtained from a BLAST or other search to assess the overall level of
44 identity, conservation, or coverage represented by matches between a
45 subject and a query sequence. Because a set of HSPs frequently
46 describes multiple overlapping sequence fragments, a simple summation of
47 statistics over the HSPs will generally overestimate those
48 statistics. To obtain an accurate estimate of global hit statistics, a
49 'tiling' of HSPs onto either the subject or the query sequence must be
50 performed, in order to properly correct for this.
52 This module will execute a tiling algorithm on a given hit based on an
53 interval decomposition I'm calling the "coverage map". Internal object
54 methods compute the various statistics, which are then stored in
55 appropriately-named public object attributes. See
56 L<Bio::Search::Tiling::MapTileUtils> for more info on the algorithm.
58 =head2 STRAND/FRAME CONTEXTS
60 In BLASTX, TBLASTN, and TBLASTX reports, strand and frame information
61 are reported for the query, subject, or query and subject,
62 respectively, for each HSP. Tilings for these sequence types are only
63 meaningful when they include HSPs in the same strand and frame, or
64 "context". So, in these situations, the context must be specified
65 in the method calls or the methods will throw.
67 Contexts are specified as strings: C<[ 'all' | [m|p][_|0|1|2] ]>, where
68 C<all> = all HSPs (will throw if context must be specified), C<m> = minus
69 strand, C<p> = plus strand, and C<_> = no frame info, C<0,1,2> = respective
70 (absolute) frame. The L</_make_context_key> method will convert a (strand,
71 frame) specification to a context string, e.g.:
73 $context = $self->_context(-type=>'query', -strand=>-1, -frame=>-2);
77 The contexts present among the HSPs in a hit are identified and stored
78 for convenience upon object construction. These are accessed off the
79 object with the L</contexts> method. If contexts don't apply for the
80 given report, this returns C<('all')>.
82 =head1 TILED ALIGNMENTS
84 The experimental method L<ALIGNMENTS/get_tiled_alns> will use a tiling
85 to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
88 @alns = $tiling->get_tiled_alns($type, $context);
90 Each alignment contains two sequences with ids 'query' and 'subject',
91 and consists of a concatenation of tiling HSPs which overlap or are
92 directly adjacent. The alignment are returned in C<$type> sequence
93 order. When HSPs overlap, the alignment sequence is taken from the HSP
94 which comes first in the coverage map array.
96 The sequences in each alignment contain features (even though they are
97 L<Bio::LocatableSeq> objects) which map the original query/subject
98 coordinates to the new alignment sequence coordinates. You can
99 determine the original BLAST fragments this way:
101 $aln = ($tiling->get_tiled_alns)[0];
102 $qseq = $aln->get_seq_by_id('query');
103 $hseq = $aln->get_seq_by_id('subject');
104 foreach my $feat ($qseq->get_SeqFeatures) {
105 $org_start = ($feat->get_tag_values('query_start'))[0];
106 $org_end = ($feat->get_tag_values('query_end'))[0];
107 # original fragment as represented in the tiled alignment:
108 $org_fragment = $feat->seq;
110 foreach my $feat ($hseq->get_SeqFeatures) {
111 $org_start = ($feat->get_tag_values('subject_start'))[0];
112 $org_end = ($feat->get_tag_values('subject_end'))[0];
113 # original fragment as represented in the tiled alignment:
114 $org_fragment = $feat->seq;
119 The major calculations are made just-in-time, and then memoized. So,
120 for example, for a given MapTiling object, a coverage map would
121 usually be calculated only once (for the query), and at most twice (if
122 the subject perspective is also desired), and then only when a
123 statistic is first accessed. Afterward, the map and/or any statistic
124 is read from storage. So feel free to call the statistic methods
125 frequently if it suits you.
131 User feedback is an integral part of the evolution of this and other
132 Bioperl modules. Send your comments and suggestions preferably to
133 the Bioperl mailing list. Your participation is much appreciated.
135 bioperl-l@bioperl.org - General discussion
136 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
140 Please direct usage questions or support issues to the mailing list:
142 I<bioperl-l@bioperl.org>
144 rather than to the module maintainer directly. Many experienced and
145 reponsive experts will be able look at the problem and quickly
146 address it. Please include a thorough description of the problem
147 with code and data examples if at all possible.
149 =head2 Reporting Bugs
151 Report bugs to the Bioperl bug tracking system to help us keep track
152 of the bugs and their resolution. Bug reports can be submitted via
155 https://github.com/bioperl/bioperl-live/issues
157 =head1 AUTHOR - Mark A. Jensen
159 Email maj -at- fortinbras -dot- us
163 The rest of the documentation details each of the object methods.
164 Internal methods are usually preceded with a _
168 # Let the code begin...
170 package Bio
::Search
::Tiling
::MapTiling
;
175 use Bio
::Search
::Tiling
::TilingI
;
176 use Bio
::Search
::Tiling
::MapTileUtils
;
177 use Bio
::LocatableSeq
;
179 # use base qw(Bio::Root::Root Bio::Search::Tiling::TilingI);
180 use base
qw(Bio::Root::Root Bio::Search::Tiling::TilingI);
187 Usage : my $obj = new Bio::Search::Tiling::GenericTiling();
188 Function: Builds a new Bio::Search::Tiling::GenericTiling object
189 Returns : an instance of Bio::Search::Tiling::GenericTiling
190 Args : -hit => $a_Bio_Search_Hit_HitI_object
191 general filter function:
192 -hsp_filter => sub { my $this_hsp = shift;
202 my $self = $class->SUPER::new
(@args);
203 my($hit, $filter) = $self->_rearrange( [qw( HIT HSP_FILTER)],@args );
205 $self->throw("HitI object required") unless $hit;
206 $self->throw("Argument must be HitI object") unless ( ref $hit && $hit->isa('Bio::Search::Hit::HitI') );
208 $self->_set_attributes();
209 $self->{"_algorithm"} = $hit->algorithm;
211 my @hsps = $hit->hsps;
212 # apply filter function if requested
213 if ( defined $filter ) {
214 if ( ref($filter) eq 'CODE' ) {
215 @hsps = map { $filter->($_) ?
$_ : () } @hsps;
218 $self->warn("-filter is not a coderef; ignoring");
222 # identify available contexts
223 for my $t (qw( query hit )) {
225 for my $i (0..$#hsps) {
226 my $ctxt = $self->_context(
228 -strand
=> $hsps[$i]->strand($t),
229 -frame
=> $hsps[$i]->frame($t));
230 $contexts{$ctxt} ||= [];
231 push @
{$contexts{$ctxt}}, $i;
233 $self->{"_contexts_${t}"} = \
%contexts;
236 $self->warn("No HSPs present in hit after filtering") unless (@hsps);
241 # a tiling is based on the set of hsps contained in a single hit.
242 # check all the boundaries - zero hsps, one hsp, all disjoint hsps
244 =head1 TILING ITERATORS
249 Usage : @hsps = $self->next_tiling($type);
250 Function: Obtain a tiling: a minimal set of HSPs covering the $type
251 ('hit', 'subject', 'query') sequence
253 Returns : an array of HSPI objects
254 Args : scalar $type: one of 'hit', 'subject', 'query', with
255 'subject' an alias for 'hit'
261 my ($type, $context) = @_;
262 $self->_check_type_arg(\
$type);
263 $self->_check_context_arg($type, \
$context);
264 return $self->_tiling_iterator($type, $context)->();
267 =head2 rewind_tilings
269 Title : rewind_tilings
270 Usage : $self->rewind_tilings($type)
271 Function: Reset the next_tilings($type) iterator
273 Returns : True on success
274 Args : scalar $type: one of 'hit', 'subject', 'query';
281 my ($type,$context) = @_;
282 $self->_check_type_arg(\
$type);
283 $self->_check_context_arg($type, \
$context);
284 return $self->_tiling_iterator($type, $context)->('REWIND');
289 =head2 get_tiled_alns()
291 Title : get_tiled_alns
292 Usage : @alns = $tiling->get_tiled_alns($type, $context)
293 Function: Use a tiling to construct a minimal set of alignment
294 objects covering the region specified by $type/$context
295 by splicing adjacent HSP tiles
296 Returns : an array of Bio::SimpleAlign objects; see Note below
297 Args : scalar $type: one of 'hit', 'subject', 'query'
299 scalar $context: strand/frame context string
300 Following $type and $context, an array of
301 ordered, tiled HSP objects can be specified; this is
302 the tiling that will directly the alignment construction
303 default -- the first tiling provided by a tiling iterator
304 Notes : Each returned alignment is a concatenation of adjacent tiles.
305 The set of alignments will cover all regions described by the
306 $type/$context pair in the hit. The pair of sequences in each
307 alignment have ids 'query' and 'subject', and each sequence
308 possesses SeqFeatures that map the original query or subject
309 coordinates to the sequence coordinates in the tiled alignment.
315 my ($type, $context) = @_;
316 $self->_check_type_arg(\
$type);
317 $self->_check_context_arg($type, \
$context);
318 my $t = shift; # first arg after type/context, arrayref to a tiling
320 if ($t && (ref($t) eq 'ARRAY')) {
323 else { # otherwise, take the first tiling available
325 @tiling = $self->_make_tiling_iterator($type,$context)->();
329 my @map = $self->coverage_map($type, $context);
330 my @intervals = map {$_->[0]} @map; # disjoint decomp
331 # divide into disjoint covering groups
332 my @groups = covering_groups
(@intervals);
334 require Bio
::SimpleAlign
;
335 require Bio
::SeqFeature
::Generic
;
336 # cut hsp aligns along the disjoint decomp
337 # look for gaps...or add gaps?
338 my ($q_start, $h_start, $q_strand, $h_strand);
340 for my $grp (@groups) {
341 my $taln = Bio
::SimpleAlign
->new();
342 my (@qfeats, @hfeats);
343 my $query_string = '';
345 my ($qlen,$hlen) = (0,0);
346 my ($qinc, $hinc, $qstart, $hstart);
347 for my $intvl (@
$grp) {
348 # following just chooses the first available hsp containing the
349 # interval -- this is arbitrary, could be smarter.
350 my $aln = ( containing_hsps
($intvl, @tiling) )[0]->get_aln;
351 my $qseq = $aln->get_seq_by_pos(1);
352 my $hseq = $aln->get_seq_by_pos(2);
353 $qstart ||= $qseq->start;
354 $hstart ||= $hseq->start;
355 # calculate the slice boundaries
359 $beg = $aln->column_from_residue_number($qseq->id, $intvl->[0]);
360 $end = $aln->column_from_residue_number($qseq->id, $intvl->[1]);
363 /subject|hit/ && do {
364 $beg = $aln->column_from_residue_number($hseq->id, $intvl->[0]);
365 $end = $aln->column_from_residue_number($hseq->id, $intvl->[1]);
369 $aln = $aln->slice($beg, $end);
370 $qseq = $aln->get_seq_by_pos(1);
371 $hseq = $aln->get_seq_by_pos(2);
372 $qinc = $qseq->length - $qseq->num_gaps($Bio::LocatableSeq
::GAP_SYMBOLS
);
373 $hinc = $hseq->length - $hseq->num_gaps($Bio::LocatableSeq
::GAP_SYMBOLS
);
375 push @qfeats, Bio
::SeqFeature
::Generic
->new(
378 -strand
=> $qseq->strand,
380 -source_tag
=> 'BLAST',
381 -display_name
=> 'query coordinates',
382 -tag
=> { query_id
=> $qseq->id,
383 query_desc
=> $qseq->desc,
384 query_start
=> $qstart + (($qseq->strand && $qseq->strand < 0) ?
-1 : 1)*$qlen,
385 query_end
=> $qstart + (($qseq->strand && $qseq->strand < 0) ?
-1 : 1)*($qlen+$qinc-1),
388 push @hfeats, Bio
::SeqFeature
::Generic
->new(
391 -strand
=> $hseq->strand,
392 -primary
=> 'subject/hit',
393 -source_tag
=> 'BLAST',
394 -display_name
=> 'subject/hit coordinates',
395 -tag
=> { subject_id
=> $hseq->id,
396 subject_desc
=> $hseq->desc,
397 subject_start
=> $hstart + (($hseq->strand && $hseq->strand < 0) ?
-1 : 1)*$hlen,
398 subject_end
=> $hstart + (($hseq->strand && $hseq->strand < 0) ?
-1 : 1)*($hlen+$hinc-1)
401 $query_string .= $qseq->seq;
402 $hit_string .= $hseq->seq;
406 # create the LocatableSeqs and add the features to each
407 # then add the seqs to the new aln
408 # note in MapTileUtils, Bio::FeatureHolderI methods have been
409 # mixed into Bio::LocatableSeq
410 my $qseq = Bio
::LocatableSeq
->new( -id
=> 'query',
411 -seq
=> $query_string);
412 $qseq->add_SeqFeature(@qfeats);
413 my $hseq = Bio
::LocatableSeq
->new( -id
=> 'subject',
414 -seq
=> $hit_string );
415 $hseq->add_SeqFeature(@hfeats);
416 $taln->add_seq($qseq);
417 $taln->add_seq($hseq);
429 Usage : $tiling->identities($type, $action, $context)
430 Function: Retrieve the calculated number of identities for the invocant
432 Returns : value of identities (a scalar)
433 Args : scalar $type: one of 'hit', 'subject', 'query'
435 option scalar $action: one of 'exact', 'est', 'fast', 'max'
437 option scalar $context: strand/frame context string
444 my ($type, $action, $context) = @_;
445 $self->_check_type_arg(\
$type);
446 $self->_check_action_arg(\
$action);
447 $self->_check_context_arg($type, \
$context);
448 if (!defined $self->{"identities_${type}_${action}_${context}"}) {
449 $self->_calc_stats($type, $action, $context);
451 return $self->{"identities_${type}_${action}_${context}"};
457 Usage : $tiling->conserved($type, $action)
458 Function: Retrieve the calculated number of conserved sites for the invocant
460 Returns : value of conserved (a scalar)
461 Args : scalar $type: one of 'hit', 'subject', 'query'
463 option scalar $action: one of 'exact', 'est', 'fast', 'max'
465 option scalar $context: strand/frame context string
472 my ($type, $action, $context) = @_;
473 $self->_check_type_arg(\
$type);
474 $self->_check_action_arg(\
$action);
475 $self->_check_context_arg($type, \
$context);
476 if (!defined $self->{"conserved_${type}_${action}_${context}"}) {
477 $self->_calc_stats($type, $action, $context);
479 return $self->{"conserved_${type}_${action}_${context}"};
485 Usage : $tiling->length($type, $action, $context)
486 Function: Retrieve the total length of aligned residues for
489 Returns : value of length (a scalar)
490 Args : scalar $type: one of 'hit', 'subject', 'query'
492 option scalar $action: one of 'exact', 'est', 'fast', 'max'
494 option scalar $context: strand/frame context string
501 my ($type,$action,$context) = @_;
502 $self->_check_type_arg(\
$type);
503 $self->_check_action_arg(\
$action);
504 $self->_check_context_arg($type, \
$context);
505 if (!defined $self->{"length_${type}_${action}_${context}"}) {
506 $self->_calc_stats($type, $action, $context);
508 return $self->{"length_${type}_${action}_${context}"};
514 Usage : $tiling->frac($type, $denom, $action, $context, $method)
515 Function: Return the fraction of sequence length consisting
516 of desired kinds of pairs (given by $method),
517 with respect to $denom
518 Returns : scalar float
519 Args : -type => one of 'hit', 'subject', 'query'
520 -denom => one of 'total', 'aligned'
521 -action => one of 'exact', 'est', 'fast', 'max'
522 -context => strand/frame context string
523 -method => one of 'identical', 'conserved'
524 Note : $denom == 'aligned', return desired_stat/num_aligned
525 $denom == 'total', return desired_stat/_reported_length
526 (i.e., length of the original input sequences)
527 Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
528 reported lengths of translated dna are reduced by
529 a factor of 3, to provide fractions relative to
530 amino acid coordinates.
537 my ($type, $denom, $action, $context, $method) = $self->_rearrange([qw(TYPE DENOM ACTION CONTEXT METHOD)],@args);
538 $self->_check_type_arg(\
$type);
539 $self->_check_action_arg(\
$action);
540 $self->_check_context_arg($type, \
$context);
541 unless ($method and grep(/^$method$/, qw( identical conserved ))) {
542 $self->throw("-method must specified; one of ('identical', 'conserved')");
545 unless (grep /^$denom/, qw( total aligned )) {
546 $self->throw("Denominator selection must be one of ('total', 'aligned'), not '$denom'");
548 my $key = "frac_${method}_${type}_${denom}_${action}_${context}";
551 $_ eq 'identical' && do {
552 $stat = $self->identities($type, $action, $context);
555 $_ eq 'conserved' && do {
556 $stat = $self->conserved($type, $action, $context);
560 $self->throw("What are YOU doing here?");
563 if (!defined $self->{$key}) {
567 $stat/$self->_reported_length($type); # need fudge fac??
572 $stat/$self->length($type,$action,$context);
576 $self->throw("What are YOU doing here?");
580 return $self->{$key};
583 =head2 frac_identical
585 Title : frac_identical
586 Usage : $tiling->frac_identical($type, $denom, $action, $context)
587 Function: Return the fraction of sequence length consisting
588 of identical pairs, with respect to $denom
589 Returns : scalar float
590 Args : -type => one of 'hit', 'subject', 'query'
591 -denom => one of 'total', 'aligned'
592 -action => one of 'exact', 'est', 'fast', 'max'
593 -context => strand/frame context string
594 Note : $denom == 'aligned', return conserved/num_aligned
595 $denom == 'total', return conserved/_reported_length
596 (i.e., length of the original input sequences)
597 Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
598 reported lengths of translated dna are reduced by
599 a factor of 3, to provide fractions relative to
600 amino acid coordinates.
601 Note : This an alias that calls frac()
608 my ($type, $denom, $action,$context) = $self->_rearrange( [qw
[ TYPE DENOM ACTION CONTEXT
]],@args );
609 $self->frac( -type
=>$type, -denom
=>$denom, -action
=>$action, -method
=>'identical', -context
=>$context);
612 =head2 frac_conserved
614 Title : frac_conserved
615 Usage : $tiling->frac_conserved($type, $denom, $action, $context)
616 Function: Return the fraction of sequence length consisting
617 of conserved pairs, with respect to $denom
618 Returns : scalar float
619 Args : -type => one of 'hit', 'subject', 'query'
620 -denom => one of 'total', 'aligned'
621 -action => one of 'exact', 'est', 'fast', 'max'
622 -context => strand/frame context string
623 Note : $denom == 'aligned', return conserved/num_aligned
624 $denom == 'total', return conserved/_reported_length
625 (i.e., length of the original input sequences)
626 Note : In keeping with the spirit of Bio::Search::HSP::HSPI,
627 reported lengths of translated dna are reduced by
628 a factor of 3, to provide fractions relative to
629 amino acid coordinates.
630 Note : This an alias that calls frac()
637 my ($type, $denom, $action, $context) = $self->_rearrange( [qw
[ TYPE DENOM ACTION CONTEXT
]],@args );
638 $self->frac( -type
=>$type, -denom
=>$denom, -action
=>$action, -context
=>$context, -method
=>'conserved');
644 Aliases : frac_aligned_query - frac_aligned(-type=>'query',...)
645 frac_aligned_hit - frac_aligned(-type=>'hit',...)
646 Usage : $tiling->frac_aligned(-type=>$type,
649 Function: Return the fraction of input sequence length
650 that was aligned by the algorithm
651 Returns : scalar float
652 Args : -type => one of 'hit', 'subject', 'query'
653 -action => one of 'exact', 'est', 'fast', 'max'
654 -context => strand/frame context string
659 my ($self, @args) = @_;
660 my ($type, $action, $context) = $self->_rearrange([qw(TYPE ACTION CONTEXT)],@args);
661 $self->_check_type_arg(\
$type);
662 $self->_check_action_arg(\
$action);
663 $self->_check_context_arg($type, \
$context);
664 if (!$self->{"frac_aligned_${type}_${action}_${context}"}) {
665 $self->{"frac_aligned_${type}_${action}_${context}"} = $self->num_aligned($type,$action,$context)/$self->_reported_length($type);
667 return $self->{"frac_aligned_${type}_${action}_${context}"};
670 sub frac_aligned_query
{ shift->frac_aligned(-type
=>'query', @_) }
671 sub frac_aligned_hit
{ shift->frac_aligned(-type
=>'hit', @_) }
677 Usage : $tiling->num_aligned(-type=>$type)
678 Function: Return the number of residues of sequence $type
679 that were aligned by the algorithm
681 Args : -type => one of 'hit', 'subject', 'query'
682 -action => one of 'exact', 'est', 'fast', 'max'
683 -context => strand/frame context string
684 Note : Since this is calculated from reported coordinates,
685 not symbol string counts, it is already in terms of
687 Note : Aliases length()
691 sub num_aligned
{ shift->length( @_ ) };
695 Title : num_unaligned
696 Usage : $tiling->num_unaligned(-type=>$type)
697 Function: Return the number of residues of sequence $type
698 that were left unaligned by the algorithm
700 Args : -type => one of 'hit', 'subject', 'query'
701 -action => one of 'exact', 'est', 'fast', 'max'
702 -context => strand/frame context string
703 Note : Since this is calculated from reported coordinates,
704 not symbol string counts, it is already in terms of
711 my ($type,$action,$context) = @_;
713 $self->_check_type_arg(\
$type);
714 $self->_check_action_arg(\
$action);
715 $self->_check_context_arg($type, \
$context);
716 if (!defined $self->{"num_unaligned_${type}_${action}_${context}"}) {
717 $self->{"num_unaligned_${type}_${action}_${context}"} = $self->_reported_length($type)-$self->num_aligned($type,$action,$context);
719 return $self->{"num_unaligned_${type}_${action}_${context}"};
726 Usage : $tiling->range(-type=>$type)
727 Function: Returns the extent of the longest tiling
728 as ($min_coord, $max_coord)
729 Returns : array of two scalar integers
730 Args : -type => one of 'hit', 'subject', 'query'
731 -context => strand/frame context string
736 my ($self, $type, $context) = @_;
737 $self->_check_type_arg(\
$type);
738 $self->_check_context_arg($type, \
$context);
739 my @a = $self->_contig_intersection($type,$context);
740 return ($a[0][0], $a[-1][1]);
750 Usage : $map = $tiling->coverage_map($type)
751 Function: Property to contain the coverage map calculated
752 by _calc_coverage_map() - see that for
755 Returns : value of coverage_map_$type as an array
756 Args : scalar $type: one of 'hit', 'subject', 'query'
764 my ($type, $context) = @_;
765 $self->_check_type_arg(\
$type);
766 $self->_check_context_arg($type, \
$context);
768 if (!defined $self->{"coverage_map_${type}_${context}"}) {
769 # following calculates coverage maps in all strands/frames
771 $self->_calc_coverage_map($type, $context);
773 # if undef is returned, then there were no hsps for given strand/frame
774 if (!defined $self->{"coverage_map_${type}_${context}"}) {
775 $self->warn("No HSPS present for type '$type' in context '$context' for this hit");
778 return @
{$self->{"coverage_map_${type}_${context}"}};
781 =head2 coverage_map_as_text
783 Title : coverage_map_as_text
784 Usage : $tiling->coverage_map_as_text($type, $legend_flag)
785 Function: Format a text-graphic representation of the
787 Returns : an array of scalar strings, suitable for printing
788 Args : $type: one of 'query', 'hit', 'subject'
789 $context: strand/frame context string
790 $legend_flag: boolean; add a legend indicating
791 the actual interval coordinates for each component
792 interval and hsp (in the $type sequence context)
793 Example : print $tiling->coverage_map_as_text('query',1);
797 sub coverage_map_as_text
{
799 my ($type, $context, $legend_q) = @_;
800 $self->_check_type_arg(\
$type);
801 $self->_check_context_arg($type, \
$context);
803 my @map = $self->coverage_map($type, $context);
805 my @hsps = $self->hit->hsps;
807 require Tie
::RefHash
;
808 tie
%hsps_i, 'Tie::RefHash';
809 @hsps_i{@hsps} = (0..$#hsps);
812 my @hspx = ('') x
@hsps;
813 my @these_hsps = @
{$map[$_]->[1]};
814 @hspx[@hsps_i{@these_hsps}] = ('*') x
@these_hsps;
819 push @ret, "\tIntvl\n";
820 push @ret, "HSPS\t", join ("\t", (0..$#map)), "\n";
821 foreach my $h (0..$#hsps) {
822 push @ret, join("\t", $h, map { $mx[$_][$h] } (0..$#map) ),"\n";
825 push @ret, "Interval legend\n";
827 push @ret, sprintf("%d\t[%d, %d]\n", $_, @
{$map[$_][0]});
829 push @ret, "HSP legend\n";
830 my @ints = get_intervals_from_hsps
($type,@hsps);
831 foreach (0..$#hsps) {
832 push @ret, sprintf("%d\t[%d, %d]\n", $_, @
{$ints[$_]});
844 Returns : The HitI object associated with the invocant
852 $self->warn("Getter only") if @_;
853 return $self->{'hit'};
859 Usage : $tiling->hsps()
860 Function: Container for the HSP objects associated with invocant
862 Returns : an array of hsps associated with the hit
863 Args : on set, new value (an arrayref or undef, optional)
869 return $self->{'hsps'} = shift if @_;
870 return @
{$self->{'hsps'}};
876 Usage : @contexts = $tiling->context($type) or
877 @indices = $tiling->context($type, $context)
878 Function: Retrieve the set of available contexts in the hit,
879 or the indices of hsps having the given context
880 (integer indices for the array returned by $self->hsps)
881 Returns : array of scalar context strings or
882 array of scalar positive integers
883 undef if no hsps in given context
884 Args : $type: one of 'query', 'hit', 'subject'
885 optional $context: context string
891 my ($type, $context) = @_;
892 $self->_check_type_arg(\
$type);
893 return keys %{$self->{"_contexts_$type"}} unless defined $context;
894 return undef unless $self->{"_contexts_$type"}{$context};
895 return @
{$self->{"_contexts_$type"}{$context}};
901 Usage : $tiling->mapping($type)
902 Function: Retrieve the mapping coefficient for the sequence type
903 based on the underlying algorithm
904 Returns : scalar integer (mapping coefficient)
905 Args : $type: one of 'query', 'hit', 'subject'
906 Note : getter only (set in constructor)
913 $self->_check_type_arg(\
$type);
914 return $self->{"_mapping_${type}"};
917 =head2 default_context
919 Title : default_context
920 Usage : $tiling->default_context($type)
921 Function: Retrieve the default strand/frame context string
922 for the sequence type based on the underlying algorithm
923 Returns : scalar string (context string)
924 Args : $type: one of 'query', 'hit', 'subject'
925 Note : getter only (set in constructor)
932 $self->_check_type_arg(\
$type);
933 return $self->{"_def_context_${type}"};
939 Usage : $tiling->algorithm
940 Function: Retrieve the algorithm name associated with the
941 invocant's hit object
942 Returns : scalar string
944 Note : getter only (set in constructor)
950 $self->warn("Getter only") if @_;
951 return $self->{"_algorithm"};
954 =head1 "PRIVATE" METHODS
958 See L<Bio::Search::Tiling::MapTileUtils> for lower level
961 =head2 _calc_coverage_map
963 Title : _calc_coverage_map
964 Usage : $tiling->_calc_coverage_map($type)
965 Function: Calculates the coverage map for the object's associated
966 hit from the perspective of the desired $type (see Args:)
967 and sets the coverage_map() property
968 Returns : True on success
969 Args : optional scalar $type: one of 'hit'|'subject'|'query'
971 Note : The "coverage map" is an array with the following format:
972 ( [ $component_interval => [ @containing_hsps ] ], ... ),
973 where $component_interval is a closed interval (see
974 DESCRIPTION) of the form [$a0, $a1] with $a0 <= $a1, and
975 @containing_hsps is an array of all HspI objects in the hit
976 which completely contain the $component_interval.
977 The set of $component_interval's is a disjoint decomposition
978 of the minimum set of minimal intervals that completely
979 cover the hit's HSPs (from the perspective of the $type)
980 Note : This calculates the map for all strand/frame contexts available
985 sub _calc_coverage_map
{
988 $self->_check_type_arg(\
$type);
990 # obtain the [start, end] intervals for all hsps in the hit (relative
992 unless ($self->{'hsps'}) {
993 $self->warn("No HSPs for this hit");
997 my (@map, @hsps, %filters, @intervals);
1001 my $c = $self->mapping($type);
1003 # create the possible maps
1004 for my $context ($self->contexts($type)) {
1006 @hsps = ($self->hsps)[$self->contexts($type, $context)];
1007 @intervals = get_intervals_from_hsps
( $type, @hsps );
1009 my $f = ($intervals[0]->[0] - 1) % $c;
1011 # convert interval endpoints...
1013 $$_[0] = ($$_[0] - $f + $c - 1)/$c;
1014 $$_[1] = ($$_[1] - $f)/$c;
1017 # determine the minimal set of disjoint intervals that cover the
1018 # set of hsp intervals
1019 my @dj_set = interval_tiling
(\
@intervals);
1021 # decompose each disjoint interval into another set of disjoint
1022 # intervals, each of which is completely contained within the
1023 # original hsp intervals with which it overlaps
1026 for my $dj_elt (@dj_set) {
1027 my ($covering, $indices) = @
$dj_elt;
1028 my @covering_hsps = @hsps[@
$indices];
1029 my @coverers = @intervals[@
$indices];
1030 @decomp = decompose_interval
( \
@coverers );
1032 my ($component, $container_indices) = @
{$_};
1033 push @map, [ $component,
1034 [@covering_hsps[@
$container_indices]] ];
1039 # unconvert the components:
1042 $$_[0][0] = $c*$$_[0][0] - $c + 1 + $f;
1043 $$_[0][1] = $c*$$_[0][1] + $f;
1046 $$_[0][0] = $c*$$_[0][0] - $c + 1 + $f;
1047 $$_[0][1] = $c*$$_[0][1] + $f;
1050 # sort the map on the interval left-ends
1051 @map = sort { $a->[0][0]<=>$b->[0][0] } @map;
1052 $self->{"coverage_map_${type}_${context}"} = [@map];
1053 # set the _contig_intersection attribute here (side effect)
1054 $self->{"_contig_intersection_${type}_${context}"} = [map { $$_[0] } @map];
1063 Usage : $tiling->_calc_stats($type, $action, $context)
1064 Function: Calculates [estimated] tiling statistics (identities, conserved sites
1065 length) and sets the public accessors
1066 Returns : True on success
1067 Args : scalar $type: one of 'hit', 'subject', 'query'
1069 optional scalar $action: requests calculation method
1070 currently one of 'exact', 'est', 'fast', 'max'
1071 option scalar $context: strand/frame context string
1072 Note : Action: The statistics are calculated by summing quantities
1073 over the disjoint component intervals, taking into account
1074 coverage of those intervals by multiple HSPs. The action
1075 tells the algorithm how to obtain those quantities--
1076 'exact' will use Bio::Search::HSP::HSPI::matches
1077 to count the appropriate segment of the homology string;
1078 'est' will estimate the statistics by multiplying the
1079 fraction of the HSP overlapped by the component interval
1080 (see MapTileUtils) by the BLAST-reported identities/postives
1081 (this may be convenient for BLAST summary report formats)
1082 * Both exact and est take the average over the number of HSPs
1083 that overlap the component interval.
1084 'max' uses the exact method to calculate the statistics,
1085 and returns only the maximum identites/positives over
1086 overlapping HSP for the component interval. No averaging
1088 'fast' doesn't involve tiling at all (hence the name),
1089 but it seems like a very good estimate, and uses only
1090 reported values, and so does not require sequence data. It
1091 calculates an average of reported identities, conserved
1092 sites, and lengths, over unmodified hsps in the hit,
1093 weighted by the length of the hsps.
1099 my ($type, $action, $context) = @_;
1100 # need to check args here, in case method is called internally.
1101 $self->_check_type_arg(\
$type);
1102 $self->_check_action_arg(\
$action);
1103 $self->_check_context_arg($type, \
$context);
1104 my ($ident, $cons, $length) = (0,0,0);
1106 # fast : avoid coverage map altogether, get a pretty damn
1107 # good estimate with a weighted average of reported hsp
1110 ($action eq 'fast') && do {
1111 my @hsps = $self->hit->hsps;
1112 @hsps = @hsps[$self->contexts($type, $context)];
1113 # weights for averages
1114 my @wt = map {$_->length($type)} @hsps;
1115 my $sum = eval( join('+',@wt) );
1116 $_ /= $sum for (@wt);
1119 $ident += $wt*$_->matches_MT($type,'identities');
1120 $cons += $wt*$_->matches_MT($type,'conserved');
1121 $length += $wt*$_->length($type);
1127 # calculate identities/conserved sites in tiling
1128 # estimate based on the fraction of the component interval covered
1129 # and ident/cons reported by the HSPs
1130 ($action ne 'fast') && do {
1131 foreach ($self->coverage_map($type, $context)) {
1132 my ($intvl, $hsps) = @
{$_};
1133 my $len = ($$intvl[1]-$$intvl[0]+1);
1134 my $ncover = ($action eq 'max') ?
1 : scalar @
$hsps;
1135 my ($acc_i, $acc_c) = (0,0);
1136 foreach my $hsp (@
$hsps) {
1138 ($_ eq 'est') && do {
1139 my ($inc_i, $inc_c) = $hsp->matches_MT(
1141 -action
=> 'searchutils',
1143 my $frac = $len/$hsp->length($type);
1144 $acc_i += $inc_i * $frac;
1145 $acc_c += $inc_c * $frac;
1148 ($_ eq 'max') && do {
1149 my ($inc_i, $inc_c) = $hsp->matches_MT(
1151 -action
=> 'searchutils',
1152 -start
=> $$intvl[0],
1155 $acc_i = ($acc_i > $inc_i) ?
$acc_i : $inc_i;
1156 $acc_c = ($acc_c > $inc_c) ?
$acc_c : $inc_c;
1159 (!$_ || ($_ eq 'exact')) && do {
1160 my ($inc_i, $inc_c) = $hsp->matches_MT(
1162 -action
=> 'searchutils',
1163 -start
=> $$intvl[0],
1172 $ident += ($acc_i/$ncover);
1173 $cons += ($acc_c/$ncover);
1178 $self->{"identities_${type}_${action}_${context}"} = $ident;
1179 $self->{"conserved_${type}_${action}_${context}"} = $cons;
1180 $self->{"length_${type}_${action}_${context}"} = $length;
1185 =head2 Tiling Helper Methods
1189 # coverage_map is of the form
1190 # ( [ $interval, \@containing_hsps ], ... )
1192 # so, for each interval, pick one of the containing hsps,
1193 # and return the union of all the picks.
1195 # use the combinatorial generating iterator, with
1196 # the urns containing the @containing_hsps for each
1199 =head2 _make_tiling_iterator
1201 Title : _make_tiling_iterator
1202 Usage : $self->_make_tiling_iterator($type)
1203 Function: Create an iterator code ref that will step through all
1204 minimal combinations of HSPs that produce complete coverage
1205 of the $type ('hit', 'subject', 'query') sequence,
1206 and set the correct iterator property of the invocant
1208 Returns : The iterator
1209 Args : scalar $type, one of 'hit', 'subject', 'query';
1214 sub _make_tiling_iterator
{
1217 my ($type, $context) = @_;
1218 $self->_check_type_arg(\
$type);
1219 $self->_check_context_arg($type, \
$context);
1221 # initialize the urns
1222 my @urns = map { [0, $$_[1]] } $self->coverage_map($type, $context);
1227 if (my $rewind = shift) {
1228 # reinitialize urn indices
1229 $$_[0] = 0 for (@urns);
1234 return if $FINISHED;
1236 my $finished_incrementing = 0;
1237 # @ret is the collector of urn choices
1240 for my $urn (@urns) {
1241 my ($n, $hsps) = @
$urn;
1242 push @ret, $$hsps[$n];
1243 unless ($finished_incrementing) {
1244 if ($n == $#$hsps) { $$urn[0] = 0; }
1245 else { ($$urn[0])++; $finished_incrementing = 1 }
1250 $FINISHED = 1 unless $finished_incrementing;
1252 # $hsp->rank is a unique identifier for an hsp in a hit.
1253 # preserve order in @ret
1256 @order{(0..$#ret)} = @ret;
1257 $uniq{$order{$_}->rank} = $_ for (0..$#ret);
1258 @ret = @order{ sort {$a<=>$b} values %uniq };
1265 =head2 _tiling_iterator
1267 Title : _tiling_iterator
1268 Usage : $tiling->_tiling_iterator($type,$context)
1269 Function: Retrieve the tiling iterator coderef for the requested
1270 $type ('hit', 'subject', 'query')
1272 Returns : coderef to the desired iterator
1273 Args : scalar $type, one of 'hit', 'subject', 'query'
1275 option scalar $context: strand/frame context string
1280 sub _tiling_iterator
{
1282 my ($type, $context) = @_;
1283 $self->_check_type_arg(\
$type);
1284 $self->_check_context_arg($type, \
$context);
1286 if (!defined $self->{"_tiling_iterator_${type}_${context}"}) {
1287 $self->{"_tiling_iterator_${type}_${context}"} =
1288 $self->_make_tiling_iterator($type,$context);
1290 return $self->{"_tiling_iterator_${type}_${context}"};
1292 =head2 Construction Helper Methods
1294 See also L<Bio::Search::Tiling::MapTileUtils>.
1298 sub _check_type_arg
{
1300 my $typeref = shift;
1301 $$typeref ||= 'query';
1302 $self->throw("Unknown type '$$typeref'") unless grep(/^$$typeref$/, qw( hit query subject ));
1303 $$typeref = 'hit' if $$typeref eq 'subject';
1307 sub _check_action_arg
{
1309 my $actionref = shift;
1311 $$actionref = ($self->_has_sequence_data ?
'exact' : 'est');
1314 $self->throw("Calc action '$$actionref' unrecognized") unless grep /^$$actionref$/, qw( est exact fast max );
1315 if ($$actionref ne 'est' and !$self->_has_sequence_data) {
1316 $self->warn("Blast file did not possess sequence data; defaulting to 'est' action");
1317 $$actionref = 'est';
1323 sub _check_context_arg
{
1325 my ($type, $contextref) = @_;
1326 if (!$$contextref) {
1327 $self->throw("Type '$type' requires strand/frame context for algorithm ".$self->algorithm) unless ($self->mapping($type) == 1);
1328 # set default according to default_context attrib
1329 $$contextref = $self->default_context($type);
1332 ($$contextref =~ /^[mp]$/) && do { $$contextref .= '_' };
1333 $self->throw("Context '$$contextref' unrecognized") unless
1334 $$contextref =~ /all|[mp][0-2_]/;
1339 =head2 _make_context_key
1341 Title : _make_context_key
1343 Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame)
1344 Function: create a string indicating strand/frame context; serves as
1345 component of memoizing hash keys
1346 Returns : scalar string
1347 Args : -type => one of ('query', 'hit', 'subject')
1348 -strand => one of (1,0,-1)
1349 -frame => one of (-2, 1, 0, 1, -2)
1350 called w/o args: returns 'all'
1354 sub _make_context_key
{
1357 my ($type, $strand, $frame) = $self->_rearrange([qw(TYPE STRAND FRAME)], @args);
1358 _check_type_arg
(\
$type);
1359 return 'all' unless (defined $strand or defined $frame);
1360 if ( defined $strand && $self->_has_strand($type) ) {
1361 if (defined $frame && $self->_has_frame($type)) {
1362 return ($strand >= 0 ?
'p' : 'm').abs($frame);
1365 return ($strand >= 0 ?
'p_' : 'm_');
1369 if (defined $frame && $self->_has_frame($type)) {
1370 $self->warn("Frame defined without strand; punting with plus strand");
1371 return 'p'.abs($frame);
1382 Alias : _make_context_key
1383 Usage : $tiling->_make_context_key(-strand => $strand, -frame => $frame)
1384 Function: create a string indicating strand/frame context; serves as
1385 component of memoizing hash keys
1386 Returns : scalar string
1387 Args : -type => one of ('query', 'hit', 'subject')
1388 -strand => one of (1,0,-1)
1389 -frame => one of (-2, 1, 0, 1, -2)
1390 called w/o args: returns 'all'
1394 sub _context
{ shift->_make_context_key(@_) }
1398 Most based on a reading of the algorithm name with a configuration lookup.
1402 =item _has_sequence_data()
1406 sub _has_sequence_data
{
1408 $self->throw("Hit attribute not yet set") unless defined $self->hit;
1409 return (($self->hit->hsps)[0]->seq_str('match') ?
1 : 0);
1412 =item _has_logical_length()
1416 sub _has_logical_length
{
1419 $self->_check_type_arg(\
$type);
1420 # based on mapping coeff
1421 $self->throw("Mapping coefficients not yet set") unless defined $self->mapping($type);
1422 return ($self->mapping($type) > 1);
1432 $self->_check_type_arg(\
$type);
1433 return $self->{"_has_strand_${type}"};
1443 $self->_check_type_arg(\
$type);
1444 return $self->{"_has_frame_${type}"};
1449 =head1 Private Accessors
1451 =head2 _contig_intersection
1453 Title : _contig_intersection
1454 Usage : $tiling->_contig_intersection($type)
1455 Function: Return the minimal set of $type coordinate intervals
1456 covered by the invocant's HSPs
1457 Returns : array of intervals (2-member arrayrefs; see MapTileUtils)
1458 Args : scalar $type: one of 'query', 'hit', 'subject'
1462 sub _contig_intersection
{
1464 my ($type, $context) = @_;
1465 $self->_check_type_arg(\
$type);
1466 $self->_check_context_arg($type, \
$context);
1467 if (!defined $self->{"_contig_intersection_${type}_${context}"}) {
1468 $self->_calc_coverage_map($type);
1470 return @
{$self->{"_contig_intersection_${type}_${context}"}};
1473 =head2 _reported_length
1475 Title : _reported_length
1476 Usage : $tiling->_reported_length($type)
1477 Function: Get the total length of the seq $type
1478 for the invocant's hit object, as reported
1479 by (not calculated from) the input data file
1480 Returns : scalar int
1481 Args : scalar $type: one of 'query', 'hit', 'subject'
1482 Note : This is kludgy; the hit object does not currently
1483 maintain accessors for these values, but the
1484 hsps possess these attributes. This is a wrapper
1485 that allows a consistent access method in the
1487 Note : Since this number is based on a reported length,
1488 it is already a "logical length".
1492 sub _reported_length
{
1495 $self->_check_type_arg(\
$type);
1496 my $key = uc( $type."_LENGTH" );
1497 return ($self->hsps)[0]->{$key};