maint: restructure to use Dist::Zilla
[bioperl-live.git] / lib / Bio / DB / GFF / Feature.pm
blob334184859a96a907eef7433fac5198d3cc3dc137
1 =head1 NAME
3 Bio::DB::GFF::Feature -- A relative segment identified by a feature type
5 =head1 SYNOPSIS
7 See L<Bio::DB::GFF>.
9 =head1 DESCRIPTION
11 Bio::DB::GFF::Feature is a stretch of sequence that corresponding to a
12 single annotation in a GFF database. It inherits from
13 Bio::DB::GFF::RelSegment, and so has all the support for relative
14 addressing of this class and its ancestors. It also inherits from
15 Bio::SeqFeatureI and so has the familiar start(), stop(),
16 primary_tag() and location() methods (it implements Bio::LocationI
17 too, if needed).
19 Bio::DB::GFF::Feature adds new methods to retrieve the annotation
20 type, group, and other GFF attributes. Annotation types are
21 represented by Bio::DB::GFF::Typename objects, a simple class that has
22 two methods called method() and source(). These correspond to the
23 method and source fields of a GFF file.
25 Annotation groups serve the dual purpose of giving the annotation a
26 human-readable name, and providing higher-order groupings of
27 subfeatures into features. The groups returned by this module are
28 objects of the Bio::DB::GFF::Featname class.
30 Bio::DB::GFF::Feature inherits from and implements the abstract
31 methods of Bio::SeqFeatureI, allowing it to interoperate with other
32 Bioperl modules.
34 Generally, you will not create or manipulate Bio::DB::GFF::Feature
35 objects directly, but use those that are returned by the
36 Bio::DB::GFF::RelSegment-E<gt>features() method.
38 =head2 Important note about start() vs end()
40 If features are derived from segments that use relative addressing
41 (which is the default), then start() will be less than end() if the
42 feature is on the opposite strand from the reference sequence. This
43 breaks Bio::SeqI compliance, but is necessary to avoid having the real
44 genomic locations designated by start() and end() swap places when
45 changing reference points.
47 To avoid this behavior, call $segment-E<gt>absolute(1) before fetching
48 features from it. This will force everything into absolute
49 coordinates.
51 For example:
53 my $segment = $db->segment('CHROMOSOME_I');
54 $segment->absolute(1);
55 my @features = $segment->features('transcript');
57 =head1 API
59 The remainder of this document describes the public and private
60 methods implemented by this module.
62 =cut
64 package Bio::DB::GFF::Feature;
66 use strict;
68 use Bio::DB::GFF::Util::Rearrange;
69 use Bio::DB::GFF::Featname;
70 use Bio::DB::GFF::Typename;
71 use Bio::DB::GFF::Homol;
72 use Bio::LocationI;
73 use Data::Dumper;
75 use vars qw($AUTOLOAD);
76 use base qw(Bio::DB::GFF::RelSegment Bio::SeqFeatureI Bio::Root::Root);
78 #'
80 *segments = *get_SeqFeatures = \&sub_SeqFeature;
82 my %CONSTANT_TAGS = (method=>1, source=>1, score=>1, phase=>1, notes=>1, id=>1, group=>1);
84 =head2 new_from_parent
86 Title : new_from_parent
87 Usage : $f = Bio::DB::GFF::Feature->new_from_parent(@args);
88 Function: create a new feature object
89 Returns : new Bio::DB::GFF::Feature object
90 Args : see below
91 Status : Internal
93 This method is called by Bio::DB::GFF to create a new feature using
94 information obtained from the GFF database. It is one of two similar
95 constructors. This one is called when the feature is generated from a
96 RelSegment object, and should inherit the coordinate system of that
97 object.
99 The 13 arguments are positional (sorry):
101 $parent a Bio::DB::GFF::RelSegment object (or descendent)
102 $start start of this feature
103 $stop stop of this feature
104 $method this feature's GFF method
105 $source this feature's GFF source
106 $score this feature's score
107 $fstrand this feature's strand (relative to the source
108 sequence, which has its own strandedness!)
109 $phase this feature's phase
110 $group this feature's group (a Bio::DB::GFF::Featname object)
111 $db_id this feature's internal database ID
112 $group_id this feature's internal group database ID
113 $tstart this feature's target start
114 $tstop this feature's target stop
116 tstart and tstop are not used for anything at the moment, since the
117 information is embedded in the group object.
119 =cut
121 # this is called for a feature that is attached to a parent sequence,
122 # in which case it inherits its coordinate reference system and strandedness
123 sub new_from_parent {
124 my $package = shift;
125 my ($parent,
126 $start,$stop,
127 $method,$source,$score,
128 $fstrand,$phase,
129 $group,$db_id,$group_id,
130 $tstart,$tstop) = @_;
132 ($start,$stop) = ($stop,$start) if defined($fstrand) and $fstrand eq '-';
133 my $class = $group ? $group->class : $parent->class;
135 my $self = bless {
136 factory => $parent->{factory},
137 sourceseq => $parent->{sourceseq},
138 strand => $parent->{strand},
139 ref => $parent->{ref},
140 refstart => $parent->{refstart},
141 refstrand => $parent->{refstrand},
142 absolute => $parent->{absolute},
143 start => $start,
144 stop => $stop,
145 type => Bio::DB::GFF::Typename->new($method,$source),
146 fstrand => $fstrand,
147 score => $score,
148 phase => $phase,
149 group => $group,
150 db_id => $db_id,
151 group_id => $group_id,
152 class => $class,
153 },$package;
154 $self;
157 =head2 new
159 Title : new
160 Usage : $f = Bio::DB::GFF::Feature->new(@args);
161 Function: create a new feature object
162 Returns : new Bio::DB::GFF::Feature object
163 Args : see below
164 Status : Internal
166 This method is called by Bio::DB::GFF to create a new feature using
167 information obtained from the GFF database. It is one of two similar
168 constructors. This one is called when the feature is generated
169 without reference to a RelSegment object, and should therefore use its
170 default coordinate system (relative to itself).
172 The 11 arguments are positional:
174 $factory a Bio::DB::GFF adaptor object (or descendent)
175 $srcseq the source sequence
176 $start start of this feature
177 $stop stop of this feature
178 $method this feature's GFF method
179 $source this feature's GFF source
180 $score this feature's score
181 $fstrand this feature's strand (relative to the source
182 sequence, which has its own strandedness!)
183 $phase this feature's phase
184 $group this feature's group
185 $db_id this feature's internal database ID
187 =cut
189 # 'This is called when creating a feature from scratch. It does not have
190 # an inherited coordinate system.
192 sub new {
193 my $package = shift;
194 my ($factory,
195 $srcseq,
196 $start,$stop,
197 $method,$source,
198 $score,$fstrand,$phase,
199 $group,$db_id,$group_id,
200 $tstart,$tstop) = @_;
202 my $self = bless { },$package;
203 ($start,$stop) = ($stop,$start) if defined($fstrand) and $fstrand eq '-';
205 my $class = $group ? $group->class : 'Sequence';
207 @{$self}{qw(factory sourceseq start stop strand class)} =
208 ($factory,$srcseq,$start,$stop,$fstrand,$class);
210 # if the target start and stop are defined, then we use this information to create
211 # the reference sequence
212 # THIS SHOULD BE BUILT INTO RELSEGMENT
213 if (0 && $tstart ne '' && $tstop ne '') {
214 if ($tstart < $tstop) {
215 @{$self}{qw(ref refstart refstrand)} = ($group,$start - $tstart + 1,'+');
216 } else {
217 @{$self}{'start','stop'} = @{$self}{'stop','start'};
218 @{$self}{qw(ref refstart refstrand)} = ($group,$tstop + $stop - 1,'-');
221 } else {
222 @{$self}{qw(ref refstart refstrand)} = ($srcseq,1,'+');
225 @{$self}{qw(type fstrand score phase group db_id group_id absolute)} =
226 (Bio::DB::GFF::Typename->new($method,$source),$fstrand,$score,$phase,
227 $group,$db_id,$group_id,$factory->{absolute});
229 $self;
232 =head2 type
234 Title : type
235 Usage : $type = $f->type([$newtype])
236 Function: get or set the feature type
237 Returns : a Bio::DB::GFF::Typename object
238 Args : a new Typename object (optional)
239 Status : Public
241 This method gets or sets the type of the feature. The type is a
242 Bio::DB::GFF::Typename object, which encapsulates the feature method
243 and source.
245 The method() and source() methods described next provide shortcuts to
246 the individual fields of the type.
248 =cut
250 sub type {
251 my $self = shift;
252 my $d = $self->{type};
253 $self->{type} = shift if @_;
257 =head2 method
259 Title : method
260 Usage : $method = $f->method([$newmethod])
261 Function: get or set the feature method
262 Returns : a string
263 Args : a new method (optional)
264 Status : Public
266 This method gets or sets the feature method. It is a convenience
267 feature that delegates the task to the feature's type object.
269 =cut
271 sub method {
272 my $self = shift;
273 my $d = $self->{type}->method;
274 $self->{type}->method(shift) if @_;
278 =head2 source
280 Title : source
281 Usage : $source = $f->source([$newsource])
282 Function: get or set the feature source
283 Returns : a string
284 Args : a new source (optional)
285 Status : Public
287 This method gets or sets the feature source. It is a convenience
288 feature that delegates the task to the feature's type object.
290 =cut
292 sub source {
293 my $self = shift;
294 my $d = $self->{type}->source;
295 $self->{type}->source(shift) if @_;
299 =head2 score
301 Title : score
302 Usage : $score = $f->score([$newscore])
303 Function: get or set the feature score
304 Returns : a string
305 Args : a new score (optional)
306 Status : Public
308 This method gets or sets the feature score.
310 =cut
312 sub score {
313 my $self = shift;
314 my $d = $self->{score};
315 $self->{score} = shift if @_;
319 =head2 phase
321 Title : phase
322 Usage : $phase = $f->phase([$phase])
323 Function: get or set the feature phase
324 Returns : a string
325 Args : a new phase (optional)
326 Status : Public
328 This method gets or sets the feature phase.
330 =cut
332 sub phase {
333 my $self = shift;
334 my $d = $self->{phase};
335 $self->{phase} = shift if @_;
339 =head2 strand
341 Title : strand
342 Usage : $strand = $f->strand
343 Function: get the feature strand
344 Returns : +1, 0 -1
345 Args : none
346 Status : Public
348 Returns the strand of the feature. Unlike the other methods, the
349 strand cannot be changed once the object is created (due to coordinate
350 considerations).
352 =cut
354 sub strand {
355 my $self = shift;
356 return 0 unless $self->{fstrand};
357 if ($self->absolute) {
358 return Bio::DB::GFF::RelSegment::_to_strand($self->{fstrand});
360 return $self->SUPER::strand || Bio::DB::GFF::RelSegment::_to_strand($self->{fstrand});
363 =head2 group
365 Title : group
366 Usage : $group = $f->group([$new_group])
367 Function: get or set the feature group
368 Returns : a Bio::DB::GFF::Featname object
369 Args : a new group (optional)
370 Status : Public
372 This method gets or sets the feature group. The group is a
373 Bio::DB::GFF::Featname object, which has an ID and a class.
375 =cut
377 sub group {
378 my $self = shift;
379 my $d = $self->{group};
380 $self->{group} = shift if @_;
384 =head2 display_id
386 Title : display_id
387 Usage : $display_id = $f->display_id([$display_id])
388 Function: get or set the feature display id
389 Returns : a Bio::DB::GFF::Featname object
390 Args : a new display_id (optional)
391 Status : Public
393 This method is an alias for group(). It is provided for
394 Bio::SeqFeatureI compatibility.
396 =cut
398 =head2 info
400 Title : info
401 Usage : $info = $f->info([$new_info])
402 Function: get or set the feature group
403 Returns : a Bio::DB::GFF::Featname object
404 Args : a new group (optional)
405 Status : Public
407 This method is an alias for group(). It is provided for AcePerl
408 compatibility.
410 =cut
412 *info = \&group;
413 *display_id = \&group;
414 *display_name = \&group;
416 =head2 target
418 Title : target
419 Usage : $target = $f->target([$new_target])
420 Function: get or set the feature target
421 Returns : a Bio::DB::GFF::Homol object
422 Args : a new group (optional)
423 Status : Public
425 This method works like group(), but only returns the group if it
426 implements the start() method. This is typical for
427 similarity/assembly features, where the target encodes the start and
428 stop location of the alignment.
430 The returned object is of type Bio::DB::GFF::Homol, which is a
431 subclass of Bio::DB::GFF::Segment.
433 =cut
436 sub target {
437 my $self = shift;
438 my $group = $self->group or return;
439 return unless $group->can('start');
440 $group;
443 =head2 flatten_target
445 Title : flatten_target
446 Usage : $target = $f->flatten_target($f->target)
447 Function: flatten a target object
448 Returns : a string (GFF2), an array [GFF2.5] or an array ref [GFF3]
449 Args : a target object (required), GFF version (optional)
450 Status : Public
452 This method flattens a target object into text for
453 GFF dumping. If a second argument is provided, version-specific
454 vocabulary is used for the flattened target.
456 =cut
458 sub flatten_target {
459 my $self = shift;
460 my $t = shift || return;
461 my $v = shift;
463 return 0 unless $t->can('start');
464 my $class = $t->class;
465 my $name = $t->name;
466 my $start = $t->start;
467 my $stop = $t->stop;
469 $v ||=2;
470 if ( $v == 2.5 ) {
472 print STDERR qq(Target "$class:$name"), "tstart $start", "tstop $stop\n";
473 return (qq(Target "$class:$name"), "tstart $start", "tstop $stop");
475 elsif ( $v == 3 ) {
476 return [Target=>"$name $start $stop"];
478 else {
479 return qq(Target "$class:$name" $start $stop);
483 # override parent a smidgeon so that setting the ref for top-level feature
484 # sets ref for all subfeatures
485 sub refseq {
486 my $self = shift;
487 my $result = $self->SUPER::refseq(@_);
488 if (@_) {
489 my $newref = $self->SUPER::refseq;
490 for my $sub ($self->get_SeqFeatures) {
491 $sub->refseq(@_);
494 $result;
498 =head2 hit
500 Title : hit
501 Usage : $hit = $f->hit([$new_hit])
502 Function: get or set the feature hit
503 Returns : a Bio::DB::GFF::Featname object
504 Args : a new group (optional)
505 Status : Public
507 This is the same as target(), for compatibility with
508 Bio::SeqFeature::SimilarityPair.
510 =cut
512 *hit = \&target;
514 =head2 id
516 Title : id
517 Usage : $id = $f->id
518 Function: get the feature ID
519 Returns : a database identifier
520 Args : none
521 Status : Public
523 This method retrieves the database identifier for the feature. It
524 cannot be changed.
526 =cut
528 sub id { shift->{db_id} }
529 sub primary_id { shift->{db_id} }
531 =head2 group_id
533 Title : group_id
534 Usage : $id = $f->group_id
535 Function: get the feature ID
536 Returns : a database identifier
537 Args : none
538 Status : Public
540 This method retrieves the database group identifier for the feature.
541 It cannot be changed. Often the group identifier is more useful than
542 the feature identifier, since it is used to refer to a complex object
543 containing subparts.
545 =cut
547 sub group_id { shift->{group_id} }
549 =head2 clone
551 Title : clone
552 Usage : $feature = $f->clone
553 Function: make a copy of the feature
554 Returns : a new Bio::DB::GFF::Feature object
555 Args : none
556 Status : Public
558 This method returns a copy of the feature.
560 =cut
562 sub clone {
563 my $self = shift;
564 my $clone = $self->SUPER::clone;
566 if (ref(my $t = $clone->type)) {
567 my $type = $t->can('clone') ? $t->clone : bless {%$t},ref $t;
568 $clone->type($type);
571 if (ref(my $g = $clone->group)) {
572 my $group = $g->can('clone') ? $g->clone : bless {%$g},ref $g;
573 $clone->group($group);
576 if (my $merged = $self->{merged_segs}) {
577 $clone->{merged_segs} = { %$merged };
580 $clone;
583 =head2 compound
585 Title : compound
586 Usage : $flag = $f->compound([$newflag])
587 Function: get or set the compound flag
588 Returns : a boolean
589 Args : a new flag (optional)
590 Status : Public
592 This method gets or sets a flag indicated that the feature is not a
593 primary one from the database, but the result of aggregation.
595 =cut
597 sub compound {
598 my $self = shift;
599 my $d = $self->{compound};
600 $self->{compound} = shift if @_;
604 =head2 sub_SeqFeature
606 Title : sub_SeqFeature
607 Usage : @feat = $feature->sub_SeqFeature([$method])
608 Function: get subfeatures
609 Returns : a list of Bio::DB::GFF::Feature objects
610 Args : a feature method (optional)
611 Status : Public
613 This method returns a list of any subfeatures that belong to the main
614 feature. For those features that contain heterogeneous subfeatures,
615 you can retrieve a subset of the subfeatures by providing a method
616 name to filter on.
618 This method may also be called as segments() or get_SeqFeatures().
620 =cut
622 sub sub_SeqFeature {
623 my $self = shift;
624 my $type = shift;
625 my $subfeat = $self->{subfeatures} or return;
626 $self->sort_features;
627 my @a;
628 if ($type) {
629 my $features = $subfeat->{lc $type} or return;
630 @a = @{$features};
631 } else {
632 @a = map {@{$_}} values %{$subfeat};
634 return @a;
637 =head2 add_subfeature
639 Title : add_subfeature
640 Usage : $feature->add_subfeature($feature)
641 Function: add a subfeature to the feature
642 Returns : nothing
643 Args : a Bio::DB::GFF::Feature object
644 Status : Public
646 This method adds a new subfeature to the object. It is used
647 internally by aggregators, but is available for public use as well.
649 =cut
651 sub add_subfeature {
652 my $self = shift;
653 my $feature = shift;
654 my $type = $feature->method;
655 my $subfeat = $self->{subfeatures}{lc $type} ||= [];
656 push @{$subfeat},$feature;
659 =head2 attach_seq
661 Title : attach_seq
662 Usage : $sf->attach_seq($seq)
663 Function: Attaches a Bio::Seq object to this feature. This
664 Bio::Seq object is for the *entire* sequence: ie
665 from 1 to 10000
666 Example :
667 Returns : TRUE on success
668 Args : a Bio::PrimarySeqI compliant object
670 =cut
672 sub attach_seq { }
675 =head2 location
677 Title : location
678 Usage : my $location = $seqfeature->location()
679 Function: returns a location object suitable for identifying location
680 of feature on sequence or parent feature
681 Returns : Bio::LocationI object
682 Args : none
684 =cut
686 sub location {
687 my $self = shift;
688 require Bio::Location::Split unless Bio::Location::Split->can('new');
689 require Bio::Location::Simple unless Bio::Location::Simple->can('new');
691 my $location;
692 if (my @segments = $self->segments) {
693 $location = Bio::Location::Split->new(-seq_id => $self->seq_id);
694 foreach (@segments) {
695 $location->add_sub_Location($_->location);
697 } else {
698 $location = Bio::Location::Simple->new(-start => $self->start,
699 -end => $self->stop,
700 -strand => $self->strand,
701 -seq_id => $self->seq_id);
703 $location;
706 =head2 entire_seq
708 Title : entire_seq
709 Usage : $whole_seq = $sf->entire_seq()
710 Function: gives the entire sequence that this seqfeature is attached to
711 Example :
712 Returns : a Bio::PrimarySeqI compliant object, or undef if there is no
713 sequence attached
714 Args : none
717 =cut
719 sub entire_seq {
720 my $self = shift;
721 $self->factory->segment($self->sourceseq);
724 =head2 merged_segments
726 Title : merged_segments
727 Usage : @segs = $feature->merged_segments([$method])
728 Function: get merged subfeatures
729 Returns : a list of Bio::DB::GFF::Feature objects
730 Args : a feature method (optional)
731 Status : Public
733 This method acts like sub_SeqFeature, except that it merges
734 overlapping segments of the same time into contiguous features. For
735 those features that contain heterogeneous subfeatures, you can
736 retrieve a subset of the subfeatures by providing a method name to
737 filter on.
739 A side-effect of this method is that the features are returned in
740 sorted order by their start tposition.
742 =cut
746 sub merged_segments {
747 my $self = shift;
748 my $type = shift;
749 $type ||= ''; # prevent uninitialized variable warnings
751 my $truename = overload::StrVal($self);
753 return @{$self->{merged_segs}{$type}} if exists $self->{merged_segs}{$type};
754 my @segs = map { $_->[0] }
755 sort { $a->[1] <=> $b->[1] ||
756 $a->[2] cmp $b->[2] }
757 map { [$_, $_->start, $_->type] } $self->sub_SeqFeature($type);
759 # attempt to merge overlapping segments
760 my @merged = ();
761 for my $s (@segs) {
762 my $previous = $merged[-1] if @merged;
763 my ($pscore,$score) = (eval{$previous->score}||0,eval{$s->score}||0);
764 if (defined($previous)
765 && $previous->stop+1 >= $s->start
766 && $pscore == $score
767 && $previous->method eq $s->method
769 if ($self->absolute && $self->strand < 0) {
770 $previous->{start} = $s->{start};
771 } else {
772 $previous->{stop} = $s->{stop};
774 # fix up the target too
775 my $g = $previous->{group};
776 if ( ref($g) && $g->isa('Bio::DB::GFF::Homol')) {
777 my $cg = $s->{group};
778 $g->{stop} = $cg->{stop};
781 elsif (defined($previous)
782 && $previous->start == $s->start
783 && $previous->stop == $s->stop
784 && $previous->method eq $s->method
786 next;
789 else {
790 my $copy = $s->clone;
791 push @merged,$copy;
794 $self->{merged_segs}{$type} = \@merged;
795 @merged;
798 =head2 sub_types
800 Title : sub_types
801 Usage : @methods = $feature->sub_types
802 Function: get methods of all sub-seqfeatures
803 Returns : a list of method names
804 Args : none
805 Status : Public
807 For those features that contain subfeatures, this method will return a
808 unique list of method names of those subfeatures, suitable for use
809 with sub_SeqFeature().
811 =cut
813 sub sub_types {
814 my $self = shift;
815 my $subfeat = $self->{subfeatures} or return;
816 return keys %$subfeat;
819 =head2 attributes
821 Title : attributes
822 Usage : @attributes = $feature->attributes($name)
823 Function: get the "attributes" on a particular feature
824 Returns : an array of string
825 Args : feature ID
826 Status : public
828 Some GFF version 2 files use the groups column to store a series of
829 attribute/value pairs. In this interpretation of GFF, the first such
830 pair is treated as the primary group for the feature; subsequent pairs
831 are treated as attributes. Two attributes have special meaning:
832 "Note" is for backward compatibility and is used for unstructured text
833 remarks. "Alias" is considered as a synonym for the feature name.
835 @gene_names = $feature->attributes('Gene');
836 @aliases = $feature->attributes('Alias');
838 If no name is provided, then attributes() returns a flattened hash, of
839 attribute=E<gt>value pairs. This lets you do:
841 %attributes = $db->attributes;
843 =cut
845 sub attributes {
846 my $self = shift;
847 my $factory = $self->factory;
848 defined(my $id = $self->id) or return;
849 $factory->attributes($id,@_)
853 =head2 notes
855 Title : notes
856 Usage : @notes = $feature->notes
857 Function: get the "notes" on a particular feature
858 Returns : an array of string
859 Args : feature ID
860 Status : public
862 Some GFF version 2 files use the groups column to store various notes
863 and remarks. Adaptors can elect to store the notes in the database,
864 or just ignore them. For those adaptors that store the notes, the
865 notes() method will return them as a list.
867 =cut
869 sub notes {
870 my $self = shift;
871 $self->attributes('Note');
874 =head2 aliases
876 Title : aliases
877 Usage : @aliases = $feature->aliases
878 Function: get the "aliases" on a particular feature
879 Returns : an array of string
880 Args : feature ID
881 Status : public
883 This method will return a list of attributes of type 'Alias'.
885 =cut
887 sub aliases {
888 my $self = shift;
889 $self->attributes('Alias');
894 =head2 Autogenerated Methods
896 Title : AUTOLOAD
897 Usage : @subfeat = $feature->Method
898 Function: Return subfeatures using autogenerated methods
899 Returns : a list of Bio::DB::GFF::Feature objects
900 Args : none
901 Status : Public
903 Any method that begins with an initial capital letter will be passed
904 to AUTOLOAD and treated as a call to sub_SeqFeature with the method
905 name used as the method argument. For instance, this call:
907 @exons = $feature->Exon;
909 is equivalent to this call:
911 @exons = $feature->sub_SeqFeature('exon');
913 =cut
915 =head2 SeqFeatureI methods
917 The following Bio::SeqFeatureI methods are implemented:
919 primary_tag(), source_tag(), all_tags(), has_tag(), each_tag_value() [renamed get_tag_values()].
921 =cut
923 *primary_tag = \&method;
924 *source_tag = \&source;
925 sub all_tags {
926 my $self = shift;
927 my %atts = $self->attributes;
928 my @tags = keys %atts;
930 # autogenerated methods
931 #if (my $subfeat = $self->{subfeatures}) {
932 # push @tags,keys %$subfeat;
935 @tags;
937 *get_all_tags = \&all_tags;
939 sub has_tag {
940 my $self = shift;
941 my $tag = shift;
942 my %att = $self->attributes;
943 my %tags = map {$_=>1} ( $self->all_tags );
945 return $tags{$tag};
948 *each_tag_value = \&get_tag_values;
950 sub get_tag_values {
951 my $self = shift;
952 my $tag = shift;
953 return $self->$tag() if $CONSTANT_TAGS{$tag};
955 my $atts = $self->attributes;
956 return @{$atts->{$tag}} if $atts && $atts->{$tag};
958 $tag = ucfirst $tag;
959 return $self->$tag(); # try autogenerated tag
962 sub AUTOLOAD {
963 my($pack,$func_name) = $AUTOLOAD=~/(.+)::([^:]+)$/;
964 my $sub = $AUTOLOAD;
965 my $self = $_[0];
967 # ignore DESTROY calls
968 return if $func_name eq 'DESTROY';
970 # fetch subfeatures if func_name has an initial cap
971 # return sort {$a->start <=> $b->start} $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;
972 return $self->sub_SeqFeature($func_name) if $func_name =~ /^[A-Z]/;
974 # error message of last resort
975 $self->throw(qq(Can't locate object method "$func_name" via package "$pack"));
978 =head2 adjust_bounds
980 Title : adjust_bounds
981 Usage : $feature->adjust_bounds
982 Function: adjust the bounds of a feature
983 Returns : ($start,$stop,$strand)
984 Args : none
985 Status : Public
987 This method adjusts the boundaries of the feature to enclose all its
988 subfeatures. It returns the new start, stop and strand of the
989 enclosing feature.
991 =cut
993 # adjust a feature so that its boundaries are synched with its subparts' boundaries.
994 # this works recursively, so subfeatures can contain other features
995 sub adjust_bounds {
996 my $self = shift;
997 my $shrink = shift;
998 my $g = $self->{group};
1000 my $first = 0;
1001 my $tfirst = 0;
1002 if (my $subfeat = $self->{subfeatures}) {
1003 for my $list (values %$subfeat) {
1004 for my $feat (@$list) {
1005 # fix up our bounds to hold largest subfeature
1006 my($start,$stop,$strand) = $feat->adjust_bounds($shrink);
1008 if (defined($self->{fstrand})) {
1009 $self->debug("Subfeature's strand ($strand) doesn't match parent strand ($self->{fstrand})\n") if $self->{fstrand} ne $strand;
1010 } else {
1011 $self->{fstrand} = $strand;
1014 my ($low,$high) = $start < $stop ? ($start,$stop) : ($stop,$start);
1015 if ($shrink && !$first++) {
1016 # first subfeature resets start & stop:
1017 $self->{start} = $self->{fstrand} ne '-' ? $low : $high;
1018 $self->{stop} = $self->{fstrand} ne '-' ? $high : $low;
1019 } else {
1020 if ($self->{fstrand} ne '-') {
1021 $self->{start} = $low
1022 if (!defined($self->{start})) || $low < $self->{start};
1023 $self->{stop} = $high
1024 if (!defined($self->{stop})) || $high > $self->{stop};
1025 } else {
1026 $self->{start} = $high
1027 if (!defined($self->{start})) || $high > $self->{start};
1028 $self->{stop} = $low
1029 if (!defined($self->{stop})) || $low < $self->{stop};
1033 # fix up endpoints of targets too (for homologies only)
1034 my $h = $feat->group;
1035 next unless $h && $h->isa('Bio::DB::GFF::Homol');
1036 next unless $g && $g->isa('Bio::DB::GFF::Homol');
1038 ($start,$stop) = ($h->{start},$h->{stop});
1039 if ($shrink && !$tfirst++) {
1040 $g->{start} = $start;
1041 $g->{stop} = $stop;
1042 } else {
1043 if ($start <= $stop) {
1044 $g->{start} = $start if (!defined($g->{start})) || $start < $g->{start};
1045 $g->{stop} = $stop if (!defined($g->{stop})) || $stop > $g->{stop};
1046 } else {
1047 $g->{start} = $start if (!defined($g->{start})) || $start > $g->{start};
1048 $g->{stop} = $stop if (!defined($g->{stop})) || $stop < $g->{stop};
1055 ($self->{start},$self->{stop},$self->strand);
1058 =head2 sort_features
1060 Title : sort_features
1061 Usage : $feature->sort_features
1062 Function: sort features
1063 Returns : nothing
1064 Args : none
1065 Status : Public
1067 This method sorts subfeatures in ascending order by their start
1068 position. For reverse strand features, it sorts subfeatures in
1069 descending order. After this is called sub_SeqFeature will return the
1070 features in order.
1072 This method is called internally by merged_segments().
1074 =cut
1076 # sort features
1077 sub sort_features {
1078 my $self = shift;
1079 return if $self->{sorted}++;
1080 my $strand = $self->strand or return;
1081 my $subfeat = $self->{subfeatures} or return;
1082 for my $type (keys %$subfeat) {
1083 $subfeat->{$type} = [map { $_->[0] }
1084 sort {$a->[1] <=> $b->[1] }
1085 map { [$_,$_->start] }
1086 @{$subfeat->{$type}}] if $strand > 0;
1087 $subfeat->{$type} = [map { $_->[0] }
1088 sort {$b->[1] <=> $a->[1]}
1089 map { [$_,$_->start] }
1090 @{$subfeat->{$type}}] if $strand < 0;
1094 =head2 asString
1096 Title : asString
1097 Usage : $string = $feature->asString
1098 Function: return human-readabled representation of feature
1099 Returns : a string
1100 Args : none
1101 Status : Public
1103 This method returns a human-readable representation of the feature and
1104 is called by the overloaded "" operator.
1106 =cut
1108 sub asString {
1109 my $self = shift;
1110 my $type = $self->type;
1111 my $name = $self->group;
1112 return "$type($name)" if $name;
1113 return $type;
1114 # my $type = $self->method;
1115 # my $id = $self->group || 'unidentified';
1116 # return join '/',$id,$type,$self->SUPER::asString;
1119 sub name {
1120 my $self =shift;
1121 return $self->group || $self->SUPER::name;
1124 =head2 gff_string
1126 Title : gff_string
1127 Usage : $string = $feature->gff_string
1128 Function: return GFF2 of GFF2.5 representation of feature
1129 Returns : a string
1130 Args : none
1131 Status : Public
1133 =cut
1135 sub gff_string {
1136 my $self = shift;
1137 my $version = $self->version;
1139 # gff3_string and gff_string are synonymous if the version is set to 3
1140 return $self->gff3_string(@_) if $version == 3;
1142 my ($start,$stop) = ($self->start,$self->stop);
1144 # the defined() tests prevent uninitialized variable warnings, when dealing with clone objects
1145 # whose endpoints may be undefined
1146 ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) && $start > $stop;
1148 my ($class,$name) = ('','');
1149 my $strand = ('-','.','+')[$self->strand+1];
1151 my @group;
1153 if (my $t = $self->target) {
1154 push @group, $version == 2.5 ? $self->flatten_target($t,2.5)
1155 : $self->flatten_target($t);
1157 elsif (my $g = $self->group) {
1158 $class = $g->class || '';
1159 $name = $g->name || '';
1160 ($name =~ /\S\s\S/)?(push @group, "$class '$name'"):(push @group,"$class $name");
1163 # add exhaustive list of attributes
1164 my $att = $self->attributes;
1165 for ( keys %$att ) {
1166 for my $v ( @{$att->{$_}} ) {
1167 $v = qq("$v") if $v=~ /\S\s+\S/;
1168 push @group, qq($_ $v);
1172 my $group_field = join ' ; ',@group;
1173 my $ref = $self->refseq;
1174 my $n = ref($ref) ? $ref->name : $ref;
1175 my $phase = $self->phase;
1176 $phase = '.' unless defined $phase;
1177 return join("\t",
1179 $self->source,$self->method,
1180 (defined $start ? $start : '.'),
1181 (defined $stop ? $stop : '.'),
1182 (defined $self->score ? $self->score : '.'),
1183 (defined $strand ? $strand : '.'),
1184 $phase,
1185 $group_field);
1188 =head2 gff3_string
1190 Title : gff3_string
1191 Usage : $string = $feature->gff3_string([$recurse])
1192 Function: return GFF3 representation of feature
1193 Returns : a string
1194 Args : An optional flag, which if true, will cause the feature to recurse over
1195 subfeatures.
1196 Status : Public
1198 =cut
1200 sub gff3_string {
1201 my $self = shift;
1202 my ($recurse,$parent) = @_;
1203 my ($start,$stop) = ($self->start,$self->stop);
1205 # the defined() tests prevent uninitialized variable warnings, when dealing with clone objects
1206 # whose endpoints may be undefined
1207 ($start,$stop) = ($stop,$start) if defined($start) && defined($stop) && $start > $stop;
1209 my $strand = ('-','.','+')[$self->strand+1];
1210 my $ref = $self->refseq;
1211 my $n = ref($ref) ? $ref->name : $ref;
1212 my $phase = $self->phase;
1213 $phase = '.' unless defined $phase;
1215 my ($class,$name) = ('','');
1216 my @group;
1217 if (my $g = $self->group) {
1218 $class = $g->class || '';
1219 $name = $g->name || '';
1220 $name = "$class:$name" if defined $class;
1221 push @group,[ID => $name] if !defined($parent) || $name ne $parent;
1224 push @group,[Parent => $parent] if defined $parent && $parent ne '';
1226 if (my $t = $self->target) {
1227 $strand = '-' if $t->stop < $t->start;
1228 push @group, $self->flatten_target($t,3);
1231 my @attributes = $self->attributes;
1232 while (@attributes) {
1233 push @group,[shift(@attributes),shift(@attributes)]
1235 my $group_field = join ';',map {join '=',_escape($_->[0]),_escape($_->[1])} @group;
1236 my $string = join("\t",$n,$self->source,$self->method,$start||'.',$stop||'.',
1237 $self->score||'.',$strand||'.',$phase,$group_field);
1238 $string .= "\n";
1239 if ($recurse) {
1240 foreach ($self->sub_SeqFeature) {
1241 $string .= $_->gff3_string(1,$name);
1244 $string;
1247 =head2 version
1249 Title : version
1250 Usage : $feature->version()
1251 Function: get/set the GFF version to be returned by gff_string
1252 Returns : the GFF version (default is 2)
1253 Args : the GFF version (2, 2.5 of 3)
1254 Status : Public
1256 =cut
1258 sub version {
1259 my ($self, $version) = @_;
1260 $self->{version} = $version if $version;
1261 return $self->{version} || 2;
1265 sub _escape {
1266 my $toencode = shift;
1267 $toencode =~ s/([^a-zA-Z0-9_. :?^*\(\)\[\]@!-])/uc sprintf("%%%02x",ord($1))/eg;
1268 $toencode =~ tr/ /+/;
1269 $toencode;
1272 =head2 cmap_link()
1274 Title : cmap_link
1275 Usage : $link = $feature->cmap_link
1276 Function: returns a URL link to the corresponding feature in cmap
1277 Returns : a string
1278 Args : none
1279 Status : Public
1281 If integrated cmap/gbrowse installation, it returns a link to the map otherwise
1282 it returns a link to a feature search on the feature name. See the cmap
1283 documentation for more information.
1285 This function is intended primarily to be used in gbrowse conf files.
1286 For example:
1288 link = sub {my $self = shift; return $self->cmap_viewer_link(data_source);}
1290 =cut
1293 sub cmap_viewer_link {
1294 # Use ONLY if CMap is installed
1295 my $self = shift;
1296 my $data_source = shift;
1297 my $group_id = $self->group_id;
1298 my $factory = $self->factory; # aka adaptor
1300 my $link_str;
1302 if ($factory->can("create_cmap_viewer_link")){
1303 $link_str = $factory->create_cmap_viewer_link(
1304 data_source => $data_source,
1305 group_id => $group_id,
1308 my $name = $self->name();
1309 $link_str = '/cgi-bin/cmap/feature_search?features='
1310 . $name
1311 . '&search_field=feature_name&order_by=&data_source='
1312 . $data_source
1313 . '&submit=Submit'
1314 unless $link_str;
1316 return $link_str;
1320 =head1 A Note About Similarities
1322 The current default aggregator for GFF "similarity" features creates a
1323 composite Bio::DB::GFF::Feature object of type "gapped_alignment".
1324 The target() method for the feature as a whole will return a
1325 RelSegment object that is as long as the extremes of the similarity
1326 hit target, but will not necessarily be the same length as the query
1327 sequence. The length of each "similarity" subfeature will be exactly
1328 the same length as its target(). These subfeatures are essentially
1329 the HSPs of the match.
1331 The following illustrates this:
1333 @similarities = $segment->feature('similarity:BLASTN');
1334 $sim = $similarities[0];
1336 print $sim->type; # yields "gapped_similarity:BLASTN"
1338 $query_length = $sim->length;
1339 $target_length = $sim->target->length; # $query_length != $target_length
1341 @matches = $sim->Similarity; # use autogenerated method
1342 $query1_length = $matches[0]->length;
1343 $target1_length = $matches[0]->target->length; # $query1_length == $target1_length
1345 If you merge segments by calling merged_segments(), then the length of
1346 the query sequence segments will no longer necessarily equal the
1347 length of the targets, because the alignment information will have
1348 been lost. Nevertheless, the targets are adjusted so that the first
1349 and last base pairs of the query match the first and last base pairs
1350 of the target.
1352 =cut
1356 =head1 BUGS
1358 This module is still under development.
1360 =head1 SEE ALSO
1362 L<bioperl>, L<Bio::DB::GFF>, L<Bio::DB::RelSegment>
1364 =head1 AUTHOR
1366 Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
1368 Copyright (c) 2001 Cold Spring Harbor Laboratory.
1370 This library is free software; you can redistribute it and/or modify
1371 it under the same terms as Perl itself.
1373 =cut