1 package Bio
::DB
::SeqFeature
::Store
::GFF3Loader
;
6 Bio::DB::SeqFeature::Store::GFF3Loader -- GFF3 file loader for Bio::DB::SeqFeature::Store
10 use Bio::DB::SeqFeature::Store;
11 use Bio::DB::SeqFeature::Store::GFF3Loader;
13 # Open the sequence database
14 my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql',
15 -dsn => 'dbi:mysql:test',
18 my $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(-store => $db,
22 $loader->load('./my_genome.gff3');
27 The Bio::DB::SeqFeature::Store::GFF3Loader object parsers GFF3-format
28 sequence annotation files and loads Bio::DB::SeqFeature::Store
29 databases. For certain combinations of SeqFeature classes and
30 SeqFeature::Store databases it features a "fast load" mode which will
31 greatly accelerate the loading of GFF3 databases by a factor of 5-10.
33 The GFF3 file format has been extended very slightly to accommodate
34 Bio::DB::SeqFeature::Store. First, the loader recognizes is a new
37 # #index-subfeatures [0|1]
39 Note that you can place a space between the two #'s in order to
40 prevent GFF3 validators from complaining.
42 If this is true, then subfeatures are indexed (the default) so that
43 they can be retrieved with a query. See L<Bio::DB::SeqFeature::Store>
44 for an explanation of this. If false, then subfeatures can only be
45 accessed through their parent feature.
47 Second, the loader recognizes a new attribute tag called index, which
48 if present, controls indexing of the current feature. Example:
50 ctg123 . TF_binding_site 1000 1012 . + . ID=tfbs00001;index=1
52 You can use this to turn indexing on and off, overriding the default
53 for a particular feature.
55 Note that the loader keeps a record -- in memory -- of each feature
56 that it has processed. If you find the loader running out of memory on
57 particularly large GFF3 files, please split the input file into
58 smaller pieces and do the load in steps.
63 # load utility - incrementally load the store based on GFF3 file
66 # slow mode -- features can occur in any order in the GFF3 file
67 # fast mode -- all features with same ID must be contiguous in GFF3 file
71 use Bio
::DB
::GFF
::Util
::Rearrange
;
72 use Bio
::DB
::SeqFeature
::Store
::LoadHelper
;
73 use constant DEBUG
=> 0;
75 use base
'Bio::DB::SeqFeature::Store::Loader';
78 my %Special_attributes =(
79 Gap
=> 1, Target
=> 1,
80 Parent
=> 1, Name
=> 1,
82 index => 1, Index
=> 1,
84 my %Strandedness = ( '+' => 1,
98 Usage : $loader = Bio::DB::SeqFeature::Store::GFF3Loader->new(@options)
99 Function: create a new parser
100 Returns : a Bio::DB::SeqFeature::Store::GFF3Loader gff3 parser and loader
101 Args : several - see below
104 This method creates a new GFF3 loader and establishes its connection
105 with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value
106 pairs as described in this table:
111 -store A writable Bio::DB::SeqFeature::Store database handle.
113 -seqfeature_class The name of the type of Bio::SeqFeatureI object to create
114 and store in the database (Bio::DB::SeqFeature by default)
116 -sf_class A shorter alias for -seqfeature_class
118 -verbose Send progress information to standard error.
120 -fast If true, activate fast loading (see below)
122 -chunk_size Set the storage chunk size for nucleotide/protein sequences
125 -tmp Indicate a temporary directory to use when loading non-normalized
128 -ignore_seqregion Ignore ##sequence-region directives. The default is to create a
129 feature corresponding to the directive.
131 -noalias_target Don't create an Alias attribute for a target_id named in a
132 Target attribute. The default is to create an Alias
133 attribute containing the target_id found in a Target
136 When you call new(), a connection to a Bio::DB::SeqFeature::Store
137 database should already have been established and the database
138 initialized (if appropriate).
140 Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store
141 databases support a fast loading mode. Currently the only reliable
142 implementation of fast loading is the combination of DBI::mysql with
143 Bio::DB::SeqFeature. The other important restriction on fast loading
144 is the requirement that a feature that contains subfeatures must occur
145 in the GFF3 file before any of its subfeatures. Otherwise the
146 subfeatures that occurred before the parent feature will not be
147 attached to the parent correctly. This restriction does not apply to
148 normal (slow) loading.
150 If you use an unnormalized feature class, such as
151 Bio::SeqFeature::Generic, then the loader needs to create a temporary
152 database in which to cache features until all their parts and subparts
153 have been seen. This temporary databases uses the "berkeleydb"
154 adaptor. The -tmp option specifies the directory in which that
155 database will be created. If not present, it defaults to the system
156 default tmp directory specified by File::Spec-E<gt>tmpdir().
158 The -chunk_size option allows you to tune the representation of
159 DNA/Protein sequence in the Store database. By default, sequences are
160 split into 2000 base/residue chunks and then reassembled as
161 needed. This avoids the problem of pulling a whole chromosome into
162 memory in order to fetch a short subsequence from somewhere in the
163 middle. Depending on your usage patterns, you may wish to tune this
164 parameter using a chunk size that is larger or smaller than the
171 my $self = $class->SUPER::new
(@_);
172 my ($ignore_seqregion) = rearrange
(['IGNORE_SEQREGION'],@_);
173 $self->ignore_seqregion($ignore_seqregion);
174 my ($noalias_target) = rearrange
(['NOALIAS_TARGET'],@_);
175 $self->noalias_target($noalias_target);
179 =head2 ignore_seqregion
181 $ignore_it = $loader->ignore_seqregion([$new_flag])
183 Get or set the ignore_seqregion flag, which if true, will cause
184 GFF3 ##sequence-region directives to be ignored. The default behavior
185 is to create a feature corresponding to the region.
189 sub ignore_seqregion
{
191 my $d = $self->{ignore_seqregion
};
192 $self->{ignore_seqregion
} = shift if @_;
196 =head2 noalias_target
198 $noalias_target = $loader->noalias_target([$new_flag])
200 Get or set the noalias_target flag, which if true, will disable the creation of
201 an Alias attribute for a target_id named in a Target attribute. The default is
202 to create an Alias attribute containing the target_id found in a Target
209 my $d = $self->{noalias_target
};
210 $self->{noalias_target
} = shift if @_;
217 Usage : $count = $loader->load(@ARGV)
218 Function: load the indicated files or filehandles
219 Returns : number of feature lines loaded
220 Args : list of files or filehandles
223 Once the loader is created, invoke its load() method with a list of
224 GFF3 or FASTA file paths or previously-opened filehandles in order to
225 load them into the database. Compressed files ending with .gz, .Z and
226 .bz2 are automatically recognized and uncompressed on the fly. Paths
227 beginning with http: or ftp: are treated as URLs and opened using the
228 LWP GET program (which must be on your path).
230 FASTA files are recognized by their initial "E<gt>" character. Do not feed
231 the loader a file that is neither GFF3 nor FASTA; I don't know what
232 will happen, but it will probably not be what you expect.
236 # sub load { } inherited
240 The following read-only accessors return values passed or created during new():
242 store() the long-term Bio::DB::SeqFeature::Store object
244 tmp_store() the temporary Bio::DB::SeqFeature::Store object used
247 sfclass() the Bio::SeqFeatureI class
249 fast() whether fast loading is active
251 seq_chunk_size() the sequence chunk size
253 verbose() verbose progress messages
257 # sub store inherited
258 # sub tmp_store inherited
259 # sub sfclass inherited
261 # sub seq_chunk_size inherited
262 # sub verbose inherited
264 =head2 Internal Methods
266 The following methods are used internally and may be overridden by
271 =item default_seqfeature_class
273 $class = $loader->default_seqfeature_class
275 Return the default SeqFeatureI class (Bio::DB::SeqFeature).
279 # sub default_seqfeature_class { } inherited
281 =item subfeatures_normalized
283 $flag = $loader->subfeatures_normalized([$new_flag])
285 Get or set a flag that indicates that the subfeatures are
286 normalized. This is deduced from the SeqFeature class information.
290 # sub subfeatures_normalized { } inherited
292 =item subfeatures_in_table
294 $flag = $loader->subfeatures_in_table([$new_flag])
296 Get or set a flag that indicates that feature/subfeature relationships
297 are stored in a table. This is deduced from the SeqFeature class and
302 # sub subfeatures_in_table { } inherited
306 $count = $loader->load_fh($filehandle)
308 Load the GFF3 data at the other end of the filehandle and return true
309 if successful. Internally, load_fh() invokes:
312 do_load($filehandle);
317 # sub load_fh { } inherited
319 =item start_load, finish_load
321 These methods are called at the start and end of a filehandle load.
325 sub create_load_data
{ #overridden
327 $self->SUPER::create_load_data
;
328 $self->{load_data
}{TemporaryID
} = "GFFLoad0000000";
329 $self->{load_data
}{IndexSubfeatures
} = $self->index_subfeatures();
330 $self->{load_data
}{mode
} = 'gff';
332 $self->{load_data
}{Helper
} =
333 Bio
::DB
::SeqFeature
::Store
::LoadHelper
->new($self->{tmpdir
});
336 sub finish_load
{ #overridden
339 $self->store_current_feature(); # during fast loading, we will have a feature left at the very end
340 $self->start_or_finish_sequence(); # finish any half-loaded sequences
342 $self->msg("Building object tree...");
343 my $start = $self->time();
344 $self->build_object_tree;
345 $self->msg(sprintf "%5.2fs\n",$self->time()-$start);
348 $self->msg("Loading bulk data into database...");
349 $start = $self->time();
350 $self->store->finish_bulk_update;
351 $self->msg(sprintf "%5.2fs\n",$self->time()-$start);
353 eval {$self->store->commit};
355 # don't delete load data so that caller can ask for the loaded IDs
356 # $self->delete_load_data;
361 $count = $loader->do_load($fh)
363 This is called by load_fh() to load the GFF3 file's filehandle and
364 return the number of lines loaded.
368 # sub do_load { } inherited
372 $loader->load_line($data);
374 Load a line of a GFF3 file. You must bracket this with calls to
375 start_load() and finish_load()!
377 $loader->start_load();
378 $loader->load_line($_) while <FH>;
379 $loader->finish_load();
383 sub load_line
{ #overridden
388 my $load_data = $self->{load_data
};
389 $load_data->{line
}++;
391 return unless $line =~ /^\S/; # blank line
393 # if it has a tab in it or looks like a chrom.sizes file, switch to gff mode
394 $load_data->{mode
} = 'gff' if $line =~ /\t/
395 or $line =~ /^\w+\s+\d+\s*$/;
397 if ($line =~ /^\#\s?\#\s*(.+)/) { ## meta instruction
398 $load_data->{mode
} = 'gff';
399 $self->handle_meta($1);
401 } elsif ($line =~ /^\#/) {
402 $load_data->{mode
} = 'gff'; # just to be safe
406 elsif ($line =~ /^>\s*(\S+)/) { # FASTA lines are coming
407 $load_data->{mode
} = 'fasta';
408 $self->start_or_finish_sequence($1);
411 elsif ($load_data->{mode
} eq 'fasta') {
412 $self->load_sequence($line);
415 elsif ($load_data->{mode
} eq 'gff') {
416 $self->handle_feature($line);
417 if (++$load_data->{count
} % 1000 == 0) {
418 my $now = $self->time();
419 my $nl = -t STDOUT
&& !$ENV{EMACS
} ?
"\r" : "\n";
420 local $^W
= 0; # kill uninit variable warning
421 $self->msg(sprintf("%d features loaded in %5.2fs (%5.2fs/1000 features)...%s$nl",
422 $load_data->{count
},$now - $load_data->{start_time
},
423 $now - $load_data->{millenium_time
},
426 $load_data->{millenium_time
} = $now;
431 $self->throw("I don't know what to do with this line:\n$line");
438 $loader->handle_meta($meta_directive)
440 This method is called to handle meta-directives such as
441 ##sequence-region. The method will receive the directive with the
442 initial ## stripped off.
448 my $instruction = shift;
450 if ( $instruction =~ /^#$/ ) {
451 $self->store_current_feature() ; # during fast loading, we will have a feature left at the very end
452 $self->start_or_finish_sequence(); # finish any half-loaded sequences
453 if ( $self->store->can('handle_resolution_meta') ) {
454 $self->store->handle_resolution_meta($instruction);
459 if ($instruction =~ /sequence-region\s+(.+)\s+(-?\d+)\s+(-?\d+)/i
460 && !$self->ignore_seqregion()) {
461 my($ref,$start,$end,$strand) = $self->_remap($1,$2,$3,+1);
462 my $feature = $self->sfclass->new(-name
=> $ref,
467 -primary_tag
=> 'region');
468 $self->store->store($feature);
472 if ($instruction =~/index-subfeatures\s+(\S+)/i) {
473 $self->{load_data
}{IndexSubfeatures
} = $1;
474 $self->store->index_subfeatures($1);
478 if ( $self->store->can('handle_unrecognized_meta') ) {
479 $self->store->handle_unrecognized_meta($instruction);
486 $loader->handle_feature($gff3_line)
488 This method is called to process a single GFF3 line. It manipulates
489 information stored a data structure called $self-E<gt>{load_data}.
493 sub handle_feature
{ #overridden
495 my $gff_line = shift;
496 my $ld = $self->{load_data
};
498 my $allow_whitespace = $self->allow_whitespace;
500 # special case for a chrom.sizes-style line
502 if ($gff_line =~ /^(\w+)\s+(\d+)\s*$/) {
503 @columns = ($1,undef,'chromosome',1,$2,undef,undef,undef,"Name=$1");
505 $gff_line =~ s/\s+/\t/g if $allow_whitespace;
506 @columns = map {$_ eq '.' ?
undef : $_ } split /\t/,$gff_line;
509 $self->invalid_gff($gff_line) if @columns < 4;
510 $self->invalid_gff($gff_line) if @columns > 9 && $allow_whitespace;
514 if (@columns > 9) { #oops, split too much due to whitespace
515 $columns[8] = join(' ',@columns[8..$#columns]);
519 my ($refname,$source,$method,$start,$end,$score,$strand,$phase,$attributes) = @columns;
521 $self->invalid_gff($gff_line) unless defined $refname;
522 $self->invalid_gff($gff_line) unless !defined $start || $start =~ /^[\d.-]+$/;
523 $self->invalid_gff($gff_line) unless !defined $end || $end =~ /^[\d.-]+$/;
524 $self->invalid_gff($gff_line) unless defined $method;
526 $strand = $Strandedness{$strand||0};
527 my ($reserved,$unreserved) = $attributes ?
$self->parse_attributes($attributes) : ();
529 my $name = ($reserved->{Name
} && $reserved->{Name
}[0]);
531 my $has_loadid = defined $reserved->{ID
}[0];
533 my $feature_id = defined $reserved->{ID
}[0] ?
$reserved->{ID
}[0] : $ld->{TemporaryID
}++;
534 my @parent_ids = @
{$reserved->{Parent
}} if defined $reserved->{Parent
};
536 my $index_it = $ld->{IndexSubfeatures
};
537 if (exists $reserved->{Index
} || exists $reserved->{index}) {
538 $index_it = $reserved->{Index
}[0] || $reserved->{index}[0];
541 # Everything in the unreserved hash becomes an attribute, so we copy
542 # some attributes over
543 $unreserved->{Note
} = $reserved->{Note
} if exists $reserved->{Note
};
544 $unreserved->{Alias
} = $reserved->{Alias
} if exists $reserved->{Alias
};
545 $unreserved->{Target
} = $reserved->{Target
} if exists $reserved->{Target
};
546 $unreserved->{Gap
} = $reserved->{Gap
} if exists $reserved->{Gap
};
547 $unreserved->{load_id
}= $reserved->{ID
} if exists $reserved->{ID
};
549 # mec@stowers-institute.org, wondering why not all attributes are
550 # carried forward, adds ID tag in particular service of
551 # round-tripping ID, which, though present in database as load_id
552 # attribute, was getting lost as itself
553 # $unreserved->{ID}= $reserved->{ID} if exists $reserved->{ID};
555 # TEMPORARY HACKS TO SIMPLIFY DEBUGGING
556 $feature_id = '' unless defined $feature_id;
557 $name = '' unless defined $name; # prevent uninit variable warnings
558 # push @{$unreserved->{Alias}},$feature_id if $has_loadid && $feature_id ne $name;
559 # If DEBUG != 0, any Parent attribute is also copied over (as 'parent_id')
560 $unreserved->{parent_id
} = \
@parent_ids if DEBUG
&& @parent_ids;
562 # POSSIBLY A PERMANENT HACK -- TARGETS BECOME ALIASES
563 # THIS IS TO ALLOW FOR TARGET-BASED LOOKUPS
564 if (exists $reserved->{Target
} && !$self->{noalias_target
}) {
565 my %aliases = map {$_=>1} @
{$unreserved->{Alias
}};
566 for my $t (@
{$reserved->{Target
}}) {
567 (my $tc = $t) =~ s/\s+.*$//; # get rid of coordinates
569 push @
{$unreserved->{Alias
}},$tc unless $name eq $tc || $aliases{$tc};
573 ($refname,$start,$end,$strand) = $self->_remap($refname,$start,$end,$strand) or return;
575 my @args = (-display_name
=> $name,
579 -strand
=> $strand || 0,
582 -primary_tag
=> $method || 'feature',
585 -attributes
=> $unreserved,
588 # Here's where we handle feature lines that have the same ID (multiple locations, not
589 # parent/child relationships)
593 # Current feature is the same as the previous feature, which hasn't yet been loaded
594 if (defined $ld->{CurrentID
} && $ld->{CurrentID
} eq $feature_id) {
595 $old_feat = $ld->{CurrentFeature
};
598 # Current feature is the same as a feature that was loaded earlier
599 elsif (defined(my $id = $self->{load_data
}{Helper
}->local2global($feature_id))) {
600 $old_feat = $self->fetch($feature_id)
601 or $self->warn(<<END);
602 ID=$feature_id has been used more than once, but it cannot be found in the database.
603 This can happen if you have specified fast loading, but features sharing the same ID
604 are not contiguous in the GFF file. This will be loaded as a separate feature.
609 # contiguous feature, so add a segment
610 warn $old_feat if defined $old_feat and !ref $old_feat;
611 if (defined $old_feat) {
612 # set this to 1 to disable split-location behavior
613 if (0 && @parent_ids) { # If multiple features are held together by the same ID
614 $feature_id = $ld->{TemporaryID
}++; # AND they have a Parent attribute, this causes an undesirable
615 } # additional layer of aggregation. Changing the ID fixes this.
617 $old_feat->seq_id ne $refname ||
618 $old_feat->start != $start ||
619 $old_feat->end != $end # make sure endpoints are distinct
622 $self->add_segment($old_feat,$self->sfclass->new(@args));
627 # we get here if this is a new feature
628 # first of all, store the current feature if it is there
629 $self->store_current_feature() if defined $ld->{CurrentID
};
631 # now create the new feature
632 # (index top-level features only if policy asks us to)
633 my $feature = $self->sfclass->new(@args);
634 $feature->object_store($self->store) if $feature->can('object_store'); # for lazy table features
635 $ld->{CurrentFeature
} = $feature;
636 $ld->{CurrentID
} = $feature_id;
638 my $top_level = !@parent_ids;
639 my $has_id = defined $reserved->{ID
}[0];
640 $index_it ||= $top_level;
642 my $helper = $ld->{Helper
};
643 $helper->indexit($feature_id=>1) if $index_it;
644 $helper->toplevel($feature_id=>1) if !$self->{fast
}
645 && $top_level; # need to track top level features
649 for my $parent (@parent_ids) {
650 $helper->add_children($parent=>$feature_id);
658 $self->throw("invalid GFF line at line $self->{load_data}{line}.\n".$line);
661 =item allow_whitespace
663 $allow_it = $loader->allow_whitespace([$newvalue]);
665 Get or set the allow_whitespace flag. If true, then GFF3 files are
666 allowed to be delimited with whitespace in addition to tabs.
670 sub allow_whitespace
{
672 my $d = $self->{allow_whitespace
};
673 $self->{allow_whitespace
} = shift if @_;
677 =item store_current_feature
679 $loader->store_current_feature()
681 This method is called to store the currently active feature in the
682 database. It uses a data structure stored in $self-E<gt>{load_data}.
686 # sub store_current_feature { } inherited
688 =item build_object_tree
690 $loader->build_object_tree()
692 This method gathers together features and subfeatures and builds the graph that connects them.
697 # put objects together
699 sub build_object_tree
{
701 $self->subfeatures_in_table ?
$self->build_object_tree_in_tables : $self->build_object_tree_in_features;
704 =item build_object_tree_in_tables
706 $loader->build_object_tree_in_tables()
708 This method gathers together features and subfeatures and builds the
709 graph that connects them, assuming that parent/child relationships
710 will be stored in a database table.
714 sub build_object_tree_in_tables
{
716 my $store = $self->store;
717 my $helper = $self->{load_data
}{Helper
};
719 while (my ($load_id,$children) = $helper->each_family()) {
721 my $parent_id = $helper->local2global($load_id);
722 die $self->throw("$load_id doesn't have a primary id")
723 unless defined $parent_id;
725 my @children = map {$helper->local2global($_)} @
$children;
726 # this updates the table that keeps track of parent/child relationships,
727 # but does not update the parent object -- so (start,end) had better be right!!!
728 $store->add_SeqFeature($parent_id,@children);
734 =item build_object_tree_in_features
736 $loader->build_object_tree_in_features()
738 This method gathers together features and subfeatures and builds the
739 graph that connects them, assuming that parent/child relationships are
740 stored in the seqfeature objects themselves.
744 sub build_object_tree_in_features
{
746 my $store = $self->store;
747 my $tmp = $self->tmp_store;
748 my $ld = $self->{load_data
};
749 my $normalized = $self->subfeatures_normalized;
751 my $helper = $ld->{Helper
};
753 while (my $load_id = $helper->each_toplevel) {
754 my $feature = $self->fetch($load_id)
755 or $self->throw("$load_id (id="
756 .$helper->local2global($load_id)
757 ." should have a database entry, but doesn't");
758 $self->attach_children($store,$ld,$load_id,$feature);
759 # Indexed objects are updated, not created anew
760 $feature->primary_id(undef) unless $helper->indexit($load_id);
761 $store->store($feature);
766 =item attach_children
768 $loader->attach_children($store,$load_data,$load_id,$feature)
770 This recursively adds children to features and their subfeatures. It
771 is called when subfeatures are directly contained within other
772 features, rather than stored in a relational table.
776 sub attach_children
{
778 my ($store,$ld,$load_id,$feature) = @_;
780 my $children = $ld->{Helper
}->children() or return;
781 for my $child_id (@
$children) {
782 my $child = $self->fetch($child_id)
783 or $self->throw("$child_id should have a database entry, but doesn't");
784 $self->attach_children($store,$ld,$child_id,$child); # recursive call
785 $feature->add_SeqFeature($child);
791 my $feature = $loader->fetch($load_id)
793 Given a load ID (from the ID= attribute) this method returns the
794 feature from the temporary database or the permanent one, depending on
802 my $helper = $self->{load_data
}{Helper
};
803 my $id = $helper->local2global($load_id);
806 ($self->subfeatures_normalized || $helper->indexit($load_id)
807 ?
$self->store->fetch($id)
808 : $self->tmp_store->fetch($id)
814 $loader->add_segment($parent,$child)
816 This method is used to add a split location to the parent.
822 my ($parent,$child) = @_;
824 if ($parent->can('add_segment')) { # probably a lazy table feature
825 my $segment_count = $parent->can('denormalized_segment_count') ?
$parent->denormalized_segment_count
826 : $parent->can('denormalized_segments ') ?
$parent->denormalized_segments
827 : $parent->can('segments') ?
$parent->segments
829 unless ($segment_count) { # convert into a segmented object
831 if ($parent->can('clone')) {
832 $segment = $parent->clone;
834 my %clone = %$parent;
835 $segment = bless \
%clone,ref $parent;
837 delete $segment->{segments
};
838 eval {$segment->object_store(undef) };
839 $segment->primary_id(undef);
841 # this updates the object and expands its start and end positions without writing
842 # the segments into the database as individual objects
843 $parent->add_segment($segment);
845 $parent->add_segment($child);
849 # a conventional Bio::SeqFeature::Generic object - create a split location
851 my $current_location = $parent->location;
852 if ($current_location->can('add_sub_Location')) {
853 $current_location->add_sub_Location($child->location);
855 eval "require Bio::Location::Split" unless Bio
::Location
::Split
->can('add_sub_Location');
856 my $new_location = Bio
::Location
::Split
->new();
857 $new_location->add_sub_Location($current_location);
858 $new_location->add_sub_Location($child->location);
859 $parent->location($new_location);
864 =item parse_attributes
866 ($reserved,$unreserved) = $loader->parse_attributes($attribute_line)
868 This method parses the information contained in the $attribute_line
869 into two hashrefs, one containing the values of reserved attribute
870 tags (e.g. ID) and the other containing the values of unreserved ones.
874 sub parse_attributes
{
878 unless ($att =~ /=/) { # ouch! must be a GFF line
879 require Bio
::DB
::SeqFeature
::Store
::GFF2Loader
880 unless Bio
::DB
::SeqFeature
::Store
::GFF2Loader
->can('parse_attributes');
881 return $self->Bio::DB
::SeqFeature
::Store
::GFF2Loader
::parse_attributes
($att);
884 my @pairs = map { my ($name,$value) = split '=';
885 [$self->unescape($name) => $value];
887 my (%reserved,%unreserved);
891 unless (defined $_->[1]) {
892 warn "$tag does not have a value at GFF3 file line $.\n";
896 my @values = split ',',$_->[1];
897 map {$_ = $self->unescape($_);} @values;
898 if ($Special_attributes{$tag}) { # reserved attribute
899 push @
{$reserved{$tag}},@values;
901 push @
{$unreserved{$tag}},@values
904 return (\
%reserved,\
%unreserved);
907 =item start_or_finish_sequence
909 $loader->start_or_finish_sequence('Chr9')
911 This method is called at the beginning and end of a fasta section.
915 # sub start_or_finish_sequence { } inherited
919 $loader->load_sequence('gatttcccaaa')
921 This method is called to load some amount of sequence after
922 start_or_finish_sequence() is first called.
926 # sub load_sequence { } inherited
930 my $io_file = $loader->open_fh($filehandle_or_path)
932 This method opens up the indicated file or pipe, using some
933 intelligence to recognized compressed files and URLs and doing the
938 # sub open_fh { } inherited
940 # sub msg { } inherited
944 my $time = $loader->time
946 This method returns the current time in seconds, using Time::HiRes if available.
950 # sub time { } inherited
954 my $unescaped = GFF3Loader::unescape($escaped)
956 This is an internal utility. It is the same as CGI::Util::unescape,
957 but doesn't change pluses into spaces and ignores unicode escapes.
961 # sub unescape { } inherited
965 my ($ref,$start,$end,$strand) = @_;
966 my $mapper = $self->coordinate_mapper;
967 return ($ref,$start,$end,$strand) unless $mapper;
969 my ($newref,$coords) = $mapper->($ref,[$start,$end]);
970 return unless defined $coords->[0];
971 if ($coords->[0] > $coords->[1]) {
972 @
{$coords} = reverse(@
{$coords});
975 return ($newref,@
{$coords},$strand);
978 sub _indexit
{ # override
980 return $self->{load_data
}{Helper
}->indexit(@_);
983 sub _local2global
{ # override
985 return $self->{load_data
}{Helper
}->local2global(@_);
990 my $ids = $self->local_ids;
993 After performing a load, this returns an array ref containing all the
994 load file IDs that were contained within the file just loaded.
998 sub local_ids
{ # override
1000 return $self->{load_data
}{Helper
}->local_ids(@_);
1005 my $ids = $loader->loaded_ids;
1008 After performing a load, this returns an array ref containing all the
1009 feature primary ids that were created during the load.
1013 sub loaded_ids
{ # override
1015 return $self->{load_data
}{Helper
}->loaded_ids(@_);
1026 This is an early version, so there are certainly some bugs. Please
1027 use the BioPerl bug tracking system to report bugs.
1031 L<Bio::DB::SeqFeature::Store>,
1032 L<Bio::DB::SeqFeature::Segment>,
1033 L<Bio::DB::SeqFeature::NormalizedFeature>,
1034 L<Bio::DB::SeqFeature::GFF2Loader>,
1035 L<Bio::DB::SeqFeature::Store::DBI::mysql>,
1036 L<Bio::DB::SeqFeature::Store::berkeleydb>
1040 Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
1042 Copyright (c) 2006 Cold Spring Harbor Laboratory.
1044 This library is free software; you can redistribute it and/or modify
1045 it under the same terms as Perl itself.