t/AlignIO/AlignIO.t: fix number of tests in plan (fixup c523e6bed866)
[bioperl-live.git] / Bio / DB / SeqFeature / Store / FeatureFileLoader.pm
blob4cdaa4e50961d48913e9e4fa5a851fa69c7a9127
1 package Bio::DB::SeqFeature::Store::FeatureFileLoader;
4 =head1 NAME
6 Bio::DB::SeqFeature::Store::FeatureFileLoader -- feature file loader for Bio::DB::SeqFeature::Store
8 =head1 SYNOPSIS
10 use Bio::DB::SeqFeature::Store;
11 use Bio::DB::SeqFeature::Store::FeatureFileLoader;
13 # Open the sequence database
14 my $db = Bio::DB::SeqFeature::Store->new( -adaptor => 'DBI::mysql',
15 -dsn => 'dbi:mysql:test',
16 -write => 1 );
18 my $loader =
19 Bio::DB::SeqFeature::Store::FeatureFileLoader->new(-store => $db,
20 -verbose => 1,
21 -fast => 1);
23 $loader->load('./my_genome.fff');
26 =head1 DESCRIPTION
28 The Bio::DB::SeqFeature::Store::FeatureFileLoader object parsers
29 FeatureFile-format sequence annotation files and loads
30 Bio::DB::SeqFeature::Store databases. For certain combinations of
31 SeqFeature classes and SeqFeature::Store databases it features a "fast
32 load" mode which will greatly accelerate the loading of databases by a
33 factor of 5-10.
35 FeatureFile Format (.fff) is very simple:
37 mRNA B0511.1 Chr1:1..100 Type=UTR;Note="putative primase"
38 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
39 mRNA B0511.1 Chr1:801..1000 Type=UTR
41 reference = Chr3
42 Cosmid B0511 516..619
43 Cosmid B0511 3185..3294
44 Cosmid B0511 10946..11208
45 Cosmid B0511 13126..13511
46 Cosmid B0511 11394..11539
47 EST yk260e10.5 15569..15724
48 EST yk672a12.5 537..618,3187..3294
49 EST yk595e6.5 552..618
50 EST yk595e6.5 3187..3294
51 EST yk846e07.3 11015..11208
52 EST yk53c10
53 yk53c10.3 15000..15500,15700..15800
54 yk53c10.5 18892..19154
55 EST yk53c10.5 16032..16105
56 SwissProt PECANEX 13153-13656 Note="Swedish fish"
57 FGENESH "Predicted gene 1" 1-205,518-616,661-735,3187-3365,3436-3846 "Pfam domain"
58 # file ends
60 There are up to four columns of WHITESPACE (not necessarily tab)
61 delimited text. Embedded whitespace must be escaped using shell
62 escaping rules (quoting the column or backslashing whitespace).
64 Column 1: The feature type. You may use type:subtype as a convention
65 for method:source.
67 Column 2: The feature name/ID.
69 Column 3: The position of this feature in base pair
70 coordinates. Ranges can be given as either
71 start-end or start..end. A chromosome position
72 can be specified using the format "reference:start..end".
73 A discontinuous feature can be specified by giving
74 multiple ranges separated by commas. Minus-strand features
75 are indicated by specifying a start > end.
77 Column 4: Comment/attribute field. A single Note can be given, or
78 a series of attribute=value pairs, separated by
79 spaces or semicolons, as in "score=23;type=transmembrane"
81 =head2 Specifying Positions and Ranges
83 A feature position is specified using a sequence ID (a genbank
84 accession number, a chromosome name, a contig, or any other meaningful
85 reference system, followed by a colon and a position range. Ranges are
86 two integers separated by double dots or the hyphen. Examples:
87 "Chr1:516..11208", "ctgA:1-5000". Negative coordinates are allowed, as
88 in "Chr1:-187..1000".
90 A discontinuous range ("split location") uses commas to separate the
91 ranges. For example:
93 Gene B0511.1 Chr1:516..619,3185..3294,10946..11208
95 In the case of a split location, the sequence id only has to appear in
96 front of the first range.
98 Alternatively, a split location can be indicated by repeating the
99 features type and name on multiple adjacent lines:
101 Gene B0511.1 Chr1:516..619
102 Gene B0511.1 Chr1:3185..3294
103 Gene B0511.1 Chr1:10946..11208
105 If all the locations are on the same reference sequence, you can
106 specify a default chromosome using a "reference=E<lt>seqidE<gt>":
108 reference=Chr1
109 Gene B0511.1 516..619
110 Gene B0511.1 3185..3294
111 Gene B0511.1 10946..11208
113 The default seqid is in effect until the next "reference" line
114 appears.
116 =head2 Feature Tags
118 Tags can be added to features by adding a fourth column consisting of
119 "tag=value" pairs:
121 Gene B0511.1 Chr1:516..619,3185..3294 Note="Putative primase"
123 Tags and their values take any form you want, and multiple tags can be
124 separated by semicolons. You can also repeat tags multiple times:
126 Gene B0511.1 Chr1:516..619,3185..3294 GO_Term=GO:100;GO_Term=GO:2087
128 Several tags have special meanings:
130 Tag Meaning
131 --- -------
133 Type The primary tag for a subfeature.
134 Score The score of a feature or subfeature.
135 Phase The phase of a feature or subfeature.
136 URL A URL to link to (via the Bio::Graphics library).
137 Note A note to attach to the feature for display by the Bio::Graphics library.
139 For example, in the common case of an mRNA, you can use the "Type" tag
140 to distinguish the parts of the mRNA into UTR and CDS:
142 mRNA B0511.1 Chr1:1..100 Type=UTR
143 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
144 mRNA B0511.1 Chr1:801..1000 Type=UTR
146 The top level feature's primary tag will be "mRNA", and its subparts
147 will have types UTR and CDS as indicated. Additional tags that are
148 placed in the first line of the feature will be applied to the top
149 level. In this example, the note "Putative primase" will be applied to
150 the mRNA at the top level of the feature:
152 mRNA B0511.1 Chr1:1..100 Type=UTR;Note="Putative primase"
153 mRNA B0511.1 Chr1:101..200,300..400,500..800 Type=CDS
154 mRNA B0511.1 Chr1:801..1000 Type=UTR
156 =head2 Feature Groups
158 Features can be grouped so that they are rendered by the "group"
159 glyph. To start a group, create a two-column feature entry showing
160 the group type and a name for the group. Follow this with a list of
161 feature entries with a blank type. For example:
163 EST yk53c10
164 yk53c10.3 15000-15500,15700-15800
165 yk53c10.5 18892-19154
167 This example is declaring that the ESTs named yk53c10.3 and yk53c10.5
168 belong to the same group named yk53c10.
170 =head2 Comments and the #include Directive
172 Lines that begin with the # sign are treated as comments and
173 ignored. When a # sign appears within a line, everything to the right
174 of the symbol is also ignored, unless it looks like an HTML fragment or
175 an HTML color, e.g.:
177 # this is ignored
178 [Example]
179 glyph = generic # this comment is ignored
180 bgcolor = #FF0000
181 link = http://www.google.com/search?q=$name#results
183 Be careful, because the processing of # signs uses a regexp heuristic. To be safe,
184 always put a space after the # sign to make sure it is treated as a comment.
186 The special comment "#include 'filename'" acts like the C preprocessor
187 directive and will insert the comments of a named file into the
188 position at which it occurs. Relative paths will be treated relative
189 to the file in which the #include occurs. Nested #include directives
190 are allowed:
192 #include "/usr/local/share/my_directives.txt"
193 #include 'my_directives.txt'
194 #include chromosome3_features.gff3
196 You can enclose the file path in single or double quotes as shown
197 above. If there are no spaces in the filename the quotes are optional.
199 Include file processing is not very smart. Avoid creating circular
200 #include references. You have been warned!
202 =head2 Caveats
204 Note that this loader always creates denormalized features such that
205 subfeatures and their parents are stored as one big database
206 object. The GFF3 format and its loader is usually preferred for both
207 space and execution efficiency.
209 =head1 METHODS
211 =cut
214 use strict;
215 use Carp 'croak';
216 use File::Spec;
217 use Text::ParseWords 'shellwords','quotewords';
219 use base 'Bio::DB::SeqFeature::Store::Loader';
221 =head2 new
223 Title : new
224 Usage : $loader = Bio::DB::SeqFeature::Store::FeatureFileLoader->new(@options)
225 Function: create a new parser
226 Returns : a Bio::DB::SeqFeature::Store::FeatureFileLoader parser and loader
227 Args : several - see below
228 Status : public
230 This method creates a new FeatureFile loader and establishes its connection
231 with a Bio::DB::SeqFeature::Store database. Arguments are -name=E<gt>$value
232 pairs as described in this table:
234 Name Value
235 ---- -----
237 -store A writable Bio::DB::SeqFeature::Store database handle.
239 -seqfeature_class The name of the type of Bio::SeqFeatureI object to create
240 and store in the database (Bio::DB::SeqFeature by default)
242 -sf_class A shorter alias for -seqfeature_class
244 -verbose Send progress information to standard error.
246 -fast If true, activate fast loading (see below)
248 -chunk_size Set the storage chunk size for nucleotide/protein sequences
249 (default 2000 bytes)
251 -tmp Indicate a temporary directory to use when loading non-normalized
252 features.
254 When you call new(), a connection to a Bio::DB::SeqFeature::Store
255 database should already have been established and the database
256 initialized (if appropriate).
258 Some combinations of Bio::SeqFeatures and Bio::DB::SeqFeature::Store
259 databases support a fast loading mode. Currently the only reliable
260 implementation of fast loading is the combination of DBI::mysql with
261 Bio::DB::SeqFeature. The other important restriction on fast loading
262 is the requirement that a feature that contains subfeatures must occur
263 in the FeatureFile file before any of its subfeatures. Otherwise the
264 subfeatures that occurred before the parent feature will not be
265 attached to the parent correctly. This restriction does not apply to
266 normal (slow) loading.
268 If you use an unnormalized feature class, such as
269 Bio::SeqFeature::Generic, then the loader needs to create a temporary
270 database in which to cache features until all their parts and subparts
271 have been seen. This temporary databases uses the "bdb" adaptor. The
272 -tmp option specifies the directory in which that database will be
273 created. If not present, it defaults to the system default tmp
274 directory specified by File::Spec-E<gt>tmpdir().
276 The -chunk_size option allows you to tune the representation of
277 DNA/Protein sequence in the Store database. By default, sequences are
278 split into 2000 base/residue chunks and then reassembled as
279 needed. This avoids the problem of pulling a whole chromosome into
280 memory in order to fetch a short subsequence from somewhere in the
281 middle. Depending on your usage patterns, you may wish to tune this
282 parameter using a chunk size that is larger or smaller than the
283 default.
285 =cut
287 # sub new {} inherited
289 =head2 load
291 Title : load
292 Usage : $count = $loader->load(@ARGV)
293 Function: load the indicated files or filehandles
294 Returns : number of feature lines loaded
295 Args : list of files or filehandles
296 Status : public
298 Once the loader is created, invoke its load() method with a list of
299 FeatureFile or FASTA file paths or previously-opened filehandles in order to
300 load them into the database. Compressed files ending with .gz, .Z and
301 .bz2 are automatically recognized and uncompressed on the fly. Paths
302 beginning with http: or ftp: are treated as URLs and opened using the
303 LWP GET program (which must be on your path).
305 FASTA files are recognized by their initial "E<gt>" character. Do not feed
306 the loader a file that is neither FeatureFile nor FASTA; I don't know what
307 will happen, but it will probably not be what you expect.
309 =cut
311 # sub load {} inherited
313 =head2 accessors
315 The following read-only accessors return values passed or created during new():
317 store() the long-term Bio::DB::SeqFeature::Store object
319 tmp_store() the temporary Bio::DB::SeqFeature::Store object used
320 during loading
322 sfclass() the Bio::SeqFeatureI class
324 fast() whether fast loading is active
326 seq_chunk_size() the sequence chunk size
328 verbose() verbose progress messages
330 =cut
332 # sub store {} inherited
333 # sub tmp_store {} inherited
334 # sub sfclass {} inherited
335 # sub fast {} inherited
336 # sub seq_chunk_size {} inherited
337 # sub verbose {} inherited
339 =head2 default_seqfeature_class
341 $class = $loader->default_seqfeature_class
343 Return the default SeqFeatureI class (Bio::Graphics::Feature).
345 =cut
347 sub default_seqfeature_class { #override
348 my $self = shift;
349 return 'Bio::Graphics::Feature';
353 =head2 load_fh
355 $count = $loader->load_fh($filehandle)
357 Load the FeatureFile data at the other end of the filehandle and return true
358 if successful. Internally, load_fh() invokes:
360 start_load();
361 do_load($filehandle);
362 finish_load();
364 =cut
366 # sub load_fh { } inherited
368 =head2 start_load, finish_load
370 These methods are called at the start and end of a filehandle load.
372 =cut
374 sub create_load_data {
375 my $self = shift;
376 $self->SUPER::create_load_data();
377 $self->{load_data}{mode} = 'fff';
378 $self->{load_data}{CurrentGroup} = undef;
381 sub finish_load {
382 my $self = shift;
383 $self->_store_group;
384 $self->SUPER::finish_load;
387 =head2 load_line
389 $loader->load_line($data);
391 Load a line of a FeatureFile file. You must bracket this with calls to
392 start_load() and finish_load()!
394 $loader->start_load();
395 $loader->load_line($_) while <FH>;
396 $loader->finish_load();
398 =cut
400 sub load_line {
401 my $self = shift;
402 my $line = shift;
404 chomp($line);
405 return unless $line =~ /\S/; # blank line
406 my $load_data = $self->{load_data};
408 $load_data->{mode} = 'fff' if /\s/; # if it has any whitespace in
409 # it, then back to fff mode
411 if ($line =~ /^\#\s?\#\s*([\#]+)/) { ## meta instruction
412 $load_data->{mode} = 'fff';
413 $self->handle_meta($1);
415 } elsif ($line =~ /^\#/) {
416 $load_data->{mode} = 'fff'; # just to be safe
417 return; # comment
420 elsif ($line =~ /^>\s*(\S+)/) { # FASTA lines are coming
421 $load_data->{mode} = 'fasta';
422 $self->start_or_finish_sequence($1);
425 elsif ($load_data->{mode} eq 'fasta') {
426 $self->load_sequence($line);
429 elsif ($load_data->{mode} eq 'fff') {
430 $self->handle_feature($line);
431 if (++$load_data->{count} % 1000 == 0) {
432 my $now = $self->time();
433 my $nl = -t STDOUT && !$ENV{EMACS} ? "\r" : "\n";
434 $self->msg(sprintf("%d features loaded in %5.2fs...$nl",
435 $load_data->{count},$now - $load_data->{start_time}));
436 $load_data->{start_time} = $now;
440 else {
441 $self->throw("I don't know what to do with this line:\n$line");
446 =head2 handle_meta
448 $loader->handle_meta($meta_directive)
450 This method is called to handle meta-directives such as
451 ##sequence-region. The method will receive the directive with the
452 initial ## stripped off.
454 =cut
456 # sub handle_meta { } inherited
458 =head2 handle_feature
460 $loader->handle_feature($gff3_line)
462 This method is called to process a single FeatureFile line. It manipulates
463 information stored a data structure called $self-E<gt>{load_data}.
465 =cut
467 sub handle_feature {
468 my $self = shift;
469 local $_ = shift;
471 my $ld = $self->{load_data};
473 # handle reference line
474 if (/^reference\s*=\s*(.+)/) {
475 $ld->{reference} = $1;
476 return;
479 # parse data lines
480 my @tokens = quotewords('\s+',1,$_);
481 for (0..2) { # remove quotes from everything but last column
482 next unless defined $tokens[$_];
483 $tokens[$_] =~ s/^"//;
484 $tokens[$_] =~ s/"$//;
487 if (@tokens < 3) { # short line; assume a group identifier
488 $self->store_current_feature();
489 my $type = shift @tokens;
490 my $name = shift @tokens;
491 $ld->{CurrentGroup} = $self->_make_indexed_feature($name,$type,'',{_ff_group=>1});
492 $self->_indexit($name => 1);
493 return;
496 my($type,$name,$strand,$bounds,$attributes);
498 if ($tokens[2] =~ /^([+-.]|[+-]?[01])$/) { # old version
499 ($type,$name,$strand,$bounds,$attributes) = @tokens;
500 } else { # new version
501 ($type,$name,$bounds,$attributes) = @tokens;
504 # handle case of there only being one value in the last column,
505 # in which case we treat it the same as Note="value"
506 my $attr = $self->parse_attributes($attributes);
508 # @parts is an array of ([ref,start,end],[ref,start,end],...)
509 my @parts = map { [/(?:(\w+):)?(-?\d+)(?:-|\.\.)(-?\d+)/]} split /(?:,| )\s*/,$bounds;
511 # deal with groups -- a group is ending if $type is defined
512 # and CurrentGroup is set
513 if ($type && $ld->{CurrentGroup}) {
514 $self->_store_group();
517 $type = '' unless defined $type;
518 $name = '' unless defined $name;
519 $type ||= $ld->{CurrentGroup}->primary_tag if $ld->{CurrentGroup};
521 my $reference = $ld->{reference} || 'ChrUN';
522 foreach (@parts) {
523 if (defined $_ && ref($_) eq 'ARRAY'
524 && defined $_->[1]
525 && defined $_->[2])
527 $strand ||= $_->[1] <= $_->[2] ? '+' : '-';
528 ($_->[1],$_->[2]) = ($_->[2],$_->[1]) if $_->[1] > $_->[2];
530 $reference = $_->[0] if defined $_->[0];
531 $_ = [@{$_}[1,2]]; # strip off the reference.
534 # now @parts is an array of [start,end] and $reference contains the seqid
536 # apply coordinate mapper
537 if ($self->{coordinate_mapper} && $reference) {
538 my @remapped = $self->{coordinate_mapper}->($reference,@parts);
539 ($reference,@parts) = @remapped if @remapped;
542 # either create a new feature or add a segment to it
543 my $feature = $ld->{CurrentFeature};
545 $ld->{OldPartType} = $ld->{PartType};
546 if (exists $attr->{Type} || exists $attr->{type}) {
547 $ld->{PartType} = $attr->{Type}[0] || $attr->{type}[0];
548 } else {
549 $ld->{PartType} = $type;
552 if ($feature) {
553 local $^W = 0; # avoid uninit warning when display_name() is called
555 # if this is a different feature from what we have now, then we
556 # store the current one, and create a new one
557 if ($feature->display_name ne $name ||
558 $feature->method ne $type) {
559 $self->store_current_feature; # new feature, store old one
560 undef $feature;
561 } else { # create a new multipart feature
562 $self->_multilevel_feature($feature,$ld->{OldPartType})
563 unless $feature->get_SeqFeatures;
564 my $part = $self->_make_feature($name,
565 $ld->{PartType},
566 $strand,
567 $attr,
568 $reference,
569 @{$parts[0]});
570 $feature->add_SeqFeature($part);
574 $feature ||= $self->_make_indexed_feature($name,
575 $type, # side effect is to set CurrentFeature
576 $strand,
577 $attr,
578 $reference,
579 @{$parts[0]});
581 # add more segments to the current feature
582 if (@parts > 1) {
583 for my $part (@parts) {
584 $type ||= $feature->primary_tag;
585 my $sp = $self->_make_feature($name,
586 $ld->{PartType},
587 $strand,
588 $attr,
589 $reference,
590 @{$part});
591 $feature->add_SeqFeature($sp);
596 sub _multilevel_feature { # turn a single-level feature into a multilevel one
597 my $self = shift;
598 my $f = shift;
599 my $type = shift;
600 my %attributes = $f->attributes;
601 $attributes{Score} = [$f->score] if defined $f->score;
602 $attributes{Phase} = [$f->phase] if defined $f->phase;
603 my @args = ($f->display_name,
604 $type||$f->type,
605 $f->strand,
606 \%attributes,
607 $f->seq_id,
608 $f->start,
609 $f->end);
610 my $subpart = $self->_make_feature(@args);
611 $f->add_SeqFeature($subpart);
614 sub _make_indexed_feature {
615 my $self = shift;
616 my $f = $self->_make_feature(@_);
617 my $name = $f->display_name;
618 $self->{load_data}{CurrentFeature} = $f;
619 $self->{load_data}{CurrentID} = $name;
620 $self->_indexit($name => 1);
621 return $f;
624 sub _make_feature {
625 my $self = shift;
626 my ($name,$type,$strand,$attributes,$ref,$start,$end) = @_;
628 # some basic error checking
629 $self->throw("syntax error at line $.: '$_'")
630 if ($ref && !defined $start)
631 or ($ref && !defined $end)
632 or ($start && $start !~ /^[-\d]+$/)
633 or ($end && $end !~ /^[-\d]+$/)
634 or !defined $type
635 or !defined $name;
637 $strand ||= '';
639 my @args = (-name => $name,
640 -strand => $strand eq '+' ? 1
641 :$strand eq '-' ? -1
642 :$strand eq '' ? 0
643 :$strand eq '.' ? 0
644 :$strand == 1 ? 1
645 :$strand == -1 ? -1
647 -attributes => $attributes,
650 if (my ($method,$source) = $type =~ /(\S+):(\S+)/) {
651 push @args,(-primary_tag => $method,
652 -source => $source);
653 } else {
654 push @args,(-primary_tag => $type);
657 push @args,(-seq_id => $ref) if defined $ref;
658 push @args,(-start => $start) if defined $start;
659 push @args,(-end => $end) if defined $end;
661 # pull out special attributes
662 if (my $score = $attributes->{Score} || $attributes->{score}) {
663 push @args,(-score => $score->[0]);
664 delete $attributes->{$_} foreach qw(Score score);
667 if (my $note = $attributes->{Note} || $attributes->{note}) {
668 push @args,(-desc => join '; ',@$note);
669 delete $attributes->{$_} foreach qw(Note note);
672 if (my $url = $attributes->{url} || $attributes->{Url}) {
673 push @args,(-url => $url->[0]);
674 delete $attributes->{$_} foreach qw (Url url);
677 if (my $phase = $attributes->{phase} || $attributes->{Phase}) {
678 push @args,(-phase => $phase->[0]);
679 delete $attributes->{$_} foreach qw (Phase phase);
682 $self->_indexit($name=>1)
683 if $self->index_subfeatures && $name;
685 return $self->sfclass->new(@args);
688 =head2 store_current_feature
690 $loader->store_current_feature()
692 This method is called to store the currently active feature in the
693 database. It uses a data structure stored in $self-E<gt>{load_data}.
695 =cut
697 sub store_current_feature { # overridden
698 my $self = shift;
700 # handle open groups
701 # if there is an open group, then we simply add the current
702 # feature to the group.
703 my $ld = $self->{load_data};
704 if ($ld->{CurrentGroup} && $ld->{CurrentFeature}) {
705 $ld->{CurrentGroup}->add_SeqFeature($ld->{CurrentFeature})
706 unless $ld->{CurrentGroup} eq $ld->{CurrentFeature}; # paranoia - shouldn't happen
707 return;
709 else {
710 $self->SUPER::store_current_feature();
714 sub _store_group {
715 my $self = shift;
716 my $ld = $self->{load_data};
717 my $group = $ld->{CurrentGroup} or return;
718 # if there is an unattached feature, then add it
719 $self->store_current_feature() if $ld->{CurrentFeature};
720 $ld->{CurrentFeature} = $group;
721 $ld->{CurrentID} = $group->display_name;
722 $self->_indexit($ld->{CurrentID} => 1);
723 undef $ld->{CurrentGroup};
724 $self->store_current_feature();
727 =head2 build_object_tree
729 $loader->build_object_tree()
731 This method gathers together features and subfeatures and builds the
732 graph that connects them.
734 =cut
737 # put objects together
739 sub build_object_tree {
740 croak "We shouldn't be building an object tree in the FeatureFileLoader";
743 =head2 build_object_tree_in_tables
745 $loader->build_object_tree_in_tables()
747 This method gathers together features and subfeatures and builds the
748 graph that connects them, assuming that parent/child relationships
749 will be stored in a database table.
751 =cut
753 sub build_object_tree_in_tables {
754 croak "We shouldn't be building an object tree in the FeatureFileLoader";
757 =head2 build_object_tree_in_features
759 $loader->build_object_tree_in_features()
761 This method gathers together features and subfeatures and builds the
762 graph that connects them, assuming that parent/child relationships are
763 stored in the seqfeature objects themselves.
765 =cut
767 sub build_object_tree_in_features {
768 croak "We shouldn't be building an object tree in the FeatureFileLoader";
771 =head2 attach_children
773 $loader->attach_children($store,$load_data,$load_id,$feature)
775 This recursively adds children to features and their subfeatures. It
776 is called when subfeatures are directly contained within other
777 features, rather than stored in a relational table.
779 =cut
781 sub attach_children {
782 croak "We shouldn't be attaching children in the
783 FeatureFileLoader!";
786 =head2 parse_attributes
788 @attributes = $loader->parse_attributes($attribute_line)
790 This method parses the information contained in the $attribute_line
791 into a flattened hash (array). It may return one element, in which case it is
792 an implicit
794 =cut
796 sub parse_attributes {
797 my $self = shift;
798 my $att = shift;
800 $att ||= ''; # to prevent uninit variable warnings from quotewords()
802 my @pairs = quotewords('[;\s]',1,$att);
803 my %attributes;
804 for my $pair (@pairs) {
805 unless ($pair =~ /=/) {
806 push @{$attributes{Note}},(quotewords('',0,$pair))[0] || $pair;
807 } else {
808 my ($tag,$value) = quotewords('\s*=\s*',0,$pair);
809 $tag = 'Note' if $tag eq 'description';
810 push @{$attributes{$tag}},$value;
813 return \%attributes;
816 =head2 start_or_finish_sequence
818 $loader->start_or_finish_sequence('Chr9')
820 This method is called at the beginning and end of a fasta section.
822 =cut
827 __END__
830 =head1 BUGS
832 This is an early version, so there are certainly some bugs. Please
833 use the BioPerl bug tracking system to report bugs.
835 =head1 SEE ALSO
837 L<Bio::DB::SeqFeature::Store>,
838 L<Bio::DB::SeqFeature::Segment>,
839 L<Bio::DB::SeqFeature::NormalizedFeature>,
840 L<Bio::DB::SeqFeature::GFF3Loader>,
841 L<Bio::DB::SeqFeature::Store::DBI::mysql>,
842 L<Bio::DB::SeqFeature::Store::bdb>
844 =head1 AUTHOR
846 Lincoln Stein E<lt>lstein@cshl.orgE<gt>.
848 Copyright (c) 2006 Cold Spring Harbor Laboratory.
850 This library is free software; you can redistribute it and/or modify
851 it under the same terms as Perl itself.
853 =cut