lib/Bio/SeqIO/genbank.pm

   1 #
   2 # BioPerl module for Bio::SeqIO::genbank
   3 #
   4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
   5 #
   6 # Cared for by Bioperl project bioperl-l(at)bioperl.org
   7 #
   8 # Copyright Elia Stupka and contributors see AUTHORS section
   9 #
  10 # You may distribute this module under the same terms as perl itself
  11
  12 # POD documentation - main docs before the code
  13
  14 =head1 NAME
  15
  16 Bio::SeqIO::genbank - GenBank sequence input/output stream
  17
  18 =head1 SYNOPSIS
  19
  20 It is probably best not to use this object directly, but
  21 rather go through the SeqIO handler:
  22
  23     $stream = Bio::SeqIO->new(-file   => $filename,
  24                               -format => 'GenBank');
  25
  26     while ( my $seq = $stream->next_seq ) {
  27         # do something with $seq
  28     }
  29
  30
  31 =head1 DESCRIPTION
  32
  33 This object can transform Bio::Seq objects to and from GenBank flat
  34 file databases.
  35
  36 There is some flexibility here about how to write GenBank output
  37 that is not fully documented.
  38
  39 =head2 Optional functions
  40
  41 =over 3
  42
  43 =item _show_dna()
  44
  45 (output only) shows the dna or not
  46
  47 =item _post_sort()
  48
  49 (output only) provides a sorting func which is applied to the FTHelpers
  50 before printing
  51
  52 =item _id_generation_func()
  53
  54 This is function which is called as
  55
  56    print "ID   ", $func($seq), "\n";
  57
  58 To generate the ID line. If it is not there, it generates a sensible ID
  59 line using a number of tools.
  60
  61 If you want to output annotations in Genbank format they need to be
  62 stored in a Bio::Annotation::Collection object which is accessible
  63 through the Bio::SeqI interface method L<annotation()|annotation>.
  64
  65 The following are the names of the keys which are pulled from a
  66 L<Bio::Annotation::Collection> object:
  67
  68  reference       - Should contain Bio::Annotation::Reference objects
  69  comment         - Should contain Bio::Annotation::Comment objects
  70  dblink          - Should contain a Bio::Annotation::DBLink object
  71  segment         - Should contain a Bio::Annotation::SimpleValue object
  72  origin          - Should contain a Bio::Annotation::SimpleValue object
  73  wgs             - Should contain a Bio::Annotation::SimpleValue object
  74
  75 =back
  76
  77 =head1 Where does the data go?
  78
  79 Data parsed in Bio::SeqIO::genbank is stored in a variety of data
  80 fields in the sequence object that is returned. Here is a partial list
  81 of fields.
  82
  83 Items listed as RichSeq or Seq or PrimarySeq and then NAME() tell you
  84 the top level object which defines a function called NAME() which
  85 stores this information.
  86
  87 Items listed as Annotation 'NAME' tell you the data is stored the
  88 associated Bio::AnnotationCollectionI object which is associated with
  89 Bio::Seq objects.  If it is explicitly requested that no annotations
  90 should be stored when parsing a record of course they will not be
  91 available when you try and get them.  If you are having this problem
  92 look at the type of SeqBuilder that is being used to construct your
  93 sequence object.
  94
  95  Comments             Annotation 'comment'
  96  References           Annotation 'reference'
  97  Segment              Annotation 'segment'
  98  Origin               Annotation 'origin'
  99  Dbsource             Annotation 'dblink'
 100
 101  Accessions           PrimarySeq accession_number()
 102  Secondary accessions RichSeq get_secondary_accessions()
 103  GI number            PrimarySeq primary_id()
 104  LOCUS                PrimarySeq display_id()
 105  Keywords             RichSeq get_keywords()
 106  Dates                RichSeq get_dates()
 107  Molecule             RichSeq molecule()
 108  Seq Version          RichSeq seq_version()
 109  PID                  RichSeq pid()
 110  Division             RichSeq division()
 111  Features             Seq get_SeqFeatures()
 112  Alphabet             PrimarySeq alphabet()
 113  Definition           PrimarySeq description() or desc()
 114  Version              PrimarySeq version()
 115
 116  Sequence             PrimarySeq seq()
 117
 118 There is more information in the Feature-Annotation HOWTO about each
 119 field and how it is mapped to the Sequence object
 120 L<http://bioperl.org/howtos/Features_and_Annotations_HOWTO.html>.
 121
 122 =head1 FEEDBACK
 123
 124 =head2 Mailing Lists
 125
 126 User feedback is an integral part of the evolution of this and other
 127 Bioperl modules. Send your comments and suggestions preferably to one
 128 of the Bioperl mailing lists.  Your participation is much appreciated.
 129
 130   bioperl-l@bioperl.org                  - General discussion
 131   http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
 132
 133 =head2 Support
 134
 135 Please direct usage questions or support issues to the mailing list:
 136
 137 I<bioperl-l@bioperl.org>
 138
 139 rather than to the module maintainer directly. Many experienced and
 140 reponsive experts will be able look at the problem and quickly
 141 address it. Please include a thorough description of the problem
 142 with code and data examples if at all possible.
 143
 144 =head2 Reporting Bugs
 145
 146 Report bugs to the Bioperl bug tracking system to help us keep track
 147 the bugs and their resolution. Bug reports can be submitted via the web:
 148
 149   https://github.com/bioperl/bioperl-live/issues
 150
 151 =head1 AUTHOR - Bioperl Project
 152
 153 bioperl-l at bioperl.org
 154
 155 Original author Elia Stupka, elia -at- tigem.it
 156
 157 =head1 CONTRIBUTORS
 158
 159 Ewan Birney birney at ebi.ac.uk
 160 Jason Stajich jason at bioperl.org
 161 Chris Mungall cjm at fruitfly.bdgp.berkeley.edu
 162 Lincoln Stein lstein at cshl.org
 163 Heikki Lehvaslaiho, heikki at ebi.ac.uk
 164 Hilmar Lapp, hlapp at gmx.net
 165 Donald G. Jackson, donald.jackson at bms.com
 166 James Wasmuth, james.wasmuth at ed.ac.uk
 167 Brian Osborne, bosborne at alum.mit.edu
 168 Chris Fields, cjfields at bioperl dot org
 169
 170 =head1 APPENDIX
 171
 172 The rest of the documentation details each of the object
 173 methods. Internal methods are usually preceded with a _
 174
 175 =cut
 176
 177 # Let the code begin...
 178
 179 package Bio::SeqIO::genbank;
 180
 181 use strict;
 182
 183 use Bio::SeqIO::FTHelper;
 184 use Bio::SeqFeature::Generic;
 185 use Bio::Species;
 186 use Bio::Seq::SeqFactory;
 187 use Bio::Annotation::Collection;
 188 use Bio::Annotation::Comment;
 189 use Bio::Annotation::Reference;
 190 use Bio::Annotation::DBLink;
 191
 192 use base qw(Bio::SeqIO);
 193
 194 our $FTQUAL_LINE_LENGTH = 60;
 195
 196 our %FTQUAL_NO_QUOTE = map {$_ => 1} qw(
 197     anticodon           citation
 198     codon               codon_start
 199     compare             cons_splice
 200     direction           estimated_length
 201     evidence            label
 202     mod_base            number
 203     rpt_type            rpt_unit
 204     rpt_unit_range      tag_peptide
 205     transl_except       transl_table
 206     usedin
 207     );
 208
 209 our %DBSOURCE = map {$_ => 1} qw(
 210     EchoBASE     IntAct    SWISS-2DPAGE    ECO2DBASE    ECOGENE    TIGRFAMs
 211     TIGR    GO    InterPro    Pfam    PROSITE    SGD    GermOnline
 212     HSSP    PhosSite    Ensembl    RGD    AGD    ArrayExpress    KEGG
 213     H-InvDB    HGNC    LinkHub    PANTHER    PRINTS    SMART    SMR
 214     MGI    MIM    RZPD-ProtExp    ProDom    MEROPS    TRANSFAC    Reactome
 215     UniGene    GlycoSuiteDB    PIRSF    HSC-2DPAGE    PHCI-2DPAGE
 216     PMMA-2DPAGE    Siena-2DPAGE    Rat-heart-2DPAGE    Aarhus/Ghent-2DPAGE
 217     Biocyc    MetaCyc    Biocyc:Metacyc    GenomeReviews    FlyBase
 218     TMHOBP    COMPLUYEAST-2DPAGE    OGP    DictyBase    HAMAP
 219     PhotoList    Gramene    WormBase    WormPep    Genew    ZFIN
 220     PeroxiBase    MaizeDB    TAIR    DrugBank    REBASE    HPA
 221     swissprot    GenBank    GenPept    REFSEQ    embl    PDB    UniProtKB
 222     DIP    PeptideAtlas    PRIDE    CYGD    HOGENOME    Gene3D
 223     Project);
 224
 225 our %VALID_MOLTYPE = map {$_ => 1} qw(NA DNA RNA tRNA rRNA cDNA cRNA ms-DNA
 226     mRNA  uRNA  ss-RNA  ss-DNA  snRNA snoRNA PRT);
 227
 228 our %VALID_ALPHABET = (
 229     'bp' => 'dna',
 230     'aa' => 'protein',
 231     'rc' => '' # rc = release candidate; file has no sequences
 232 );
 233
 234 sub _initialize {
 235     my($self, @args) = @_;
 236
 237     $self->SUPER::_initialize(@args);
 238     # hash for functions for decoding keys.
 239     $self->{'_func_ftunit_hash'} = {};
 240     $self->_show_dna(1); # sets this to one by default. People can change it
 241     if ( not defined $self->sequence_factory ) {
 242         $self->sequence_factory
 243             (Bio::Seq::SeqFactory->new(-verbose => $self->verbose,
 244                                        -type    => 'Bio::Seq::RichSeq'));
 245     }
 246 }
 247
 248 =head2 next_seq
 249
 250  Title   : next_seq
 251  Usage   : $seq = $stream->next_seq()
 252  Function: returns the next sequence in the stream
 253  Returns : Bio::Seq object
 254  Args    :
 255
 256 =cut
 257
 258 sub next_seq {
 259     my ($self, @args) = @_;
 260     my %args    = @args;
 261     my $builder = $self->sequence_builder;
 262     my $seq;
 263     my %params;
 264
 265   RECORDSTART:
 266     while (1) {
 267         my $buffer;
 268         my ( @acc,        @features );
 269         my ( $display_id, $annotation );
 270         my $species;
 271
 272         # initialize; we may come here because of starting over
 273         @features   = ();
 274         $annotation = undef;
 275         @acc        = ();
 276         $species    = undef;
 277         %params     = ( -verbose => $self->verbose );    # reset hash
 278         local ($/) = "\n";
 279         while ( defined( $buffer = $self->_readline ) ) {
 280             last if index( $buffer, 'LOCUS       ' ) == 0;
 281         }
 282         return unless defined $buffer;                   # end of file
 283         $buffer =~ /^LOCUS\s+(\S.*)$/o
 284             or $self->throw(  "GenBank stream with bad LOCUS line. "
 285                             . "Not GenBank in my book. Got '$buffer'");
 286         my @tokens = split( ' ', $1 );
 287
 288         # this is important to have the id for display in e.g. FTHelper,
 289         # otherwise you won't know which entry caused an error
 290         $display_id = shift @tokens;
 291         $params{'-display_id'} = $display_id;
 292
 293         # may still be useful if we don't want the seq
 294         my $seqlength = shift @tokens;
 295         if ( exists $VALID_ALPHABET{$seqlength} ) {
 296             # moved one token too far.  No locus name?
 297             $self->warn(  "Bad LOCUS name?  Changing [$params{'-display_id'}] "
 298                         . "to 'unknown' and length to '$display_id'"
 299             );
 300             $params{'-display_id'} = 'unknown';
 301             $params{'-length'}     = $display_id;
 302
 303             # add token back...
 304             unshift @tokens, $seqlength;
 305         }
 306         else {
 307             $params{'-length'} = $seqlength;
 308         }
 309
 310         # the alphabet of the entry
 311         # shouldn't assign alphabet unless one
 312         # is specifically designated (such as for rc files)
 313         my $alphabet = lc( shift @tokens );
 314         $params{'-alphabet'} =
 315           ( exists $VALID_ALPHABET{$alphabet} )
 316           ? $VALID_ALPHABET{$alphabet}
 317           : $self->warn("Unknown alphabet: $alphabet");
 318
 319         # for aa there is usually no 'molecule' (mRNA etc)
 320         if ( $params{'-alphabet'} eq 'protein' ) {
 321             $params{'-molecule'} = 'PRT';
 322         }
 323         else {
 324             $params{'-molecule'} = shift(@tokens);
 325         }
 326
 327         # take care of lower case issues
 328         if ( $params{'-molecule'} eq 'dna' or $params{'-molecule'} eq 'rna' ) {
 329             $params{'-molecule'} = uc $params{'-molecule'};
 330         }
 331         $self->debug( "Unrecognized molecule type: " . $params{'-molecule'} )
 332             if not exists( $VALID_MOLTYPE{ $params{'-molecule'} } );
 333
 334         my $circ = shift @tokens;
 335         if ( $circ eq 'circular' ) {
 336             $params{'-is_circular'} = 1;
 337             $params{'-division'}    = shift @tokens;
 338         }
 339         else {
 340             # 'linear' or 'circular' may actually be omitted altogether
 341             $params{'-division'} =
 342                 ( CORE::length($circ) == 3 ) ? $circ : shift @tokens;
 343         }
 344         my $date = join( ' ', @tokens );    # we lump together the rest
 345
 346         # this is per request bug #1513
 347         # we can handle:
 348         # 9-10-2003
 349         # 9-10-03
 350         # 09-10-2003
 351         # 09-10-03
 352         if ( $date =~ s/\s*((\d{1,2})-(\w{3})-(\d{2,4})).*/$1/ ) {
 353             if ( length($date) < 11 ) {
 354                 # improperly formatted date
 355                 # But we'll be nice and fix it for them
 356                 my ( $d, $m, $y ) = ( $2, $3, $4 );
 357                 if ( length($d) == 1 ) {
 358                     $d = "0$d";
 359                 }
 360
 361                 # guess the century here
 362                 if ( length($y) == 2 ) {
 363                     if ( $y > 60 ) {    # arbitrarily guess that '60' means 1960
 364                         $y = "19$y";
 365                     }
 366                     else {
 367                         $y = "20$y";
 368                     }
 369                     $self->warn(  "Date was malformed, guessing the "
 370                                 . "century for $date to be $y\n"
 371                     );
 372                 }
 373                 $params{'-dates'} = [ join( '-', $d, $m, $y ) ];
 374             }
 375             else {
 376                 $params{'-dates'} = [$date];
 377             }
 378         }
 379
 380         # set them all at once
 381         $builder->add_slot_value(%params);
 382         %params = ();
 383
 384         # parse the rest if desired, otherwise start over
 385         if ( not $builder->want_object ) {
 386             $builder->make_object;
 387             next RECORDSTART;
 388         }
 389
 390         # set up annotation depending on what the builder wants
 391         if ( $builder->want_slot('annotation') ) {
 392             $annotation = Bio::Annotation::Collection->new;
 393         }
 394
 395         $buffer = $self->_readline;
 396         while ( defined( my $line = $buffer ) ) {
 397             # Description line(s)
 398             if ($line =~ /^DEFINITION\s+(\S.*\S)/) {
 399                 my @desc = ($1);
 400                 while ( defined( $line = $self->_readline ) ) {
 401                     if ($line =~ /^\s+(.*)/) {
 402                         push( @desc, $1 );
 403                         next;
 404                     }
 405                     last;
 406                 }
 407                 $builder->add_slot_value( -desc => join( ' ', @desc ) );
 408
 409                 # we'll continue right here because DEFINITION
 410                 # always comes at the top of the entry
 411                 $buffer = $line;
 412             }
 413
 414             # accession number (there can be multiple accessions)
 415             if ($line =~ /^ACCESSION\s+(\S.*\S)/) {
 416                 push( @acc, split( /\s+/, $1 ) );
 417                 while ( defined( $line = $self->_readline ) ) {
 418                     if ($line =~ /^\s+(.*)/) {
 419                         push( @acc, split( /\s+/, $1 ) );
 420                         next;
 421                     }
 422                     last;
 423                 }
 424                 $buffer = $line;
 425                 next;
 426             }
 427
 428             # PID
 429             elsif ($line =~ /^PID\s+(\S+)/) {
 430                 $params{'-pid'} = $1;
 431             }
 432
 433             # Version number
 434             elsif ($line =~ /^VERSION\s+(\S.+)$/) {
 435                 my ( $acc, $gi ) = split( ' ', $1 );
 436                 if ( $acc =~ /^\w+\.(\d+)/ ) {
 437                     $params{'-version'}     = $1;
 438                     $params{'-seq_version'} = $1;
 439                 }
 440                 if ( $gi && ( index( $gi, "GI:" ) == 0 ) ) {
 441                     $params{'-primary_id'} = substr( $gi, 3 );
 442                 }
 443             }
 444
 445             # Keywords
 446             elsif ($line =~ /^KEYWORDS\s+(\S.*)/) {
 447                 my @kw = split( /\s*\;\s*/, $1 );
 448                 while ( defined( $line = $self->_readline ) ) {
 449                     chomp $line;
 450                     if ($line =~ /^\s+(.*)/) {
 451                         push( @kw, split( /\s*\;\s*/, $1 ) );
 452                         next;
 453                     }
 454                     last;
 455                 }
 456
 457                 @kw && $kw[-1] =~ s/\.$//;
 458                 $params{'-keywords'} = \@kw;
 459                 $buffer = $line;
 460                 next;
 461             }
 462
 463             # Organism name and phylogenetic information
 464             elsif ($line =~ /^SOURCE\s+\S/) {
 465                 if ( $builder->want_slot('species') ) {
 466                     $species = $self->_read_GenBank_Species( \$buffer );
 467                     $builder->add_slot_value( -species => $species );
 468                 }
 469                 else {
 470                     while ( defined( $buffer = $self->_readline ) ) {
 471                         last if substr( $buffer, 0, 1 ) ne ' ';
 472                     }
 473                 }
 474                 next;
 475             }
 476
 477             # References
 478             elsif ($line =~ /^REFERENCE\s+\S/) {
 479                 if ($annotation) {
 480                     my @refs = $self->_read_GenBank_References( \$buffer );
 481                     foreach my $ref (@refs) {
 482                         $annotation->add_Annotation( 'reference', $ref );
 483                     }
 484                 }
 485                 else {
 486                     while ( defined( $buffer = $self->_readline ) ) {
 487                         last if substr( $buffer, 0, 1 ) ne ' ';
 488                     }
 489                 }
 490                 next;
 491             }
 492
 493             # Project
 494             elsif ($line =~ /^PROJECT\s+(\S.*)/) {
 495                 if ($annotation) {
 496                     my $project =
 497                         Bio::Annotation::SimpleValue->new( -value => $1 );
 498                     $annotation->add_Annotation( 'project', $project );
 499                 }
 500             }
 501
 502             # Comments may be plain text or Structured Comments.
 503             # Structured Comments are made up of tag/value pairs and have beginning
 504             # and end delimiters like ##*-Data-START## and ##*-Data-END##
 505             elsif ($line =~ /^COMMENT\s+(\S.*)/) {
 506                 if ($annotation) {
 507                     my $comment = $1;
 508                     while ( defined( $line = $self->_readline ) ) {
 509                         last if ($line =~ /^\S/);
 510                         $comment .= $line;
 511                     }
 512                     $comment =~ s/  +/ /g;
 513                     # Structured Comment, do not remove returns in the tabular section
 514                     if ( my ( $text, $table )= $comment
 515                         =~ /([^#]*)(##\S+Data-START##.+?##\S+Data-END##)/is
 516                         ) {
 517                         $text    =~ s/\n/ /g if $text;
 518                         $table   =~ s/START##/START##\n/;
 519                         $table   =~ s/^\s+//gm;
 520                         $comment = $text . "\n" . $table;
 521                     }
 522                     # Plain text, remove returns
 523                     else {
 524                         $comment =~ s/\n/ /g;
 525                     }
 526                     $annotation->add_Annotation(
 527                         'comment',
 528                         Bio::Annotation::Comment->new(
 529                             -text    => $comment,
 530                             -tagname => 'comment'
 531                         )
 532                     );
 533                     $buffer = $line;
 534                 }
 535                 else {
 536                     while ( defined( $buffer = $self->_readline ) ) {
 537                         last if substr( $buffer, 0, 1 ) ne ' ';
 538                     }
 539                 }
 540                 next;
 541             }
 542
 543             # Corresponding Genbank nucleotide id, Genpept only
 544             elsif ($line =~ /^DB(?:SOURCE|LINK)\s+(\S.+)/) {
 545                 if ($annotation) {
 546                     my $dbsource = $1;
 547                     while ( defined( $line = $self->_readline ) ) {
 548                         last if ($line =~ /^\S/);
 549                         $dbsource .= $line;
 550                     }
 551
 552                     # deal with UniProKB dbsources
 553                     if ( $dbsource =~
 554                         s/(UniProt(?:KB)?|swissprot):\s+locus\s+(\S+)\,.+\n//
 555                         ) {
 556                         $annotation->add_Annotation(
 557                             'dblink',
 558                             Bio::Annotation::DBLink->new(
 559                                 -primary_id => $2,
 560                                 -database   => $1,
 561                                 -tagname    => 'dblink'
 562                             )
 563                         );
 564                         if ( $dbsource =~ s/\s+created:\s+([^\.]+)\.\n// ) {
 565                             $annotation->add_Annotation(
 566                                 'swissprot_dates',
 567                                 Bio::Annotation::SimpleValue->new(
 568                                     -tagname => 'date_created',
 569                                     -value   => $1
 570                                 )
 571                             );
 572                         }
 573                         while ( $dbsource =~
 574                                s/\s+(sequence|annotation)\s+updated:\s+([^\.]+)\.\n//g
 575                             ) {
 576                             $annotation->add_Annotation(
 577                                 'swissprot_dates',
 578                                 Bio::Annotation::SimpleValue->new(
 579                                     -tagname => 'date_updated',
 580                                     -value   => $2
 581                                 )
 582                             );
 583                         }
 584                         $dbsource =~ s/\n/ /g;
 585                         if ( $dbsource =~
 586                             s/\s+xrefs:\s+((?:\S+,\s+)+\S+)\s+xrefs/xrefs/
 587                             ) {
 588                             # will use $i to determine even or odd
 589                             # for swissprot the accessions are paired
 590                             my $i = 0;
 591                             for my $dbsrc ( split( /,\s+/, $1 ) ) {
 592                                 if (   $dbsrc =~ /(\S+)\.(\d+)/
 593                                     or $dbsrc =~ /(\S+)/
 594                                     ) {
 595                                     my ( $id, $version ) = ( $1, $2 );
 596                                     $version = '' unless defined $version;
 597                                     my $db = ( $id =~ /^\d\S{3}/ ) ? 'PDB'
 598                                            : ( $i++ % 2 )          ? 'GenPept'
 599                                            : 'GenBank';
 600
 601                                     $annotation->add_Annotation(
 602                                         'dblink',
 603                                         Bio::Annotation::DBLink->new(
 604                                             -primary_id => $id,
 605                                             -version    => $version,
 606                                             -database   => $db,
 607                                             -tagname    => 'dblink'
 608                                         )
 609                                     );
 610                                 }
 611                             }
 612                         }
 613                         elsif ( $dbsource =~ s/\s+xrefs:\s+(.+)\s+xrefs/xrefs/i ) {
 614                             # download screwed up and ncbi didn't put acc in for gi numbers
 615                             my $i = 0;
 616                             for my $id ( split( /\,\s+/, $1 ) ) {
 617                                 my ( $acc, $db );
 618                                 if ( $id =~ /gi:\s+(\d+)/ ) {
 619                                     $acc = $1;
 620                                     $db = ( $i++ % 2 ) ? 'GenPept' : 'GenBank';
 621                                 }
 622                                 elsif ( $id =~ /pdb\s+accession\s+(\S+)/ ) {
 623                                     $acc = $1;
 624                                     $db  = 'PDB';
 625                                 }
 626                                 else {
 627                                     $acc = $id;
 628                                     $db  = '';
 629                                 }
 630                                 $annotation->add_Annotation(
 631                                     'dblink',
 632                                     Bio::Annotation::DBLink->new(
 633                                         -primary_id => $acc,
 634                                         -database   => $db,
 635                                         -tagname    => 'dblink'
 636                                     )
 637                                 );
 638                             }
 639                         }
 640                         else {
 641                             $self->debug("Cannot match $dbsource\n");
 642                         }
 643                         if ( $dbsource =~ s/xrefs\s+
 644                                             \(non\-sequence\s+databases\):\s+
 645                                             ((?:\S+,\s+)+\S+)//x
 646                             ) {
 647                             for my $id ( split( /\,\s+/, $1 ) ) {
 648                                 my $db;
 649
 650                                 # this is because GenBank dropped the spaces!!!
 651                                 # I'm sure we're not going to get this right
 652                                 ##if ( $id =~ s/^://i ) {
 653                                 ##    $db = $1;
 654                                 ##}
 655                                 $db = substr( $id, 0, index( $id, ':' ) );
 656                                 if ( not exists $DBSOURCE{$db} ) {
 657                                     $db = '';    # do we want 'GenBank' here?
 658                                 }
 659                                 $id = substr( $id, index( $id, ':' ) + 1 );
 660                                 $annotation->add_Annotation(
 661                                     'dblink',
 662                                     Bio::Annotation::DBLink->new(
 663                                         -primary_id => $id,
 664                                         -database   => $db,
 665                                         -tagname    => 'dblink'
 666                                     )
 667                                 );
 668                             }
 669                         }
 670                     }
 671                     else {
 672                         if ( $dbsource =~
 673                             /^(\S*?):?\s*accession\s+(\S+)\.(\d+)/
 674                             ) {
 675                             my ( $db, $id, $version ) = ( $1, $2, $3 );
 676                             $annotation->add_Annotation(
 677                                 'dblink',
 678                                 Bio::Annotation::DBLink->new(
 679                                     -primary_id => $id,
 680                                     -version    => $version,
 681                                     -database   => $db || 'GenBank',
 682                                     -tagname    => 'dblink'
 683                                 )
 684                             );
 685                         }
 686                         elsif ( $dbsource =~ /^(\S*?):?\s*accession\s+(\S+)/ ) {
 687                             my ( $db, $id ) = ( $1, $2 );
 688                             $annotation->add_Annotation(
 689                                 'dblink',
 690                                 Bio::Annotation::DBLink->new(
 691                                     -primary_id => $id,
 692                                     -database   => $db || 'GenBank',
 693                                     -tagname    => 'dblink'
 694                                 )
 695                             );
 696                         }
 697                         elsif ( $dbsource =~ /(\S+)([\.:])\s*(\S+)/ ) {
 698                             my ( $db, $version );
 699                             my @ids = ();
 700                             if ( $2 eq ':' ) {
 701                                 $db = $1;
 702
 703                                 # Genbank 192 release notes say this: "The second
 704                                 # field can consist of multiple comma-separated
 705                                 # identifiers, if a sequence record has multiple
 706                                 # DBLINK cross-references of a given type."
 707                                 # For example: DBLINK      Project:100,200,300"
 708                                 @ids = split( /,/, $3 );
 709                             }
 710                             else {
 711                                 ( $db, $version ) = ( 'GenBank', $3 );
 712                                 $ids[0] = $1;
 713                             }
 714
 715                             foreach my $id (@ids) {
 716                                 $annotation->add_Annotation(
 717                                     'dblink',
 718                                     Bio::Annotation::DBLink->new(
 719                                         -primary_id => $id,
 720                                         -version    => $version,
 721                                         -database   => $db,
 722                                         -tagname    => 'dblink'
 723                                     )
 724                                 );
 725                             }
 726                         }
 727                         else {
 728                             $self->warn(
 729                                 "Unrecognized DBSOURCE data: $dbsource\n");
 730                         }
 731                     }
 732                     $buffer = $line;
 733                 }
 734                 else {
 735                     while ( defined( $buffer = $self->_readline ) ) {
 736                         last if substr( $buffer, 0, 1 ) ne ' ';
 737                     }
 738                 }
 739                 next;
 740             }
 741
 742             # Exit at start of Feature table, or start of sequence
 743             if ($line =~ /^(FEATURES|ORIGIN)/) {
 744                 my $trap;
 745             }
 746             last if ($line =~ /^(FEATURES|ORIGIN)/);
 747
 748             # Get next line and loop again
 749             $buffer = $self->_readline;
 750         }
 751         return unless defined $buffer;
 752
 753         # add them all at once for efficiency
 754         $builder->add_slot_value(
 755             -accession_number     => shift(@acc),
 756             -secondary_accessions => \@acc,
 757             %params
 758         );
 759         $builder->add_slot_value( -annotation => $annotation ) if $annotation;
 760         %params = ();    # reset before possible re-use to avoid setting twice
 761
 762         # start over if we don't want to continue with this entry
 763         if ( not $builder->want_object ) {
 764             $builder->make_object;
 765             next RECORDSTART;
 766         }
 767
 768         # some "minimal" formats may not necessarily have a feature table
 769         if (    $builder->want_slot('features')
 770             and defined $buffer
 771             and $buffer =~ /^FEATURES/o
 772             ) {
 773             # need to read the first line of the feature table
 774             $buffer = $self->_readline;
 775
 776             # DO NOT read lines in the while condition -- this is done
 777             # as a side effect in _read_FTHelper_GenBank!
 778
 779             # part of new circular spec:
 780             # commented out for now until kinks worked out
 781             #my $sourceEnd = 0;
 782             #$sourceEnd = $2 if ($buffer =~ /(\d+?)\.\.(\d+?)$/);
 783
 784             while ( defined $buffer ) {
 785                 # check immediately -- not at the end of the loop
 786                 # note: GenPept entries obviously do not have a BASE line
 787                 last if ( $buffer =~ /^BASE|ORIGIN|CONTIG|WGS/o );
 788
 789                 # slurp in one feature at a time -- at return, the start of
 790                 # the next feature will have been read already, so we need
 791                 # to pass a reference, and the called method must set this
 792                 # to the last line read before returning
 793                 my $ftunit = $self->_read_FTHelper_GenBank( \$buffer );
 794
 795                 # implement new circular spec: features that cross the origin are now
 796                 # seamless instead of being 2 separate joined features
 797                 # commented out until kinks get worked out
 798                 #if ((! $args{'-nojoin'}) && $ftunit->{'loc'} =~ /^join\((\d+?)\.\.(\d+?),(\d+?)..(\d+?)\)$/
 799                 #&& $sourceEnd == $2 && $3 == 1) {
 800                 #my $start = $1;
 801                 #my $end = $2 + $4;
 802                 #$ftunit->{'loc'} = "$start..$end";
 803                 #}
 804
 805                 # fix suggested by James Diggans
 806
 807                 if ( not defined $ftunit ) {
 808                     # GRRRR. We have fallen over. Try to recover
 809                     $self->warn(  "Unexpected error in feature table for "
 810                                 . $params{'-display_id'}
 811                                 . " Skipping feature, attempting to recover" );
 812
 813                     unless (   $buffer =~ /^\s{5,5}\S+/o
 814                             or $buffer =~ /^\S+/o
 815                         ) {
 816                         $buffer = $self->_readline;
 817                     }
 818                     next;    # back to reading FTHelpers
 819                 }
 820
 821                 # process ftunit
 822                 my $feat =
 823                     $ftunit->_generic_seqfeature( $self->location_factory,
 824                                                   $display_id );
 825
 826                 # add taxon_id from source if available
 827                 if (   $species
 828                     and $feat->primary_tag eq 'source'
 829                     and $feat->has_tag('db_xref')
 830                     and (    not $species->ncbi_taxid
 831                          or (    $species->ncbi_taxid
 832                              and $species->ncbi_taxid =~ /^list/ ) )
 833                     ) {
 834                     foreach my $tagval ( $feat->get_tag_values('db_xref') ) {
 835                         if ( index( $tagval, "taxon:" ) == 0 ) {
 836                             $species->ncbi_taxid( substr( $tagval, 6 ) );
 837                             last;
 838                         }
 839                     }
 840                 }
 841
 842                 # add feature to list of features
 843                 push( @features, $feat );
 844             }
 845             $builder->add_slot_value( -features => \@features );
 846         }
 847
 848         if ( defined $buffer ) {
 849             # CONTIG lines: TODO, this needs to be cleaned up
 850             if ($buffer =~/^CONTIG\s+(.*)/o) {
 851                 my $ctg = $1;
 852                 while ( defined( $buffer = $self->_readline ) ) {
 853                     last if $buffer =~ m{^ORIGIN|//}o;
 854                     $buffer =~ s/\s+(.*)/$1/;
 855                     $ctg .= $buffer;
 856                 }
 857                 if ($ctg) {
 858                     $annotation->add_Annotation(
 859                         Bio::Annotation::SimpleValue->new(
 860                             -tagname => 'contig',
 861                             -value   => $ctg
 862                         )
 863                     );
 864                 }
 865             }
 866             elsif ($buffer =~ /^WGS|WGS_SCAFLD\s+/o) {    # catch WGS/WGS_SCAFLD lines
 867                 while ( $buffer =~ s/(^WGS|WGS_SCAFLD)\s+// ) {    # gulp lines
 868                     chomp $buffer;
 869                     $annotation->add_Annotation(
 870                         Bio::Annotation::SimpleValue->new(
 871                             -value   => $buffer,
 872                             -tagname => lc $1
 873                         )
 874                     );
 875                     $buffer = $self->_readline;
 876                 }
 877             }
 878             elsif ( $buffer !~ m{^ORIGIN|//}o ) {    # advance to the sequence, if any
 879                 while ( defined( $buffer = $self->_readline ) ) {
 880                     last if $buffer =~ m{^(ORIGIN|//)};
 881                 }
 882             }
 883         }
 884         if ( not $builder->want_object ) {
 885             $builder->make_object;        # implicit end-of-object
 886             next RECORDSTART;
 887         }
 888         if ( $builder->want_slot('seq') ) {
 889             # the fact that we want a sequence does not necessarily mean that
 890             # there also is a sequence ...
 891             if ( defined $buffer and $buffer =~ s/^ORIGIN\s+// ) {
 892                 if ( $annotation and length($buffer) > 0 ) {
 893                     $annotation->add_Annotation(
 894                         'origin',
 895                         Bio::Annotation::SimpleValue->new(
 896                             -tagname => 'origin',
 897                             -value   => $buffer
 898                         )
 899                     );
 900                 }
 901                 my $seqc = '';
 902                 while ( defined( $buffer = $self->_readline ) ) {
 903                     last if $buffer =~ m{^//};
 904                     $buffer = uc $buffer;
 905                     $buffer =~ s/[^A-Za-z]//g;
 906                     $seqc .= $buffer;
 907                 }
 908
 909                 $builder->add_slot_value( -seq => $seqc );
 910             }
 911         }
 912         elsif ( defined($buffer) and ( substr( $buffer, 0, 2 ) ne '//' ) ) {
 913             # advance to the end of the record
 914             while ( defined( $buffer = $self->_readline ) ) {
 915                 last if substr( $buffer, 0, 2 ) eq '//';
 916             }
 917         }
 918
 919         # Unlikely, but maybe the sequence is so weird that we don't want it
 920         # anymore. We don't want to return undef if the stream's not exhausted
 921         # yet.
 922         $seq = $builder->make_object;
 923         next RECORDSTART unless $seq;
 924         last RECORDSTART;
 925     }    # end while RECORDSTART
 926
 927     return $seq;
 928 }
 929
 930 =head2 write_seq
 931
 932  Title   : write_seq
 933  Usage   : $stream->write_seq($seq)
 934  Function: writes the $seq object (must be seq) to the stream
 935  Returns : 1 for success and 0 for error
 936  Args    : array of 1 to n Bio::SeqI objects
 937
 938 =cut
 939
 940 sub write_seq {
 941     my ($self,@seqs) = @_;
 942
 943     foreach my $seq ( @seqs ) {
 944         $self->throw("Attempting to write with no seq!") unless defined $seq;
 945
 946         if ( not ref $seq or not $seq->isa('Bio::SeqI') ) {
 947             $self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
 948         }
 949
 950         my $str   = $seq->seq;
 951         my $len   = $seq->length;
 952         my $alpha = $seq->alphabet;
 953
 954         my ($div, $mol);
 955         if (   not $seq->can('division')
 956             or not defined($div = $seq->division)
 957             ) {
 958             $div = 'UNK';
 959         }
 960         if (   not $seq->can('molecule')
 961             or not defined ($mol = $seq->molecule)
 962            ) {
 963             $mol =  $alpha || 'DNA';
 964         }
 965
 966         my $circular = ($seq->is_circular) ? 'circular' : 'linear  ';
 967
 968         local($^W) = 0; # suppressing warnings about uninitialized fields.
 969
 970         my $temp_line;
 971         if ( $self->_id_generation_func ) {
 972             $temp_line = &{$self->_id_generation_func}($seq);
 973         }
 974         else {
 975             my $date = '';
 976             if ( $seq->can('get_dates') ) {
 977                 ($date) = $seq->get_dates;
 978             }
 979
 980             $self->warn("No whitespace allowed in GenBank display id [". $seq->display_id. "]")
 981                 if $seq->display_id =~ /\s/;
 982
 983             my @data = ( lc($alpha) eq 'protein' ) ? ('aa', '', '') : ('bp', '', $mol);
 984             $temp_line = sprintf ("%-12s%-15s%13s %s%4s%-8s%-8s %3s %-s\n",
 985                                   'LOCUS', $seq->id, $len,
 986                                   @data, $circular, $div, $date);
 987         }
 988
 989         $self->_print($temp_line);
 990         $self->_write_line_GenBank_regex("DEFINITION  ", "            ",
 991                                          $seq->desc,     "\\s\+\|\$",80);
 992
 993         # if there, write the accession line
 994
 995         if ( $self->_ac_generation_func ) {
 996             $temp_line = &{$self->_ac_generation_func}($seq);
 997             $self->_print("ACCESSION   $temp_line\n");
 998         }
 999         else {
1000             my @acc = ();
1001             push @acc, $seq->accession_number;
1002             if ( $seq->isa('Bio::Seq::RichSeqI') ) {
1003                 push @acc, $seq->get_secondary_accessions;
1004             }
1005             $self->_print("ACCESSION   ", join(" ", @acc), "\n");
1006             # otherwise - cannot print <sigh>
1007         }
1008
1009         # if PID defined, print it
1010         if ($seq->isa('Bio::Seq::RichSeqI') and $seq->pid) {
1011             $self->_print("PID         ", $seq->pid, "\n");
1012         }
1013
1014         # if there, write the version line
1015         if ( defined $self->_sv_generation_func ) {
1016             $temp_line = &{$self->_sv_generation_func}($seq);
1017             if ( $temp_line ) {
1018                 $self->_print("VERSION     $temp_line\n");
1019             }
1020         }
1021         elsif ($seq->isa('Bio::Seq::RichSeqI') and defined($seq->seq_version)) {
1022             my $id = $seq->primary_id; # this may be a GI number
1023             my $data = (defined $id and $id =~ /^\d+$/) ? "  GI:$id" : "";
1024             $self->_print("VERSION     ",
1025                           $seq->accession_number, ".",
1026                           $seq->seq_version, $data, "\n");
1027         }
1028
1029         # if there, write the PROJECT line
1030         for my $proj ( $seq->annotation->get_Annotations('project') ) {
1031             $self->_print("PROJECT     ".$proj->value."\n");
1032         }
1033
1034         # if there, write the DBSOURCE line
1035         foreach my $ref ( $seq->annotation->get_Annotations('dblink') ) {
1036             my ($db, $id) = ($ref->database, $ref->primary_id);
1037             my $prefix = $db eq 'Project' ? 'DBLINK' : 'DBSOURCE';
1038             my $text   = $db eq 'GenBank' ? ''
1039                        : $db eq 'Project' ? "$db:$id"
1040                        : "$db accession $id";
1041             $self->_print(sprintf ("%-11s %s\n", $prefix, $text));
1042         }
1043
1044         # if there, write the keywords line
1045         if ( defined $self->_kw_generation_func ) {
1046             $temp_line = &{$self->_kw_generation_func}($seq);
1047             $self->_print("KEYWORDS    $temp_line\n");
1048         }
1049         elsif ( $seq->can('keywords') ) {
1050             my $kw = $seq->keywords;
1051             $kw .= '.' if ( $kw !~ /\.$/ );
1052             $self->_print("KEYWORDS    $kw\n");
1053         }
1054
1055         # SEGMENT if it exists
1056         foreach my $ref ( $seq->annotation->get_Annotations('segment') ) {
1057             $self->_print(sprintf ("%-11s %s\n",'SEGMENT',
1058                                    $ref->value));
1059         }
1060
1061         # Organism lines
1062         if (my $spec = $seq->species) {
1063             my ($on, $sn, $cn) = ($spec->can('organelle') ? $spec->organelle : '',
1064                                   $spec->scientific_name,
1065                                   $spec->common_name);
1066             my @classification;
1067             if ($spec->isa('Bio::Species')) {
1068                 @classification = $spec->classification;
1069                 shift @classification;
1070             }
1071             else {
1072                 # Bio::Taxon should have a DB handle of some type attached, so
1073                 # derive the classification from that
1074                 my $node = $spec;
1075                 while ($node) {
1076                     $node = $node->ancestor || last;
1077                     unshift @classification, $node->node_name;
1078                     #$node eq $root && last;
1079                 }
1080                 @classification = reverse @classification;
1081             }
1082             my $abname = $spec->name('abbreviated') ? # from genbank file
1083                          $spec->name('abbreviated')->[0] : $sn;
1084             my $sl = $on ? "$on "          : '';
1085             $sl   .= $cn ? "$abname ($cn)" : $abname;
1086
1087             $self->_write_line_GenBank_regex("SOURCE      ", ' 'x12, $sl, "\\s\+\|\$", 80);
1088             $self->_print("  ORGANISM  ", $spec->scientific_name, "\n");
1089             my $OC = join('; ', reverse @classification) . '.';
1090             $self->_write_line_GenBank_regex(' 'x12,' 'x12, $OC, "\\s\+\|\$", 80);
1091         }
1092
1093         # Reference lines
1094         my $count = 1;
1095         foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
1096             $temp_line = "REFERENCE   $count";
1097             if ($ref->start) {
1098                 $temp_line .= sprintf ("  (%s %d to %d)",
1099                                        ($seq->alphabet() eq "protein" ?
1100                                         "residues" : "bases"),
1101                                        $ref->start, $ref->end);
1102             }
1103             elsif ($ref->gb_reference) {
1104                 $temp_line .= sprintf ("  (%s)", $ref->gb_reference);
1105             }
1106             $self->_print("$temp_line\n");
1107             $self->_write_line_GenBank_regex("  AUTHORS   ", ' 'x12,
1108                                              $ref->authors,    "\\s\+\|\$", 80);
1109             $self->_write_line_GenBank_regex("  CONSRTM   ", ' 'x12,
1110                                              $ref->consortium, "\\s\+\|\$", 80) if $ref->consortium;
1111             $self->_write_line_GenBank_regex("  TITLE     ", ' 'x12,
1112                                              $ref->title,      "\\s\+\|\$", 80);
1113             $self->_write_line_GenBank_regex("  JOURNAL   ", ' 'x12,
1114                                              $ref->location,   "\\s\+\|\$", 80);
1115             if ( $ref->medline) {
1116                 $self->_write_line_GenBank_regex("  MEDLINE   ", ' 'x12,
1117                                                  $ref->medline,  "\\s\+\|\$", 80);
1118                 # I am assuming that pubmed entries only exist when there
1119                 # are also MEDLINE entries due to the indentation
1120             }
1121             # This could be a wrong assumption
1122             if ( $ref->pubmed ) {
1123                 $self->_write_line_GenBank_regex("   PUBMED   ", ' 'x12,
1124                                                  $ref->pubmed,   "\\s\+\|\$", 80);
1125             }
1126             # put remark at the end
1127             if ($ref->comment) {
1128                 $self->_write_line_GenBank_regex("  REMARK    ", ' 'x12,
1129                                                  $ref->comment,  "\\s\+\|\$", 80);
1130             }
1131             $count++;
1132         }
1133
1134         # Comment lines
1135         foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
1136             $self->_write_line_GenBank_regex("COMMENT     ", ' 'x12,
1137                                              $comment->text, "\\s\+\|\$", 80);
1138         }
1139
1140         # FEATURES section
1141         $self->_print("FEATURES             Location/Qualifiers\n");
1142
1143         if ( defined $self->_post_sort ) {
1144             # we need to read things into an array. Process. Sort them. Print 'em
1145             my $post_sort_func = $self->_post_sort;
1146             my @fth;
1147
1148             foreach my $sf ( $seq->top_SeqFeatures ) {
1149                 push @fth, Bio::SeqIO::FTHelper::from_SeqFeature($sf, $seq);
1150             }
1151
1152             @fth = sort { &$post_sort_func($a, $b) } @fth;
1153
1154             foreach my $fth ( @fth ) {
1155                 $self->_print_GenBank_FTHelper($fth);
1156             }
1157         }
1158         else {
1159             # not post sorted. And so we can print as we get them.
1160             # lower memory load...
1161             foreach my $sf ( $seq->top_SeqFeatures ) {
1162                 my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf, $seq);
1163                 foreach my $fth ( @fth ) {
1164                     if ( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1165                         $sf->throw("Cannot process FTHelper... $fth");
1166                     }
1167                     $self->_print_GenBank_FTHelper($fth);
1168                 }
1169             }
1170         }
1171
1172         # deal with WGS; WGS_SCAFLD present only if WGS is also present
1173         if ($seq->annotation->get_Annotations('wgs')) {
1174             foreach my $wgs (map {$seq->annotation->get_Annotations($_)}
1175                              qw(wgs wgs_scaffold)
1176                 ) {
1177                 $self->_print(sprintf ("%-11s %s\n",
1178                                        uc($wgs->tagname),
1179                                        $wgs->value));
1180             }
1181             $self->_show_dna(0);
1182         }
1183         if ($seq->annotation->get_Annotations('contig')) {
1184             my $ct = 0;
1185             my $cline;
1186             foreach my $contig ($seq->annotation->get_Annotations('contig')) {
1187                 unless ($ct) {
1188                     $cline = uc($contig->tagname) . "      " . $contig->value . "\n";
1189                 }
1190                 else {
1191                     $cline = "            " . $contig->value . "\n";
1192                 }
1193                 $self->_print($cline);
1194                 $ct++;
1195             }
1196         }
1197         if ( $seq->length == 0 ) {
1198             $self->_show_dna(0);
1199         }
1200
1201         if ( $self->_show_dna == 0 ) {
1202             $self->_print("\n//\n");
1203             return;
1204         }
1205
1206         # finished printing features.
1207
1208         $str =~ tr/A-Z/a-z/;
1209
1210         my ($o) = $seq->annotation->get_Annotations('origin');
1211         $self->_print(sprintf("%-12s%s\n",
1212                               'ORIGIN', $o ? $o->value : ''));
1213         # print out the sequence
1214         my $nuc = 60;           # Number of nucleotides per line
1215         my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line
1216         my $out_pat   = 'A11' x 6; # Pattern for packing a line
1217         my $length = length $str;
1218
1219         # Calculate the number of nucleotides which fit on whole lines
1220         my $whole = int($length / $nuc) * $nuc;
1221
1222         # Print the whole lines
1223         my $i;
1224         for ($i = 0; $i < $whole; $i += $nuc) {
1225             my $blocks = pack $out_pat,
1226             unpack $whole_pat,
1227             substr($str, $i, $nuc);
1228             chop $blocks;
1229             $self->_print(sprintf("%9d $blocks\n", $i + $nuc - 59));
1230         }
1231
1232         # Print the last line
1233         if (my $last = substr($str, $i)) {
1234             my $last_len = length($last);
1235             my $last_pat = 'a10' x int($last_len / 10)
1236                          . 'a' . $last_len % 10;
1237             my $blocks = pack $out_pat,
1238             unpack($last_pat, $last);
1239             $blocks =~ s/ +$//;
1240             $self->_print(sprintf("%9d $blocks\n",
1241                                   $length - $last_len + 1));
1242         }
1243
1244         $self->_print("//\n");
1245
1246         $self->flush if $self->_flush_on_write && defined $self->_fh;
1247         return 1;
1248     }
1249 }
1250
1251 =head2 _print_GenBank_FTHelper
1252
1253  Title   : _print_GenBank_FTHelper
1254  Usage   :
1255  Function:
1256  Example :
1257  Returns :
1258  Args    :
1259
1260 =cut
1261
1262 sub _print_GenBank_FTHelper {
1263     my ( $self, $fth ) = @_;
1264
1265     if ( not ref $fth or not $fth->isa('Bio::SeqIO::FTHelper') ) {
1266         $fth->warn(
1267             "$fth is not a FTHelper class. Attempting to print but there could be issues"
1268         );
1269     }
1270
1271     my $spacer = ( length $fth->key >= 15 ) ? ' ' : '';
1272     $self->_write_line_GenBank_regex(
1273         sprintf( "     %-16s%s", $fth->key, $spacer ),
1274                  " " x 21, $fth->loc, "\,\|\$", 80 );
1275
1276     foreach my $tag ( sort keys %{ $fth->field } ) {
1277         # Account for hash structure in Annotation::DBLink, not the expected array
1278         if ( $tag eq 'db_xref' and grep /HASH/, @{ $fth->field->{$tag} }) {
1279             for my $ref ( @{ $fth->field->{$tag} } ) {
1280                 my $db = $ref->{'database'};
1281                 my $id = $ref->{'primary_id'};
1282                 $self->_write_line_GenBank_regex
1283                     ( " " x 21, " " x 21,
1284                       "/$tag=\"$db:$id\"", "\.\|\$", 80 );
1285             }
1286         }
1287         # The usual case, where all values are found in an array
1288         else {
1289             foreach my $value ( @{ $fth->field->{$tag} } ) {
1290                 $value =~ s/\"/\"\"/g;
1291                 if ( $value eq "_no_value" ) {
1292                     $self->_write_line_GenBank_regex
1293                         ( " " x 21, " " x 21,
1294                           "/$tag", "\.\|\$", 80 );
1295                 }
1296
1297                # There are almost 3x more quoted qualifier values and they
1298                # are more common too so we take quoted ones first.
1299                # Long qualifiers, that will be line wrapped, are always quoted
1300
1301                # Note (cjf): any tags explicitly *not* quoted must
1302                # remain unquoted in all output for proper round trip
1303                # parsing:
1304                # see https://github.com/bioperl/bioperl-live/issues/321
1305
1306                # If this breaks *NCBI* examples please file an issue with
1307                # example files
1308                 elsif (   not $FTQUAL_NO_QUOTE{$tag} ) {
1309                     my ($pat) = ( $value =~ /\s/ ? '\s|$' :
1310                                   '.|$' );
1311                     $self->_write_line_GenBank_regex
1312                         ( " " x 21, " " x 21,
1313                           "/$tag=\"$value\"", $pat, 80 );
1314                 }
1315                 else {
1316                     my ($pat) = ( $value =~ /[,]/ ? '[,]|$' :
1317                                   '.|$' );
1318                     $self->_write_line_GenBank_regex
1319                         ( " " x 21, " " x 21,
1320                           "/$tag=$value", $pat, 80 );
1321                 }
1322             }
1323         }
1324     }
1325 }
1326
1327 =head2 _read_GenBank_References
1328
1329  Title   : _read_GenBank_References
1330  Usage   :
1331  Function: Reads references from GenBank format. Internal function really
1332  Returns :
1333  Args    :
1334
1335 =cut
1336
1337 sub _read_GenBank_References {
1338     my ($self, $buffer) = @_;
1339     my (@refs);
1340     my $ref;
1341
1342     # assume things are starting with RN
1343     if ( $$buffer !~ /^REFERENCE/ ) {
1344         warn("Not parsing line '$$buffer' which maybe important");
1345     }
1346
1347     my $line = $$buffer;
1348
1349     my (@title,@loc,@authors,@consort,@com,@medline,@pubmed);
1350
1351   REFLOOP:
1352     while( defined($line) or defined($line = $self->_readline) ) {
1353         if ($line =~ /^\s{2}AUTHORS\s+(.*)/o) {
1354             push @authors, $1;
1355             while ( defined($line = $self->_readline) ) {
1356                 if ($line =~ /^\s{9,}(.*)/o) {
1357                     push @authors, $1;
1358                     next;
1359                 }
1360                 last;
1361             }
1362             $ref->authors(join(' ', @authors));
1363         }
1364
1365         if ($line =~ /^\s{2}CONSRTM\s+(.*)/o) {
1366             push @consort, $1;
1367             while ( defined($line = $self->_readline) ) {
1368                 if ($line =~ /^\s{9,}(.*)/o) {
1369                     push @consort, $1;
1370                     next;
1371                 }
1372                 last;
1373             }
1374             $ref->consortium(join(' ', @consort));
1375         }
1376
1377         if ($line =~ /^\s{2}TITLE\s+(.*)/o) {
1378             push @title, $1;
1379             while ( defined($line = $self->_readline) ) {
1380                 if ($line =~ /^\s{9,}(.*)/o) {
1381                     push @title, $1;
1382                     next;
1383                 }
1384                 last;
1385             }
1386             $ref->title(join(' ', @title));
1387         }
1388
1389         if ($line =~ /^\s{2}JOURNAL\s+(.*)/o) {
1390             push @loc, $1;
1391             while ( defined($line = $self->_readline) ) {
1392                 # we only match when there are at least 4 spaces
1393                 # there is probably a better way to match this
1394                 # as it assumes that the describing tag is short enough
1395                 if ($line =~ /^\s{9,}(.*)/o) {
1396                     push @loc, $1;
1397                     next;
1398                 }
1399                 last;
1400             }
1401             $ref->location(join(' ', @loc));
1402             redo REFLOOP;
1403         }
1404
1405         if ($line =~ /^\s{2}REMARK\s+(.*)/o) {
1406             push @com, $1;
1407             while ( defined($line = $self->_readline) ) {
1408                 if ($line =~ /^\s{9,}(.*)/o) {
1409                     push @com, $1;
1410                     next;
1411                 }
1412                 last;
1413             }
1414             $ref->comment(join(' ', @com));
1415             redo REFLOOP;
1416         }
1417
1418         if ( $line =~ /^\s{2}MEDLINE\s+(.*)/ ) {
1419             push @medline, $1;
1420             while ( defined($line = $self->_readline) ) {
1421                 if ($line =~ /^\s{9,}(.*)/) {
1422                     push @medline, $1;
1423                     next;
1424                 }
1425                 last;
1426             }
1427             $ref->medline(join(' ', @medline));
1428             redo REFLOOP;
1429         }
1430
1431         if ( $line =~ /^\s{3}PUBMED\s+(.*)/ ) {
1432             push @pubmed, $1;
1433             while ( defined($line = $self->_readline) ) {
1434                 if ($line =~ /^\s{9,}(.*)/) {
1435                     push @pubmed, $1;
1436                     next;
1437                 }
1438                 last;
1439             }
1440             $ref->pubmed(join(' ', @pubmed));
1441             redo REFLOOP;
1442         }
1443
1444         if ( $line =~ /^REFERENCE/o ) {
1445             # store current reference
1446             $self->_add_ref_to_array(\@refs,$ref) if defined $ref;
1447
1448             # reset
1449             @authors = ();
1450             @title   = ();
1451             @loc     = ();
1452             @com     = ();
1453             @pubmed  = ();
1454             @medline = ();
1455
1456             # create the new reference object
1457             $ref = Bio::Annotation::Reference->new(-tagname => 'reference');
1458
1459             # check whether start and end base is given
1460             if ($line =~ /^REFERENCE\s+\d+\s+\([a-z]+ (\d+) to (\d+)\)/){
1461                 $ref->start($1);
1462                 $ref->end($2);
1463             }
1464             elsif ($line =~ /^REFERENCE\s+\d+\s+\((.*)\)/) {
1465                 $ref->gb_reference($1);
1466             }
1467         }
1468
1469         last if ($line =~ /^(FEATURES)|(COMMENT)/o);
1470
1471         $line = undef; # Empty $line to trigger read of next line
1472     }
1473
1474     # store last reference
1475     $self->_add_ref_to_array(\@refs, $ref) if defined $ref;
1476
1477     $$buffer = $line;
1478
1479     #print "\nnumber of references found: ", $#refs+1,"\n";
1480
1481     return @refs;
1482 }
1483
1484 =head2 _add_ref_to_array
1485
1486 Title: _add_ref_to_array
1487 Usage:
1488 Function: Adds a Reference object to an array of Reference objects, takes
1489           care of possible cleanups to be done (currently, only author and title
1490           will be chopped of trailing semicolons).
1491 Args:     A reference to an array of Reference objects and
1492           the Reference object to be added
1493 Returns: nothing
1494
1495 =cut
1496
1497 sub _add_ref_to_array {
1498     my ($self, $refs, $ref) = @_;
1499
1500     # first, polish author and title by removing possible trailing semicolons
1501     my $au    = $ref->authors;
1502     my $title = $ref->title;
1503     $au    =~ s/;\s*$//g if $au;
1504     $title =~ s/;\s*$//g if $title;
1505     $ref->authors($au);
1506     $ref->title($title);
1507     # the rest should be clean already, so go ahead and add it
1508     push @{$refs}, $ref;
1509 }
1510
1511 =head2 _read_GenBank_Species
1512
1513  Title   : _read_GenBank_Species
1514  Usage   :
1515  Function: Reads the GenBank Organism species and classification
1516            lines. Able to deal with unconvential Organism naming
1517            formats, and varietas in plants
1518  Example : ORGANISM  unknown marine gamma proteobacterium NOR5
1519            $genus = undef
1520            $species = unknown marine gamma proteobacterium NOR5
1521
1522            ORGANISM  Drosophila sp. 'white tip scutellum'
1523            $genus = Drosophila
1524            $species = sp. 'white tip scutellum'
1525            (yes, this really is a species and that is its name)
1526            $subspecies = undef
1527
1528            ORGANISM  Ajellomyces capsulatus var. farciminosus
1529            $genus = Ajellomyces
1530            $species = capsulatus
1531            $subspecies = var. farciminosus
1532
1533            ORGANISM  Hepatitis delta virus
1534            $genus = undef (though this virus has a genus in its lineage, we
1535                            cannot know that without a database lookup)
1536            $species = Hepatitis delta virus
1537
1538  Returns : A Bio::Species object
1539  Args    : A reference to the current line buffer
1540
1541 =cut
1542
1543 sub _read_GenBank_Species {
1544     my ($self, $buffer) = @_;
1545
1546     my @unkn_names = ('other', 'unknown organism', 'not specified', 'not shown',
1547                       'Unspecified', 'Unknown', 'None', 'unclassified',
1548                       'unidentified organism', 'not supplied');
1549     # dictionary of synonyms for taxid 32644
1550     my @unkn_genus = ('unknown', 'unclassified', 'uncultured', 'unidentified');
1551     # all above can be part of valid species name
1552
1553     my $line = $$buffer;
1554
1555     my( $sub_species, $species, $genus, $sci_name, $common,
1556         $class_lines, $source_flag, $abbr_name, $organelle, $sl );
1557     my %source = map { $_ => 1 } qw(SOURCE ORGANISM CLASSIFICATION);
1558
1559     # upon first entering the loop, we must not read a new line -- the SOURCE
1560     # line is already in the buffer (HL 05/10/2000)
1561     my ($ann, $tag, $data);
1562     while (defined($line) or defined($line = $self->_readline)) {
1563         # de-HTMLify (links that may be encountered here don't contain
1564         # escaped '>', so a simple-minded approach suffices)
1565         $line =~ s{<[^>]+>}{}g;
1566         if ($line =~ m{^(?:\s{0,2})(\w+)\s+(.+)?$}ox) {
1567             ($tag, $data) = ($1, $2 || '');
1568             last if ($tag and not exists $source{$tag});
1569         }
1570         else {
1571             return unless $tag;
1572             ($data = $line) =~ s{^\s+}{};
1573             chomp $data;
1574             $tag = 'CLASSIFICATION' if (    $tag ne 'CLASSIFICATION'
1575                                         and $tag eq 'ORGANISM'
1576                                         # Don't match "str." or "var." (fix NC_021815),
1577                                         # and don't match ".1" (fix NC_021902)
1578                                         and $line =~ m{(?<!\bstr|\bvar)[;\.]+(?!\d)});
1579         }
1580         (exists $ann->{$tag}) ? ($ann->{$tag} .= ' '.$data) : ($ann->{$tag} .= $data);
1581         $line = undef;
1582     }
1583
1584     ($sl, $class_lines, $sci_name) = ($ann->{SOURCE}, $ann->{CLASSIFICATION}, $ann->{ORGANISM});
1585
1586     $$buffer = $line;
1587
1588     $sci_name or return;
1589
1590     # parse out organelle, common name, abbreviated name if present;
1591     # this should catch everything, but falls back to
1592     # entire SOURCE line just in case
1593     if ($sl =~ m{^(mitochondrion|chloroplast|plastid)?
1594                   \s*(.*?)
1595                   \s*(?: \( (.*?) \) )?\.?
1596                   $
1597                  }xms
1598         ) {
1599         ($organelle, $abbr_name, $common) = ($1, $2, $3); # optional
1600     }
1601     else {
1602         $abbr_name = $sl; # nothing caught; this is a backup!
1603     }
1604
1605     # Convert data in classification lines into classification array.
1606     # only split on ';' or '.' so that classification that is 2 or more words will
1607     # still get matched, use map() to remove trailing/leading/intervening spaces
1608     my @class = map { $_ =~ s/^\s+//;
1609                       $_ =~ s/\s+$//;
1610                       $_ =~ s/\s{2,}/ /g;
1611                       $_; }
1612                 split /(?<!subgen)[;\.]+/, $class_lines;
1613
1614     # do we have a genus?
1615     my $possible_genus =  quotemeta($class[-1])
1616                        . ($class[-2] ? "|" . quotemeta($class[-2]) : '');
1617     if ($sci_name =~ /^($possible_genus)/) {
1618         $genus = $1;
1619         ($species) = $sci_name =~ /^$genus\s+(.+)/;
1620     }
1621     else {
1622         $species = $sci_name;
1623     }
1624
1625     # is this organism of rank species or is it lower?
1626     # (we don't catch everything lower than species, but it doesn't matter -
1627     # this is just so we abide by previous behaviour whilst not calling a
1628     # species a subspecies)
1629     if ($species and $species =~ /(.+)\s+((?:subsp\.|var\.).+)/) {
1630         ($species, $sub_species) = ($1, $2);
1631     }
1632
1633     # Don't make a species object if it's empty or "Unknown" or "None"
1634     # return unless $genus and  $genus !~ /^(Unknown|None)$/oi;
1635     # Don't make a species object if it belongs to taxid 32644
1636 #    my $unkn = grep { $_ =~ /^\Q$sl\E$/; } @unkn_names;
1637     my $unkn = grep { $_ eq $sl } @unkn_names;
1638     return unless (defined $species or defined $genus) and $unkn == 0;
1639
1640     # Bio::Species array needs array in Species -> Kingdom direction
1641     push @class, $sci_name;
1642     @class = reverse @class;
1643
1644     my $make = Bio::Species->new;
1645     $make->scientific_name($sci_name);
1646     $make->classification(@class)          if @class > 0;
1647     $make->common_name( $common )          if $common;
1648     $make->name('abbreviated', $abbr_name) if $abbr_name;
1649     $make->organelle($organelle)           if $organelle;
1650     #$make->sub_species( $sub_species )     if $sub_species;
1651     return $make;
1652 }
1653
1654 =head2 _read_FTHelper_GenBank
1655
1656  Title   : _read_FTHelper_GenBank
1657  Usage   : _read_FTHelper_GenBank($buffer)
1658  Function: reads the next FT key line
1659  Example :
1660  Returns : Bio::SeqIO::FTHelper object
1661  Args    : filehandle and reference to a scalar
1662
1663 =cut
1664
1665 sub _read_FTHelper_GenBank {
1666     my ($self, $buffer) = @_;
1667
1668     my ($key, # The key of the feature
1669         $loc  # The location line from the feature
1670     );
1671     my @qual = (); # An array of lines making up the qualifiers
1672
1673     if ($$buffer =~ /^\s{5}(\S+)\s+(.+?)\s*$/o) {
1674         $key = $1;
1675         $loc = $2;
1676         # Read all the lines up to the next feature
1677         while ( defined(my $line = $self->_readline) ) {
1678             if ($line =~ /^(\s+)(.+?)\s*$/o) {
1679                 # Lines inside features are preceded by 21 spaces
1680                 # A new feature is preceded by 5 spaces
1681                 if (length($1) > 6) {
1682                     # Add to qualifiers if we're in the qualifiers, or if it's
1683                     # the first qualifier
1684                     if (@qual or (index($2,'/') == 0)) {
1685                         push @qual, $2;
1686                     }
1687                     # We're still in the location line, so append to location
1688                     else {
1689                         $loc .= $2;
1690                     }
1691                 }
1692                 else {
1693                     # We've reached the start of the next feature
1694                     # Put the first line of the next feature into the buffer
1695                     $$buffer = $line;
1696                     last;
1697                 }
1698             }
1699             else {
1700                 # We're at the end of the feature table
1701                 # Put the first line of the next feature into the buffer
1702                 $$buffer = $line;
1703                 last;
1704             }
1705         }
1706     }
1707     else {
1708         # No feature key
1709         $self->debug("no feature key!\n");
1710         # change suggested by JDiggans to avoid infinite loop-
1711         # see bugreport 1062.
1712         # reset buffer to prevent infinite loop
1713         $$buffer = $self->_readline;
1714         return;
1715     }
1716
1717     # Make the new FTHelper object
1718     my $out = Bio::SeqIO::FTHelper->new;
1719     $out->verbose($self->verbose);
1720     $out->key($key);
1721     $out->loc($loc);
1722
1723     # Now parse and add any qualifiers.  (@qual is kept
1724     # intact to provide informative error messages.)
1725   QUAL:
1726     for (my $i = 0; $i < @qual; $i++) {
1727         my $data = $qual[$i];
1728         my ( $qualifier, $value ) = ($data =~ m{^/([^=]+)(?:=\s*(.+))?})
1729             or $self->warn(  "cannot see new qualifier in feature $key: "
1730                            . $data);
1731         $qualifier = '' if not defined $qualifier;
1732
1733         if (defined $value) {
1734             # Do we have a quoted value?
1735             if (substr($value, 0, 1) eq '"') {
1736                 # Keep adding to value until we find the trailing quote
1737                 # and the quotes are balanced
1738                 while ($value !~ /\"$/ or $value =~ tr/"/"/ % 2) {
1739                     if ($i >= $#qual) {
1740                         $self->warn(  "Unbalanced quote in:\n"
1741                                     . join("\n", @qual)
1742                                     . "No further qualifiers will "
1743                                     . "be added for this feature");
1744                         last QUAL;
1745                     }
1746                     # modifying a for-loop variable inside of the loop
1747                     # is not the best programming style ...
1748                     $i++;
1749                     my $next = $qual[$i];
1750
1751                     # add to value with a space unless the value appears
1752                     # to be a sequence (translation for example)
1753                     # if (($value.$next) =~ /[^A-Za-z\"\-]/o) {
1754                     # changed to explicitly look for translation tag - cjf 06/8/29
1755                     if ($qualifier !~ /^translation$/i ) {
1756                         $value .= " ";
1757                     }
1758                     $value .= $next;
1759                 }
1760                 # Trim leading and trailing quotes
1761                 $value =~ s/^"|"$//g;
1762                 # Undouble internal quotes
1763                 $value =~ s/""/\"/g;
1764             }
1765             elsif ( $value =~ /^\(/ ) { # values quoted by ()s
1766                 # Keep adding to value until we find the trailing bracket
1767                 # and the ()s are balanced
1768                 my $left  = ($value =~ tr/\(/\(/); # count left parens
1769                 my $right = ($value =~ tr/\)/\)/); # count right parens
1770
1771                 while( $left != $right ) { # was "$value !~ /\)$/ or $left != $right"
1772                     if ( $i >= $#qual) {
1773                         $self->warn(  "Unbalanced parens in:\n"
1774                                     . join("\n", @qual)
1775                                     . "\nNo further qualifiers will "
1776                                     . "be added for this feature");
1777                         last QUAL;
1778                     }
1779                     $i++;
1780                     my $next = $qual[$i];
1781                     $value .=  $next;
1782                     $left  += ($next =~ tr/\(/\(/);
1783                     $right += ($next =~ tr/\)/\)/);
1784                 }
1785             }
1786         }
1787         else {
1788             $value = '_no_value';
1789         }
1790         # Store the qualifier
1791         $out->field->{$qualifier} ||= [];
1792         push @{$out->field->{$qualifier}}, $value;
1793     }
1794     return $out;
1795 }
1796
1797 =head2 _write_line_GenBank
1798
1799  Title   : _write_line_GenBank
1800  Usage   :
1801  Function: internal function
1802  Example :
1803  Returns :
1804  Args    :
1805
1806 =cut
1807
1808 sub _write_line_GenBank {
1809     my ($self, $pre1, $pre2, $line, $length) = @_;
1810
1811     $length or $self->throw("Miscalled write_line_GenBank without length. Programming error!");
1812     my $subl  = $length - length $pre2;
1813     my $linel = length $line;
1814     my $i;
1815
1816     my $subr = substr($line,0,$length - length $pre1);
1817
1818     $self->_print("$pre1$subr\n");
1819     for($i = ($length - length $pre1); $i < $linel; $i += $subl) {
1820         $subr = substr($line, $i, $subl);
1821         $self->_print("$pre2$subr\n");
1822     }
1823 }
1824
1825 =head2 _write_line_GenBank_regex
1826
1827  Title   : _write_line_GenBank_regex
1828  Usage   :
1829  Function: internal function for writing lines of specified
1830            length, with different first and the next line
1831            left hand headers and split at specific points in the
1832            text
1833  Example :
1834  Returns : nothing
1835  Args    : file handle,
1836            first header,
1837            second header,
1838            text-line,
1839            regex for line breaks,
1840            total line length
1841
1842 =cut
1843
1844 sub _write_line_GenBank_regex {
1845     my ($self, $pre1, $pre2, $line, $regex, $length) = @_;
1846
1847     # print STDOUT "Going to print with $line!\n";
1848
1849     $length or $self->throw("Miscalled write_line_GenBank without length. Programming error!");
1850
1851     my $subl  = $length - (length $pre1) - 2;
1852     my @lines = ();
1853
1854     CHUNK:
1855     while ($line) {
1856         foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
1857             if ($line =~ m/^(.{0,$subl})($pat)(.*)/ ) {
1858                 my $l = $1 . $2;
1859                 $line = substr($line, length $l);
1860                 # be strict about not padding spaces according to
1861                 # genbank format
1862                 $l =~ s/\s+$//;
1863                 next CHUNK if ($l eq '');
1864                 push @lines, $l;
1865                 next CHUNK;
1866             }
1867         }
1868         # if we get here none of the patterns matched $subl or less chars
1869         $self->warn(  "trouble dissecting \"$line\"\n     into chunks "
1870                     . "of $subl chars or less - this tag won't print right");
1871         # insert a space char to prevent infinite loops
1872         $line = substr($line, 0, $subl) . " " . substr($line, $subl);
1873     }
1874     my $s = shift @lines;
1875     $self->_print("$pre1$s\n") if $s;
1876     foreach my $s ( @lines ) {
1877         $self->_print("$pre2$s\n");
1878     }
1879 }
1880
1881 =head2 _post_sort
1882
1883  Title   : _post_sort
1884  Usage   : $obj->_post_sort($newval)
1885  Function:
1886  Returns : value of _post_sort
1887  Args    : newvalue (optional)
1888
1889 =cut
1890
1891 sub _post_sort {
1892     my ($obj,$value) = @_;
1893     if ( defined $value) {
1894         $obj->{'_post_sort'} = $value;
1895     }
1896     return $obj->{'_post_sort'};
1897 }
1898
1899 =head2 _show_dna
1900
1901  Title   : _show_dna
1902  Usage   : $obj->_show_dna($newval)
1903  Function:
1904  Returns : value of _show_dna
1905  Args    : newvalue (optional)
1906
1907 =cut
1908
1909 sub _show_dna {
1910     my ($obj,$value) = @_;
1911     if ( defined $value) {
1912         $obj->{'_show_dna'} = $value;
1913     }
1914     return $obj->{'_show_dna'};
1915 }
1916
1917 =head2 _id_generation_func
1918
1919  Title   : _id_generation_func
1920  Usage   : $obj->_id_generation_func($newval)
1921  Function:
1922  Returns : value of _id_generation_func
1923  Args    : newvalue (optional)
1924
1925 =cut
1926
1927 sub _id_generation_func {
1928     my ($obj,$value) = @_;
1929     if ( defined $value ) {
1930         $obj->{'_id_generation_func'} = $value;
1931     }
1932     return $obj->{'_id_generation_func'};
1933 }
1934
1935 =head2 _ac_generation_func
1936
1937  Title   : _ac_generation_func
1938  Usage   : $obj->_ac_generation_func($newval)
1939  Function:
1940  Returns : value of _ac_generation_func
1941  Args    : newvalue (optional)
1942
1943 =cut
1944
1945 sub _ac_generation_func {
1946     my ($obj,$value) = @_;
1947     if ( defined $value ) {
1948         $obj->{'_ac_generation_func'} = $value;
1949     }
1950     return $obj->{'_ac_generation_func'};
1951 }
1952
1953 =head2 _sv_generation_func
1954
1955  Title   : _sv_generation_func
1956  Usage   : $obj->_sv_generation_func($newval)
1957  Function:
1958  Returns : value of _sv_generation_func
1959  Args    : newvalue (optional)
1960
1961 =cut
1962
1963 sub _sv_generation_func {
1964     my ($obj,$value) = @_;
1965     if ( defined $value ) {
1966         $obj->{'_sv_generation_func'} = $value;
1967     }
1968     return $obj->{'_sv_generation_func'};
1969 }
1970
1971 =head2 _kw_generation_func
1972
1973  Title   : _kw_generation_func
1974  Usage   : $obj->_kw_generation_func($newval)
1975  Function:
1976  Returns : value of _kw_generation_func
1977  Args    : newvalue (optional)
1978
1979 =cut
1980
1981 sub _kw_generation_func {
1982     my ($obj,$value) = @_;
1983     if ( defined $value ) {
1984         $obj->{'_kw_generation_func'} = $value;
1985     }
1986     return $obj->{'_kw_generation_func'};
1987 }
1988
1989 1;