maint: fix multiple typos identified by lintian
[bioperl-live.git] / Bio / SeqIO / bsml.pm
blob6e48cbcada8e4ac4aaa4d929282b7abd1fe95729
2 # BioPerl module for Bio::SeqIO::bsml
4 # Please direct questions and support issues to <bioperl-l@bioperl.org>
6 # Cared for by Charles Tilford (tilfordc@bms.com)
7 # Copyright (C) Charles Tilford 2001
9 # This library is free software; you can redistribute it and/or
10 # modify it under the terms of the GNU Lesser General Public
11 # License as published by the Free Software Foundation; either
12 # version 2.1 of the License, or (at your option) any later version.
14 # This library is distributed in the hope that it will be useful,
15 # but WITHOUT ANY WARRANTY; without even the implied warranty of
16 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
17 # Lesser General Public License for more details.
19 # You should have received a copy of the GNU Lesser General Public
20 # License along with this library; if not, write to the Free Software
21 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
22 # Also at: http://www.gnu.org/copyleft/lesser.html
24 # Much of the basic documentation in this module has been
25 # cut-and-pasted from the embl.pm (Ewan Birney) SeqIO module.
27 =head1 NAME
29 Bio::SeqIO::bsml - BSML sequence input/output stream
31 =head1 SYNOPSIS
33 It is probably best not to use this object directly, but rather go
34 through the SeqIO handler system. To read a BSML file:
36 $stream = Bio::SeqIO->new( -file => $filename, -format => 'bsml');
38 while ( my $bioSeqObj = $stream->next_seq() ) {
39 # do something with $bioSeqObj
42 To write a Seq object to the current file handle in BSML XML format:
44 $stream->write_seq( -seq => $seqObj);
46 If instead you would like a XML::DOM object containing the BSML, use:
48 my $newXmlObject = $stream->to_bsml( -seq => $seqObj);
50 =head1 DEPENDENCIES
52 In addition to parts of the Bio:: hierarchy, this module uses:
54 XML::DOM
56 =head1 DESCRIPTION
58 This object can transform Bio::Seq objects to and from BSML (XML)
59 flatfiles.
61 =head2 NOTE:
63 2/1/02 - I have changed the API to more closely match argument
64 passing used by other BioPerl methods ( -tag => value ). Internal
65 methods are using the same API, but you should not be calling those
66 anyway...
68 =head1 FEEDBACK
70 =head2 Mailing Lists
72 User feedback is an integral part of the evolution of this and other
73 Bioperl modules. Send your comments and suggestions preferably to one
74 of the Bioperl mailing lists. Your participation is much appreciated.
76 bioperl-l@bioperl.org - General discussion
77 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
79 =head2 Support
81 Please direct usage questions or support issues to the mailing list:
83 I<bioperl-l@bioperl.org>
85 rather than to the module maintainer directly. Many experienced and
86 reponsive experts will be able look at the problem and quickly
87 address it. Please include a thorough description of the problem
88 with code and data examples if at all possible.
90 =head2 Reporting Bugs
92 Report bugs to the Bioperl bug tracking system to help us keep track
93 the bugs and their resolution.
94 Bug reports can be submitted via the web:
96 https://github.com/bioperl/bioperl-live/issues
98 =head2 Things Still to Do
100 * The module now uses the new Collection.pm system. However,
101 Annotations associated with a Feature object still seem to use the
102 old system, so parsing with the old methods are included..
104 * Generate Seq objects with no sequence data but an assigned
105 length. This appears to be an issue with Bio::Seq. It is possible
106 (and reasonable) to make a BSML document with features but no
107 sequence data.
109 * Support <Seq-data-import>. Do not know how commonly this is used.
111 * Some features are awaiting implementation in later versions of
112 BSML. These include:
114 * Nested feature support
116 * Complex feature (ie joins)
118 * Unambiguity in strand (ie -1,0,1, not just 'complement' )
120 * More friendly dblink structures
122 * Location.pm (or RangeI::union?) appears to have a bug when 'expand'
123 is used.
125 * More intelligent hunting for sequence and feature titles? It is not
126 terribly clear where the most appropriate field is located, better
127 grepping (eg looking for a reasonable count for spaces and numbers)
128 may allow for titles better than "AE008041".
130 =head1 AUTHOR - Charles Tilford
132 Bristol-Myers Squibb Bioinformatics
134 Email tilfordc@bms.com
136 I have developed the BSML specific code for this package, but have used
137 code from other SeqIO packages for much of the nuts-and-bolts. In particular
138 I have used code from the embl.pm module either directly or as a framework
139 for many of the subroutines that are common to SeqIO modules.
141 =cut
143 package Bio::SeqIO::bsml;
144 use strict;
146 use Bio::SeqFeature::Generic;
147 use Bio::Species;
148 use XML::DOM;
149 use Bio::Seq::SeqFactory;
150 use Bio::Annotation::Collection;
151 use Bio::Annotation::Comment;
152 use Bio::Annotation::Reference;
153 use Bio::Annotation::DBLink;
155 use base qw(Bio::SeqIO);
157 my $idcounter = {}; # Used to generate unique id values
158 my $nvtoken = ": "; # The token used if a name/value pair has to be stuffed
159 # into a single line
161 =head1 METHODS
163 =cut
165 # LS: this seems to get overwritten on line 1317, generating a redefinition error.
166 # Dead code?
167 # CAT: This was inappropriately added in revision 1.10 - I added the check for
168 # existence of a sequence factory to the actual _initialize
169 # sub _initialize {
170 # my($self,@args) = @_;
171 # $self->SUPER::_initialize(@args);
172 # if( ! defined $self->sequence_factory ) {
173 # $self->sequence_factory(Bio::Seq::SeqFactory->new(-verbose => $self->verbose(),
174 # -type => 'Bio::Seq::RichSeq'));
178 =head2 next_seq
180 Title : next_seq
181 Usage : my $bioSeqObj = $stream->next_seq
182 Function: Retrieves the next sequence from a SeqIO::bsml stream.
183 Returns : A reference to a Bio::Seq::RichSeq object
184 Args :
186 =cut
188 sub next_seq {
189 my $self = shift;
190 my ($desc);
191 my $bioSeq = $self->sequence_factory->create(-verbose =>$self->verbose());
193 unless (exists $self->{'domtree'}) {
194 $self->throw("A BSML document has not yet been parsed.");
195 return;
197 my $dom = $self->{'domtree'};
198 my $seqElements = $dom->getElementsByTagName ("Sequence");
199 if ($self->{'current_node'} == $seqElements->getLength ) {
200 # There are no more <Sequence>s to process
201 return;
203 my $xmlSeq = $seqElements->item($self->{'current_node'});
205 # Assume that title attribute contains the best display id
206 if (my $val = $xmlSeq->getAttribute( "title")) {
207 $bioSeq->display_id($val);
210 # Set the molecule type
211 if (my $val = $xmlSeq->getAttribute( "molecule" )) {
212 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'aa' => 'protein');
213 $bioSeq->molecule($mol{ lc($val) });
216 # Set the accession number
217 if (my $val = $xmlSeq->getAttribute( "ic-acckey" )) {
218 $bioSeq->accession_number($val);
221 # Get the sequence data for the element
222 if (my $seqData = &FIRSTDATA($xmlSeq->getElementsByTagName("Seq-data")
223 ->item(0) ) ) {
224 # Sequence data exists, transfer to the Seq object
225 # Remove white space and CRs (not neccesary?)
226 $seqData =~ s/[\s\n\r]//g;
227 $bioSeq->seq($seqData);
228 } elsif (my $import = $xmlSeq->getElementsByTagName("Seq-dataimport")
229 ->item(0) ) {
230 #>>>> # What about <Seq-data-import> ??
232 } elsif (my $val = $xmlSeq->getAttribute("length")) {
233 # No sequence defined, set the length directly
235 #>>>> # This does not appear to work - length is apparently calculated
236 # from the sequence. How to make a "virtual" sequence??? Such
237 # creatures are common in BSML...
238 $bioSeq->length($val);
241 my $species = Bio::Species->new();
242 my @classification = ();
244 # Peruse the generic <Attributes> - those that are direct children of
245 # the <Sequence> or the <Feature-tables> element
246 # Sticky wicket here - data not controlled by schema, could be anything
247 my @seqDesc = ();
248 my %specs = ('common_name' => 'y',
249 'genus' => 'y',
250 'species' => 'y',
251 'sub_species' => 'y',
253 my %seqMap = (
254 'add_date' => [ qw(date date-created date-last-updated)],
255 'keywords' => [ 'keyword', ],
256 'seq_version' => [ 'version' ],
257 'division' => [ 'division' ],
258 'add_secondary_accession' => ['accession'],
259 'pid' => ['pid'],
260 'primary_id' => [ 'primary.id', 'primary_id' ],
262 my @links;
263 my $floppies = &GETFLOPPIES($xmlSeq);
264 for my $attr (@{$floppies}) {
265 # Don't want to get attributes from <Feature> or <Table> elements yet
266 my $parent = $attr->getParentNode->getNodeName;
267 next unless($parent eq "Sequence" || $parent eq "Feature-tables");
269 my ($name, $content) = &FLOPPYVALS($attr);
270 $name = lc($name);
271 if (exists $specs{$name}) { # It looks like part of species...
272 $species->$name($content);
273 next;
275 my $value = "";
276 # Cycle through the Seq methods:
277 for my $method (keys %seqMap) {
278 # Cycle through potential matching attributes:
279 for my $match (@{$seqMap{$method}}) {
280 # If the <Attribute> name matches one of the keys,
281 # set $value, unless it has already been set
282 $value ||= $content if ($name =~ /$match/i);
284 if ($value ne "") {
286 if( $method eq 'seq_version'&& $value =~ /\S+\.(\d+)/ ) {
287 # hack for the fact that data in version is actually
288 # ACCESSION.VERSION
289 ($value) = $1;
291 $bioSeq->$method($value);
292 last;
295 if( $name eq 'database-xref' ) {
296 my ($link_id,$link_db) = split(/:/,$value);
297 push @links, Bio::Annotation::DBLink->new(-primary_id => $link_id,
298 -database => $link_db);
300 next if ($value ne "");
302 if ($name =~ /^species$/i) { # Uh, it's the species designation?
303 if ($content =~ / /) {
304 # Assume that a full species name has been provided
305 # This will screw up if the last word is the subspecies...
306 my @break = split " ", $content;
307 @classification = reverse @break;
308 } else {
309 $classification[0] = $content;
311 next;
313 if ($name =~ /sub[_ ]?species/i) { # Should be the subspecies...
314 $species->sub_species( $content );
315 next;
317 if ($name =~ /classification/i) { # Should be species classification
318 # We will assume that there are spaces separating the terms:
319 my @bits = split " ", $content;
320 # Now make sure there is not other cruft as well (eg semi-colons)
321 for my $i (0..$#bits) {
322 $bits[$i] =~ /(\w+)/;
323 $bits[$i] = $1;
325 $species->classification( @bits );
326 next;
328 if ($name =~ /comment/) {
329 my $com = Bio::Annotation::Comment->new('-text' => $content);
330 # $bioSeq->annotation->add_Comment($com);
331 $bioSeq->annotation->add_Annotation('comment', $com);
332 next;
334 # Description line - collect all descriptions for later assembly
335 if ($name =~ /descr/) {
336 push @seqDesc, $content;
337 next;
339 # Ok, we have no idea what this attribute is. Dump to SimpleValue
340 my $simp = Bio::Annotation::SimpleValue->new( -value => $content);
341 $bioSeq->annotation->add_Annotation($name, $simp);
343 unless ($#seqDesc < 0) {
344 $bioSeq->desc( join "; ", @seqDesc);
347 #>>>> This should be modified so that any IDREF associated with the
348 # <Reference> is then used to associate the reference with the
349 # appropriate Feature
351 # Extract out <Reference>s associated with the sequence
352 my @refs;
353 my %tags = (
354 -title => "RefTitle",
355 -authors => "RefAuthors",
356 -location => "RefJournal",
358 for my $ref ( $xmlSeq->getElementsByTagName ("Reference") ) {
359 my %refVals;
360 for my $tag (keys %tags) {
361 my $rt = &FIRSTDATA($ref->getElementsByTagName($tags{$tag})->item(0));
362 next unless ($rt);
363 $rt =~ s/^[\s\r\n]+//; # Kill leading space
364 $rt =~ s/[\s\r\n]+$//; # Kill trailing space
365 $rt =~ s/[\s\r\n]+/ /; # Collapse internal space runs
366 $refVals{$tag} = $rt;
368 my $reference = Bio::Annotation::Reference->new( %refVals );
370 # Pull out any <Reference> information hidden in <Attributes>
371 my %refMap = (
372 comment => [ 'comment', 'remark' ],
373 medline => [ 'medline', ],
374 pubmed => [ 'pubmed' ],
375 start => [ 'start', 'begin' ],
376 end => [ 'stop', 'end' ],
378 my @refCom = ();
379 my $floppies = &GETFLOPPIES($ref);
380 for my $attr (@{$floppies}) {
381 my ($name, $content) = &FLOPPYVALS($attr);
382 my $value = "";
383 # Cycle through the Seq methods:
384 for my $method (keys %refMap) {
385 # Cycle through potential matching attributes:
386 for my $match (@{$refMap{$method}}) {
387 # If the <Attribute> name matches one of the keys,
388 # set $value, unless it has already been set
389 $value ||= $content if ($name =~ /$match/i);
391 if ($value ne "") {
392 my $str = '$reference->' . $method . "($value)";
393 eval($str);
394 next;
397 next if ($value ne "");
398 # Don't know what the <Attribute> is, dump it to comments:
399 push @refCom, $name . $nvtoken . $content;
401 unless ($#refCom < 0) {
402 # Random stuff was found, tack it to the comment field
403 my $exist = $reference->comment;
404 $exist .= join ", ", @refCom;
405 $reference->comment($exist);
407 push @refs, $reference;
409 $bioSeq->annotation->add_Annotation('reference' => $_) for @refs;
410 my $ann_col = $bioSeq->annotation;
411 # Extract the <Feature>s for this <Sequence>
412 for my $feat ( $xmlSeq->getElementsByTagName("Feature") ) {
413 $bioSeq->add_SeqFeature( $self->_parse_bsml_feature($feat) );
416 $species->classification( @classification );
417 $bioSeq->species( $species );
419 $bioSeq->annotation->add_Annotation('dblink' => $_) for @links;
421 $self->{'current_node'}++;
422 return $bioSeq;
424 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
425 # Get all the <Attribute> and <Qualifier> children for an object, and
426 # return them as an array reference
427 # ('floppy' since these elements have poor/no schema control)
428 sub GETFLOPPIES {
429 my $obj = shift;
431 my @floppies;
432 my $attributes = $obj->getElementsByTagName ("Attribute");
433 for (my $i = 0; $i < $attributes->getLength; $i++) {
434 push @floppies, $attributes->item($i);
436 my $qualifiers = $obj->getElementsByTagName ("Qualifier");
437 for (my $i = 0; $i < $qualifiers->getLength; $i++) {
438 push @floppies, $qualifiers->item($i);
440 return \@floppies;
442 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
443 # Given a DOM <Attribute> or <Qualifier> object, return the [name, value] pair
444 sub FLOPPYVALS {
445 my $obj = shift;
447 my ($name, $value);
448 if ($obj->getNodeName eq "Attribute") {
449 $name = $obj->getAttribute('name');
450 $value = $obj->getAttribute('content');
451 } elsif ($obj->getNodeName eq "Qualifier") {
452 # Wheras <Attribute>s require both 'name' and 'content' attributes,
453 # <Qualifier>s can technically have either blank (and sometimes do)
454 my $n = $obj->getAttribute('value-type');
455 $name = $n if ($n ne "");
456 my $v = $obj->getAttribute('value');
457 $value = $v if ($v ne "");
459 return ($name, $value);
461 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
462 # Returns the value of the first TEXT_NODE encountered below an element
463 # Rational - avoid grabbing a comment rather than the PCDATA. Not foolproof...
464 sub FIRSTDATA {
465 my $element = shift;
466 return unless ($element);
468 my $hopefuls = $element->getChildNodes;
469 my $data;
470 for (my $i = 0; $i < $hopefuls->getLength; $i++) {
471 if ($hopefuls->item($i)->getNodeType ==
472 XML::DOM::Node::TEXT_NODE() ) {
473 $data = $hopefuls->item($i)->getNodeValue;
474 last;
477 return $data;
479 #- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
480 # Just collapses whitespace runs in a string
481 sub STRIP {
482 my $string = shift;
483 $string =~ s/[\s\r\n]+/ /g;
484 return $string;
487 =head2 to_bsml
489 Title : to_bsml
490 Usage : my $domDoc = $obj->to_bsml(@args)
491 Function: Generates an XML structure for one or more Bio::Seq objects.
492 If $seqref is an array ref, the XML tree generated will include
493 all the sequences in the array.
494 Returns : A reference to the XML DOM::Document object generated / modified
495 Args : Argument array in form of -key => val. Recognized keys:
497 -seq A Bio::Seq reference, or an array reference of many of them
499 -xmldoc Specifies an existing XML DOM document to add the sequences
500 to. If included, then only data (no page formatting) will
501 be added. If not, a new XML::DOM::Document will be made,
502 and will be populated with both <Sequence> data, as well as
503 <Page> display elements.
505 -nodisp Do not generate <Display> elements, or any children
506 thereof, even if -xmldoc is not set.
508 -skipfeat If set to 'all', all <Feature>s will be skipped. If it is
509 a hash reference, any <Feature> with a class matching a key
510 in the hash will be skipped - for example, to skip 'source'
511 and 'score' features, use:
513 -skipfeat => { source => 'Y', score => 'Y' }
515 -skiptags As above: if set to 'all', no tags are included, and if a
516 hash reference, those specific tags will be ignored.
518 Skipping some or all tags and features can result in
519 noticeable speed improvements.
521 -nodata If true, then <Seq-data> will not be included. This may be
522 useful if you just want annotations and do not care about
523 the raw ACTG information.
525 -return Default is 'xml', which will return a reference to the BSML
526 XML object. If set to 'seq' will return an array ref of the
527 <Sequence> objects added (rather than the whole XML object)
529 -close Early BSML browsers will crash if an element *could* have
530 children but does not, and is closed as an empty element
531 e.g. <Styles/>. If -close is true, then such tags are given
532 a comment child to explicitly close them e.g. <Styles><!--
533 --></Styles>. This is default true, set to "0" if you do
534 not want this behavior.
536 Examples : my $domObj = $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
537 -skipfeat => { source => 1 },
540 # Or add sequences to an existing BSML document:
541 $stream->to_bsml( -seq => \@fourCoolSequenceObjects,
542 -skipfeat => { source => 1 },
543 -xmldoc => $myBsmlDocumentInProgress, );
545 =cut
547 sub to_bsml {
548 my $self = shift;
549 my $args = $self->_parseparams( -close => 1,
550 -return => 'xml',
551 @_);
552 $args->{NODISP} ||= $args->{NODISPLAY};
553 my $seqref = $args->{SEQ};
554 $seqref = (ref($seqref) eq 'ARRAY') ? $seqref : [ $seqref ];
556 #############################
557 # Basic BSML XML Components #
558 #############################
560 my $xml;
561 my ($bsmlElem, $defsElem, $seqsElem, $dispElem);
562 if ($args->{XMLDOC}) {
563 # The user has provided an existing XML DOM object
564 $xml = $args->{XMLDOC};
565 unless ($xml->isa("XML::DOM::Document")) {
566 $self->throw('SeqIO::bsml.pm error:\n'.
567 'When calling ->to_bsml( { xmldoc => $myDoc }), $myDoc \n' .
568 'should be an XML::DOM::Document object, or an object that\n'.
569 'inherits from that class (like BsmlHelper.pm)');
571 } else {
572 # The user has not provided a new document, make one from scratch
573 $xml = XML::DOM::Document->new();
574 $xml->setXMLDecl( $xml->createXMLDecl("1.0") );
575 my $url = "http://www.labbook.com/dtd/bsml2_2.dtd";
576 my $doc = $xml->createDocumentType("Bsml",$url);
577 $xml->setDoctype($doc);
578 $bsmlElem = $self->_addel( $xml, 'Bsml');
579 $defsElem = $self->_addel( $bsmlElem, 'Definitions');
580 $seqsElem = $self->_addel( $defsElem, 'Sequences');
581 unless ($args->{NODISP}) {
582 $dispElem = $self->_addel( $bsmlElem, 'Display');
583 my $stylElem = $self->_addel( $dispElem, 'Styles');
584 my $style = $self->_addel( $stylElem, 'Style', {
585 type => "text/css" });
586 my $styleText =
587 qq(Interval-widget { display : "1"; }\n) .
588 qq(Feature { display-auto : "1"; });
589 $style->appendChild( $xml->createTextNode($styleText) );
593 # Establish fundamental BSML elements, if they do not already exist
594 $bsmlElem ||= $xml->getElementsByTagName("Bsml")->item(0);
595 $defsElem ||= $xml->getElementsByTagName("Definitions")->item(0);
596 $seqsElem ||= $xml->getElementsByTagName("Sequences")->item(0);
598 ###############
599 # <Sequences> #
600 ###############
602 # Map over Bio::Seq to BSML
603 my %mol = ('dna' => 'DNA', 'rna' => 'RNA', 'protein' => 'AA');
604 my @xmlSequences;
606 for my $bioSeq (@{$seqref}) {
607 my $xmlSeq = $xml->createElement("Sequence");
608 my $FTs = $xml->createElement("Feature-tables");
610 # Array references to hold <Reference> objects:
611 my $seqRefs = []; my $featRefs = [];
612 # Array references to hold <Attribute> values (not objects):
613 my $seqDesc = [];
614 push @{$seqDesc},
616 "comment",
617 "This file generated to BSML 2.2 standards - " .
618 "joins will be collapsed to a single feature enclosing all members of the join"
620 push @{$seqDesc}, ["description" , eval{$bioSeq->desc}];
621 for my $kwd ( eval{$bioSeq->get_keywords} ) {
622 push @{$seqDesc}, ["keyword" , $kwd];
624 push @{$seqDesc}, ["keyword" , eval{$bioSeq->keywords}];
625 push @{$seqDesc}, ["version" , eval{
626 join(".", $bioSeq->accession_number, $bioSeq->seq_version); }];
627 push @{$seqDesc}, ["division" , eval{$bioSeq->division}];
628 push @{$seqDesc}, ["pid" , eval{$bioSeq->pid}];
629 # push @{$seqDesc}, ["bio_object" , ref($bioSeq)];
630 push @{$seqDesc}, ["primary_id" , eval{$bioSeq->primary_id}];
631 for my $dt (eval{$bioSeq->get_dates()} ) {
632 push @{$seqDesc}, ["date" , $dt];
634 for my $ac (eval{$bioSeq->get_secondary_accessions()} ) {
635 push @{$seqDesc}, ["secondary_accession" , $ac];
638 # Determine the accession number and a unique identifier
639 my $acc = $bioSeq->accession_number eq "unknown" ?
640 "" : $bioSeq->accession_number;
641 my $id;
642 my $pi = $bioSeq->primary_id;
643 if ($pi && $pi !~ /Bio::/) {
644 # Not sure I understand what primary_id is... It sometimes
645 # is a string describing a reference to a BioSeq object...
646 $id = "SEQ" . $bioSeq->primary_id;
647 } else {
648 # Nothing useful found, make a new unique ID
649 $id = $acc || ("SEQ-io" . $idcounter->{Sequence}++);
651 # print "$id->",ref($bioSeq->primary_id),"\n";
652 # An id field with spaces is interpreted as an idref - kill the spaces
653 $id =~ s/ /-/g;
654 # Map over <Sequence> attributes
655 my %attr = ( 'title' => $bioSeq->display_id,
656 'length' => $bioSeq->length,
657 'ic-acckey' => $acc,
658 'id' => $id,
659 'representation' => 'raw',
661 $attr{molecule} = $mol{ lc($bioSeq->molecule) } if $bioSeq->can('molecule');
664 for my $a (keys %attr) {
665 $xmlSeq->setAttribute($a, $attr{$a}) if (defined $attr{$a} &&
666 $attr{$a} ne "");
668 # Orphaned Attributes:
669 $xmlSeq->setAttribute('topology', 'circular')
670 if ($bioSeq->is_circular);
671 # <Sequence> strand, locus
673 $self->_add_page($xml, $xmlSeq) if ($dispElem);
674 ################
675 # <Attributes> #
676 ################
678 # Check for Bio::Annotations on the * <Sequence> *.
679 $self->_parse_annotation( -xml => $xml, -obj => $bioSeq,
680 -desc => $seqDesc, -refs => $seqRefs);
682 # Incorporate species data
683 if (ref($bioSeq->species) eq 'Bio::Species') {
684 # Need to peer into Bio::Species ...
685 my @specs = ('common_name', 'genus', 'species', 'sub_species');
686 for my $sp (@specs) {
687 next unless (my $val = $bioSeq->species()->$sp());
688 push @{$seqDesc}, [$sp , $val];
690 push @{$seqDesc}, ['classification',
691 (join " ", $bioSeq->species->classification) ];
692 # Species::binomial will return "genus species sub_species" ...
693 } elsif (my $val = $bioSeq->species) {
694 # Ok, no idea what it is, just dump it in there...
695 push @{$seqDesc}, ["species", $val];
698 # Add the description <Attribute>s for the <Sequence>
699 for my $seqD (@{$seqDesc}) {
700 $self->_addel($xmlSeq, "Attribute", {
701 name => $seqD->[0], content => $seqD->[1]}) if ($seqD->[1]);
704 # If sequence references were added, make a Feature-table for them
705 unless ($#{$seqRefs} < 0) {
706 my $seqFT = $self->_addel($FTs, "Feature-table", {
707 title => "Sequence References", });
708 for my $feat (@{$seqRefs}) {
709 $seqFT->appendChild($feat);
713 # This is the appropriate place to add <Feature-tables>
714 $xmlSeq->appendChild($FTs);
716 #############
717 # <Feature> #
718 #############
720 #>>>> # Perhaps it is better to loop through top_Seqfeatures?...
721 #>>>> # ...however, BSML does not have a hierarchy for Features
723 if (defined $args->{SKIPFEAT} &&
724 $args->{SKIPFEAT} eq 'all') {
725 $args->{SKIPFEAT} = { all => 1};
726 } else { $args->{SKIPFEAT} ||= {} }
727 for my $class (keys %{$args->{SKIPFEAT}}) {
728 $args->{SKIPFEAT}{lc($class)} = $args->{SKIPFEAT}{$class};
730 # Loop through all the features
731 my @features = $bioSeq->all_SeqFeatures();
732 if (@features && !$args->{SKIPFEAT}{all}) {
733 my $ft = $self->_addel($FTs, "Feature-table", {
734 title => "Features", });
735 for my $bioFeat (@features ) {
736 my $featDesc = [];
737 my $class = lc($bioFeat->primary_tag);
738 # The user may have specified to ignore this type of feature
739 next if ($args->{SKIPFEAT}{$class});
740 my $id = "FEAT-io" . $idcounter->{Feature}++;
741 my $xmlFeat = $self->_addel( $ft, 'Feature', {
742 'id' => $id,
743 'class' => $class ,
744 'value-type' => $bioFeat->source_tag });
745 # Check for Bio::Annotations on the * <Feature> *.
746 $self->_parse_annotation( -xml => $xml, -obj => $bioFeat,
747 -desc => $featDesc, -id => $id,
748 -refs =>$featRefs, );
749 # Add the description stuff for the <Feature>
750 for my $de (@{$featDesc}) {
751 $self->_addel($xmlFeat, "Attribute", {
752 name => $de->[0], content => $de->[1]}) if ($de->[1]);
754 $self->_parse_location($xml, $xmlFeat, $bioFeat);
756 # loop through the tags, add them as <Qualifiers>
757 next if (defined $args->{SKIPTAGS} &&
758 $args->{SKIPTAGS} =~ /all/i);
759 # Tags can consume a lot of CPU cycles, and can often be
760 # rather non-informative, so -skiptags can allow one total or
761 # selective omission of tags.
762 for my $tag ($bioFeat->all_tags()) {
763 next if (exists $args->{SKIPTAGS}{$tag});
764 for my $val ($bioFeat->each_tag_value($tag)) {
765 $self->_addel( $xmlFeat, 'Qualifier', {
766 'value-type' => $tag ,
767 'value' => $val });
773 ##############
774 # <Seq-data> #
775 ##############
777 # Add sequence data
778 if ( (my $data = $bioSeq->seq) && !$args->{NODATA} ) {
779 my $d = $self->_addel($xmlSeq, 'Seq-data');
780 $d->appendChild( $xml->createTextNode($data) );
783 # If references were added, make a Feature-table for them
784 unless ($#{$featRefs} < 0) {
785 my $seqFT = $self->_addel($FTs, "Feature-table", {
786 title => "Feature References", });
787 for my $feat (@{$featRefs}) {
788 $seqFT->appendChild($feat);
792 # Place the completed <Sequence> tree as a child of <Sequences>
793 $seqsElem->appendChild($xmlSeq);
794 push @xmlSequences, $xmlSeq;
797 # Prevent browser crashes by explicitly closing empty elements:
798 if ($args->{CLOSE}) {
799 my @problemChild = ('Sequences', 'Sequence', 'Feature-tables',
800 'Feature-table', 'Screen', 'View',);
801 for my $kid (@problemChild) {
802 for my $prob ($xml->getElementsByTagName($kid)) {
803 unless ($prob->hasChildNodes) {
804 $prob->appendChild(
805 $xml->createComment(" Must close <$kid> explicitly "));
811 if (defined $args->{RETURN} &&
812 $args->{RETURN} =~ /seq/i) {
813 return \@xmlSequences;
814 } else {
815 return $xml;
819 =head2 write_seq
821 Title : write_seq
822 Usage : $obj->write_seq(@args)
823 Function: Prints out an XML structure for one or more Bio::Seq objects.
824 If $seqref is an array ref, the XML tree generated will include
825 all the sequences in the array. This method is fairly simple,
826 most of the processing is performed within to_bsml.
827 Returns : A reference to the XML object generated / modified
828 Args : Argument array. Recognized keys:
830 -seq A Bio::Seq reference, or an array reference of many of them
832 Alternatively, the method may be called simply as...
834 $obj->write_seq( $bioseq )
836 ... if only a single argument is passed, it is assumed that
837 it is the sequence object (can also be an array ref of
838 many Seq objects )
840 -printmime If true prints "Content-type: $mimetype\n\n" at top of
841 document, where $mimetype is the value designated by this
842 key. For generic XML use text/xml, for BSML use text/x-bsml
844 -return This option will be suppressed, since the nature of this
845 method is to print out the XML document. If you wish to
846 retrieve the <Sequence> objects generated, use the to_bsml
847 method directly.
849 =cut
851 sub write_seq {
852 my $self = shift;
853 my $args = $self->_parseparams( @_);
854 if ($#_ == 0 ) {
855 # If only a single value is passed, assume it is the seq object
856 unshift @_, "-seq";
858 # Build a BSML XML DOM object based on the sequence(s)
859 my $xml = $self->to_bsml( @_,
860 -return => undef );
861 # Convert to a string
862 my $out = $xml->toString;
863 # Print after putting a return after each element - more readable
864 $out =~ s/>/>\n/g;
865 $self->_print("Content-type: " . $args->{PRINTMIME} . "\n\n")
866 if ($args->{PRINTMIME});
867 $self->_print( $out );
868 # Return the DOM tree in case the user wants to do something with it
870 $self->flush if $self->_flush_on_write && defined $self->_fh;
871 return $xml;
874 =head1 INTERNAL METHODS
875 #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-
877 The following methods are used for internal processing, and should probably
878 not be accessed by the user.
880 =head2 _parse_location
882 Title : _parse_location
883 Usage : $obj->_parse_location($xmlDocument, $parentElem, $SeqFeatureObj)
884 Function: Adds <Interval-loc> and <Site-loc> children to <$parentElem> based
885 on locations / sublocations found in $SeqFeatureObj. If
886 sublocations exist, the original location will be ignored.
887 Returns : An array ref containing the elements added to the parent.
888 These will have already been added to <$parentElem>
889 Args : 0 The DOM::Document being modified
890 1 The DOM::Element parent that you want to add to
891 2 Reference to the Bio::SeqFeature being analyzed
893 =cut
895 ###############################
896 # <Interval-loc> & <Site-loc> #
897 ###############################
899 sub _parse_location {
900 my $self = shift;
901 my ($xml, $xmlFeat, $bioFeat) = @_;
902 my $bioLoc = $bioFeat->location;
903 my @locations;
904 if (ref($bioLoc) =~ /Split/) {
905 @locations = $bioLoc->sub_Location;
906 # BSML 2.2 does not recognize / support joins. For this reason,
907 # we will just use the upper-level location. The line below can
908 # be deleted or commented out if/when BSML 3 supports complex
909 # interval deffinitions:
910 @locations = ($bioLoc);
911 } else {
912 @locations = ($bioLoc);
914 my @added = ();
916 # Add the site or interval positional information:
917 for my $loc (@locations) {
918 my ($start, $end) = ($loc->start, $loc->end);
919 my %locAttr;
920 # Strand information is not well described in BSML
921 $locAttr{complement} = 1 if ($loc->strand == -1);
922 if ($start ne "" && ($start == $end || $end eq "")) {
923 $locAttr{sitepos} = $start;
924 push @added, $self->_addel($xmlFeat,'Site-loc',\%locAttr);
925 } elsif ($start ne "" && $end ne "") {
926 if ($start > $end) {
927 # The feature is on the complementary strand
928 ($start, $end) = ($end, $start);
929 $locAttr{complement} = 1;
931 $locAttr{startpos} = $start;
932 $locAttr{endpos} = $end;
933 push @added, $self->_addel($xmlFeat,'Interval-loc',\%locAttr);
934 } else {
935 warn "Failure to parse SeqFeature location. Start = '$start' & End = '$end'";
938 return \@added;
941 =head2 _parse_bsml_feature
943 Title : _parse_bsml_feature
944 Usage : $obj->_parse_bsml_feature($xmlFeature )
945 Function: Will examine the <Feature> element provided by $xmlFeature and
946 return a generic seq feature.
947 Returns : Bio::SeqFeature::Generic
948 Args : 0 XML::DOM::Element <Feature> being analyzed.
950 =cut
952 sub _parse_bsml_feature {
953 my $self = shift;
954 my ($feat) = @_;
956 my $basegsf = Bio::SeqFeature::Generic->new();
957 # score
958 # frame
959 # source_tag
961 # Use the class as the primary tag value, if it is present
962 if ( my $val = $feat->getAttribute("class") ) {
963 $basegsf->primary_tag($val);
966 # Positional information is in <Interval-loc>s or <Site-loc>s
967 # We need to grab these in order, to try to recreate joins...
968 my @locations = ();
969 for my $kid ($feat->getChildNodes) {
970 my $nodeName = $kid->getNodeName;
971 next unless ($nodeName eq "Interval-loc" ||
972 $nodeName eq "Site-loc");
973 push @locations, $kid;
975 if ($#locations == 0) {
976 # There is only one location specified
977 $self->_parse_bsml_location($locations[0], $basegsf);
978 } elsif ($#locations > 0) {
979 #>>>> # This is not working, I think the error is somewhere downstream
980 # of add_sub_SeqFeature, probably in RangeI::union ?
981 # The sub features are added fine, but the EXPANDed parent feature
982 # location has a messed up start - Bio::SeqFeature::Generic ref
983 # instead of an integer - and an incorrect end - the end of the first
984 # sub feature added, not of the union of all of them.
986 # Also, the SeqIO::genbank.pm output is odd - the sub features appear
987 # to be listed with the *previous* feature, not this one.
989 for my $location (@locations) {
990 my $subgsf = $self->_parse_bsml_location($location);
991 # print "start ", $subgsf->start,"\n";
992 # print "end ", $subgsf->end,"\n";
993 $basegsf->add_sub_SeqFeature($subgsf, 'EXPAND');
995 # print $feat->getAttribute('id'),"\n";
996 # print $basegsf->primary_tag,"\n";
998 } else {
999 # What to do if there are no locations? Nothing needed?
1002 # Look at any <Attribute>s or <Qualifier>s that are present:
1003 my $floppies = &GETFLOPPIES($feat);
1004 for my $attr (@{$floppies}) {
1005 my ($name, $content) = &FLOPPYVALS($attr);
1006 # Don't know what the object is, dump it to a tag:
1007 $basegsf->add_tag_value(lc($name), $content);
1010 # Mostly this helps with debugging, but may be of utility...
1011 # Add a tag holding the BSML id value
1012 if ( (my $val = $feat->getAttribute('id')) &&
1013 !$basegsf->has_tag('bsml-id')) {
1014 # Decided that this got a little sloppy...
1015 # $basegsf->add_tag_value("bsml-id", $val);
1017 return $basegsf;
1020 =head2 _parse_bsml_location
1022 Title : _parse_bsml_location
1023 Usage : $obj->_parse_bsml_feature( $intOrSiteLoc, $gsfObject )
1024 Function: Will examine the <Interval-loc> or <Site-loc> element provided
1025 Returns : Bio::SeqFeature::Generic
1026 Args : 0 XML::DOM::Element <Interval/Site-loc> being analyzed.
1027 1 Optional SeqFeature::Generic to use
1029 =cut
1031 sub _parse_bsml_location {
1032 my $self = shift;
1033 my ($loc, $gsf) = @_;
1035 $gsf ||= Bio::SeqFeature::Generic->new();
1036 my $type = $loc->getNodeName;
1037 my ($start, $end);
1038 if ($type eq 'Interval-loc') {
1039 $start = $loc->getAttribute('startpos');
1040 $end = $loc->getAttribute('endpos');
1041 } elsif ($type eq 'Site-loc') {
1042 $start = $end = $loc->getAttribute('sitepos');
1043 } else {
1044 warn "Unknown location type '$type', could not make GSF\n";
1045 return;
1047 $gsf->start($start);
1048 $gsf->end($end);
1050 # BSML does not have an explicit method to set undefined strand
1051 if (my $s = $loc->getAttribute("complement")) {
1052 if ($s) {
1053 $gsf->strand(-1);
1054 } else {
1055 $gsf->strand(1);
1057 } else {
1058 # We're setting "strand nonspecific" here - bad idea?
1059 # In most cases the user likely meant it to be on the + strand
1060 $gsf->strand(0);
1063 return $gsf;
1066 =head2 _parse_reference
1068 Title : _parse_reference
1069 Usage : $obj->_parse_reference(@args )
1070 Function: Makes a new <Reference> object from a ::Reference, which is
1071 then stored in an array provide by -refs. It will be
1072 appended to the XML tree later.
1073 Returns :
1074 Args : Argument array. Recognized keys:
1076 -xml The DOM::Document being modified
1078 -refobj The Annotation::Reference Object
1080 -refs An array reference to hold the new <Reference> DOM object
1082 -id Optional. If the XML id for the 'calling' element is
1083 provided, it will be placed in any <Reference> refs
1084 attribute.
1086 =cut
1088 sub _parse_reference {
1089 my $self = shift;
1090 my $args = $self->_parseparams( @_);
1091 my ($xml, $ref, $refRef) = ($args->{XML}, $args->{REFOBJ}, $args->{REFS});
1093 ###############
1094 # <Reference> #
1095 ###############
1097 my $xmlRef = $xml->createElement("Reference");
1098 #>> This may not be the right way to make a BSML dbxref...
1099 if (my $link = $ref->medline) {
1100 $xmlRef->setAttribute('dbxref', $link);
1103 # Make attributes for some of the characteristics
1104 my %stuff = ( start => $ref->start,
1105 end => $ref->end,
1106 rp => $ref->rp,
1107 comment => $ref->comment,
1108 pubmed => $ref->pubmed,
1110 for my $s (keys %stuff) {
1111 $self->_addel($xmlRef, "Attribute", {
1112 name => $s, content => $stuff{$s} }) if ($stuff{$s});
1114 $xmlRef->setAttribute('refs', $args->{ID}) if ($args->{ID});
1115 # Add the basic information
1116 # Should probably check for content before creation...
1117 $self->_addel($xmlRef, "RefAuthors")->
1118 appendChild( $xml->createTextNode(&STRIP($ref->authors)) );
1119 $self->_addel($xmlRef, "RefTitle")->
1120 appendChild( $xml->createTextNode(&STRIP($ref->title)) );
1121 $self->_addel($xmlRef, "RefJournal")->
1122 appendChild( $xml->createTextNode(&STRIP($ref->location)) );
1123 # References will be added later in a <Feature-Table>
1124 push @{$refRef}, $xmlRef;
1127 =head2 _parse_annotation
1129 Title : _parse_annotation
1130 Usage : $obj->_parse_annotation(@args )
1131 Function: Will examine any Annotations found in -obj. Data found in
1132 ::Comment and ::DBLink structures, as well as Annotation
1133 description fields are stored in -desc for later
1134 generation of <Attribute>s. <Reference> objects are generated
1135 from ::References, and are stored in -refs - these will
1136 be appended to the XML tree later.
1137 Returns :
1138 Args : Argument array. Recognized keys:
1140 -xml The DOM::Document being modified
1142 -obj Reference to the Bio object being analyzed
1144 -descr An array reference for holding description text items
1146 -refs An array reference to hold <Reference> DOM objects
1148 -id Optional. If the XML id for the 'calling' element is
1149 provided, it will be placed in any <Reference> refs
1150 attribute.
1152 =cut
1154 sub _parse_annotation {
1155 my $self = shift;
1156 my $args = $self->_parseparams( @_);
1157 my ($xml, $obj, $descRef, $refRef) =
1158 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1159 # No good place to put any of this (except for references). Most stuff
1160 # just gets dumped to <Attribute>s
1161 my $ann = $obj->annotation;
1162 return unless ($ann);
1163 # use BMS::Branch; my $debug = BMS::Branch->new( ); warn "$obj :"; $debug->branch($ann);
1164 unless (ref($ann) =~ /Collection/) {
1165 # Old style annotation. It seems that Features still use this
1166 # form of object
1167 $self->_parse_annotation_old(@_);
1168 return;
1171 for my $key ($ann->get_all_annotation_keys()) {
1172 for my $thing ($ann->get_Annotations($key)) {
1173 if ($key eq 'description') {
1174 push @{$descRef}, ["description" , $thing->value];
1175 } elsif ($key eq 'comment') {
1176 push @{$descRef}, ["comment" , $thing->text];
1177 } elsif ($key eq 'dblink') {
1178 # DBLinks get dumped to attributes, too
1179 push @{$descRef}, ["db_xref" , $thing->database . ":"
1180 . $thing->primary_id ];
1181 if (my $com = $thing->comment) {
1182 push @{$descRef}, ["link" , $com->text ];
1185 } elsif ($key eq 'reference') {
1186 $self->_parse_reference( @_, -refobj => $thing );
1187 } elsif (ref($thing) =~ /SimpleValue/) {
1188 push @{$descRef}, [$key , $thing->value];
1189 } else {
1190 # What is this??
1191 push @{$descRef}, ["error", "bsml.pm did not understand ".
1192 "'$key' = '$thing'" ];
1198 =head2 _parse_annotation_old
1200 Title : _parse_annotation_old
1201 Usage : $obj->_parse_annotation_old(@args)
1202 Function: As above, but for the old Annotation system.
1203 Apparently needed because Features are still using the old-style
1204 annotations?
1205 Returns :
1206 Args : Argument array. Recognized keys:
1208 -xml The DOM::Document being modified
1210 -obj Reference to the Bio object being analyzed
1212 -descr An array reference for holding description text items
1214 -refs An array reference to hold <Reference> DOM objects
1216 -id Optional. If the XML id for the 'calling' element is
1217 provided, it will be placed in any <Reference> refs
1218 attribute.
1220 =cut
1222 ###############
1223 # <Reference> #
1224 ###############
1226 sub _parse_annotation_old {
1227 my $self = shift;
1228 my $args = $self->_parseparams( @_);
1229 my ($xml, $obj, $descRef, $refRef) =
1230 ( $args->{XML}, $args->{OBJ}, $args->{DESC}, $args->{REFS} );
1231 # No good place to put any of this (except for references). Most stuff
1232 # just gets dumped to <Attribute>s
1233 if (my $ann = $obj->annotation) {
1234 push @{$descRef}, ["annotation", $ann->description];
1235 for my $com ($ann->each_Comment) {
1236 push @{$descRef}, ["comment" , $com->text];
1239 # Gene names just get dumped to <Attribute name="gene">
1240 for my $gene ($ann->each_gene_name) {
1241 push @{$descRef}, ["gene" , $gene];
1244 # DBLinks get dumped to attributes, too
1245 for my $link ($ann->each_DBLink) {
1246 push @{$descRef}, ["db_xref" ,
1247 $link->database . ":" . $link->primary_id ];
1248 if (my $com = $link->comment) {
1249 push @{$descRef}, ["link" , $com->text ];
1253 # References get produced and temporarily held
1254 for my $ref ($ann->each_Reference) {
1255 $self->_parse_reference( @_, -refobj => $ref );
1260 =head2 _add_page
1262 Title : _add_page
1263 Usage : $obj->_add_page($xmlDocument, $xmlSequenceObject)
1264 Function: Adds a simple <Page> and <View> structure for a <Sequence>
1265 Returns : a reference to the newly created <Page>
1266 Args : 0 The DOM::Document being modified
1267 1 Reference to the <Sequence> object
1269 =cut
1271 sub _add_page {
1272 my $self = shift;
1273 my ($xml, $seq) = @_;
1274 my $disp = $xml->getElementsByTagName("Display")->item(0);
1275 my $page = $self->_addel($disp, "Page");
1276 my ($width, $height) = ( 7.8, 5.5);
1277 my $screen = $self->_addel($page, "Screen", {
1278 width => $width, height => $height, });
1279 # $screen->appendChild($xml->createComment("Must close explicitly"));
1280 my $view = $self->_addel($page, "View", {
1281 seqref => $seq->getAttribute('id'),
1282 title => $seq->getAttribute('title'),
1283 title1 => "{NAME}",
1284 title2 => "{LENGTH} {UNIT}",
1286 $self->_addel($view, "View-line-widget", {
1287 shape => 'horizontal',
1288 hcenter => $width/2 + 0.7,
1289 'linear-length' => $width - 2,
1291 $self->_addel($view, "View-axis-widget");
1292 return $page;
1296 =head2 _addel
1298 Title : _addel
1299 Usage : $obj->_addel($parentElem, 'ChildName',
1300 { anAttr => 'someValue', anotherAttr => 'aValue',})
1301 Function: Add an element with attribute values to a DOM tree
1302 Returns : a reference to the newly added element
1303 Args : 0 The DOM::Element parent that you want to add to
1304 1 The name of the new child element
1305 2 Optional hash reference containing
1306 attribute name => attribute value assignments
1308 =cut
1310 sub _addel {
1311 my $self = shift;
1312 my ($root, $name, $attr) = @_;
1314 # Find the DOM::Document for the parent
1315 my $doc = $root->getOwnerDocument || $root;
1316 my $elem = $doc->createElement($name);
1317 for my $a (keys %{$attr}) {
1318 $elem->setAttribute($a, $attr->{$a});
1320 $root->appendChild($elem);
1321 return $elem;
1324 =head2 _show_dna
1326 Title : _show_dna
1327 Usage : $obj->_show_dna($newval)
1328 Function: (cut-and-pasted directly from embl.pm)
1329 Returns : value of _show_dna
1330 Args : newvalue (optional)
1332 =cut
1334 sub _show_dna {
1335 my $obj = shift;
1336 if( @_ ) {
1337 my $value = shift;
1338 $obj->{'_show_dna'} = $value;
1340 return $obj->{'_show_dna'};
1343 =head2 _initialize
1345 Title : _initialize
1346 Usage : $dom = $obj->_initialize(@args)
1347 Function: Coppied from embl.pm, and augmented with initialization of the
1348 XML DOM tree
1349 Returns :
1350 Args : -file => the XML file to be parsed
1352 =cut
1354 sub _initialize {
1355 my($self,@args) = @_;
1357 $self->SUPER::_initialize(@args);
1358 # hash for functions for decoding keys.
1359 $self->{'_func_ftunit_hash'} = {};
1360 $self->_show_dna(1); # sets this to one by default. People can change it
1362 my %param = @args; # From SeqIO.pm
1363 @param{ map { lc $_ } keys %param } = values %param; # lowercase keys
1364 if ( exists $param{-file} && $param{-file} !~ /^>/) {
1365 # Is it blasphemy to add your own keys to an object in another package?
1366 # domtree => the parsed DOM tree retruned by XML::DOM
1367 $self->{'domtree'} = $self->_parse_xml( $param{-file} );
1368 # current_node => the <Sequence> node next in line for next_seq
1369 $self->{'current_node'} = 0;
1372 $self->sequence_factory( Bio::Seq::SeqFactory->new
1373 ( -verbose => $self->verbose(),
1374 -type => 'Bio::Seq::RichSeq'))
1375 if( ! defined $self->sequence_factory );
1379 =head2 _parseparams
1381 Title : _parseparams
1382 Usage : my $paramHash = $obj->_parseparams(@args)
1383 Function: Borrowed from Bio::Parse.pm, who borrowed it from CGI.pm
1384 Lincoln Stein -> Richard Resnick -> here
1385 Returns : A hash reference of the parameter keys (uppercase) pointing to
1386 their values.
1387 Args : An array of key, value pairs. Easiest to pass values as:
1388 -key1 => value1, -key2 => value2, etc
1389 Leading "-" are removed.
1391 =cut
1393 sub _parseparams {
1394 my $self = shift;
1395 my %hash = ();
1396 my @param = @_;
1398 # Hacked out from Parse.pm
1399 # The next few lines strip out the '-' characters which
1400 # preceed the keys, and capitalizes them.
1401 for (my $i=0;$i<@param;$i+=2) {
1402 $param[$i]=~s/^\-//;
1403 $param[$i]=~tr/a-z/A-Z/;
1405 pop @param if @param %2; # not an even multiple
1406 %hash = @param;
1407 return \%hash;
1410 =head2 _parse_xml
1412 Title : _parse_xml
1413 Usage : $dom = $obj->_parse_xml($filename)
1414 Function: uses XML::DOM to construct a DOM tree from the BSML document
1415 Returns : a reference to the parsed DOM tree
1416 Args : 0 Path to the XML file needing to be parsed
1418 =cut
1420 sub _parse_xml {
1421 my $self = shift;
1422 my $file = shift;
1424 unless (-e $file) {
1425 $self->throw("Could not parse non-existant XML file '$file'.");
1426 return;
1428 my $parser = XML::DOM::Parser->new();
1429 my $doc = $parser->parsefile ($file);
1430 return $doc;
1433 sub DESTROY {
1434 my $self = shift;
1435 # Reports off the net imply that DOM::Parser will memory leak if you
1436 # do not explicitly dispose of it:
1437 # http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/788458
1438 my $dom = $self->{'domtree'};
1439 # For some reason the domtree can get undef-ed somewhere...
1440 $dom->dispose if ($dom);
1444 =head1 TESTING SCRIPT
1446 The following script may be used to test the conversion process. You
1447 will need a file of the format you wish to test. The script will
1448 convert the file to BSML, store it in /tmp/bsmltemp, read that file
1449 into a new SeqIO stream, and write it back as the original
1450 format. Comparison of this second file to the original input file
1451 will allow you to track where data may be lost or corrupted. Note
1452 that you will need to specify $readfile and $readformat.
1454 use Bio::SeqIO;
1455 # Tests preservation of details during round-trip conversion:
1456 # $readformat -> BSML -> $readformat
1457 my $tempspot = "/tmp/bsmltemp"; # temp folder to hold generated files
1458 my $readfile = "rps4y.embl"; # The name of the file you want to test
1459 my $readformat = "embl"; # The format of the file being tested
1461 system "mkdir $tempspot" unless (-d $tempspot);
1462 # Make Seq object from the $readfile
1463 my $biostream = Bio::SeqIO->new( -file => "$readfile" );
1464 my $seq = $biostream->next_seq();
1466 # Write BSML from SeqObject
1467 my $bsmlout = Bio::SeqIO->new( -format => 'bsml',
1468 -file => ">$tempspot/out.bsml");
1469 warn "\nBSML written to $tempspot/out.bsml\n";
1470 $bsmlout->write_seq($seq);
1471 # Need to kill object for following code to work... Why is this so?
1472 $bsmlout = "";
1474 # Make Seq object from BSML
1475 my $bsmlin = Bio::SeqIO->new( -file => "$tempspot/out.bsml",
1476 -format => 'bsml');
1477 my $seq2 = $bsmlin->next_seq();
1479 # Write format back from Seq Object
1480 my $genout = Bio::SeqIO->new( -format => $readformat,
1481 -file => ">$tempspot/out.$readformat");
1482 $genout->write_seq($seq2);
1483 warn "$readformat written to $tempspot/out.$readformat\n";
1485 # BEING LOST:
1486 # Join information (not possible in BSML 2.2)
1487 # Sequence type (??)
1489 =cut