1 # to do: support for comment, reference annotations
3 # $Id: HIVQuery.pm 232 2008-12-11 14:51:51Z maj $
5 # BioPerl module for Bio::DB::Query::LANLQuery
7 # Please direct questions and support issues to <bioperl-l@bioperl.org>
9 # Cared for by Mark A. Jensen <maj@fortinbras.us>
11 # Copyright Mark A. Jensen
13 # You may distribute this module under the same terms as perl itself
15 # POD documentation - main docs before the code
19 Bio::DB::Query::HIVQuery - Query interface to the Los Alamos HIV Sequence Database
23 $q = new Bio::DB::Query::HIVQuery(" C[subtype] ZA[country] CXCR4[coreceptor] ");
24 $q = new Bio::DB::Query::HIVQuery(
25 -query=>{'subtype'=>'C',
27 'coreceptor'=>'CXCR4'});
29 $ac = $q->get_annotations_by_id(($q->ids)[0]);
30 $ac->get_value('Geo', 'country') # returns 'SOUTH AFRICA'
32 $db = new Bio::DB::HIV();
33 $seqio = $db->get_Stream_by_query($q); # returns annotated Bio::Seqs
35 # get subtype C sequences from South Africa and Brazil,
36 # with associated info on patient health, coreceptor use, and
39 $q = new Bio::DB::Query::HIVQuery(
41 'query' => {'subtype'=>'C',
42 'country'=>['ZA', 'BR']},
43 'annot' => ['patient_health',
45 'days_post_infection']
51 Bio::DB::Query::HIVQuery provides a query-like interface to the
52 cgi-based Los Alamos National Laboratory (LANL) HIV Sequence
53 Database. It uses Bioperl facilities to capture both sequences and
54 annotations in batch in an automated and computable way. Use with
55 L<Bio::DB::HIV> to create C<Bio::Seq> objects and annotated C<Bio::SeqIO>
60 The interface implements a simple query language emulation that understands AND,
61 OR, and parenthetical nesting. The basic query unit is
63 (match1 match2 ...)[fieldname]
65 Sequences are returned for which C<fieldname> equals C<match1 OR match2 OR ...>.
66 These units can be combined with AND, OR and parentheses. For example:
68 (B, C)[subtype] AND (2000, 2001, 2002, 2003)[year] AND ((CN)[country] OR (ZA)[country])
70 which can be shortened to
72 (B C)[subtype] (2000 2001 2002 2003)[year] (CN ZA)[country]
74 The user can specify annotation fields, that do not restrict the query, but
75 arrange for the return of the associated field data for each sequence returned.
76 Specify annotation fields between curly braces, as in:
78 (B C)[subtype] 2000[year] {country cd4_count cd8_count}
80 Annotations can be accessed off the query using methods described in APPENDIX.
82 =head2 Hash specifications for query construction
84 Single query specifications can be made as hash references provided to the
85 C<-query> argument of the constructor. There are two forms:
87 -query => { 'country'=>'BR', 'phenotype'=>'NSI', 'cd4_count'=>'Any' }
91 -query => [ 'country'=>'BR', 'phenotype'=>'NSI', 'cd4_count'=>'Any' ]
95 -query => { 'query' => {'country'=>'BR', 'phenotype'=>'NSI'},
96 'annot' => ['cd4_count'] }
98 In both cases, the CD4 count is included in the annotations returned, but does
99 not restrict the rest of the query.
101 To 'OR' multiple values of a field, use an anonymous array ref:
103 -query => { 'country'=>['ZA','BR','NL'], 'subtype'=>['A', 'C', 'D'] }
105 =head2 Valid query field names
107 An attempt was made to make the query field names natural and easy to
108 remember. Aliases are specified in an XML file (C<lanl-schema.xml>) that is part
109 of the distribution. Custom field aliases can be set up by modifying this file.
111 An HTML cheatsheet with valid field names, aliases, and match data can be
112 generated from the XML by using C<hiv_object-E<gt>help('help.html')>. A query
113 can also be validated locally before it is unleashed on the server; see below.
117 LANL DB annotations have been organized into a number of natural
118 groupings, tagged C<Geo>, C<Patient>, C<Virus>, and C<StdMap>. After a
119 successful query, each id is associated with a tree of
120 L<Bio::Annotation::SimpleValue> objects. These can be accessed with
121 methods C<get_value> and C<put_value> described in APPENDIX.
123 =head2 Delayed/partial query runs
125 Accessing the LANL DB involves multiple HTTP requests. The query can
126 be instructed to proceed through all (the default) or only some of
127 them, using the named parameter C<RUN_OPTION>.
129 To validate a query locally, use
131 $q = new Bio::DB::Query::HIVQuery( -query => {...}, -RUN_OPTION=>0 )
133 which will throw an exception if a field name or option is invalid.
135 To get a query count only, you can save a server hit by using
137 $q = new Bio::DB::Query::HIVQuery( -query => {...}, -RUN_OPTION=>1 )
139 and asking for C<$q-E<gt>count>. To finish the query, do
143 which picks up where you left off.
145 C<-RUN_OPTION=E<gt>2>, the default, runs the full query, returning ids and
150 You can clear the query results, retaining the same LANL session and query spec,
151 by doing C<$q-E<gt>_reset>. Change the query, and rerun with
152 C<$q-E<gt>_do_query($YOUR_RUN_OPTION)>.
158 User feedback is an integral part of the evolution of this and other
159 Bioperl modules. Send your comments and suggestions preferably to
160 the Bioperl mailing list. Your participation is much appreciated.
162 bioperl-l@bioperl.org - General discussion
163 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
167 Please direct usage questions or support issues to the mailing list:
169 I<bioperl-l@bioperl.org>
171 rather than to the module maintainer directly. Many experienced and
172 reponsive experts will be able look at the problem and quickly
173 address it. Please include a thorough description of the problem
174 with code and data examples if at all possible.
176 =head2 Reporting Bugs
178 Report bugs to the Bioperl bug tracking system to help us keep track
179 of the bugs and their resolution. Bug reports can be submitted via
182 https://github.com/bioperl/bioperl-live/issues
184 =head1 AUTHOR - Mark A. Jensen
186 Email maj@fortinbras.us
194 The rest of the documentation details each of the object methods.
195 Internal methods are usually preceded with a _
199 # Let the code begin...
201 package Bio
::DB
::Query
::HIVQuery
;
203 use vars
qw( $LANL_BASE $LANL_MAP_DB $LANL_MAKE_SEARCH_IF $LANL_SEARCH $SCHEMA_FILE $RUN_OPTION );
205 # Object preamble - inherits from Bio::DB::QueryI
207 use Bio::Annotation::Collection;
208 use Bio::Annotation::Comment;
209 use Bio::Annotation::Reference;
214 use Bio::DB::HIV::HIVQueryHelper;
216 use base qw(Bio::Root::Root Bio::DB::QueryI);
220 # change base from http -> https /maj 1/23/18
221 $LANL_BASE = "https://www.hiv.lanl.gov/components/sequence/HIV/asearch";
222 $LANL_MAP_DB = "map_db.comp";
223 $LANL_MAKE_SEARCH_IF = "make_search_if.comp";
224 $LANL_SEARCH = "search.comp";
225 $SCHEMA_FILE = Bio
::Root
::IO
->catfile(qw(Bio DB HIV lanl-schema.xml));
226 $RUN_OPTION = 2; # execute query
228 @Bio::SchemaNotInit
::Exception
::ISA
= qw( Bio::Root::Exception );
229 @Bio::WebError
::Exception
::ISA
= qw( Bio::Root::Exception );
230 @Bio::QueryNotMade
::Exception
::ISA
= qw( Bio::Root::Exception );
231 @Bio::QueryStringException
::Exception
::ISA
= qw( Bio::Root::Exception );
232 @Bio::HIVSorry
::Exception
::ISA
= qw
( Bio
::Root
::Exception
);
241 Usage : my $hiv_query = new Bio::DB::Query::HIVQuery();
242 Function: Builds a new Bio::DB::Query::HIVQuery object,
243 running a sequence query against the Los Alamos
244 HIV sequence database
245 Returns : an instance of Bio::DB::Query::HIVQuery
251 my($class,@args) = @_;
252 my $self = $class->SUPER::new
(@args);
253 # constructor option for web agent parameter spec: added 01/14/09 /maj
254 my ($query, $ids, $lanl_base, $lanl_map_db, $lanl_make_search_if, $lanl_search, $schema_file,$run_option, $uahash) =
255 $self->_rearrange([ qw(QUERY
267 $lanl_base||= $LANL_BASE;
268 $lanl_map_db||=$LANL_MAP_DB;
269 $lanl_make_search_if||=$LANL_MAKE_SEARCH_IF;
270 $lanl_search||=$LANL_SEARCH;
271 $schema_file||=$SCHEMA_FILE;
272 $uahash ||= {timeout
=> 90};
273 defined $run_option || ($run_option = $RUN_OPTION);
275 $self->lanl_base($lanl_base);
276 $self->map_db($lanl_map_db);
277 $self->make_search_if($lanl_make_search_if);
278 $self->search_($lanl_search);
279 $self->_run_option($run_option);
280 $self->_ua_hash($uahash);
282 # catch this at the top
283 if (-e
$schema_file) {
284 $self->_schema_file($schema_file);
287 my ($p) = $self->_schema_file( [grep {$_} map {
288 my $p = Bio
::Root
::IO
->catfile($_, $schema_file);
291 $self->throw(-class=>"Bio::Root::NoSuchThing",
292 -text
=>"Schema file \"".$self->_schema_file."\" cannot be found",
293 -value
=>$self->_schema_file) unless -e
$self->_schema_file;
297 $self->{_schema
} = HIVSchema
->new($self->_schema_file);
299 # internal storage and flags
300 $self->{'_lanl_query'} = [];
301 $self->{'_lanl_response'} = [];
302 $self->{'_annotations'} = {}; # container for annotation collections assoc. with ids
303 $self->{'_RUN_LEVEL'} = undef; # set in _do_query()
306 defined $query && $self->query($query);
307 defined $ids && $self->ids($ids);
311 $self->_do_query($self->_run_option) if $self->query;
316 =head1 QueryI compliance
321 Usage : $hiv_query->count($newval)
322 Function: return number of sequences found
324 Returns : value of count (a scalar)
325 Args : on set, new value (a scalar or undef, optional)
326 Note : count warns if it is accessed for reading before query
327 has been executed to at least level 1
333 return $self->{'count'} = shift if @_;
334 if (!$self->{'_RUN_LEVEL'} || ($self->{'_RUN_LEVEL'} < 1)) {
335 $self->warn('Query not yet run at > level 1');
337 return $self->{'count'};
343 Usage : $hiv_query->ids($newval)
344 Function: LANL ids of returned sequences
346 Returns : value of ids (an arrayref of sequence accessions/ids)
347 Args : on set, new value (an arrayref or undef, optional)
355 $self->throw(-class=>'Bio::Root::BadParameter',
356 -text
=>'Arrayref required',
357 -value
=> ref $a) unless ref($a) eq 'ARRAY';
359 @
{$self->{'ids'}}{@
$a} = (1) x @
$a;
362 else { #with empty arrayref, clear the hash
366 return keys %{$self->{'ids'}} if $self->{'ids'};
372 Usage : $hiv_query->query
373 Function: Get/set the submitted query hash or string
375 Returns : hashref or string
376 Args : query in hash or string form (see DESCRIPTION)
382 return $self->{'query'} = shift if @_;
383 return $self->{'query'};
386 =head1 Bio::DB::Query::HIVQuery specific methods
391 Usage : $hiv_query->help("help.html")
392 Function: get html-formatted listing of valid fields/aliases/options
393 based on current schema xml
394 Example : perl -MBio::DB::Query::HIVQuery -e "new Bio::DB::Query::HIVQuery()->help" | lynx -stdin
396 Args : optional filename; otherwise prints to stdout
401 my ($self, $fname) = @_;
403 my $schema = $self->_schema;
406 my (@tbls, @flds, @als, @opts, $fh);
408 open $fh, '>', $fname or $self->throw(-class => 'Bio::Root::IOException',
409 -text
=> "Error opening help html file $fname for writing",
415 @tbls = $schema->tables;
416 @tbls = ('COMMAND', grep !/COMMAND/,@tbls);
418 $h->start_html(-title
=>"HIVQuery Help")
420 print $fh $h->a({-id
=>'TOP'}, $h->h2("Valid <span style='font-variant:small-caps'>HIVQuery</span> query fields and match data"));
421 print $fh "Fields are organized below according to their Los Alamos HIV database tables. Use aliases in place of full field names in queries; for example:<br/>";
422 print $fh "<blockquote><code> (CCR5 CXCR4)[coreceptor]</code></blockquote>";
423 print $fh "rather than";
424 print $fh "<blockquote><code>(CCR5 CXCR4)[seq_sample.ssam_second_receptor] </code></blockquote>";
425 print $fh "(which does work, however). Click hyperlinks to see valid search options within the field. The token <code><b>Any</b></code> is the wildcard for all fields.<br/><br/>";
426 print $fh $h->start_table({-style
=>"font-family:sans-serif;"}) ;
427 foreach my $tbl (@tbls) {
428 @flds = grep /^$tbl/, $schema->fields;
429 @flds = grep !/_id/, @flds;
431 $h->start_Tr({-style
=>"background-color: lightblue;"}),
432 $h->td([$h->a({-id
=>$tbl},$tbl), $h->span({-style
=>"font-style:italic"},"fields"), $h->span({-style
=>"font-style:italic"}, "aliases")]),
435 foreach my $fld (@flds) {
436 @als = reverse $schema->aliases($fld);
438 # note that aliases can sometimes be empty
439 $h->Tr( $h->td( ["", $h->a({-href
=>"#opt$fld"}, shift @als || '???'), $h->code(join(',',@als))] ))
441 my @tmp = grep {$_} $schema->options($fld);
442 @tmp = sort {(($a =~ /^[0-9]+$/) && $b =~ /^[0-9]+$/) ?
$a<=>$b : $a cmp $b} @tmp;
443 if (grep /Any/,@tmp) {
444 @tmp = grep !/Any/, @tmp;
447 #print STDERR join(', ',@tmp)."\n";
449 {-style
=>"font-family:sans-serif;font-size:small"},
453 "<i>Valid options for</i> <b>$fld</b>: "
456 @tmp ?
$h->code(join(", ", @tmp)) : $h->i("free text")
459 "<i>Other aliases</i>: "
462 @als ?
$h->code(join(",",@als)) : "<i>none</i>"
468 $h->a({-href
=>"#$tbl"}, $h->small('BACK')),
469 $h->a({-href
=>"#TOP"}, $h->small('TOP'))
477 print $fh $h->end_table;
479 print $fh $h->end_html;
484 =head1 Annotation manipulation methods
486 =head2 get_annotations_by_ids
488 Title : get_annotations_by_ids (or ..._by_id)
489 Usage : $ac = $hiv_query->get_annotations_by_ids(@ids)
490 Function: Get the Bio::Annotation::Collection for these sequence ids
492 Returns : A Bio::Annotation::Collection object
493 Args : an array of sequence ids
497 sub get_annotations_by_ids
{
501 if (!$self->{'_RUN_LEVEL'} || ($self->{'_RUN_LEVEL'} < 2)) {
502 $self->warn('Requires query run at level 2');
505 @ret = map {$self->{'_annotations'}->{$_}} @ids if exists($self->{'_annotations'});
507 return (wantarray ?
@ret : $ret[0]) if @ret;
512 sub get_annotations_by_id
{
513 shift->get_annotations_by_ids(@_);
516 =head2 add_annotations_for_id
518 Title : add_annotations_for_id
519 Usage : $hiv_query->add_annotations_for_id( $id ) to create a new
520 empty collection for $id
521 $hiv_query->add_annotations_for_id( $id, $ac ) to associate
523 Function: Associate a Bio::Annotation::Collection with this sequence id
525 Returns : a Bio::Annotation::Collection object
526 Args : sequence id [, Bio::Annotation::Collection object]
530 sub add_annotations_for_id
{
533 $id = "" unless defined $id; # avoid warnings
534 $ac = Bio
::Annotation
::Collection
->new() unless defined $ac;
535 $self->throw(-class=>'Bio::Root::BadParameter'
536 -text
=>'Bio::Annotation::Collection required at arg 2',
537 -value
=>"") unless ref($ac) eq 'Bio::Annotation::Collection';
539 $self->{'_annotations'}->{$id} = $ac unless exists($self->{'_annotations'}->{$id});
543 =head2 remove_annotations_for_ids
545 Title : remove_annotations_for_ids (or ..._for_id)
546 Usage : $hiv_query->remove_annotations_for_ids( @ids)
547 Function: Remove annotation collection for this sequence id
549 Returns : An array of the previous annotation collections for these ids
550 Args : an array of sequence ids
554 sub remove_annotations_for_ids
{
559 push @ac, delete $self->{'_annotations'}->{$_};
565 sub remove_annotations_for_id
{
566 shift->remove_annotations_for_ids(@_);
569 =head2 remove_annotations
571 Title : remove_annotations
572 Usage : $hiv_query->remove_annotations()
573 Function: Remove all annotation collections for this object
575 Returns : The previous annotation collection hash for this object
580 sub remove_annotations
{
583 my $ach = $self->{'_annotations'};
584 $self->{'_annotations'} = {};
591 Usage : $ac->get_value($tagname) -or-
592 $ac->get_value( $tag_level1, $tag_level2,... )
593 Function: access the annotation value associated with the given tags
596 Args : an array of tagnames that descend into the annotation tree
597 Note : this is a L<Bio::AnnotationCollectionI> method added in
598 L<Bio::DB::HIV::HIVQueryHelper>
605 Usage : $ac->put_value($tagname, $value) -or-
606 $ac->put_value([$tag_level1, $tag_level2, ...], $value) -or-
607 $ac->put_value( [$tag_level1, $tag_level2, ...] )
608 Function: create a node in an annotation tree, and assign a scalar value to it
609 if a value is specified
611 Returns : scalar or a Bio::AnnotationCollection object
612 Args : $tagname, $value scalars (can be specified as -KEYS=>$tagname,
614 \@tagnames, $value (or as -KEYS=>\@tagnames, -VALUE=>$value )
615 Notes : This is a L<Bio::AnnotationCollectionI> method added in
616 L<Bio::DB::HIV::HIVQueryHelper>.
617 If intervening nodes do not exist, put_value creates them, replacing
618 existing nodes. So if $ac->put_value('x', 10) was done, then later,
619 $ac->put_value(['x', 'y'], 20), the original value of 'x' is trashed,
620 and $ac->get_value('x') will now return the annotation collection
628 Usage : $ac->get_keys($tagname_level_1, $tagname_level_2,...)
629 Function: Get an array of tagnames underneath the named tag nodes
630 Example : # prints the values of the members of Category 1...
631 print map { $ac->get_value($_) } $ac->get_keys('Category 1') ;
632 Returns : array of tagnames or empty list if the arguments represent a leaf
633 Args : [array of] tagname[s]
637 =head1 GenBank accession manipulation methods
639 =head2 get_accessions
641 Title : get_accessions
642 Usage : $hiv_query->get_accessions()
643 Function: Return an array of GenBank accessions associated with these
644 sequences (available only after a query is subjected to a
645 full run (i.e., when $RUN_OPTION == 2)
647 Returns : array of gb accession numbers, or () if none found for this query
655 if (!$self->{'_RUN_LEVEL'} || ($self->{'_RUN_LEVEL'} < 2)) {
656 $self->warn('Requires query run at level 2');
659 my @ac = $self->get_annotations_by_ids($self->ids);
661 push @ret, $_->get_value('Special','accession');
666 =head2 get_accessions_by_ids
668 Title : get_accessions_by_ids (or ..._by_id)
669 Usage : $hiv_query->get_accessions_by_ids(@ids)
670 Function: Return an array of GenBank accessions associated with these
671 LANL ids (available only after a query is subjected to a
672 full run (i.e., when $RUN_OPTION == 2)
674 Returns : array of gb accession numbers, or () if none found for this query
679 sub get_accessions_by_ids
{
683 if (!$self->{'_RUN_LEVEL'} || ($self->{'_RUN_LEVEL'} < 2)) {
684 $self->warn('Requires query run at level 2');
687 my @ac = $self->get_annotations_by_ids(@ids);
689 push @ret, $_->get_value('Special', 'accession');
691 return wantarray ?
@ret : $ret[0];
695 sub get_accessions_by_id
{
696 shift->get_accessions_by_ids(@_);
701 =head1 Query control methods
706 Usage : $hiv_query->_do_query or $hiv_query->_do_query($run_level)
707 Function: Execute the query according to argument or $RUN_OPTION
709 extent of query reflects the value of argument
710 0 : validate only (no HTTP action)
711 1 : return sequence count only
712 2 : return sequence ids (full query, returns with annotations)
713 noop if current _RUN_LEVEL of query is >= argument or $RUN_OPTION,
715 Returns : actual _RUN_LEVEL (0, 1, or 2) achieved
716 Args : desired run level (optional, global $RUN_OPTION is default)
722 $rl = $RUN_OPTION unless defined $rl;
723 $self->throw(-class=>"Bio::Root::BadParameter",
724 -text
=>"Invalid run option \"$RUN_OPTION\"",
725 -value
=>$RUN_OPTION) unless grep /^$RUN_OPTION$/, (0, 1, 2);
726 (!defined($self->{'_RUN_LEVEL'})) && do {
727 $self->_create_lanl_query();
728 $self->{'_RUN_LEVEL'} = 0;
730 ($rl > 0) && (!defined($self->{'_RUN_LEVEL'}) || ($self->{'_RUN_LEVEL'} <= 0)) && do {
731 $self->_do_lanl_request();
732 $self->{'_RUN_LEVEL'} = 1;
734 ($rl > 1) && (!defined($self->{'_RUN_LEVEL'}) || ($self->{'_RUN_LEVEL'} <= 1)) && do {
735 $self->_parse_lanl_response();
736 $self->{'_RUN_LEVEL'} = 2;
738 return $self->{'_RUN_LEVEL'};
744 Usage : $hiv_query->_reset
745 Function: Resets query storage, count, and ids, while retaining session id,
746 original query string, and db schema
757 $self->{'_annotations'} = {};
758 $self->{'_lanl_response'} = [];
759 $self->{'_lanl_query'} = [];
760 $self->{'_RUN_LEVEL'} = undef;
767 Usage : $hiv_query->_session_id($newval)
768 Function: Get/set HIV db session id (initialized in _do_lanl_request)
770 Returns : value of _session_id (a scalar)
771 Args : on set, new value (a scalar or undef, optional)
778 return $self->{'_session_id'} = shift if @_;
779 return $self->{'_session_id'};
784 Usage : $obj->_run_level($newval)
785 Function: returns the level at which the query has so far been run
787 Returns : value of _run_level (a scalar)
788 Args : on set, new value (a scalar or undef, optional)
795 return $self->{'_RUN_LEVEL'} = shift if @_;
796 return $self->{'_RUN_LEVEL'};
802 Usage : $hiv_query->_run_option($newval)
803 Function: Get/set HIV db query run option (see _do_query for values)
805 Returns : value of _run_option (a scalar)
806 Args : on set, new value (a scalar or undef, optional)
813 return $self->{'_run_option'} = shift if @_;
814 return $self->{'_run_option'};
820 Usage : $obj->_ua_hash($newval)
823 Returns : value of _ua_hash (a scalar)
824 Args : on set, new value (a scalar or undef, optional)
833 $self->{'_ua_hash'} = $_[0];
837 $self->{'_ua_hash'} = {@_};
841 $self->throw("Type ".ref($_)." unsupported as arg in _ua_hash");
846 return %{$self->{'_ua_hash'}};
857 Usage : $hiv_query->add_id($id)
858 Function: Add new id to ids
868 $id = "" unless defined $id; # avoid warnings
869 ${$self->{'ids'}}{$id}++;
876 return $self->{'lanl_base'} = shift if @_;
877 return $self->{'lanl_base'};
883 Usage : $obj->map_db($newval)
886 Returns : value of map_db (a scalar)
887 Args : on set, new value (a scalar or undef, optional)
893 return $self->{'map_db'} = shift if @_;
894 return $self->{'map_db'};
897 =head2 make_search_if
899 Title : make_search_if
900 Usage : $obj->make_search_if($newval)
903 Returns : value of make_search_if (a scalar)
904 Args : on set, new value (a scalar or undef, optional)
910 return $self->{'make_search_if'} = shift if @_;
911 return $self->{'make_search_if'};
917 Usage : $obj->search_($newval)
920 Returns : value of search_ (a scalar)
921 Args : on set, new value (a scalar or undef, optional)
927 return $self->{'search_'} = shift if @_;
928 return $self->{'search_'};
935 Function: return the full map_db uri ("Database Map")
937 Returns : scalar string
944 return $self->lanl_base."/".$self->map_db;
948 =head2 _make_search_if_uri
950 Title : _make_search_if_uri
952 Function: return the full make_search_if uri ("Make Search Interface")
954 Returns : scalar string
959 sub _make_search_if_uri
{
961 return $self->lanl_base."/".$self->make_search_if;
968 Function: return the full search cgi uri ("Search Database")
970 Returns : scalar string
977 return $self->lanl_base."/".$self->search_;
983 Usage : $hiv_query->_schema_file($newval)
986 Returns : value of _schema_file (an XML string or filename)
987 Args : on set, new value (an XML string or filename, or undef, optional)
994 return $self->{'_schema_file'} = shift if @_;
995 return $self->{'_schema_file'};
1001 Usage : $hiv_query->_schema($newVal)
1004 Returns : value of _schema (an HIVSchema object in package
1005 L<Bio::DB::HIV::HIVQueryHelper>)
1006 Args : none (field set directly in new())
1013 $self->{'_schema'} ?
1014 return $self->{'_schema'} :
1015 $self->throw(-class=>'Bio::SchemaNotInit::Exception',
1016 -text
=>"DB schema not initialized",
1024 Usage : $hiv_query->_lanl_query(\@query_parms)
1025 Function: pushes \@query_parms onto @{$self->{'_lanl_query'}
1027 Returns : value of _lanl_query (an arrayref)
1028 Args : on set, new value (an arrayref or undef, optional)
1035 return $self->{'_lanl_query'} unless $a;
1036 if (ref $a eq 'ARRAY') {
1037 push @
{$self->{'_lanl_query'}}, $a;
1041 $self->throw(-class=>'Bio::Root::BadParameter',
1042 -text
=>'Array ref required for argument.',
1048 =head2 _lanl_response
1050 Title : _lanl_response
1051 Usage : $hiv_query->_lanl_response($response)
1052 Function: pushes $response onto @{$hiv_query->{'_lanl_response'}}
1054 Returns : value of _lanl_response (an arrayref of HTTP::Response objects)
1055 Args : on set, new value (an HTTP::Response object or undef, optional)
1063 $self->throw(-class=>'Bio::Root::BadParameter',
1064 -text
=>'Requires an HTTP::Response object',
1065 -value
=> ref $r) unless ref($r) eq 'HTTP::Response';
1066 push @
{$self->{'_lanl_response'}}, $r;
1069 return $self->{'_lanl_response'};
1072 =head2 _create_lanl_query
1074 Title : _create_lanl_query
1075 Usage : $hiv_query->_create_lanl_query()
1076 Function: validate query hash or string, prepare for _do_lanl_request
1078 Returns : 1 if successful; throws exception on invalid query
1083 sub _create_lanl_query
{
1085 my (%inhash, @query, @qhashes);
1086 my ($schema, @validFields, @validAliases);
1088 for ($self->query) {
1090 $self->throw(-class=>'Bio::Root::NoSuchThing',
1091 -text
=>'Query not specified',
1095 ref eq 'HASH' && do {
1097 if ( grep /HASH/, map {ref} values %inhash ) {
1098 # check for {query=>{},annot=>[]} style
1099 $self->throw(-class=>'Bio::Root::BadParameter',
1100 -text
=>'Query style unrecognized',
1101 -value
=>"") unless defined $inhash{query
};
1106 ref eq 'ARRAY' && do {
1107 $inhash{'query'} = {@
$_};
1108 push @qhashes, \
%inhash;
1113 @qhashes = $self->_parse_query_string($_);
1116 $schema = $self->_schema;
1117 @validFields = $schema->fields;
1118 @validAliases = $schema->aliases;
1120 # validate args based on the xml specification file
1121 # only checks blanks and fields with explicitly specified options
1122 # text fields can put anything, and the query will be run before
1123 # an error is caught in these
1124 foreach my $qh (@qhashes) {
1126 foreach my $k (keys %{$$qh{'query'}}) {
1129 if (grep /^$k$/, @validFields) {
1132 elsif (grep /^$k$/, @validAliases) {
1133 foreach (@validFields) {
1134 if (grep (/^$k$/, $schema->aliases($_))) {
1138 # $fld contains the field corresp. to the alias
1142 $self->throw(-class=>'Bio::Root::BadParameter',
1143 -text
=>"Invalid field or alias \"$k\"",
1146 # validate matchdata
1147 my $vf = $schema->_sfieldh($fld);
1148 my @md = (ref($qh->{'query'}{$k}) eq 'ARRAY') ? @
{$qh->{'query'}{$k}} : $qh->{'query'}{$k};
1149 if ($$vf{type
} eq 'text') {
1151 $self->throw(-class=>'Bio::Root::BadParameter',
1152 -text
=>'Value for field \"$k\" cannot be empty',
1154 if ($_ eq "") && ($$vf{blank_ok
} eq 'false');
1157 elsif ($$vf{type
} eq 'option') {
1158 foreach my $md (@md) {
1159 $self->throw(-class=>'Bio::Root::BadParameter',
1160 -text
=>"Invalid value \"".$md."\" for field \"$fld\"",
1162 unless $$vf{option
} && grep {defined $_ && /^$md$/} @
{$$vf{option
}};
1165 # validated; add to query
1167 push @query, ($fld => $_);
1170 if ($qh->{'annot'}) {
1171 # validate the column names to be included in the query
1172 # to obtain annotations
1173 my @annot_cols = @
{$qh->{'annot'}};
1174 foreach my $k (@annot_cols) {
1177 if (grep /^$k$/, @validFields) {
1180 elsif (grep /^$k$/, @validAliases) {
1181 foreach (@validFields) {
1182 if (grep (/^$k$/, $schema->aliases($_))) {
1186 # $fld should contain the field corresp. to the alias
1190 $self->throw(-class=>'Bio::Root::NoSuchThing',
1191 -text
=>"Invalid field or alias \"$k\"",
1194 # lazy: 'Any' may not be the right default (but appears to
1195 # be, based on the lanl html)
1196 push @query, ($fld => 'Any');
1200 # insure that LANL and GenBank ids are retrieved
1201 push @query, ('sequenceentry.se_id' => 'Any') unless grep /SequenceEntry\.SE_id/, @query;
1202 push @query, ('sequenceaccessions.sa_genbankaccession' => 'Any')
1203 unless grep /SequenceAccessions\.SA_GenBankAccession/, @query;
1205 # an "order" field is required by the LANL CGI
1206 # if not specified, default to SE_id
1208 push @query, ('order'=>'sequenceentry.se_id') unless grep /order/, @query;
1210 # @query now contains sfield=>matchdata pairs, as specified by user
1211 # include appropriate indexes to create correct automatic joins
1212 # established by the LANL CGI
1213 my (@qtbl, @qpk, @qfk);
1215 # the tables represented in query:
1216 my %q = @query; # squish the tables in the current query into hash keys
1217 @qtbl = $schema->tbl('-s', keys %q);
1220 # more than one table, see if they can be connected
1221 # get primary keys of query tables
1222 @qpk = $schema->pk(@qtbl);
1224 # we need to get each query table to join to
1227 # The schema is a graph with tables as nodes and
1228 # foreign keys<->primary keys as branches. To get a
1229 # join that works, need to include in the query
1230 # all branches along a path from SequenceEntry
1231 # to each query table.
1233 # find_join does it...
1235 my @k = $schema->find_join($_,'sequenceentry');
1238 # squish the keys in @joink
1240 @j{@joink} = (1) x
@joink;
1242 # add the fields not currently in the query
1243 foreach (@qpk, @joink) {
1245 if (!grep(/^$fld$/,keys %q)) {
1246 # lazy: 'Any' may not be the right default (but appears to
1247 # be, based on the lanl html)
1248 push @query, ($_ => 'Any');
1254 # set object property
1255 $self->_lanl_query([@query]);
1260 # _do_lanl_request : post the queries created by _create_lanl_query
1262 # @args (or {@args}) should be unaliased Table.Column=>Matchdata
1263 # pairs (these will be used directly in the POSTs)
1265 =head2 _do_lanl_request
1267 Title : _do_lanl_request
1268 Usage : $hiv_query->_do_lanl_request()
1269 Function: Perform search request on _create_lanl_query-validated query
1271 Returns : 1 if successful
1276 sub _do_lanl_request
{
1278 my (@queries, @query, @interface,$interfGet,$searchGet,$response);
1279 my ($numseqs, $count);
1282 if (!$self->_lanl_query) {
1283 $self->throw(-class=>"Bio::Root::BadParameter",
1284 -text
=>"_lanl_query empty, run _create_lanl_request first",
1288 @queries = @
{$self->_lanl_query};
1292 ## search site specific CGI parms
1293 my @search_pms = ('action'=>'Search');
1294 my @searchif_pms = ('action'=>'Search Interface');
1295 # don't get the actual sequence data here (i.e., the cgi parm
1296 # 'incl_seq' remains undefined...
1297 my @download_pms = ('action Download.x'=>1, 'action Download.y'=>1);
1299 ## HTML-testing regexps
1300 my $tags_re = qr{(?:\s*<[^>]+>\s*)};
1301 my $session_id_re = qr{<input.*name="id".*value="([0-9a-f]+)"}m;
1302 my $search_form_re = qr{<form[^>]*action=".*/search.comp"};
1303 my $seqs_found_re = qr{Displaying$tags_re*(?:\s*[0-9-]*\s*)*$tags_re*of$tags_re*\s*([0-9]+)$tags_re*sequences found};
1304 my $no_seqs_found_re = qr{Sorry.*no sequences found};
1305 my $too_many_re = qr{too many records: $tags_re*([0-9]+)};
1306 my $sys_error_re = qr{[Ss]ystem error};
1307 my $sys_error_extract_re = qr{${tags_re}error
:.*?
<td
[^>]+>${tags_re
}(.*?
)<br
>};
1308 # find something like:
1309 # <strong>tables without join:</strong><br>SequenceAccessions<br>
1310 my $tbl_no_join_re = qr{tables without join}i;
1311 # my $sorry_bud_re = qr{};
1314 foreach my $q (@queries) {
1316 # default query control parameters
1320 translate
=>'FALSE' # nucleotides
1325 # pull out commands, designated by the COMMAND pseudo-table...
1326 my @commands = map { $query[$_] =~ s/^COMMAND\.// ?
@query[$_..$_+1] : () } (0..$#query-1);
1327 @query = map { $query[$_] =~ /^COMMAND/ ?
() : @query[2*$_..2*$_+1] } (0..($#query-1)/2);
1330 # set control parameters explicitly made in query
1331 foreach my $cp (keys %qctrl) {
1332 if (!grep( /^$cp$/, @query)) {
1333 push @query, ($cp, $qctrl{$cp});
1337 # note that @interface must be an array, since a single 'key' (the table)
1338 # can be associated with multiple 'values' (the columns) in the POST
1340 # squish fieldnames into hash keys
1342 @interface = grep {defined} map {my ($tbl,$col) = /^(.*)\.(.*)$/} keys %q;
1343 my $err_val = ""; # to contain informative (ha!) value if error is parsed
1345 eval { # encapsulate communication errors here, defer biothrows...
1347 #mark the useragent should be setable from outside (so we can modify timeouts, etc)
1348 my $ua = Bio
::WebAgent
->new($self->_ua_hash);
1349 my $idPing = $ua->get($self->_map_db_uri);
1350 $idPing->is_success || do {
1352 die "Connect failed";
1354 # get the session id
1355 if (!$self->_session_id) {
1356 ($self->{'_session_id'}) = ($idPing->content =~ /$session_id_re/);
1357 $self->_session_id || do {
1359 die "Session not established";
1363 # strange bug: if action=>'Search+Interface' below (note "+"),
1364 # the response to the search (in $searchGet) shows the correct
1365 # >number< of sequences found, but also an error "No sequences
1366 # match" and an SQL barf. Changing the "+" to a " " sets up the
1367 # interface to lead to the actual sequences being delivered as
1369 $interfGet = $ua->post($self->_make_search_if_uri, [@interface, @searchif_pms, id
=>$self->_session_id]);
1370 $interfGet->is_success || do {
1371 $response=$interfGet;
1372 die "Interface request failed";
1374 # see if a search form was returned...
1376 $interfGet->content =~ /$search_form_re/ || do {
1377 $response=$interfGet;
1378 die "Interface request failed";
1381 $searchGet = $ua->post($self->_search_uri, [@query, @commands, @search_pms, id
=>$self->_session_id]);
1382 $searchGet->is_success || do {
1383 $response = $searchGet;
1384 die "Search failed";
1386 $response = $searchGet;
1387 for ($searchGet->content) {
1388 /$no_seqs_found_re/ && do {
1390 die "No sequences found";
1393 /$too_many_re/ && do {
1395 die "Too many records ($1): must be <10000";
1398 /$tbl_no_join_re/ && do {
1399 die "Some required tables went unjoined to query";
1402 /$sys_error_re/ && do {
1403 /$sys_error_extract_re/;
1405 die "LANL system error";
1407 /$seqs_found_re/ && do {
1414 die "Search failed (response not parsed)";
1417 $response = $ua->post($self->_search_uri, [@download_pms, id
=>$self->_session_id]);
1418 $response->is_success || die "Query failed";
1419 # $response->content is a tab-separated value table of sequences
1420 # and metadata, first line starts with \# and contains fieldnames
1422 $self->_lanl_response($response);
1423 # throw, if necessary
1425 ($@
!~ "No sequences found") && do {
1426 $self->throw(-class=>'Bio::WebError::Exception',
1433 $self->warn("No sequences found for this query") unless $count;
1434 $self->count($count);
1435 return 1; # made it.
1439 =head2 _parse_lanl_response
1441 Title : _parse_lanl_response
1442 Usage : $hiv_query->_parse_lanl_response()
1443 Function: Parse the tab-separated-value response obtained by _do_lanl_request
1444 for sequence ids, accessions, and annotations
1446 Returns : 1 if successful
1451 sub _parse_lanl_response
{
1453 ### handle parsing and merging multiple responses into the query object
1454 ### (ids and annotations)
1457 my ($seqGet) = (@_);
1458 my (@data, @cols, %antbl, %antype);
1460 my ($schema, @retseqs, %rec, $ac);
1461 $schema = $self->_schema;
1463 $self->_lanl_response ||
1464 $self->throw(-class=>"Bio::QueryNotMade::Exception",
1465 -text
=>"Query not yet performed; call _do_lanl_request()",
1467 foreach my $rsp (@
{$self->_lanl_response}) {
1468 @data = split(/\r|\n/, $rsp->content);
1472 } while ($l !~ /Number/);
1473 $numseq += ( $l =~ /Number.*:\s([0-9]+)/ )[0];
1474 @cols = split(/\t/, shift(@data));
1475 # mappings from column headings to annotation keys
1476 # squish into hash keys
1477 my %q = @
{ shift @
{$self->_lanl_query} };
1478 %antbl = $schema->ankh(keys %q);
1479 # get the category for each annotation
1480 map { $antype{ $_->{ankey
} } = $_->{antype
} } values %antbl;
1481 # normalize column headers
1482 map { tr/ /_/; $_ = lc; } @cols;
1484 @rec{@cols} = split /\t/;
1485 my $id = $rec{'se_id'};
1487 $ac = Bio
::Annotation
::Collection
->new();
1491 my $t = $antype{$_} || "Unclassified";
1492 my $d = $rec{$_}; # the data
1493 $ac->put_value(-KEYS
=>[$t, $_], -VALUE
=>$d);
1495 $self->add_annotations_for_id($id, $ac);
1499 return 1; # made it.
1502 =head2 _parse_query_string
1504 Title : _parse_query_string
1505 Usage : $hiv_query->_parse_query_string($str)
1506 Function: Parses a query string using query language emulator QRY
1507 : in L<Bio::DB::Query::HIVQueryHelper>
1509 Returns : arrayref of hash structures suitable for passing to _create_lanl_query
1510 Args : a string scalar
1514 sub _parse_query_string
{
1516 my $qstring = shift;
1518 #syntax errors thrown in QRY (in HIVQueryHelper module)
1519 $ptree = QRY
::_parse_q
( $qstring );
1520 @ret = QRY
::_make_q
($ptree);
1529 Usage : $hiv_query->_sorry("-president=>Powell")
1530 Function: Throws an exception for unsupported option or parameter
1533 Args : scalar string
1540 $self->throw(-class=>"Bio::HIVSorry::Exception",
1541 -text
=>"Sorry, option/parameter \"$parm\" not (yet) supported. See manpage to complain.",