#
# BioPerl module for Bio::AnalysisI
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Cared for by Martin Senger <martin.senger@gmail.com>
# For copyright and disclaimer see below.
#

# POD documentation - main docs before the code

=head1 NAME

Bio::AnalysisI - An interface to any (local or remote) analysis tool

=head1 SYNOPSIS

This is an interface module - you do not instantiate it.
Use the C<Bio::Tools::Run::Analysis> module:

  use Bio::Tools::Run::Analysis;
  my $tool = Bio::Tools::Run::Analysis->new(@args);

=head1 DESCRIPTION

This interface contains all public methods for accessing and
controlling local and remote analysis tools. It is meant to be used on
the client side.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  https://github.com/bioperl/bioperl-live/issues

=head1 AUTHOR

Martin Senger (martin.senger@gmail.com)

=head1 COPYRIGHT

Copyright (c) 2003, Martin Senger and EMBL-EBI.
All Rights Reserved.

This module is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=head1 SEE ALSO

http://www.ebi.ac.uk/Tools/webservices/soaplab/guide

=head1 APPENDIX

This is actually the main documentation...

If you try to call any of these methods directly on this
C<Bio::AnalysisI> object, you will get a I<not implemented> error
message. You need to call them on a C<Bio::Tools::Run::Analysis> object instead.
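
As a rough illustration of this interface pattern, here is a plain-Perl sketch that does not depend on BioPerl. The package names C<My::AnalysisI> and C<My::Analysis> are hypothetical. A plain C<die> stands in for C<throw_not_implemented>, whose exact message is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the interface: the method only dies,
# mimicking what throw_not_implemented() does in Bio::Root::RootI.
package My::AnalysisI;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub analysis_name { die "Abstract method 'analysis_name' is not implemented\n" }

# Hypothetical concrete class that overrides the interface method.
package My::Analysis;
our @ISA = ('My::AnalysisI');
sub analysis_name { return $_[0]->{name} }

package main;

# Calling the method on the interface itself fails...
my $iface = My::AnalysisI->new(name => 'edit.seqret');
my $ok    = eval { $iface->analysis_name; 1 };
my $err   = $@;
print "interface call failed: $err" unless $ok;

# ...while the implementing class answers normally.
my $tool = My::Analysis->new(name => 'edit.seqret');
print $tool->analysis_name, "\n";
```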

=cut

# Let the code begin...

package Bio::AnalysisI;
use strict;

use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 analysis_name

 Usage   : $tool->analysis_name;
 Returns : the name of this analysis
 Args    : none

=cut

sub analysis_name { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 analysis_spec

 Usage   : $tool->analysis_spec;
 Returns : a hash reference describing this analysis
 Args    : none

The returned hash reference uses the following keys (not all of them are
always present, and others may be present as well): C<name>, C<type>,
C<version>, C<supplier>, C<installation>, C<description>.

Here is an example output:

  Analysis 'edit.seqret':
      installation => EMBL-EBI
      description => Reads and writes (returns) sequences
      supplier => EMBOSS
      version => 2.6.0
      type => edit
      name => seqret
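
Here is a small sketch of turning an C<analysis_spec> hash reference into a listing like the one above. The hash is copied from the example. Building the full service name as C<type.name> is an assumption based on that example (I<edit.seqret>), not a documented rule.

```perl
use strict;
use warnings;

# Data copied from the example above.
my $spec = {
    installation => 'EMBL-EBI',
    description  => 'Reads and writes (returns) sequences',
    supplier     => 'EMBOSS',
    version      => '2.6.0',
    type         => 'edit',
    name         => 'seqret',
};

# Assumption: the service name is "type.name"; missing keys become '?'.
my $service = join '.', map { defined $spec->{$_} ? $spec->{$_} : '?' } qw(type name);

# One header line plus one line per key, sorted for a stable order.
my @lines = ("Analysis '$service':");
for my $key (sort keys %$spec) {
    push @lines, sprintf '    %-12s => %s', $key, $spec->{$key};
}
print "$_\n" for @lines;
```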

=cut

sub analysis_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 describe

 Usage   : $tool->describe;
 Returns : an XML detailed description of this analysis
 Args    : none

The returned XML string contains metadata describing this analysis
service. It also includes the metadata returned (in an easier-to-use
form) by the methods C<analysis_spec>, C<input_spec> and C<result_spec>.

The DTD used for the returned metadata is based on the adopted standard
(the BSA specification for an analysis engine):

  <!ELEMENT DsLSRAnalysis (analysis)+>

  <!ELEMENT analysis (description?, input*, output*, extension?)>

  <!ATTLIST analysis
      type          CDATA #REQUIRED
      name          CDATA #IMPLIED
      version       CDATA #IMPLIED
      supplier      CDATA #IMPLIED
      installation  CDATA #IMPLIED>

  <!ELEMENT description ANY>
  <!ELEMENT extension   ANY>

  <!ELEMENT input (default?, allowed*, extension?)>

  <!ATTLIST input
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED
      mandatory     (true|false) "false">

  <!ELEMENT default   (#PCDATA)>
  <!ELEMENT allowed   (#PCDATA)>

  <!ELEMENT output (extension?)>

  <!ATTLIST output
      type          CDATA #REQUIRED
      name          CDATA #REQUIRED>

However, the DTD may be extended by provider-specific metadata. For
example, the EBI experimental SOAP-based service on top of EMBOSS uses
the DTD explained at C<http://www.ebi.ac.uk/~senger/applab>.
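
Because C<describe> returns plain XML, a client can extract simple facts from it, such as which inputs are mandatory. The sketch below uses a hypothetical document that follows the DTD above, not real server output. It uses regular expressions instead of an XML parser and assumes attribute values are double-quoted. It would not cope with the full generality of XML.

```perl
use strict;
use warnings;

# Hypothetical document following the DTD above (not real server output).
my $xml = <<'XML';
<DsLSRAnalysis>
  <analysis type="edit" name="seqret" version="2.6.0" supplier="EMBOSS">
    <description>Reads and writes (returns) sequences</description>
    <input type="String" name="sequence_usa" mandatory="true"/>
    <input type="String" name="osformat">
      <default>fasta</default>
    </input>
    <output type="String" name="outseq"/>
  </analysis>
</DsLSRAnalysis>
XML

# Collect name => mandatory flag for every <input> start tag.
my %mandatory;
while ($xml =~ /<input\b([^>]*)>/g) {
    my $attrs = $1;
    my %attr  = $attrs =~ /(\w+)="([^"]*)"/g;
    # The DTD declares "false" as the default for the mandatory attribute.
    $mandatory{ $attr{name} } = defined $attr{mandatory} ? $attr{mandatory} : 'false';
}
print "$_: $mandatory{$_}\n" for sort keys %mandatory;
```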

=cut

sub describe { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 input_spec

 Usage   : $tool->input_spec;
 Returns : an array reference with hashes as elements
 Args    : none

The analysis input data are named, and can also be associated with a
default value, with allowed values and with a few other attributes. The
names are important for feeding the service with the input data (the
inputs are given to the methods C<create_job>, C<run>, and/or
C<wait_for> as name/value pairs).

Here is a (slightly shortened) example of an input specification:

  $input_spec = [
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'sequence_usa'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'sequence_direct_data'
      },
      {
        'mandatory' => 'false',
        'allowed_values' => [
                              'gcg',
                              'gcg8',
                              ...
                              'raw'
                            ],
        'type' => 'String',
        'name' => 'sformat'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'sbegin'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'send'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'sprotein'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'snucleotide'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'sreverse'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'slower'
      },
      {
        'mandatory' => 'false',
        'type' => 'String',
        'name' => 'supper'
      },
      {
        'mandatory' => 'false',
        'default' => 'false',
        'type' => 'String',
        'name' => 'firstonly'
      },
      {
        'mandatory' => 'false',
        'default' => 'fasta',
        'allowed_values' => [
                              'gcg',
                              'gcg8',
                              'embl',
                              ...
                              'raw'
                            ],
        'type' => 'String',
        'name' => 'osformat'
      }
    ];
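
The input specification is ordinary Perl data, so checking required inputs, defaults and allowed values needs only core Perl. The list below is a trimmed, hypothetical version of the example above. C<fasta> has been added to the allowed values so that they agree with the default. C<is_allowed> is an illustrative helper, not a method of this interface.

```perl
use strict;
use warnings;

# Trimmed, hypothetical specification in the format shown above.
my $input_spec = [
    { mandatory => 'false', type => 'String', name => 'sequence_usa' },
    { mandatory => 'false', default => 'false', type => 'String', name => 'firstonly' },
    { mandatory => 'false', default => 'fasta', type => 'String', name => 'osformat',
      allowed_values => [ 'gcg', 'gcg8', 'embl', 'fasta', 'raw' ] },
];

# Index the specification by input name.
my %by_name = map { $_->{name} => $_ } @$input_spec;

# Inputs that must be supplied, and defaults applied when an input is omitted.
my @required = grep { $by_name{$_}{mandatory} eq 'true' } sort keys %by_name;
my %defaults = map  { $_ => $by_name{$_}{default} }
               grep { exists $by_name{$_}{default} } keys %by_name;

# Hypothetical helper: accept any value unless the input lists allowed values.
sub is_allowed {
    my ($spec, $value) = @_;
    return 1 unless $spec->{allowed_values};
    return scalar grep { $_ eq $value } @{ $spec->{allowed_values} };
}

printf "required: %s\n", @required ? "@required" : 'none';
print "default $_ = $defaults{$_}\n" for sort keys %defaults;
print is_allowed($by_name{osformat}, 'embl') ? "embl ok\n" : "embl rejected\n";
```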

=cut

sub input_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result_spec

 Usage   : $tool->result_spec;
 Returns : a hash reference with result names as keys
           and result types as values
 Args    : none

The analysis results are named and can be retrieved using their names
by the methods C<results> and C<result>.

Here is an example of the result specification (again for the service
I<edit.seqret>):

  $result_spec = {
                   'outseq' => 'String',
                   'report' => 'String',
                   'detailed_status' => 'String'
                 };
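
Because C<result_spec> returns a plain hash, its keys can be used to build arguments for C<results>. This sketch assumes the example specification above. It produces arguments in the C<'name=@'> form described under C<results>, which asks for each result to be stored in a file whose name the method invents.

```perl
use strict;
use warnings;

# The result specification from the example above.
my $result_spec = {
    'outseq'          => 'String',
    'report'          => 'String',
    'detailed_status' => 'String',
};

# Sort the names because hash order is not stable.
my @names = sort keys %$result_spec;

# One 'name=@' argument per result (single quotes keep '@' literal).
my @args = map { $_ . '=@' } @names;

print "@args\n";
```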

=cut

sub result_spec { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 create_job

 Usage   : $tool->create_job ( {'sequence'=>'tatat'} )
 Returns : Bio::Tools::Run::Analysis::Job
 Args    : data and parameters for this execution
           (in various formats)

Create an object representing a single execution of this analysis
tool.

Call this method if you wish to "stage the scene": to create a job
with all input data but without actually running it. This method is
called automatically from other methods (C<run> and C<wait_for>), so
usually you do not need to call it directly.

The input data and parameters for this execution can be specified in
various ways:

=over

=item array reference

The array has scalar elements of the form

  name=[[@]value]

where C<name> is the name of an input data item or input parameter (see
method C<input_spec> for finding which names are recognized by this
analysis) and C<value> is a value for this data/parameter. If C<value>
is missing, 1 is assumed (which is convenient for boolean
options). If C<value> starts with C<@>, it is treated as a local
filename, and the file's contents are used as the data/parameter value.

=item hash reference

The same as with the array reference, but now there is no need to use
an equal sign. The hash keys are input names and the hash values their
data. The values can again start with an C<@> sign indicating a local
filename.

=item scalar

In this case, the parameter represents a job ID obtained in some
previous invocation: such a job already exists on the server side, and
we are just re-creating it here using the same job ID.

I<TBD: here we should allow the same by using a reference to the
Bio::Tools::Run::Analysis::Job object.>

=item undef

Finally, if the parameter is undefined, ask the server to create an
empty job. The input data may be added later using C<set_data...>
method(s) - see scripts/papplmaker.PLS for details.

=back
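
The following sketch shows one way the array-reference and hash-reference forms could be normalized into name/value pairs, including reading C<@filename> values from local files. C<normalize_inputs> is a hypothetical helper, not the actual implementation in C<Bio::Tools::Run::Analysis>. Treating both a bare C<name> and an empty C<name=> as the boolean 1 is an assumption about how a missing value is interpreted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: turn an array ref ('name=value') or a hash ref into
# name => value pairs, replacing '@file' values with the file contents.
sub normalize_inputs {
    my ($inputs) = @_;
    my %pairs;
    if (ref $inputs eq 'ARRAY') {
        for my $item (@$inputs) {
            my ($name, $value) = split /=/, $item, 2;
            # Assumption: "name" and "name=" both mean the boolean value 1.
            $pairs{$name} = (defined $value && length $value) ? $value : 1;
        }
    } elsif (ref $inputs eq 'HASH') {
        %pairs = %$inputs;
    } else {
        die "expected an array or hash reference\n";
    }
    # A leading '@' means "read the value from this local file".
    for my $name (keys %pairs) {
        next unless $pairs{$name} =~ /^\@(.+)/s;
        my $file = $1;
        open my $fh, '<', $file or die "cannot read $file: $!\n";
        local $/;                        # slurp the whole file
        $pairs{$name} = <$fh>;
        close $fh;
    }
    return \%pairs;
}

# Usage: write a throwaway sequence file, then refer to it with '@'.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "tatat\n";
close $tmp_fh;

my $data = normalize_inputs([ "sequence=\@$tmp_name", 'osformat=embl', 'firstonly' ]);
for my $key (sort keys %$data) {
    (my $shown = $data->{$key}) =~ s/\n\z//;
    print "$key => $shown\n";
}
```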

=cut

sub create_job { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 run

 Usage   : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing the started job (an execution)
 Args    : the same as for create_job

Create a job and start it, but do not wait for its completion.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 wait_for

 Usage   : $tool->wait_for ( { 'sequence' => '@my.file' } )
 Returns : Bio::Tools::Run::Analysis::Job,
           representing the finished job
 Args    : the same as for create_job

Create a job, start it and wait for its completion.

Note that this is a blocking method. It returns only after the
executed job finishes, either normally or with an error.

Usually, after this call, you ask for the results of the finished job:

  $tool->wait_for (...)->results;

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------
#                           Bio::AnalysisI::JobI
# -----------------------------------------------------------------------------

package Bio::AnalysisI::JobI;

=head1 Module Bio::AnalysisI::JobI

An interface to the public methods provided by C<Bio::Tools::Run::Analysis::Job>
objects.

The C<Bio::Tools::Run::Analysis::Job> objects represent a created,
running, or finished execution of an analysis tool.

The factory for these objects is the module C<Bio::Tools::Run::Analysis>,
where the following methods return a
C<Bio::Tools::Run::Analysis::Job> object:

  create_job  (returning a prepared job)
  run         (returning a running job)
  wait_for    (returning a finished job)

=cut

use strict;
use base qw(Bio::Root::RootI);

# -----------------------------------------------------------------------------

=head2 id

 Usage   : $job->id;
 Returns : this job ID
 Args    : none

Each job (an execution) is identifiable by this unique ID, which can be
used later to re-create the same job (in other words, to re-connect to
the same job). This is useful when a job takes a long time to
finish and your client program does not want to wait for it within the
same session.

=cut

sub id { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 Bio::AnalysisI::JobI::run

 Usage   : $job->run
 Returns : itself
 Args    : none

It starts a previously created job. The job must already have all its
input data filled in. This differs from the method of the same name on a
C<Bio::Tools::Run::Analysis> object (C<Bio::AnalysisI::run>), which also
creates a new job and allows you to set its input data.

=cut

sub run { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 Bio::AnalysisI::JobI::wait_for

 Usage   : $job->wait_for
 Returns : itself
 Args    : none

It waits until a previously started execution of this job finishes.

=cut

sub wait_for { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 terminate

 Usage   : $job->terminate
 Returns : itself
 Args    : none

Stop the currently running job (represented by this object). This is a
definitive stop; there is no way to resume it later.

=cut

sub terminate { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 last_event

 Usage   : $job->last_event
 Returns : an XML string
 Args    : none

It returns a short XML document showing what happened last with this
job. This is the DTD used:

  <!-- place for extensions -->
  <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">

  <!ELEMENT analysis_event (message?, (%event_body_template;)?)>

  <!ATTLIST analysis_event
      timestamp  CDATA #IMPLIED>

  <!ELEMENT message (#PCDATA)>

  <!ELEMENT state_changed EMPTY>
  <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
  <!ATTLIST state_changed
      previous_state  (%analysis_state;) "created"
      new_state       (%analysis_state;) "created">

  <!ELEMENT heartbeat_progress EMPTY>

  <!ELEMENT percent_progress EMPTY>
  <!ATTLIST percent_progress
      percentage CDATA #REQUIRED>

  <!ELEMENT time_progress EMPTY>
  <!ATTLIST time_progress
      remaining CDATA #REQUIRED>

  <!ELEMENT step_progress EMPTY>
  <!ATTLIST step_progress
      total_steps      CDATA #IMPLIED
      steps_completed  CDATA #REQUIRED>

Here is an example of what is returned after a job was created and
started, but before it finished (note that the example uses an
analysis 'showdb', which does not need any input data):

  use Bio::Tools::Run::Analysis;
  print Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
                                 ->run
                                 ->last_event;

It prints:

  <?xml version = "1.0"?>
  <analysis_event>
    <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
    <state_changed previous_state="created" new_state="running"/>
  </analysis_event>

The same example, but now after the job finishes:

  use Bio::Tools::Run::Analysis;
  print Bio::Tools::Run::Analysis->new (-name => 'display.showdb')
                                 ->wait_for
                                 ->last_event;

It prints:

  <?xml version = "1.0"?>
  <analysis_event>
    <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
    <state_changed previous_state="running" new_state="completed"/>
  </analysis_event>
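
The event document is small enough that a client can summarize it with a few regular expressions. This sketch handles only C<state_changed> and C<percent_progress> events and assumes double-quoted attributes. When a state attribute is missing, it applies the DTD default of C<created>. C<summarize_event> is a hypothetical helper, not part of this interface.

```perl
use strict;
use warnings;

# The second example event above, stored as a string.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
  <state_changed previous_state="running" new_state="completed"/>
</analysis_event>
XML

# Hypothetical helper: regexes keep the sketch dependency-free;
# a real client would be better served by an XML parser.
sub summarize_event {
    my ($xml) = @_;
    my %info;
    ($info{message}) = $xml =~ m{<message>(.*?)</message>}s;
    if ($xml =~ /<state_changed\b([^>]*)>/) {
        my $attrs = $1;
        my %attr  = $attrs =~ /(\w+)="([^"]*)"/g;
        # The DTD defaults both states to "created" when omitted.
        $info{from} = defined $attr{previous_state} ? $attr{previous_state} : 'created';
        $info{to}   = defined $attr{new_state}      ? $attr{new_state}      : 'created';
    }
    elsif ($xml =~ /<percent_progress\b[^>]*\bpercentage="([^"]*)"/) {
        $info{percent} = $1;
    }
    return \%info;
}

my $info = summarize_event($event);
print "$info->{from} -> $info->{to} ($info->{message})\n";
```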

=cut

sub last_event { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 status

 Usage   : $job->status
 Returns : string describing the job status
 Args    : none

It returns one of the following strings (or possibly others, if a server
implementation extends the set of job states):

  CREATED
  RUNNING
  COMPLETED
  TERMINATED_BY_REQUEST
  TERMINATED_BY_ERROR
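
A blocking call such as C<wait_for> can be approximated by starting a job and then polling C<status> until one of the terminal states is reported. The sketch uses a hypothetical C<Mock::Job> class in place of a real job object so that it runs without a server. Treating C<COMPLETED>, C<TERMINATED_BY_REQUEST> and C<TERMINATED_BY_ERROR> as the only terminal states is an assumption. Because servers may add other states, the loop also gives up after a maximum number of polls.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a job: reports RUNNING a few times, then COMPLETED.
package Mock::Job;
sub new    { my ($class, $polls) = @_; return bless { left => $polls }, $class }
sub status { my $self = shift; return $self->{left}-- > 0 ? 'RUNNING' : 'COMPLETED' }

package main;

# Assumption: these are the terminal states; unknown states keep polling.
my %finished = map { $_ => 1 } qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

sub wait_until_finished {
    my ($job, $interval, $max_polls) = @_;
    my $polls = 0;
    my $state;
    until ($finished{ $state = $job->status }) {
        die "gave up after $polls polls (last state: $state)\n" if ++$polls > $max_polls;
        sleep $interval if $interval;    # interval 0 means no pause (for the demo)
    }
    return ($state, $polls);
}

my ($state, $polls) = wait_until_finished(Mock::Job->new(3), 0, 10);
print "finished as $state after $polls polls\n";
```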

=cut

sub status { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 created

 Usage   : $job->created (1)
 Returns : time when this job was created
 Args    : optional; a true value requests a formatted time

Without any argument, it returns the creation time of this job in
seconds since the beginning of the Unix epoch (1 January 1970). With a
true argument, it returns a formatted time, using the rules described in
C<Bio::Tools::Run::Analysis::Utils::format_time>.
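
Converting an unformatted value is plain Perl. This sketch takes the C<created> value from the second C<times> example below. The formatted strings in that example look like the output of Perl's scalar C<localtime>, but this resemblance is an assumption about C<format_time>, not its documented behaviour. The ISO 8601 form is shown as a time-zone-independent alternative.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Unformatted "created" value from the times example (seconds since the epoch).
my $created = 1046713926;

# Assumption: format_time output resembles scalar localtime; the exact
# text here depends on the local time zone.
my $local = scalar localtime $created;

# Unambiguous alternative: ISO 8601 in UTC.
my $iso = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($created));

print "$local\n$iso\n";
```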

=cut

sub created { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 started

 Usage   : $job->started (1)
 Returns : time when this job was started
 Args    : optional

See C<created>.

=cut

sub started { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 ended

 Usage   : $job->ended (1)
 Returns : time when this job was terminated
 Args    : optional

See C<created>.

=cut

sub ended { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 elapsed

 Usage   : $job->elapsed
 Returns : elapsed time of the execution of the given job
           (in milliseconds), or 0 if the job has not yet started
 Args    : none

Note that some server implementations cannot count in milliseconds, so
the returned time may be rounded to seconds.

=cut

sub elapsed { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 times

 Usage   : $job->times ('formatted')
 Returns : a hash reference with all time characteristics
 Args    : optional

This is a convenience method that returns a hash reference with the
following keys:

  created
  started
  ended
  elapsed

See C<created> for remarks on time formatting.

An example, both for formatted and unformatted times:

  use Data::Dumper;
  use Bio::Tools::Run::Analysis;
  my $rh = Bio::Tools::Run::Analysis->new (-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
             ->times (1);
  print Data::Dumper->Dump ( [$rh], ['Times']);
  $rh = Bio::Tools::Run::Analysis->new (-name => 'nucleic_cpg_islands.cpgplot')
             ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
             ->times;
  print Data::Dumper->Dump ( [$rh], ['Times']);

It prints:

  $Times = {
             'ended' => 'Mon Mar 3 17:52:06 2003',
             'started' => 'Mon Mar 3 17:52:05 2003',
             'elapsed' => '1000',
             'created' => 'Mon Mar 3 17:52:05 2003'
           };
  $Times = {
             'ended' => '1046713961',
             'started' => '1046713926',
             'elapsed' => '35000',
             'created' => '1046713926'
           };
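
The two outputs above use different units. C<created>, C<started> and C<ended> are in seconds since the epoch, while C<elapsed> is in milliseconds. This sketch uses the unformatted hash above to show how the values relate. The C<queued> figure (seconds between creation and start) is an illustrative derived value, not a key returned by C<times>.

```perl
use strict;
use warnings;

# Unformatted times from the second example above.
my $times = {
    ended   => '1046713961',
    started => '1046713926',
    elapsed => '35000',
    created => '1046713926',
};

# started/ended are seconds and elapsed is milliseconds, so they agree
# (up to server rounding) when elapsed == (ended - started) * 1000.
my $from_bounds = ($times->{ended} - $times->{started}) * 1000;

# Illustrative derived value: seconds between creation and start.
my $queued = $times->{started} - $times->{created};

printf "elapsed %d ms (computed %d ms), queued %d s\n",
    $times->{elapsed}, $from_bounds, $queued;
```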

=cut

sub times { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 results

 Usage   : $job->results (...)
 Returns : one or more results created by this job
 Args    : various, see below

This is a complex method that tries to make sense of all kinds of
results. In particular, it helps to put binary results (such as
images) into local files. Generally, it deals with the following facts:

=over

=item *

Each analysis tool may produce more than one result.

=item *

Some results may contain binary data not suitable for printing into a
terminal window.

=item *

Some results may be split into a variable number of parts (this is
mainly true for image results, which can consist of several *.png
files).

=back

Note also that results have names, which distinguish them when there
are several. The names can be obtained by the method C<result_spec>.

Here are the rules for how the method works:

    Retrieving NAMED results:
    -------------------------
     results ('name1', ...)   => return results as they are, no storing into files

     results ( { 'name1' => 'filename', ... } )  => store into 'filename', return 'filename'
     results ( 'name1=filename', ...)            => ditto

     results ( { 'name1' => '-', ... } )  => send result to the STDOUT, do not return anything
     results ( 'name1=-', ...)            => ditto

     results ( { 'name1' => '@', ... } )  => store into file whose name is invented by
                                             this method, perhaps using RESULT_NAME_TEMPLATE env
     results ( 'name1=@', ...)            => ditto

     results ( { 'name1' => '?', ... } )  => find out the type of this result and then use
                                             {'name1'=>'@'} for binary files, and a regular
                                             return for non-binary files
     results ( 'name1=?', ...)            => ditto

    Retrieving ALL results:
    -----------------------
     results()     => return all results as they are, no storing into files

     results ('@') => return all results, as if each of them given
                      as {'name' => '@'} (see above)

     results ('?') => return all results, as if each of them given
                      as {'name' => '?'} (see above)

    Misc:
    -----
     * any result can be returned as a scalar value, or as an array reference
       (the latter is used for results consisting of more parts, such as images);
       this applies regardless of whether the returned result is the result itself
       or a filename created for the result

     * look in the documentation of the C<panalysis[.PLS]> script for examples
       (especially how to use various templates for inventing file names)
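
To make the rules above concrete, here is a sketch that normalizes the I<named> argument forms into one action per result name. C<classify_requests> and its action labels are hypothetical. The sketch does not fetch or store anything. It also leaves out the ALL forms (C<results('@')>, C<results('?')>), which would need the names from C<result_spec>.

```perl
use strict;
use warnings;

# Hypothetical helper: map each requested result name to an action label,
# following the NAMED rules above. Nothing is fetched or written here.
sub classify_requests {
    my @args = @_;
    my %how;
    for my $arg (@args) {
        if (ref $arg eq 'HASH') {
            %how = (%how, %$arg);            # { name => destination }
        } elsif ($arg =~ /^([^=]+)=(.*)$/s) {
            $how{$1} = $2;                   # 'name=destination'
        } else {
            $how{$arg} = undef;              # bare name: return as is
        }
    }
    my %action;
    for my $name (keys %how) {
        my $dest = $how{$name};
        $action{$name} =
            !defined $dest ? 'return'
          : $dest eq '-'   ? 'stdout'
          : $dest eq '@'   ? 'invented-file'
          : $dest eq '?'   ? 'by-type'
          :                  "file:$dest";
    }
    return \%action;
}

my $plan = classify_requests('outseq=seq.embl', { report => '-' }, 'detailed_status');
print "$_ => $plan->{$_}\n" for sort keys %$plan;
```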

=cut

sub results { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 result

 Usage   : $job->result (...)
 Returns : the first result
 Args    : see 'results'

=cut

sub result { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

=head2 remove

 Usage   : $job->remove
 Returns : 1
 Args    : none

The job object is not actually removed at this time, but it is marked
(by setting its C<_destroy_on_exit> attribute to 1) as ready for deletion
when the client program ends (including a request to the server to forget
the job mirror object on the server side).
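
This deferred removal resembles a common Perl pattern. The object is marked now, and the cleanup happens in C<DESTROY>, which Perl calls when the object is freed (at the latest during global destruction at program exit). The class C<Sketch::Job> is hypothetical and only records the IDs it would ask the server to forget. Whether the real implementation uses C<DESTROY> is not stated here.

```perl
use strict;
use warnings;

# Hypothetical job class: remove() only sets a flag, and DESTROY does
# the deferred cleanup when the object is freed.
package Sketch::Job;

our @forgotten;    # IDs that would be sent to the server as "forget this job"

sub new    { my ($class, $id) = @_; return bless { id => $id, _destroy_on_exit => 0 }, $class }
sub remove { my $self = shift; $self->{_destroy_on_exit} = 1; return 1 }

sub DESTROY {
    my $self = shift;
    return unless $self->{_destroy_on_exit};
    # A real client would ask the server to forget the job here.
    push @forgotten, $self->{id};
}

package main;

{
    my $kept    = Sketch::Job->new('job-1');
    my $removed = Sketch::Job->new('job-2');
    $removed->remove;
}   # both objects are freed here; only job-2 was marked

print "forgotten: @Sketch::Job::forgotten\n";
```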

=cut

sub remove { shift->throw_not_implemented(); }

# -----------------------------------------------------------------------------

1;
__END__