# BioPerl module for Bio::DasI
#
# Please direct questions and support issues to <bioperl-l@bioperl.org>
#
# Cared for by Lincoln Stein <lstein@cshl.org>
#
# Copyright Lincoln Stein
#
# You may distribute this module under the same terms as perl itself
#
# POD documentation - main docs before the code

=head1 NAME

Bio::DasI - DAS-style access to a feature database

=head1 SYNOPSIS

  # Open up a feature database somehow...
  $db = Bio::DasI->new(@args);

  @segments = $db->segment(-name  => 'NT_29921.4',
                           -start => 1,
                           -end   => 1000000);

  # segments are Bio::Das::SegmentI - compliant objects

  # fetch a list of features
  @features = $db->features(-types=>['type1','type2','type3']);

  # invoke a callback over features
  $db->features(-types=>['type1','type2','type3'],
                -callback => sub { ... }
               );

  $stream   = $db->get_seq_stream(-types=>['type1','type2','type3']);
  while (my $feature = $stream->next_seq) {
     # each feature is a Bio::SeqFeatureI-compliant object
  }

  # get all feature types
  @types   = $db->types;

  # count types
  %types   = $db->types(-enumerate=>1);

  @feature = $db->get_feature_by_name($class=>$name);
  @feature = $db->get_feature_by_target($target_name);
  @feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2);
  $feature = $db->get_feature_by_id($id);

  $error = $db->error;

=head1 DESCRIPTION

Bio::DasI is a simplified alternative interface to sequence annotation
databases used by the distributed annotation system (see
L<Bio::Das>). In this scheme, the genome is represented as a series of
features, a subset of which are named.  Named features can be used as
reference points for retrieving "segments" (see L<Bio::Das::SegmentI>),
and these can, in turn, be used as the basis for exploring the genome
further.

In addition to a name, each feature has a "class", which is
essentially a namespace qualifier, and a "type", which describes what
type of feature it is.  Das uses the GO consortium's ontology of
feature types, and so the type is actually an object of class
Bio::Das::FeatureTypeI (see L<Bio::Das::FeatureTypeI>). Bio::DasI
provides methods for interrogating the database for the types it
contains and the counts of each type.

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org

=head2 Support

Please direct usage questions or support issues to the mailing list:

I<bioperl-l@bioperl.org>

rather than to the module maintainer directly. Many experienced and
responsive experts will be able to look at the problem and quickly
address it. Please include a thorough description of the problem
with code and data examples if at all possible.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the web:

  https://github.com/bioperl/bioperl-live/issues

=head1 AUTHOR - Lincoln Stein

Email lstein@cshl.org

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::DasI;
use strict;

use Bio::Das::SegmentI;
# Object preamble - inherits from Bio::Root::RootI and Bio::SeqFeature::CollectionI
use base qw(Bio::Root::RootI Bio::SeqFeature::CollectionI);

=head2 new

 Title   : new
 Usage   : Bio::DasI->new(@args)
 Function: Create new Bio::DasI object
 Returns : a Bio::DasI object
 Args    : see below

The new() method creates a new object.  The argument list is either a
single argument consisting of a connection string, or the following
list of -name=E<gt>value arguments:

   Argument        Description
   --------        -----------

   -dsn            Connection string for database
   -adaptor        Name of an adaptor class to use when connecting
   -aggregator     Array ref containing list of aggregators
                   ("semantic mappers") to apply to database
   -user           Authentication username
   -pass           Authentication password

Implementors of DasI may add other arguments.

=cut

sub new {shift->throw_not_implemented}

=head2 types

 Title   : types
 Usage   : $db->types(@args)
 Function: return list of feature types in database
 Returns : a list of Bio::Das::FeatureTypeI objects
 Args    : see below

This routine returns a list of feature types known to the database. It
is also possible to find out how many times each feature type occurs.

Arguments are -option=E<gt>value pairs as follows:

  -enumerate  if true, count the features

The returned value will be a list of Bio::Das::FeatureTypeI objects
(see L<Bio::Das::FeatureTypeI>).

If -enumerate is true, then the function returns a hash (not a hash
reference) in which the keys are the stringified versions of
Bio::Das::FeatureTypeI and the values are the number of times each
feature type appears in the database.

=cut

sub types {  shift->throw_not_implemented; }

=head2 parse_types

 Title   : parse_types
 Usage   : $db->parse_types(@args)
 Function: parses list of types
 Returns : an array ref containing ['method','source'] pairs
 Args    : a list of types in 'method:source' form
 Status  : internal

This method takes an array of type names in the format "method:source"
and returns an array reference of ['method','source'] pairs.  It will
also accept a single argument consisting of an array reference with
the list of type names.

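To illustrate the accepted input forms, the following standalone sketch
copies the parsing logic into a plain subroutine so that it runs
without BioPerl installed.  The name parse_types_sketch is hypothetical
and, unlike the real method, it takes no invocant.

```perl
  use strict;
  use warnings;

  # Hypothetical standalone copy of the parsing logic (no $self argument).
  sub parse_types_sketch {
      return [] if !@_ or !defined $_[0];
      # Already parsed: an array ref whose first element is itself a reference.
      return $_[0] if ref($_[0]) eq 'ARRAY' && ref $_[0][0];
      # Accept either a flat list of names or a single array ref of names.
      my @types = ref($_[0]) ? @{ $_[0] } : @_;
      # Split on the first colon only; a name without a colon yields no source.
      return [ map { [ split /:/, $_, 2 ] } @types ];
  }

  my $parsed = parse_types_sketch('exon:curated', 'similarity');
  for my $pair (@$parsed) {
      my ($method, $source) = @$pair;
      print $method, ' / ', (defined $source ? $source : '(no source)'), "\n";
  }
```

Passing an array reference such as ['exon:curated','similarity'] gives
the same result as passing the names as a flat list.
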
=cut

# turn feature types in the format "method:source" into a list of [method,source] refs
sub parse_types {
  my $self  = shift;
  return []   if !@_ or !defined($_[0]);
  return $_[0] if ref $_[0] eq 'ARRAY' && ref $_[0][0];
  my @types = ref($_[0]) ? @{$_[0]} : @_;
  my @type_list = map { [split(':',$_,2)] } @types;
  return \@type_list;
}

=head2 segment

 Title   : segment
 Usage   : $db->segment(@args);
 Function: create a segment object
 Returns : segment object(s)
 Args    : see below

This method generates a Bio::Das::SegmentI object (see
L<Bio::Das::SegmentI>).  The segment can be used to find overlapping
features and the raw sequence.

When making the segment() call, you specify the ID of a sequence
landmark (e.g. an accession number, a clone or contig), and a
positional range relative to the landmark.  If no range is specified,
then the entire region spanned by the landmark is used to generate the
segment.

Arguments are -option=E<gt>value pairs as follows:

 -name         ID of the landmark sequence.

 -class        A namespace qualifier.  It is not necessary for the
               database to honor namespace qualifiers, but if it
               does, this is where the qualifier is indicated.

 -version      Version number of the landmark.  It is not necessary for
               the database to honor versions, but if it does, this is
               where the version is indicated.

 -start        Start of the segment relative to landmark.  Positions
               follow standard 1-based sequence rules.  If not specified,
               defaults to the beginning of the landmark.

 -end          End of the segment relative to the landmark.  If not specified,
               defaults to the end of the landmark.

The return value is a list of Bio::Das::SegmentI objects.  If the method
is called in a scalar context and no more than one segment
satisfies the request, then it is allowed to return the segment.
Otherwise, the method must throw a "multiple segment exception".

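One way an implementor might honor the scalar-context rule is sketched
below using Perl's built-in wantarray.  The helper name return_segments
and the plain strings standing in for segment objects are assumptions
made for illustration only; a real implementation would return
Bio::Das::SegmentI objects and choose its own exception mechanism.

```perl
  use strict;
  use warnings;

  # Hypothetical helper: hand back matches according to the calling context.
  sub return_segments {
      my @segments = @_;
      return @segments if wantarray;                 # list context: all matches
      die "multiple segment exception\n" if @segments > 1;
      return $segments[0];                           # the single match, or undef
  }

  my @all = return_segments('seg1', 'seg2');         # list context is always fine
  my $one = return_segments('seg1');                 # scalar context, one match
  my $ok  = eval { my $s = return_segments('seg1', 'seg2'); 1 };
  print scalar(@all), " segments; single: $one; ",
        ($ok ? "no error" : "error: $@");
```
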
=cut

sub segment { shift->throw_not_implemented }

=head2 features

 Title   : features
 Usage   : $db->features(@args)
 Function: get all features, possibly filtered by type
 Returns : a list of Bio::SeqFeatureI objects
 Args    : see below
 Status  : public

This routine will retrieve features in the database regardless of
position.  It can be used to return all features, or a subset based on
their type.

Arguments are -option=E<gt>value pairs as follows:

  -types      List of feature types to return.  Argument is an array
              of Bio::Das::FeatureTypeI objects or a set of strings
              that can be converted into FeatureTypeI objects.

  -callback   A callback to invoke on each feature.  The subroutine
              will be passed each Bio::SeqFeatureI object in turn.

  -attributes A hash reference containing attributes to match.

The -attributes argument is a hashref containing one or more attributes
to match against:

  -attributes => { Gene => 'abc-1',
                   Note => 'confirmed' }

Attribute matching is simple exact string matching, and multiple
attributes are ANDed together.  See L<Bio::DB::ConstraintsI> for a
more sophisticated take on this.

If one provides a callback, it will be invoked on each feature in
turn.  If the callback returns a false value, iteration will be
interrupted.  When a callback is provided, the method returns undef.

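The matching and callback rules described above can be sketched without
a database as follows.  Everything here is an assumption made for
illustration: features_sketch is a hypothetical function, the features
are plain hashes rather than Bio::SeqFeatureI objects, and the -types
filter is left out.

```perl
  use strict;
  use warnings;

  # Hypothetical in-memory data standing in for database features.
  my @FEATURES = (
      { name => 'f1', Gene => 'abc-1', Note => 'confirmed' },
      { name => 'f2', Gene => 'abc-1', Note => 'predicted' },
      { name => 'f3', Gene => 'xyz-2', Note => 'confirmed' },
  );

  sub features_sketch {
      my %args     = @_;
      my $want     = $args{-attributes} || {};
      my $callback = $args{-callback};
      my @hits;
      for my $f (@FEATURES) {
          # Exact string match; every requested attribute must match (AND).
          next if grep { !defined $f->{$_} || $f->{$_} ne $want->{$_} } keys %$want;
          if ($callback) {
              last unless $callback->($f);   # a false return stops the iteration
          }
          else {
              push @hits, $f;
          }
      }
      return if $callback;   # undef in scalar context, empty list in list context
      return @hits;
  }

  my @confirmed = features_sketch(
      -attributes => { Gene => 'abc-1', Note => 'confirmed' });
  print join(',', map { $_->{name} } @confirmed), "\n";

  my @seen;
  features_sketch(-callback => sub { push @seen, $_[0]{name}; return @seen < 2 });
  print join(',', @seen), "\n";
```

The second call stops after two features because the callback returns
false once it has seen two names.
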
=cut

sub features { shift->throw_not_implemented }

=head2 get_feature_by_name

 Title   : get_feature_by_name
 Usage   : $db->get_feature_by_name(-class=>$class,-name=>$name)
 Function: fetch features by their name
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature
 Status  : public

This method can be used to fetch named feature(s) from the database.
The -class and -name arguments have the same meaning as in segment(),
and the method also accepts the following short-cut forms:

  1) one argument: the argument is treated as the feature name
  2) two arguments: the arguments are treated as the class and name
     (note: this uses _rearrange() so the first argument must not
     begin with a hyphen or it will be interpreted as a named
     argument).

This method may return zero, one, or several Bio::SeqFeatureI objects.
The implementor may allow the name to contain wildcards, in which case
standard C-shell glob semantics are expected.

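An implementor who allows wildcards might translate the name into a
regular expression before matching.  The sketch below is one possible
approach, not part of this interface: glob_to_regex is a hypothetical
helper, and it handles only the * and ? wildcards, leaving out bracket
sets and brace expansion.

```perl
  use strict;
  use warnings;

  # Hypothetical helper: translate a glob using * and ? into an anchored regex.
  sub glob_to_regex {
      my ($glob) = @_;
      my $re = join '', map {
            $_ eq '*' ? '.*'            # any run of characters
          : $_ eq '?' ? '.'             # exactly one character
          :             quotemeta($_)   # everything else is literal
      } split //, $glob;
      return qr/\A$re\z/;
  }

  my @names = qw(abc-1 abc-2 abd-1 xyz-1);
  my $re    = glob_to_regex('abc-*');
  print join(',', grep { $_ =~ $re } @names), "\n";
```
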
=cut

sub get_feature_by_name {
  shift->throw_not_implemented();
}

=head2 get_feature_by_target

 Title   : get_feature_by_target
 Usage   : $db->get_feature_by_target($class => $name)
 Function: fetch features by their similarity target
 Returns : a list of Bio::SeqFeatureI objects
 Args    : the class and name of the desired feature
 Status  : public

This method can be used to fetch a named feature from the database
based on its similarity hit.  The arguments are the same as
get_feature_by_name().  If this is not implemented, the interface
defaults to using get_feature_by_name().

=cut

sub get_feature_by_target {
  shift->get_feature_by_name(@_);
}

=head2 get_feature_by_id

 Title   : get_feature_by_id
 Usage   : $db->get_feature_by_id($id)
 Function: fetch a feature by its ID
 Returns : a Bio::SeqFeatureI object
 Args    : the ID of the feature
 Status  : public

If the database provides unique feature IDs, this can be used to
retrieve a single feature from the database.  If not overridden, this
interface calls get_feature_by_name() and returns the first element.

=cut

sub get_feature_by_id {
  (shift->get_feature_by_name(@_))[0];
}

=head2 get_feature_by_attribute

 Title   : get_feature_by_attribute
 Usage   : $db->get_feature_by_attribute(attribute1=>value1,attribute2=>value2)
 Function: fetch features by combinations of attribute values
 Returns : a list of Bio::SeqFeatureI objects
 Args    : a list of attribute name=>value pairs
 Status  : public

This method can be used to fetch a set of features from the database.
Attributes are a list of name=E<gt>value pairs.  They will be
logically ANDed together.  If an attribute value is an array
reference, the list of values in the array is treated as an
alternative set of values to be ORed together.

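The AND/OR rule can be sketched as a plain predicate.  This is an
illustration under stated assumptions rather than a reference
implementation: matches_attributes is a hypothetical helper, features
are plain hashes with one value per attribute, and values are compared
by exact string equality.

```perl
  use strict;
  use warnings;

  # Hypothetical predicate: true if the feature satisfies every attribute.
  sub matches_attributes {
      my ($feature, %want) = @_;
      for my $att (keys %want) {
          my $wanted  = $want{$att};
          # An array ref lists alternatives (OR); a scalar is a single value.
          my @allowed = ref($wanted) eq 'ARRAY' ? @$wanted : ($wanted);
          my $value   = $feature->{$att};
          return 0 unless defined $value;
          return 0 unless grep { $_ eq $value } @allowed;
      }
      return 1;   # all attributes matched (AND)
  }

  my @features = (
      { name => 'f1', Gene => 'abc-1', Note => 'confirmed' },
      { name => 'f2', Gene => 'abc-1', Note => 'predicted' },
      { name => 'f3', Gene => 'xyz-2', Note => 'confirmed' },
  );
  my @hits = grep {
      matches_attributes($_, Gene => ['abc-1', 'xyz-2'], Note => 'confirmed')
  } @features;
  print join(',', map { $_->{name} } @hits), "\n";
```
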
=cut

sub get_feature_by_attribute {
  shift->throw_not_implemented();
}

=head2 search_notes

 Title   : search_notes
 Usage   : $db->search_notes($search_term,$max_results)
 Function: full-text search on features, ENSEMBL-style
 Returns : an array of [$name,$description,$score]
 Args    : see below
 Status  : public

This routine performs a full-text search on feature attributes (which
attributes depend on implementation) and returns a list of
[$name,$description,$score], where $name is the feature ID,
$description is a human-readable description such as a locus line, and
$score is the match strength.

Since this is a decidedly non-standard thing to do (but the generic
genome browser uses it), the default method returns an empty list.
You do not have to implement it.

=cut

sub search_notes { return }

=head2 get_seq_stream

 Title   : get_seq_stream
 Usage   : $seqio = $db->get_seq_stream(@args)
 Function: Performs a query and returns an iterator over it
 Returns : a Bio::SeqIO stream capable of returning Bio::SeqFeatureI objects
 Args    : As in features()
 Status  : public

This routine takes the same arguments as features(), but returns a
Bio::SeqIO::Stream-compliant object.  Use it like this:

  $stream = $db->get_seq_stream('exon');
  while (my $exon = $stream->next_seq) {
     print $exon,"\n";
  }

NOTE: In the interface this method is aliased to get_feature_stream(),
as the name is more descriptive.

=cut

sub get_seq_stream { shift->throw_not_implemented }
sub get_feature_stream { shift->get_seq_stream(@_) }

=head2 refclass

 Title   : refclass
 Usage   : $class = $db->refclass
 Function: returns the default class to use for segment() calls
 Returns : a string
 Args    : none
 Status  : public

For data sources which use namespaces to distinguish reference
sequence accessions, this returns the default namespace (or "class")
to use.  This interface defines a default of "Accession".

=cut

sub refclass { "Accession" }