Bio::DB::TFBS namespace has been moved to its own distribution named after itself
[bioperl-live.git] / Bio / SearchDist.pm
blob1cc8ca9b0a5dfe963641316199da925ffa27c547
3 # BioPerl module for Bio::SearchDist
5 # Please direct questions and support issues to <bioperl-l@bioperl.org>
7 # Cared for by Ewan Birney <birney@ebi.ac.uk>
9 # Copyright Ewan Birney
11 # You may distribute this module under the same terms as perl itself
13 # POD documentation - main docs before the code
15 =head1 NAME
17 Bio::SearchDist - A perl wrapper around Sean Eddy's histogram object
19 =head1 SYNOPSIS
21 $dis = Bio::SearchDist->new();
22 foreach $score ( @scores ) {
23 $dis->add_score($score);
26 if( $dis->fit_evd() ) {
27 foreach $score ( @scores ) {
28 $evalue = $dis->evalue($score);
29 print "Score $score had an evalue of $evalue\n";
31 } else {
32 warn("Could not fit histogram to an EVD!");
35 =head1 DESCRIPTION
37 The Bio::SearchDist object is a wrapper around Sean Eddy's excellent
38 histogram object. The histogram object can bascially take in a number
39 of scores which are sensibly distributed somewhere around 0 that come
40 from a supposed Extreme Value Distribution. Having add all the scores
41 from a database search via the add_score method you can then fit a
42 extreme value distribution using fit_evd(). Once fitted you can then
43 get out the evalue for each score (or a new score) using
44 evalue($score).
46 The fitting procedure is better described in Sean Eddy's own code
47 (available from http://hmmer.janelia.org/, or in the histogram.h header
48 file in Compile/SW). Bascially it fits a EVD via a maximum likelhood
49 method with pruning of the top end of the distribution so that real
50 positives are discarded in the fitting procedure. This comes from
51 an originally idea of Richard Mott's and the likelhood fitting
52 is from a book by Lawless [should ref here].
55 The object relies on the fact that the scores are sensibly distributed
56 around about 0 and that integer bins are sensible for the
57 histogram. Scores based on bits are often ideal for this (bits based
58 scoring mechanisms is what this histogram object was originally
59 designed for).
62 =head1 CONTACT
64 The original code this was based on comes from the histogram module as
65 part of the HMMer2 package. Look at http://hmmer.janelia.org/
67 Its use in Bioperl is via the Compiled XS extension which is cared for
68 by Ewan Birney (birney@ebi.ac.uk). Please contact Ewan first about
69 the use of this module
71 =head1 FEEDBACK
73 =head2 Mailing Lists
75 User feedback is an integral part of the evolution of this and other
76 Bioperl modules. Send your comments and suggestions preferably to one
77 of the Bioperl mailing lists. Your participation is much appreciated.
79 bioperl-l@bioperl.org - General discussion
80 http://bioperl.org/wiki/Mailing_lists - About the mailing lists
82 =head2 Support
84 Please direct usage questions or support issues to the mailing list:
86 I<bioperl-l@bioperl.org>
88 rather than to the module maintainer directly. Many experienced and
89 reponsive experts will be able look at the problem and quickly
90 address it. Please include a thorough description of the problem
91 with code and data examples if at all possible.
93 =head2 Reporting Bugs
95 Report bugs to the Bioperl bug tracking system to help us keep track
96 the bugs and their resolution. Bug reports can be submitted via the
97 web:
99 https://github.com/bioperl/bioperl-live/issues
101 =head1 APPENDIX
103 The rest of the documentation details each of the object
104 methods. Internal methods are usually preceded with a _
106 =cut
109 # Let the code begin...
112 package Bio::SearchDist;
113 use strict;
116 BEGIN {
117 eval {
118 require Bio::Ext::Align;
120 if ( $@ ) {
121 print $@;
122 print STDERR ("\nThe C-compiled engine for histogram object (Bio::Ext::Align) has not been installed.\n Please install the bioperl-ext package\n\n");
123 exit(1);
128 use base qw(Bio::Root::Root);
130 sub new {
131 my($class,@args) = @_;
132 my $self = $class->SUPER::new(@args);
133 my($min, $max, $lump) =
134 $self->_rearrange([qw(MIN MAX LUMP)], @args);
136 if( ! $min ) {
137 $min = -100;
140 if( ! $max ) {
141 $max = +100;
144 if( ! $lump ) {
145 $lump = 50;
148 $self->_engine(&Bio::Ext::Align::new_Histogram($min,$max,$lump));
150 return $self;
153 =head2 add_score
155 Title : add_score
156 Usage : $dis->add_score(300);
157 Function: Adds a single score to the distribution
158 Returns : nothing
159 Args :
162 =cut
164 sub add_score{
165 my ($self,$score) = @_;
166 my ($eng);
167 $eng = $self->_engine();
168 #$eng->AddToHistogram($score);
169 $eng->add($score);
172 =head2 fit_evd
174 Title : fit_evd
175 Usage : $dis->fit_evd();
176 Function: fits an evd to the current distribution
177 Returns : 1 if it fits successfully, 0 if not
178 Args :
181 =cut
183 sub fit_evd{
184 my ($self,@args) = @_;
186 return $self->_engine()->fit_EVD(10000,1);
189 =head2 fit_Gaussian
191 Title : fit_Gaussian
192 Usage :
193 Function:
194 Example :
195 Returns :
196 Args :
199 =cut
201 sub fit_Gaussian{
202 my ($self,$high) = @_;
204 if( ! defined $high ) {
205 $high = 10000;
208 return $self->_engine()->fit_Gaussian($high);
212 =head2 evalue
214 Title : evalue
215 Usage : $eval = $dis->evalue($score)
216 Function: Returns the evalue of this score
217 Returns : float
218 Args :
221 =cut
223 sub evalue{
224 my ($self,$score) = @_;
226 return $self->_engine()->evalue($score);
232 =head2 _engine
234 Title : _engine
235 Usage : $obj->_engine($newval)
236 Function: underlyine bp_sw:: histogram engine
237 Returns : value of _engine
238 Args : newvalue (optional)
241 =cut
243 sub _engine{
244 my ($self,$value) = @_;
245 if( defined $value) {
246 $self->{'_engine'} = $value;
248 return $self->{'_engine'};
252 ## End of Package
255 __END__