#!/usr/bin/perl

######################################################################
#
# This script is barely a program at all: it exists as one only
# because it pulls a single figure from the database.
#
# Its real purpose is to generate a largely static page that
# explains the Overgo Plating process.
#
######################################################################

use strict;
use CXGN::Page;
use CXGN::DB::Connection;
use CXGN::DB::Physical;
use CXGN::Page::FormattingHelpers qw/blue_section_html page_title_html/;

# Presets.
my $overgo_stats_page = 'overgo_stats.pl';
my $physical_map_page = '/cview/map.pl?map_id=9&physical=1';
my $physical_map_overview = '/cview/map.pl?map_id=9&physical=1';
my $soop_home = 'http://genome.nhgri.nih.gov/soop/';
my $agi_page = 'http://www.genome.arizona.edu/fpc/tomato/';
my $bac_search_page = '/search/direct_search.pl?search=BACs';

# Connect to the db.
my $dbh = CXGN::DB::Connection->new('physical');

# Prepare the page.
our $page = CXGN::Page->new( "About the overgo plating process", "Robert Buels");

# Find out how many plates have been processed to date.
my $plates_sth = $dbh->prepare("SELECT COUNT(DISTINCT plate_number) FROM overgo_plates");
$plates_sth->execute;
# List assignment, so that fetchrow_array is called in list context.
my ($number_of_plates_processed) = $plates_sth->fetchrow_array;
$plates_sth->finish;
$page->error_page("No Overgo Plates found in physical database.")
    unless $number_of_plates_processed;

# Print the page.
$page->header("The overgo plating process");
print page_title_html('About the overgo plating process');

print blue_section_html('The overgo plating process',<<EOH);
This page briefly explains the Overgo Plating experiments conducted at SGN.
It also explains the terminology used on our
<a href="$overgo_stats_page">Overgo statistics page</a> for the
various stages of processing.
EOH

print blue_section_html('About the overgo plating process',<<EOH);
<p>Overgo plates have 96 wells, arranged in 8 rows by 12 columns. The DNA sequence
for one overgo probe (a sequenced marker from our maps collection) is assigned
to each well. Wells are then <i>pooled</i> using the <a href="$soop_home">Soop</a>
program to produce 20 pools per plate: 8 row-pools and 12 column-pools.
</p>

<p>BAC plates are then run against the 20 pools of each overgo plate to determine
which BACs match a given pool. From this, the matches of BACs to probes
can be inferred. A probe which successfully matches one or more BACs is
said to "anchor" them to the <a href="$physical_map_page">Genetic Map</a> and
is thus referred to as an "anchor point".
</p>

<p>For a BAC to be anchored to a given probe, we require it to:</p>

<ol>
<li>match both the row-pool and the column-pool of that probe's well, and</li>
<li>not match any other pools on that plate.</li>
</ol>

<p>BACs which match more than one row-pool or more than one column-pool on a given
plate are classified as <i>ambiguous</i> BACs. These are stored
separately and are not shown on our <a href="$physical_map_overview">physical
map</a>.
</p>
<a name="fpc_contigging"></a>
EOH
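
# ----------------------------------------------------------------------
# Illustrative sketch only; nothing on this page calls it.  It restates
# the anchoring rule described in the section above as code, assuming
# (hypothetically) that the pool hits for one BAC on one overgo plate
# arrive as two array refs: the row-pools it matched (e.g. letters A-H)
# and the column-pools it matched (e.g. numbers 1-12).  The sub name,
# the pool labels and the return format are inventions for this example
# and are not part of CXGN::DB::Physical.  How the real pipeline treats
# a hit on only a row-pool or only a column-pool is not stated on this
# page; the sketch simply lumps that case in with "ambiguous".
# ----------------------------------------------------------------------
sub classify_bac_on_plate_sketch {
    my ($row_hits, $col_hits) = @_;
    my $n_rows = scalar @$row_hits;
    my $n_cols = scalar @$col_hits;

    # No pool matched at all: the BAC did not hit this plate.
    return { status => 'no_hit' } unless $n_rows || $n_cols;

    # Exactly one row-pool and one column-pool: the well, and so the
    # probe, is uniquely determined and the BAC is anchored to it.
    if ($n_rows == 1 && $n_cols == 1) {
        return { status => 'anchored', well => $row_hits->[0] . $col_hits->[0] };
    }

    # Anything else is ambiguous.  The candidate wells are every
    # row/column combination among the hits (none if one side is empty).
    my @candidates;
    for my $row (@$row_hits) {
        push @candidates, map { $row . $_ } @$col_hits;
    }
    return { status => 'ambiguous', candidates => \@candidates };
}
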
print blue_section_html('The FPC process',<<EOH);
Meanwhile, working from the other end, our collaborators at the
<a href="$agi_page">Arizona Genomics Institute</a> are using <i>Fingerprint
Contigging (FPC)</i> techniques to assemble the BAC collection into contigs.
The BAC &lt;--&gt; map position matches generated by our overgo plating experiment
are used to inform the FPC process and improve the contigging conducted there.
EOH

print blue_section_html('In silico processing of BAC anchoring',<<EOH);
<p>After establishing the unambiguous BAC &lt;--&gt; probe associations, a further stage
of in silico processing was done to check the <i>plausibility</i> of BAC
matches. Our initial "plausible set" contains all unambiguously anchored
BACs. We then remove from this set any BACs which fail to meet certain criteria.
</p>

<p>In general, we expect a given BAC to match only a portion of one chromosome.
Thus, as a first step, we drop from our plausible set all BACs which are anchored
to markers on two or more chromosomes. [<i>N.B.</i> - Once data from all plates are in,
we intend to conduct a more complex analysis. Specifically, if a BAC
has multiple anchor points within a tight range on one chromosome and only a
lone anchor point elsewhere, then we may reasonably conclude that the lone match
is the aberrant one. However, the depth of data needed to drive this
analysis has not yet been accumulated.]
</p>

<p>Secondly, we expect BACs to be anchored largely to one portion of a chromosome,
rather than randomly anchored down its length. Thus we require a plausible BAC
to have a "walk" of anchor points, all of which are within a reasonable distance
of one another. Arbitrarily, we have chosen 5.0 cM as the maximum
distance between any adjacent anchor points for a given BAC. BACs which
violate this principle are dropped from our plausible set.
</p>

<p>Finally, we consider BAC contigs (as generated using FPC) to be plausible if:</p>

<ol>
<li>All of their member BACs which are anchored lie on the same chromosome, and</li>
<li>There is a "walk" of BACs down the length of the contig where no two adjacent
anchor points for the contig are more than 5.0 cM apart.</li>
</ol>
EOH

print blue_section_html('Summary of terms',<<EOH);
<p>This section gives a summary of the terms used on the
<a href="$overgo_stats_page">Overgo Plating Statistics</a> page. To
view the numbers for any of the following, please refer to that
page.</p>

<p>Terms are listed here in alphabetical order.</p>
</div>

<div class="indentedcontent">
<dl id="overgodefs">
<dt><a name="anchorpoint"></a>
Anchor Point
</dt>
<dd>
An overgo probe (qv) which has successfully been associated with one
or more BACs is said to anchor them to the map and is thus referred to as
an "anchor point".
</dd>

<dt><a name="bacs"></a>
BACs
</dt>
<dd>
BACs are <i>Bacterial Artificial Chromosomes</i>. The same library of
BACs is used for both the FPC contigging and the overgo plating process. The
total number of BAC clones on the library plates is 129024, but not
all clones are valid and so the number of BACs reported is lower.
</dd>

<dt><a name="emptywells"></a>
Empty wells
</dt>
<dd>
Overgo plates (qv) have 96 wells. To make it clear which single probe
a BAC has matched, the dispersal of markers on
the plates has been <i>deconvoluted</i> so that no two markers sharing
a pool lie within 5.0 cM of one another on the <a
href="$physical_map_page">Genetic Map</a>. A consequence of this is that
it has sometimes been necessary to leave some wells empty on given plates
in order to ensure a thorough dispersal. Thus, not every overgo plate
contains 96 probes. The total number of probes + empty wells for a
given plate should add up to 96.
</dd>

<dt><a name="hittheplates"></a>
Hitting the plates
</dt>
<dd>
BACs which matched at least one overgo pool (corresponding to either a
row or a column) on at least one of the overgo plates are said to have
"hit the plates": they have been found to match, in some way,
the sequences of the genetic markers known to SGN.
</dd>

<dt><a name="plates"></a>
Overgo Plates
</dt>
<dd>
Each overgo plate contains up to 96 probes, arranged in 8 rows and 12 columns.
To date, $number_of_plates_processed plates have been processed.
</dd>

<dt><a name="probes"></a>
Overgo Probes
</dt>
<dd>
Each <i>probe</i> refers to one sequenced marker found in the SGN tomato
maps collection. Each marker is placed once on the overgo plates we
designed and counts as a probe. If a probe successfully associates
BACs to the map then it is said to be an anchor point.
</dd>

<dt><a name="plausible"></a>
Plausible BAC locations
</dt>
<dd>
The overgo plating process may produce multiple associations to probes for a given
BAC. If those probes all lie on the same chromosome, within a relatively
well clustered region, then we say the BAC is <i>plausibly</i> anchored to
that chromosome by those anchor points. The criterion for "well clustered"
is that no two consecutive anchor points are more than 5.0 cM apart.

An FPC contig whose plausibly anchored BACs all lie on the same chromosome
is said to be a <i>plausible contig</i>.
</dd>

<dt><a name="ambiguity"></a>
Unambiguous matching, a.k.a. plausible matching
</dt>
<dd>
<p>Ambiguity refers to the match between a given BAC and a given overgo plate.
If the BAC matches exactly one row-pool and one column-pool, thus specifying
a clear match to one probe on the plate, then it is taken to be unambiguously
anchored to that probe. Otherwise, the set of probes that it could
be matched to is the set of ambiguous matches for that BAC.</p>
</dd>

</dl>
EOH

print blue_section_html('Links',<<EOH);
<ul>
<li><a href="$overgo_stats_page">Progress of the Overgo Plating Project</a></li>
<li><a href="$agi_page">BAC Contigging by FPC at the Arizona Genomics Institute</a></li>
<li><a href="$physical_map_page">Physical map abstract page</a></li>
<li><a href="$physical_map_overview">Overview of the physical map</a></li>
<li><a href="$bac_search_page">Search SGN for BACs</a></li>
</ul>
EOH

$page->footer;
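
# ----------------------------------------------------------------------
# Illustrative sketch only; nothing on this page calls it.  It restates
# the two plausibility checks described in the "In silico processing"
# text as code: all anchor points of a BAC must lie on one chromosome,
# and no two adjacent anchor points may be more than 5.0 cM apart.
# The sub name and the input format (an array ref of hashes with
# hypothetical keys "chr" and "cm") are assumptions made for this
# example; the real analysis code may be organised quite differently.
# ----------------------------------------------------------------------
sub bac_is_plausible_sketch {
    my ($anchor_points, $max_gap_cm) = @_;
    $max_gap_cm = 5.0 unless defined $max_gap_cm;

    # A BAC with no anchor points was never in the plausible set.
    return 0 unless @$anchor_points;

    # Check 1: every anchor point must be on the same chromosome.
    my %chromosomes = map { $_->{chr} => 1 } @$anchor_points;
    return 0 if keys(%chromosomes) > 1;

    # Check 2: sort the anchor points by map position and require each
    # adjacent pair to be no more than $max_gap_cm apart (the "walk").
    my @positions = sort { $a <=> $b } map { $_->{cm} } @$anchor_points;
    for my $i (1 .. $#positions) {
        return 0 if $positions[$i] - $positions[$i - 1] > $max_gap_cm;
    }
    return 1;
}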