# sgn.git: cgi-bin/content/unigene_builds/Lycopersicon_Combined.pl
use strict;
use CXGN::Page;
my $page = CXGN::Page->new('Lycopersicon_Combined.html', 'html2pl converter');
$page->header('Lycopersicon Combined Unigene Build Series');
print <<END_HEREDOC;
<strong>Lycopersicon Combined Unigene Build Series</strong>
<p>This unigene build series incorporates ESTs derived from
Lycopersicon hirsutum, Lycopersicon pennellii, and Lycopersicon
esculentum cDNA libraries. No other sequences are incorporated at
this time. These libraries were constructed at <a href=
"http://www.cornell.edu/">Cornell University</a> as part of the
NSF-funded <a href="/about/tomato_project/index.pl">Tomato
Genomics Project (#9872617)</a>, and sequenced predominantly by
<a href="http://www.tigr.org/">TIGR</a>. In pre-funding stages of
the project, pilot sequencing was also provided by Cereon and
Novartis. All sequences are 5' reads, except that approximately
1\% of the clones, selected at random, were also sequenced from
the 3' end.</p>
<p>Summary of new features in this build</p>
<ul>
<li>New 5' and 3' sequences from re-arrayed "TUS" library</li>
<li>New trimming and quality evaluation process</li>
<li>New chimera screening processes reduce number of chimeric
sequences introduced to the assembly process</li>
<li>rRNA and cloning host contamination screened out</li>
<li>BLAST results against NR and other databases cached and
displayed automatically</li>
</ul><strong>New Sequences</strong><br />
<p>New in the latest iteration of this build is the creation and
incorporation of a "re-arrayed" library which contains clones
selected from the set of plates originally sequenced, spanning
all of our cDNA libraries. Clones were selected to span our
previous Lycopersicon unigene build as well as all of the clones
used on the publicly available <a href=
"http://bti.cornell.edu/CGEP/CGEP.html">Tomato cDNA
microarray</a>. Efforts are in progress to (re)sequence this set
of clones from both 5' and 3' ends and incorporate these new
paired reads into our unigene assemblies. While this additional
sequencing project is not yet complete, the current Lycopersicon
Combined build incorporates 11732 new 5' reads and 9897 new 3'
reads.</p>
<p>Re-sequencing of TOM1 microarray clones was funded by <a href=
"http://www.inra.fr">INRA</a> (French National Institute for
Agricultural Research). For further information, contact:</p>
<pre>
Mondher Bouzayen (bouzayen\@ensat.fr)
Genomics and Fruit Biotechnology Lab.
UMR 990 INRA/INP-ENSAT
Avenue de l'agrobiopole
BP107 Auzeville-Tolosan
F-31326 Castanet Tolosan Cedex, France
</pre>
<p>Additional funding for 5'/3' sequencing of non-array TUS clones
was provided by the Italian Ministry of Agriculture and Forestry
(MiPAF) as part of project DM357/7303/01, and the sequencing was
performed by <a href="http://www.avesthagen.com/">Avesthagen
Technologies Ltd.</a>, India. For further information, contact:</p>
<pre>
Chris Bowler (chris\@szn.it)
Stazione Zoologica
Naples, Italy
</pre>
<p><strong>New trimming and quality evaluation</strong><br /></p>
<p>Also new in this build is a completely redesigned raw data
processing pipeline. All reads currently incorporated into SGN's
unigenes are processed directly from the original chromatogram
files. The high-quality portion of each cDNA insert is recovered
by our own customized insert recovery process. [<font color=
"gray">Details page under development</font>]<br /></p>
<p><strong>Chimera Screening</strong><br /></p>
<p>This pipeline also incorporates three independent screens for
chimeric sequences. While none of these screens has yet been
validated in the laboratory, statistics collected during EST
preclustering show a substantial reduction in putative false
joining of EST clusters. ESTs considered to be putative chimeras
are censored from the assembly process, reducing the false
representation of spurious cDNA ligations as novel genes in the
unigene output. [<font color="gray">Details page under
development</font>]<br /></p>
<p><strong>BLAST results stored</strong><br /></p>
<p>With the current Lycopersicon Combined build and future
unigene builds, BLAST searches against common databases such as
the GenBank non-redundant peptide database (GenBank/nr) and the
Arabidopsis predicted proteome (TAIR) are precomputed and stored
in SGN's databases. Where matches exist, they can be viewed on
unigene search result pages as a simple first-pass annotation.
[<a href=
"/search/unigene.pl?unigene_id=145962&amp;force_image=1">see
example</a>]<br /></p>
<p><strong>Using this unigene build</strong><br /></p>
<p>Using the unigene build is as simple as searching against
it. The most straightforward way to search is to use the
<a href="/tools/blast/">SGN BLAST
interface</a> and select Lycopersicon Combined as the target
database. Note that this BLAST database contains nucleotide
sequences, so you must use BLASTN, TBLASTN, or TBLASTX to search
it. The resulting matches will provide links to detail pages for
those unigenes.</p>
<p>Another way to use this unigene build is to <a href=
"/search/direct_search.pl">search it directly</a> with an
SGN-U# identifier, or search the EST database with an EST
identifier. The former requires you to have already identified
the unigene and noted its SGN-U#, or to have found a reference
elsewhere that uses this identifier. The EST database, however,
can be searched with GenBank accessions, facility-assigned
identifiers, and clone stock identifiers, in addition to
SGN's native SGN-E# identifiers.</p>
<p>Finally, users of TIGR's tomato gene index may search against
this and other Lycopersicon builds using TIGR's TC#s. Any current
TIGR TC#, or an older number from a previous release for which
TIGR still maintains tracking information, can be used to identify
SGN unigenes which are assemblies of sequences in common with
those assembled in TIGR's tentative consensus assembly. Note that
this mapping is not one-to-one, but many-to-many. Furthermore, our
build contains nearly 20,000 sequences not contained in TIGR's
most recent gene index release for tomato, and their build
contains input sequences from exogenous sources which we did not
include in this build.<br /></p>
<br />
END_HEREDOC
$page->footer();