cgi-bin/content/unigene_builds/Lycopersicon_Combined.pl

use strict;
use CXGN::Page;

# SGN page object; header() and footer() wrap the static HTML printed below.
my $page=CXGN::Page->new('Lycopersicon_Combined.html','html2pl converter');
$page->header('Lycopersicon Combined Unigene Build Series');
print<<END_HEREDOC;

  <strong>Lycopersicon Combined Unigene Build Series</strong>

  <p>This unigene build series incorporates ESTs derived from
  Lycopersicon hirsutum, Lycopersicon pennellii, and Lycopersicon
  esculentum cDNA libraries. No other sequences are incorporated at
  this time. These libraries were constructed at <a href=
  "http://www.cornell.edu/">Cornell University</a> as part of the
  NSF-funded <a href="/about/tomato_project/index.pl">Tomato
  Genomics Project (#9872617)</a> and sequenced predominantly by
  <a href="http://www.tigr.org/">TIGR</a>. In the pre-funding stages of
  the project, pilot sequencing was also provided by Cereon and
  Novartis. All sequences are 5' reads, except that approximately 1\% of
  the clones, selected at random, were also sequenced from the 3'
  end.</p>

  <p>Summary of new features in this build:</p>

  <ul>
    <li>New 5' and 3' sequences from the re-arrayed "TUS" library</li>

    <li>New trimming and quality evaluation process</li>

    <li>New chimera screening processes reduce the number of chimeric
    sequences introduced into the assembly process</li>

    <li>rRNA and cloning host contamination screened out</li>

    <li>BLAST results against NR and other databases cached and
    displayed automatically</li>
  </ul>

  <p><strong>New Sequences</strong><br /></p>

  <p>New in the latest iteration of this build is the creation and
  incorporation of a "re-arrayed" library, which contains clones
  selected from the set of plates originally sequenced, spanning
  all of our cDNA libraries. Clones were selected to span our
  previous Lycopersicon unigene build as well as all of the clones
  used on the publicly available <a href=
  "http://bti.cornell.edu/CGEP/CGEP.html">Tomato cDNA
  microarray</a>. Efforts are in progress to (re)sequence this set
  of clones from both the 5' and 3' ends and to incorporate these new
  paired reads into our unigene assemblies. While this additional
  sequencing project is not yet complete, the current Lycopersicon
  Combined build incorporates 11,732 new 5' reads and 9,897 new 3'
  reads.</p>

  <p>Re-sequencing of TOM1 microarray clones was funded by <a href=
  "http://www.inra.fr">INRA</a> (French National Institute for
  Agricultural Research). For further information, contact:</p>
  <pre>
Mondher Bouzayen (bouzayen\@ensat.fr)
Genomics and Fruit Biotechnology Lab.
UMR 990 INRA/INP-ENSAT
Avenue de l'agrobiopole
BP107 Auzeville-Tolosan
F-31326 Castanet Tolosan Cedex, France
</pre>

  <p>Additional funding for 5'/3' sequencing of non-array TUS clones
  was provided by the Italian Ministry of Agriculture and Forestry
  (MiPAF) as part of project DM357/7303/01, and the sequencing was
  performed by <a href="http://www.avesthagen.com/">Avesthagen
  Technologies Ltd.</a>, India. For further information, contact:</p>
  <pre>
Chris Bowler (chris\@szn.it)
Stazione Zoologica
Naples, Italy
</pre>

  <p><strong>New trimming and quality evaluation</strong><br /></p>

  <p>Also new in this build is a completely redesigned raw data
  processing pipeline. All reads currently incorporated into SGN's
  unigenes are processed directly from the original chromatogram
  files. The high-quality portion of each cDNA insert is recovered by
  our own customized insert recovery process. [<font color=
  "gray">Details page under development</font>]<br /></p>

  <p><strong>Chimera Screening</strong><br /></p>

  <p>Also incorporated in this pipeline are three independent
  screens for chimeric sequences. Although none of these screens has
  yet been validated in the laboratory, statistics collected during
  EST preclustering show a substantial reduction in putative false
  joins of EST clusters. ESTs considered to be putative chimeras
  are excluded from the assembly process, so that spurious cDNA
  ligations are less often misrepresented as novel genes in the
  unigene output. [<font color="gray">Details page under
  development</font>]<br /></p>

  <p><strong>BLAST results stored</strong><br /></p>

  <p>With the current Lycopersicon Combined build and future
  unigene builds, BLAST searches against common databases, such as the
  GenBank non-redundant peptide database (genbank/nr) and the
  Arabidopsis predicted proteome (TAIR), are precomputed and stored
  in SGN's databases. Where matches exist, they can be viewed on
  unigene search result pages as a simple first-pass annotation.
  [<a href=
  "/search/unigene.pl?unigene_id=145962&amp;force_image=1">see
  example</a>]<br /></p>

  <p><strong>Using this unigene build</strong><br /></p>

  <p>Using the unigene build is as simple as searching against
  it. The most straightforward way to search is to use the
  <a href="/tools/blast/">SGN BLAST
  interface</a> and select Lycopersicon Combined as the target
  database. Note that this BLAST database contains nucleotide
  sequences, so you must use BLASTN (nucleotide query), TBLASTN
  (protein query), or TBLASTX (translated nucleotide query) to
  search it. The resulting matches will provide links to the detail
  pages for those unigenes.</p>

  <p>Another way to use this unigene build is to <a href=
  "/search/direct_search.pl">search it directly</a> with an
  SGN-U# identifier, or to search the EST database with an EST
  identifier. The former requires that you have already identified
  the unigene and noted its SGN-U#, or that you have found a
  reference to it that uses this identifier. The EST database,
  however, can be searched with GenBank accessions, facility-assigned
  identifiers, and clone stock identifiers, in addition to
  SGN's native SGN-E# identifiers.</p>

  <p>Finally, users of TIGR's tomato gene index may search against
  this and other Lycopersicon builds using TIGR's TC#s. Any current
  TIGR TC#, or an older number from a previous release for which TIGR
  still maintains tracking information, can be used to identify SGN
  unigenes that share member sequences with the corresponding TIGR
  tentative consensus (TC) assembly. Note that this
  mapping is not one-to-one, but many-to-many. Furthermore, our
  build contains nearly 20,000 sequences not contained in TIGR's
  most recent gene index release for tomato, and their build
  contains input sequences from exogenous sources that we did not
  include in this build.<br /></p>

  <br />

END_HEREDOC
$page->footer();