my $page = CXGN::Page->new('Ghent 2006 Meeting Report','Robert Buels');
$page->add_style(text => '@page { size: 8.5in 11in; margin: 0.79in }\n' . <<EOS);
p { margin-bottom: 0.08in }
h1 { margin-bottom: 0.08in }
h1.western { font-family: "Arial", sans-serif; font-size: 16pt }
h1.cjk { font-family: "MS Mincho"; font-size: 16pt }
h1.ctl { font-family: "Tahoma"; font-size: 16pt }
h2 { margin-bottom: 0.08in }
h2.western { font-family: "Arial", sans-serif; font-size: 14pt; font-style: italic }
h2.cjk { font-family: "MS Mincho"; font-size: 14pt; font-style: italic }
h2.ctl { font-family: "Tahoma"; font-size: 14pt; font-style: italic }
h3 { margin-bottom: 0.08in }
h3.western { font-family: "Arial", sans-serif }
h3.cjk { font-family: "MS Mincho" }
h3.ctl { font-family: "Tahoma" }
23 <h1 class="western">Meeting Report: Tomato Annotation Meeting</h1>
24 <h2 class="western">Ghent, Belgium October 23-25<sup>th</sup>, 2006</h2>
25 <p style="margin-bottom: 0in"><br /></p>
26 <p style="margin-bottom: 0in">In Attendance</p>
27 <p style="margin-bottom: 0in"><br /></p>
28 <p style="margin-bottom: 0in"><i>Belgium</i>:</p>
30 <li><p style="margin-bottom: 0in">Stéphane Rombauts</p></li>
31 <li><p style="margin-bottom: 0in">Pierre Rouzé</p></li>
<li><p style="margin-bottom: 0in">Yves Van de Peer (off and on)</p></li>
34 <p style="margin-bottom: 0in"><i>Netherlands</i>:</p>
36 <li><p style="margin-bottom: 0in">Erwin Datema</p></li>
39 <li><p style="margin-bottom: 0in">Mark Fiers</p></li>
<li><p style="margin-bottom: 0in">Roeland van Ham</p></li>
42 <p style="margin-bottom: 0in"><i>India</i>:</p>
44 <li><p style="margin-bottom: 0in">Saloni Mathur</p></li>
45 <li><p style="margin-bottom: 0in">Saurabh Raghuvanshi</p></li>
47 <p style="margin-bottom: 0in"><i>Korea</i>:</p>
49 <li><p style="margin-bottom: 0in">Kyoo-Yeol Lee</p></li>
51 <p style="margin-bottom: 0in"><i>Spain</i>:</p>
53 <li><p style="margin-bottom: 0in">Francisco Camara</p></li>
54 <li><p style="margin-bottom: 0in">Roderic Guigo</p></li>
56 <p style="margin-bottom: 0in"><i>France</i>:</p>
58 <li><p style="margin-bottom: 0in">Thomas Schiex</p></li>
60 <p style="margin-bottom: 0in"><i>USA</i>:</p>
62 <li><p style="margin-bottom: 0in">Robert Buels</p></li>
63 <li><p style="margin-bottom: 0in">Lukas Mueller</p></li>
65 <p style="margin-bottom: 0in"><i>Italy</i>:</p>
67 <li><p style="margin-bottom: 0in">Maria Luisa Chiusano</p></li>
68 <li><p style="margin-bottom: 0in">Alessandra Traini</p></li>
70 <p style="margin-bottom: 0in"><i>Germany</i>:</p>
72 <li><p style="margin-bottom: 0in">Heiko Schoof (day 2)</p></li>
73 <li><p style="margin-bottom: 0in">Anika Joecker</p></li>
75 <p style="margin-bottom: 0in"><br /></p>
76 <h3 class="western">Purpose</h3>
<p style="margin-bottom: 0in">The purpose of the meeting was to
discuss the quality of a previously generated gene finder training
data set, discuss the performance of already trained, tomato-specific
gene finders, define a distributed annotation pipeline for the tomato
genome sequences that are currently being generated, and review
the data submission procedures. Representatives from 9 countries
involved in tomato sequencing and tomato annotation (through the
EU-SOL project) attended the meeting.</p>
85 <p style="margin-bottom: 0in"><br /></p>
86 <h3 class="western">Day 1 – October 23, 2006</h3>
87 <p style="margin-bottom: 0in">First, a representative of every
88 country present gave a brief overview of the sequencing progress.</p>
89 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">India discussed how overlapping
sequences were generated by sequencing seed BACs as far apart as 6 cM.
Thus, the seed BACs need to be carefully analyzed for potential
overlaps, using the FPC fingerprint data. However, FPC data are
not available for all BACs.</p>
95 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Mark Fiers and Erwin Datema reported on
trials of 454 sequencing of full BACs. The feasibility of such an
approach is not yet clear. They also presented a demonstration
of Cyrille2, an interactive annotation pipeline system developed at
their site in the Netherlands.</p>
101 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Maria Luisa Chiusano gave an overview
of the EST alignment work and associated web resources that have been
developed in her lab at the University of Naples.</p>
105 <p style="margin-bottom: 0in"><br /></p>
106 <p style="margin-bottom: 0in">Daniel Buchan gave a brief overview of
107 the state of chromosome 4 sequencing. Most of the BACs should be
108 available in early 2007.</p>
109 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Thomas, although not himself directly
involved in the sequencing side of things, presented progress on the
sequencing of the French project and mentioned a technology called
DAC, which allows many unfinished BACs to be finished in parallel.
The technology is being actively developed. He also mentioned that
enough resources may be available to sequence the
heterochromatic portion of chromosome 7 as well. He then gave a brief
overview of Eugene.</p>
118 <p style="margin-bottom: 0in"><br /></p>
119 <p style="margin-bottom: 0in">Francisco presented an overview of
120 GeneID and a first tomato-specific matrix that was developed.</p>
121 <p style="margin-bottom: 0in"><br /></p>
122 <p style="margin-bottom: 0in">Next, Remy presented an analysis of the
123 training set that was manually generated by some members of the
124 group. A total of 108 BACs were hand-annotated for complete and
125 clean gene models. However, the resulting dataset was not homogeneous
126 in quality, and some low-quality and/or incomplete gene models were
127 retained. A discussion followed, in which Stéphane Rombauts
128 explained that the poplar annotation project had used a very rigorous
129 automated method to generate a training set, and he suggested that we
130 try the same, letting the automated set supersede the hand-annotated
131 set. A general agreement was reached to try this course, with the
132 automated generation performed by Stéphane, using the same
133 methods from poplar. Stéphane also agreed to perform a trial
134 run of his training set generator during the meeting for evaluation.</p>
135 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The cornerstone of the training set
generation method is identifying annotations that are very
well-supported by EST alignments, and whose predicted protein matches
a known Arabidopsis protein over at least 75% of its length.
With the number of sequenced BACs available, the trial run of Stéphane's
training set generator produced only 100 very confident gene models
with the required level of EST support and Arabidopsis homology. The
general conclusion was that this number was too low, but that the
method was quite promising, and that final evaluation of the method
should be deferred until more finished sequence is available.</p>
146 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The next discussion focused on the
submission process for the BAC sequences and annotations. Currently,
all project partners are supposed to submit to Genbank and SGN
independently, which can lead to inconsistencies between the
repositories when the two submission events are far apart in
time. Daniel and Remy suggested submission to Genbank only, from
which SGN could pull in the sequences to feed into the annotation
pipelines. However, the problem with this approach is that
the actual assembly data would not be carried by SGN. A number of
attendees asserted that this assembly data is valuable for final
assembly and should continue to be rigorously warehoused. Genbank
accepts the full BAC sequence and the chromatograms for the
individual sequence reads, but not the actual assembly data. After a
long discussion, agreement was reached on the following protocol:
First, the finished BAC sequence must be submitted to Genbank, and a
Genbank accession obtained. Then the sequence, together with the
chromatograms and assembly information, is submitted to SGN, using
essentially the same submission format as now, but with an additional
file specifying the Genbank accession of the submission. SGN will
determine the Genbank accessions of the currently submitted sequences
and update them accordingly on the SGN FTP site. In addition, the
following tags should be embedded in the comments field of each
submission to Genbank: “ITAG” (for International Tomato
Annotation Group) and “TOMGEN” (for Tomato Genome
Sequencing Project). This will make it possible to download all BACs
that were sequenced (TOMGEN) or annotated (ITAG) by searching Genbank
for these keywords. A quick search of Genbank determined that these
keywords are not presently in use by any other sequences.</p>
175 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">In addition, Mark and Erwin at
Wageningen will set up a central wiki site for use by the annotation
project for documenting the stages and interchange formats required
by the pipeline. (<b>update:</b><span style="font-weight: medium">
the wiki is up at <a href="http://www.ab.wur.nl/TomatoWiki">http://www.ab.wur.nl/TomatoWiki</a>
</span>)</p>
182 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">A discussion on data formats concluded
that for most things, GFF3 should be sufficient. GAME XML, the native
format of Apollo, is richer but not as well specified.
Artemis is also a very viable gene editor program, and it is capable
of using GFF3 as a native save format. It was agreed that, for the
present at least, both GFF3 and GAME XML formats will be used, since
fairly well-developed conversion scripts exist at several of the
191 <p style="margin-bottom: 0in"><br /></p>
192 <p style="margin-bottom: 0in"><br /></p>
193 <h3 class="western">Day 2 – October 24, 2006</h3>
<p style="margin-bottom: 0in">The main focus of day 2 was
establishing a high-level design of the ITAG annotation pipeline. An
important aspect of the pipeline is that it is distributed: many
annotation centers participate in the process, with each site doing
what it does best. The pipeline is based on BAC sequences,
and whole pseudomolecule assemblies will also be run once they are
available (in a format to be determined later by the ITAG and TOMGEN
202 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">In summary, the complete pipeline is as
follows:</p>
205 <p style="margin-bottom: 0in"><br /></p>
<li><p style="margin-bottom: 0in">BAC sequences are uploaded to
Genbank, and a Genbank accession is obtained.</p></li>
209 <li><p style="margin-bottom: 0in">The BAC is uploaded to SGN.
211 <li><p style="margin-bottom: 0in">SGN runs vector screens and
212 contamination screens (chloroplast, mitochondrial and human
213 sequences), and does other quality control, such as comparison of <i>in
214 vitro</i> (from FPC data) vs <i>in silico</i> restriction fragment
215 sizes. The actual submission to Genbank will also be quality
216 checked, sequences compared and the presence of the keywords (ITAG
217 and TOMGEN) assured.</p></li>
218 <li><p style="margin-bottom: 0in">SGN runs RepeatMasker with
219 tomato-derived and other repeat databases. This comes before the
220 other pipeline steps so that some of them have the option of using
221 the repeat-masked BAC sequence.</p></li>
222 <li><p style="margin-bottom: 0in">in parallel:</p>
224 <li><p style="margin-bottom: 0in">TBLASTX versus mimulus and potato
226 <li><p style="margin-bottom: 0in">BLASTF (script from WUR) versus
227 protein data sets</p>
229 <li><p style="margin-bottom: 0in">arabidopsis, swissprot,
230 solanaceae combined – SGN/Korea</p></li>
231 <li><p style="margin-bottom: 0in">other plants (rice, maize,
232 medicago, poplar) – PSB</p></li>
233 <li><p style="margin-bottom: 0in">swissprot</p></li>
234 <li><p style="margin-bottom: 0in">uniprot</p></li>
235 <li><p style="margin-bottom: 0in">pfam-B</p></li>
236 <li><p style="margin-bottom: 0in">SPTG</p></li>
237 <li><p style="margin-bottom: 0in">solanaceae</p></li>
240 <li><p style="margin-bottom: 0in">BLASTN</p>
242 <li><p style="margin-bottom: 0in">vector</p></li>
<li><p style="margin-bottom: 0in">E. coli</p></li>
244 <li><p style="margin-bottom: 0in">chloroplast</p></li>
245 <li><p style="margin-bottom: 0in">mitochondria (when available)</p></li>
<li><p style="margin-bottom: 0in">H. sapiens</p></li>
249 <li><p style="margin-bottom: 0in">transcript sequence alignments
252 <li><p style="margin-bottom: 0in">tomato - 98% identity, 90%
254 <li><p style="margin-bottom: 0in">solanaceae – 90% identity,
255 75% coverage</p></li>
258 <li><p style="margin-bottom: 0in">ab-initio gene finders</p>
260 <li><p style="margin-bottom: 0in">fgenesh (SGN)</p></li>
261 <li><p style="margin-bottom: 0in">genemark (remy)</p></li>
262 <li><p style="margin-bottom: 0in">glimmerhmm (erwin)</p></li>
263 <li><p style="margin-bottom: 0in">genscan - ?
265 <li><p style="margin-bottom: 0in">genemark - ?</p></li>
266 <li><p style="margin-bottom: 0in">geneid (francisco/SGN)</p></li>
<li><p style="margin-bottom: 0in">SNAP (erwin)</p></li>
270 <li><p style="margin-bottom: 0in">RFAM – blastn/infernal(?)</p></li>
271 <li><p style="margin-bottom: 0in">tRNAscan-SE (SGN)</p></li>
283 <li><p style="margin-bottom: 0in">All predictions, alignments and
284 BLASTs are downloaded by U. Ghent and fed into Eugene.</p></li>
285 <li><p style="margin-bottom: 0in">proteins from the Eugene
286 predictions are then functionally annotated with</p>
288 <li><p style="margin-bottom: 0in">BLASTP vs. Arabidopsis and rice
289 proteins, against SwissProt</p></li>
290 <li><p style="margin-bottom: 0in">Interpro – Imperial</p></li>
291 <li><p style="margin-bottom: 0in">GO – MPIZ?</p></li>
292 <li><p style="margin-bottom: 0in">TargetP, signalP, etc. - SGN</p></li>
293 <li><p style="margin-bottom: 0in">RPSblast (MPIZ)</p></li>
294 <li><p style="margin-bottom: 0in">TmHMM – SGN</p></li>
295 <li><p style="margin-bottom: 0in">SGN Genes DB – SGN</p></li>
298 <li><p style="margin-bottom: 0in">SGN produces downloadable files
299 and publishes them on FTP</p>
301 <li><p style="margin-bottom: 0in">protein sequences</p></li>
302 <li><p style="margin-bottom: 0in">cds/cdna sequences</p></li>
303 <li><p style="margin-bottom: 0in">non-redundant protein sequences</p></li>
<p style="margin-bottom: 0in">Following the establishment of the
pipeline steps, a discussion began on data flow between the stages.
Early on, it was agreed that an implementation using a central server
as a pipeline coordinator would be simpler and more robust. The bulk
of the discussion was devoted to whether this central server would
call on each remote pipeline stage to perform the analysis as soon as
a sequence is available (a “push” model), or whether the
central server would make the data available and wait for each
analysis to retrieve its input and upload its output (a “pull”
model). The “push” model has the advantage of allowing
more rigorous flow control, since the central server has more
knowledge of the running status of each analysis, but requires more
from the remote servers, such as availability for external
connections and the capability to run the analyses in a highly
automated way. The “pull” model does not require
external availability or complete automation from the remote pipeline
stages, since they only have to download their input from and upload
their output to the central pipeline server. Flow control in the
pull model would be by means of pipeline status information made
available by the central pipeline server, tracking which analysis
results are available and, for each analysis, whether its required
inputs are ready for download.</p>
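<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The flow control described for the
pull model can be sketched as a simple readiness check that a remote
site might run against the central status information. This is only
an illustration: the analysis tags, the dependency table and the
shape of the status data are all hypothetical, since none of them had
been defined at the time of the meeting.</p>

```python
# Hypothetical dependency table: analysis tag -> tags whose output it needs.
# The real tags and dependencies were still to be assigned by ITAG.
DEPENDS_ON = {
    "repeatmasker": [],
    "geneid": ["repeatmasker"],
    "eugene": ["repeatmasker", "geneid"],
}


def ready_bacs(analysis, status):
    """Return the BACs on which `analysis` can be run right now.

    `status` maps a BAC accession to the set of analysis tags whose
    results are already available on the central server (assumed shape).
    A BAC is ready when all required inputs are present and the
    analysis itself has not been uploaded yet.
    """
    needed = DEPENDS_ON[analysis]
    return sorted(
        bac
        for bac, done in status.items()
        if analysis not in done and all(dep in done for dep in needed)
    )


status = {
    "AC12310": {"repeatmasker"},
    "AC12311": {"repeatmasker", "geneid"},
    "AC12312": set(),
}
print(ready_bacs("geneid", status))  # ['AC12310']
print(ready_bacs("eugene", status))  # ['AC12311']
```

<p style="margin-bottom: 0in">In this sketch each remote site only
reads status and decides for itself what to fetch, which is what lets
the pull model work without inbound connections to the site.</p>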
332 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Since the “pull” model
places less of a burden on each remote pipeline stage, it was decided
that, like the Medicago annotation project, the tomato distributed
annotation pipeline would be pull-driven. To simplify
administration, it was also decided that the pipeline should be run
on batches of BACs, rather than individual BACs.</p>
339 <p style="margin-bottom: 0in"><br /></p>
340 <p style="margin-bottom: 0in">Next, a discussion began on the
341 structure and location of the central annotation result repository.
342 It was decided that SGN would house the central repository, and
343 transfer to and from the repository would be accomplished either with
344 scp or sftp running over an encrypted ssh2 channel. An encrypted
345 transfer scheme was preferred over non-encrypted FTP because it
346 offers more secure and flexible authentication mechanisms, greater
347 assurance of data integrity, and acceptable transfer bandwidth
348 requirements. The repository will be configured such that all ITAG
349 participants have accounts and can upload, download, and if necessary
350 delete files from their assigned parts of the repository.</p>
351 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Next, the discussion turned to file
naming conventions. The general conclusion was that BACs in the
annotation pipeline should be referenced by their <b><span style="font-style: normal">unversioned
</span></b>Genbank accession, which is less ambiguous than their
well plate, row, and column designations, since wells can be
contaminated with other BAC sequences. The unversioned Genbank
accession is used so that locus names remain stable
when the BAC sequence changes. File names and locus names should also
be based on these Genbank accessions. Genbank accession-based naming
also has the advantage that the accession tends to be shorter than
the clone name. Annotation pipeline gene identifiers should thus
start with the Genbank accession, followed by an underscore and a
numeric index number, unique on that BAC. For alternative splicing,
the splice variants are denoted with a parenthesized letter following
the numeric index number. This can be followed by a dot and a
version number to denote slightly differently annotated versions of
the same locus. Version numbers are increased if the underlying BAC
sequence changes. For example, for the third locus to be annotated
on a fictional BAC AC12310, the second of two alternative
transcripts, and the first version, its identifier might be
“AC12310_3(b).1”. This scheme is similar to the one
used in <i>Medicago</i> annotation.</p>
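<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">As an illustration of the identifier
scheme, the sketch below parses and rebuilds such identifiers. The
function names are hypothetical, and the accession pattern (capital
letters followed by digits) is an assumption based on the fictional
example, not an agreed rule.</p>

```python
import re

# Accession, underscore, numeric index, optional "(letter)" splice
# variant, optional ".version". The accession pattern is an assumption.
GENE_ID = re.compile(r"^([A-Z]+\d+)_(\d+)(?:\(([a-z])\))?(?:\.(\d+))?$")


def parse_gene_id(identifier):
    """Split a gene identifier into its parts; raise ValueError if malformed."""
    match = GENE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a valid gene identifier: {identifier!r}")
    accession, index, variant, version = match.groups()
    return {
        "accession": accession,          # unversioned Genbank accession
        "index": int(index),             # numeric index, unique on the BAC
        "variant": variant,              # splice variant letter, or None
        "version": int(version) if version is not None else None,
    }


def format_gene_id(accession, index, variant=None, version=None):
    """Build an identifier from its parts (inverse of parse_gene_id)."""
    splice = f"({variant})" if variant else ""
    suffix = f".{version}" if version is not None else ""
    return f"{accession}_{index}{splice}{suffix}"


parts = parse_gene_id("AC12310_3(b).1")
print(parts["index"], parts["variant"], parts["version"])  # 3 b 1
print(format_gene_id("AC12310", 3, "b", 1))                # AC12310_3(b).1
```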
374 <p style="margin-bottom: 0in"><br /></p>
375 <p style="margin-bottom: 0in">The numeric index does not specify a
376 position on the BAC, but reflects the order in which the gene models
377 were created. When a new locus is annotated, a new numeric index is
378 chosen for it that is one greater than the previous highest index
379 number. If a gene model is created by merging two older gene models,
380 the two old gene model identifiers are retired from use and a new
381 identifier is generated for the merged gene model. For example, if
382 AC12310_7.1 is merged with AC12310_11.1, the resulting locus might be
383 named AC12310_42.1.</p>
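<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The index allocation and merge rule
can be sketched as follows. The class and method names are
hypothetical; the sketch only demonstrates that indices always grow
and that retired indices are never reused.</p>

```python
class LocusIndex:
    """Allocates numeric locus indices for one BAC (illustrative only)."""

    def __init__(self, accession):
        self.accession = accession
        self.highest = 0       # highest index handed out so far
        self.active = set()    # indices currently in use
        self.retired = set()   # indices retired by merges, never reused

    def new_locus(self):
        """Assign an index one greater than the previous highest."""
        self.highest += 1
        self.active.add(self.highest)
        return self.highest

    def merge(self, first, second):
        """Retire two loci and return the index of the merged locus."""
        if first == second or not {first, second} <= self.active:
            raise ValueError("both loci must be distinct and active")
        self.active -= {first, second}
        self.retired |= {first, second}
        return self.new_locus()


loci = LocusIndex("AC12310")
for _ in range(41):            # suppose 41 loci have been annotated so far
    loci.new_locus()
merged = loci.merge(7, 11)
print(f"{loci.accession}_{merged}.1")  # AC12310_42.1
```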
384 <p style="margin-bottom: 0in"><br /></p>
385 <p style="margin-bottom: 0in">Thus, adjacent gene models on the
386 genome will not necessarily have numerically adjacent identifiers,
387 depending on the order in which loci have been added, removed,
388 merged, and so forth since the initial assignment of locus names.</p>
389 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">A predictable file naming scheme is
critical for a pull-based pipeline mechanism. The following file
naming convention for pipeline result files was formulated and agreed
upon:</p>
394 <p style="margin-bottom: 0in"><br /></p>
<p align="center" style="margin-bottom: 0in">&lt;versioned acc.&gt;.&lt;analysis&gt;.itag&lt;pipeline ver.&gt;.v&lt;file ver.&gt;.&lt;file type&gt;</p>
398 <p style="margin-bottom: 0in"><br /></p>
399 <p style="margin-bottom: 0in">For example,
400 “AC12310.1.repeatmasker_TIGRRepbase.itag12.v3.gff” would
401 be the third version of the file containing the results of running
402 the analysis 'repeatmasker_TIGRRepbase' on the BAC sequence
403 AC12310.1, as part of version 12 of the ITAG pipeline.</p>
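<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">A minimal sketch of building and
parsing result file names under this convention is given below. The
type and function names are hypothetical, and the sketch assumes that
analysis tags never contain a dot, which the meeting did not specify.</p>

```python
import re
from typing import NamedTuple

# Assumes analysis tags contain no dots; the accession pattern is also
# an assumption modelled on the example in the report.
RESULT_FILE = re.compile(
    r"^(?P<accession>[A-Z]+\d+\.\d+)"   # versioned Genbank accession
    r"\.(?P<analysis>[^.]+)"            # analysis tag
    r"\.itag(?P<pipeline>\d+)"          # ITAG pipeline version
    r"\.v(?P<file_version>\d+)"         # file version
    r"\.(?P<file_type>.+)$"             # file type, e.g. gff
)


class ResultFile(NamedTuple):
    accession: str
    analysis: str
    pipeline_version: int
    file_version: int
    file_type: str

    def name(self):
        """Render the file name in the agreed layout."""
        return (
            f"{self.accession}.{self.analysis}"
            f".itag{self.pipeline_version}.v{self.file_version}.{self.file_type}"
        )


def parse_result_file(filename):
    """Parse a pipeline result file name; raise ValueError if malformed."""
    match = RESULT_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a pipeline result file name: {filename!r}")
    return ResultFile(
        match["accession"],
        match["analysis"],
        int(match["pipeline"]),
        int(match["file_version"]),
        match["file_type"],
    )


parsed = parse_result_file("AC12310.1.repeatmasker_TIGRRepbase.itag12.v3.gff")
print(parsed.analysis, parsed.pipeline_version, parsed.file_version)
print(parsed.name())
```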
404 <p style="margin-bottom: 0in"><br /></p>
405 <p style="margin-bottom: 0in">The analysis tags (e.g.
406 'repeatmasker_TIGRRepbase' or 'eugene') will be determined and
407 assigned by ITAG in the coming weeks.</p>
408 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The ITAG pipeline version is a
particularly important part of the file name. Since many analyses in
the pipeline depend on the output of other analyses, any change in
the methods used at any step (such as updating reference databases or
changing output formats) will usually require re-running some or
all of the analyses in the pipeline to ensure that all analysis
results remain directly comparable and consistent with each other.
Therefore, it will be essential to make these changes in a controlled
and coordinated manner. It was agreed that each static snapshot of
the analyses and reference datasets used in the pipeline will be
given a pipeline version number, starting from 0 and incrementing by
1 each time <i>any</i> change is
made to the pipeline that may affect any analysis's output. Pipeline
version increments must be agreed upon beforehand, and will not be
allowed while an annotation batch is in progress. It was also agreed
that pipeline version 0 should be a
special development version. While the pipeline is at version 0,
developers are free to change and/or update their pipeline stages
without a pipeline increment. When the pipeline is considered to be
working and producing good results, the pipeline version will be
incremented to 1 and rigorous pipeline version control will begin.</p>
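<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">The versioning rules can be expressed
as a small state object, sketched below. The class, its method names
and the choice of exceptions are hypothetical; only the rules
themselves come from the discussion.</p>

```python
class PipelineVersion:
    """Tracks the ITAG pipeline version under the agreed rules (sketch)."""

    def __init__(self):
        self.version = 0                # 0 is the free-development version
        self.batch_in_progress = False

    def release(self):
        """Leave development: move from version 0 to version 1."""
        if self.version != 0:
            raise RuntimeError("pipeline has already been released")
        if self.batch_in_progress:
            raise RuntimeError("no increments while a batch is in progress")
        self.version = 1
        return self.version

    def record_change(self, agreed_beforehand):
        """Register a change that may affect any analysis's output."""
        if self.version == 0:
            return self.version         # development: no increment required
        if self.batch_in_progress:
            raise RuntimeError("no increments while a batch is in progress")
        if not agreed_beforehand:
            raise PermissionError("increments must be agreed upon beforehand")
        self.version += 1
        return self.version


pipeline = PipelineVersion()
print(pipeline.record_change(agreed_beforehand=False))  # 0
print(pipeline.release())                               # 1
print(pipeline.record_change(agreed_beforehand=True))   # 2
```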
431 <p style="margin-bottom: 0in"><br /></p>
432 <p style="margin-bottom: 0in">How often should the pipeline be run?
433 It was felt that running the pipeline on single BACs would be a waste
434 of time and a minimum batch size of 10 should be set. In addition,
435 to avoid putting too much of a computational burden on our sites, we
436 also agreed on an initial maximum batch size of 100 BACs. However,
437 these limits should be revisited once the pipeline is running and its
438 performance characteristics are better established.</p>
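<p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">A short sketch of how pending BACs
could be grouped under these limits follows. The function name is
hypothetical, and holding back a remainder smaller than the minimum
until more BACs arrive is an assumption; the meeting did not decide
how leftovers are handled.</p>

```python
MIN_BATCH = 10    # initial minimum batch size agreed at the meeting
MAX_BATCH = 100   # initial maximum batch size agreed at the meeting


def make_batches(pending):
    """Split pending BACs into batches of at most MAX_BATCH.

    Returns (batches, held_back). A remainder smaller than MIN_BATCH
    is held back for a later run (assumed policy).
    """
    rest = list(pending)
    batches = []
    while len(rest) >= MIN_BATCH:
        batches.append(rest[:MAX_BATCH])
        rest = rest[MAX_BATCH:]
    return batches, rest


bacs = [f"AC{n:05d}" for n in range(205)]   # fictional accessions
batches, held_back = make_batches(bacs)
print([len(b) for b in batches], len(held_back))  # [100, 100] 5
```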
439 <p style="margin-bottom: 0in"><br /></p>
440 <p style="margin-bottom: 0in"><br /></p>
441 <p style="margin-bottom: 0in">Final gene annotations will be
442 published primarily in the form of several fasta-format files
445 <li><p style="margin-bottom: 0in">protein sequences</p></li>
446 <li><p style="margin-bottom: 0in">cds/cdna sequences</p></li>
447 <li><p style="margin-bottom: 0in">non-redundant proteins</p></li>
449 <p style="margin-bottom: 0in"><br /></p>
450 <p style="margin-bottom: 0in">Fasta files will use the following
451 format for the description lines:</p>
<p style="margin-bottom: 0in"> &gt;&lt;locus name&gt; &lt;functional
description&gt; &lt;versioned seq. acc.&gt; &lt;evidence codes&gt;
&lt;location on seq&gt; &lt;timestamp&gt;</p>
455 <p style="margin-bottom: 0in"><br /></p>
456 <p style="margin-bottom: 0in"><b>Locus name:</b><span style="font-weight: medium">
457 properly formatted locus name as set out above</span></p>
458 <p style="margin-bottom: 0in"><b>Functional description:</b><span style="font-weight: medium">
459 a draft functional description of the locus (obtained from functional
460 analysis stages of the pipeline)</span></p>
461 <p style="margin-bottom: 0in"><b>Versioned sequence accession: </b><span style="font-weight: medium">
462 the versioned Genbank accession of the BAC sequence (e.g. AC12312.1)</span></p>
463 <p style="margin-bottom: 0in"><b>Evidence codes:</b><span style="font-weight: medium">
464 string encoding the evidence supporting this annotated locus,
465 composed of one or more of the following letters:</span></p>
466 <p style="margin-bottom: 0in"> F - Full length cDNA aligned</p>
467 <p style="margin-bottom: 0in"> E - EST coverage</p>
468 <p style="margin-bottom: 0in"> H - homology to an annotation in
469 another sequenced species</p>
470 <p style="margin-bottom: 0in"> I - ab initio prediction</p>
<p style="margin-bottom: 0in"><b>Location on sequence:</b><span style="font-weight: medium">
1-based nucleotide coordinate range on the BAC sequence, formatted as
&lt;start&gt;-&lt;finish&gt;, e.g. 41223-48128</span></p>
474 <p style="margin-bottom: 0in; font-weight: medium"><br /></p>
475 <p style="margin-bottom: 0in; font-weight: medium">Therefore, an
476 example of a properly-formatted description line would be:</p>
<p style="margin-bottom: 0in; font-weight: medium">&gt;AC21353_4(a).2
putative x-ray vision protein AC21353.1 FEHI 12931-18446
2006-10-31/14:36:22</p>
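<p style="margin-bottom: 0in; font-weight: medium"><br /></p>
<p style="margin-bottom: 0in; font-weight: medium">Because the
functional description may contain spaces, a description line is
easiest to parse from both ends. The sketch below shows one way to do
this; the function name and the returned dictionary keys are
hypothetical, and the timestamp is kept as a string because its
format was only shown by example.</p>

```python
EVIDENCE_CODES = set("FEHI")   # the four evidence letters listed above


def parse_description_line(line):
    """Parse one fasta description line in the agreed field order."""
    if not line.startswith(">"):
        raise ValueError("description line must start with '>'")
    fields = line[1:].split()
    if len(fields) < 5:
        raise ValueError("too few fields in description line")
    locus = fields[0]
    # The last four fields are fixed; everything between is description.
    accession, evidence, location, timestamp = fields[-4:]
    description = " ".join(fields[1:-4])
    if not evidence or not set(evidence) <= EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code in {evidence!r}")
    start, finish = (int(part) for part in location.split("-"))
    return {
        "locus": locus,
        "description": description,
        "accession": accession,
        "evidence": evidence,
        "start": start,          # 1-based coordinates on the BAC
        "finish": finish,
        "timestamp": timestamp,
    }


example = (
    ">AC21353_4(a).2 putative x-ray vision protein "
    "AC21353.1 FEHI 12931-18446 2006-10-31/14:36:22"
)
record = parse_description_line(example)
print(record["locus"], record["description"], record["start"], record["finish"])
```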
480 <p style="margin-bottom: 0in; font-weight: medium"><br /></p>
481 <p style="margin-bottom: 0in; font-weight: medium"><br /></p>
<p style="margin-bottom: 0in">The annotation of pseudogenes will be
worked out at a later date.</p>
484 <p style="margin-bottom: 0in">The format for the pseudomolecules to
485 be used will be worked out at a later date.</p>
486 <p style="margin-bottom: 0in"><br /></p>
487 <h3 class="western">Day 3 – October 25, 2006</h3>
488 <p style="margin-bottom: 0in">This was a half-day meeting, and was
489 mostly devoted to clarifications and additions to the decisions made
490 in the preceding two days. Minimum and maximum BAC batch sizes were
491 discussed again briefly, agreeing on an initial minimum and maximum
492 batch size of 10 and 100 BACs respectively.</p>
493 <p style="margin-bottom: 0in"><br /></p>
<p style="margin-bottom: 0in">Additionally, a request by Lincoln
Stein for permission to do a genome-wide annotation using the
Ensembl annotation pipeline was discussed. The decision was made
not to grant permission for him to publish an annotation at this
time, since his analysis pipeline will not be specifically tailored
to tomato, leading to a lower-quality annotation, and it would lead
to confusion about which genome annotation is the “official”
502 <p style="margin-bottom: 0in"><br /></p>
503 <p style="margin-bottom: 0in">Also, there was a discussion of the
504 need for a note to be attached to our BAC sequences in Genbank,
505 asking that people defer genome-wide analyses until our official
506 annotation comes out. A consensus was reached that the text of this
507 note should be discussed and agreed upon at the upcoming SOL project
508 meeting in November.</p>
509 <p style="margin-bottom: 0in"><br /></p>
510 <p style="margin-bottom: 0in">Next, some clarifications to the
511 pipeline versioning scheme were made. The idea of a free-development
512 pipeline version 0 was introduced (already covered above). The
513 mechanics of pipeline synchronization were briefly discussed, with
514 Rob clarifying that SGN intended to provide both a human-readable web
515 page showing pipeline status and a machine-readable pipeline status
516 web service, as described above.</p>
517 <p style="margin-bottom: 0in"><br /></p>
518 <p style="margin-bottom: 0in">Next came a discussion of arrangements
519 for further tomato annotation meetings. An agreement was
520 reached to hold a tomato annotation meeting at PAG in San Diego in
521 January. Also, an agreement was made to try to have a phone
522 conference of tomato annotators every two weeks. Stéphane
523 introduced the VRVS service (<a href="http://www.vrvs.org/">http://www.vrvs.org</a>),
524 a non-commercial internet conferencing service, as a possible
525 mechanism for doing this without the cost of international phone
527 <p style="margin-bottom: 0in"><br /></p>
528 <p style="margin-bottom: 0in"><br /></p>