<h1>SMIDs: Unique identifiers for biogenic small molecules in <i>C. elegans</i></h1>
<h2>1. The issue</h2>
Small molecules and secondary metabolites are referred to by a plethora of names and abbreviations. Some compounds have more than 10 different names,
and in certain cases different compounds share the same name. Significantly, there is no established system for naming newly identified metabolites that would permit <b><i>database
searching for small molecule metabolites in the same manner as for genes.</i></b> <br/><br/>
<b>Example I</b> highlights the large number of acceptable names for a signaling molecule recently identified from <i>C. elegans</i>.
</div> <div id="content2"> <br/><br/>
<img src="/static/images/about_EXAMPLEI.png" /> <br/><br/>
</div> <div id="content3">
<b>Example II</b> highlights a case where two different compounds are referred to by overlapping sets of names. In this example, structure <b>A</b>
shows "phenylpyruvic acid", which is frequently and somewhat
misleadingly referred to as "phenyl pyruvate", a name that under IUPAC nomenclature
denotes compound <b>B</b>. However, referring to the sodium salt of <b>A</b> as "sodium phenylpyruvate"
would be considered correct: "phenylpyruvate" denotes a salt or ester of phenylpyruvic acid,
whereas "phenyl pyruvate" refers to the phenyl ester of pyruvic acid.
</div> <div id="content2"> <br/><br/>
<img src="/static/images/about_EXAMPLEII.png" /> <br/><br/>
</div> <div id="content3">
Ambiguities and parallel usage have prevented the development of effective text mining tools for small molecules.
As a result, effective sharing of small-molecule data in chemical biology and metabolomics is virtually impossible.
Even experienced researchers familiar with chemical nomenclature often have difficulty locating references
for a specific substance. Researchers who are less familiar with chemical nomenclature face even greater
difficulties when trying to locate a specific substance or reference.
<h2>2. Existing naming schemes do not offer a viable solution</h2>
<u><b>CAS</b></u>: The Chemical Abstracts Service (CAS) assigns every new compound reported in the literature a unique CAS registry number (CAS#).
For example, in the CAS system the compounds shown above in <b>Example I</b> and <b>II</b> are referred to as 946524-24-9 (<b>Example I</b>),
156-06-9 (<b>Example II, Structure A</b>), and 2149-49-7 (<b>Example II, Structure B</b>). Although the CAS system is useful for archiving the chemical literature,
CAS numbers are cumbersome to use in scientific writing because they have no recognition value. Importantly, many biological journals are not indexed by CAS. </p>
<p><u><b>IUPAC</b></u>: The IUPAC nomenclature system is highly sophisticated. As a result, deriving and interpreting IUPAC names requires extensive chemical knowledge.
Non-chemists are frequently unable to determine whether two IUPAC names refer to the same compound. In addition, IUPAC names are often exceedingly long and complicated
and thus unsuitable for use in scientific writing. </p>
<p><u><b>SMILES</b></u>: SMILES strings are useful as technical, unambiguous descriptors of chemical structures, but they are unsuitable as in-text identifiers.</p>
<h2>3. A new proposal for naming biogenic small molecules/secondary metabolites in <i>C. elegans</i>: <u>SMIDs</u></h2>
<b>I.</b><i> Small molecules newly identified from the nematode <i>C. elegans</i> are assigned a
unique biogenic <u><b>S</b></u>mall <b><u>M</u></b>olecule <b><u>Id</u></b>entifier </i>(<b>SMID</b>)<i> consisting of <b>four lowercase, non-italicized letters</b>
that refer to the general structural class of the compound, <b>followed by a
pound sign and a number.</b></i> This scheme is comparable to that used for genes and proteins: <a href="//www.wormbase.org/db/gene/gene?name=WBGene00013284;class=Gene/"><b><i>daf-22</i></b></a>
(three letters, italicized, lowercase) for the gene or <b>DAF-22</b> (non-italicized, uppercase) for the protein.
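<br/><br/>
The format described in rule I can be checked mechanically. The following is a minimal sketch in Python, not part of the SMID database itself; the function name <code>is_basic_smid</code> and the pattern are assumptions made for illustration, and the sketch covers only the plain class-plus-number form.

```python
import re

# Rule I as stated in the text: four lowercase letters, a pound sign, a number.
# Hypothetical helper for illustration; only the plain "xxxx#N" form is covered.
SMID_PATTERN = re.compile(r"[a-z]{4}#[0-9]+")


def is_basic_smid(text: str) -> bool:
    """Return True if text is exactly four lowercase letters, '#', and digits."""
    return SMID_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("ascr#1", "dafa#2", "daf-22", "ASCR#1"):
        print(candidate, is_basic_smid(candidate))
```

Using <code>fullmatch</code> rather than <code>search</code> means that gene names such as daf-22 or uppercase variants are rejected instead of partially matched.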
<u>Examples</u>:<br /><br/>
Many pheromones in <i>C. elegans</i> belong to a class of glycosides known as <b><i>ascarosides</i></b>. Therefore, <b>ascr</b> was chosen as the four-letter SMID for this class:
<ul></div><div id="content1"><ul>
<li><a href="//www.smmid.org/detail/ascr%231/"><b>ascr#1</b></a>: "daumone" or "C7" or "6R-(tetrahydro-3'R,5'R-dihydroxy-6'S-methyl-2H-pyran-2'R-yloxy)-heptanoic acid" </li>
</p></div> </ul><div id="content2"> <br/>
<!-- & /image/embed_image.mas, image_id => ?, size => "medium" &>
<img src="/static/structures/ascr%231.png" /> <br/ -->
</div> <div id="content1"><ul>
<li><a href="//www.smmid.org/detail/ascr%234/"><b>ascr#4</b></a>: "nematone-1" or "5R-(3'-O-beta-D-glucosyl-tetrahydro-3'R,5'R-dihydroxy-6'S-methyl-2H-pyran-2'R-yloxy)-2-hexanone" </li>
</div> </ul><div id="content2"> <br/>
<div id="image_example1"></div>
<& /image/embed_image.mas, image_id => 510, size => "small", div=>"image_example1" &>
<!-- img src="/static/structures/ascr%234.png" /> <br/ -->
</div> <div id="content3">
<br/>Similarly, steroids called <b><i>dafachronic acids</i></b>, which regulate <i>C. elegans</i> development, have been assigned the four-letter SMID "<b>dafa</b>":<br/>
</div><div id="content1"><ul>
<li><a href="//www.smmid.org/detail/dafa%231/"><b>dafa#1</b></a>: "delta4-dafachronic acid" or "3-keto-4-cholestenoic acid" </li>
</div> </ul><div id="content2"> <br/>
<div id="image_example2"></div>
<& /image/embed_image.mas, image_id => 561, size => "small", div=>"image_example2" &>
<!-- img src="/static/structures/dafa%231.png" /> <br/ -->
</div> <div id="content1"><ul>
<li><a href="//www.smmid.org/detail/dafa%232/"><b>dafa#2</b></a>: "delta7-dafachronic acid" or "3-keto-7,(5a)-cholestenoic acid" </li>
</div></ul> <div id="content2"> <br/>
<div id="image_example3"></div>
<& /image/embed_image.mas, image_id => 562, size => "small", div=>"image_example3" &>
<!-- img src="/static/structures/dafa%232.png" /> <br/ -->
</div> <div id="content3">
<b>II.</b><i> Stereoisomers are distinguished by the addition of a second numeral.</i> The first discovered stereoisomer of any compound is named with the ending .1, i.e., xxxx#x.1.
For example, in the case of ascr#6, (-)-5R-(3'R,5'R-dihydroxy-6'S-methyl-(2H)-tetrahydropyran-2'-yloxy)-2R-hexanol would be <a href="//www.smmid.org/detail/ascr%236.1/"><b>ascr#6.1</b></a>,
and (-)-5R-(3'R,5'R-dihydroxy-6'S-methyl-(2H)-tetrahydropyran-2'-yloxy)-2S-hexanol would be <a href="//www.smmid.org/detail/ascr%232/"><b>ascr#6.2</b></a>.
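<br/><br/>
Rules I and II together suggest a simple parse of a SMID into structural class, compound number, and optional stereoisomer number. The sketch below is an illustration under assumptions: the names <code>Smid</code>, <code>parse_smid</code>, and <code>detail_path</code> are hypothetical, and the <code>/detail/.../</code> path shape is inferred only from the links on this page, where the pound sign appears percent-encoded as <code>%23</code>.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import quote


class Smid(NamedTuple):
    """Hypothetical container for the parts of a SMID (illustration only)."""
    structural_class: str        # four lowercase letters, e.g. "ascr"
    number: int                  # compound number after the pound sign
    stereoisomer: Optional[int]  # second numeral from rule II, if present


# Four lowercase letters, '#', a number, and an optional ".N" stereoisomer suffix.
_SMID_RE = re.compile(r"([a-z]{4})#([0-9]+)(?:\.([0-9]+))?")


def parse_smid(text: str) -> Smid:
    """Split a SMID string into its parts, raising ValueError if malformed."""
    match = _SMID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a SMID: {text!r}")
    structural_class, number, stereo = match.groups()
    return Smid(structural_class, int(number),
                int(stereo) if stereo is not None else None)


def detail_path(text: str) -> str:
    """Build a percent-encoded path; the '/detail/<smid>/' shape is an assumption."""
    parse_smid(text)  # validate before building the path
    return "/detail/" + quote(text, safe="") + "/"


if __name__ == "__main__":
    print(parse_smid("ascr#6.1"))
    print(detail_path("ascr#6.1"))
```

Percent-encoding matters here because an unencoded <code>#</code> in a URL starts a fragment, so the compound number would never reach the server.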
<b>III.</b> <i>The SMID database is maintained by <b>Lukas Mueller</b> (METACYC SGN databases, Boyce Thompson Institute and Cornell University) and <b>Joshua Judkins</b> (Boyce Thompson Institute and Cornell University) in collaboration with <a href="//www.wormbase.org/"><b>WormBase</b></a>. For each C. elegans metabolite, <a href="//www.smid-db.org/"><b>SMID-DB.org</b></a> provides:</i> </p>
<li>Structure (structural drawing, SMILES)</p>
<li>Compound ID (common names, CAS, Beilstein, IUPAC)</p>
<li>Original reference(s)</p>
<li>List of references that mention the compound</p>
<li>Genes in associated pathways (e.g. receptors, biosynthetic enzymes)</p></ul>
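The kinds of information listed above can be pictured as one record per compound, with an index from every known name to its SMID so that a search by any synonym resolves to the same entry. This is a hedged sketch only: the class <code>SmidRecord</code>, its field names, and <code>build_name_index</code> are hypothetical and do not describe the actual SMID-DB schema. The example names are taken from the ascr#1 and ascr#4 entries above; fields not given on this page are left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class SmidRecord:
    """Hypothetical record mirroring the listed items; not the real schema."""
    smid: str
    smiles: Optional[str] = None          # structure descriptor, if known
    common_names: List[str] = field(default_factory=list)
    cas_number: Optional[str] = None
    original_references: List[str] = field(default_factory=list)
    citing_references: List[str] = field(default_factory=list)
    associated_genes: List[str] = field(default_factory=list)


def build_name_index(records: Iterable[SmidRecord]) -> Dict[str, str]:
    """Map each lowercased name (SMID or synonym) to its SMID."""
    index: Dict[str, str] = {}
    for record in records:
        for name in [record.smid, *record.common_names]:
            index[name.lower()] = record.smid
    return index


# Synonyms below come from the examples on this page; other fields stay empty.
records = [
    SmidRecord(smid="ascr#1", common_names=["daumone", "C7"]),
    SmidRecord(smid="ascr#4", common_names=["nematone-1"]),
]
index = build_name_index(records)

if __name__ == "__main__":
    print(index["daumone"])
    print(index["nematone-1"])
```

A plain dictionary is enough for this illustration; it does not detect the case described in section 1 where two different compounds share a name, which a real index would need to handle explicitly.
<br/><br/>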
All gene entries at <a href="//www.smid-db.org/"><b>SMID-DB.org</b></a> are linked to <a href="//www.wormbase.org/"><b>WormBase.org</b></a>. <br/>
For questions and comments, or to submit new compounds, please contact <b>smid-db@cornell.edu</b>.
<br /> <br /><br/><br/>