1 ID SC10H5 standard; DNA; PRO; 4870 BP.
5 DE Streptomyces coelicolor cosmid 10H5.
7 KW integral membrane protein.
9 OS Streptomyces coelicolor
10 OC Eubacteria; Firmicutes; Actinomycetes; Streptomycetes;
11 OC Streptomycetaceae; Streptomyces.
15 RA Oliver K., Harris D.;
21 RA Parkhill J., Barrell B.G., Rajandream M.A.;
23 RL Submitted (10-AUG-1998) to the EMBL/GenBank/DDBJ databases.
24 RL Streptomyces coelicolor sequencing project,
25 RL Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA
26 RL E-mail: barrell@sanger.ac.uk
27 RL Cosmids supplied by Prof. David A. Hopwood,
28 RL John Innes Centre, Norwich Research Park, Colney,
29 RL Norwich, Norfolk NR4 7UH, UK.
33 RA Redenbach M., Kieser H.M., Denapaite D., Eichner A.,
34 RA Cullum J., Kinashi H., Hopwood D.A.;
35 RT "A set of ordered cosmids and a detailed genetic and physical
36 RT map for the 8 Mb Streptomyces coelicolor A3(2) chromosome.";
37 RL Mol. Microbiol. 21(1):77-96(1996).
41 CC Streptomyces coelicolor sequencing at The Sanger Centre is funded
44 CC Details of S. coelicolor sequencing at the Sanger Centre
45 CC are available on the World Wide Web.
46 CC (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/)
48 CC CDS are numbered using the following system eg SC7B7.01c.
49 CC SC (S. coelicolor), 7B7 (cosmid name), .01 (first CDS),
50 CC c (complementary strand).
52 CC The more significant matches with motifs in the PROSITE
53 CC database are also included but some of these may be fortuitous.
55 CC The length in codons is given for each CDS.
57 CC Usually the highest scoring match found by fasta -o is given for
58 CC CDS which show significant similarity to other CDS in the database.
59 CC The positions of possible ribosome binding site sequences are
60 CC given where these have been used to deduce the initiation codon.
62 CC Gene prediction is based on positional base preference in codons
63 CC using a specially developed Hidden Markov Model (Krogh et al.,
64 CC Nucleic Acids Research, 22(22):4768-4778(1994)) and the FramePlot
65 CC program of Bibb et al., Gene 30:157-66(1984) as implemented at
66 CC http://www.nih.go.jp/~jun/cgi-bin/frameplot.pl. CAUTION: We may
67 CC not have predicted the correct initiation codon. Where possible
68 CC we choose an initiation codon (atg, gtg, ttg or (att)) which is
69 CC preceded by an upstream ribosome binding site sequence (optimally
70 CC 5-13bp before the initiation codon). If this cannot be identified
71 CC we choose the most upstream initiation codon.
73 CC IMPORTANT: This sequence MAY NOT be the entire insert of
74 CC the sequenced clone. It may be shorter because we only
75 CC sequence overlapping sections once, or longer, because we
76 CC arrange for a small overlap between neighbouring submissions.
78 CC Cosmid 10H5 lies to the right of 3A7 on the AseI-B genomic restriction
81 FH Key Location/Qualifiers
84 FT /organism="Streptomyces coelicolor"
86 FT /clone="cosmid 10H5"
87 FT CDS complement(<1..327)
88 FT /note="SC10H5.01c, unknown, partial CDS, len >109 aa;
89 FT possible integral membrane protein"
91 FT /product="hypothetical protein SC10H5.01c"
92 FT CDS complement(350..805)
93 FT /note="SC10H5.02c, probable integral membrane protein, len:
94 FT 151 aa; similar to S. coelicolor hypothetical protein
95 FT TR:O54194 (EMBL:AL021411) SC7H1.35 (155 aa), fasta scores;
96 FT opt: 431 z-score: 749.8 E(): 0, 53.5% identity in 114 aa
98 FT /product="putative integral membrane protein"
100 FT RBS complement(812..815)
101 FT /note="possible RBS upstream of SC10H5.02c"
102 FT CDS complement(837..1301)
103 FT /note="SC10H5.03c, probable integral membrane protein, len:
105 FT /product="putative integral membrane protein"
106 FT /gene="SC10H5.03c"
107 FT RBS complement(1308..1312)
108 FT /note="possible RBS upstream of SC10H5.03c"
109 FT CDS complement(1427..1735)
110 FT /note="SC10H5.04c, unknown, len: 103 aa; possible membrane"
111 FT /gene="SC10H5.04c"
112 FT /product="hypothetical protein SC10H5.04c"
113 FT RBS complement(1738..1741)
114 FT /note="possible RBS upstream of SC10H5.04c"
115 FT misc_feature 1800^1801
116 FT /note="Zero-length feature added to test Bioperl parsing"
118 FT /note="SC10H5.05, questionable ORF, len: 29 aa"
120 FT /product="hypothetical protein SC10H5.05"
122 FT /note="SC10H5.06, probable membrane protein, len: 207 aa;
123 FT similar to S. coelicolor TR:O54192 SC7H1.33c (191 aa),
124 FT fasta scores; opt: 312 z-score: 355.2 E(): 1.6e-12, 36.8%
125 FT identity in 182 aa overlap"
126 FT /product="putative membrane protein"
129 FT /note="possible RBS upstream of SC10H5.07"
131 FT /note="SC10H5.07, unknown, len: 469 aa"
133 FT /product="hypothetical protein SC10H5.07"
134 FT CDS complement(4100..4297)
135 FT /note="SC10H5.08c, unknown, len: 65 aa"
136 FT /gene="SC10H5.08c"
137 FT /product="hypothetical protein SC10H5.08c"
138 FT RBS complement(4314..4319)
139 FT /note="possible RBS upstream of SC10H5.08c"
140 FT CDS complement(4439..>4870)
141 FT /note="SC10H5.09c, probable integral membrane protein,
142 FT partial CDS len: >143 aa; some similarity in C-terminus to
143 FT S. coelicolor hypothetical protein TR:O54106
144 FT (EMBL:AL021529) SC10A5.15 (114 aa), fasta scores; opt: 145
145 FT z-score: 233.8 E(): 9.2e-06, 33.3% identity in 81 aa
146 FT overlap. Overlaps and extends SC3A7.01c"
147 FT /product="putative integral membrane protein"
148 FT /gene="SC10H5.09c"
149 FT misc_feature 4769..4870
150 FT /note="overlap with cosmid 3A7 from 1 to 102"
152 SQ Sequence 4870 BP; 769 A; 1717 C; 1693 G; 691 T; 0 other;
153 gatcagtaga cccagcgaca gcagggcggg gcccagcagg ccggccgtgg cgtagagcgc 60
154 gaggacggcg accggcgtgg ccaccgacag gatggctgcg gcgacgcgga cgacaccgga 120
155 gtgtgccagg gcccaccaca cgccgatggc cgcgagcgcg agtcccgcgc tgccgaacag 180
156 ggcccacagc acactgcgca gaccggcggc cacgagtggc gccaggacgg tgcccagcag 240
157 gagcagcagg gtgacgtggg cgcgcgctgc actgtggccg ccccgtccgc ccgacgcgcg 300
158 cggctcgtca tctcgcggtc ccaccaccgg tcggccccat tactcgtcct caaccctgtg 360
159 gcgactgacg ttccccggac aggtcgtacc gattgccgcc acgccccacc acgcacaggg 420
160 cccagacgac gaagcctgac atggtgatca tgacgacgga ccacaccggg tagtacggca 480
161 gcgagaggaa gttggcgatg atcaccagcc cggcgatggc gaccccggtg acacgtgccc 540
162 acatcgccgt tttgagcagc ccggcgctga cgaccatggc gagcgcgccg agcgcgagat 600
163 ggatccaccc ccacccggtg agatcgaact ggaaaacgta gttgggcgtg gtgacgaaga 660
164 cgtcgtcctc ggcgatggcc atgatgcccc ggaagaggct gagcagcccg gcgaggaaga 720
165 gcatcaccgc cgcgaaggcg gtaaggcccg tcgcccattc ctgcctcgcg gtgtgtgccg 780
166 ggtggtgggt atgtgacgtg gtcatctcgg acctcgtttc gtggaatgcg gatgcttcag 840
167 cgagcggagg cgccggtgcc cgccgcgccc gtgtgccctg ccgggccgtg accggacagg 900
168 accaattcct tcgccttgcg gaactcctcg tccgtgatgg caccccggtc tcggatctcg 960
169 gagagccggg ccagctcgtc gacgctgctg gacccgccgc ccacggtctt cctgatgtag 1020
170 gcgtcgaact cctcctgctg agcccgtgcc cgcgttgtct cccggctgcc catgttcttg 1080
171 ccgcgagcga tcacgtagac gaaaacgccc aggaagggca ggaggatgca gaacaccaac 1140
172 cagccggcct tcgcccagcc actcagtccg tcgtcccgga agatgtcggt gacgacgcgg 1200
173 aagagcagga cgaaccacat gatccacagg aagatcatca gcatcgtcca gaaggcaccc 1260
174 agcagtgggt agtcgtacgc caggtaggtc tgtgcactca tgtccgtcct ccgtcctccg 1320
175 gggcgcggcc cggcggccct cgttccgtac tgacatcagg gtggtcacgg gtcccaccgg 1380
176 tcggcatcac ccggcacggg tgagtggggc gccgaggccg tcgtggtcag gcccgggaca 1440
177 ccggtgtgac cctggtggaa ggacgcgtcc cgtggggcac gcaccgccgg ccgagggcga 1500
178 ccaccgcctc ggtcagtccg agcaggccca gccacaggcc gagaagtcgg gtcagggcac 1560
179 gggccgactc ggcgggcagc gcgaggacga cgattccggc gacgtcgacg gccagcgggt 1620
180 tgcgcaggcc cagcactccg gccggggcgc ccggcaccag cgtggcgagg gccgatgcca 1680
181 tgagccaggt ccaggaaccc ccaagcctgg cgaggacgtg cgccggatcg ctcaatgctc 1740
182 cggtgaccgc cccgcccgac ccgtctccct tgtcggcagg ttccgccgca tcacgcggaa 1800
183 cggagatggc tcccctgtgg atcgggcggc cgctgcgggg ccgcccggtt ggtcggtcgg 1860
184 tgagcgccgg actccccctt cagctcttcc agggtcgggg tcgacaccga ggtcctggat 1920
185 cacccgtcag gggtgatccg ggcatgccgt cgtggcggtg aggtgggata cgggaacgat 1980
186 cggcccacgg gggaccggac gagacgaaga gacgtgagat gagcgatacg aactcgggcg 2040
187 gcgggcgcca ggccgcttcc ggaccggccc cacgtggccg actccctttc cgccggcgcg 2100
188 tggccctggt cgctgtcgca cgtcccctga tcgtcacggt cggtctcgtc accgcctact 2160
189 acctgcttcc cctggacgag agactcagcg ccggcaccct ggtgtcgctg gtgtgcggac 2220
190 tgctcgcagt ccttctggtg ttctgctggg aggtgcgggc catcacgcgc tccccgcatc 2280
191 cgcgtctgag agcgatcgag ggcctggccg ccacgctggt gctgttcctg gtcctcttcg 2340
192 ccggctccta ctacctgctg ggtcgctccg cgcccggctc cttcagcgag ccgctgaaca 2400
193 ggacggacgc gctgtacttc actctgacca cgttcgccac cgtcggcttc ggggacatca 2460
194 ccgcacgctc cgagaccggg cggatcctca cgatggcgca gatgacggga gggctactgc 2520
195 tcgtcggagt cgccgcccgg gtgctggcga gcgcagtgca ggcggggctg caccgacagg 2580
196 gccggggacc ggcggcatcg ccacgctccg gtgctgcgga ggagccggag gccggaccat 2640
197 gaccgtaccc ggtggcttca ccgcctccct gccgccggcc gagcgagccg cgtacggcag 2700
198 gaaggcccgt aaaagggcct cacgttcgtg ccacggctgg tacgagccgg ggcagcggcg 2760
199 gcctgacccc gtcgacctgc tggagcgcca gtccggcgag cgtgtcccgg cactcgtgcc 2820
200 catccgctac ggtcgcatgc tggagtcgcc gttccgcttc taccgcggtg cggcagcgat 2880
201 catggcggcg gacctggcac ccctgcccag cagcggactc caggtgcaat tgtgcgggga 2940
202 cgcgcacccg ttgaacttcc ggctcctggc ctcaccggag cgccggctgg tcttcgacat 3000
203 caacgacttc gacgagacgc tgcccggccc cttcgagtgg gacgtcaaac ggctggcggc 3060
204 cggattcgtg atcgcggccc ggtcgaacgg cttctcgtcc aaggaacaga accgcaccgt 3120
205 tcgggcctgt gtgcgggcct accgggagcg catgagggag ttcgccgtca tgccgaccct 3180
206 ggacatctgg tacgcccagg acgacgccga ccacgtacgg caactgctgg ctacggaggc 3240
207 cagaggagaa gctgagcagc ggctcaggga cgcggctgcg aaggcccgca cacgcaccca 3300
208 catgagggcg ttcgcgaagc tcacccgcgt cacggccgag ggccggcgca tcacccccga 3360
209 cccgccgctg atcaccccac tcggcgatct gctcaccgac ccggccgaag ccggccggga 3420
210 ggaggaactg cggtccgtcg tgaacggcta cgcacggtcc ctgccgcccg agcgccggca 3480
211 cctgctgcgt cactaccggc ttgtggacat ggcgcgcaag gtggtcggcg tcggcagtgt 3540
212 cggcacccgc tgctgggtac tgcttctgct cggcagggac gacgacgatc ctctgctgct 3600
213 ccaggccaag gaagcctcgg aatcggtgct ggcggcccac acgggcggcg aacgctacga 3660
214 ccatcagggc cgcagggtcg tggccggcca gcgtctgatc cagaccaccg gtgacatctt 3720
215 tctcggctgg gcgcgcgtca ccggcttcga cggaaaggcc cgggacttct acgtgcgtca 3780
216 actgtgggac tggaagggcg tcgcgcggcc ggaaaccatg gggcccgacc tgctctccct 3840
217 cttcgcccgg ctgtgcggtg cctgcctggc gagggcccac gcccgttccg gtgaccccgt 3900
218 cgcgctcgcc gcgtacctgg gcggcagcga ccgcttcgac ggcgcgctca ccgagttcgc 3960
219 ccagtcctac gccgatcaga atgaacgcga ccacgaagct ctgctggcgg cctgccgctc 4020
220 cggcagggtc acggccgccc gtttgtgagg ccgacccggg aacggccggc gggctggcac 4080
221 acaccgccgc cggtcggcgt cattccggaa gctgccgcat ctccaggacg cgcaggccca 4140
222 gcgactggca gcgggtgagc aacccgtaca gatgggcctc gtcgatcacc gtgccgaaca 4200
223 gcacggtctg gccggacatg acgacgtgct ccagctccgg gaacgcgttg gccagcgtcc 4260
224 gtgacaggtg tccctcgacg cggatctcgt agcgcacgag cggtcctttc accgtaggag 4320
225 ctcgggacac cgcccggggc tccgggtcgg acggtgctct tggtgacgag cctgcgcctc 4380
226 gtcgccctcc ggtgccctca cccagcacag gtgactccaa ccgcagtgtc agtgcctttc 4440
227 agtgcgtcac tgtgatcttg acgacgacga tcaccaggcc gagcagtacg ttgaccgtcg 4500
228 cggtgacggc caccagtcgt cgcgaggcgc ccgcgcggtg cgccgcggcg acggaccagc 4560
229 ccacctgacc ggcgacggcg acggacagcg ccagccacag ggtgcccggg acgtccagcc 4620
230 ccagtacggg gctgacggcg atggccgcgg ccggaggcac ggcggccttg acgatcggcc 4680
231 actcctcgcg gcacacacgc agaatcaccc gccggtccgg agtgtgccgc gcgagacgcg 4740
232 ctccgaacag ttcggcgtgg acgtgagcga tccagaacac caagctggtg agcaacagca 4800
233 gaagaaccag ttcggcgcgg gggaacgagc ccagggtgcc ggcgccgatc acgacggagg 4860