[bioperl-live.git] / t / data / genomic-seq.genscan
GENSCAN 1.0     Date run:  1-Aug-100    Time: 16:43:38

Sequence HSBA536C5 : 168628 bp : 49.21% C+G : Isochore 2 (43 - 51 C+G%)

Parameter matrix: HumanIso.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------

 2.04 PlyA -   7901   7896    6                               1.05
 2.03 Term -  10642  10463  180  1  0   28   43   120 0.957  -0.89
 2.02 Intr -  11044  10815  230  2  2   84   44   310 0.981  23.79
 2.01 Init -  14499  13650  850  0  1  126   53  2079 0.818 202.23
 2.00 Prom -  16112  16073   40                              -5.56

 3.00 Prom +  18327  18366   40                              -5.06
 3.01 Init +  18680  18726   47  1  2   84  105    30 0.585   4.46
 3.02 Intr +  23250  23284   35  0  2  151   69    35 0.533   5.77
 3.03 Term +  26615  26664   50  0  2  108   43    36 0.267  -1.43
 3.04 PlyA +  27305  27310    6                               1.05

 8.32 PlyA - 114694 114689    6                               1.05
 8.31 Term - 117609 117581   29  1  2  139   37    35 0.986   1.74
 8.30 Intr - 118004 117913   92  1  2  126   77   101 0.988  12.44
 8.29 Intr - 121211 121110  102  1  0   85   89    95 0.997   8.59
 8.28 Intr - 121457 121327  131  2  2  130   51   125 0.999  12.49
 8.27 Intr - 125623 125478  146  2  2  108   92   121 0.958  14.50
 8.26 Intr - 126663 126540  124  0  1  113   58   151 0.981  14.76
 8.25 Intr - 127050 126896  155  1  2   72   91   196 0.685  18.09
 8.24 Intr - 128563 128395  169  1  1   91   72   343 0.999  32.52
 8.23 Intr - 129031 128881  151  0  1   68   95   202 0.996  19.06
 8.22 Intr - 129561 129425  137  0  2  113   94   171 0.999  19.57
 8.21 Intr - 131557 131385  173  2  2  121   94    69 0.957  10.46
 8.20 Intr - 131891 131702  190  2  1  126   66   153 0.780  16.06
 8.19 Intr - 135872 135738  135  2  0   37   92   171 0.802  13.16
 8.18 Intr - 136182 136073  110  1  2  139   33   122 0.867  11.80
 8.17 Intr - 136622 136424  199  2  1   96   22   400 0.999  33.12
 8.16 Intr - 138994 138726  269  2  2   89   74   152 0.257  11.15
 8.15 Intr - 143743 143626  118  1  1  100   63   113 0.289  10.04
 8.14 Intr - 144150 144016  135  0  0   43  100   129 0.999  10.36
 8.13 Intr - 147107 146994  114  2  0  102   91   154 0.995  17.74
 8.12 Intr - 148107 147904  204  0  0  104   92    97 0.839  11.10
 8.11 Intr - 149987 149928   60  2  0  114  113    90 0.999  13.03
 8.10 Intr - 151157 150965  193  1  1   75   77   125 0.355   9.59
 8.09 Intr - 161359 161278   82  2  1  105   95    51 0.520   6.20
 8.08 Intr - 163259 163168   92  1  2  117   91   174 0.980  20.24
 8.07 Intr - 163512 163411  102  2  0  141   89    85 0.999  13.19
 8.06 Intr - 166251 166121  131  0  2  113   81   212 0.999  22.49
 8.05 Intr - 166582 166437  146  2  2  111   92   215 0.999  24.20
 8.04 Intr - 166905 166782  124  0  1  107   70   221 0.999  22.36
 8.03 Intr - 167313 167159  155  1  2  116   89   268 0.999  29.49
 8.02 Intr - 167718 167550  169  0  1   96   72   360 0.999  34.72
 8.01 Intr - 168007 167857  151  0  1   75   99   227 0.984  22.66

Predicted peptide sequence(s):

Predicted coding sequence(s):


>HSBA536C5|GENSCAN_predicted_peptide_2|419_aa
MAQENAAFSPGQEEPPRRRGRQRYVEKDGRCNVQQGNVRETYRYLTDLFTTLVDLQWRLS
LLFFVLAYALTWLFFGAIWWLIAYGRGDLEHLEDTAWTPCVNNLNGFVAAFLFSIETETT
IGYGHRVITDQCPEGIVLLLLQAILGSMVNAFMVGCMFVKISQPNKRAATLVFSSHAVVS
LRDGRLCLMFRVGDLRSSHIVEASIRAKLIRSRQTLEGEFIPLHQTDLSVGFDTGDDRLF
LVSPLVISHEIDAASPFWEASRRALERDDFEIVVILEGMVEATGMTCQARSSYLVDEGLW
GHRFTSVLTLEDGFYEVDYASFHETFEVPTPSCSARELAEAAARLDAHLYWSIPSRLDEK
RVSPRCDQLPPDPCGRPGARHRYMGNCISEVVEEEEEEEGKAPGNVLKLESPRPPEPQV

>HSBA536C5|GENSCAN_predicted_CDS_2|1260_bp
atggcgcaggagaacgcggccttctcgcccgggcaggaggagccgccgcggcgccgcggc
cgccagcgctacgtggagaaggatggccggtgcaacgtgcagcagggcaacgtgcgcgag
acataccgctacctgacggacctgttcaccacgctggtggacctgcagtggcgcctcagc
ctgttgttcttcgtcctggcctacgcgctcacctggctcttcttcggcgccatctggtgg
ctgatcgcctacggccgcggcgacctggagcacctggaggacaccgcgtggacgccgtgc
gtcaacaacctcaacggcttcgtggccgccttcctcttctccatcgagaccgagaccacc
atcggctacgggcaccgcgtcatcaccgaccagtgccccgagggcatcgtgctgctgctg
ctgcaggccatcctgggctccatggtgaacgccttcatggtgggctgcatgttcgtcaag
atctcgcagcccaacaagcgcgcagccacgctcgtcttctcctcgcacgccgtggtgtcg
ctgcgcgacgggcgcctctgcctcatgttccgcgtgggcgacttgcgctcctcacacata
gtggaggcctccatccgcgccaagctcatccgctcgcgccagacgctggagggcgagttc
atcccgctgcaccagaccgacctcagcgtgggcttcgacacgggagacgaccgcctcttc
ctcgtctcgccgctggttatcagccacgagatcgacgccgccagccccttctgggaggcg
tcgcgccgtgccctcgagagggacgacttcgagatcgtcgttatcctcgagggcatggtg
gaagccacgggaatgacatgccaagctcggagctcctacctggtagacgaggggctgtgg
ggccaccgcttcacgtcagtgctgactctggaggacggcttctacgaagtggactatgcc
agctttcacgagacttttgaggtgcccacaccttcgtgcagtgctcgagagctggcagag
gctgccgcccgccttgatgcccatctctactggtccatccccagccggctggatgagaag
agagtgagtccaaggtgtgaccagcttcctccagacccctgtggcagaccgggggccaga
cacagatacatggggaactgcatatcggaggtggtggaggaggaggaggaggaggaaggc
aaagcccctggaaatgtgctaaagttggaaagtccccgtcccccagaacctcaagtctag

>HSBA536C5|GENSCAN_predicted_peptide_3|43_aa
MNTAAINIHRQIFMWTSSVVKTSFTVTFSSPGVIPPRLPYARE

>HSBA536C5|GENSCAN_predicted_CDS_3|132_bp
atgaatacagctgctataaacatccatcggcagattttcatgtggacgtcttctgtggtg
aagacctccttcactgtgaccttctcctcaccaggtgtgatcccccccaggctcccctat
gcccgtgaatga

>HSBA536C5|GENSCAN_predicted_peptide_8|1429_aa
XEAKACVVHGSDLKDMTSEQLDEILKNHTEIVFARTSPQQKLIIVEGCQRQGAIVAVTGD
GVNDSPALKKADIGIAMGISGSDVSKQAADMILLDDNFASIVTGVEEGRLIFDNLKKSIA
YTLTSNIPEITPFLLFIIANIPLPLGTVTILCIDLGTDMVPAISLAYEAAESDIMKRQPR
NSQTDKLVNERLISMAYGQIGMIQALGGFFTYFVILAENGFLPSRLLGIRLDWDDRTMND
LEDSYGQEWTYEQRKVVEFTCHTAFFASIVVVQWADLIICKTRRNSVFQQGMKNKILIFG
LLEETALAAFLSYCPGMGVALRMYPLKVTWWFCAFPYSLLIFIYDEVRKLILRRYPGDLA
ITKGSSGECKSLRLEKVDLSPSRGCFLPTVELGQLFLGIAMGLWGKKGTVAPHDQSPRRR
PKKGLIKKKMVKREKQKRNMEELKKEVVMDDHKLTLEELSTKYSVDLTKGHSHQRAKEIL
TRGGPNTVTPPPTTPEWVKFCKQLFGGFSLLLWTGAILCFVAYSIQIYFNEEPTKDNLYL
SIVLSVVVIVTGCFSYYQEAKSSKIMESFKNMVPQQALVIRGGEKMQINVQEVVLGDLVE
IKGGDRVPADLRLISAQGCKVDNSSLTGESEPQSRSPDFTHENPLETRNICFFSTNCVEG
TARGIVIATGDSTVMGRIASLTSGLAVGQTPIAAEIEHFIHLITVVAVFLGVTFFALSLL
LGYGWLEAIIFLIGIIVANVPEGLLATVTVCLTLTAKRMARKNCLVKNLEAVETLGSTST
ICSDKTGTLTQNRMTVAHMWFDMTVYEADTTEEQTGKTFTKSSDTWFMLARIAGLCNRAD
FKANQEILPIAKRATTGDASESALLKFIEQSYSSVAEMREKNPKVAEIPFNSTNKYQMSI
HLREDSSQTHVLMMKGAPERILEFCSTFLLNGQEYSMNDEMKEAFQNAYLELGGLGERVL
GFCFLNLPSSFSKGFPFNTDEINFPMDNLCFVGLISMIDPPRAAVPDAVSKCRSAGIKVI
MVTGDHPITAKAIAKGVGIISEGTETAEEVAARLKIPISKVDASAAKAIVVHGAELKDIQ
SKQLDQILQNHPEIVFARTSPQQKLIIVEGCQRLGAVVAVTGDGVNDSPALKKADIGIAM
GISGSDVSKQAADMILLDDNFASIVTGVEEGRLIFDNLKKSIMYTLTSNIPEITPFLMFI
ILGIPLPLGTITILCIDLGTDMVPAISLAYESAESDIMKRLPRNPKTDNLVNHRLIGMAY
GQIGMIQALAGFFTYFVILAENGFRPVDLLGIRLHWEDKYLNDLEDSYGQQWTYEQRKVV
EFTCQTAFFVTIVVVQWADLIISKTRRNSLFQQGMRNKVLIFGILEETLLAAFLSYTPGM
DVALRMYPLKITWWLCAIPYSILIFVYDEIRKLLIRQHPDGWVERETYY

>HSBA536C5|GENSCAN_predicted_CDS_8|4290_bp
nnagaagccaaggcatgcgtggtgcacggctctgacctgaaggacatgacatcggagcag
ctcgatgagatcctcaagaaccacacagagatcgtctttgctcgaacgtctccccagcag
aagctcatcattgtggagggatgtcagaggcagggagccattgtggccgtgacgggtgac
ggggtgaacgactcccctgcattgaagaaggctgacattggcattgccatgggcatctct
ggctctgacgtctctaagcaggcagccgacatgatcctgctggatgacaactttgcctcc
atcgtcacgggggtggaggagggccgcctgatctttgacaacttgaagaaatccatcgcc
tacaccctgaccagcaacatccccgagatcacccccttcctgctgttcatcattgccaac
atccccctacctctgggcactgtgaccatcctttgcattgacctgggcacagatatggtc
cctgccatctccttggcctatgaggcagctgagagtgatatcatgaagcggcagccacga
aactcccagacggacaagctggtgaatgagaggctcatcagcatggcctacggacagatc
gggatgatccaggcactgggtggcttcttcacctactttgtgatcctggcagagaacggt
ttcctgccatcacggctactgggaatccgcctcgactgggatgaccggaccatgaatgat
ctggaggacagctatggacaggagtggacctatgagcagcggaaggtggtggagttcacg
tgccacacggcattctttgccagcatcgtggtggtgcagtgggctgacctcatcatctgc
aagacccgccgcaactcagtcttccagcagggcatgaagaacaagatcctgatttttggg
ctcctggaggagacggcgttggctgcctttctctcttactgcccaggcatgggtgtagcc
ctccgcatgtacccgctcaaagtcacctggtggttctgcgccttcccctacagcctcctc
atcttcatctatgatgaggtccgaaagctcatcctgcggcggtatcctggtgaccttgca
atcacaaaaggttcttctggtgagtgcaagagcctgagactggaaaaggtggacttgtct
cccagtcgaggctgctttcttcccacagttgagctcgggcagctctttctggggatagct
atggggctttgggggaagaaagggacagtggctccccatgaccagagtccaagacgaaga
cctaaaaaagggcttatcaagaaaaaaatggtgaagagggaaaaacagaagcgcaatatg
gaggaactgaagaaggaagtggtcatggatgatcacaaattaaccttggaagagctgagc
accaagtactccgtggacctgacaaagggccatagccaccaaagggcaaaggaaatcctg
actcgaggtggacccaatactgttaccccaccccccaccactccagaatgggtcaaattc
tgtaagcaactgttcggaggcttctccctcctactatggactggggccattctctgcttt
gtggcctacagcatccagatatatttcaatgaggagcctaccaaagacaacctctacctg
agcatcgtactgtccgtcgtggtcatcgtcactggctgcttctcctattatcaggaggcc
aagagctccaagatcatggagtcttttaagaacatggtgcctcagcaagctctggtaatt
cgaggaggagagaagatgcaaattaatgtacaagaggtggtgttgggagacctggtggaa
atcaagggtggagaccgagtccctgctgacctccggcttatctctgcacaaggatgtaag
gtggacaactcatccttgactggggagtcagaaccccagagccgctcccctgacttcacc
catgagaaccctctggagacccgaaacatctgcttcttttccaccaactgtgtggaagga
accgcccggggtattgtgattgctacgggagactccacagtgatgggcagaattgcctcc
ctgacgtcaggcctggcggttggccagacacctatcgctgctgagatcgaacacttcatc
catctgatcactgtggtggccgtcttccttggtgtcactttttttgcgctctcacttctc
ttgggctatggttggctggaggctatcatttttctcattggcatcattgtggccaatgtg
cctgaggggctgttggccacagtcactgtgtgcctgaccctcacagccaagcgcatggcg
cggaagaactgcctggtgaagaacctggaggcggtggagacgctgggctccacgtccacc
atctgctcagacaagacgggcaccctcacccagaaccgcatgaccgtcgcccacatgtgg
tttgatatgaccgtgtatgaggccgacaccactgaagaacagactggaaaaacatttacc
aagagctctgatacctggtttatgctggcccgaatcgctggcctctgcaaccgggctgac
tttaaggctaatcaggagatcctgcccattgctaagagggccacaacaggtgatgcttcc
gagtcagccctcctcaagttcatcgagcagtcttacagctctgtggcggagatgagagag
aaaaaccccaaggtggcagagattccctttaattctaccaacaagtaccagatgtccatc
caccttcgggaggacagctcccagacccacgtactgatgatgaagggtgctccggagagg
atcttggagttttgttctacctttcttctgaatgggcaggagtactcaatgaacgatgaa
atgaaggaagccttccaaaatgcctacttagaactgggaggtctgggggaacgtgtgcta
ggcttctgcttcttgaatctgcctagcagcttctccaagggattcccatttaatacagat
gaaataaatttccccatggacaacctttgttttgtgggcctcatatccatgattgaccct
ccccgagctgcagtgcctgatgctgtgagcaagtgtcgcagtgcaggaattaaggtgatc
atggtaacaggagatcatcccattacagctaaggccattgccaagggtgtgggcatcatc
tcagaaggcactgagacggcagaggaagtcgctgcccggcttaagatccctatcagcaag
gtcgatgccagtgctgccaaagccattgtggtgcatggtgcagaactgaaggacatacag
tccaagcagcttgatcagatcctccagaaccaccctgagatcgtgtttgctcggacctcc
cctcagcagaagctcatcattgtcgagggatgtcagaggctgggagccgttgtggccgtg
acaggtgacggggtgaacgactcccctgcgctgaagaaggctgacattggcattgccatg
ggcatctctggctctgacgtctctaagcaggcagccgacatgatcctgctggatgacaac
tttgcctccatcgtcacgggggtggaggagggccgcctgatctttgacaacctgaagaaa
tccatcatgtacaccctgaccagcaacatccccgagatcacgcccttcctgatgttcatc
atcctcggtatacccctgcctctgggaaccataaccatcctctgcattgatctcggcact
gacatggtccctgccatctccttggcttatgagtcagctgaaagcgacatcatgaagagg
cttccaaggaacccaaagacggataatctggtgaaccaccgtctcattggcatggcctat
ggacagattgggatgatccaggctctggctggattctttacctactttgtaatcctggct
gagaatggttttaggcctgttgatctgctgggcatccgcctccactgggaagataaatac
ttgaatgacctggaggacagctacggacagcagtggacctatgagcaacgaaaagttgtg
gagttcacatgccaaacggccttttttgtcaccatcgtggttgtgcagtgggcggatctc
atcatctccaagactcgccgcaactcacttttccagcagggcatgagaaacaaagtctta
atatttgggatcctggaggagacactcttggctgcatttctgtcctacactccaggcatg
gacgtggccctgcgaatgtacccactcaagataacctggtggctctgtgccattccctac
agtattctcatcttcgtctatgatgaaatcagaaaactcctcatccgtcagcacccggat
ggctgggtggaaagggagacgtactactaa
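
The lengths in the FASTA headers can be cross-checked against the Len column of the exon table. For a complete gene the CDS includes the stop codon, so gene 2 (180 + 230 + 850 = 1260 bp) encodes 1260 / 3 - 1 = 419 aa, and gene 3 (47 + 35 + 50 = 132 bp) encodes 43 aa. Gene 8 has no Init exon in the table; its 31 exons sum to 4288 bp, and its CDS begins with two n placeholders (translated as the leading X), giving 4290 bp and 1429 aa. The Python sketch below assumes the header layout seen in this file; parse_header, cds_length and peptide_length are hypothetical helpers, and padding to a multiple of 3 is inferred from gene 8 here, not taken from GENSCAN documentation.

```python
import re

# Header layout observed in this file, e.g.
# ">HSBA536C5|GENSCAN_predicted_CDS_2|1260_bp"
HEADER_RE = re.compile(
    r"^>(?P<seq_id>[^|]+)\|GENSCAN_predicted_(?P<kind>peptide|CDS)_(?P<gene>\d+)"
    r"\|(?P<length>\d+)_(?P<unit>aa|bp)$"
)

# Len column of the exon rows (Prom/PlyA excluded), per gene, in exon order.
EXON_LENGTHS = {
    2: [180, 230, 850],
    3: [47, 35, 50],
    8: [151, 169, 155, 124, 146, 131, 102, 92, 82, 193, 60, 204, 114, 135,
        118, 269, 199, 110, 135, 190, 173, 137, 151, 169, 155, 124, 146, 131,
        102, 92, 29],
}


def parse_header(line):
    """Split a GENSCAN FASTA header into (seq_id, kind, gene, length, unit)."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a GENSCAN header: {line!r}")
    return (m["seq_id"], m["kind"], int(m["gene"]), int(m["length"]), m["unit"])


def cds_length(exon_lengths):
    """Summed exon length, padded up to a multiple of 3 (assumption, see text)."""
    total = sum(exon_lengths)
    return total + (-total) % 3


def peptide_length(cds_bp):
    """Residues encoded by a CDS whose last codon is the stop codon."""
    return cds_bp // 3 - 1


for gene, lengths in EXON_LENGTHS.items():
    bp = cds_length(lengths)
    print(f"gene {gene}: {bp} bp, {peptide_length(bp)} aa")
```

Run as-is, the loop reports 1260 bp / 419 aa, 132 bp / 43 aa and 4290 bp / 1429 aa, matching the three CDS and peptide headers.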


Explanation

Gn.Ex : gene number, exon number (for reference)
Type  : Init = Initial exon
        Intr = Internal exon
        Term = Terminal exon
        Sngl = Single-exon gene
        Prom = Promoter
        PlyA = poly-A signal
S     : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End   : end point of exon or signal (numbered on input strand)
Len   : length of exon or signal (bp)
Fr    : reading frame (a codon ending at x is in frame f = x mod 3)
Ph    : net phase of exon (length mod 3)
I/Ac  : initiation signal or acceptor splice site score (x 10)
Do/T  : donor splice site or termination signal score (x 10)
CodRg : coding region score (x 10)
P     : probability of exon (sum over all parses containing exon)
Tscr  : exon score (depends on length, I/Ac, Do/T and CodRg scores)
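
The column definitions can be exercised on rows of the table itself. This is a minimal Python sketch, not BioPerl's parser (BioPerl reads this format with Bio::Tools::Genscan); GenscanFeature, parse_feature and check_lengths are hypothetical names. It assumes rows are whitespace-separated, with 13 fields for exon rows and 7 for Prom/PlyA rows as in this file, and checks two relations that hold for every row here: Len = |Begin - End| + 1, because both ends are counted and minus-strand features have Begin > End, and Ph = Len mod 3 for exons.

```python
from dataclasses import dataclass
from typing import Optional

EXON_TYPES = {"Init", "Intr", "Term", "Sngl"}


@dataclass
class GenscanFeature:
    """One row of the 'Predicted genes/exons' table (hypothetical class)."""
    gene: int
    index: int            # number after the dot in Gn.Ex
    kind: str             # Init, Intr, Term, Sngl, Prom or PlyA
    strand: str           # "+" input strand, "-" opposite strand
    begin: int
    end: int
    length: int
    tscr: float
    frame: Optional[int] = None     # exon rows only
    phase: Optional[int] = None
    i_ac: Optional[int] = None
    do_t: Optional[int] = None
    cod_rg: Optional[int] = None
    prob: Optional[float] = None


def parse_feature(line):
    """Parse a table row; exon rows have 13 fields, Prom/PlyA rows have 7."""
    f = line.split()
    gene, index = (int(x) for x in f[0].split("."))
    head = (gene, index, f[1], f[2], int(f[3]), int(f[4]), int(f[5]))
    if f[1] in EXON_TYPES:
        return GenscanFeature(*head, tscr=float(f[12]),
                              frame=int(f[6]), phase=int(f[7]),
                              i_ac=int(f[8]), do_t=int(f[9]),
                              cod_rg=int(f[10]), prob=float(f[11]))
    return GenscanFeature(*head, tscr=float(f[6]))


def check_lengths(feat):
    """Len == |Begin - End| + 1, and for exons Ph == Len mod 3."""
    ok = feat.length == abs(feat.begin - feat.end) + 1
    if feat.phase is not None:
        ok = ok and feat.phase == feat.length % 3
    return ok


ROWS = [
    " 2.04 PlyA -   7901   7896    6                               1.05",
    " 2.03 Term -  10642  10463  180  1  0   28   43   120 0.957  -0.89",
    " 3.01 Init +  18680  18726   47  1  2   84  105    30 0.585   4.46",
]

for row in ROWS:
    feat = parse_feature(row)
    print(feat.gene, feat.index, feat.kind, check_lengths(feat))
```

Each of the three sample rows passes both checks.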

Comments

The SCORE of a predicted feature (e.g., exon or splice site) is a
log-odds measure of the quality of the feature based on local sequence
properties. Thus, for example, a predicted donor splice site with
score > 100 is excellent; 50-100 is acceptable; 0-50 is weak; and
below 0 is poor (probably not a real donor site).
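
The donor-score bands can be written as a small lookup. A minimal Python sketch under two assumptions: the bands apply to the values as printed in the Do/T column (which the table marks as x 10), and because the stated ranges overlap, a score of exactly 50 counts as acceptable and exactly 0 as weak. donor_quality is a hypothetical helper. The Do/T column holds a donor score only on Init and Intr rows; on Term rows it is a termination-signal score, so the samples are taken from Init/Intr rows.

```python
def donor_quality(score):
    """Band a donor splice-site score per the GENSCAN comments.

    Assumed: bands apply to printed Do/T values; 50 -> acceptable, 0 -> weak.
    """
    if score > 100:
        return "excellent"
    if score >= 50:
        return "acceptable"
    if score >= 0:
        return "weak"
    return "poor"


# (Gn.Ex, Do/T) copied from Intr rows of the exon table.
SAMPLES = [("2.02", 44), ("8.11", 113), ("8.14", 100), ("8.17", 22)]

for gn_ex, do_t in SAMPLES:
    print(gn_ex, do_t, donor_quality(do_t))
```

With these samples, 8.11 (113) is the only excellent donor, 8.14 (100) is acceptable, and 2.02 and 8.17 are weak.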

The PROBABILITY of a predicted exon is the estimated probability under
GENSCAN's model of genomic sequence structure that the exon is correct.
This probability depends in general on global as well as local sequence
properties.  This information can be used to assess the reliability of the
predicted exon, e.g., it would be better to design PCR primers based on
a predicted exon with probability > 0.95 than one with lower probability.
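
The 0.95 figure can be applied as a simple filter when shortlisting exons for primer design. A minimal Python sketch: the (Gn.Ex, Type, P) tuples are copied from the gene 2 and 3 exon rows of the table, and primer_candidates and its min_prob parameter are hypothetical. The comment only says higher-probability exons are the better choice, so treating 0.95 as a hard cutoff is an assumption.

```python
# (Gn.Ex, Type, P) for the exon rows of genes 2 and 3.
EXONS = [
    ("2.03", "Term", 0.957),
    ("2.02", "Intr", 0.981),
    ("2.01", "Init", 0.818),
    ("3.01", "Init", 0.585),
    ("3.02", "Intr", 0.533),
    ("3.03", "Term", 0.267),
]


def primer_candidates(exons, min_prob=0.95):
    """Exons whose P exceeds min_prob, most probable first."""
    picked = [e for e in exons if e[2] > min_prob]
    return sorted(picked, key=lambda e: e[2], reverse=True)


print(primer_candidates(EXONS))
```

Only exons 2.02 (P = 0.981) and 2.03 (P = 0.957) pass; none of gene 3's exons does.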