3 date: 'Fri 2015-05-08 20:31:31 +0800'
4 slug: "build-tree-with-mt-sequences"
5 title: "Build Tree with Mt Sequences"
7 category: bioinformatics
10 {% include JB/setup %}
14 CLUSTAL 2.1 Multiple Sequence Alignments
17 2. Multiple Alignments ->
18 F. Toggle FASTA format output = ON
19 1. Do complete multiple alignment now Slow/Accurate
22 生成multi-Fasta文件,自己转nex,保证每条序列一行。
28 jModelTest 2 (自带PhyML_3.0)
29 https://github.com/ddarriba/jmodeltest2
32 https://github.com/stephaneguindon/phyml/
35 Analysis -> Compute likelihood scores -> 设置 Number of substitution schemes = 5(多的通常没必要)
36 Analysis -> Do AIC calculations -> 勾选Write PAUP* block
43 在`.nex`文件后面追加paup的命令,例:
44 `less -Sx 30 mt_paup_2.nex`
50 Dimensions ntax=11 nchar=15461;
51 Format datatype=DNA gap=- missing=?;
53 NC_009970_M.ursinus GTTTACGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTATATAACTCC
54 NC_003427_U.arctos GTTTATGTAGCTTAATAGTAAAGCAAGGCACTGAAAATGCCTAGACGAGTTATATAATTCC
55 NC_003428_U.maritimus GTTTATGTAGCTTAATAGTAAAGCAAGGCACTGAAAATGCCTAGACGAGTTATATAATTCC
56 NC_009968_H.malayanus GTCCATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTATATAACTCC
57 NC_003426_U.americanus GTTTATGTAGCTTAATAGTAAAGCAAGGCACTGAAAATGCCTAGACGAGTTATATAACTCC
58 AB863014_U.t.japonicus GTTTATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTGTATGACCCC
59 NC_009331_U.t.formosanus GTTTATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAAACAAGTTGTATAACCCC
60 NC_011118_U.t.thibetanus GTTTATGTAGTTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTGTATAACCCC
61 NC_008753_U.t.mupinensis GTTTATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTGTATAACCCC
62 NC_011117_U.t.ussuricus GTTTATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTGTATAACCCC
63 EF667005_U.t.ussuricus GTTTATGTAGCTTAATAATAAAGCAAGGCACTGAAAATGCCTAGACGAGTTGTATAACCCC
68 usertype 20_1 = 4 [weights transversions 20 times transitions]
77 set autoclose=yes increase=auto warntree=no warnreset=no;
79 Lset base=(0.3157 0.2581 0.1503 ) nst=6 rmat=(1.3137 81.4005 2.4087 1.1669 79.3722) rates=gamma shape=0.1760 ncat=4 pinvar=0;
80 set criterion=distance;
83 savetrees file=nj_gtrG.tre brlens=yes;
85 savetrees file=nj_gtrG_bs.tre savebootp=nodelabels from=1 to=1;
87 set criterion=likelihood;
88 Lset base=(0.3167 0.2570 0.1500 ) nst=2 tratio=23.4667 rates=gamma shape=0.1770 ncat=4 pinvar=0;
90 savetrees file=ml_hkyG.tre brlens=yes;
91 bootstrap search=heuristic;
92 savetrees file=ml_hkyG_bs.tre savebootp=nodelabels from=1 to=1;
94 set criterion=distance;
97 savetrees file=nj_hkyG.tre brlens=yes;
99 savetrees file=nj_hkyG_bs.tre savebootp=nodelabels from=1 to=1;
101 set criterion=parsimony;
103 savetrees file=mp.tre brlens=yes;
104 bootstrap search=heuristic;
105 savetrees file=mp_bs.tre savebootp=nodelabels from=1 to=1;
109 savetrees file=mp_20vs1.tre brlens=yes;
110 bootstrap search=heuristic;
111 savetrees file=mp_20vs1_bs.tre savebootp=nodelabels from=1 to=1;
118 然后,`nohup paup mt_paup_2.nex &`。
122 http://mrbayes.sourceforge.net/
129 lset nst=6 ploidy=haploid rates=gamma;
130 mcmc ngen=10000000 samplefreq=1000 printfreq=10000 diagnfreq=10000 nruns=3;
139 die "usage: mb <number of threads> <input.nex>\n" if @ARGV<2;
140 exec ("module add mpi/openmpi-x86_64 && mpirun -np $ARGV[0] /opt/mrbayes_3.2.2/src/mb $ARGV[1]");
145 默认每个run有4条chains,MPI线程数必须是`3x4=12`的因数,即1,2,3,4,6,12。否则报错。
147 结果是`mt_GTR-G.log`和`mt_GTR-G.nex.con.tre`。
149 检查log,要求`nruns=3`的3个run间方差稳定小于0.01,否则就是`ngen=10000000`太小,或者模型参数太多。例:
152 Chain results (10000000 generations requested):
154 0 -- [-65946.127] [...11 remote chains...]
155 10000 -- (-40726.293) [...11 remote chains...] -- 0:49:56
157 Average standard deviation of split frequencies: 0.037717
159 20000 -- (-40726.746) [...11 remote chains...] -- 0:41:35
161 Average standard deviation of split frequencies: 0.024056
163 30000 -- [-40719.331] [...11 remote chains...] -- 0:38:46
165 Average standard deviation of split frequencies: 0.005346
167 40000 -- (-40717.835) [...11 remote chains...] -- 0:37:21
169 Average standard deviation of split frequencies: 0.012416
171 50000 -- [-40719.091] [...11 remote chains...] -- 0:36:29
173 Average standard deviation of split frequencies: 0.013159
175 60000 -- (-40719.444) [...11 remote chains...] -- 0:35:53
177 Average standard deviation of split frequencies: 0.006973
179 70000 -- (-40722.172) [...11 remote chains...] -- 0:35:27
181 Average standard deviation of split frequencies: 0.006389
183 80000 -- (-40716.393) [...11 remote chains...] -- 0:35:08
185 Average standard deviation of split frequencies: 0.005937
187 90000 -- (-40718.560) [...11 remote chains...] -- 0:34:52
189 Average standard deviation of split frequencies: 0.012203
191 100000 -- (-40722.171) [...11 remote chains...] -- 0:34:39
193 Average standard deviation of split frequencies: 0.006121
195 110000 -- (-40731.146) [...11 remote chains...] -- 0:34:27
197 Average standard deviation of split frequencies: 0.002850
199 120000 -- (-40717.502) [...11 remote chains...] -- 0:34:18
201 Average standard deviation of split frequencies: 0.005112
205 Average standard deviation of split frequencies: 0.001247
207 9990000 -- (-40723.895) [...11 remote chains...] -- 0:00:02
209 Average standard deviation of split frequencies: 0.001277
210 10000000 -- (-40724.333) [...11 remote chains...] -- 0:00:00
212 Average standard deviation of split frequencies: 0.001194
214 Analysis completed in 39 mins 36 seconds
215 Analysis used 2373.21 seconds of CPU time on processor 0
216 Likelihood of best state for "cold" chain of run 1 was -40707.40
217 Likelihood of best state for "cold" chain of run 2 was -40707.40
218 Likelihood of best state for "cold" chain of run 3 was -40707.78
223 如果所有树拓扑一致,就说一致,然后直接选一个放上去。
225 将`*_bs.tre`的 bootstrap 值写下来,然后标到正常树上面。