The base files for the analysis should be in data/

    real_R1.fastq.gz, real_R2.fastq.gz
    trim_R1.fastq, trim_R2.fastq.gz

You can get these from (must use a browser, not wget):

https://copy.com/jxoJKLOhWzjz

You can then generate the trimmed versions as:

    bash src/trim.sh data/real_R{1,2}.fastq.gz 15
    bash src/trim.sh data/sim_R{1,2}.fastq.gz 20

where the number is the trimming threshold.

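
The `R{1,2}` in the commands above is bash brace expansion, so `src/trim.sh` receives the two mate files as separate arguments followed by the threshold. As a small illustration only (the helper name `trim_args` is hypothetical and not part of this repository), the expanded argument list can be written out in Python:

```python
def trim_args(prefix, threshold):
    """Return the argument list that
    `bash src/trim.sh data/<prefix>_R{1,2}.fastq.gz <threshold>` expands to.

    `trim_args` is a hypothetical helper used for illustration only.
    """
    # Brace expansion yields one path per mate, in order R1 then R2.
    mates = [f"data/{prefix}_R{n}.fastq.gz" for n in (1, 2)]
    return ["bash", "src/trim.sh", *mates, str(threshold)]


print(" ".join(trim_args("real", 15)))
# prints: bash src/trim.sh data/real_R1.fastq.gz data/real_R2.fastq.gz 15
```

A list like this could be passed to `subprocess.run(..., check=True)` if you prefer to drive the trimming from Python rather than from the shell.
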
The simulated data were generated with Sherman v0.1.6 using
`src/gen-simulated.sh`.

To generate another version of the simulated data, use
`src/gen-simulated.sh`, which should require only minimal changes to point to the
reference fasta for `mm10` on your system.

Once the simulated data are generated, run:

    bash src/trim.sh data/sim_R{1,2}.fastq.gz

Adjust ./common.sh so that `REF` points to the `mm10` reference on your system.

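
Before running anything, it can help to confirm that `REF` is actually set. The sketch below assumes that `common.sh` contains a plain `REF=/some/path` line (optionally quoted or prefixed with `export`); the real file may be laid out differently, and the function name `read_ref` is hypothetical.

```python
import re


def read_ref(common_sh_text):
    """Return the value of the first simple `REF=...` assignment, or None.

    Assumption: REF is assigned on a single line with no trailing comment
    and no shell substitution. `read_ref` is a hypothetical helper.
    """
    for line in common_sh_text.splitlines():
        match = re.match(r"^\s*(?:export\s+)?REF=(.*)$", line)
        if match:
            # Drop surrounding whitespace and optional quotes.
            return match.group(1).strip().strip("\"'")
    return None
```

For example, `read_ref(open("common.sh").read())` returns the configured path, which can then be checked with `os.path.exists` before building any indexes.
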
You will then need to create the appropriate indexes for each of the
aligners you wish to test.

    bismark_genome_preparation /path/to/reference/files/

    python src/gsnap-meth.py index $REF 15

    python ../bwameth.py index $REF

    lastdb -w 2 -u /path/to/last-hg/examples/bisulfite_f.seed $REF.last_f $REF
    lastdb -w 2 -u /path/to/last-hg/examples/bisulfite_r.seed $REF.last_r $REF

    bswc_bowtie2_index.pl --name=bsmooth/mm10 $REF

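
Since each index command above depends on a different external tool, a quick check of which executables are on your `PATH` can save a failed run. This is only a convenience sketch: the executable names are taken from the commands above, while `INDEX_TOOLS` and `missing_tools` are hypothetical names introduced here.

```python
import shutil

# Executables invoked directly by the index commands above.
INDEX_TOOLS = ["bismark_genome_preparation", "lastdb", "bswc_bowtie2_index.pl"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("not on PATH:", missing_tools(INDEX_TOOLS))
```

The two Python-based indexers (`src/gsnap-meth.py` and `../bwameth.py`) are called by path rather than looked up on `PATH`, so they are not covered by this check.
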
Once the indexes are in place and `REF=` is set in common.sh, you can
use the run-{method} scripts to run each of the aligners.

All of the run-{method} scripts source `common.sh` so make sure that has
    # --reads is used to get the number of input reads
    $ python src/sim-roc.py \
        --reads data/sim_R1.fastq.gz \
        results/*-sim.bam > sim-trim-qual-summ.txt

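
In the commands in this section, `--reads` is given either a FASTQ path or a plain number. As a rough sketch of how a read count can be derived from a FASTQ file (this is not taken from `src/sim-roc.py`; `count_reads` and `resolve_reads` are hypothetical names, and the standard 4-lines-per-record FASTQ layout is assumed):

```python
import gzip


def count_reads(path):
    """Count records in a FASTQ file, assuming 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def resolve_reads(value):
    """Interpret a --reads style value: an integer string or a FASTQ path."""
    if value.isdigit():
        return int(value)
    return count_reads(value)
```

Passing the number directly, as in `--reads 1000000`, avoids re-reading a large FASTQ file on every invocation.
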
    $ python src/sim-roc.py --reads 1000000 results/*-sim.bam results/bsmooth/bsmooth-sim.bam results/bison-sim/sim_R1.bam > sim.quals.txt

    $ python src/sim-roc.py --reads 1000000 results/trim/*-sim.bam results/trim/bsmooth/bsmooth-sim.bam results/trim/bison-sim/sim_R1.trim.bam > sim.trim.quals.txt

    $ python src/sim-roc.py --reads 1000000 results/trim/*-noerrorsim.bam results/trim/bsmooth/bsmooth-noerrorsim.bam results/trim/bison-noerrorsim/noerrorsim_R1.trim.bam > noerrorsim.trim.quals.txt

    $ python src/sim-roc.py --reads 1000000 results/*-noerrorsim.bam results/bsmooth/bsmooth-noerrorsim.bam results/bison-noerrorsim/noerrorsim_R1.bam > noerrorsim.quals.txt

This makes a plot with matplotlib that is not very pretty. You can then create
a nicer ggplot version with:

    Rscript src/plot-quals.R sim.trim.quals.txt sim-trim-quals.png
    Rscript src/plot-quals.R sim.quals.txt sim-quals.png

The output is written to the file given as the second argument (.png, .pdf, or .eps).

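
To make clear why the total number of input reads matters for these summaries, here is a minimal sketch of an ROC-style calculation over mapping-quality cutoffs. It is an assumption about the general approach, not a copy of what `src/sim-roc.py` does; the function name `roc_points` and the `(mapq, is_correct)` input format are hypothetical.

```python
def roc_points(alignments, n_reads):
    """Compute (cutoff, frac_correct, frac_incorrect) for each MAPQ cutoff.

    alignments: list of (mapq, is_correct) pairs, one per aligned read.
    n_reads: total number of input reads, so unaligned reads still count
             in the denominator.
    """
    cutoffs = sorted({mapq for mapq, _ in alignments}, reverse=True)
    points = []
    for cutoff in cutoffs:
        # Keep every alignment whose mapping quality passes the cutoff.
        kept = [ok for mapq, ok in alignments if mapq >= cutoff]
        n_ok = sum(kept)
        points.append((cutoff, n_ok / n_reads, (len(kept) - n_ok) / n_reads))
    return points


example = [(60, True), (60, True), (30, True), (30, False), (0, False)]
for point in roc_points(example, n_reads=10):
    print(point)
```

Dividing by the number of input reads rather than the number of alignments means an aligner is not rewarded for leaving difficult reads unaligned.
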
    # --reads is used to get the number of input reads
    $ python src/target-roc.py \
        data/mm10.capture-regions.bed.gz \
        --reads data/real_R1.fastq.gz \

Both real and simulated data can also be analyzed after trimming.
BAMs from trimmed input appear in results/trim/

If you simply want to align some reads, the entire syntax is:

    bwameth.py index /path/to/ref.fasta
    bwameth.py --reference /path/to/ref.fasta reads_R1.fastq reads_R2.fastq -p myoutput

and the result will appear in myoutput.bam.
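
If you want to script the alignment step, the command above can be assembled as an argument list. This sketch only mirrors the flags shown in this README (`--reference` and `-p`); the function name `bwameth_align_cmd` is hypothetical, and other versions of `bwameth.py` may accept different options.

```python
import shlex


def bwameth_align_cmd(reference, reads1, reads2, prefix):
    """Build the alignment command shown above.

    Returns the argument list and the BAM path expected from `-p <prefix>`.
    """
    cmd = ["bwameth.py", "--reference", reference, reads1, reads2, "-p", prefix]
    return cmd, prefix + ".bam"


cmd, bam = bwameth_align_cmd(
    "/path/to/ref.fasta", "reads_R1.fastq", "reads_R2.fastq", "myoutput"
)
print(shlex.join(cmd))
print(bam)  # prints: myoutput.bam
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.
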