
Setup
=====

The base files for the analysis should be in `data/`:

    real_R1.fastq.gz, real_R2.fastq.gz
    trim_R1.fastq, trim_R2.fastq.gz

You can get these from the link below (you must use a browser, not wget):

    https://copy.com/jxoJKLOhWzjz

You can then generate the trimmed versions with:

    bash src/trim.sh data/real_R{1,2}.fastq.gz 15
    bash src/trim.sh data/sim_R{1,2}.fastq.gz 20

The trailing number is the threshold passed to the trimming script.
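The `R{1,2}` in the trim commands is ordinary bash brace expansion, so each call hands both mates to `src/trim.sh`, followed by the threshold. A minimal sketch for previewing the expanded arguments without touching the data; `show_args` is a hypothetical helper, not part of this repository, and the snippet assumes bash rather than a plain POSIX `sh`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): print each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# Brace expansion is performed by bash before the command runs,
# whether or not the files exist.
expanded=$(show_args data/real_R{1,2}.fastq.gz 15)
echo "$expanded"
```

Swapping `show_args` for `bash src/trim.sh` gives back the original command.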

Simulated
---------

The simulated data were generated with Sherman v0.1.6 using
`src/gen-simulated.sh`.

To generate another version of the simulated data, use
`src/gen-simulated.sh`, which should require only minimal changes to point to the
reference FASTA for `mm10` on your system.

Once the simulated data are generated, run:

    bash src/trim.sh data/sim_R{1,2}.fastq.gz

Adjust `./common.sh` so that `REF` points to the `mm10` reference on your system.
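Since every later step depends on `REF`, it can help to confirm that it resolves to a real file before indexing. The following is only a sketch in the spirit of `common.sh`: the `REF` variable comes from this README, while the placeholder path and the `check_ref` helper are assumptions for illustration.

```shell
# Hypothetical excerpt; only the REF variable name is taken from the README.
REF=${REF:-/path/to/mm10.fa}   # placeholder path, adjust for your system

check_ref() {
    # Succeeds only if the given FASTA exists and is non-empty.
    [ -s "$1" ]
}

if check_ref "$REF"; then
    echo "using reference: $REF"
else
    echo "REF does not point to a non-empty FASTA: $REF" >&2
fi
```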

Index
=====

You will then need to create the appropriate indexes for each of the
aligners you wish to test.

Bowtie:

    bismark_genome_preparation /path/to/reference/files/

GSNAP:

    python src/gsnap-meth.py index $REF 15

bwa-meth:

    python ../bwameth.py index $REF

last:

    lastdb -w 2 -u /path/to/last-hg/examples/bisulfite_f.seed $REF.last_f $REF
    lastdb -w 2 -u /path/to/last-hg/examples/bisulfite_r.seed $REF.last_r $REF

bsmap:

    no indexing

bison:

    bison_index $REF

bsmooth:

    bswc_bowtie2_index.pl --name=bsmooth/mm10 $REF

Align
=====

Once you have the indexes in place and `REF=` set in `common.sh`, you can
use the run-{method} scripts to run each of the aligners.

All of the run-{method} scripts source `common.sh`, so make sure it contains
the settings you want.
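To see which aligner scripts would be picked up before launching anything, a dry run along these lines may be useful. It is a sketch under an assumption the README does not confirm: that the run-{method} scripts match `run-*.sh` in a directory you pass in. `list_run_scripts` is a hypothetical helper.

```shell
# Assumption: the run-{method} scripts match run-*.sh in the given directory.
list_run_scripts() {
    dir=${1:-.}
    for script in "$dir"/run-*.sh; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$script" ] || continue
        echo "$script"
    done
}

# Dry run: print what would be executed instead of executing it.
list_run_scripts . | while read -r script; do
    echo "would run: bash $script"
done
```

Replacing the `echo` in the loop with `bash "$script"` would run the aligners one after another.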

Assessment
==========

For simulated reads (`--reads` is only used to get the number of input reads):

    $ python src/sim-roc.py \
        --reads data/sim_R1.fastq.gz \
        results/*-sim.bam > sim-trim-qual-summ.txt
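If you would rather pass the read count as a number (as in the invocations with `--reads 1000000`), it can be derived from the FASTQ itself. A minimal sketch, assuming standard four-line FASTQ records; `count_reads` is a hypothetical helper, and the tiny temporary file exists only to keep the example self-contained:

```shell
count_reads() {
    # A FASTQ record spans four lines, so reads = lines / 4.
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Demonstration on a throwaway two-read file.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$tmp"
count_reads "$tmp"   # prints 2
rm -f "$tmp"
```

On the real data this would be called as `count_reads data/sim_R1.fastq.gz`.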

Used:

    $ python src/sim-roc.py --reads 1000000 results/*-sim.bam results/bsmooth/bsmooth-sim.bam results/bison-sim/sim_R1.bam > sim.quals.txt

And:

    $ python src/sim-roc.py --reads 1000000 results/trim/*-sim.bam results/trim//bsmooth/bsmooth-sim.bam results/trim/bison-sim/sim_R1.trim.bam > sim.trim.quals.txt

And:

    $ python src/sim-roc.py --reads 1000000 results/trim/*-noerrorsim.bam results/trim//bsmooth/bsmooth-noerrorsim.bam results/trim/bison-noerrorsim/noerrorsim_R1.trim.bam > noerrorsim.trim.quals.txt

And:

    $ python src/sim-roc.py --reads 1000000 results/*-noerrorsim.bam results/bsmooth/bsmooth-noerrorsim.bam results/bison-noerrorsim/noerrorsim_R1.bam > noerrorsim.quals.txt

This will make a plot with matplotlib that's not very pretty. One can then create
a nicer ggplot version with:

    Rscript src/plot-quals.R sim-trim-quals.txt sim-trim-quals.png
    Rscript src/plot-quals.R sim-quals.txt sim-quals.png

The output will appear in the .png (or .pdf or .eps) file specified as the second argument.

For real reads:

    $ python src/target-roc.py \