The `predictd` command is part of the MACS3 suite of tools and is used
to predict the expected DNA fragment size from alignment files. It
uses the cross-correlation method to find the best shift to correlate
the cutting ends on the plus and minus strands.
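The sketch below illustrates the cross-correlation idea under simplifying assumptions. It takes the 5' (cutting) end positions of plus- and minus-strand reads within a single region. It then slides the plus-strand profile downstream one base at a time and scores each shift with an unnormalized dot product. This is not MACS3's implementation, which builds its model from selected high-confidence regions. `best_shift` and its parameters are hypothetical names used only for illustration.

```python
def best_shift(plus_ends, minus_ends, region_len, max_shift):
    """Return (best_shift, scores) for one region.

    Hypothetical helper. plus_ends / minus_ends are 0-based positions
    (< region_len) of read cutting ends on each strand. scores[s] is the
    unnormalized cross-correlation when the plus profile is moved s bp
    downstream onto the minus profile.
    """
    plus = [0] * region_len
    minus = [0] * region_len
    for pos in plus_ends:
        plus[pos] += 1
    for pos in minus_ends:
        minus[pos] += 1

    scores = []
    for shift in range(max_shift + 1):
        # Pair each plus-strand position i with minus-strand position i + shift.
        score = sum(plus[i] * minus[i + shift] for i in range(region_len - shift))
        scores.append(score)

    # First shift with the highest score (ties resolve to the smaller shift).
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy region: every minus-strand end sits 30 bp downstream of a plus-strand end.
best, scores = best_shift([10, 50, 90], [40, 80, 120], region_len=200, max_shift=60)
print(best)  # 30
```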
## Detailed Description
The `predictd` command takes an input alignment file and predicts *d*,
the fragment size, from the alignment results. For paired-end data,
it reports the average insertion/fragment size over all pairs. Note
that there is no step for filtering duplicate reads or scaling by
sequencing depth, so you may need to do certain pre/post-processing
yourself, such as with `filterdup` or `randsample`.
If the alignment file is single-end, an R script (named by `--rfile`)
is saved. Running it with R draws the model as a PDF. The command-line
output reports the predicted *d* in the line
`predicted fragment length is` and alternative *d* sizes in the line
`alternative fragment length(s) may be`.
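The predicted and alternative *d* values can be thought of as peaks in a cross-correlation curve. The sketch below assumes that `scores[s]` is the score at a shift of `s` bp. It applies the `--d-min` cutoff described under Command Line Options (default 20) and drops local maxima below that cutoff. The highest remaining peak becomes the predicted *d*, and the others become alternatives. `pick_d` is a hypothetical helper, and MACS3's exact peak-picking rules may differ.

```python
def pick_d(scores, d_min=20):
    """Return (predicted_d, alternative_ds) from a cross-correlation curve.

    Hypothetical helper: scores[s] is the score at shift s (bp). Local
    maxima at shifts below d_min are excluded; the remaining maxima are
    ranked by score, highest first.
    """
    peaks = [
        s
        for s in range(1, len(scores) - 1)
        if scores[s - 1] < scores[s] >= scores[s + 1] and s >= d_min
    ]
    if not peaks:
        raise ValueError("no local maximum at or above d_min")
    ranked = sorted(peaks, key=lambda s: scores[s], reverse=True)
    return ranked[0], ranked[1:]


# Toy curve with local maxima at 10 bp (score 9), 40 bp (5) and 150 bp (7).
curve = [0] * 200
curve[10], curve[40], curve[150] = 9, 5, 7
print(pick_d(curve))           # (150, [40])  -- the 10 bp peak is below d_min
print(pick_d(curve, d_min=0))  # (10, [150, 40])
```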
If the alignment file is paired-end (`-f BAMPE` or `-f BEDPE`), no
model file is generated. Instead, the average fragment size is
reported in the command-line output, in the line
`Average insertion length of all pairs is`.
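For paired-end input, the reported value is an average over all pairs. The sketch below shows that arithmetic. It assumes each pair has already been reduced to its fragment's `(start, end)` coordinates, with length `end - start`. `average_insertion_length` is a hypothetical name, and MACS3 may round or compute the length slightly differently.

```python
def average_insertion_length(pairs):
    """Mean fragment length over all pairs.

    Hypothetical helper: each pair is a (start, end) tuple for the
    fragment it spans, and its length is taken as end - start.
    """
    if not pairs:
        raise ValueError("no pairs given")
    return sum(end - start for start, end in pairs) / len(pairs)


# Fragment lengths 200, 250 and 150 bp.
print(average_insertion_length([(100, 300), (500, 750), (1000, 1150)]))  # 200.0
```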
## Command Line Options
Here is a brief overview of the `predictd` options:
- `-i` or `--ifile`: ChIP-seq alignment file. If multiple files are
  given as `-i A B C`, they will all be read and combined.
- `-f` or `--format`: Format of the tag file, one of "AUTO", "BED",
  "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", "BOWTIE",
  "BAMPE", and "BEDPE". With `AUTO`, MACS3 decides the format itself,
  but it will not recognize BAMPE or BEDPE. To get the average
  insertion/fragment size from paired-end data, specify `-f BAMPE` or
  `-f BEDPE` explicitly. Be aware that in PE mode, `-g`, `-s`, `--bw`,
  `--d-min`, `-m`, and `--rfile` have NO effect. DEFAULT: "AUTO"
- `-g` or `--gsize`: Please check [`callpeak`](./callpeak.md) for
  details.
- `-s` or `--tsize`: Tag size. This will override the auto-detected
  tag size. DEFAULT: Not set
- `--bw`: Bandwidth for picking regions to compute the fragment
  size. This value is only used while building the shifting model.
- `--d-min`: Minimum fragment size in base pairs. Any predicted
  fragment size less than this will be excluded. DEFAULT: 20
- `-m` or `--mfold`: Select the regions within the MFOLD range of
  high-confidence enrichment ratio against background to build the
  model. Fold-enrichment in the selected regions must be lower than
  the upper limit and higher than the lower limit. Use as `-m 10 30`.
  DEFAULT: 5 50
- `--outdir`: If specified, all output files will be written to that
  directory. DEFAULT: the current working directory
- `--rfile`: PREFIX of the filename of the R script for drawing the
  cross-correlation figure. DEFAULT: 'predictd_model.R'
- `--buffer-size`: Buffer size for incrementally increasing the
  internal array size used to store read alignment information. In
  most cases, you don't need to change this parameter. However, if
  your alignment has a large number of chromosomes/contigs/scaffolds,
  a smaller buffer size is recommended to decrease memory usage, at
  the cost of taking longer to read the alignment files. The minimum
  memory requested for reading an alignment file is about (number of
  chromosomes) × BUFFER_SIZE × 8 bytes. DEFAULT:
- `--verbose`: Set the verbosity level of runtime messages. 0: only
  show critical messages; 1: also show warning messages; 2: show
  process information; 3: show debug messages. DEFAULT: 2
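The memory rule of thumb given for `--buffer-size` is simple arithmetic. The sketch below evaluates it. The chromosome counts and buffer sizes are illustrative values only, not MACS3 defaults, and `min_alignment_memory_bytes` is a hypothetical name.

```python
def min_alignment_memory_bytes(n_chromosomes, buffer_size):
    """Approximate minimum memory (bytes) requested for reading an
    alignment file: number of chromosomes * BUFFER_SIZE * 8 bytes."""
    return n_chromosomes * buffer_size * 8


# Illustrative numbers only.
few_chromosomes = min_alignment_memory_bytes(25, 100_000)  # 20,000,000 bytes
many_scaffolds = min_alignment_memory_bytes(50_000, 100_000)  # 40,000,000,000 bytes
smaller_buffer = min_alignment_memory_bytes(50_000, 1_000)  # 400,000,000 bytes
print(few_chromosomes, many_scaffolds, smaller_buffer)
```

With tens of thousands of scaffolds, the buffer size dominates the estimate. This is why a smaller `--buffer-size` is suggested for such assemblies.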
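The `-m` selection rule can be written as a strict range check on fold enrichment. The sketch below assumes each candidate region is a `(name, fold_enrichment)` pair. `regions_for_model` is hypothetical, and its strict `<` comparisons follow the option's wording rather than MACS3's source.

```python
def regions_for_model(regions, mfold=(5, 50)):
    """Return names of regions whose fold enrichment is strictly between
    the lower and upper MFOLD limits (like `-m LOWER UPPER`).

    Hypothetical helper: `regions` is a list of (name, fold) tuples.
    """
    lower, upper = mfold
    return [name for name, fold in regions if lower < fold < upper]


candidates = [("r1", 3.0), ("r2", 12.0), ("r3", 49.9), ("r4", 80.0), ("r5", 5.0)]
print(regions_for_model(candidates))            # ['r2', 'r3']
print(regions_for_model(candidates, (10, 30)))  # ['r2']
```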
Here is an example of how to use the `predictd` command:
```bash
macs3 predictd -i input.bam --rfile model.R
```
Then you can run the generated R script, for example with
`Rscript model.R`, to draw a figure of the model.
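If you drive `predictd` from a Python pipeline, a small wrapper can assemble the command from the options documented above. `predictd_command` and `run_predictd` are hypothetical helpers. They assume `macs3` is on your `PATH`. They return the console output without assuming which stream MACS3 writes its messages to.

```python
import subprocess


def predictd_command(ifiles, fmt="AUTO", rfile=None, outdir=None):
    """Build an argv list for `macs3 predictd` (hypothetical helper).

    Uses only -i, -f, --rfile and --outdir as described on this page.
    """
    cmd = ["macs3", "predictd", "-i", *ifiles, "-f", fmt]
    if rfile is not None:
        cmd += ["--rfile", rfile]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


def run_predictd(ifiles, **kwargs):
    """Run predictd and return stdout and stderr concatenated.

    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        predictd_command(ifiles, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout + result.stderr


print(predictd_command(["input.bam"], rfile="model.R"))
# ['macs3', 'predictd', '-i', 'input.bam', '-f', 'AUTO', '--rfile', 'model.R']
```

For paired-end data, pass `fmt="BAMPE"` or `fmt="BEDPE"`. `--rfile` has no effect in that mode.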