The `randsample` command is part of the MACS3 suite of tools and is
used to randomly sample a certain number or percentage of tags from
alignment files. This can be useful in ChIP-Seq analysis when a
subset of the data is required for downstream analysis.

## Detailed Description

The `randsample` command takes one or more input alignment files and
produces an output file containing the randomly sampled tags. Tags
are sampled according to either the percentage or the total number of
tags to be kept.

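
The sampling behaviour described above can be illustrated with a minimal Python sketch. It is not MACS3's actual implementation: the function name `sample_tags`, the dictionary of tags per chromosome, and the decision to round the kept count separately for each chromosome are all assumptions made for illustration. The sketch shows how a target number can be turned into a percentage, and why the resulting total may only approximate the requested number.

```python
import random


def sample_tags(tags_by_chrom, percentage=None, number=None, seed=None):
    """Keep a random subset of tags per chromosome (illustrative only)."""
    if (percentage is None) == (number is None):
        raise ValueError("give exactly one of percentage or number")
    total = sum(len(tags) for tags in tags_by_chrom.values())
    if number is not None:
        # Convert the requested count into a percentage of all tags.
        percentage = min(100.0, 100.0 * number / total) if total else 0.0
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    sampled = {}
    for chrom in sorted(tags_by_chrom):
        tags = tags_by_chrom[chrom]
        # Rounding per chromosome is why the total is only approximate.
        keep = round(len(tags) * percentage / 100.0)
        sampled[chrom] = sorted(rng.sample(tags, keep))
    return sampled


tags = {"chr1": list(range(1000)), "chr2": list(range(333))}
subset = sample_tags(tags, percentage=10, seed=1)
print({chrom: len(kept) for chrom, kept in subset.items()})
```

Calling the function twice with the same seed returns the same subset, which is the reason to record the seed used for a run.
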
When `-p 100` is used, meaning that all reads are kept, the
`randsample` command can be used to convert any format that MACS3
supports to BED format (or BEDPE if the input is in BAMPE format). It
generates the same result as `filterdup --keep-dup all`, which also
converts other formats into BED/BEDPE format.

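
For scripted use, the conversion case can be wrapped in a small helper. The following is a sketch only: the helper name `build_randsample_cmd` and the file names are hypothetical, and the helper uses only options described in this document (`-i`, `-f`, `-o`, `-p`, `-n`, `--seed`). It builds the argument list without running MACS3, so the command can be inspected first.

```python
import shlex


def build_randsample_cmd(inputs, output, percentage=None, number=None,
                         fmt="AUTO", seed=None):
    """Build an argument list for `macs3 randsample` (hypothetical helper)."""
    if (percentage is None) == (number is None):
        raise ValueError("specify exactly one of percentage (-p) or number (-n)")
    cmd = ["macs3", "randsample", "-i", *inputs, "-f", fmt, "-o", output]
    if percentage is not None:
        cmd += ["-p", str(percentage)]
    else:
        cmd += ["-n", str(number)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


# Keep all reads (-p 100), which converts the input to BED.
cmd = build_randsample_cmd(["treatment.bam"], "treatment.bed",
                           percentage=100, fmt="BAM")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where MACS3 is installed.
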
Please note that, when writing BED format output for a single-end
dataset, MACS3 assumes that all reads have the same length, taken
either from the `-s` setting or from auto-detection.

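
To make the fixed-length assumption concrete, here is a rough sketch of how a single-end tag could be written as a BED line. It assumes that each tag is stored as a 5' end position plus a strand, and that the name and score columns are filled with `.` placeholders; both are assumptions for illustration and are not confirmed by this document. The function name `tag_to_bed_line` is hypothetical.

```python
def tag_to_bed_line(chrom, five_prime_pos, strand, tag_size):
    """Format one single-end tag as a BED line using a fixed tag size."""
    if strand == "+":
        # Plus-strand tags extend downstream from the 5' end.
        start, end = five_prime_pos, five_prime_pos + tag_size
    elif strand == "-":
        # Minus-strand tags extend upstream from the 5' end.
        start, end = five_prime_pos - tag_size, five_prime_pos
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(start, 0)  # BED coordinates cannot be negative
    return f"{chrom}\t{start}\t{end}\t.\t.\t{strand}"


print(tag_to_bed_line("chr1", 100, "+", 50))
print(tag_to_bed_line("chr1", 100, "-", 50))
```

Because one tag size is applied to every read, reads that were trimmed to different lengths would all be written with the same width under this scheme.
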
## Command Line Options

Here is a brief overview of the `randsample` options:

- `-i` or `--ifile`: Alignment file. If multiple files are given as
  `-i A B C`, then they will all be read and combined. REQUIRED.
- `-p` or `--percentage`: Percentage of tags you want to keep. Input
  80.0 for 80%. This option can't be used together with
  `-n`/`--number`. If the setting is 100, all reads are kept and any
  format that MACS3 supports is converted into BED or BEDPE (if the
  input is BAMPE) format. REQUIRED (either `-p` or `-n` must be
  given)
- `-n` or `--number`: Number of tags you want to keep. Input 8000000
  or 8e+6 for 8 million. This option can't be used together with
  `-p`/`--percentage`. Note that the number of tags in the output
  only approximates the number specified here. REQUIRED (either `-p`
  or `-n` must be given)
- `--seed`: Set the random seed used while downsampling data. Must be
  a non-negative integer in order to be effective. If you want
  reproducible results, please specify a random seed and record it.
- `-o` or `--ofile`: Output BED file name. If not specified, the
  output is written to standard output. Note that if the input format
  is BAMPE or BEDPE, the output will be in BEDPE format. DEFAULT:
  stdout
- `--outdir`: If specified, all output files will be written to that
  directory. DEFAULT: the current working directory
- `-s` or `--tsize`: Tag size. This will override the auto-detected
  tag size. DEFAULT: Not set
- `-f` or `--format`: Format of the tag file. Accepted values are
  "AUTO", "BED", "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM",
  "BOWTIE", "BAMPE", and "BEDPE". With "AUTO", MACS3 picks the format
  itself. Please check the definitions in the README file if you
  choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE or BAMPE/BEDPE.
  DEFAULT: "AUTO"
- `--buffer-size`: Buffer size for incrementally increasing the
  internal array size used to store read alignment information. In
  most cases, you don't have to change this parameter. However, if
  there are a large number of chromosomes/contigs/scaffolds in your
  alignment, it's recommended to specify a smaller buffer size in
  order to decrease memory usage (at the cost of a longer time to
  read alignment files). The minimum memory requested for reading an
  alignment file is about # of CHROMOSOME * BUFFER_SIZE * 8 Bytes.
  DEFAULT:
- `--verbose`: Set the verbose level. 0: only show critical messages,
  1: show additional warning messages, 2: show process information, 3:
  show debug messages. If you want to know where the duplicate reads
  are, use 3. DEFAULT: 2

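
The memory estimate given for `--buffer-size` is simple arithmetic, and a short sketch may help when choosing a value. The function name `min_read_memory_bytes`, the scaffold count, and the two buffer sizes below are hypothetical values chosen purely for illustration; they are not MACS3 defaults.

```python
def min_read_memory_bytes(n_chromosomes: int, buffer_size: int) -> int:
    """Lower bound from the text: # of chromosomes * buffer size * 8 bytes."""
    return n_chromosomes * buffer_size * 8


# Hypothetical assembly with many scaffolds, two candidate buffer sizes.
for buffer_size in (100_000, 10_000):
    needed = min_read_memory_bytes(20_000, buffer_size)
    print(f"buffer_size={buffer_size}: about {needed / 1e9:.1f} GB")
```

With 20,000 scaffolds, reducing the buffer size tenfold lowers the estimated minimum from about 16 GB to about 1.6 GB, which is the trade-off the option description refers to.
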
Here is an example of how to use the `randsample` command:

    macs3 randsample -i treatment.bam -o sampled.bed -f BAM -p 10

In this example, the program will randomly sample 10 percent of the
total tags from the `treatment.bam` file and write the result to