From 715e30f7609751268ebf296e3bfc8db01532e79b Mon Sep 17 00:00:00 2001
From: philippadoherty <94635687+philippadoherty@users.noreply.github.com>
Date: Thu, 20 Jul 2023 13:25:01 -0400
Subject: [PATCH] fix callvar arguments

---
 docs/callvar.md | 117 ++++++++++++++++++++++++++++++--------------------------
 1 file changed, 62 insertions(+), 55 deletions(-)
 rewrite docs/callvar.md (80%)

diff --git a/docs/callvar.md b/docs/callvar.md
dissimilarity index 80%
index ba1aef5..b9ffdd7 100644
--- a/docs/callvar.md
+++ b/docs/callvar.md
@@ -1,55 +1,62 @@
-# Callvar
-
-## Overview
-The `callvar` command is part of the MACS3 suite of tools and is used to call variants in peak regions. It is particularly useful in ChIP-Seq analysis where the identification of genomic variants is required.
-
-## Detailed Description
-
-The `callvar` command takes in treatment and control BAM files along with a bed file containing peak regions. The command identifies variants in these regions using a multi-process approach, greatly improving the speed and efficiency of variant calling.
-
-## Command Line Options
-
-The command line options for `callvar` are defined in `/MACS3/Commands/callvar_cmd.py` and `/bin/macs3` files. Here is a brief overview of these options:
-
-- `-t` or `--tfile`: The treatment BAM file. This option is required.
-- `-c` or `--cfile`: The control BAM file. This option is required.
-- `-f` or `--format`: The format of the BAM file. Default is AUTO. Other options include SAM and BAM.
-- `-g` or `--gsize`: The size of the genome. Can be an integer or a string like '2.7e9'. Default is 'hs' for human genome size.
-- `-p` or `--pvalue`: The p-value cutoff. Default is 1e-5.
-- `-n` or `--name`: The name string of the experiment. This will be used to generate output file names. Default is 'NA'.
-- `-s` or `--tsize`: The tag size. Default is None.
-- `-B` or `--BAM`: Store results in BAM format. If this option is set, MACS will store all tags that pass quality filters to a BAM file including tags that failed to pass the shifting model.
-- `-m` or `--mfold`: The mfold range. Default is [5,50].
-- `--bw`: The bandwidth used to compute the fragment size. Default is 300.
-- `--fix-bimodal`: If set, MACS will not model but use the user-determined fragment size.
-- `--nomodel`: If set, MACS will not build the shifting model.
-- `--extsize`: While '--nomodel' is set, MACS uses this parameter to extend reads in 5'->3' direction to fix-sized fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is not set.
-
-## Example Usage
-
-Here is an example of how to use the `callvar` command:
-
-```bash
-macs3 callvar -t treatment.bam -c control.bam -f BAM -g hs -n experiment1 -B --mfold 10 30
-```
-
-In this example, the program will identify variants in the `treatment.bam` file relative to the `control.bam` file. The BAM files are in BAM format, the genome size is set to 'hs' (human), and the name of the experiment is 'experiment1'. All tags that pass quality filters will be stored in a BAM file.
-
-## FAQs about `callvar`
-
-Q: What does `callvar` do?
-A: `callvar` is a tool in the MACS3 suite that identifies genomic variants in peak regions. It is useful in ChIP-Seq analysis where the identification of genomic variants is required.
-
-Q: How do I use `callvar`?
-A: You can use `callvar` by providing it with a treatment and control BAM file, a bed file containing peak regions, and various other options as required. See the [Example Usage](#example-usage) section above for an example of how to use `callvar`.
-
-## Troubleshooting `callvar`
-
-If you're having trouble using `callvar`, here are some things to try:
-
-- Make sure your input files are in the correct format. `callvar` requires BAM and bed files.
-- Make sure the parameters you provide are appropriate for your data. Different parameters will yield different results.
-
-## Known issues or limitations of `callvar`
-
-As of now, there are no known issues or limitations with `callvar`. If you encounter a problem, please submit an issue on the MACS3 GitHub page.
\ No newline at end of file
+# Callvar
+
+## Overview
+The `callvar` command is part of the MACS3 suite of tools and is used to call variants in given peak regions from the alignment BAM files. It is particularly useful in ChIP-Seq analysis where the identification of genomic variants is required.
+
+## Detailed Description
+
+The `callvar` command takes a treatment BAM file, an optional control BAM file, and a BED file containing peak regions. It identifies variants in these regions and can split the work across multiple processes to speed up variant calling.
+
+The `callvar` command assumes you have two types of BAM files. The first type, which we call `TREAT`, comes from a DNA enrichment assay such as ChIP-seq or ATAC-seq, where the DNA fragments in the sequencing library are enriched in certain genomic regions with potential allele biases. The second type, called `CTRL` for control, comes from a genomic assay in which the DNA enrichment is less biased in multiploid chromosomes and more uniform across the whole genome; the control file is optional. Before running `callvar`, sort the BAM files by coordinates and index them.
+Example:
+
+1. Sort the BAM file:
+    `$ samtools sort TREAT.bam -o TREAT_sorted.bam`
+    `$ samtools sort CTRL.bam -o CTRL_sorted.bam`
+2. Index the BAM file:
+    `$ samtools index TREAT_sorted.bam`
+    `$ samtools index CTRL_sorted.bam`
+3. Make sure .bai files are available:
+    `$ ls TREAT_sorted.bam.bai`
+    `$ ls CTRL_sorted.bam.bai`
+
+To call variants:
+    `$ macs3 callvar -b peaks.bed -t TREAT_sorted.bam -c CTRL_sorted.bam -o peaks.vcf`
+
+## Command Line Options
+
+The command line options for `callvar` are defined in `/MACS3/Commands/callvar_cmd.py` and `/bin/macs3` files. Here is a brief overview of these options:
+
+### Input files arguments:
+- `-b` or `--peak`: The peak regions in BED format, sorted by coordinates. This option is required.
+- `-t` or `--treatment`: The ChIP-seq/ATAC-seq treatment file in BAM format, sorted by coordinates. Make sure the .bai file is available in the same directory. This option is required.
+- `-c` or `--control`: Optional control file in BAM format, sorted by coordinates. Make sure the .bai file is available in the same directory.
+
+### Output arguments:
+- `--outdir`: The directory for all output files to be written to. Default: writes output files to the current working directory.
+- `-o` or `--ofile`: The output VCF file name.
+- `--verbose`: The verbosity level of runtime messages. 0: only show critical messages, 1: also show warnings, 2: also show process information, 3: also show debug messages. Default: 2
+
+### Variant calling arguments:
+- `-g` or `--gq-hetero`: The Genotype Quality score (-10log10((L00+L11)/(L01+L00+L11))) cutoff for the heterozygous allele type. Default is 0, which means there is no cutoff on GQ.
+- `-G` or `--gq-homo`: The Genotype Quality score (-10log10((L00+L01)/(L01+L00+L11))) cutoff for the homozygous (not the same as reference) allele type. Default is 0, which means there is no cutoff on GQ.
+- `-Q`: The cutoff for the quality score. Only bases with a quality score greater than this value are considered. Default is 20, which means Q20 or a 0.01 error rate.
+- `-F` or `--fermi`: The option to control when to apply local assembly through Fermi. By default (set as 'auto'), when SAPPER detects any INDEL variant in a peak region, it will use Fermi to recover the actual DNA sequences and refine the read alignments. If set as 'on', Fermi will always be invoked. This can increase specificity, but sensitivity and speed will be significantly lower. If set as 'off', Fermi won't be invoked at all. In that case, speed and sensitivity can be higher, but specificity will be significantly lower.
+- `--fermi-overlap`: The minimal overlap for Fermi to initially assemble two reads. Must be between 1 and the read length. A longer fermiMinOverlap is needed when the read length is small (e.g. 30 for 36bp reads, but 33 for 100bp reads may work). Default is 30.
+- `--top2alleles-mratio`: The reads for the top 2 most frequent alleles (e.g. a ref allele and an alternative allele) at a locus shouldn't be too few compared to the total reads mapped. The minimum ratio is set by this option. Must be a float between 0.5 and 1. Default: 0.8, which means at least 80% of reads contain the top 2 alleles.
+- `--altallele-count`: The count of the alternative (non-reference) allele at a locus shouldn't be too low. By default, we require at least two reads to support the alternative allele. Default: 2
+- `--max-ar`: The maximum Allele-Ratio allowed while calculating likelihood for allele-specific binding. If we allow a higher maxAR, we may mistakenly assign some homozygous loci as heterozygous. Default: 0.95
+
+### Misc arguments:
+- `-m` or `--multiple-processing`: The number of CPUs used for multiprocessing. Please note that assigning more CPUs does not guarantee that the process will be faster. Creating too many parallel processes requires extra memory operations and may negate the benefit of multiprocessing. Default: 1
+
+
+## Example Usage
+
+Here is an example of how to use the `callvar` command:
+
+```bash
+macs3 callvar -b peaks.bed -t treatment.bam -c control.bam -o experiment1
+```
+
+In this example, the program will identify variants within the peak regions listed in `peaks.bed`, using `treatment.bam` as the treatment file and `control.bam` as the control file. The called variants are written to the output VCF file named `experiment1`.
\ No newline at end of file
-- 
2.11.4.GIT
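
Note on the Genotype Quality formulas quoted for `-g` and `-G`: the two expressions can be sketched in a few lines of Python. This is an illustration only, not the MACS3 implementation. It assumes that `L00`, `L01` and `L11` are the likelihoods of the homozygous-reference, heterozygous and homozygous-alternative genotypes (the patch does not define them), and the function name `genotype_quality` is hypothetical.

```python
import math


def genotype_quality(l00, l01, l11, genotype):
    """Phred-scaled GQ, following the formulas quoted for -g and -G.

    Assumption: l00, l01, l11 are the likelihoods of the ref/ref,
    ref/alt and alt/alt genotypes. This helper is hypothetical and
    is not part of MACS3.
    """
    total = l00 + l01 + l11
    if genotype == "hetero":
        # -10*log10((L00 + L11) / (L01 + L00 + L11))
        other = l00 + l11
    elif genotype == "homo":
        # -10*log10((L00 + L01) / (L01 + L00 + L11))
        other = l00 + l01
    else:
        raise ValueError("genotype must be 'hetero' or 'homo'")
    if other <= 0:
        # No likelihood mass on the competing genotypes.
        return math.inf
    return -10 * math.log10(other / total)


# Competing genotypes hold 1% of the likelihood mass, so GQ is about 20.
gq_hetero = genotype_quality(0.005, 0.99, 0.005, "hetero")
gq_homo = genotype_quality(0.001, 0.009, 0.99, "homo")
```

Under this reading, a cutoff such as `-g 20` would keep a heterozygous call only when the competing genotypes account for at most about 1% of the total likelihood.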