# MACS: Model-based Analysis for ChIP-Seq

![Status](https://img.shields.io/pypi/status/macs3.svg) ![License](https://img.shields.io/github/license/macs3-project/MACS) ![Programming languages](https://img.shields.io/github/languages/top/macs3-project/MACS) ![CI x64](https://github.com/macs3-project/MACS/workflows/MACS3%20CI%20x64/badge.svg?branch=master) ![CI non x64](https://github.com/macs3-project/MACS/workflows/MACS3%20CI%20non%20x64/badge.svg?branch=master) ![CI Mac OS](https://github.com/macs3-project/MACS/actions/workflows/build-and-test-MACS3-macos.yml/badge.svg?branch=master)

[![PyPI download](https://img.shields.io/pypi/dm/macs3?label=pypi%20downloads)](https://pypistats.org/packages/macs3)

* Github: [![Github Release](https://img.shields.io/github/v/release/macs3-project/MACS)](https://github.com/macs3-project/MACS/releases)
* PyPI: [![PyPI Release](https://img.shields.io/pypi/v/macs3.svg) ![PyPI Python Version](https://img.shields.io/pypi/pyversions/MACS3) ![PyPI Format](https://img.shields.io/pypi/format/macs3)](https://pypi.org/project/macs3/)
* Anaconda: [![Anaconda-Server Badge](https://anaconda.org/macs3/macs3/badges/version.svg)](https://anaconda.org/macs3/macs3)

With the improvement of sequencing techniques, chromatin
immunoprecipitation followed by high-throughput sequencing (ChIP-Seq)
has become a popular way to study genome-wide protein-DNA interactions. To
address the lack of powerful ChIP-Seq analysis methods, we presented
the **M**odel-based **A**nalysis of **C**hIP-**S**eq (MACS) for
identifying transcription factor binding sites. MACS captures the
influence of genome complexity to evaluate the significance of
enriched ChIP regions, and it improves the spatial resolution of
binding sites by combining the information of both sequencing tag
position and orientation. MACS can be used on ChIP-Seq data
alone, or with a control sample to increase
specificity. Moreover, as a general peak caller, MACS can also be
applied to any "DNA enrichment assay" if the question to be asked is
simply: *where can we find significantly higher read coverage than the random
background?*

## Changes for MACS (3.0.1)

1) Fixed a bug where `hmmratac` could not correctly save the digested
   files. [#605](https://github.com/macs3-project/MACS/issues/605)
   [#611](https://github.com/macs3-project/MACS/pull/611)

2) Applied a patch to remove the Cython requirement from the installed
   system (Cython is still needed for building the
   package). [#606](https://github.com/macs3-project/MACS/issues/606)
   [#612](https://github.com/macs3-project/MACS/pull/612)

3) Relaxed the testing script that compares the peaks called by the
   current code with the standard peaks. To implement this, we added
   an `intersection` function to the `Regions` class to find the
   intersecting regions of two `Regions` objects (similar to `PeakIO` but
   only recording chromosome, start and end positions). We also
   updated the unit test `test_Region.py` and implemented a script,
   `jaccard.py`, to compute the Jaccard Index (JI) of two peak files. If
   the JI > 0.99, we consider the called peaks and the standard
   peaks similar. This avoids the problem caused by
   different NumPy/SciPy/scikit-learn versions, where certain peak
   coordinates may differ by 10 bp.
   [#615](https://github.com/macs3-project/MACS/issues/615)
   [#619](https://github.com/macs3-project/MACS/pull/619)

4) Due to [the changes in scikit-learn
   1.3.0](https://scikit-learn.org/1.3/whats_new/v1.3.html), the way
   hmmlearn 0.3 uses KMeans ends up with inconsistent results
   between sklearn <1.3 and sklearn >=1.3. Therefore, we patched the
   class `hmm.GaussianHMM` and adjusted the standard output from the
   `hmmratac` subcommand. The change is based on [hmmlearn
   PR#545](https://github.com/hmmlearn/hmmlearn/pull/545). The idea
   is to do the random seeding of KMeans 10 times. Now the `hmmratac`
   results should be more consistent (at least
   JI > 0.99). [#615](https://github.com/macs3-project/MACS/issues/615)
   [#620](https://github.com/macs3-project/MACS/pull/620)

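
The Jaccard Index check from item 3 can be sketched as follows. This is an illustration under stated assumptions, not the project's `jaccard.py`: it assumes the index is computed over base pairs (intersection length divided by union length) and that regions are half-open `(start, end)` tuples grouped by chromosome. All function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]            # half-open [start, end)
Regions = Dict[str, List[Interval]]   # chromosome -> intervals


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Total number of base pairs covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two merged, sorted interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def jaccard_index(x: Regions, y: Regions) -> float:
    """Base pairs in the intersection divided by base pairs in the union."""
    inter = union = 0
    for chrom in set(x) | set(y):
        a = merge(x.get(chrom, []))
        b = merge(y.get(chrom, []))
        shared = total_length(intersect(a, b))
        inter += shared
        union += total_length(a) + total_length(b) - shared
    return inter / union if union else 1.0


# Toy example: the second peak starts 10 bp later in the standard set.
called = {"chr1": [(1000, 3000), (5000, 7000)]}
standard = {"chr1": [(1000, 3000), (5010, 7000)]}
ji = jaccard_index(called, standard)
print(ji, ji > 0.99)
```

In this toy case a 10 bp boundary difference on a 2 kb peak still clears the 0.99 threshold, which is the kind of tolerance the relaxed test is meant to allow.
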
1) We added some dependencies to MACS3. The `hmmratac` subcommand needs
   the `hmmlearn` library, `hmmlearn` needs `scikit-learn`, and
   `scikit-learn` needs `scipy`. Since major releases have happened
   for both `scipy` and `scikit-learn`, we have to set specific
   version requirements for them to make sure the output
   from `hmmratac` is consistent.

2) We updated our documentation website using
   Sphinx: https://macs3-project.github.io/MACS/

## Changes for MACS (3.0.0)

1) Call variants in peak regions directly from BAM files. The
   function was originally developed under the code name SAPPER. Now
   SAPPER has been merged into MACS as the `callvar` command. It can
   be used to call SNVs and small INDELs directly from alignment
   files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
   the DNA sequence at the enriched genomic regions (binding sites or
   accessible DNA) and to refine the alignment when necessary. We
   added `simde` as a submodule in order to support the fermi-lite
   library on non-x64 architectures.

2) The HMMRATAC module is added as the subcommand `hmmratac`. HMMRATAC is
   dedicated software for analyzing ATAC-seq data. The basic idea behind
   HMMRATAC is to digest ATAC-seq data according to the fragment
   length of read pairs into four signal tracks: short fragments,
   mono-nucleosomal fragments, di-nucleosomal fragments and
   tri-nucleosomal fragments. It then integrates the four tracks
   using a Hidden Markov Model with three hidden states: open
   region, nucleosomal region, and background region. The original
   software, written in Java by Evan Tarbell, was published in 2019. We
   reimplemented it in Python/Cython and optimized the whole process
   using existing MACS functions and hmmlearn. It now runs much
   faster than the original Java version. Note: evaluation of the
   peak calling results is still underway.

3) Speed/memory optimization. We use cykhash to replace the Python
   dictionary and a 10MB buffer to read and parse input files (not
   available for the BAM file parser), along with many other
   optimization tweaks. We added memory monitoring to the runtime messages.

4) R wrappers for MACS: MACSr for Bioconductor.

5) Code cleanup. Reorganized source code.

7) Switched to GitHub Actions for CI, with multi-arch testing
   including x64, armv7, aarch64, s390x and ppc64le. We also test on

8) The MACS tag-shifting model has been refined. It now uses a naive
   peak calling approach to find ALL possible paired peaks on the + and -
   strands, then uses all of them to calculate the
   cross-correlation. (A related bug has been fixed:
   [#442](https://github.com/macs3-project/MACS/issues/442))

9) BAI index and random access to BAM files are now
   supported. [#449](https://github.com/macs3-project/MACS/issues/449)

10) Support for Python > 3.10. [#498](https://github.com/macs3-project/MACS/issues/498)

11) The effective genome size parameters have been updated
    according to deepTools. [#508](https://github.com/macs3-project/MACS/issues/508)

12) Multiple updates regarding dependencies, Anaconda build, CI/CD

13) Cython 3 is supported.

14) Documentation for each subcommand can be found under /docs

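
The strand cross-correlation mentioned in item 8 can be illustrated with a minimal sketch. It assumes tag positions are given as plain integer lists for each strand, and it scores each candidate shift by counting how many + strand tags have a - strand tag exactly that far downstream. The function names, the `max_shift` default and the toy data are hypothetical, and this is a simplification rather than MACS3's actual model-building code.

```python
from collections import Counter
from typing import Dict, Iterable


def strand_cross_correlation(
    plus: Iterable[int], minus: Iterable[int], max_shift: int
) -> Dict[int, int]:
    """Score each shift by the number of (+, -) tag pairs that far apart."""
    plus_counts = Counter(plus)
    minus_counts = Counter(minus)
    return {
        shift: sum(
            n * minus_counts.get(pos + shift, 0)
            for pos, n in plus_counts.items()
        )
        for shift in range(max_shift + 1)
    }


def estimate_shift(
    plus: Iterable[int], minus: Iterable[int], max_shift: int = 300
) -> int:
    """Return the smallest shift with the highest cross-correlation score."""
    scores = strand_cross_correlation(plus, minus, max_shift)
    return max(scores, key=lambda shift: (scores[shift], -shift))


# Toy data: two paired peaks whose - strand tags sit 150 bp downstream
# of the matching + strand tags.
plus_tags = [100, 102, 500, 503]
minus_tags = [250, 252, 650, 653]
best_shift = estimate_shift(plus_tags, minus_tags)
print(best_shift)
```

Using tags from every paired peak, rather than a hand-picked subset, gives the score at the true offset more support relative to chance matches at other shifts.
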
1) Missing header line when no peaks can be called.
   [#501](https://github.com/macs3-project/MACS/issues/501)
   [#502](https://github.com/macs3-project/MACS/issues/502)

2) Note: different versions of NumPy, SciPy, and sklearn may give slightly
   different results for `hmmratac`. The current standard
   results for automated testing in the `/test` directory are from NumPy
   1.25.1, SciPy 1.11.1, and sklearn 1.3.0.

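
As a rough illustration of the fragment digestion idea behind `hmmratac` (splitting read pairs into short, mono-, di- and tri-nucleosomal tracks by fragment length), here is a minimal sketch. The hard length cutoffs below are hypothetical values chosen only for the example, and the function names are assumptions. `hmmratac` itself may assign fragments to tracks differently, so treat this as a conceptual picture rather than the actual implementation.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical upper bounds (bp) for each track. Illustrative only:
# these are NOT values taken from MACS3 or HMMRATAC.
TRACKS: List[Tuple[str, int]] = [
    ("short", 100),
    ("mono", 250),
    ("di", 450),
    ("tri", 650),
]


def classify_fragment(length: int) -> Optional[str]:
    """Return the track name for a fragment length, or None if too long."""
    for name, upper in TRACKS:
        if length < upper:
            return name
    return None  # longer than the last cutoff: ignored in this sketch


def count_by_track(lengths: Iterable[int]) -> Counter:
    """Count how many fragments fall into each of the four tracks."""
    return Counter(
        track
        for track in map(classify_fragment, lengths)
        if track is not None
    )


# Toy fragment lengths from paired-end reads.
counts = count_by_track([45, 80, 180, 200, 370, 560, 900])
print(dict(counts))
```
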

The common way to install MACS is through
[PyPI](https://pypi.org/project/macs3/). Please check the
[INSTALL](docs/INSTALL.md) document for details.

MACS3 is tested using GitHub Actions for every push and PR on
the following architectures:

* x86_64 (Ubuntu 22, Python 3.9, 3.10, 3.11, 3.12)
* aarch64 (Ubuntu 22, Python 3.10)
* armv7 (Ubuntu 22, Python 3.10)
* ppc64le (Ubuntu 22, Python 3.10)
* s390x (Ubuntu 22, Python 3.10)
* Apple chips (Mac OS 13, Python 3.9, 3.10, 3.11, 3.12)

In general, you can install through PyPI with `pip install macs3`. Using
a virtual environment is highly recommended. Alternatively, you can
unzip the released package downloaded from GitHub, then run
`pip install .`. Please note that we haven't tested
installation on any Windows OS, so currently only Linux and Mac OS
systems are supported. Also, for aarch64, armv7, ppc64le and s390x,
for some unknown reason potentially related to the scientific
calculation libraries MACS3 depends on, such as NumPy, SciPy,
hmmlearn, and scikit-learn, the results from the `hmmratac` subcommand may
not be consistent with the results from x86 or Apple chips. Please be

Example for regular peak calling on TF ChIP-seq:

`macs3 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01`

Example for broad peak calling on Histone Mark ChIP-seq:

`macs3 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1`

Example for peak calling on ATAC-seq (paired-end mode):

`macs3 callpeak -f BAMPE -t ATAC.bam -g hs -n test -B -q 0.01`

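
If you drive MACS3 from a Python script, the command lines above can be assembled programmatically. The helper below is a hypothetical convenience function, not part of MACS3. It only uses the `callpeak` flags shown in the examples above, and its parameter names and defaults are assumptions made for this sketch.

```python
import shlex
from typing import List, Optional


def build_callpeak_cmd(
    treatment: str,
    control: Optional[str] = None,
    fmt: str = "BAM",
    genome: str = "hs",
    name: str = "test",
    qvalue: float = 0.01,
    bedgraph: bool = True,
) -> List[str]:
    """Build an argument list for `macs3 callpeak` (hypothetical helper)."""
    cmd = ["macs3", "callpeak", "-t", treatment]
    if control is not None:
        cmd += ["-c", control]
    cmd += ["-f", fmt, "-g", genome, "-n", name]
    if bedgraph:
        cmd.append("-B")
    cmd += ["-q", str(qvalue)]
    return cmd


# TF ChIP-seq with a control sample, and paired-end ATAC-seq without one.
tf_cmd = build_callpeak_cmd("ChIP.bam", control="Control.bam")
atac_cmd = build_callpeak_cmd("ATAC.bam", fmt="BAMPE")
print(shlex.join(tf_cmd))
print(shlex.join(atac_cmd))

# To actually run a command (requires MACS3 to be installed):
#   import subprocess
#   subprocess.run(tf_cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.
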

There are currently 14 functions available in MACS3 serving as
sub-commands. Please click on the link to see the detailed description

Subcommand | Description
-----------|----------
[`callpeak`](docs/callpeak.md) | Main MACS3 function to call peaks from alignment results.
[`bdgpeakcall`](docs/bdgpeakcall.md) | Call peaks from bedGraph file.
[`bdgbroadcall`](docs/bdgbroadcall.md) | Call nested broad peaks from bedGraph file.
[`bdgcmp`](docs/bdgcmp.md) | Compare two signal tracks in bedGraph format.
[`bdgopt`](docs/bdgopt.md) | Operate the score column of bedGraph file.
[`cmbreps`](docs/cmbreps.md) | Combine bedGraph files of scores from replicates.
[`bdgdiff`](docs/bdgdiff.md) | Differential peak detection based on paired four bedGraph files.
[`filterdup`](docs/filterdup.md) | Remove duplicate reads, then save in BED/BEDPE format file.
[`predictd`](docs/predictd.md) | Predict d or fragment size from alignment results. In case of PE data, report the average insertion/fragment size from all pairs.
[`pileup`](docs/pileup.md) | Pileup aligned reads (single-end) or fragments (paired-end).
[`randsample`](docs/randsample.md) | Randomly choose a number/percentage of total reads, then save in BED/BEDPE format file.
[`refinepeak`](docs/refinepeak.md) | Take raw reads alignment, refine peak summits.
[`callvar`](docs/callvar.md) | Call variants in given peak regions from the alignment BAM files.
[`hmmratac`](docs/hmmratac.md) | Dedicated peak calling based on Hidden Markov Model for ATAC-seq data.

For advanced usage, for example, to run `macs3` in a modular way,
please read the [advanced usage](docs/Advanced_Step-by-step_Peak_Calling.md) document. There is also a
[Q&A](docs/qa.md) document where we collected some common questions


Please read our [CODE OF CONDUCT](CODE_OF_CONDUCT.md) and [How to
contribute](CONTRIBUTING.md) documents. If you have any questions or
suggestions/ideas, or just want to have conversations with developers and
other users in the community, we recommend using the [MACS
Discussions](https://github.com/macs3-project/MACS/discussions)
instead of posting to our
[Issues](https://github.com/macs3-project/MACS/issues) page.

The MACS3 project is sponsored by
[CZI EOSS](https://chanzuckerberg.com/eoss/). We particularly want
to thank the user community for their support, feedback and
contributions over the years.

2008: [Model-based Analysis of ChIP-Seq
(MACS)](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-9-r137)

## Other useful links

* [Cistrome](http://cistrome.org/)
* [bedTools](http://code.google.com/p/bedtools/)
* [UCSC toolkits](http://hgdownload.cse.ucsc.edu/admin/exe/)
* [deepTools](https://github.com/deeptools/deepTools/)