1 2024-09-06 Tao Liu <vladimir.liu@gmail.com>
6 1) Introduce a new emission model for the `hmmratac` function. Now
7 users can choose the simpler Poisson emission `--hmm-type poisson`
8 instead of the default Gaussian emission. As a consequence, the
9 saved HMM model file in json will include the hmm-type information
10 as well. Note that in order to be compatible with the HMM model
11 file from previous version, if there is no hmm-type information in
12 the model file, the hmm-type will be assigned as gaussian. #635
14 2) `hmmratac` now output narrowPeak format output. The summit
15 position and the peak score columns reported in the narrowPeak
16 output represents the position with highest foldchange value
17 (pileup vs average background).
19 3) Add `--cutoff-analysis-steps` and `--cutoff-analysis-max` for
20 `callpeak`, `bdgpeakcall`, and `hmmratac` so that we can
21 have finer resolution of the cutoff analysis report. #636 #642
23 4) Reduce memory usage of `hmmratac` during decoding step, by
24 writing decoding results to a temporary file on disk (file
25 location depends on the environmental TEMP setting), then loading
26 it back while identifying state pathes. This change will decrease
27 the memory usage dramatically. #628 #640
29 5) Fix instructions for preparing narrowPeak files for uploading
30 to UCSC browser, with the `--trackline` option in `callpeak`. #653
32 6) For gappedPeak output, set thickStart and thickEnd columns as
33 0, according to UCSC definition.
37 1) Use `-O3` instead of `-Ofast` for compatibility. #637
41 1) Update instruction to install macs3 through conda/bioconda
43 2) Reorganize MACS3 docs and publish through
44 https://macs3-project.github.io/MACS
46 3) Description on various file formats used in MACS3.
48 2024-02-19 Tao Liu <vladimir.liu@gmail.com>
53 1) Fixed a bug that the `hmmatac` can't correctly save the
54 digested signal files. #605 #611
56 2) Applied a patch to remove cython requirement from the installed
57 system. (it's needed for building the package). #606 #612
59 3) Relax the testing script while comparing the peaks called from
60 current codes and the standard peaks. To implement this, we added
61 'intersection' function to 'Regions' class to find the
62 intersecting regions of two Regions object (similar to PeakIO but
63 only recording chromosome, start and end positions). And we
64 updated the unit test 'test_Region.py' then implemented a script
65 'jaccard.py' to compute the Jaccard Index of two peak files. If
66 the JI > 0.99 we would think the peaks called and the standard
67 peaks are similar. This is to avoid the problem caused by
68 different Numpy/SciPy/sci-kit learn libraries, when certain peak
69 coordinates may have 10bps difference. #615 #619
71 4) Due to the changes in scikit-learn 1.3.0:
72 https://scikit-learn.org/1.3/whats_new/v1.3.html: The way hmmlearn
73 0.3 uses Kmeans will end up with inconsistent results between
74 sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
75 hmm.GaussianHMM and adjusted the standard output from `hmmratac`
76 subcommand. The change is based on
77 https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
78 the random seeding of KMeans 10 times. Now the `hmmratac` results
79 should be more consistent (at least JI>0.99). #615 #620
83 1) We added some dependencies to MACS3. `hmmratc` subcommand needs
84 `hmmlearn` library, `hmmlearn` needs `scikit-learn` and
85 `scikit-learn` needs `scipy`. Since major releases have happened
86 for both`scipy` and `scikit-learn`, we have to set specific
87 version requirements for them in order to make sure the output
88 results from `hmmratac` are consistent.
90 2) We updated our documentation website using
91 Sphinx. https://macs3-project.github.io/MACS/
93 2023-11-15 Tao Liu <vladimir.liu@gmail.com>
96 1) Call variants in peak regions directly from BAM files. The
97 function was originally developed under code name SAPPER. Now
98 SAPPER has been merged into MACS as the `callvar` command. It can
99 be used to call SNVs and small INDELs directly from alignment
100 files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
101 the DNA sequence at the enriched genomic regions (binding sites or
102 accessible DNA) and to refine the alignment when necessary. We
103 added `simde` as a submodule in order to support fermi-lite
104 library under non-x64 architectures.
106 2) HMMRATAC module is added as subcommand `hmmratac`. HMMRATAC is
107 a dedicated software to analyze ATAC-seq data. The basic idea
108 behind HMMRATAC is to digest ATAC-seq data according to the
109 fragment length of read pairs into four signal tracks: short
110 fragments, mono-nucleosomal fragments, di-nucleosomal fragments
111 and tri-nucleosomal fragments. Then integrate the four tracks
112 again using Hidden Markov Model to consider three hidden states:
113 open region, nucleosomal region, and background region. The
114 orginal paper was published in 2019 written in JAVA, by Evan
115 Tarbell. We implemented it in Python/Cython and optimize the whole
116 process using existing MACS functions and hmmlearn. Now it can run
117 much faster than the original JAVA version. Note: evaluation of
118 the peak calling results is still underway.
120 3) Speed/memory optimization. Use the cykhash to replace python
121 dictionary. Use buffer (10MB) to read and parse input file (not
122 available for BAM file parser). And many optimization tweaks. We
123 added memory monitoring to the runtime messages.
125 4) R wrappers for MACS -- MACSr for bioconductor.
127 5) Code cleanup. Reorganize source codes.
131 7) Switch to Github Action for CI, support multi-arch testing
132 including x64, armv7, aarch64, s390x and ppc64le. We also test on
135 8) MACS tag-shifting model has been refined. Now it will use a
136 naive peak calling approach to find ALL possible paired peaks at +
137 and - strand, then use all of them to calculate the
138 cross-correlation. (a related bug has been fix
139 [#442](https://github.com/macs3-project/MACS/issues/442))
141 9) BAI index and random access to BAM file now is
142 supported. [#449](https://github.com/macs3-project/MACS/issues/449).
144 10) Support of Python > 3.10
145 [#498](https://github.com/macs3-project/MACS/issues/498)
147 11) The effective genome size parameters have been updated
149 deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
151 12) Multiple updates regarding dependencies, anaconda built, CI/CD
154 13) Cython 3 is supported.
156 14) Documentations for each subcommand can be found under /docs
160 1) Missing header line while no peaks can be called
161 [#501](https://github.com/macs3-project/MACS/issues/501)
162 [#502](https://github.com/macs3-project/MACS/issues/502)
164 2) Note: different numpy, scipy, sklearn may give slightly
165 different results for hmmratac results. The current standard
166 results for automated testing in `/test` directory are from Numpy
167 1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
169 2020-04-11 Tao Liu <vladimir.liu@gmail.com>
174 Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
177 2020-04-10 Tao Liu <vladimir.liu@gmail.com>
182 1) MACS2 has been tested on multiple architectures to make sure it
183 can successfully generate consistent results. Currently the
184 supported architectures are: AMD64, ARM64, i386, PPC64LE, and
185 S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
186 #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
187 and #370. The lesson is that if the project is built on Cython and
188 is aimed at memory efficiency, we should specifically define all
189 int/float types in pyx files such as int8_t or uint32_t using
190 either libc or numpy (c version) instead of relying on Cython
191 types such as short, long, double.
193 2) MACS2 setup script will check numpy and install numpy if
194 necessary. PR #378, issue #364
196 3) `bdgbroadcall` command will correctly add the score column (5th
197 column). The score (5th) column contains 10 times of the average
198 score in the broad region. PR #373, issue #362
200 4) The missing test on `bdgopt` subcommand has been added. PR #363
202 5) The obsolete option `--ratio` from `callpeak` subcommand has
203 been removed. PR #369, issue #366
205 6) Fixed the incorrect description in README on the 'maximum
206 length of broad region is 4 times of d' to 'maximum gap for
207 merging broad regions is 4 times of tag size by default'. PR #380,
212 1) CODE OF CONDUCT document has been added to MACS2 github
215 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
220 1) Speed up MACS2. Some programming tricks and code cleanup. The
221 filter_dup function replaces separate_dups. The later one was
222 implemented for potentially putting back duplicate reads in
223 certain downstream analysis. However such analysis hasn't been
224 implemented. Optimize the speed of writing bedGraph
225 files. Optimize BAM and BAMPE parsing with pointer casting instead
228 2) The comment lines in the headers of BED or SAM files will be
229 correctly skipped. However, MACS2 won't check comment lines in the
234 1) Cutoff-analysis in callpeak command. #341
236 2) Issues related to SAMParser and three ELAND Parsers are
241 1) cmdlinetest script in test/ folder has been updated to: 1. test
242 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
243 and after the error or warning message during tests; 3. output
244 only the first 10 lines if the difference between test result and
245 standard result can be found; 4. prockreport monitor CPU time and
246 memory usage in 1 sec interval -- a bit more accurate.
248 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
250 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
251 MACS version 2.2.5 (Py3 speed up)
255 1) *Github code only and Not included in MACS2 release* New
256 testing data for performance test. An subsampled ENCODE2 CTCF
257 ChIP-seq dataset, including 5million ChIP reads and 5 million
258 control reads, has been included in the test folder for testing
259 CPU and memory usage (i.e. 5M test). Several related scripts ,
260 including `prockreport` for output cpu memory usage, `pyprofile`
261 and `pyprofile_stat` for debuging and profiling MACS2 codes, have
264 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
265 The old hashtable.pyx implementation copied from Pandas (very old
266 version) doesn't work well in Python3+Cython. It slows down the
267 pqtable checkup using the identical Cython codes as in
268 v2.1.4. While running 5M test, the `__getitem__` function in the
269 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
270 148.6s with the same number of calls in MACS2 v2.2.4. As a
271 consequence, the standard python dictionary implementation has
272 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
273 faster than py2 version, but uses a bit more memory. In general,
274 v2.2.5 can finish 5M reads test in 20% less time than MACS2
275 v2.1.4, but use 15% more memory.
279 1) More Python3 related fixes, e.g. the return value of keys from
283 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
284 MACS version 2.2.4 (Python3)
288 1) First Python3 version MACS2 released.
290 2) Version number 2.2.X will be used for MACS2 in Python3, in
293 3) More comprehensive test.sh script to check the consistency of
294 results from Python2 version and Python3 version.
296 4) Simplify setup.py script since the newest version transparently
297 supports cython. And when cython is not installed by the user,
298 setup.py can still compile using only C codes.
300 5) Fix Signal.pyx to use np.array instead of np.mat.
302 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
307 Github Actions is used together with Travis CI for testing and
314 1) #318 Random score in bdgdiff output. It turns out the sum_v is
315 not initialized as 0 before adding. Potential bugs are fixed in
316 other functions in ScoreTrack and CallPeakUnit codes.
318 2) #321 Cython dependency in setup.py script is removed. And place
319 'cythonzie' call to the correct position.
321 3) A typo is fixed in Github Actions script.
323 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
328 1) Support Docker auto-deploy. PR #309
330 2) Support Travis CI auto-testing, update unit-testing
331 scripts, and enable subcommand testing on small datasets.
333 3) Update README documents. #297 PR #306
335 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
336 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
338 5) `--d-min` option is added in `callpeak` and `predictd`, to
339 exclude predictions of fragment size smaller than the given
340 value. Merged from PR #267 @shouldsee.
342 6) `--buffer-size` option is added in `predictd`, `filterdup`,
343 `pileup` and `refinepeak` subcommands. Users can use this option
344 to decrease memory usage while there are a large number of contigs
345 in the data. Also, now `callpeak`, `predictd`, `filterdup`,
346 `pileup` and `refinepeak` will suggest users to tweak
347 `--buffer-size` while catching a MemoryError. #313 PR #314
351 1) #265 Fixed a bug where the pseudocount hasn't been applied
352 while calculating p-value score in ScoreTrack object.
354 2) Fixed bdgbroadcall so that it will report those broad peaks
355 without strong peak inside, a consistent behavior as `callpeak
358 3) Rename COPYING to LICENSE.
360 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
365 1) Added missing BEDPE support. And enable the support for BAMPE
366 and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
367 subcommands. When format is BAMPE or BEDPE, The 'pileup' command
368 will pile up the whole fragment defined by mapping locations of
369 the left end and right end of each read pair. Thank @purcaro
371 2) Added options to callpeak command for tweaking max-gap and
372 min-len during peak calling. Thank @jsh58!
374 3) The callpeak option "--to-large" option is replaced with
377 4) The randsample option "-t" has been replaced with "-i".
381 1) Fixed memory issue related to #122 and #146
383 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
385 3) Fixed a bug while setting commandline qvalue cutoff.
387 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
389 5) Fixed the calculation of average fragment length for paired-end
392 6) Fixed bugs caused by khash while computing p/q-value and log
393 likelihood ratios. Thank @jsh58
395 7) More spelling tweaks in source code. Thank @mr-c
397 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
398 MACS version 2.1.1 20160309
402 * Fixed spelling. Merged pull request #120. Thank @mr-c!
404 * Change filtering criteria for reading BAM/SAM files
406 Related to callpeak and filterdup commands. Now the
407 reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
408 still be read although MACS2 may decide them as duplicates
409 later. Related to old issue #33. Sorry I forgot to address it for
412 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
413 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
417 1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
418 the former option is not supported by older GCC. Related to issues
421 2) Issue #108 is fixed. If no peak can be found in a chromosome,
422 the PeakIO won't throw an error.
428 a) A more flexible format, BEDPE, is supported. Now users can
429 define the left and right position of the ChIPed fragment, and
430 MACS2 will skip model building and directly pileup the
431 fragments. Related to issue #112.
433 b) The 'tempdir' can be specified, to save cached pileup
434 tracks. Originially, the temporary files were stored in
435 /tmp. Thank @daler! Related to issues #97 and #105.
439 New operations are added, to calculate the maximum or minimum value between
440 values in BEDGRAPH and given value.
444 New method is added, to calculate the maximum value between values
445 defined in two BEDGRAPH files.
447 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
448 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
452 1) Fix a bug while dealing with some chromosomes only containing
453 one read (pair). The size of dup_plus/dup_minus arrays after
454 filtering dups should +1.
456 2) Fix a bug related to the broad peak calling function in
457 previous versions. The gaps were miscalculated, so segmented weak
458 broad calls may be reported, and sometimes you would see peaks
459 with lower than cutoff values in the output files.
461 3) "Potentially" Fixed issue #105 on temporary cache files, need
465 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
466 MACS version 2.1.0 20150731 (tag:rc)
470 1) Fixed issue #76: information about broad/narrow cutoff will be
473 2) Fixed issue #79: bdgopt extparam option is fixed.
475 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
476 for filterdup command.
478 4) Fixed issue #78, #88 and similar issue reported in MACS google
479 group: MACS2 now can correctly deal with multiple alignment files
480 for -t or -c. The 'finalize' function will be correctly
481 called. Multiple files option is enabled for filterdup,
482 randsample, predictd, pileup and refinepeak commands.
484 5) A related issue to #88, when BAMPE mode is used, PE pairs will
485 be sorted by leftmost then rightmost ends.
487 6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
488 array. This will cause 'callpeak --nolambda' hang forever while
489 calculating pvalues and qvalues.
491 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
492 MACS version 2.1.0 20150420 (tag:rc)
496 1) bdgopt: some convenient functions to modify bedGraph files.
498 2) cmbreps: Combine scores from two replicates. Including three
499 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
500 method to combine two p-value scores. After that, user can use
501 bdgpeakcall to call peaks on combined scores.
505 1) callpeak and bdgpeakcall now can try to analyze the
506 relationship between p-values and number/length of peaks then
507 generate a summary to help users decide an appropriate cutoff.
509 2) callpeak now can accept fold-enrichment cutoff as a filter for
514 Now MACS2 runs about 3X as fast as previous version. Trade
515 clean python codes for speed... Now while processing 50M ChIP vs
516 50M control, it will take only 10 minutes.
520 1) Sampling function in BAMPE mode.
522 2) Callpeak while there are >= 2 input files for -t or -c.
524 3) While reading BAM/SAM, those secondary or supplementary
525 alignments will be correctly skipped.
527 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
528 option that MACS2 will discard those SAM/BAM alignments with bit
529 1024 no matter how --keep-dup is set.
531 5) Fixed issue #49: setuptools is used intead of distutils
533 6) Fixed issue #51: fix the problem when using --trackline
534 argument when control file is absent.
536 7) Fixed issue #53: Use Use SAM/BAM CIGAR to find the 5' end of
537 read mapped to minus strand. Previous implementation will find
538 incorrect 5' end if there is indel in alignment.
540 8) Fixed issue #56: An incorrect sorting method used for BAMPE
541 mode which will cause incorrect filtering of duplicated reads. Now
544 9) Issue #63: Merged from jayhesselberth@github, extsize now can
547 10) Issue #71: Merged from aertslab@github, close file descriptor
548 after creating them with mkstemp().
550 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
551 MACS version 2.1.0 20140616 (tag:rc)
555 "--ratio" is added to manually assign the scaling factor of ChIP
556 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
557 implementing the patch file!
559 "--shift" is added to move cutting ends (5' end of reads) around,
560 in order to process DNAse-Seq data, e.g., use "--shift -100
561 --extsize 200" to get 200bps fragments around 5' ends. For general
562 ChIP-Seq data analysis, this option should be always set as
563 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
566 ** Do not output negative fragment size from cross-correlation
567 analysis. Thank Alvin Qin for the feedback!
569 ** --half-ext and --control-shift are removed. For complex read
570 shifting and extending, combine '--shift' and '--extsize'
571 options. For comparing two conditions, use 'bdgdiff' module
574 ** a bug is fixed to output the last pileup value in bdg file
579 A 'dry-run' option is added to only output numbers, including the
580 number of allowed duplicates, the total number of reads before and
581 after filtering duplicates and the estimated duplication
582 rate. Thank John Urban for the suggestion!
585 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
586 MACS version 2.0.10 20131216 (tag:alpha)
590 * We changed license from Artistic License to 3-clauses BSD license.
592 Yes. Simpler the better.
594 * Process paired-end data with "-f BAMPE" without control
596 * GappedPeak output for --broad option has been fixed again to be
597 consistent with official UCSC format. We add 1bp pseudo-block to
598 left and/or right of broad region when necessary, so that you can
599 virtualize the regions without strong enrichment inside
600 successfully. In downstream analysis except for virtualization,
601 you may need to remove all 1bps blocks from gappedPeak file.
603 * diffpeak subcommand is temporarily disabled. Till we
606 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
607 MACS version 2.0.10 20131028 (tag:alpha)
609 * callpeak --call-summits improvement
611 The smoothing window length has been fixed as fragment length
612 instead of short read length. The larger smoothing window will
613 grant better smoothing results and better sub-peak summits
616 * --outdir and --ofile options for almost all commands
618 Thank Björn Grüning for initially implementing these options!
619 Now, MACS2 will save results into a specified
620 directory by '--outdir' option, and/or save result into a
621 specified file by '--ofile' option. Note, in case '--ofile' is
622 available for a subcommand, '-o' now has been adjusted to be the
623 same as '--ofile' instead of '--o-prefix'.
625 Here is the list of changes. For more detail, use 'macs2 xxx -h'
628 ** callpeak: --outdir
629 ** diffpeak: Not implemented
630 ** bdgpeakcall: --outdir and --ofile
631 ** bdgbroadcall: --outdir and --ofile
632 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
633 and the order of arguments for --ofile must be the same as for -m.
634 ** bdgdiff: --outdir and --ofile
635 ** filterdup: --outdir
637 ** randsample: --outdir
638 ** refinepeak: --outdir and --ofile
641 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
642 MACS version 2.0.10 20130915 (tag:alpha)
644 * callpeak Added a new option --buffer-size
646 This option is to tweak a previously hidden parameter that
647 controls the steps to increase array size for storing alignment
648 information. While in some rare cases, the number of
649 chromosomes/contigs/scaffolds is huge, the original default
650 setting will cause a huge memory waste. In these cases, we
651 recommend to decrease --buffer-size (e.g., 1000) to save memory,
652 although the decrease will slow process to read alignment files.
654 * an optimization to speed up pvalue-qvalue statistics
656 Previously, it took a hour to prepare p-q-table for 65M vs 65M
657 human TF library, and now it will take 10 minutes. It was due to a
658 single line of code to get a value from a numpy array ...
662 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
663 MACS version 2.0.10 20130731 (tag:alpha)
665 * callpeak --call-summits
667 Fix bugs causing callpeak --call-summits option generating extra
668 number of peaks and inconsistent peak boundaries comparing to
669 default option. Thank Ben Levinson!
673 Fix bugs causing bdgcmp output logLR all in positive values. Now
674 'depletion' can be correctly represented as negative values.
678 Fix the behavior of bdgdiff module. Now it can take four
679 bedGraph files, then use logLR as cutoff to call differential
680 regions. Check command line of bdgdiff for detail.
682 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
683 MACS version 2.0.10 20130713 (tag:alpha)
685 * fix bugs while output broadPeak and gappedPeak.
687 Note. Those weak broad regions without any strong enrichment
688 regions inside won't be saved in gappedPeak file.
690 * bdgcmp -T and -C are merged into -S and description is updated.
692 Now, you can use it to override SPMR values in your input for
693 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
694 statistics will cause weird results ( in most cases, lower
695 significancy), and won't be consistent with MACS2 callpeak
696 behavior. So if you have SPMR bedGraphs, input the smaller/larger
697 sample size in MILLION according to 'callpeak --to-large' option.
699 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
700 MACS version 2.0.10 20130710 (tag:alpha)
702 * fix BED style output format of callpeak module:
704 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
705 the output. Old BED format file won't be saved.
707 2) with --broad: broadPeak (BED6+3) for broad region and
708 gappedPeak (BED12+3) for chained enriched regions will be the
709 output. Old BED format, narrowPeak format, summit file won't be
712 * bdgcmp now can accept list of methods to calculate scores. So
713 you can run it once to generate multiple types of scores. Thank
714 Jon Urban for this suggestion!
716 * C codes are re-generated through Cython 0.19.1.
718 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
719 MACS version 2.0.10 20130520 (tag:alpha)
721 * broad peak calling modules are modified in order to report all
722 relexed regions even there is no strong enrichment inside.
724 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
725 MACS version 2.0.10 20130501 (tag:alpha)
727 * Memory usage is decreased to about 1/4-1/5 of previous usage
728 Now, the internal data structure and algorithm are both
729 re-organized, so that intermediate data wouldn't be saved in
730 memory. Intead they will be calculated on the fly. New MACS2 will
731 spend longer time (1.5 to 2 times) however it will use less memory
732 so can be more usable on small mem servers.
734 * --seed option is added to callpeak and randsample commands
735 Thank Mathieu Gineste for this suggestion!
737 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
738 MACS version 2.0.10 20130306 (tag:alpha)
740 * diffpeak module New module to detect differential binding sites
741 with more statistics.
743 * Introduced --refine-peaks
744 Calculates reads balancing to refine peak summits
746 * Ouput file names prefix
747 Correct encodePeak to narrowPeak, broadPeak to bed12.
749 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
750 MACS version 2.0.10 (tag:alpha not released)
752 * Introduced BAMPEParser
753 Reads PE data directly, requires bedtools for now
755 * Introduced --call-summits
756 Uses signal processing methods to call overlapping peaks
758 * Added --no-trackline
759 By default, files have descriptive tracklines now
761 * new refinepeak command (experimental)
762 This new function will use a similar method in SPP (wtd), to
763 analyze raw tag distribution in peak region, then redefine the
764 peak summit where plus and minus tags are evenly distributed
767 * Changes to output *
768 cPeakDetect.pyx has full support for new print/write methods and
769 --call-peaks, BAMPEParser, and use of paired-end data
771 * Parser optimization
773 cParser.pyx is rewritten to use io.BufferedReader to speed
774 up. Speed is doubled.
776 Code is reorganized -- most of functions are inherited from
779 * Use cross-correlation to calculate fragment size
781 First, all pairs will be used in prediction for fragment
782 size. Previously, only no more than 1000 pairs are used. Second,
783 cross-correlation is used to find the best phase difference
784 between + and - tag pileups.
786 * Speed up p-value and q-value calculation
788 This part is ten times faster now. I am using a dictionary to
789 cache p-value results from Poisson CDF function. A bit more memory
790 will be used to increase speed. I hope this dictionary would not
791 explode since the possible pairs of ChIP signal and control lambda
792 are hugely redundant. Also, I rewrited part of q-value
795 * Speed up peak detection
797 This part is about hundred of times faster now. Optimizations
798 include using Numpy functions as much as possible, and making loop
799 body as small as possible.
801 * Post-processing on differential calls
803 After macs2diff finds differential binding sites between two
804 conditions, it will try to annotate the peak calls from one of two
805 conditions, describe the changes ...
807 * Fragment size prediction in macs2diff
809 Now by default, macs2diff will try to use the average fragment
810 size from both condition 1 and condition 2 for tag extension and
811 peak calling. Previously, by default, it will use different sizes
812 unless --nomodel is specified.
814 Technically, I separate model building processes out. So macs2diff
815 will build fragment sizes for condition 1 and 2 in parallel (2
816 processes maximum), then perform 4-way comparisons in parallel (4
821 Combine two p/qscore tracks together. At regions where condition 1
822 is higher than condition 2, score would be positive, otherwise,
825 * SAMParser and BAMParser
827 Bug fixed for paired-end sequencing data.
831 Fixed a bug while calling peaks from BedGraph file. It previously
832 mistakenly output same peaks multiple times at the end of
835 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
836 MACS version 2.0.9 (tag:alpha)
838 * Auto fixation on predicted d is turned off by default!
840 Previous --off-auto is now default. MACS will not automatically
841 fix d less than 2 times of tag size according to
842 --shiftsize. While tag size is getting longer nowadays, it would
843 be easier to have d less than 2 times of tag size, however d may
844 still be meaningful and useful. Please judge it using your own
849 Now, the default scaling while treatment and input are unbalanced
850 has been adjusted. By default, larger sample will be scaled down
851 linearly to match the smaller sample. In this way, background
852 noise will be reduced more than real signals, so we expect to have
853 more specific results than the other way around (i.e. --to-large
856 Also, an alternative option to randomly sample larger data
857 (--down-sample) is provided to replace default linear
858 scaling. However, this option will cause results irresproducible,
863 A new script 'randsample' is added, which can randomly sample
864 certain percentage or number of tags.
868 Now, MACS will decide peak summits according to pileup height
869 instead of qvalue scores. In this way, the summit may be more
874 MACS calculate qvalue scores as differential scores. When compare
875 two conditions (saying A and B), the maximum qscore for comparing
876 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
877 will be computed. If maxqscore_a2b is bigger, the diff score is
878 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
880 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
881 MACS version 2.0.8 (tag:alpha)
883 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
885 New script bdgbroadcall and the extra option '--broad' for macs2
886 script, can be used to call broad regions with a loose cutoff to
887 link nearby significant regions. The output is represented as
890 * MACS2/IO/cScoreTrack.pyx
892 Fix q-value calculation to generate forcefully monotonic values.
894 * bin/eland*2bed, bin/sam2bed and bin/filterdup
896 They are combined to one more powerful script called
897 "filterdup". The script filterdup can filter duplicated reads
898 according to sequencing depth and genome size. The script can also
899 convert any format supported by MACS to BED format.
901 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
902 MACS version 2.0.7 (tag:alpha)
904 * bin/macsdiff renamed to bin/bdgdiff
906 Now this script will work as a low-level finetuning tool as bdgcmp
911 A new script to take treatment and control files from two
912 condition, calculate fragment size, use local poisson to get
913 pvalues and BH process to get qvalues, then combine 4-ways result
914 to call differential sites.
916 This script can use upto 4 cpus to speed up 4-ways calculation. (
917 I am trying multiprocessing in python. )
919 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
920 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
921 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
923 All above files are modified for the new macs2diff script.
925 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
927 Now q-value 0.01 is the default cutoff. If -p is specified,
928 p-value cutoff will be used instead.
930 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
931 MACS version 2.0.6 (tag:alpha)
935 A script to call differential regions. A naive way is introduced
936 to find the regions where:
938 1. signal from condition 1 is larger than input 1 and condition 2 --
939 unique region in condition 1;
940 2. signal from condition 2 is larger than input 2 and condition 1
941 -- unique region in condition 2;
942 3. signal from condition 1 is larger than input 1, signal from
943 condition 2 is larger than input 2, however either signal from
944 condition 1 or 2 is not larger than the other.
946 Here 'larger' means the pvalue or qvalue from a Poisson test is
947 under certain cutoff.
949 (I will make another script to wrap up mulitple scripts for
950 differential calling)
952 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
953 MACS version 2.0.5 (tag:alpha)
955 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
958 Use hash to store peak information. Add back the feature to deal
959 with data without control.
961 Fix bug which incorrectly allows small peaks at the end of
964 * bin/bdgpeakcall, bin/bdgcmp
966 Fix bugs. bdgpeakcall can output encodePeak format.
968 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
969 MACS version 2.0.4 (tag:alpha)
973 Fix a bug, correctly assign lambda_bg while --to-small is
974 set. Thanks Junya Seo!
976 Add rank and num of bp columns to pvalue-qvalue table.
980 Fix bugs to correctly deal with peakless chromosomes. Thanks
983 Use AFDR for independent tests instead.
987 Now MACS can output peak coordinates together with pvalue, qvalue,
988 summit positions in a single encodePeak format (designed for
989 ENCODE project) file. This file can be loaded to UCSC
990 browser. Definition of some specific columns are: 5th:
991 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
992 -log10qvalue, 10th: relative summit position to peak start.
995 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
996 MACS version 2.0.3 (tag:alpha)
998 * Rich output with qvalue, fold enrichment, and pileup height
1000 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1003 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1005 Now we have a similiar xls output file as before. The differences
1006 from previous file are:
1008 1. Summit now is absolute summit, instead of relative summit
1010 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1011 pileup at the peak summit;
1012 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1013 5.00 means 1e-5, simple and less confusing.
1014 4. FDR column becomes '-log10(qvalue)' column.
1015 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1016 the values at the peak summit.
1018 * Extra output files
1020 NAME_pqtable.txt contains pvalue and qvalue relationships.
1022 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1023 and -log10qvalue scores in BedGraph format. Nearby regions with
1024 the same value are not merged.
1026 * Separation of FeatIO.py
1028 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1029 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1030 implemented to store pileup, local lambda, pvalue, and qvalue
1031 alltogether in cScoreTrack.pyx.
1033 * Experimental option --half-ext
1035 Suggested by NPS algorithm, I added an experimental option
1036 --half-ext to let MACS only extends ChIP fragment around its
1037 middle point for only 1/2 d.
1039 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1040 MACS version 2.0.2 (tag:alpha)
1044 Add an error check to see if there is no common chromosome names
1045 from treatment file and control file
1047 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1049 Reduce memory usage by removing deepcopy() calls.
1051 * Modify README documents and others.
1053 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1054 MACS Version 2.0.1 (tag:alpha)
1056 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1058 Jie suggested me a brilliant simple method to pileup fragments
1059 into bedGraph track. It works extremely faster than the previous
1060 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1061 large local lambda calculation in MACSv2 now. Now I generate three
1062 bedGraphs for d-size local bias, slocal-size and llocal-size local
1063 bias, and calculate the maximum local bias as local lambda
1066 Minor: add_loc in bedGraphTrackI now can correctly merge the
1067 region with its preceding region if their value are the same.
1071 Add an option to shift control tags before extension. By default,
1072 control tags will be extended to both sides regardless of strand
1075 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1076 MACS Version 2.0.0 (tag:alpha)
1078 * Use bedGraph type to store data internally and externally.
1080 We can have theoretically one-basepair resolution profiles. 10
1081 times smaller in filesize and even smaller after converting to
1082 bigWig for visualization.
1084 * Peak calling process modified. Better peak boundary detection.
1086 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1087 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1088 one will be averaged to d size) Then calculate the maximum value
1089 of these two tracks and a global background, to have a
1090 local-lambda bedGraph.
1092 Use -10log10poisson_pvalue as scores to generate a score track
1093 before peak calling.
1095 A general peak calling based on a score cutoff, min length of peak
1096 and max gap between nearby peaks.
1100 Wiggle file output is removed. Now we only support bedGraph
1101 output. The generation of bedGraph is highly recommended since it
1102 will not cost extra time. In other words, bedGraph generation is
1103 internally run even you don't want to save bedGraphs on disk, due
1104 to the peak calling algorithm in MACS v2.
1108 We now can calculate poisson pvalue in log space so that the score
1109 (-10*log10pvalue) will not have a upper limit of 3100 due to
1110 precision of float number.
1112 * Cython is adopted to speed up Python code.
1114 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1117 * Replaced with a newest WigTrackI class and fixed the wignorm script.
1119 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1120 Version 1.4.0rc2 (Valentine)
1122 * --single-wig option is renamed to --single-profile
1124 * BedGraph output with --bdg or -B option.
1126 The BedGraph output provides 1bp resolution fragment pileup
1127 profile. File size is smaller than wig file. This option can be
1128 combined with --single-profile option to produce a bedgraph file
1129 for the whole genome. This option can also make --space,
1130 --call-subpeaks invalid.
1132 * Fix the description of --shiftsize to correctly state that the
1133 value is 1/2 d (fragment size).
1135 * Fix a bug in the call to __filter_w_control_tags when control is
1138 * Fix a bug on --to-small option. Now it works as expected.
1140 * Fix a bug while counting the tags in candidate peak region, an
1141 extra tag may be included. (Thanks to Jake Biesinger!)
1143 * Fix the bug for the peaks extended outside of chromosome
1144 start. If the minus strand tag goes outside of chromosome start
1145 after extension of d, it will be thrown out.
1147 * Post-process script for a combined wig file:
1149 The "wignorm" command can be called after a full run of MACS14 as
1150 a postprocess. wignorm can calculate the local background from the
1151 control wig file from MACS14, then use either foldchange,
1152 -10*log10(pvalue) from possion test, or difference after asinh
1153 transformation as the score to build a single wig track to
1154 represent the binding strength. This script will take a
1155 significant long time to process.
1157 * --wigextend has been obsoleted.
1159 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1160 Version 1.4.0rc1 (Starry Sky)
1162 * Duplicate reads option
1164 --keep-dup behavior is changed. Now user can specify how many
1165 reads he/she wants to keep at the same genomic location. 'auto' to
1166 let MACS decide the number based on binomial distribution, 'all'
1167 to let MACS keep all reads.
1169 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1171 By default, MACS will now scale the smaller dataset to the bigger
1172 dataset. For instance, if IP has 10 million reads, and Input has 5
1173 million, MACS will double the lambda value calculated from Input
1174 reads while calling BOTH the positive peaks and negative
1175 peaks. This will address the issue caused by unbalanced numbers of
1176 reads from IP and Input. If --to-small is turned on, MACS will
1177 scale the larger dataset to the smaller one. So from now on, if d
1178 is fixed, then the peaks from a MACS call for A vs B should be
1179 identical to the negative peaks from a B vs A.
1181 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1182 Version 1.4.0beta (summer wishes)
1188 The default behavior in the model building step is slightly
1189 changed. When MACS can't find enough pairs to build model
1190 (implemented in alpha version) or the modeled fragment length is
1191 less than 2 times of tag length (implemented in beta version),
1192 MACS will use 2 times of --shiftsize value as fragment length in
1193 the later analysis. --off-auto can turn off this default behavior.
1195 ** Redundant tag filtering
1197 The IO module is rewritten. The redundant tag filtering process
1198 becomes simpler and works as promise. The maximum allowed number
1199 of tags at the exact same location is calculated from the
1200 sequencing depth and genome size using a binomial distribution,
1201 for both TREAMENT and CONTROL separately. ( previously only
1202 TREATMENT is considered ) The exact same location means the same
1203 coordination and the same strand. Then MACS will only keep at most
1204 this number of tags at the exact same location in the following
1205 analysis. An option --keep-dup can let MACS skip the filtering and
1206 keep all the tags. However this may bring in a lot of sequencing
1207 bias, so you may get many false positive peaks.
1209 ** Single wiggle mode
1211 First thing to mention, this is not the score track that I
1212 described before. By default, MACS generates wiggle files for
1213 fragment pileup for every chromosomes separately. When you use
1214 --single-wig option, MACS will generate a single wiggle file for
1215 all the chromosomes so you will get a wig.gz for TREATMENT and
1216 another wig.gz for CONTROL if available.
1218 ** Sniff -- automatic format detection
1220 Now, by default or "-f AUTO", MACS will decide the input file
1221 format automatically. Technically, it will try to read at most
1222 1000 records for the first 10 non-comment lines. If it succeeds,
1223 the format is decided. I recommend not to use AUTO and specify the
1224 right format for your input files, unless you combine different
1225 formats in a single MACS run.
1229 --single-wig and --keep-dup are added. Check previous section in
1230 ChangeLog for detail.
1232 -f (--format) AUTO is now the default option.
1234 --slocal default: 1000
1235 --llocal default: 10000
1239 Setup script will stop the installation if python version is not
1240 python2.6 or python2.7.
1242 Local lambda calculation has been changed back. MACS will check
1243 peak_region, slocal( default 1K) and llocal (default 10K) for the
1244 local bias. The previous 200bps default will cause MACS misses
1245 some peaks where the input bias is very sharp.
1247 sam2bed.py script is corrected.
1249 Relative pos in xls output is fixed.
1251 Parser for ELAND_export is fixed to pass some of the no match
1252 lines. And elandexport2bed.py is fixed too. ( however I can't
1253 guarantee that it works on any eland_export files. )
1255 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1256 Version 1.4.0alpha2 (be smarter)
1260 --gsize now provides shortcuts for common genomes, including
1261 human, mouse, C. elegans and fruitfly.
1263 --llocal now will be 5000 bps if there is no input file, so that
1264 local lambda doesn't overkill enriched binding sites.
1266 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1267 Version 1.4alpha (be smarter)
1271 --tsize option is redesigned. MACS will use the first 10 lines of
1272 the input to decide the tag size. If user specifies --tsize, it
1273 will override the auto decided tsize.
1275 --lambdaset is replaced by --slocal and --llocal which mean the
1276 small local region and large local region.
1278 --bw has no effect on the scan-window size now. It only affects the
1279 paired-peaks model process.
1283 During the model building, MACS will pick out the enriched regions
1284 which are not too high and not too low to build the paired-peak
1285 model. Default the region is from fold 10 to fold 30. If MACS
1286 fails to build the model, by default it will use the nomodel
1287 settings, like shiftsize=100bps, to shift and extend each
1288 tags. This behavior can be turned off by '--off-auto'.
1292 An extra file including all the summit positions are saved in
1293 *_summits.bed file. An option '--call-subpeaks' will invoke
1294 PeakSplitter developed by Mali Salmon to split wide peaks into
1297 * Sniff ( will in beta )
1299 Automatically recognize the input file format, so use can combine
1300 different format in one MACS run.
1302 Not implemented features/TODO:
1304 * Algorithms ( in near future? )
1306 MACS will try to refine the peak boundaries by calculating the
1307 scores for every point in the candidate peak regions. The score
1308 will be the -10*log(10,pvalue) on a local poisson distribution. A
1309 cutoff specified by users (--pvalue) will be applied to find the
1310 precise sub-peaks in the original candidate peak region. Peak
1311 boudaries and peak summits positions will be saved in separate BED
1314 * Single wiggle track ( in near future? )
1316 A single wiggle track will be generated to save the scores within
1317 candidate peak regions in the 10bps resolution. The wiggle file
1318 is in fixedStep format.
1321 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1322 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1326 Fixed typo. FCSTEP -> FESTEP
1330 The 'femax' attribute bug is fixed
1332 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1333 Version 1.3.7 (Oktoberfest)
1335 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1337 Enhancements by Peter Chines:
1339 1. gzip files are supported.
1340 2. when --diag is on, user can set the increment and endpoint for
1341 fold enrichment analysis by setting --fe-step and --fe-max.
1343 Enhancements by Davide Cittaro:
1345 1. BAM and SAM formats are supported.
1346 2. small changes in the header lines of wiggle output.
1349 1. I added --fe-min option;
1350 2. Bowtie ascii output with suffix ".map" is supported.
1354 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1355 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1356 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1357 4. Some "fold change" have been changed to "fold enrichment".
1359 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1360 Version 1.3.6.1 (default parameter change)
1362 * bin/macs, lib/PeakDetect.py
1364 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1365 default. "--futurefdr" is added which can turn on the 'new' method
1366 introduced in 1.3.6. By default it's off.
1370 Fixed a bug. p-value is corrected a little bit.
1373 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1374 Version 1.3.6 (Birthday cake)
1378 "track name" is added to the header of BED output file.
1380 Now the default peak detection method is to consider 5k and 10k
1381 nearby regions in treatment data and peak location, 1k, 5k, and
1382 10k regions in control data to calculate local bias. The old
1383 method can be called through '--old' option.
1385 Information about how many total/unique tags in treatment or
1386 control will be saved in final .xls output.
1388 * lib/IO/__init__.py
1390 ".fa" will be removed from input tag alignment so only the
1391 chromosome names are kept.
1393 WigTrackI class is added for Wiggle like data structure. (not used
1396 The parser for ELAND multi PET files has been fixed. Now the 5'
1397 tag position for a pair will be kept, whereas in the previous
1398 version, the middle points are kept.
1400 * lib/IO/BinKeeper.py
1402 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1403 browser, which can quickly access certain region for values in a
1404 large wiggle like data file. (not used now)
1406 * lib/OptValidator.py
1412 Now the default peak detection method is to consider 5k and 10k
1413 nearby regions in treatment data and peak location, 1k, 5k, and
1414 10k regions in control data to calculate local bias. The old
1415 method can be called through '--old' option.
1417 Two columns have beed added to BED output file. 4th column: peak
1418 name; 5th column: peak score using -10log(10,pvalue) as score.
1422 Add support to build a Mac App through 'setup.py py2app', or a
1423 Windows executable through 'setup.py py2exe'. You need to install
1424 py2app or py2exe package in order to use these functions.
1426 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1427 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1431 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1432 in control data to calculate local lambda for each peak. Peak
1433 calling results will be slightly different with previous version,
1438 Typo fixed, ELANDParser -> ELANDResultParser
1442 Now, modeled d value will be shown on the model figure.
1444 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1445 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1447 * macs, IO/__init__.py, PeakDetect.py
1449 Add support for ELAND multi format. Add support for Pair-End
1450 experiment, in this case, 5'end and 3'end ELAND multi format files
1451 are required for treatment or control data. See 00README file for
1454 Add wigextend option.
1456 Add petdist option for Pair-End Tag experiment, which is the best
1457 distance between 5' and 3' tags.
1461 Fixed a bug which cause the end positions of every peak region
1462 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
1466 Fix bugs while generating wiggle files. The start position of
1467 wiggle file is set to 1 instead of 0.
1469 Fix a bug that every 10M bps, signals in the first 'd' range are
1470 lower than actual. ( Thanks Mali Salmon!)
1473 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1474 Version 1.3.3 (wiggle bugs fixed)
1478 Fix bugs while generating wiggle files. 1. 'span=' is added to
1479 'variableStep' line; 2. previously, every 10M bps, the coordinates
1480 were wrongly shifted to the right for 'd' basepairs.
1482 * macs, PeakDetect.py
1484 Add an option to save wiggle files on different resolution.
1486 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1487 Version 1.3.2 (tiny bugs fixed)
1491 Fix 65536 -> 65535. ( Thank Joon)
1495 Improved for binomial function with extra large number. Imported
1496 from Cistrome project.
1500 If treatment channel misses reads in some chromosome included in
1501 control channel, or vice versa, MACS will not exit. (Thank Shaun
1504 Instead, MACS will fake a tag at position -1 when calling
1505 treatment peaks vs control, but will ignore the chromosome while
1506 calling negative peaks.
1508 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1509 Version 1.3.1 (tiny bugs fixed version)
1513 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1514 binomial functions can tolerate large and small numbers.
1518 Parsers now split lines in BED/ELAND file using any
1519 whitespaces. 'track' or 'browser' lines will be regarded as
1520 comment lines. A bug fixed when throwing StrandFormatError. The
1521 maximum redundant tag number at a single position can be no less
1525 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1526 Version 1.3 (naming clarification version)
1528 * Naming clarification changes according to our manuscript:
1530 'frag_len' is changed to 'd'.
1532 'fold_change' is changed to 'fold_enrichment'.
1534 Suggest '--bw' parameter to be determined by users from the real
1537 Maximum FDR is 100% in the output file.
1539 And other clarifications in 00README file and the documents on the
1543 If the redundant tag number at a single position is over 32767,
1544 just remember 32767, instead of raising an overflow exception.
1550 Bug fixed for diagnosis report.
1553 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1558 Poisson distribution CDF and inverse CDF functions are
1559 corrected. They can produce right results even for huge lambda
1560 now. So that the p-value and FDR values in the final excel sheet
1563 IO package now can tolerate some rare cases; ELANDParser in IO
1564 package is fixed. (Thank Bogdan)
1568 Reverse paired peaks in model are rejected. So there will be no
1569 negative 'frag_len'. (Thank Bogdan)
1573 Diagnosis function is completed. Which can output a table file for
1574 users to estimate their sequencing depth.
1577 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1580 * Probe.py is added!
1582 GSL is totally removed from MACS. Instead, I have implemented the
1583 CDF and inverse CDF for poisson and binomial distribution purely
1586 * Constants.py is added!
1588 Organize constants used in MACS in the Constants.py file.
1590 * All other files are modified!
1592 Foldchange calculation is modified. Now the foldchange only be
1593 calculated at the peak summit position instead of the whole peak
1594 region. The values will be higher and more robust than before.
1598 1. MACS can save wiggle format files containing the tag number at
1599 every 10 bp along the genome. Tags are shifted according to our
1600 model before they are calculated.
1602 2. Model building and local lambda calculation can be skipped with
1605 3. A diagnosis report can be generated through '--diag'
1606 option. This report can help you get an assumption about the
1607 sequencing saturation. This funtion is only in beta stage.
1609 4. FDR calculation speed is highly improved.
1611 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1614 * TabIO, PeakModel.py ...
1615 Bug fixed to let MACS tolerate some cases while there is no tag on
1616 either plus strand or minus strand.
1619 Check the version of python. If the version is lower than 2.4,
1620 refuse to install with warning.
1623 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1624 MACS version 2.0.10 20130731 (tag:alpha)
1626 * callpeak --call-summits
1628 Fix bugs causing callpeak --call-summits option generating extra
1629 number of peaks and inconsistent peak boundaries comparing to
1630 default option. Thank Ben Levinson!
1634 Fix bugs causing bdgcmp output logLR all in positive values. Now
1635 'depletion' can be correctly represented as negative values.
1639 Fix the behavior of bdgdiff module. Now it can take four
1640 bedGraph files, then use logLR as cutoff to call differential
1641 regions. Check command line of bdgdiff for detail.
1643 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1644 MACS version 2.0.10 20130713 (tag:alpha)
1646 * fix bugs while output broadPeak and gappedPeak.
1648 Note. Those weak broad regions without any strong enrichment
1649 regions inside won't be saved in gappedPeak file.
1651 * bdgcmp -T and -C are merged into -S and description is updated.
1653 Now, you can use it to override SPMR values in your input for
1654 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1655 statistics will cause weird results ( in most cases, lower
1656 significancy), and won't be consistent with MACS2 callpeak
1657 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1658 sample size in MILLION according to 'callpeak --to-large' option.
1660 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1661 MACS version 2.0.10 20130710 (tag:alpha)
1663 * fix BED style output format of callpeak module:
1665 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1666 the output. Old BED format file won't be saved.
1668 2) with --broad: broadPeak (BED6+3) for broad region and
1669 gappedPeak (BED12+3) for chained enriched regions will be the
1670 output. Old BED format, narrowPeak format, summit file won't be
1673 * bdgcmp now can accept list of methods to calculate scores. So
1674 you can run it once to generate multiple types of scores. Thank
1675 Jon Urban for this suggestion!
1677 * C codes are re-generated through Cython 0.19.1.
1679 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1680 MACS version 2.0.10 20130520 (tag:alpha)
1682 * broad peak calling modules are modified in order to report all
1683 relexed regions even there is no strong enrichment inside.
1685 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1686 MACS version 2.0.10 20130501 (tag:alpha)
1688 * Memory usage is decreased to about 1/4-1/5 of previous usage
1689 Now, the internal data structure and algorithm are both
1690 re-organized, so that intermediate data wouldn't be saved in
1691 memory. Intead they will be calculated on the fly. New MACS2 will
1692 spend longer time (1.5 to 2 times) however it will use less memory
1693 so can be more usable on small mem servers.
1695 * --seed option is added to callpeak and randsample commands
1696 Thank Mathieu Gineste for this suggestion!
1698 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1699 MACS version 2.0.10 20130306 (tag:alpha)
1701 * diffpeak module New module to detect differential binding sites
1702 with more statistics.
1704 * Introduced --refine-peaks
1705 Calculates reads balancing to refine peak summits
1707 * Ouput file names prefix
1708 Correct encodePeak to narrowPeak, broadPeak to bed12.
1710 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1711 MACS version 2.0.10 (tag:alpha not released)
1713 * Introduced BAMPEParser
1714 Reads PE data directly, requires bedtools for now
1716 * Introduced --call-summits
1717 Uses signal processing methods to call overlapping peaks
1719 * Added --no-trackline
1720 By default, files have descriptive tracklines now
1722 * new refinepeak command (experimental)
1723 This new function will use a similar method in SPP (wtd), to
1724 analyze raw tag distribution in peak region, then redefine the
1725 peak summit where plus and minus tags are evenly distributed
1728 * Changes to output *
1729 cPeakDetect.pyx has full support for new print/write methods and
1730 --call-peaks, BAMPEParser, and use of paired-end data
1732 * Parser optimization
1734 cParser.pyx is rewritten to use io.BufferedReader to speed
1735 up. Speed is doubled.
1737 Code is reorganized -- most of functions are inherited from
1738 GenericParser class.
1740 * Use cross-correlation to calculate fragment size
1742 First, all pairs will be used in prediction for fragment
1743 size. Previously, only no more than 1000 pairs are used. Second,
1744 cross-correlation is used to find the best phase difference
1745 between + and - tag pileups.
1747 * Speed up p-value and q-value calculation
1749 This part is ten times faster now. I am using a dictionary to
1750 cache p-value results from Poisson CDF function. A bit more memory
1751 will be used to increase speed. I hope this dictionary would not
1752 explode since the possible pairs of ChIP signal and control lambda
1753 are hugely redundant. Also, I rewrited part of q-value
1756 * Speed up peak detection
1758 This part is about hundred of times faster now. Optimizations
1759 include using Numpy functions as much as possible, and making loop
1760 body as small as possible.
1762 * Post-processing on differential calls
1764 After macs2diff finds differential binding sites between two
1765 conditions, it will try to annotate the peak calls from one of two
1766 conditions, describe the changes ...
1768 * Fragment size prediction in macs2diff
1770 Now by default, macs2diff will try to use the average fragment
1771 size from both condition 1 and condition 2 for tag extension and
1772 peak calling. Previously, by default, it will use different sizes
1773 unless --nomodel is specified.
1775 Technically, I separate model building processes out. So macs2diff
1776 will build fragment sizes for condition 1 and 2 in parallel (2
1777 processes maximum), then perform 4-way comparisons in parallel (4
1782 Combine two p/qscore tracks together. At regions where condition 1
1783 is higher than condition 2, score would be positive, otherwise,
1786 * SAMParser and BAMParser
1788 Bug fixed for paired-end sequencing data.
1792 Fixed a bug while calling peaks from BedGraph file. It previously
1793 mistakenly output same peaks multiple times at the end of
1796 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1797 MACS version 2.0.9 (tag:alpha)
1799 * Auto fixation on predicted d is turned off by default!
1801 Previous --off-auto is now default. MACS will not automatically
1802 fix d less than 2 times of tag size according to
1803 --shiftsize. While tag size is getting longer nowadays, it would
1804 be easier to have d less than 2 times of tag size, however d may
1805 still be meaningful and useful. Please judge it using your own
1810 Now, the default scaling while treatment and input are unbalanced
1811 has been adjusted. By default, larger sample will be scaled down
1812 linearly to match the smaller sample. In this way, background
1813 noise will be reduced more than real signals, so we expect to have
1814 more specific results than the other way around (i.e. --to-large
1817 Also, an alternative option to randomly sample larger data
1818 (--down-sample) is provided to replace default linear
1819 scaling. However, this option will cause results irresproducible,
1824 A new script 'randsample' is added, which can randomly sample
1825 certain percentage or number of tags.
1829 Now, MACS will decide peak summits according to pileup height
1830 instead of qvalue scores. In this way, the summit may be more
1835 MACS calculate qvalue scores as differential scores. When compare
1836 two conditions (saying A and B), the maximum qscore for comparing
1837 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
1838 will be computed. If maxqscore_a2b is bigger, the diff score is
1839 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
1841 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1842 MACS version 2.0.8 (tag:alpha)
1844 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1846 New script bdgbroadcall and the extra option '--broad' for macs2
1847 script, can be used to call broad regions with a loose cutoff to
1848 link nearby significant regions. The output is represented as
1851 * MACS2/IO/cScoreTrack.pyx
1853 Fix q-value calculation to generate forcefully monotonic values.
1855 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1857 They are combined to one more powerful script called
1858 "filterdup". The script filterdup can filter duplicated reads
1859 according to sequencing depth and genome size. The script can also
1860 convert any format supported by MACS to BED format.
1862 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1863 MACS version 2.0.7 (tag:alpha)
1865 * bin/macsdiff renamed to bin/bdgdiff
1867 Now this script will work as a low-level finetuning tool as bdgcmp
1872 A new script to take treatment and control files from two
1873 condition, calculate fragment size, use local poisson to get
1874 pvalues and BH process to get qvalues, then combine 4-ways result
1875 to call differential sites.
1877 This script can use upto 4 cpus to speed up 4-ways calculation. (
1878 I am trying multiprocessing in python. )
1880 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1881 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1882 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1884 All above files are modified for the new macs2diff script.
1886 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1888 Now q-value 0.01 is the default cutoff. If -p is specified,
1889 p-value cutoff will be used instead.
1891 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1892 MACS version 2.0.6 (tag:alpha)
1896 A script to call differential regions. A naive way is introduced
1897 to find the regions where:
1899 1. signal from condition 1 is larger than input 1 and condition 2 --
1900 unique region in condition 1;
1901 2. signal from condition 2 is larger than input 2 and condition 1
1902 -- unique region in condition 2;
1903 3. signal from condition 1 is larger than input 1, signal from
1904 condition 2 is larger than input 2, however either signal from
1905 condition 1 or 2 is not larger than the other.
1907 Here 'larger' means the pvalue or qvalue from a Poisson test is
1908 under certain cutoff.
1910 (I will make another script to wrap up mulitple scripts for
1911 differential calling)
1913 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1914 MACS version 2.0.5 (tag:alpha)
1916 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1917 MACS2/IO/cPeakIO.pyx
1919 Use hash to store peak information. Add back the feature to deal
1920 with data without control.
1922 Fix bug which incorrectly allows small peaks at the end of
1925 * bin/bdgpeakcall, bin/bdgcmp
1927 Fix bugs. bdgpeakcall can output encodePeak format.
1929 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1930 MACS version 2.0.4 (tag:alpha)
1934 Fix a bug, correctly assign lambda_bg while --to-small is
1935 set. Thanks Junya Seo!
1937 Add rank and num of bp columns to pvalue-qvalue table.
1941 Fix bugs to correctly deal with peakless chromosomes. Thanks
1944 Use AFDR for independent tests instead.
1948 Now MACS can output peak coordinates together with pvalue, qvalue,
1949 summit positions in a single encodePeak format (designed for
1950 ENCODE project) file. This file can be loaded to UCSC
1951 browser. Definition of some specific columns are: 5th:
1952 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1953 -log10qvalue, 10th: relative summit position to peak start.
1956 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1957 MACS version 2.0.3 (tag:alpha)
1959 * Rich output with qvalue, fold enrichment, and pileup height
1961 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1964 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1966 Now we have a similiar xls output file as before. The differences
1967 from previous file are:
1969 1. Summit now is absolute summit, instead of relative summit
1971 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1972 pileup at the peak summit;
1973 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1974 5.00 means 1e-5, simple and less confusing.
1975 4. FDR column becomes '-log10(qvalue)' column.
1976 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1977 the values at the peak summit.
1979 * Extra output files
1981 NAME_pqtable.txt contains pvalue and qvalue relationships.
1983 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1984 and -log10qvalue scores in BedGraph format. Nearby regions with
1985 the same value are not merged.
1987 * Separation of FeatIO.py
1989 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1990 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1991 implemented to store pileup, local lambda, pvalue, and qvalue
1992 alltogether in cScoreTrack.pyx.
1994 * Experimental option --half-ext
1996 Suggested by NPS algorithm, I added an experimental option
1997 --half-ext to let MACS only extends ChIP fragment around its
1998 middle point for only 1/2 d.
2000 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
2001 MACS version 2.0.2 (tag:alpha)
2005 Add an error check to see if there is no common chromosome names
2006 from treatment file and control file
2008 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
2010 Reduce memory usage by removing deepcopy() calls.
2012 * Modify README documents and others.
2014 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
2015 MACS Version 2.0.1 (tag:alpha)
2017 * cPileup.pyx, cPeakDetect.pyx and peak calling process
2019 Jie suggested me a brilliant simple method to pileup fragments
2020 into bedGraph track. It works extremely faster than the previous
2021 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
2022 large local lambda calculation in MACSv2 now. Now I generate three
2023 bedGraphs for d-size local bias, slocal-size and llocal-size local
2024 bias, and calculate the maximum local bias as local lambda
2027 Minor: add_loc in bedGraphTrackI now can correctly merge the
2028 region with its preceding region if their value are the same.
2032 Add an option to shift control tags before extension. By default,
2033 control tags will be extended to both sides regardless of strand
2036 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
2037 MACS Version 2.0.0 (tag:alpha)
2039 * Use bedGraph type to store data internally and externally.
2041 We can have theoretically one-basepair resolution profiles. 10
2042 times smaller in filesize and even smaller after converting to
2043 bigWig for visualization.
2045 * Peak calling process modified. Better peak boundary detection.
2047 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2048 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2049 one will be averaged to d size) Then calculate the maximum value
2050 of these two tracks and a global background, to have a
2051 local-lambda bedGraph.
2053 Use -10log10poisson_pvalue as scores to generate a score track
2054 before peak calling.
2056 A general peak calling based on a score cutoff, min length of peak
2057 and max gap between nearby peaks.
2061 Wiggle file output is removed. Now we only support bedGraph
2062 output. The generation of bedGraph is highly recommended since it
2063 will not cost extra time. In other words, bedGraph generation is
2064 internally run even you don't want to save bedGraphs on disk, due
2065 to the peak calling algorithm in MACS v2.
2069 We now can calculate poisson pvalue in log space so that the score
2070 (-10*log10pvalue) will not have a upper limit of 3100 due to
2071 precision of float number.
2073 * Cython is adopted to speed up Python code.
2075 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
2078 * Replaced with a newest WigTrackI class and fixed the wignorm script.
2080 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
2081 Version 1.4.0rc2 (Valentine)
2083 * --single-wig option is renamed to --single-profile
2085 * BedGraph output with --bdg or -B option.
2087 The BedGraph output provides 1bp resolution fragment pileup
2088 profile. File size is smaller than wig file. This option can be
2089 combined with --single-profile option to produce a bedgraph file
2090 for the whole genome. This option can also make --space,
2091 --call-subpeaks invalid.
2093 * Fix the description of --shiftsize to correctly state that the
2094 value is 1/2 d (fragment size).
2096 * Fix a bug in the call to __filter_w_control_tags when control is
2099 * Fix a bug on --to-small option. Now it works as expected.
2101 * Fix a bug while counting the tags in candidate peak region, an
2102 extra tag may be included. (Thanks to Jake Biesinger!)
2104 * Fix the bug for the peaks extended outside of chromosome
2105 start. If the minus strand tag goes outside of chromosome start
2106 after extension of d, it will be thrown out.
2108 * Post-process script for a combined wig file:
2110 The "wignorm" command can be called after a full run of MACS14 as
2111 a postprocess. wignorm can calculate the local background from the
2112 control wig file from MACS14, then use either foldchange,
2113 -10*log10(pvalue) from possion test, or difference after asinh
2114 transformation as the score to build a single wig track to
2115 represent the binding strength. This script will take a
2116 significant long time to process.
2118 * --wigextend has been obsoleted.
2120 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
2121 Version 1.4.0rc1 (Starry Sky)
2123 * Duplicate reads option
2125 --keep-dup behavior is changed. Now user can specify how many
2126 reads he/she wants to keep at the same genomic location. 'auto' to
2127 let MACS decide the number based on binomial distribution, 'all'
2128 to let MACS keep all reads.
2130 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2132 By default, MACS will now scale the smaller dataset to the bigger
2133 dataset. For instance, if IP has 10 million reads, and Input has 5
2134 million, MACS will double the lambda value calculated from Input
2135 reads while calling BOTH the positive peaks and negative
2136 peaks. This will address the issue caused by unbalanced numbers of
2137 reads from IP and Input. If --to-small is turned on, MACS will
2138 scale the larger dataset to the smaller one. So from now on, if d
2139 is fixed, then the peaks from a MACS call for A vs B should be
2140 identical to the negative peaks from a B vs A.
2142 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2143 Version 1.4.0beta (summer wishes)
2149 The default behavior in the model building step is slightly
2150 changed. When MACS can't find enough pairs to build model
2151 (implemented in alpha version) or the modeled fragment length is
2152 less than 2 times of tag length (implemented in beta version),
2153 MACS will use 2 times of --shiftsize value as fragment length in
2154 the later analysis. --off-auto can turn off this default behavior.
2156 ** Redundant tag filtering
2158 The IO module is rewritten. The redundant tag filtering process
2159 becomes simpler and works as promise. The maximum allowed number
2160 of tags at the exact same location is calculated from the
2161 sequencing depth and genome size using a binomial distribution,
2162 for both TREAMENT and CONTROL separately. ( previously only
2163 TREATMENT is considered ) The exact same location means the same
2164 coordination and the same strand. Then MACS will only keep at most
2165 this number of tags at the exact same location in the following
2166 analysis. An option --keep-dup can let MACS skip the filtering and
2167 keep all the tags. However this may bring in a lot of sequencing
2168 bias, so you may get many false positive peaks.
2170 ** Single wiggle mode
2172 First thing to mention, this is not the score track that I
2173 described before. By default, MACS generates wiggle files for
2174 fragment pileup for every chromosomes separately. When you use
2175 --single-wig option, MACS will generate a single wiggle file for
2176 all the chromosomes so you will get a wig.gz for TREATMENT and
2177 another wig.gz for CONTROL if available.
2179 ** Sniff -- automatic format detection
2181 Now, by default or "-f AUTO", MACS will decide the input file
2182 format automatically. Technically, it will try to read at most
2183 1000 records for the first 10 non-comment lines. If it succeeds,
2184 the format is decided. I recommend not to use AUTO and specify the
2185 right format for your input files, unless you combine different
2186 formats in a single MACS run.
2190 --single-wig and --keep-dup are added. Check previous section in
2191 ChangeLog for detail.
2193 -f (--format) AUTO is now the default option.
2195 --slocal default: 1000
2196 --llocal default: 10000
2200 Setup script will stop the installation if python version is not
2201 python2.6 or python2.7.
2203 Local lambda calculation has been changed back. MACS will check
2204 peak_region, slocal( default 1K) and llocal (default 10K) for the
2205 local bias. The previous 200bps default will cause MACS misses
2206 some peaks where the input bias is very sharp.
2208 sam2bed.py script is corrected.
2210 Relative pos in xls output is fixed.
2212 Parser for ELAND_export is fixed to pass some of the no match
2213 lines. And elandexport2bed.py is fixed too. ( however I can't
2214 guarantee that it works on any eland_export files. )
2216 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2217 Version 1.4.0alpha2 (be smarter)
2221 --gsize now provides shortcuts for common genomes, including
2222 human, mouse, C. elegans and fruitfly.
2224 --llocal now will be 5000 bps if there is no input file, so that
2225 local lambda doesn't overkill enriched binding sites.
2227 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2228 Version 1.4alpha (be smarter)
2232 --tsize option is redesigned. MACS will use the first 10 lines of
2233 the input to decide the tag size. If user specifies --tsize, it
2234 will override the auto decided tsize.
2236 --lambdaset is replaced by --slocal and --llocal which mean the
2237 small local region and large local region.
2239 --bw has no effect on the scan-window size now. It only affects the
2240 paired-peaks model process.
2244 During the model building, MACS will pick out the enriched regions
2245 which are not too high and not too low to build the paired-peak
2246 model. Default the region is from fold 10 to fold 30. If MACS
2247 fails to build the model, by default it will use the nomodel
2248 settings, like shiftsize=100bps, to shift and extend each
2249 tags. This behavior can be turned off by '--off-auto'.
2253 An extra file including all the summit positions are saved in
2254 *_summits.bed file. An option '--call-subpeaks' will invoke
2255 PeakSplitter developed by Mali Salmon to split wide peaks into
2258 * Sniff ( will in beta )
2260 Automatically recognize the input file format, so use can combine
2261 different format in one MACS run.
2263 Not implemented features/TODO:
2265 * Algorithms ( in near future? )
2267 MACS will try to refine the peak boundaries by calculating the
2268 scores for every point in the candidate peak regions. The score
2269 will be the -10*log(10,pvalue) on a local poisson distribution. A
2270 cutoff specified by users (--pvalue) will be applied to find the
2271 precise sub-peaks in the original candidate peak region. Peak
2272 boudaries and peak summits positions will be saved in separate BED
2275 * Single wiggle track ( in near future? )
2277 A single wiggle track will be generated to save the scores within
2278 candidate peak regions in the 10bps resolution. The wiggle file
2279 is in fixedStep format.
2282 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2283 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2287 Fixed typo. FCSTEP -> FESTEP
2291 The 'femax' attribute bug is fixed
2293 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2294 Version 1.3.7 (Oktoberfest)
2296 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2298 Enhancements by Peter Chines:
2300 1. gzip files are supported.
2301 2. when --diag is on, user can set the increment and endpoint for
2302 fold enrichment analysis by setting --fe-step and --fe-max.
2304 Enhancements by Davide Cittaro:
2306 1. BAM and SAM formats are supported.
2307 2. small changes in the header lines of wiggle output.
2310 1. I added --fe-min option;
2311 2. Bowtie ascii output with suffix ".map" is supported.
2315 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2316 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2317 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2318 4. Some "fold change" have been changed to "fold enrichment".
2320 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2321 Version 1.3.6.1 (default parameter change)
2323 * bin/macs, lib/PeakDetect.py
2325 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2326 default. "--futurefdr" is added which can turn on the 'new' method
2327 introduced in 1.3.6. By default it's off.
2331 Fixed a bug. p-value is corrected a little bit.
2334 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2335 Version 1.3.6 (Birthday cake)
2339 "track name" is added to the header of BED output file.
2341 Now the default peak detection method is to consider 5k and 10k
2342 nearby regions in treatment data and peak location, 1k, 5k, and
2343 10k regions in control data to calculate local bias. The old
2344 method can be called through '--old' option.
2346 Information about how many total/unique tags in treatment or
2347 control will be saved in final .xls output.
2349 * lib/IO/__init__.py
2351 ".fa" will be removed from input tag alignment so only the
2352 chromosome names are kept.
2354 WigTrackI class is added for Wiggle like data structure. (not used
2357 The parser for ELAND multi PET files has been fixed. Now the 5'
2358 tag position for a pair will be kept, whereas in the previous
2359 version, the middle points are kept.
2361 * lib/IO/BinKeeper.py
2363 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2364 browser, which can quickly access certain region for values in a
2365 large wiggle like data file. (not used now)
2367 * lib/OptValidator.py
2373 Now the default peak detection method is to consider 5k and 10k
2374 nearby regions in treatment data and peak location, 1k, 5k, and
2375 10k regions in control data to calculate local bias. The old
2376 method can be called through '--old' option.
2378 Two columns have beed added to BED output file. 4th column: peak
2379 name; 5th column: peak score using -10log(10,pvalue) as score.
2383 Add support to build a Mac App through 'setup.py py2app', or a
2384 Windows executable through 'setup.py py2exe'. You need to install
2385 py2app or py2exe package in order to use these functions.
2387 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2388 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2392 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2393 in control data to calculate local lambda for each peak. Peak
2394 calling results will be slightly different with previous version,
2399 Typo fixed, ELANDParser -> ELANDResultParser
2403 Now, modeled d value will be shown on the model figure.
2405 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2406 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2408 * macs, IO/__init__.py, PeakDetect.py
2410 Add support for ELAND multi format. Add support for Pair-End
2411 experiment, in this case, 5'end and 3'end ELAND multi format files
2412 are required for treatment or control data. See 00README file for
2415 Add wigextend option.
2417 Add petdist option for Pair-End Tag experiment, which is the best
2418 distance between 5' and 3' tags.
2422 Fixed a bug which cause the end positions of every peak region
2423 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
2427 Fix bugs while generating wiggle files. The start position of
2428 wiggle file is set to 1 instead of 0.
2430 Fix a bug that every 10M bps, signals in the first 'd' range are
2431 lower than actual. ( Thanks Mali Salmon!)
2434 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2435 Version 1.3.3 (wiggle bugs fixed)
2439 Fix bugs while generating wiggle files. 1. 'span=' is added to
2440 'variableStep' line; 2. previously, every 10M bps, the coordinates
2441 were wrongly shifted to the right for 'd' basepairs.
2443 * macs, PeakDetect.py
2445 Add an option to save wiggle files on different resolution.
2447 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2448 Version 1.3.2 (tiny bugs fixed)
2452 Fix 65536 -> 65535. ( Thank Joon)
2456 Improved for binomial function with extra large number. Imported
2457 from Cistrome project.
2461 If treatment channel misses reads in some chromosome included in
2462 control channel, or vice versa, MACS will not exit. (Thank Shaun
2465 Instead, MACS will fake a tag at position -1 when calling
2466 treatment peaks vs control, but will ignore the chromosome while
2467 calling negative peaks.
2469 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2470 Version 1.3.1 (tiny bugs fixed version)
2474 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2475 binomial functions can tolerate large and small numbers.
2479 Parsers now split lines in BED/ELAND file using any
2480 whitespaces. 'track' or 'browser' lines will be regarded as
2481 comment lines. A bug fixed when throwing StrandFormatError. The
2482 maximum redundant tag number at a single position can be no less
2486 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2487 Version 1.3 (naming clarification version)
2489 * Naming clarification changes according to our manuscript:
2491 'frag_len' is changed to 'd'.
2493 'fold_change' is changed to 'fold_enrichment'.
2495 Suggest '--bw' parameter to be determined by users from the real
2498 Maximum FDR is 100% in the output file.
2500 And other clarifications in 00README file and the documents on the
2504 If the redundant tag number at a single position is over 32767,
2505 just remember 32767, instead of raising an overflow exception.
2511 Bug fixed for diagnosis report.
2514 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2519 Poisson distribution CDF and inverse CDF functions are
2520 corrected. They can produce right results even for huge lambda
2521 now. So that the p-value and FDR values in the final excel sheet
2524 IO package now can tolerate some rare cases; ELANDParser in IO
2525 package is fixed. (Thank Bogdan)
2529 Reverse paired peaks in model are rejected. So there will be no
2530 negative 'frag_len'. (Thank Bogdan)
2534 Diagnosis function is completed. Which can output a table file for
2535 users to estimate their sequencing depth.
2538 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2541 * Probe.py is added!
2543 GSL is totally removed from MACS. Instead, I have implemented the
2544 CDF and inverse CDF for poisson and binomial distribution purely
2547 * Constants.py is added!
2549 Organize constants used in MACS in the Constants.py file.
2551 * All other files are modified!
2553 Foldchange calculation is modified. Now the foldchange only be
2554 calculated at the peak summit position instead of the whole peak
2555 region. The values will be higher and more robust than before.
2559 1. MACS can save wiggle format files containing the tag number at
2560 every 10 bp along the genome. Tags are shifted according to our
2561 model before they are calculated.
2563 2. Model building and local lambda calculation can be skipped with
2566 3. A diagnosis report can be generated through '--diag'
2567 option. This report can help you get an assumption about the
2568 sequencing saturation. This funtion is only in beta stage.
2570 4. FDR calculation speed is highly improved.
2572 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2575 * TabIO, PeakModel.py ...
2576 Bug fixed to let MACS tolerate some cases while there is no tag on
2577 either plus strand or minus strand.
2580 Check the version of python. If the version is lower than 2.4,
2581 refuse to install with warning.