1 2024-02-19 Tao Liu <vladimir.liu@gmail.com>
1) Fixed a bug where `hmmratac` could not correctly save the
digested signal files. #605 #611
2) Applied a patch to remove the Cython requirement from the installed
system. (Cython is still needed for building the package.) #606 #612
3) Relaxed the testing script that compares the peaks called by the
current code with the standard peaks. To implement this, we added
an 'intersection' function to the 'Regions' class to find the
intersecting regions of two Regions objects (similar to PeakIO but
only recording chromosome, start and end positions). We also
updated the unit test 'test_Region.py' and implemented a script
'jaccard.py' to compute the Jaccard Index of two peak files. If
the JI > 0.99, we consider the called peaks and the standard
peaks to be similar. This avoids the problem caused by
different Numpy/SciPy/scikit-learn libraries, where certain peak
coordinates may differ by 10 bp. #615 #619
4) Due to the changes in scikit-learn 1.3.0
(https://scikit-learn.org/1.3/whats_new/v1.3.html), the way hmmlearn
0.3 uses KMeans ends up with inconsistent results between
sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
hmm.GaussianHMM and adjusted the standard output from the `hmmratac`
subcommand. The change is based on
https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
the random seeding of KMeans 10 times. Now the `hmmratac` results
should be more consistent (at least JI>0.99). #615 #620
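The Jaccard Index check in item 3 above can be sketched as follows. This is a minimal illustration, not the project's `jaccard.py`: the function names are hypothetical, and measuring overlap in base pairs over half-open intervals is an assumption.

```python
from collections import defaultdict


def merge(intervals):
    """Sort and merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Total number of base pairs covered by merged intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a, b):
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append([start, end])
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard Index of two peak sets given as (chrom, start, end)."""
    by_chrom_a, by_chrom_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_chrom_b[chrom].append((start, end))
    inter = total_a = total_b = 0
    for chrom in set(by_chrom_a) | set(by_chrom_b):
        a = merge(by_chrom_a.get(chrom, []))
        b = merge(by_chrom_b.get(chrom, []))
        inter += covered(intersect(a, b))
        total_a += covered(a)
        total_b += covered(b)
    union = total_a + total_b - inter
    return inter / union if union else 1.0
```

With this definition, a 10 bp shift in one boundary of a 100 bp peak lowers the index for that peak to 0.9, but across a full peak file such shifts are diluted, which is why a threshold such as JI > 0.99 tolerates them.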
1) We added some dependencies to MACS3. The `hmmratac` subcommand needs
the `hmmlearn` library, `hmmlearn` needs `scikit-learn`, and
`scikit-learn` needs `scipy`. Since major releases have happened
for both `scipy` and `scikit-learn`, we have to set specific
version requirements for them in order to make sure the output
results from `hmmratac` are consistent.
43 2) We updated our documentation website using
44 Sphinx. https://macs3-project.github.io/MACS/
46 2023-11-15 Tao Liu <vladimir.liu@gmail.com>
49 1) Call variants in peak regions directly from BAM files. The
50 function was originally developed under code name SAPPER. Now
51 SAPPER has been merged into MACS as the `callvar` command. It can
52 be used to call SNVs and small INDELs directly from alignment
53 files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
54 the DNA sequence at the enriched genomic regions (binding sites or
55 accessible DNA) and to refine the alignment when necessary. We
56 added `simde` as a submodule in order to support fermi-lite
57 library under non-x64 architectures.
2) The HMMRATAC module is added as the subcommand `hmmratac`. HMMRATAC is
dedicated software for analyzing ATAC-seq data. The basic idea
behind HMMRATAC is to digest ATAC-seq data according to the
fragment length of read pairs into four signal tracks: short
fragments, mono-nucleosomal fragments, di-nucleosomal fragments
and tri-nucleosomal fragments. It then integrates the four tracks
using a Hidden Markov Model with three hidden states:
open region, nucleosomal region, and background region. The
original HMMRATAC, written in JAVA by Evan Tarbell, was published
in 2019. We implemented it in Python/Cython and optimized the whole
process using existing MACS functions and hmmlearn. Now it can run
much faster than the original JAVA version. Note: evaluation of
the peak calling results is still underway.
3) Speed/memory optimization. Use cykhash to replace Python
dictionaries. Use a 10MB buffer to read and parse input files (not
available for the BAM file parser). Many other optimization tweaks. We
added memory monitoring to the runtime messages.
78 4) R wrappers for MACS -- MACSr for bioconductor.
80 5) Code cleanup. Reorganize source codes.
84 7) Switch to Github Action for CI, support multi-arch testing
85 including x64, armv7, aarch64, s390x and ppc64le. We also test on
8) The MACS tag-shifting model has been refined. Now it will use a
naive peak calling approach to find ALL possible paired peaks on the +
and - strands, then use all of them to calculate the
cross-correlation. (A related bug has been fixed:
[#442](https://github.com/macs3-project/MACS/issues/442))
9) BAI index and random access to BAM files are now
supported. [#449](https://github.com/macs3-project/MACS/issues/449).
97 10) Support of Python > 3.10
98 [#498](https://github.com/macs3-project/MACS/issues/498)
100 11) The effective genome size parameters have been updated
102 deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
104 12) Multiple updates regarding dependencies, anaconda built, CI/CD
107 13) Cython 3 is supported.
109 14) Documentations for each subcommand can be found under /docs
113 1) Missing header line while no peaks can be called
114 [#501](https://github.com/macs3-project/MACS/issues/501)
115 [#502](https://github.com/macs3-project/MACS/issues/502)
2) Note: different versions of numpy, scipy, and sklearn may give
slightly different `hmmratac` results. The current standard
results for automated testing in the `/test` directory are from Numpy
1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
122 2020-04-11 Tao Liu <vladimir.liu@gmail.com>
127 Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
130 2020-04-10 Tao Liu <vladimir.liu@gmail.com>
135 1) MACS2 has been tested on multiple architectures to make sure it
136 can successfully generate consistent results. Currently the
137 supported architectures are: AMD64, ARM64, i386, PPC64LE, and
138 S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
139 #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
140 and #370. The lesson is that if the project is built on Cython and
141 is aimed at memory efficiency, we should specifically define all
142 int/float types in pyx files such as int8_t or uint32_t using
143 either libc or numpy (c version) instead of relying on Cython
144 types such as short, long, double.
146 2) MACS2 setup script will check numpy and install numpy if
147 necessary. PR #378, issue #364
3) The `bdgbroadcall` command will correctly add the score column (5th
column). The score (5th) column contains 10 times the average
score in the broad region. PR #373, issue #362
153 4) The missing test on `bdgopt` subcommand has been added. PR #363
155 5) The obsolete option `--ratio` from `callpeak` subcommand has
156 been removed. PR #369, issue #366
158 6) Fixed the incorrect description in README on the 'maximum
159 length of broad region is 4 times of d' to 'maximum gap for
160 merging broad regions is 4 times of tag size by default'. PR #380,
165 1) CODE OF CONDUCT document has been added to MACS2 github
168 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
173 1) Speed up MACS2. Some programming tricks and code cleanup. The
filter_dup function replaces separate_dups. The latter was
implemented for potentially putting back duplicate reads in
certain downstream analysis. However, such analysis hasn't been
implemented. Optimize the speed of writing bedGraph
178 files. Optimize BAM and BAMPE parsing with pointer casting instead
181 2) The comment lines in the headers of BED or SAM files will be
182 correctly skipped. However, MACS2 won't check comment lines in the
187 1) Cutoff-analysis in callpeak command. #341
189 2) Issues related to SAMParser and three ELAND Parsers are
194 1) cmdlinetest script in test/ folder has been updated to: 1. test
195 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
196 and after the error or warning message during tests; 3. output
197 only the first 10 lines if the difference between test result and
standard result can be found; 4. prockreport monitors CPU time and
memory usage at 1 sec intervals, which is a bit more accurate.
201 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
203 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
204 MACS version 2.2.5 (Py3 speed up)
1) *Github code only and Not included in MACS2 release* New
testing data for performance test. A subsampled ENCODE2 CTCF
ChIP-seq dataset, including 5 million ChIP reads and 5 million
control reads, has been included in the test folder for testing
CPU and memory usage (i.e. 5M test). Several related scripts,
including `prockreport` for reporting CPU and memory usage, `pyprofile`
and `pyprofile_stat` for debugging and profiling MACS2 codes, have
217 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
218 The old hashtable.pyx implementation copied from Pandas (very old
219 version) doesn't work well in Python3+Cython. It slows down the
220 pqtable checkup using the identical Cython codes as in
221 v2.1.4. While running 5M test, the `__getitem__` function in the
222 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
223 148.6s with the same number of calls in MACS2 v2.2.4. As a
224 consequence, the standard python dictionary implementation has
225 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
226 faster than py2 version, but uses a bit more memory. In general,
227 v2.2.5 can finish 5M reads test in 20% less time than MACS2
228 v2.1.4, but use 15% more memory.
232 1) More Python3 related fixes, e.g. the return value of keys from
236 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
237 MACS version 2.2.4 (Python3)
241 1) First Python3 version MACS2 released.
243 2) Version number 2.2.X will be used for MACS2 in Python3, in
246 3) More comprehensive test.sh script to check the consistency of
247 results from Python2 version and Python3 version.
249 4) Simplify setup.py script since the newest version transparently
250 supports cython. And when cython is not installed by the user,
251 setup.py can still compile using only C codes.
253 5) Fix Signal.pyx to use np.array instead of np.mat.
255 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
260 Github Actions is used together with Travis CI for testing and
267 1) #318 Random score in bdgdiff output. It turns out the sum_v is
268 not initialized as 0 before adding. Potential bugs are fixed in
269 other functions in ScoreTrack and CallPeakUnit codes.
2) #321 Cython dependency in setup.py script is removed, and the
'cythonize' call is placed in the correct position.
274 3) A typo is fixed in Github Actions script.
276 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
281 1) Support Docker auto-deploy. PR #309
283 2) Support Travis CI auto-testing, update unit-testing
284 scripts, and enable subcommand testing on small datasets.
286 3) Update README documents. #297 PR #306
288 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
289 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
291 5) `--d-min` option is added in `callpeak` and `predictd`, to
292 exclude predictions of fragment size smaller than the given
293 value. Merged from PR #267 @shouldsee.
6) `--buffer-size` option is added in `predictd`, `filterdup`,
`pileup` and `refinepeak` subcommands. Users can use this option
to decrease memory usage when there are a large number of contigs
in the data. Also, `callpeak`, `predictd`, `filterdup`,
`pileup` and `refinepeak` will now suggest that users tweak
`--buffer-size` when a MemoryError is caught. #313 PR #314
304 1) #265 Fixed a bug where the pseudocount hasn't been applied
305 while calculating p-value score in ScoreTrack object.
307 2) Fixed bdgbroadcall so that it will report those broad peaks
308 without strong peak inside, a consistent behavior as `callpeak
311 3) Rename COPYING to LICENSE.
313 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
1) Added missing BEDPE support, and enabled support for the BAMPE
and BEDPE formats in the 'pileup', 'filterdup' and 'randsample'
subcommands. When the format is BAMPE or BEDPE, the 'pileup' command
will pile up the whole fragment defined by the mapping locations of
the left end and right end of each read pair. Thank @purcaro
324 2) Added options to callpeak command for tweaking max-gap and
325 min-len during peak calling. Thank @jsh58!
3) The callpeak option "--to-large" is replaced with
330 4) The randsample option "-t" has been replaced with "-i".
334 1) Fixed memory issue related to #122 and #146
336 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
338 3) Fixed a bug while setting commandline qvalue cutoff.
340 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
342 5) Fixed the calculation of average fragment length for paired-end
345 6) Fixed bugs caused by khash while computing p/q-value and log
346 likelihood ratios. Thank @jsh58
348 7) More spelling tweaks in source code. Thank @mr-c
350 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
351 MACS version 2.1.1 20160309
355 * Fixed spelling. Merged pull request #120. Thank @mr-c!
357 * Change filtering criteria for reading BAM/SAM files
Related to callpeak and filterdup commands. Now the
reads/alignments flagged with 1024 or 'PCR/Optical duplicate' will
still be read although MACS2 may decide them as duplicates
362 later. Related to old issue #33. Sorry I forgot to address it for
365 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
366 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
371 the former option is not supported by older GCC. Related to issues
374 2) Issue #108 is fixed. If no peak can be found in a chromosome,
375 the PeakIO won't throw an error.
381 a) A more flexible format, BEDPE, is supported. Now users can
382 define the left and right position of the ChIPed fragment, and
383 MACS2 will skip model building and directly pileup the
384 fragments. Related to issue #112.
b) The 'tempdir' can be specified, to save cached pileup
tracks. Originally, the temporary files were stored in
/tmp. Thank @daler! Related to issues #97 and #105.
392 New operations are added, to calculate the maximum or minimum value between
393 values in BEDGRAPH and given value.
397 New method is added, to calculate the maximum value between values
398 defined in two BEDGRAPH files.
400 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
401 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
405 1) Fix a bug while dealing with some chromosomes only containing
406 one read (pair). The size of dup_plus/dup_minus arrays after
407 filtering dups should +1.
409 2) Fix a bug related to the broad peak calling function in
410 previous versions. The gaps were miscalculated, so segmented weak
411 broad calls may be reported, and sometimes you would see peaks
412 with lower than cutoff values in the output files.
414 3) "Potentially" Fixed issue #105 on temporary cache files, need
418 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
419 MACS version 2.1.0 20150731 (tag:rc)
423 1) Fixed issue #76: information about broad/narrow cutoff will be
426 2) Fixed issue #79: bdgopt extparam option is fixed.
428 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
429 for filterdup command.
431 4) Fixed issue #78, #88 and similar issue reported in MACS google
432 group: MACS2 now can correctly deal with multiple alignment files
433 for -t or -c. The 'finalize' function will be correctly
434 called. Multiple files option is enabled for filterdup,
435 randsample, predictd, pileup and refinepeak commands.
437 5) A related issue to #88, when BAMPE mode is used, PE pairs will
438 be sorted by leftmost then rightmost ends.
6) Fixed issue #86: A wrong use of 'ndarray' to create a Numpy
array. This caused 'callpeak --nolambda' to hang forever while
calculating pvalues and qvalues.
444 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
445 MACS version 2.1.0 20150420 (tag:rc)
449 1) bdgopt: some convenient functions to modify bedGraph files.
451 2) cmbreps: Combine scores from two replicates. Including three
452 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
453 method to combine two p-value scores. After that, user can use
454 bdgpeakcall to call peaks on combined scores.
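The three combination methods listed for cmbreps can be sketched as below. This is an illustration under stated assumptions, not the MACS2 code: the function names are hypothetical, and the scores are assumed to be -log10(p-value) tracks. For two p-values, Fisher's statistic X = -2(ln p1 + ln p2) follows a chi-square distribution with 4 degrees of freedom, whose survival function has the closed form exp(-X/2) * (1 + X/2), so no statistics library is needed.

```python
import math

LN10 = math.log(10)


def combine_max(score1, score2):
    """Method 1: take the maximum of the two replicate scores."""
    return max(score1, score2)


def combine_mean(score1, score2):
    """Method 2: take the average of the two replicate scores."""
    return (score1 + score2) / 2.0


def combine_fisher(score1, score2):
    """Method 3: Fisher's method on two -log10(p) scores.

    Returns the combined p-value, again as -log10(p).
    """
    # X / 2 = -(ln p1 + ln p2) = ln(10) * (score1 + score2)
    half_x = LN10 * (score1 + score2)
    # ln of the chi-square (4 d.f.) survival function: -X/2 + ln(1 + X/2)
    log_p = -half_x + math.log1p(half_x)
    return -log_p / LN10
```

For example, two replicates that each have p = 0.1 combine to p = 0.01 * (1 + 2 ln 10), roughly 0.056, which is more significant than either replicate alone but less significant than a naive product of the two p-values.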
458 1) callpeak and bdgpeakcall now can try to analyze the
459 relationship between p-values and number/length of peaks then
460 generate a summary to help users decide an appropriate cutoff.
462 2) callpeak now can accept fold-enrichment cutoff as a filter for
467 Now MACS2 runs about 3X as fast as previous version. Trade
468 clean python codes for speed... Now while processing 50M ChIP vs
469 50M control, it will take only 10 minutes.
473 1) Sampling function in BAMPE mode.
475 2) Callpeak while there are >= 2 input files for -t or -c.
477 3) While reading BAM/SAM, those secondary or supplementary
478 alignments will be correctly skipped.
480 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
481 option that MACS2 will discard those SAM/BAM alignments with bit
482 1024 no matter how --keep-dup is set.
5) Fixed issue #49: setuptools is used instead of distutils
486 6) Fixed issue #51: fix the problem when using --trackline
487 argument when control file is absent.
7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of a
read mapped to the minus strand. The previous implementation found an
incorrect 5' end if there was an indel in the alignment.
493 8) Fixed issue #56: An incorrect sorting method used for BAMPE
494 mode which will cause incorrect filtering of duplicated reads. Now
497 9) Issue #63: Merged from jayhesselberth@github, extsize now can
500 10) Issue #71: Merged from aertslab@github, close file descriptor
501 after creating them with mkstemp().
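Items 3 and 7 above both concern how a SAM/BAM record is interpreted. The sketch below shows the general technique rather than the MACS2 parser itself: the function names are hypothetical, the flag bits and the set of reference-consuming CIGAR operations come from the SAM specification, and the use of 0-based half-open coordinates is an assumption.

```python
import re

# SAM FLAG bits (SAM specification).
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases; I, S, H and P do not.
REF_CONSUMING = set("MDN=X")


def keep_alignment(flag):
    """Skip secondary and supplementary alignments (item 3)."""
    return not (flag & (SECONDARY | SUPPLEMENTARY))


def reference_span(cigar):
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def five_prime_end(pos, cigar, is_reverse):
    """Locate the 5' end of a read (item 7).

    pos is the 0-based leftmost reference coordinate. For a minus-strand
    read the 5' end is the right edge of the alignment, which depends on
    the reference span from the CIGAR string, not on the read length.
    """
    if is_reverse:
        return pos + reference_span(cigar)
    return pos
```

A 50 bp minus-strand read with CIGAR `20M5D30M` spans 55 reference bases, so adding the read length to the start position would misplace the 5' end by 5 bp.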
503 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
504 MACS version 2.1.0 20140616 (tag:rc)
508 "--ratio" is added to manually assign the scaling factor of ChIP
509 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
510 implementing the patch file!
512 "--shift" is added to move cutting ends (5' end of reads) around,
in order to process DNase-Seq data, e.g., use "--shift -100
--extsize 200" to get 200 bp fragments around 5' ends. For general
515 ChIP-Seq data analysis, this option should be always set as
516 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
519 ** Do not output negative fragment size from cross-correlation
520 analysis. Thank Alvin Qin for the feedback!
522 ** --half-ext and --control-shift are removed. For complex read
523 shifting and extending, combine '--shift' and '--extsize'
524 options. For comparing two conditions, use 'bdgdiff' module
527 ** a bug is fixed to output the last pileup value in bdg file
532 A 'dry-run' option is added to only output numbers, including the
533 number of allowed duplicates, the total number of reads before and
534 after filtering duplicates and the estimated duplication
535 rate. Thank John Urban for the suggestion!
538 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
539 MACS version 2.0.10 20131216 (tag:alpha)
* We changed the license from Artistic License to 3-clause BSD license.
Yes, the simpler the better.
547 * Process paired-end data with "-f BAMPE" without control
* GappedPeak output for --broad option has been fixed again to be
consistent with the official UCSC format. We add a 1 bp pseudo-block to
the left and/or right of a broad region when necessary, so that you can
visualize the regions without strong enrichment inside
successfully. In downstream analysis other than visualization,
you may need to remove all 1 bp blocks from the gappedPeak file.
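Removing the 1 bp pseudo-blocks for downstream analysis could look like the sketch below. It assumes the standard BED12 layout for the first twelve gappedPeak columns (block sizes in column 11, block starts relative to the region start in column 12); the function name is hypothetical. Note that it cannot tell a pseudo-block from a genuine 1 bp block, so any such block is dropped too.

```python
def strong_blocks(gapped_peak_line):
    """Return the (chrom, start, end) blocks longer than 1 bp.

    Coordinates are absolute and half-open, as in BED.
    """
    fields = gapped_peak_line.rstrip("\n").split("\t")
    chrom, region_start = fields[0], int(fields[1])
    # BED block lists may carry a trailing comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    offsets = [int(x) for x in fields[11].rstrip(",").split(",")]
    return [
        (chrom, region_start + offset, region_start + offset + size)
        for size, offset in zip(sizes, offsets)
        if size > 1
    ]
```

A broad region with no strong enrichment inside consists only of pseudo-blocks and yields an empty list.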
556 * diffpeak subcommand is temporarily disabled. Till we
559 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
560 MACS version 2.0.10 20131028 (tag:alpha)
562 * callpeak --call-summits improvement
564 The smoothing window length has been fixed as fragment length
565 instead of short read length. The larger smoothing window will
566 grant better smoothing results and better sub-peak summits
569 * --outdir and --ofile options for almost all commands
571 Thank Björn Grüning for initially implementing these options!
572 Now, MACS2 will save results into a specified
573 directory by '--outdir' option, and/or save result into a
574 specified file by '--ofile' option. Note, in case '--ofile' is
575 available for a subcommand, '-o' now has been adjusted to be the
576 same as '--ofile' instead of '--o-prefix'.
578 Here is the list of changes. For more detail, use 'macs2 xxx -h'
581 ** callpeak: --outdir
582 ** diffpeak: Not implemented
583 ** bdgpeakcall: --outdir and --ofile
584 ** bdgbroadcall: --outdir and --ofile
585 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
586 and the order of arguments for --ofile must be the same as for -m.
587 ** bdgdiff: --outdir and --ofile
588 ** filterdup: --outdir
590 ** randsample: --outdir
591 ** refinepeak: --outdir and --ofile
594 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
595 MACS version 2.0.10 20130915 (tag:alpha)
597 * callpeak Added a new option --buffer-size
This option tweaks a previously hidden parameter that
controls the step size by which the arrays storing alignment
information grow. In some rare cases where the number of
chromosomes/contigs/scaffolds is huge, the original default
setting wastes a large amount of memory. In these cases, we
recommend decreasing --buffer-size (e.g., 1000) to save memory,
although the decrease will slow down the reading of alignment files.
607 * an optimization to speed up pvalue-qvalue statistics
Previously, it took an hour to prepare the p-q-table for a 65M vs 65M
human TF library, and now it takes 10 minutes. It was due to a
single line of code to get a value from a numpy array ...
615 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
616 MACS version 2.0.10 20130731 (tag:alpha)
618 * callpeak --call-summits
Fix bugs that caused the callpeak --call-summits option to generate
extra peaks and peak boundaries inconsistent with the
default option. Thank Ben Levinson!
626 Fix bugs causing bdgcmp output logLR all in positive values. Now
627 'depletion' can be correctly represented as negative values.
631 Fix the behavior of bdgdiff module. Now it can take four
632 bedGraph files, then use logLR as cutoff to call differential
633 regions. Check command line of bdgdiff for detail.
635 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
636 MACS version 2.0.10 20130713 (tag:alpha)
638 * fix bugs while output broadPeak and gappedPeak.
640 Note. Those weak broad regions without any strong enrichment
641 regions inside won't be saved in gappedPeak file.
643 * bdgcmp -T and -C are merged into -S and description is updated.
Now, you can use it to override SPMR values in your input for
bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
statistics gives unexpected results (in most cases, lower
significance) and is not consistent with MACS2 callpeak
behavior. So if you have SPMR bedGraphs, input the smaller/larger
sample size in MILLION according to 'callpeak --to-large' option.
652 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
653 MACS version 2.0.10 20130710 (tag:alpha)
655 * fix BED style output format of callpeak module:
657 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
658 the output. Old BED format file won't be saved.
660 2) with --broad: broadPeak (BED6+3) for broad region and
661 gappedPeak (BED12+3) for chained enriched regions will be the
662 output. Old BED format, narrowPeak format, summit file won't be
* bdgcmp can now accept a list of methods to calculate scores, so
you can run it once to generate multiple types of scores. Thank
John Urban for this suggestion!
669 * C codes are re-generated through Cython 0.19.1.
671 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
672 MACS version 2.0.10 20130520 (tag:alpha)
* broad peak calling modules are modified in order to report all
relaxed regions even if there is no strong enrichment inside.
677 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
678 MACS version 2.0.10 20130501 (tag:alpha)
* Memory usage is decreased to about 1/4-1/5 of previous usage
Now, the internal data structure and algorithm are both
re-organized, so that intermediate data is not saved in
memory. Instead it is calculated on the fly. The new MACS2 will
take longer (1.5 to 2 times); however, it will use less memory
and so is more usable on servers with little memory.
687 * --seed option is added to callpeak and randsample commands
688 Thank Mathieu Gineste for this suggestion!
690 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
691 MACS version 2.0.10 20130306 (tag:alpha)
693 * diffpeak module New module to detect differential binding sites
694 with more statistics.
696 * Introduced --refine-peaks
697 Calculates reads balancing to refine peak summits
* Output file names prefix
Correct encodePeak to narrowPeak, broadPeak to bed12.
702 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
703 MACS version 2.0.10 (tag:alpha not released)
705 * Introduced BAMPEParser
706 Reads PE data directly, requires bedtools for now
708 * Introduced --call-summits
709 Uses signal processing methods to call overlapping peaks
711 * Added --no-trackline
712 By default, files have descriptive tracklines now
714 * new refinepeak command (experimental)
715 This new function will use a similar method in SPP (wtd), to
716 analyze raw tag distribution in peak region, then redefine the
717 peak summit where plus and minus tags are evenly distributed
720 * Changes to output *
721 cPeakDetect.pyx has full support for new print/write methods and
722 --call-peaks, BAMPEParser, and use of paired-end data
724 * Parser optimization
726 cParser.pyx is rewritten to use io.BufferedReader to speed
727 up. Speed is doubled.
729 Code is reorganized -- most of functions are inherited from
732 * Use cross-correlation to calculate fragment size
First, all pairs will be used in the prediction of fragment
size. Previously, no more than 1000 pairs were used. Second,
cross-correlation is used to find the best phase difference
between + and - tag pileups.
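Finding the best phase difference by cross-correlation can be sketched as follows. This is a plain-Python illustration of the idea, not the MACS2 implementation: the function name is hypothetical, and the two pileups are assumed to be equal-length per-base coverage lists with `max_shift` smaller than their length.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift d that maximizes the cross-correlation between
    the plus-strand pileup and the minus-strand pileup moved left by d."""
    n = len(plus)
    mean_plus = sum(plus) / n
    mean_minus = sum(minus) / n
    best_d, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        # Compare plus[i] with minus[i + d] over the overlapping part.
        score = sum(
            (plus[i] - mean_plus) * (minus[i + d] - mean_minus)
            for i in range(n - d)
        )
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

When the minus-strand pileup is a copy of the plus-strand pileup displaced by 15 positions, the function returns 15, which plays the role of the estimated fragment size.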
739 * Speed up p-value and q-value calculation
741 This part is ten times faster now. I am using a dictionary to
742 cache p-value results from Poisson CDF function. A bit more memory
743 will be used to increase speed. I hope this dictionary would not
744 explode since the possible pairs of ChIP signal and control lambda
are hugely redundant. Also, I rewrote part of q-value
748 * Speed up peak detection
This part is about a hundred times faster now. Optimizations
include using Numpy functions as much as possible, and making the loop
body as small as possible.
754 * Post-processing on differential calls
756 After macs2diff finds differential binding sites between two
757 conditions, it will try to annotate the peak calls from one of two
758 conditions, describe the changes ...
760 * Fragment size prediction in macs2diff
762 Now by default, macs2diff will try to use the average fragment
763 size from both condition 1 and condition 2 for tag extension and
764 peak calling. Previously, by default, it will use different sizes
765 unless --nomodel is specified.
767 Technically, I separate model building processes out. So macs2diff
768 will build fragment sizes for condition 1 and 2 in parallel (2
769 processes maximum), then perform 4-way comparisons in parallel (4
774 Combine two p/qscore tracks together. At regions where condition 1
775 is higher than condition 2, score would be positive, otherwise,
778 * SAMParser and BAMParser
780 Bug fixed for paired-end sequencing data.
784 Fixed a bug while calling peaks from BedGraph file. It previously
785 mistakenly output same peaks multiple times at the end of
788 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
789 MACS version 2.0.9 (tag:alpha)
791 * Auto fixation on predicted d is turned off by default!
793 Previous --off-auto is now default. MACS will not automatically
794 fix d less than 2 times of tag size according to
795 --shiftsize. While tag size is getting longer nowadays, it would
796 be easier to have d less than 2 times of tag size, however d may
797 still be meaningful and useful. Please judge it using your own
802 Now, the default scaling while treatment and input are unbalanced
803 has been adjusted. By default, larger sample will be scaled down
804 linearly to match the smaller sample. In this way, background
805 noise will be reduced more than real signals, so we expect to have
806 more specific results than the other way around (i.e. --to-large
Also, an alternative option to randomly sample the larger data
(--down-sample) is provided to replace the default linear
scaling. However, this option makes results irreproducible,
816 A new script 'randsample' is added, which can randomly sample
817 certain percentage or number of tags.
821 Now, MACS will decide peak summits according to pileup height
822 instead of qvalue scores. In this way, the summit may be more
MACS calculates qvalue scores as differential scores. When comparing
two conditions (say A and B), the maximum qscore for comparing
A to B (maxqscore_a2b) and for comparing B to A (maxqscore_b2a)
will be computed. If maxqscore_a2b is bigger, the diff score is
+maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
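The sign rule above amounts to the following small function. The variable names come from the entry; the function name is hypothetical, and treating a tie as "otherwise" is a literal reading of the text.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive when A is stronger than B."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```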
833 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
834 MACS version 2.0.8 (tag:alpha)
836 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
838 New script bdgbroadcall and the extra option '--broad' for macs2
839 script, can be used to call broad regions with a loose cutoff to
840 link nearby significant regions. The output is represented as
843 * MACS2/IO/cScoreTrack.pyx
845 Fix q-value calculation to generate forcefully monotonic values.
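One common way to force monotonic q-values is the running-minimum step of the Benjamini-Hochberg procedure, sketched below. Whether cScoreTrack.pyx enforces monotonicity in exactly this way is an assumption; the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic in the p-value.

    The raw value p * n / rank can decrease as p increases; taking a
    running minimum from the largest p-value down removes such dips.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values 0.01, 0.03, 0.04 and 0.5, the raw values are 0.04, 0.06, 0.0533 and 0.5. The second one is lowered to 0.0533 so that a smaller p-value never gets a larger q-value.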
847 * bin/eland*2bed, bin/sam2bed and bin/filterdup
849 They are combined to one more powerful script called
850 "filterdup". The script filterdup can filter duplicated reads
851 according to sequencing depth and genome size. The script can also
852 convert any format supported by MACS to BED format.
854 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
855 MACS version 2.0.7 (tag:alpha)
857 * bin/macsdiff renamed to bin/bdgdiff
859 Now this script will work as a low-level finetuning tool as bdgcmp
A new script to take treatment and control files from two
conditions, calculate fragment size, use local poisson to get
pvalues and BH process to get qvalues, then combine 4-ways result
to call differential sites.
This script can use up to 4 cpus to speed up 4-ways calculation. (
I am trying multiprocessing in python. )
872 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
873 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
874 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
876 All above files are modified for the new macs2diff script.
878 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
880 Now q-value 0.01 is the default cutoff. If -p is specified,
881 p-value cutoff will be used instead.
883 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
884 MACS version 2.0.6 (tag:alpha)
888 A script to call differential regions. A naive way is introduced
889 to find the regions where:
891 1. signal from condition 1 is larger than input 1 and condition 2 --
892 unique region in condition 1;
893 2. signal from condition 2 is larger than input 2 and condition 1
894 -- unique region in condition 2;
895 3. signal from condition 1 is larger than input 1, signal from
896 condition 2 is larger than input 2, however either signal from
897 condition 1 or 2 is not larger than the other.
899 Here 'larger' means the pvalue or qvalue from a Poisson test is
900 under certain cutoff.
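The three cases can be written as a small decision function. This is only a sketch: each argument is assumed to be a boolean saying whether the corresponding Poisson test passed the cutoff, and the function name and the returned labels (including "common" for case 3) are hypothetical.

```python
def classify_region(c1_over_input1, c2_over_input2, c1_over_c2, c2_over_c1):
    """Classify a region according to the three cases listed above."""
    if c1_over_input1 and c1_over_c2:
        return "unique to condition 1"
    if c2_over_input2 and c2_over_c1:
        return "unique to condition 2"
    if c1_over_input1 and c2_over_input2:
        # Enriched in both, but neither condition exceeds the other.
        return "common"
    return None
```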
(I will make another script to wrap up multiple scripts for
differential calling)
905 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
906 MACS version 2.0.5 (tag:alpha)
908 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
911 Use hash to store peak information. Add back the feature to deal
912 with data without control.
914 Fix bug which incorrectly allows small peaks at the end of
917 * bin/bdgpeakcall, bin/bdgcmp
919 Fix bugs. bdgpeakcall can output encodePeak format.
921 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
922 MACS version 2.0.4 (tag:alpha)
926 Fix a bug so that lambda_bg is correctly assigned while --to-small is
927 set. Thanks Junya Seo!
929 Add rank and num of bp columns to pvalue-qvalue table.
933 Fix bugs to correctly deal with peakless chromosomes. Thanks
936 Use AFDR for independent tests instead.
940 Now MACS can output peak coordinates together with pvalue, qvalue,
941 summit positions in a single encodePeak format (designed for
942 ENCODE project) file. This file can be loaded to UCSC
943 browser. The definitions of some specific columns are: 5th:
944 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
945 -log10qvalue, 10th: relative summit position to peak start.
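A minimal sketch of how one such line could be assembled from the column definitions above. The function and argument names are hypothetical, and columns 1-4 and 6 (chromosome, start, end, name, strand) are assumed to follow the usual BED layout, which this entry does not spell out.

```python
def format_encodepeak(chrom, start, end, name, summit,
                      fold_change, neg_log10_pvalue, neg_log10_qvalue):
    """Build one tab-delimited encodePeak-style line (illustrative only)."""
    fields = [
        chrom, start, end, name,       # columns 1-4: assumed BED-style
        int(neg_log10_pvalue * 10),    # 5th: int(-log10pvalue*10)
        ".",                           # 6th: strand, assumed unused
        "%.5f" % fold_change,          # 7th: fold-change
        "%.5f" % neg_log10_pvalue,     # 8th: -log10pvalue
        "%.5f" % neg_log10_qvalue,     # 9th: -log10qvalue
        summit - start,                # 10th: summit relative to peak start
    ]
    return "\t".join(str(f) for f in fields)


print(format_encodepeak("chr1", 100, 200, "peak_1", 150, 3.5, 12.34, 10.0))
```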
948 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
949 MACS version 2.0.3 (tag:alpha)
951 * Rich output with qvalue, fold enrichment, and pileup height
953 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
956 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
958 The xls output file is similar to the previous one. The differences
959 from the previous file are:
961 1. Summit now is absolute summit, instead of relative summit
963 2. 'Pileup' is previous 'tag' column. It's the extended fragment
964 pileup at the peak summit;
965 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
966 5.00 means 1e-5, simple and less confusing.
967 4. FDR column becomes '-log10(qvalue)' column.
968 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
969 the values at the peak summit.
973 NAME_pqtable.txt contains pvalue and qvalue relationships.
975 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
976 and -log10qvalue scores in BedGraph format. Nearby regions with
977 the same value are not merged.
979 * Separation of FeatIO.py
981 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
982 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
983 implemented to store pileup, local lambda, pvalue, and qvalue
984 altogether in cScoreTrack.pyx.
986 * Experimental option --half-ext
988 As suggested by the NPS algorithm, I added an experimental option
989 --half-ext to let MACS extend each ChIP fragment around its
990 middle point by only 1/2 d.
992 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
993 MACS version 2.0.2 (tag:alpha)
997 Add an error check for the case where the treatment file and the
998 control file share no common chromosome names.
1000 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1002 Reduce memory usage by removing deepcopy() calls.
1004 * Modify README documents and others.
1006 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1007 MACS Version 2.0.1 (tag:alpha)
1009 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1011 Jie suggested a brilliantly simple method to pile up fragments
1012 into a bedGraph track. It works much faster than the previous
1013 function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1014 large local lambda calculation in MACSv2 now. Now I generate three
1015 bedGraphs for d-size local bias, slocal-size and llocal-size local
1016 bias, and calculate the maximum local bias as local lambda
1019 Minor: add_loc in bedGraphTrackI now can correctly merge the
1020 region with its preceding region if their values are the same.
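For illustration, one common way to pile up fragments into bedGraph-like intervals is to sort the start and end positions separately and sweep through them once. This entry does not describe the method Jie suggested, so the sketch below is an assumption about the general idea; the add_loc helper here is a simplified stand-in for the bedGraphTrackI method and only shows the merging of equal-valued neighbours.

```python
def add_loc(track, start, end, value):
    """Append a region, merging it with the preceding one if values match."""
    if track and track[-1][1] == start and track[-1][2] == value:
        track[-1] = (track[-1][0], end, value)
    else:
        track.append((start, end, value))


def pileup(fragments):
    """Pile up (start, end) half-open fragments on one chromosome.

    Returns a list of (start, end, depth) tuples beginning at position 0.
    """
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    track, depth, prev = [], 0, 0
    i = j = 0
    while j < len(ends):
        # Take whichever event comes first; ends win ties.
        if i < len(starts) and starts[i] < ends[j]:
            pos, delta = starts[i], 1
            i += 1
        else:
            pos, delta = ends[j], -1
            j += 1
        if pos > prev:
            add_loc(track, prev, pos, depth)
            prev = pos
        depth += delta
    return track


print(pileup([(0, 10), (5, 15)]))
```

Only two sorts and one linear pass are needed, and no per-base array is ever allocated.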
1024 Add an option to shift control tags before extension. By default,
1025 control tags will be extended to both sides regardless of strand
1028 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1029 MACS Version 2.0.0 (tag:alpha)
1031 * Use bedGraph type to store data internally and externally.
1033 We can theoretically have one-basepair resolution profiles. Files
1034 are 10 times smaller, and even smaller after converting to
1035 bigWig for visualization.
1037 * Peak calling process modified. Better peak boundary detection.
1039 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1040 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1041 one will be averaged to d size) Then calculate the maximum value
1042 of these two tracks and a global background, to have a
1043 local-lambda bedGraph.
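A minimal sketch of the local-lambda rule at a single position. The names are hypothetical, and "averaged to d size" is interpreted here as scaling the 1000bp pileup by d/1000, which is an assumption.

```python
def local_lambda(ctrl_pileup_d, ctrl_pileup_1k, global_bg, d, window=1000):
    """Local lambda at one position (illustrative only).

    ctrl_pileup_d  -- control pileup with tags extended to d
    ctrl_pileup_1k -- control pileup with tags extended to `window` bp
    global_bg      -- genome-wide background value
    """
    # Assumed meaning of "averaged to d size": rescale to a d-sized window.
    scaled_1k = ctrl_pileup_1k * d / window
    return max(ctrl_pileup_d, scaled_1k, global_bg)


print(local_lambda(2.0, 30.0, 1.5, d=200))
```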
1045 Use -10log10poisson_pvalue as scores to generate a score track
1046 before peak calling.
1048 A general peak calling based on a score cutoff, min length of peak
1049 and max gap between nearby peaks.
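The general peak calling step can be sketched as follows, assuming the score track is a sorted list of (start, end, score) intervals for one chromosome. The function name and the exact tie handling are illustrative assumptions, not the cPeakDetect code.

```python
def call_peaks(score_track, cutoff, min_length, max_gap):
    """Call peaks from a score track (illustrative only).

    Intervals scoring at or above `cutoff` are kept, neighbours separated
    by at most `max_gap` bp are linked, and linked regions shorter than
    `min_length` are dropped.
    """
    peaks = []
    current = None
    for start, end, score in score_track:
        if score < cutoff:
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = end                      # extend the open peak
        else:
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append(tuple(current))
            current = [start, end]                # open a new peak
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append(tuple(current))
    return peaks


track = [(0, 100, 1), (100, 250, 60), (250, 280, 5),
         (280, 400, 70), (400, 1000, 2), (1000, 1020, 80)]
print(call_peaks(track, cutoff=50, min_length=100, max_gap=50))
```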
1053 Wiggle file output is removed. Now we only support bedGraph
1054 output. The generation of bedGraph is highly recommended since it
1055 will not cost extra time. In other words, bedGraph generation is
1056 run internally even if you don't want to save bedGraphs on disk, due
1057 to the peak calling algorithm in MACS v2.
1061 We can now calculate the Poisson pvalue in log space so that the score
1062 (-10*log10pvalue) will not have an upper limit of 3100 due to
1063 the precision of floating-point numbers.
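To illustrate why log space helps: a plain floating-point pvalue underflows to zero for very strong enrichment, while its logarithm stays finite. The sketch below is not the MACS routine; it is a hypothetical helper that only covers the enrichment case k > lambda and takes the upper tail as P(X >= k), which is an assumption.

```python
import math


def log10_poisson_sf(k, lam):
    """log10 of P(X >= k) for X ~ Poisson(lam), computed in log space.

    Only the enrichment case (k > lam > 0) is handled in this sketch.
    """
    if lam <= 0 or k <= lam:
        raise ValueError("sketch only supports k > lam > 0")
    # log of the leading term P(X = k)
    log_first = -lam + k * math.log(lam) - math.lgamma(k + 1)
    # Remaining terms relative to the first: 1 + lam/(k+1) + ...
    total, ratio, i = 1.0, 1.0, k
    while ratio >= 1e-16 * total:
        i += 1
        ratio *= lam / i
        total += ratio
    return (log_first + math.log(total)) / math.log(10)


score = -10 * log10_poisson_sf(5000, 1.0)
print(score > 3100)
```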
1065 * Cython is adopted to speed up Python code.
1067 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1070 * Replaced with the newest WigTrackI class and fixed the wignorm script.
1072 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1073 Version 1.4.0rc2 (Valentine)
1075 * --single-wig option is renamed to --single-profile
1077 * BedGraph output with --bdg or -B option.
1079 The BedGraph output provides 1bp resolution fragment pileup
1080 profile. File size is smaller than wig file. This option can be
1081 combined with --single-profile option to produce a bedgraph file
1082 for the whole genome. This option also makes --space and
1083 --call-subpeaks invalid.
1085 * Fix the description of --shiftsize to correctly state that the
1086 value is 1/2 d (fragment size).
1088 * Fix a bug in the call to __filter_w_control_tags when control is
1091 * Fix a bug on --to-small option. Now it works as expected.
1093 * Fix a bug in counting the tags in a candidate peak region, where an
1094 extra tag may be included. (Thanks to Jake Biesinger!)
1096 * Fix the bug for the peaks extended outside of chromosome
1097 start. If the minus strand tag goes outside of chromosome start
1098 after extension of d, it will be thrown out.
1100 * Post-process script for a combined wig file:
1102 The "wignorm" command can be called after a full run of MACS14 as
1103 a postprocess. wignorm can calculate the local background from the
1104 control wig file from MACS14, then use either foldchange,
1105 -10*log10(pvalue) from a Poisson test, or difference after asinh
1106 transformation as the score to build a single wig track to
1107 represent the binding strength. This script will take a
1108 significantly long time to process.
1110 * --wigextend has been obsoleted.
1112 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1113 Version 1.4.0rc1 (Starry Sky)
1115 * Duplicate reads option
1117 --keep-dup behavior is changed. Users can now specify how many
1118 reads to keep at the same genomic location: 'auto' to
1119 let MACS decide the number based on a binomial distribution, 'all'
1120 to let MACS keep all reads.
1122 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1124 By default, MACS will now scale the smaller dataset to the bigger
1125 dataset. For instance, if IP has 10 million reads, and Input has 5
1126 million, MACS will double the lambda value calculated from Input
1127 reads while calling BOTH the positive peaks and negative
1128 peaks. This will address the issue caused by unbalanced numbers of
1129 reads from IP and Input. If --to-small is turned on, MACS will
1130 scale the larger dataset to the smaller one. So from now on, if d
1131 is fixed, then the peaks from a MACS call for A vs B should be
1132 identical to the negative peaks from a B vs A.
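The scaling rule described above can be sketched as a pair of multipliers, one per dataset. The function name and return convention are hypothetical; only the default and --to-small behaviours stated in this entry are modelled.

```python
def lambda_scale_factors(treat_reads, ctrl_reads, to_small=False):
    """Return (treatment_scale, control_scale) multipliers (illustrative)."""
    if treat_reads == ctrl_reads:
        return 1.0, 1.0
    treat_is_smaller = treat_reads < ctrl_reads
    small, large = sorted((treat_reads, ctrl_reads))
    if to_small:
        # --to-small: scale the larger dataset down to the smaller one
        ratio = small / large
        return (1.0, ratio) if treat_is_smaller else (ratio, 1.0)
    # default: scale the smaller dataset up to the bigger one
    ratio = large / small
    return (ratio, 1.0) if treat_is_smaller else (1.0, ratio)


# IP has 10 million reads, Input has 5 million: Input lambda is doubled.
print(lambda_scale_factors(10_000_000, 5_000_000))
```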
1134 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1135 Version 1.4.0beta (summer wishes)
1141 The default behavior in the model building step is slightly
1142 changed. When MACS can't find enough pairs to build model
1143 (implemented in alpha version) or the modeled fragment length is
1144 less than 2 times of tag length (implemented in beta version),
1145 MACS will use 2 times of --shiftsize value as fragment length in
1146 the later analysis. --off-auto can turn off this default behavior.
1148 ** Redundant tag filtering
1150 The IO module is rewritten. The redundant tag filtering process
1151 becomes simpler and works as promised. The maximum allowed number
1152 of tags at the exact same location is calculated from the
1153 sequencing depth and genome size using a binomial distribution,
1154 for both TREATMENT and CONTROL separately. ( previously only
1155 TREATMENT was considered ) The exact same location means the same
1156 coordinate and the same strand. Then MACS will only keep at most
1157 this number of tags at the exact same location in the following
1158 analysis. An option --keep-dup can let MACS skip the filtering and
1159 keep all the tags. However this may bring in a lot of sequencing
1160 bias, so you may get many false positive peaks.
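A rough sketch of how such a limit could be derived from a binomial distribution. The per-position probability 1/genome_size and the 1e-5 tail threshold are assumptions made for illustration, as is the function name; this entry only states that sequencing depth and genome size are used.

```python
import math


def max_dup_tags(total_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(n, 1/genome).

    Illustrative only; at least one tag per location is always allowed.
    """
    p = 1.0 / genome_size
    pmf = math.exp(total_tags * math.log1p(-p))   # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < 1.0 - pvalue:
        # Recurrence: P(X = k+1) = P(X = k) * (n-k)/(k+1) * p/(1-p)
        pmf *= (total_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return max(k, 1)


print(max_dup_tags(10_000_000, 2_700_000_000))
```

Deeper sequencing raises the limit, because more tags are expected to land on the same position by chance.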
1162 ** Single wiggle mode
1164 First thing to mention, this is not the score track that I
1165 described before. By default, MACS generates wiggle files for
1166 fragment pileup for every chromosome separately. When you use
1167 --single-wig option, MACS will generate a single wiggle file for
1168 all the chromosomes so you will get a wig.gz for TREATMENT and
1169 another wig.gz for CONTROL if available.
1171 ** Sniff -- automatic format detection
1173 Now, by default or "-f AUTO", MACS will decide the input file
1174 format automatically. Technically, it will try to read at most
1175 1000 records for the first 10 non-comment lines. If it succeeds,
1176 the format is decided. I recommend not using AUTO and instead specifying the
1177 right format for your input files, unless you combine different
1178 formats in a single MACS run.
1182 --single-wig and --keep-dup are added. Check previous section in
1183 ChangeLog for detail.
1185 -f (--format) AUTO is now the default option.
1187 --slocal default: 1000
1188 --llocal default: 10000
1192 Setup script will stop the installation if python version is not
1193 python2.6 or python2.7.
1195 Local lambda calculation has been changed back. MACS will check
1196 peak_region, slocal( default 1K) and llocal (default 10K) for the
1197 local bias. The previous 200bps default caused MACS to miss
1198 some peaks where the input bias is very sharp.
1200 sam2bed.py script is corrected.
1202 Relative pos in xls output is fixed.
1204 Parser for ELAND_export is fixed to pass some of the no match
1205 lines. And elandexport2bed.py is fixed too. ( however I can't
1206 guarantee that it works on any eland_export files. )
1208 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1209 Version 1.4.0alpha2 (be smarter)
1213 --gsize now provides shortcuts for common genomes, including
1214 human, mouse, C. elegans and fruitfly.
1216 --llocal now will be 5000 bps if there is no input file, so that
1217 local lambda doesn't overkill enriched binding sites.
1219 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1220 Version 1.4alpha (be smarter)
1224 --tsize option is redesigned. MACS will use the first 10 lines of
1225 the input to decide the tag size. If user specifies --tsize, it
1226 will override the auto decided tsize.
1228 --lambdaset is replaced by --slocal and --llocal which mean the
1229 small local region and large local region.
1231 --bw has no effect on the scan-window size now. It only affects the
1232 paired-peaks model process.
1236 During the model building, MACS will pick out the enriched regions
1237 which are not too high and not too low to build the paired-peak
1238 model. By default, the regions range from fold 10 to fold 30. If MACS
1239 fails to build the model, by default it will use the nomodel
1240 settings, like shiftsize=100bps, to shift and extend each
1241 tag. This behavior can be turned off by '--off-auto'.
1245 All the summit positions are saved in an extra
1246 *_summits.bed file. An option '--call-subpeaks' will invoke
1247 PeakSplitter developed by Mali Salmon to split wide peaks into
1250 * Sniff ( will be in beta )
1252 Automatically recognize the input file format, so users can combine
1253 different formats in one MACS run.
1255 Not implemented features/TODO:
1257 * Algorithms ( in near future? )
1259 MACS will try to refine the peak boundaries by calculating the
1260 scores for every point in the candidate peak regions. The score
1261 will be the -10*log(10,pvalue) on a local poisson distribution. A
1262 cutoff specified by users (--pvalue) will be applied to find the
1263 precise sub-peaks in the original candidate peak region. Peak
1264 boundaries and peak summit positions will be saved in separate BED
1267 * Single wiggle track ( in near future? )
1269 A single wiggle track will be generated to save the scores within
1270 candidate peak regions in the 10bps resolution. The wiggle file
1271 is in fixedStep format.
1274 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1275 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1279 Fixed typo. FCSTEP -> FESTEP
1283 The 'femax' attribute bug is fixed
1285 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1286 Version 1.3.7 (Oktoberfest)
1288 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1290 Enhancements by Peter Chines:
1292 1. gzip files are supported.
1293 2. when --diag is on, user can set the increment and endpoint for
1294 fold enrichment analysis by setting --fe-step and --fe-max.
1296 Enhancements by Davide Cittaro:
1298 1. BAM and SAM formats are supported.
1299 2. small changes in the header lines of wiggle output.
1302 1. I added --fe-min option;
1303 2. Bowtie ascii output with suffix ".map" is supported.
1307 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1308 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1309 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1310 4. Some "fold change" have been changed to "fold enrichment".
1312 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1313 Version 1.3.6.1 (default parameter change)
1315 * bin/macs, lib/PeakDetect.py
1317 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1318 default. "--futurefdr" is added which can turn on the 'new' method
1319 introduced in 1.3.6. By default it's off.
1323 Fixed a bug. p-value is corrected a little bit.
1326 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1327 Version 1.3.6 (Birthday cake)
1331 "track name" is added to the header of BED output file.
1333 Now the default peak detection method is to consider 5k and 10k
1334 nearby regions in treatment data and peak location, 1k, 5k, and
1335 10k regions in control data to calculate local bias. The old
1336 method can be called through '--old' option.
1338 Information about how many total/unique tags in treatment or
1339 control will be saved in final .xls output.
1341 * lib/IO/__init__.py
1343 ".fa" will be removed from input tag alignment so only the
1344 chromosome names are kept.
1346 WigTrackI class is added for Wiggle like data structure. (not used
1349 The parser for ELAND multi PET files has been fixed. Now the 5'
1350 tag position for a pair will be kept, whereas in the previous
1351 version, the middle points are kept.
1353 * lib/IO/BinKeeper.py
1355 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1356 browser, which can quickly access certain region for values in a
1357 large wiggle like data file. (not used now)
1359 * lib/OptValidator.py
1365 Now the default peak detection method is to consider 5k and 10k
1366 nearby regions in treatment data and peak location, 1k, 5k, and
1367 10k regions in control data to calculate local bias. The old
1368 method can be called through '--old' option.
1370 Two columns have been added to BED output file. 4th column: peak
1371 name; 5th column: peak score using -10log(10,pvalue) as score.
1375 Add support to build a Mac App through 'setup.py py2app', or a
1376 Windows executable through 'setup.py py2exe'. You need to install
1377 py2app or py2exe package in order to use these functions.
1379 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1380 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1384 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1385 in control data to calculate local lambda for each peak. Peak
1386 calling results will be slightly different from the previous version,
1391 Typo fixed, ELANDParser -> ELANDResultParser
1395 Now, modeled d value will be shown on the model figure.
1397 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1398 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1400 * macs, IO/__init__.py, PeakDetect.py
1402 Add support for ELAND multi format. Add support for Pair-End
1403 experiment, in this case, 5'end and 3'end ELAND multi format files
1404 are required for treatment or control data. See 00README file for
1407 Add wigextend option.
1409 Add petdist option for Pair-End Tag experiment, which is the best
1410 distance between 5' and 3' tags.
1414 Fixed a bug which caused the end position of every peak region
1415 to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
1419 Fix bugs while generating wiggle files. The start position of
1420 wiggle file is set to 1 instead of 0.
1422 Fix a bug that every 10M bps, signals in the first 'd' range are
1423 lower than actual. ( Thanks Mali Salmon!)
1426 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1427 Version 1.3.3 (wiggle bugs fixed)
1431 Fix bugs while generating wiggle files. 1. 'span=' is added to
1432 'variableStep' line; 2. previously, every 10M bps, the coordinates
1433 were wrongly shifted to the right for 'd' basepairs.
1435 * macs, PeakDetect.py
1437 Add an option to save wiggle files on different resolution.
1439 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1440 Version 1.3.2 (tiny bugs fixed)
1444 Fix 65536 -> 65535. ( Thank Joon)
1448 Improved the binomial function for extra large numbers. Imported
1449 from Cistrome project.
1453 If treatment channel misses reads in some chromosome included in
1454 control channel, or vice versa, MACS will not exit. (Thank Shaun
1457 Instead, MACS will fake a tag at position -1 when calling
1458 treatment peaks vs control, but will ignore the chromosome while
1459 calling negative peaks.
1461 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1462 Version 1.3.1 (tiny bugs fixed version)
1466 Hyunjin Gene Shin contributed some code to Prob.py. Now the
1467 binomial functions can tolerate large and small numbers.
1471 Parsers now split lines in BED/ELAND file using any
1472 whitespaces. 'track' or 'browser' lines will be regarded as
1473 comment lines. A bug fixed when throwing StrandFormatError. The
1474 maximum redundant tag number at a single position can be no less
1478 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1479 Version 1.3 (naming clarification version)
1481 * Naming clarification changes according to our manuscript:
1483 'frag_len' is changed to 'd'.
1485 'fold_change' is changed to 'fold_enrichment'.
1487 Suggest '--bw' parameter to be determined by users from the real
1490 Maximum FDR is 100% in the output file.
1492 And other clarifications in 00README file and the documents on the
1496 If the redundant tag number at a single position is over 32767,
1497 just remember 32767, instead of raising an overflow exception.
1503 Bug fixed for diagnosis report.
1506 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1511 Poisson distribution CDF and inverse CDF functions are
1512 corrected. They can produce right results even for huge lambda
1513 now. So that the p-value and FDR values in the final excel sheet
1516 IO package now can tolerate some rare cases; ELANDParser in IO
1517 package is fixed. (Thank Bogdan)
1521 Reverse paired peaks in model are rejected. So there will be no
1522 negative 'frag_len'. (Thank Bogdan)
1526 Diagnosis function is completed. Which can output a table file for
1527 users to estimate their sequencing depth.
1530 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1533 * Prob.py is added!
1535 GSL is totally removed from MACS. Instead, I have implemented the
1536 CDF and inverse CDF for poisson and binomial distribution purely
1539 * Constants.py is added!
1541 Organize constants used in MACS in the Constants.py file.
1543 * All other files are modified!
1545 Foldchange calculation is modified. Now the foldchange is only
1546 calculated at the peak summit position instead of the whole peak
1547 region. The values will be higher and more robust than before.
1551 1. MACS can save wiggle format files containing the tag number at
1552 every 10 bp along the genome. Tags are shifted according to our
1553 model before they are calculated.
1555 2. Model building and local lambda calculation can be skipped with
1558 3. A diagnosis report can be generated through '--diag'
1559 option. This report can help you estimate the
1560 sequencing saturation. This function is only in beta stage.
1562 4. FDR calculation speed is highly improved.
1564 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1567 * TabIO, PeakModel.py ...
1568 Bug fixed to let MACS tolerate some cases while there is no tag on
1569 either plus strand or minus strand.
1572 Check the version of python. If the version is lower than 2.4,
1573 refuse to install with warning.
1576 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1577 MACS version 2.0.10 20130731 (tag:alpha)
1579 * callpeak --call-summits
1581 Fix bugs that caused the callpeak --call-summits option to generate
1582 extra peaks and peak boundaries inconsistent with the
1583 default option. Thank Ben Levinson!
1587 Fix bugs that caused bdgcmp to output only positive logLR values. Now
1588 'depletion' can be correctly represented as negative values.
1592 Fix the behavior of bdgdiff module. Now it can take four
1593 bedGraph files, then use logLR as cutoff to call differential
1594 regions. Check command line of bdgdiff for detail.
1596 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1597 MACS version 2.0.10 20130713 (tag:alpha)
1599 * fix bugs while output broadPeak and gappedPeak.
1601 Note. Those weak broad regions without any strong enrichment
1602 regions inside won't be saved in gappedPeak file.
1604 * bdgcmp -T and -C are merged into -S and description is updated.
1606 Now, you can use it to override SPMR values in your input for
1607 bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1608 statistics will cause weird results ( in most cases, lower
1609 significance), and won't be consistent with MACS2 callpeak
1610 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1611 sample size in MILLION according to 'callpeak --to-large' option.
1613 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1614 MACS version 2.0.10 20130710 (tag:alpha)
1616 * fix BED style output format of callpeak module:
1618 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1619 the output. Old BED format file won't be saved.
1621 2) with --broad: broadPeak (BED6+3) for broad region and
1622 gappedPeak (BED12+3) for chained enriched regions will be the
1623 output. Old BED format, narrowPeak format, summit file won't be
1626 * bdgcmp now can accept list of methods to calculate scores. So
1627 you can run it once to generate multiple types of scores. Thank
1628 Jon Urban for this suggestion!
1630 * C codes are re-generated through Cython 0.19.1.
1632 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1633 MACS version 2.0.10 20130520 (tag:alpha)
1635 * broad peak calling modules are modified in order to report all
1636 relaxed regions even if there is no strong enrichment inside.
1638 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1639 MACS version 2.0.10 20130501 (tag:alpha)
1641 * Memory usage is decreased to about 1/4-1/5 of previous usage
1642 Now, the internal data structure and algorithm are both
1643 re-organized, so that intermediate data is not saved in
1644 memory. Instead it is calculated on the fly. The new MACS2 will
1645 take longer (1.5 to 2 times) but will use less memory,
1646 so it is more usable on small-memory servers.
1648 * --seed option is added to callpeak and randsample commands
1649 Thank Mathieu Gineste for this suggestion!
1651 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1652 MACS version 2.0.10 20130306 (tag:alpha)
1654 * diffpeak module New module to detect differential binding sites
1655 with more statistics.
1657 * Introduced --refine-peaks
1658 Calculates reads balancing to refine peak summits
1660 * Output file names prefix
1661 Correct encodePeak to narrowPeak, broadPeak to bed12.
1663 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1664 MACS version 2.0.10 (tag:alpha not released)
1666 * Introduced BAMPEParser
1667 Reads PE data directly, requires bedtools for now
1669 * Introduced --call-summits
1670 Uses signal processing methods to call overlapping peaks
1672 * Added --no-trackline
1673 By default, files have descriptive tracklines now
1675 * new refinepeak command (experimental)
1676 This new function will use a similar method in SPP (wtd), to
1677 analyze raw tag distribution in peak region, then redefine the
1678 peak summit where plus and minus tags are evenly distributed
1681 * Changes to output *
1682 cPeakDetect.pyx has full support for new print/write methods and
1683 --call-peaks, BAMPEParser, and use of paired-end data
1685 * Parser optimization
1687 cParser.pyx is rewritten to use io.BufferedReader to speed
1688 up. Speed is doubled.
1690 Code is reorganized -- most of functions are inherited from
1691 GenericParser class.
1693 * Use cross-correlation to calculate fragment size
1695 First, all pairs will be used in prediction for fragment
1696 size. Previously, no more than 1000 pairs were used. Second,
1697 cross-correlation is used to find the best phase difference
1698 between + and - tag pileups.
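A minimal sketch of the cross-correlation idea, assuming per-base pileups of + and - strand tags are available as equal-length lists. It uses an unnormalized dot product at each lag and plain Python for clarity; the function name and the choice of scoring are assumptions, not the PeakModel code.

```python
def best_phase_shift(plus, minus, max_lag):
    """Return the lag (in bp) that best aligns + tags with - tags.

    plus, minus -- equal-length per-base tag pileups for the two strands
    max_lag     -- largest shift to test
    """
    n = len(plus)
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Unnormalized cross-correlation at this lag.
        score = sum(plus[i] * minus[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


plus = [0] * 50
minus = [0] * 50
plus[10], plus[11] = 5, 3
minus[30], minus[31] = 5, 3
print(best_phase_shift(plus, minus, max_lag=40))
```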
1700 * Speed up p-value and q-value calculation
1702 This part is ten times faster now. I am using a dictionary to
1703 cache p-value results from Poisson CDF function. A bit more memory
1704 will be used to increase speed. I hope this dictionary would not
1705 explode since the possible pairs of ChIP signal and control lambda
1706 are hugely redundant. Also, I rewrote part of q-value
1709 * Speed up peak detection
1711 This part is about a hundred times faster now. Optimizations
1712 include using Numpy functions as much as possible, and making loop
1713 body as small as possible.
1715 * Post-processing on differential calls
1717 After macs2diff finds differential binding sites between two
1718 conditions, it will try to annotate the peak calls from one of two
1719 conditions, describe the changes ...
1721 * Fragment size prediction in macs2diff
1723 Now by default, macs2diff will try to use the average fragment
1724 size from both condition 1 and condition 2 for tag extension and
1725 peak calling. Previously, by default, it will use different sizes
1726 unless --nomodel is specified.
1728 Technically, I separate model building processes out. So macs2diff
1729 will build fragment sizes for condition 1 and 2 in parallel (2
1730 processes maximum), then perform 4-way comparisons in parallel (4
1735 Combine two p/qscore tracks together. At regions where condition 1
1736 is higher than condition 2, score would be positive, otherwise,
1739 * SAMParser and BAMParser
1741 Bug fixed for paired-end sequencing data.
1745 Fixed a bug while calling peaks from BedGraph file. It previously
1746 mistakenly output same peaks multiple times at the end of
1749 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1750 MACS version 2.0.9 (tag:alpha)
1752 * Auto fixation on predicted d is turned off by default!
1754 Previous --off-auto is now default. MACS will not automatically
1755 fix d less than 2 times of tag size according to
1756 --shiftsize. While tag size is getting longer nowadays, it would
1757 be easier to have d less than 2 times of tag size, however d may
1758 still be meaningful and useful. Please judge it using your own
1763 Now, the default scaling while treatment and input are unbalanced
1764 has been adjusted. By default, larger sample will be scaled down
1765 linearly to match the smaller sample. In this way, background
1766 noise will be reduced more than real signals, so we expect to have
1767 more specific results than the other way around (i.e. --to-large
1770 Also, an alternative option to randomly sample larger data
1771 (--down-sample) is provided to replace default linear
1772 scaling. However, this option will make results irreproducible,
1777 A new script 'randsample' is added, which can randomly sample
1778 certain percentage or number of tags.
1782 Now, MACS will decide peak summits according to pileup height
1783 instead of qvalue scores. In this way, the summit may be more
1788 MACS calculates qvalue scores as differential scores. When comparing
1789 two conditions (say A and B), the maximum qscore for comparing
1790 A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
1791 will be computed. If maxqscore_a2b is bigger, the diff score is
1792 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
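The rule above as a short function. The function name is hypothetical; the argument names follow the ones used in this entry.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive favours A, negative favours B."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a


print(diff_score(12.5, 3.0), diff_score(2.0, 8.0))
```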
1794 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1795 MACS version 2.0.8 (tag:alpha)
1797 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1799 New script bdgbroadcall and the extra option '--broad' for macs2
1800 script, can be used to call broad regions with a loose cutoff to
1801 link nearby significant regions. The output is represented as
1804 * MACS2/IO/cScoreTrack.pyx
1806 Fix q-value calculation to generate forcefully monotonic values.
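As an illustration of what forcing monotonic q-values means, here is the standard Benjamini-Hochberg adjustment with a running minimum. It is a generic textbook sketch and not the cScoreTrack.pyx code; the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic in the p-value."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so that a smaller p-value never
    # receives a larger q-value than a bigger one.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```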
1808 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1810 They are combined to one more powerful script called
1811 "filterdup". The script filterdup can filter duplicated reads
1812 according to sequencing depth and genome size. The script can also
1813 convert any format supported by MACS to BED format.
1815 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1816 MACS version 2.0.7 (tag:alpha)
1818 * bin/macsdiff renamed to bin/bdgdiff
1820 Now this script will work as a low-level finetuning tool as bdgcmp
1825 A new script to take treatment and control files from two
1826 conditions, calculate fragment size, use local Poisson to get
1827 pvalues and BH process to get qvalues, then combine the 4-way results
1828 to call differential sites.
1830 This script can use up to 4 CPUs to speed up the 4-way calculation. (
1831 I am trying multiprocessing in Python. )
1833 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1834 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1835 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1837 All above files are modified for the new macs2diff script.
1839 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1841 Now q-value 0.01 is the default cutoff. If -p is specified,
1842 p-value cutoff will be used instead.
1844 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1845 MACS version 2.0.6 (tag:alpha)
1849 A script to call differential regions. A naive way is introduced
1850 to find the regions where:
1852 1. signal from condition 1 is larger than input 1 and condition 2 --
1853 unique region in condition 1;
1854 2. signal from condition 2 is larger than input 2 and condition 1
1855 -- unique region in condition 2;
1856 3. signal from condition 1 is larger than input 1, signal from
1857 condition 2 is larger than input 2, however neither signal from
1858 condition 1 nor 2 is larger than the other.
1860 Here 'larger' means the pvalue or qvalue from a Poisson test is
1861 under certain cutoff.
1863 (I will make another script to wrap up multiple scripts for
1864 differential calling)
1866 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1867 MACS version 2.0.5 (tag:alpha)
1869 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1870 MACS2/IO/cPeakIO.pyx
1872 Use hash to store peak information. Add back the feature to deal
1873 with data without control.
1875 Fix bug which incorrectly allows small peaks at the end of
1878 * bin/bdgpeakcall, bin/bdgcmp
1880 Fix bugs. bdgpeakcall can output encodePeak format.
1882 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1883 MACS version 2.0.4 (tag:alpha)
1887 Fix a bug, correctly assign lambda_bg while --to-small is
1888 set. Thanks Junya Seo!
1890 Add rank and num of bp columns to pvalue-qvalue table.
1894 Fix bugs to correctly deal with peakless chromosomes. Thanks
1897 Use AFDR for independent tests instead.
1901 Now MACS can output peak coordinates together with pvalue, qvalue,
1902 summit positions in a single encodePeak format (designed for
1903 ENCODE project) file. This file can be loaded to UCSC
1904 browser. Definitions of some specific columns are: 5th:
1905 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1906 -log10qvalue, 10th: relative summit position to peak start.
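As a rough illustration of the column layout listed above, the sketch below parses one tab-separated encodePeak line. The 5th and 7th to 10th fields follow the definitions in this entry. The remaining fields (chromosome, start, end, name, strand) are assumed to follow the usual ENCODE narrowPeak order, and the function name `parse_encode_peak` is hypothetical.

```python
def parse_encode_peak(line):
    """Parse one tab-separated encodePeak line into a dict."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],                    # assumed narrowPeak layout
        "start": int(fields[1]),               # assumed narrowPeak layout
        "end": int(fields[2]),                 # assumed narrowPeak layout
        "name": fields[3],                     # assumed narrowPeak layout
        "score": int(fields[4]),               # 5th: int(-log10pvalue*10)
        "strand": fields[5],                   # assumed narrowPeak layout
        "fold_change": float(fields[6]),       # 7th: fold-change
        "neg_log10_pvalue": float(fields[7]),  # 8th: -log10pvalue
        "neg_log10_qvalue": float(fields[8]),  # 9th: -log10qvalue
        "summit_offset": int(fields[9]),       # 10th: summit relative to start
    }


if __name__ == "__main__":
    example = "chr1\t1000\t1500\tpeak_1\t125\t.\t8.2\t12.5\t9.8\t230"
    peak = parse_encode_peak(example)
    # The absolute summit position is the peak start plus the relative offset.
    print(peak["start"] + peak["summit_offset"])
```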
1909 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1910 MACS version 2.0.3 (tag:alpha)
1912 * Rich output with qvalue, fold enrichment, and pileup height
1914 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1917 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1919 Now we have a similar xls output file as before. The differences
1920 from previous file are:
1922 1. Summit now is absolute summit, instead of relative summit
1924 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1925 pileup at the peak summit;
1926 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1927 5.00 means 1e-5, simple and less confusing.
1928 4. FDR column becomes '-log10(qvalue)' column.
1929 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1930 the values at the peak summit.
1932 * Extra output files
1934 NAME_pqtable.txt contains pvalue and qvalue relationships.
1936 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1937 and -log10qvalue scores in BedGraph format. Nearby regions with
1938 the same value are not merged.
1940 * Separation of FeatIO.py
1942 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1943 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1944 implemented to store pileup, local lambda, pvalue, and qvalue
1945 all together in cScoreTrack.pyx.
1947 * Experimental option --half-ext
1949 Inspired by the NPS algorithm, I added an experimental option
1950 --half-ext to let MACS extend the ChIP fragment around its
1951 middle point for only 1/2 d.
1953 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1954 MACS version 2.0.2 (tag:alpha)
1958 Add an error check for the case where the treatment file and
1959 the control file share no common chromosome names
1961 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1963 Reduce memory usage by removing deepcopy() calls.
1965 * Modify README documents and others.
1967 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1968 MACS Version 2.0.1 (tag:alpha)
1970 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1972 Jie suggested to me a brilliantly simple method to pile up fragments
1973 into a bedGraph track. It works much faster than the previous
1974 function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1975 large local lambda calculation in MACSv2 now. Now I generate three
1976 bedGraphs for d-size local bias, slocal-size and llocal-size local
1977 bias, and calculate the maximum local bias as local lambda
1980 Minor: add_loc in bedGraphTrackI now can correctly merge the
1981 region with its preceding region if their values are the same.
1985 Add an option to shift control tags before extension. By default,
1986 control tags will be extended to both sides regardless of strand
1989 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1990 MACS Version 2.0.0 (tag:alpha)
1992 * Use bedGraph type to store data internally and externally.
1994 We can have theoretically one-basepair resolution profiles. 10
1995 times smaller in filesize and even smaller after converting to
1996 bigWig for visualization.
1998 * Peak calling process modified. Better peak boundary detection.
2000 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2001 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2002 one will be averaged to d size) Then calculate the maximum value
2003 of these two tracks and a global background, to have a
2004 local-lambda bedGraph.
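The local-lambda rule described above can be sketched for a single position as follows. This is only an illustration: the function name `local_lambda` and its argument names are hypothetical, and "averaged to d size" is taken here to mean rescaling the 1,000bp pileup by d/1000.

```python
def local_lambda(ctrl_pileup_d, ctrl_pileup_1k, d, global_background):
    """Return the local lambda at one position.

    ctrl_pileup_d     -- control pileup with tags extended to d
    ctrl_pileup_1k    -- control pileup with tags extended to 1,000bp
    d                 -- fragment size in bp
    global_background -- genome-wide background lambda
    """
    # Bring the 1,000bp pileup to the same scale as the d-sized pileup
    # (assumed interpretation of "averaged to d size").
    scaled_1k = ctrl_pileup_1k * d / 1000.0
    # The local lambda is the largest of the three bias estimates.
    return max(ctrl_pileup_d, scaled_1k, global_background)


if __name__ == "__main__":
    print(local_lambda(3.0, 20.0, 200, 1.5))
```

Taking the maximum keeps the estimate conservative: a peak must exceed the strongest background estimate available at that position.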
2006 Use -10log10poisson_pvalue as scores to generate a score track
2007 before peak calling.
2009 A general peak calling based on a score cutoff, min length of peak
2010 and max gap between nearby peaks.
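A minimal sketch of such a general peak caller over one chromosome's score track is given below. The function name `call_peaks`, the (start, end, score) tuple layout, and the choice to treat a score equal to the cutoff as passing are all assumptions for illustration, not the MACS implementation.

```python
def call_peaks(intervals, cutoff, min_length, max_gap):
    """Call peaks from sorted, non-overlapping (start, end, score) intervals.

    Regions scoring at least `cutoff` are merged when the gap between them
    is at most `max_gap`; merged regions shorter than `min_length` are dropped.
    """
    peaks = []
    current = None  # [start, end] of the peak being built
    for start, end, score in intervals:
        if score < cutoff:
            continue
        if current is not None and start - current[1] <= max_gap:
            # Close enough to the previous region: extend the current peak.
            current[1] = end
        else:
            # Too far away: finish the previous peak and start a new one.
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append(tuple(current))
            current = [start, end]
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append(tuple(current))
    return peaks


if __name__ == "__main__":
    track = [(0, 100, 1.0), (100, 250, 6.0), (250, 280, 2.0),
             (280, 400, 7.0), (400, 900, 0.5), (900, 950, 9.0)]
    print(call_peaks(track, cutoff=5.0, min_length=100, max_gap=50))
```

In this example the two high-scoring regions separated by a 30bp dip are merged into one peak, while the isolated 50bp region at the end is too short to be reported.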
2014 Wiggle file output is removed. Now we only support bedGraph
2015 output. The generation of bedGraph is highly recommended since it
2016 will not cost extra time. In other words, bedGraph generation is
2017 run internally even if you don't want to save bedGraphs on disk, due
2018 to the peak calling algorithm in MACS v2.
2022 We now can calculate poisson pvalue in log space so that the score
2023 (-10*log10pvalue) will not have an upper limit of 3100 due to
2024 precision of float number.
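To illustrate why working in log space removes that limit, here is a small sketch that computes log10 of the Poisson upper tail P(X >= k) without ever forming the p-value itself. It uses only the standard library. The function name `log10_poisson_sf` is hypothetical, and the summation strategy is one straightforward choice rather than the routine MACS uses.

```python
import math


def log10_poisson_sf(k, lam):
    """Return log10 of P(X >= k) for X ~ Poisson(lam), staying in log space."""
    if k <= 0:
        return 0.0  # P(X >= 0) is 1
    # ln P(X = k)
    log_term = -lam + k * math.log(lam) - math.lgamma(k + 1)
    log_sum = log_term
    i = k
    while True:
        i += 1
        # ln P(X = i) obtained from ln P(X = i - 1)
        log_term += math.log(lam) - math.log(i)
        # log-sum-exp of the running total and the new term
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))
        # Past the mode the terms only shrink; stop once they are negligible.
        if i > lam and log_term < log_sum - 50:
            break
    return log_sum / math.log(10)


if __name__ == "__main__":
    # A p-value far too small for a float still yields a finite score.
    print(-10 * log10_poisson_sf(2000, 1.0))
```

A direct computation would underflow to 0.0 once the p-value drops below the smallest representable float, which is where a cap near 3100 for -10*log10(pvalue) comes from.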
2026 * Cython is adopted to speed up Python code.
2028 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
2031 * Replaced with the newest WigTrackI class and fixed the wignorm script.
2033 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
2034 Version 1.4.0rc2 (Valentine)
2036 * --single-wig option is renamed to --single-profile
2038 * BedGraph output with --bdg or -B option.
2040 The BedGraph output provides 1bp resolution fragment pileup
2041 profile. File size is smaller than wig file. This option can be
2042 combined with --single-profile option to produce a bedgraph file
2043 for the whole genome. This option can also make --space,
2044 --call-subpeaks invalid.
2046 * Fix the description of --shiftsize to correctly state that the
2047 value is 1/2 d (fragment size).
2049 * Fix a bug in the call to __filter_w_control_tags when control is
2052 * Fix a bug on --to-small option. Now it works as expected.
2054 * Fix a bug while counting the tags in candidate peak region, an
2055 extra tag may be included. (Thanks to Jake Biesinger!)
2057 * Fix the bug for the peaks extended outside of chromosome
2058 start. If the minus strand tag goes outside of chromosome start
2059 after extension of d, it will be thrown out.
2061 * Post-process script for a combined wig file:
2063 The "wignorm" command can be called after a full run of MACS14 as
2064 a postprocess. wignorm can calculate the local background from the
2065 control wig file from MACS14, then use either foldchange,
2066 -10*log10(pvalue) from Poisson test, or difference after asinh
2067 transformation as the score to build a single wig track to
2068 represent the binding strength. This script will take a
2069 significantly long time to process.
2071 * --wigextend has been obsoleted.
2073 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
2074 Version 1.4.0rc1 (Starry Sky)
2076 * Duplicate reads option
2078 --keep-dup behavior is changed. Now users can specify how many
2079 reads they want to keep at the same genomic location. 'auto' to
2080 let MACS decide the number based on binomial distribution, 'all'
2081 to let MACS keep all reads.
2083 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2085 By default, MACS will now scale the smaller dataset to the bigger
2086 dataset. For instance, if IP has 10 million reads, and Input has 5
2087 million, MACS will double the lambda value calculated from Input
2088 reads while calling BOTH the positive peaks and negative
2089 peaks. This will address the issue caused by unbalanced numbers of
2090 reads from IP and Input. If --to-small is turned on, MACS will
2091 scale the larger dataset to the smaller one. So from now on, if d
2092 is fixed, then the peaks from a MACS call for A vs B should be
2093 identical to the negative peaks from a B vs A.
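The scaling rule described above can be sketched as a pair of multipliers, one for treatment-derived values and one for control-derived values. The function name `scaling_factors` and its return convention are hypothetical and serve only to make the rule concrete.

```python
def scaling_factors(n_treat, n_control, to_small=False):
    """Return (treatment_factor, control_factor) for balancing read depth.

    By default the smaller dataset is scaled up to the bigger one; with
    to_small=True the larger dataset is scaled down to the smaller one.
    """
    # The control is the dataset being rescaled when it is the smaller one
    # (default mode) or the larger one (to_small mode).
    scale_control = (n_treat > n_control) != to_small
    if scale_control:
        return 1.0, n_treat / n_control
    return n_control / n_treat, 1.0


if __name__ == "__main__":
    # IP has 10 million reads and Input has 5 million reads.
    print(scaling_factors(10_000_000, 5_000_000))
    print(scaling_factors(10_000_000, 5_000_000, to_small=True))
```

With 10 million IP reads and 5 million Input reads the default call doubles the control-derived lambda, matching the example in this entry.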
2095 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2096 Version 1.4.0beta (summer wishes)
2102 The default behavior in the model building step is slightly
2103 changed. When MACS can't find enough pairs to build model
2104 (implemented in alpha version) or the modeled fragment length is
2105 less than 2 times of tag length (implemented in beta version),
2106 MACS will use 2 times of --shiftsize value as fragment length in
2107 the later analysis. --off-auto can turn off this default behavior.
2109 ** Redundant tag filtering
2111 The IO module is rewritten. The redundant tag filtering process
2112 becomes simpler and works as promised. The maximum allowed number
2113 of tags at the exact same location is calculated from the
2114 sequencing depth and genome size using a binomial distribution,
2115 for both TREATMENT and CONTROL separately. ( previously only
2116 TREATMENT is considered ) The exact same location means the same
2117 coordinate and the same strand. Then MACS will only keep at most
2118 this number of tags at the exact same location in the following
2119 analysis. An option --keep-dup can let MACS skip the filtering and
2120 keep all the tags. However this may bring in a lot of sequencing
2121 bias, so you may get many false positive peaks.
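One way to picture the binomial calculation is the sketch below. It treats each tag as landing on a given position with probability 1/genome_size and returns the smallest count whose cumulative probability reaches 1 - pvalue. The function name `max_dup_tags`, the 1e-5 default, and the floor of one tag are assumptions for illustration. The entry above does not state the cutoff MACS uses.

```python
import math


def max_dup_tags(genome_size, n_tags, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    log_p, log_q = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n_tags + 1):
        # ln of the binomial probability of exactly k tags at one position
        log_pmf = (math.lgamma(n_tags + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_tags - k + 1)
                   + k * log_p + (n_tags - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= 1.0 - pvalue:
            return max(k, 1)  # always allow at least one tag per position
    return n_tags


if __name__ == "__main__":
    # Ten million tags on a 2.7 Gb genome (illustrative numbers).
    print(max_dup_tags(2_700_000_000, 10_000_000))
```

Because the calculation depends only on depth and genome size, running it separately for TREATMENT and CONTROL gives each dataset its own limit.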
2123 ** Single wiggle mode
2125 First thing to mention, this is not the score track that I
2126 described before. By default, MACS generates wiggle files for
2127 fragment pileup for every chromosome separately. When you use
2128 --single-wig option, MACS will generate a single wiggle file for
2129 all the chromosomes so you will get a wig.gz for TREATMENT and
2130 another wig.gz for CONTROL if available.
2132 ** Sniff -- automatic format detection
2134 Now, by default or "-f AUTO", MACS will decide the input file
2135 format automatically. Technically, it will try to read at most
2136 1000 records for the first 10 non-comment lines. If it succeeds,
2137 the format is decided. I recommend not using AUTO and instead
2138 specifying the right format for your input files, unless you combine different
2139 formats in a single MACS run.
2143 --single-wig and --keep-dup are added. Check previous section in
2144 ChangeLog for detail.
2146 -f (--format) AUTO is now the default option.
2148 --slocal default: 1000
2149 --llocal default: 10000
2153 Setup script will stop the installation if python version is not
2154 python2.6 or python2.7.
2156 Local lambda calculation has been changed back. MACS will check
2157 peak_region, slocal( default 1K) and llocal (default 10K) for the
2158 local bias. The previous 200bps default caused MACS to miss
2159 some peaks where the input bias is very sharp.
2161 sam2bed.py script is corrected.
2163 Relative pos in xls output is fixed.
2165 Parser for ELAND_export is fixed to pass some of the no match
2166 lines. And elandexport2bed.py is fixed too. ( however I can't
2167 guarantee that it works on any eland_export files. )
2169 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2170 Version 1.4.0alpha2 (be smarter)
2174 --gsize now provides shortcuts for common genomes, including
2175 human, mouse, C. elegans and fruitfly.
2177 --llocal now will be 5000 bps if there is no input file, so that
2178 local lambda doesn't overkill enriched binding sites.
2180 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2181 Version 1.4alpha (be smarter)
2185 --tsize option is redesigned. MACS will use the first 10 lines of
2186 the input to decide the tag size. If user specifies --tsize, it
2187 will override the auto decided tsize.
2189 --lambdaset is replaced by --slocal and --llocal which mean the
2190 small local region and large local region.
2192 --bw has no effect on the scan-window size now. It only affects the
2193 paired-peaks model process.
2197 During the model building, MACS will pick out the enriched regions
2198 which are not too high and not too low to build the paired-peak
2199 model. By default the region is from fold 10 to fold 30. If MACS
2200 fails to build the model, by default it will use the nomodel
2201 settings, like shiftsize=100bps, to shift and extend each
2202 tag. This behavior can be turned off by '--off-auto'.
2206 An extra file including all the summit positions is saved as
2207 *_summits.bed file. An option '--call-subpeaks' will invoke
2208 PeakSplitter developed by Mali Salmon to split wide peaks into
2211 * Sniff ( will be in beta )
2213 Automatically recognize the input file format, so users can combine
2214 different formats in one MACS run.
2216 Not implemented features/TODO:
2218 * Algorithms ( in near future? )
2220 MACS will try to refine the peak boundaries by calculating the
2221 scores for every point in the candidate peak regions. The score
2222 will be the -10*log(10,pvalue) on a local poisson distribution. A
2223 cutoff specified by users (--pvalue) will be applied to find the
2224 precise sub-peaks in the original candidate peak region. Peak
2225 boundaries and peak summit positions will be saved in separate BED
2228 * Single wiggle track ( in near future? )
2230 A single wiggle track will be generated to save the scores within
2231 candidate peak regions in the 10bps resolution. The wiggle file
2232 is in fixedStep format.
2235 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2236 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2240 Fixed typo. FCSTEP -> FESTEP
2244 The 'femax' attribute bug is fixed
2246 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2247 Version 1.3.7 (Oktoberfest)
2249 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2251 Enhancements by Peter Chines:
2253 1. gzip files are supported.
2254 2. when --diag is on, user can set the increment and endpoint for
2255 fold enrichment analysis by setting --fe-step and --fe-max.
2257 Enhancements by Davide Cittaro:
2259 1. BAM and SAM formats are supported.
2260 2. small changes in the header lines of wiggle output.
2263 1. I added --fe-min option;
2264 2. Bowtie ascii output with suffix ".map" is supported.
2268 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2269 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2270 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2271 4. Some "fold change" have been changed to "fold enrichment".
2273 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2274 Version 1.3.6.1 (default parameter change)
2276 * bin/macs, lib/PeakDetect.py
2278 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2279 default. "--futurefdr" is added which can turn on the 'new' method
2280 introduced in 1.3.6. By default it's off.
2284 Fixed a bug. p-value is corrected a little bit.
2287 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2288 Version 1.3.6 (Birthday cake)
2292 "track name" is added to the header of BED output file.
2294 Now the default peak detection method is to consider 5k and 10k
2295 nearby regions in treatment data and peak location, 1k, 5k, and
2296 10k regions in control data to calculate local bias. The old
2297 method can be called through '--old' option.
2299 Information about how many total/unique tags in treatment or
2300 control will be saved in final .xls output.
2302 * lib/IO/__init__.py
2304 ".fa" will be removed from input tag alignment so only the
2305 chromosome names are kept.
2307 WigTrackI class is added for Wiggle like data structure. (not used
2310 The parser for ELAND multi PET files has been fixed. Now the 5'
2311 tag position for a pair will be kept, whereas in the previous
2312 version, the middle points are kept.
2314 * lib/IO/BinKeeper.py
2316 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2317 browser, which can quickly access certain region for values in a
2318 large wiggle like data file. (not used now)
2320 * lib/OptValidator.py
2326 Now the default peak detection method is to consider 5k and 10k
2327 nearby regions in treatment data and peak location, 1k, 5k, and
2328 10k regions in control data to calculate local bias. The old
2329 method can be called through '--old' option.
2331 Two columns have been added to BED output file. 4th column: peak
2332 name; 5th column: peak score using -10log(10,pvalue) as score.
2336 Add support to build a Mac App through 'setup.py py2app', or a
2337 Windows executable through 'setup.py py2exe'. You need to install
2338 py2app or py2exe package in order to use these functions.
2340 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2341 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2345 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2346 in control data to calculate local lambda for each peak. Peak
2347 calling results will be slightly different from previous version,
2352 Typo fixed, ELANDParser -> ELANDResultParser
2356 Now, modeled d value will be shown on the model figure.
2358 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2359 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2361 * macs, IO/__init__.py, PeakDetect.py
2363 Add support for ELAND multi format. Add support for Pair-End
2364 experiment, in this case, 5'end and 3'end ELAND multi format files
2365 are required for treatment or control data. See 00README file for
2368 Add wigextend option.
2370 Add petdist option for Pair-End Tag experiment, which is the best
2371 distance between 5' and 3' tags.
2375 Fixed a bug which caused the end position of every peak region
2376 to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
2380 Fix bugs while generating wiggle files. The start position of
2381 wiggle file is set to 1 instead of 0.
2383 Fix a bug that every 10M bps, signals in the first 'd' range are
2384 lower than actual. ( Thanks Mali Salmon!)
2387 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2388 Version 1.3.3 (wiggle bugs fixed)
2392 Fix bugs while generating wiggle files. 1. 'span=' is added to
2393 'variableStep' line; 2. previously, every 10M bps, the coordinates
2394 were wrongly shifted to the right for 'd' basepairs.
2396 * macs, PeakDetect.py
2398 Add an option to save wiggle files on different resolution.
2400 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2401 Version 1.3.2 (tiny bugs fixed)
2405 Fix 65536 -> 65535. ( Thanks Joon)
2409 Improved the binomial function for extra-large numbers. Imported
2410 from Cistrome project.
2414 If treatment channel misses reads in some chromosome included in
2415 control channel, or vice versa, MACS will not exit. (Thanks Shaun
2418 Instead, MACS will fake a tag at position -1 when calling
2419 treatment peaks vs control, but will ignore the chromosome while
2420 calling negative peaks.
2422 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2423 Version 1.3.1 (tiny bugs fixed version)
2427 Hyunjin Gene Shin contributed some code to Prob.py. Now the
2428 binomial functions can tolerate large and small numbers.
2432 Parsers now split lines in BED/ELAND file using any
2433 whitespaces. 'track' or 'browser' lines will be regarded as
2434 comment lines. A bug fixed when throwing StrandFormatError. The
2435 maximum redundant tag number at a single position can be no less
2439 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2440 Version 1.3 (naming clarification version)
2442 * Naming clarification changes according to our manuscript:
2444 'frag_len' is changed to 'd'.
2446 'fold_change' is changed to 'fold_enrichment'.
2448 Suggest '--bw' parameter to be determined by users from the real
2451 Maximum FDR is 100% in the output file.
2453 And other clarifications in 00README file and the documents on the
2457 If the redundant tag number at a single position is over 32767,
2458 just remember 32767, instead of raising an overflow exception.
2464 Bug fixed for diagnosis report.
2467 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2472 Poisson distribution CDF and inverse CDF functions are
2473 corrected. They can produce correct results even for huge lambda
2474 now. So that the p-value and FDR values in the final excel sheet
2477 IO package now can tolerate some rare cases; ELANDParser in IO
2478 package is fixed. (Thanks Bogdan)
2482 Reverse paired peaks in model are rejected. So there will be no
2483 negative 'frag_len'. (Thanks Bogdan)
2487 Diagnosis function is completed. It can output a table file for
2488 users to estimate their sequencing depth.
2491 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2494 * Probe.py is added!
2496 GSL is totally removed from MACS. Instead, I have implemented the
2497 CDF and inverse CDF for poisson and binomial distribution purely
2500 * Constants.py is added!
2502 Organize constants used in MACS in the Constants.py file.
2504 * All other files are modified!
2506 Foldchange calculation is modified. Now the foldchange is only
2507 calculated at the peak summit position instead of the whole peak
2508 region. The values will be higher and more robust than before.
2512 1. MACS can save wiggle format files containing the tag number at
2513 every 10 bp along the genome. Tags are shifted according to our
2514 model before they are calculated.
2516 2. Model building and local lambda calculation can be skipped with
2519 3. A diagnosis report can be generated through '--diag'
2520 option. This report can help you estimate the
2521 sequencing saturation. This function is only in beta stage.
2523 4. FDR calculation speed is highly improved.
2525 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2528 * TabIO, PeakModel.py ...
2529 Bug fixed to let MACS tolerate some cases where there is no tag on
2530 either plus strand or minus strand.
2533 Check the version of python. If the version is lower than 2.4,
2534 refuse to install with warning.