1 2024-10-08 Tao Liu <vladimir.liu@gmail.com>
6 1) We implemented the IO module for reading the fragment files
7 usually used in single-cell ATAC-seq experiment
8 `Parser.FragParser`. And we implemented a new
9 `PairedEndTrack.PETrackII` to store the data in fragment file,
10 including the barcodes and counts information. In the `PETrackII`
11 class, we are able to extract a subset using a list of barcodes,
12 which enables us to call peaks only on a pool (pseudo-bulk) of
15 2) We extensively rewrote the `pyx` codes into `py` codes. In
16 another words, we now apply the 'pure python style' with PEP-484
17 type annotations to our previous Cython style codes. So that, the
18 source codes can be more compatible to Python programming tools
19 such as `flake8`. During rewritting, we cleaned the source codes
20 even more, and removed unnecessary dependencies during
25 1) Fix issues in big-endian system in `Parser.py` codes. Enable
26 big-endian support in `BAM.py` codes for accessig certain
27 alignment records that overlap with givin genomic
28 coordinates using BAM/BAI files.
32 1) Explanation on the filtering criteria on SAM/BAM/BAMPE files.
34 2024-09-06 Tao Liu <vladimir.liu@gmail.com>
39 1) Introduce a new emission model for the `hmmratac` function. Now
40 users can choose the simpler Poisson emission `--hmm-type poisson`
41 instead of the default Gaussian emission. As a consequence, the
42 saved HMM model file in json will include the hmm-type information
43 as well. Note that in order to be compatible with the HMM model
44 file from previous version, if there is no hmm-type information in
45 the model file, the hmm-type will be assigned as gaussian. #635
47 2) `hmmratac` now output narrowPeak format output. The summit
48 position and the peak score columns reported in the narrowPeak
49 output represents the position with highest foldchange value
50 (pileup vs average background).
52 3) Add `--cutoff-analysis-steps` and `--cutoff-analysis-max` for
53 `callpeak`, `bdgpeakcall`, and `hmmratac` so that we can
54 have finer resolution of the cutoff analysis report. #636 #642
56 4) Reduce memory usage of `hmmratac` during decoding step, by
57 writing decoding results to a temporary file on disk (file
58 location depends on the environmental TEMP setting), then loading
59 it back while identifying state pathes. This change will decrease
60 the memory usage dramatically. #628 #640
62 5) Fix instructions for preparing narrowPeak files for uploading
63 to UCSC browser, with the `--trackline` option in `callpeak`. #653
65 6) For gappedPeak output, set thickStart and thickEnd columns as
66 0, according to UCSC definition.
70 1) Use `-O3` instead of `-Ofast` for compatibility. #637
74 1) Update instruction to install macs3 through conda/bioconda
76 2) Reorganize MACS3 docs and publish through
77 https://macs3-project.github.io/MACS
79 3) Description on various file formats used in MACS3.
81 2024-02-19 Tao Liu <vladimir.liu@gmail.com>
86 1) Fixed a bug that the `hmmatac` can't correctly save the
87 digested signal files. #605 #611
89 2) Applied a patch to remove cython requirement from the installed
90 system. (it's needed for building the package). #606 #612
92 3) Relax the testing script while comparing the peaks called from
93 current codes and the standard peaks. To implement this, we added
94 'intersection' function to 'Regions' class to find the
95 intersecting regions of two Regions object (similar to PeakIO but
96 only recording chromosome, start and end positions). And we
97 updated the unit test 'test_Region.py' then implemented a script
98 'jaccard.py' to compute the Jaccard Index of two peak files. If
99 the JI > 0.99 we would think the peaks called and the standard
100 peaks are similar. This is to avoid the problem caused by
101 different Numpy/SciPy/sci-kit learn libraries, when certain peak
102 coordinates may have 10bps difference. #615 #619
104 4) Due to the changes in scikit-learn 1.3.0:
105 https://scikit-learn.org/1.3/whats_new/v1.3.html: The way hmmlearn
106 0.3 uses Kmeans will end up with inconsistent results between
107 sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
108 hmm.GaussianHMM and adjusted the standard output from `hmmratac`
109 subcommand. The change is based on
110 https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
111 the random seeding of KMeans 10 times. Now the `hmmratac` results
112 should be more consistent (at least JI>0.99). #615 #620
116 1) We added some dependencies to MACS3. `hmmratc` subcommand needs
117 `hmmlearn` library, `hmmlearn` needs `scikit-learn` and
118 `scikit-learn` needs `scipy`. Since major releases have happened
119 for both`scipy` and `scikit-learn`, we have to set specific
120 version requirements for them in order to make sure the output
121 results from `hmmratac` are consistent.
123 2) We updated our documentation website using
124 Sphinx. https://macs3-project.github.io/MACS/
126 2023-11-15 Tao Liu <vladimir.liu@gmail.com>
129 1) Call variants in peak regions directly from BAM files. The
130 function was originally developed under code name SAPPER. Now
131 SAPPER has been merged into MACS as the `callvar` command. It can
132 be used to call SNVs and small INDELs directly from alignment
133 files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
134 the DNA sequence at the enriched genomic regions (binding sites or
135 accessible DNA) and to refine the alignment when necessary. We
136 added `simde` as a submodule in order to support fermi-lite
137 library under non-x64 architectures.
139 2) HMMRATAC module is added as subcommand `hmmratac`. HMMRATAC is
140 a dedicated software to analyze ATAC-seq data. The basic idea
141 behind HMMRATAC is to digest ATAC-seq data according to the
142 fragment length of read pairs into four signal tracks: short
143 fragments, mono-nucleosomal fragments, di-nucleosomal fragments
144 and tri-nucleosomal fragments. Then integrate the four tracks
145 again using Hidden Markov Model to consider three hidden states:
146 open region, nucleosomal region, and background region. The
147 orginal paper was published in 2019 written in JAVA, by Evan
148 Tarbell. We implemented it in Python/Cython and optimize the whole
149 process using existing MACS functions and hmmlearn. Now it can run
150 much faster than the original JAVA version. Note: evaluation of
151 the peak calling results is still underway.
153 3) Speed/memory optimization. Use the cykhash to replace python
154 dictionary. Use buffer (10MB) to read and parse input file (not
155 available for BAM file parser). And many optimization tweaks. We
156 added memory monitoring to the runtime messages.
158 4) R wrappers for MACS -- MACSr for bioconductor.
160 5) Code cleanup. Reorganize source codes.
164 7) Switch to Github Action for CI, support multi-arch testing
165 including x64, armv7, aarch64, s390x and ppc64le. We also test on
168 8) MACS tag-shifting model has been refined. Now it will use a
169 naive peak calling approach to find ALL possible paired peaks at +
170 and - strand, then use all of them to calculate the
171 cross-correlation. (a related bug has been fix
172 [#442](https://github.com/macs3-project/MACS/issues/442))
174 9) BAI index and random access to BAM file now is
175 supported. [#449](https://github.com/macs3-project/MACS/issues/449).
177 10) Support of Python > 3.10
178 [#498](https://github.com/macs3-project/MACS/issues/498)
180 11) The effective genome size parameters have been updated
182 deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
184 12) Multiple updates regarding dependencies, anaconda built, CI/CD
187 13) Cython 3 is supported.
189 14) Documentations for each subcommand can be found under /docs
193 1) Missing header line while no peaks can be called
194 [#501](https://github.com/macs3-project/MACS/issues/501)
195 [#502](https://github.com/macs3-project/MACS/issues/502)
197 2) Note: different numpy, scipy, sklearn may give slightly
198 different results for hmmratac results. The current standard
199 results for automated testing in `/test` directory are from Numpy
200 1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
202 2020-04-11 Tao Liu <vladimir.liu@gmail.com>
207 Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
210 2020-04-10 Tao Liu <vladimir.liu@gmail.com>
215 1) MACS2 has been tested on multiple architectures to make sure it
216 can successfully generate consistent results. Currently the
217 supported architectures are: AMD64, ARM64, i386, PPC64LE, and
218 S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
219 #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
220 and #370. The lesson is that if the project is built on Cython and
221 is aimed at memory efficiency, we should specifically define all
222 int/float types in pyx files such as int8_t or uint32_t using
223 either libc or numpy (c version) instead of relying on Cython
224 types such as short, long, double.
226 2) MACS2 setup script will check numpy and install numpy if
227 necessary. PR #378, issue #364
229 3) `bdgbroadcall` command will correctly add the score column (5th
230 column). The score (5th) column contains 10 times of the average
231 score in the broad region. PR #373, issue #362
233 4) The missing test on `bdgopt` subcommand has been added. PR #363
235 5) The obsolete option `--ratio` from `callpeak` subcommand has
236 been removed. PR #369, issue #366
238 6) Fixed the incorrect description in README on the 'maximum
239 length of broad region is 4 times of d' to 'maximum gap for
240 merging broad regions is 4 times of tag size by default'. PR #380,
245 1) CODE OF CONDUCT document has been added to MACS2 github
248 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
253 1) Speed up MACS2. Some programming tricks and code cleanup. The
254 filter_dup function replaces separate_dups. The later one was
255 implemented for potentially putting back duplicate reads in
256 certain downstream analysis. However such analysis hasn't been
257 implemented. Optimize the speed of writing bedGraph
258 files. Optimize BAM and BAMPE parsing with pointer casting instead
261 2) The comment lines in the headers of BED or SAM files will be
262 correctly skipped. However, MACS2 won't check comment lines in the
267 1) Cutoff-analysis in callpeak command. #341
269 2) Issues related to SAMParser and three ELAND Parsers are
274 1) cmdlinetest script in test/ folder has been updated to: 1. test
275 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
276 and after the error or warning message during tests; 3. output
277 only the first 10 lines if the difference between test result and
278 standard result can be found; 4. prockreport monitor CPU time and
279 memory usage in 1 sec interval -- a bit more accurate.
281 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
283 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
284 MACS version 2.2.5 (Py3 speed up)
288 1) *Github code only and Not included in MACS2 release* New
289 testing data for performance test. An subsampled ENCODE2 CTCF
290 ChIP-seq dataset, including 5million ChIP reads and 5 million
291 control reads, has been included in the test folder for testing
292 CPU and memory usage (i.e. 5M test). Several related scripts ,
293 including `prockreport` for output cpu memory usage, `pyprofile`
294 and `pyprofile_stat` for debuging and profiling MACS2 codes, have
297 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
298 The old hashtable.pyx implementation copied from Pandas (very old
299 version) doesn't work well in Python3+Cython. It slows down the
300 pqtable checkup using the identical Cython codes as in
301 v2.1.4. While running 5M test, the `__getitem__` function in the
302 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
303 148.6s with the same number of calls in MACS2 v2.2.4. As a
304 consequence, the standard python dictionary implementation has
305 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
306 faster than py2 version, but uses a bit more memory. In general,
307 v2.2.5 can finish 5M reads test in 20% less time than MACS2
308 v2.1.4, but use 15% more memory.
312 1) More Python3 related fixes, e.g. the return value of keys from
316 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
317 MACS version 2.2.4 (Python3)
321 1) First Python3 version MACS2 released.
323 2) Version number 2.2.X will be used for MACS2 in Python3, in
326 3) More comprehensive test.sh script to check the consistency of
327 results from Python2 version and Python3 version.
329 4) Simplify setup.py script since the newest version transparently
330 supports cython. And when cython is not installed by the user,
331 setup.py can still compile using only C codes.
333 5) Fix Signal.pyx to use np.array instead of np.mat.
335 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
340 Github Actions is used together with Travis CI for testing and
347 1) #318 Random score in bdgdiff output. It turns out the sum_v is
348 not initialized as 0 before adding. Potential bugs are fixed in
349 other functions in ScoreTrack and CallPeakUnit codes.
351 2) #321 Cython dependency in setup.py script is removed. And place
352 'cythonzie' call to the correct position.
354 3) A typo is fixed in Github Actions script.
356 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
361 1) Support Docker auto-deploy. PR #309
363 2) Support Travis CI auto-testing, update unit-testing
364 scripts, and enable subcommand testing on small datasets.
366 3) Update README documents. #297 PR #306
368 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
369 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
371 5) `--d-min` option is added in `callpeak` and `predictd`, to
372 exclude predictions of fragment size smaller than the given
373 value. Merged from PR #267 @shouldsee.
375 6) `--buffer-size` option is added in `predictd`, `filterdup`,
376 `pileup` and `refinepeak` subcommands. Users can use this option
377 to decrease memory usage while there are a large number of contigs
378 in the data. Also, now `callpeak`, `predictd`, `filterdup`,
379 `pileup` and `refinepeak` will suggest users to tweak
380 `--buffer-size` while catching a MemoryError. #313 PR #314
384 1) #265 Fixed a bug where the pseudocount hasn't been applied
385 while calculating p-value score in ScoreTrack object.
387 2) Fixed bdgbroadcall so that it will report those broad peaks
388 without strong peak inside, a consistent behavior as `callpeak
391 3) Rename COPYING to LICENSE.
393 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
398 1) Added missing BEDPE support. And enable the support for BAMPE
399 and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
400 subcommands. When format is BAMPE or BEDPE, The 'pileup' command
401 will pile up the whole fragment defined by mapping locations of
402 the left end and right end of each read pair. Thank @purcaro
404 2) Added options to callpeak command for tweaking max-gap and
405 min-len during peak calling. Thank @jsh58!
407 3) The callpeak option "--to-large" option is replaced with
410 4) The randsample option "-t" has been replaced with "-i".
414 1) Fixed memory issue related to #122 and #146
416 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
418 3) Fixed a bug while setting commandline qvalue cutoff.
420 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
422 5) Fixed the calculation of average fragment length for paired-end
425 6) Fixed bugs caused by khash while computing p/q-value and log
426 likelihood ratios. Thank @jsh58
428 7) More spelling tweaks in source code. Thank @mr-c
430 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
431 MACS version 2.1.1 20160309
435 * Fixed spelling. Merged pull request #120. Thank @mr-c!
437 * Change filtering criteria for reading BAM/SAM files
439 Related to callpeak and filterdup commands. Now the
440 reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
441 still be read although MACS2 may decide them as duplicates
442 later. Related to old issue #33. Sorry I forgot to address it for
445 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
446 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
450 1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
451 the former option is not supported by older GCC. Related to issues
454 2) Issue #108 is fixed. If no peak can be found in a chromosome,
455 the PeakIO won't throw an error.
461 a) A more flexible format, BEDPE, is supported. Now users can
462 define the left and right position of the ChIPed fragment, and
463 MACS2 will skip model building and directly pileup the
464 fragments. Related to issue #112.
466 b) The 'tempdir' can be specified, to save cached pileup
467 tracks. Originially, the temporary files were stored in
468 /tmp. Thank @daler! Related to issues #97 and #105.
472 New operations are added, to calculate the maximum or minimum value between
473 values in BEDGRAPH and given value.
477 New method is added, to calculate the maximum value between values
478 defined in two BEDGRAPH files.
480 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
481 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
485 1) Fix a bug while dealing with some chromosomes only containing
486 one read (pair). The size of dup_plus/dup_minus arrays after
487 filtering dups should +1.
489 2) Fix a bug related to the broad peak calling function in
490 previous versions. The gaps were miscalculated, so segmented weak
491 broad calls may be reported, and sometimes you would see peaks
492 with lower than cutoff values in the output files.
494 3) "Potentially" Fixed issue #105 on temporary cache files, need
498 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
499 MACS version 2.1.0 20150731 (tag:rc)
503 1) Fixed issue #76: information about broad/narrow cutoff will be
506 2) Fixed issue #79: bdgopt extparam option is fixed.
508 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
509 for filterdup command.
511 4) Fixed issue #78, #88 and similar issue reported in MACS google
512 group: MACS2 now can correctly deal with multiple alignment files
513 for -t or -c. The 'finalize' function will be correctly
514 called. Multiple files option is enabled for filterdup,
515 randsample, predictd, pileup and refinepeak commands.
517 5) A related issue to #88, when BAMPE mode is used, PE pairs will
518 be sorted by leftmost then rightmost ends.
520 6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
521 array. This will cause 'callpeak --nolambda' hang forever while
522 calculating pvalues and qvalues.
524 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
525 MACS version 2.1.0 20150420 (tag:rc)
529 1) bdgopt: some convenient functions to modify bedGraph files.
531 2) cmbreps: Combine scores from two replicates. Including three
532 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
533 method to combine two p-value scores. After that, user can use
534 bdgpeakcall to call peaks on combined scores.
538 1) callpeak and bdgpeakcall now can try to analyze the
539 relationship between p-values and number/length of peaks then
540 generate a summary to help users decide an appropriate cutoff.
542 2) callpeak now can accept fold-enrichment cutoff as a filter for
547 Now MACS2 runs about 3X as fast as previous version. Trade
548 clean python codes for speed... Now while processing 50M ChIP vs
549 50M control, it will take only 10 minutes.
553 1) Sampling function in BAMPE mode.
555 2) Callpeak while there are >= 2 input files for -t or -c.
557 3) While reading BAM/SAM, those secondary or supplementary
558 alignments will be correctly skipped.
560 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
561 option that MACS2 will discard those SAM/BAM alignments with bit
562 1024 no matter how --keep-dup is set.
564 5) Fixed issue #49: setuptools is used intead of distutils
566 6) Fixed issue #51: fix the problem when using --trackline
567 argument when control file is absent.
569 7) Fixed issue #53: Use Use SAM/BAM CIGAR to find the 5' end of
570 read mapped to minus strand. Previous implementation will find
571 incorrect 5' end if there is indel in alignment.
573 8) Fixed issue #56: An incorrect sorting method used for BAMPE
574 mode which will cause incorrect filtering of duplicated reads. Now
577 9) Issue #63: Merged from jayhesselberth@github, extsize now can
580 10) Issue #71: Merged from aertslab@github, close file descriptor
581 after creating them with mkstemp().
583 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
584 MACS version 2.1.0 20140616 (tag:rc)
588 "--ratio" is added to manually assign the scaling factor of ChIP
589 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
590 implementing the patch file!
592 "--shift" is added to move cutting ends (5' end of reads) around,
593 in order to process DNAse-Seq data, e.g., use "--shift -100
594 --extsize 200" to get 200bps fragments around 5' ends. For general
595 ChIP-Seq data analysis, this option should be always set as
596 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
599 ** Do not output negative fragment size from cross-correlation
600 analysis. Thank Alvin Qin for the feedback!
602 ** --half-ext and --control-shift are removed. For complex read
603 shifting and extending, combine '--shift' and '--extsize'
604 options. For comparing two conditions, use 'bdgdiff' module
607 ** a bug is fixed to output the last pileup value in bdg file
612 A 'dry-run' option is added to only output numbers, including the
613 number of allowed duplicates, the total number of reads before and
614 after filtering duplicates and the estimated duplication
615 rate. Thank John Urban for the suggestion!
618 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
619 MACS version 2.0.10 20131216 (tag:alpha)
623 * We changed license from Artistic License to 3-clauses BSD license.
625 Yes. Simpler the better.
627 * Process paired-end data with "-f BAMPE" without control
629 * GappedPeak output for --broad option has been fixed again to be
630 consistent with official UCSC format. We add 1bp pseudo-block to
631 left and/or right of broad region when necessary, so that you can
632 virtualize the regions without strong enrichment inside
633 successfully. In downstream analysis except for virtualization,
634 you may need to remove all 1bps blocks from gappedPeak file.
636 * diffpeak subcommand is temporarily disabled. Till we
639 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
640 MACS version 2.0.10 20131028 (tag:alpha)
642 * callpeak --call-summits improvement
644 The smoothing window length has been fixed as fragment length
645 instead of short read length. The larger smoothing window will
646 grant better smoothing results and better sub-peak summits
649 * --outdir and --ofile options for almost all commands
651 Thank Björn Grüning for initially implementing these options!
652 Now, MACS2 will save results into a specified
653 directory by '--outdir' option, and/or save result into a
654 specified file by '--ofile' option. Note, in case '--ofile' is
655 available for a subcommand, '-o' now has been adjusted to be the
656 same as '--ofile' instead of '--o-prefix'.
658 Here is the list of changes. For more detail, use 'macs2 xxx -h'
661 ** callpeak: --outdir
662 ** diffpeak: Not implemented
663 ** bdgpeakcall: --outdir and --ofile
664 ** bdgbroadcall: --outdir and --ofile
665 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
666 and the order of arguments for --ofile must be the same as for -m.
667 ** bdgdiff: --outdir and --ofile
668 ** filterdup: --outdir
670 ** randsample: --outdir
671 ** refinepeak: --outdir and --ofile
674 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
675 MACS version 2.0.10 20130915 (tag:alpha)
677 * callpeak Added a new option --buffer-size
679 This option is to tweak a previously hidden parameter that
680 controls the steps to increase array size for storing alignment
681 information. While in some rare cases, the number of
682 chromosomes/contigs/scaffolds is huge, the original default
683 setting will cause a huge memory waste. In these cases, we
684 recommend to decrease --buffer-size (e.g., 1000) to save memory,
685 although the decrease will slow process to read alignment files.
687 * an optimization to speed up pvalue-qvalue statistics
689 Previously, it took a hour to prepare p-q-table for 65M vs 65M
690 human TF library, and now it will take 10 minutes. It was due to a
691 single line of code to get a value from a numpy array ...
695 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
696 MACS version 2.0.10 20130731 (tag:alpha)
698 * callpeak --call-summits
700 Fix bugs causing callpeak --call-summits option generating extra
701 number of peaks and inconsistent peak boundaries comparing to
702 default option. Thank Ben Levinson!
706 Fix bugs causing bdgcmp output logLR all in positive values. Now
707 'depletion' can be correctly represented as negative values.
711 Fix the behavior of bdgdiff module. Now it can take four
712 bedGraph files, then use logLR as cutoff to call differential
713 regions. Check command line of bdgdiff for detail.
715 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
716 MACS version 2.0.10 20130713 (tag:alpha)
718 * fix bugs while output broadPeak and gappedPeak.
720 Note. Those weak broad regions without any strong enrichment
721 regions inside won't be saved in gappedPeak file.
723 * bdgcmp -T and -C are merged into -S and description is updated.
725 Now, you can use it to override SPMR values in your input for
726 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
727 statistics will cause weird results ( in most cases, lower
728 significancy), and won't be consistent with MACS2 callpeak
729 behavior. So if you have SPMR bedGraphs, input the smaller/larger
730 sample size in MILLION according to 'callpeak --to-large' option.
732 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
733 MACS version 2.0.10 20130710 (tag:alpha)
735 * fix BED style output format of callpeak module:
737 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
738 the output. Old BED format file won't be saved.
740 2) with --broad: broadPeak (BED6+3) for broad region and
741 gappedPeak (BED12+3) for chained enriched regions will be the
742 output. Old BED format, narrowPeak format, summit file won't be
745 * bdgcmp now can accept list of methods to calculate scores. So
746 you can run it once to generate multiple types of scores. Thank
747 Jon Urban for this suggestion!
749 * C codes are re-generated through Cython 0.19.1.
751 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
752 MACS version 2.0.10 20130520 (tag:alpha)
754 * broad peak calling modules are modified in order to report all
755 relexed regions even there is no strong enrichment inside.
757 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
758 MACS version 2.0.10 20130501 (tag:alpha)
760 * Memory usage is decreased to about 1/4-1/5 of previous usage
761 Now, the internal data structure and algorithm are both
762 re-organized, so that intermediate data wouldn't be saved in
763 memory. Intead they will be calculated on the fly. New MACS2 will
764 spend longer time (1.5 to 2 times) however it will use less memory
765 so can be more usable on small mem servers.
767 * --seed option is added to callpeak and randsample commands
768 Thank Mathieu Gineste for this suggestion!
770 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
771 MACS version 2.0.10 20130306 (tag:alpha)
773 * diffpeak module New module to detect differential binding sites
774 with more statistics.
776 * Introduced --refine-peaks
777 Calculates reads balancing to refine peak summits
779 * Ouput file names prefix
780 Correct encodePeak to narrowPeak, broadPeak to bed12.
782 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
783 MACS version 2.0.10 (tag:alpha not released)
785 * Introduced BAMPEParser
786 Reads PE data directly, requires bedtools for now
788 * Introduced --call-summits
789 Uses signal processing methods to call overlapping peaks
791 * Added --no-trackline
792 By default, files have descriptive tracklines now
794 * new refinepeak command (experimental)
795 This new function will use a similar method in SPP (wtd), to
796 analyze raw tag distribution in peak region, then redefine the
797 peak summit where plus and minus tags are evenly distributed
800 * Changes to output *
801 cPeakDetect.pyx has full support for new print/write methods and
802 --call-peaks, BAMPEParser, and use of paired-end data
804 * Parser optimization
806 cParser.pyx is rewritten to use io.BufferedReader to speed
807 up. Speed is doubled.
809 Code is reorganized -- most of functions are inherited from
812 * Use cross-correlation to calculate fragment size
814 First, all pairs will be used in prediction for fragment
815 size. Previously, only no more than 1000 pairs are used. Second,
816 cross-correlation is used to find the best phase difference
817 between + and - tag pileups.
819 * Speed up p-value and q-value calculation
821 This part is ten times faster now. I am using a dictionary to
822 cache p-value results from Poisson CDF function. A bit more memory
823 will be used to increase speed. I hope this dictionary would not
824 explode since the possible pairs of ChIP signal and control lambda
825 are hugely redundant. Also, I rewrited part of q-value
828 * Speed up peak detection
830 This part is about hundred of times faster now. Optimizations
831 include using Numpy functions as much as possible, and making loop
832 body as small as possible.
834 * Post-processing on differential calls
836 After macs2diff finds differential binding sites between two
837 conditions, it will try to annotate the peak calls from one of two
838 conditions, describe the changes ...
840 * Fragment size prediction in macs2diff
842 Now by default, macs2diff will try to use the average fragment
843 size from both condition 1 and condition 2 for tag extension and
844 peak calling. Previously, by default, it will use different sizes
845 unless --nomodel is specified.
847 Technically, I separate model building processes out. So macs2diff
848 will build fragment sizes for condition 1 and 2 in parallel (2
849 processes maximum), then perform 4-way comparisons in parallel (4
854 Combine two p/qscore tracks together. At regions where condition 1
855 is higher than condition 2, score would be positive, otherwise,
858 * SAMParser and BAMParser
860 Bug fixed for paired-end sequencing data.
864 Fixed a bug while calling peaks from BedGraph file. It previously
865 mistakenly output same peaks multiple times at the end of
868 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
869 MACS version 2.0.9 (tag:alpha)
871 * Auto fixation on predicted d is turned off by default!
873 Previous --off-auto is now default. MACS will not automatically
874 fix d less than 2 times of tag size according to
875 --shiftsize. While tag size is getting longer nowadays, it would
876 be easier to have d less than 2 times of tag size, however d may
877 still be meaningful and useful. Please judge it using your own
882 Now, the default scaling while treatment and input are unbalanced
883 has been adjusted. By default, larger sample will be scaled down
884 linearly to match the smaller sample. In this way, background
885 noise will be reduced more than real signals, so we expect to have
886 more specific results than the other way around (i.e. --to-large
889 Also, an alternative option to randomly sample larger data
890 (--down-sample) is provided to replace default linear
891 scaling. However, this option will cause results irresproducible,
896 A new script 'randsample' is added, which can randomly sample
897 certain percentage or number of tags.
901 Now, MACS will decide peak summits according to pileup height
902 instead of qvalue scores. In this way, the summit may be more
907 MACS calculate qvalue scores as differential scores. When compare
908 two conditions (saying A and B), the maximum qscore for comparing
909 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
910 will be computed. If maxqscore_a2b is bigger, the diff score is
911 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
913 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
914 MACS version 2.0.8 (tag:alpha)
916 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
918 New script bdgbroadcall and the extra option '--broad' for macs2
919 script, can be used to call broad regions with a loose cutoff to
920 link nearby significant regions. The output is represented as
923 * MACS2/IO/cScoreTrack.pyx
925 Fix q-value calculation to generate forcefully monotonic values.
927 * bin/eland*2bed, bin/sam2bed and bin/filterdup
929 They are combined to one more powerful script called
930 "filterdup". The script filterdup can filter duplicated reads
931 according to sequencing depth and genome size. The script can also
932 convert any format supported by MACS to BED format.
934 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
935 MACS version 2.0.7 (tag:alpha)
937 * bin/macsdiff renamed to bin/bdgdiff
939 Now this script will work as a low-level finetuning tool as bdgcmp
944 A new script to take treatment and control files from two
945 condition, calculate fragment size, use local poisson to get
946 pvalues and BH process to get qvalues, then combine 4-ways result
947 to call differential sites.
949 This script can use upto 4 cpus to speed up 4-ways calculation. (
950 I am trying multiprocessing in python. )
952 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
953 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
954 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
956 All above files are modified for the new macs2diff script.
958 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
960 Now q-value 0.01 is the default cutoff. If -p is specified,
961 p-value cutoff will be used instead.
963 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
964 MACS version 2.0.6 (tag:alpha)
968 A script to call differential regions. A naive way is introduced
969 to find the regions where:
971 1. signal from condition 1 is larger than input 1 and condition 2 --
972 unique region in condition 1;
973 2. signal from condition 2 is larger than input 2 and condition 1
974 -- unique region in condition 2;
975 3. signal from condition 1 is larger than input 1, signal from
976 condition 2 is larger than input 2, however either signal from
977 condition 1 or 2 is not larger than the other.
979 Here 'larger' means the pvalue or qvalue from a Poisson test is
980 under certain cutoff.
982 (I will make another script to wrap up mulitple scripts for
983 differential calling)
985 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
986 MACS version 2.0.5 (tag:alpha)
988 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
991 Use hash to store peak information. Add back the feature to deal
992 with data without control.
994 Fix bug which incorrectly allows small peaks at the end of
997 * bin/bdgpeakcall, bin/bdgcmp
999 Fix bugs. bdgpeakcall can output encodePeak format.
1001 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1002 MACS version 2.0.4 (tag:alpha)
1006 Fix a bug, correctly assign lambda_bg while --to-small is
1007 set. Thanks Junya Seo!
1009 Add rank and num of bp columns to pvalue-qvalue table.
1013 Fix bugs to correctly deal with peakless chromosomes. Thanks
1016 Use AFDR for independent tests instead.
1020 Now MACS can output peak coordinates together with pvalue, qvalue,
1021 summit positions in a single encodePeak format (designed for
1022 ENCODE project) file. This file can be loaded to UCSC
1023 browser. Definition of some specific columns are: 5th:
1024 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1025 -log10qvalue, 10th: relative summit position to peak start.
1028 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1029 MACS version 2.0.3 (tag:alpha)
1031 * Rich output with qvalue, fold enrichment, and pileup height
1033 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1036 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1038 Now we have a similiar xls output file as before. The differences
1039 from previous file are:
1041 1. Summit now is absolute summit, instead of relative summit
1043 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1044 pileup at the peak summit;
1045 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1046 5.00 means 1e-5, simple and less confusing.
1047 4. FDR column becomes '-log10(qvalue)' column.
1048 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1049 the values at the peak summit.
1051 * Extra output files
1053 NAME_pqtable.txt contains pvalue and qvalue relationships.
1055 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1056 and -log10qvalue scores in BedGraph format. Nearby regions with
1057 the same value are not merged.
1059 * Separation of FeatIO.py
1061 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1062 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1063 implemented to store pileup, local lambda, pvalue, and qvalue
1064 alltogether in cScoreTrack.pyx.
1066 * Experimental option --half-ext
1068 Suggested by NPS algorithm, I added an experimental option
1069 --half-ext to let MACS only extends ChIP fragment around its
1070 middle point for only 1/2 d.
1072 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1073 MACS version 2.0.2 (tag:alpha)
1077 Add an error check to see if there is no common chromosome names
1078 from treatment file and control file
1080 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1082 Reduce memory usage by removing deepcopy() calls.
1084 * Modify README documents and others.
1086 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1087 MACS Version 2.0.1 (tag:alpha)
1089 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1091 Jie suggested me a brilliant simple method to pileup fragments
1092 into bedGraph track. It works extremely faster than the previous
1093 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1094 large local lambda calculation in MACSv2 now. Now I generate three
1095 bedGraphs for d-size local bias, slocal-size and llocal-size local
1096 bias, and calculate the maximum local bias as local lambda
1099 Minor: add_loc in bedGraphTrackI now can correctly merge the
1100 region with its preceding region if their value are the same.
1104 Add an option to shift control tags before extension. By default,
1105 control tags will be extended to both sides regardless of strand
1108 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1109 MACS Version 2.0.0 (tag:alpha)
1111 * Use bedGraph type to store data internally and externally.
1113 We can have theoretically one-basepair resolution profiles. 10
1114 times smaller in filesize and even smaller after converting to
1115 bigWig for visualization.
1117 * Peak calling process modified. Better peak boundary detection.
1119 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1120 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1121 one will be averaged to d size) Then calculate the maximum value
1122 of these two tracks and a global background, to have a
1123 local-lambda bedGraph.
1125 Use -10log10poisson_pvalue as scores to generate a score track
1126 before peak calling.
1128 A general peak calling based on a score cutoff, min length of peak
1129 and max gap between nearby peaks.
1133 Wiggle file output is removed. Now we only support bedGraph
1134 output. The generation of bedGraph is highly recommended since it
1135 will not cost extra time. In other words, bedGraph generation is
1136 internally run even you don't want to save bedGraphs on disk, due
1137 to the peak calling algorithm in MACS v2.
1141 We now can calculate poisson pvalue in log space so that the score
1142 (-10*log10pvalue) will not have a upper limit of 3100 due to
1143 precision of float number.
1145 * Cython is adopted to speed up Python code.
1147 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1150 * Replaced with a newest WigTrackI class and fixed the wignorm script.
1152 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1153 Version 1.4.0rc2 (Valentine)
1155 * --single-wig option is renamed to --single-profile
1157 * BedGraph output with --bdg or -B option.
1159 The BedGraph output provides 1bp resolution fragment pileup
1160 profile. File size is smaller than wig file. This option can be
1161 combined with --single-profile option to produce a bedgraph file
1162 for the whole genome. This option can also make --space,
1163 --call-subpeaks invalid.
1165 * Fix the description of --shiftsize to correctly state that the
1166 value is 1/2 d (fragment size).
1168 * Fix a bug in the call to __filter_w_control_tags when control is
1171 * Fix a bug on --to-small option. Now it works as expected.
1173 * Fix a bug while counting the tags in candidate peak region, an
1174 extra tag may be included. (Thanks to Jake Biesinger!)
1176 * Fix the bug for the peaks extended outside of chromosome
1177 start. If the minus strand tag goes outside of chromosome start
1178 after extension of d, it will be thrown out.
1180 * Post-process script for a combined wig file:
1182 The "wignorm" command can be called after a full run of MACS14 as
1183 a postprocess. wignorm can calculate the local background from the
1184 control wig file from MACS14, then use either foldchange,
1185 -10*log10(pvalue) from possion test, or difference after asinh
1186 transformation as the score to build a single wig track to
1187 represent the binding strength. This script will take a
1188 significant long time to process.
1190 * --wigextend has been obsoleted.
1192 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1193 Version 1.4.0rc1 (Starry Sky)
1195 * Duplicate reads option
1197 --keep-dup behavior is changed. Now user can specify how many
1198 reads he/she wants to keep at the same genomic location. 'auto' to
1199 let MACS decide the number based on binomial distribution, 'all'
1200 to let MACS keep all reads.
1202 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1204 By default, MACS will now scale the smaller dataset to the bigger
1205 dataset. For instance, if IP has 10 million reads, and Input has 5
1206 million, MACS will double the lambda value calculated from Input
1207 reads while calling BOTH the positive peaks and negative
1208 peaks. This will address the issue caused by unbalanced numbers of
1209 reads from IP and Input. If --to-small is turned on, MACS will
1210 scale the larger dataset to the smaller one. So from now on, if d
1211 is fixed, then the peaks from a MACS call for A vs B should be
1212 identical to the negative peaks from a B vs A.
1214 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1215 Version 1.4.0beta (summer wishes)
1221 The default behavior in the model building step is slightly
1222 changed. When MACS can't find enough pairs to build model
1223 (implemented in alpha version) or the modeled fragment length is
1224 less than 2 times of tag length (implemented in beta version),
1225 MACS will use 2 times of --shiftsize value as fragment length in
1226 the later analysis. --off-auto can turn off this default behavior.
1228 ** Redundant tag filtering
1230 The IO module is rewritten. The redundant tag filtering process
1231 becomes simpler and works as promise. The maximum allowed number
1232 of tags at the exact same location is calculated from the
1233 sequencing depth and genome size using a binomial distribution,
1234 for both TREAMENT and CONTROL separately. ( previously only
1235 TREATMENT is considered ) The exact same location means the same
1236 coordination and the same strand. Then MACS will only keep at most
1237 this number of tags at the exact same location in the following
1238 analysis. An option --keep-dup can let MACS skip the filtering and
1239 keep all the tags. However this may bring in a lot of sequencing
1240 bias, so you may get many false positive peaks.
1242 ** Single wiggle mode
1244 First thing to mention, this is not the score track that I
1245 described before. By default, MACS generates wiggle files for
1246 fragment pileup for every chromosomes separately. When you use
1247 --single-wig option, MACS will generate a single wiggle file for
1248 all the chromosomes so you will get a wig.gz for TREATMENT and
1249 another wig.gz for CONTROL if available.
1251 ** Sniff -- automatic format detection
1253 Now, by default or "-f AUTO", MACS will decide the input file
1254 format automatically. Technically, it will try to read at most
1255 1000 records for the first 10 non-comment lines. If it succeeds,
1256 the format is decided. I recommend not to use AUTO and specify the
1257 right format for your input files, unless you combine different
1258 formats in a single MACS run.
1262 --single-wig and --keep-dup are added. Check previous section in
1263 ChangeLog for detail.
1265 -f (--format) AUTO is now the default option.
1267 --slocal default: 1000
1268 --llocal default: 10000
1272 Setup script will stop the installation if python version is not
1273 python2.6 or python2.7.
1275 Local lambda calculation has been changed back. MACS will check
1276 peak_region, slocal( default 1K) and llocal (default 10K) for the
1277 local bias. The previous 200bps default will cause MACS misses
1278 some peaks where the input bias is very sharp.
1280 sam2bed.py script is corrected.
1282 Relative pos in xls output is fixed.
1284 Parser for ELAND_export is fixed to pass some of the no match
1285 lines. And elandexport2bed.py is fixed too. ( however I can't
1286 guarantee that it works on any eland_export files. )
1288 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1289 Version 1.4.0alpha2 (be smarter)
1293 --gsize now provides shortcuts for common genomes, including
1294 human, mouse, C. elegans and fruitfly.
1296 --llocal now will be 5000 bps if there is no input file, so that
1297 local lambda doesn't overkill enriched binding sites.
1299 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1300 Version 1.4alpha (be smarter)
1304 --tsize option is redesigned. MACS will use the first 10 lines of
1305 the input to decide the tag size. If user specifies --tsize, it
1306 will override the auto decided tsize.
1308 --lambdaset is replaced by --slocal and --llocal which mean the
1309 small local region and large local region.
1311 --bw has no effect on the scan-window size now. It only affects the
1312 paired-peaks model process.
1316 During the model building, MACS will pick out the enriched regions
1317 which are not too high and not too low to build the paired-peak
1318 model. Default the region is from fold 10 to fold 30. If MACS
1319 fails to build the model, by default it will use the nomodel
1320 settings, like shiftsize=100bps, to shift and extend each
1321 tags. This behavior can be turned off by '--off-auto'.
1325 An extra file including all the summit positions are saved in
1326 *_summits.bed file. An option '--call-subpeaks' will invoke
1327 PeakSplitter developed by Mali Salmon to split wide peaks into
1330 * Sniff ( will in beta )
1332 Automatically recognize the input file format, so use can combine
1333 different format in one MACS run.
1335 Not implemented features/TODO:
1337 * Algorithms ( in near future? )
1339 MACS will try to refine the peak boundaries by calculating the
1340 scores for every point in the candidate peak regions. The score
1341 will be the -10*log(10,pvalue) on a local poisson distribution. A
1342 cutoff specified by users (--pvalue) will be applied to find the
1343 precise sub-peaks in the original candidate peak region. Peak
1344 boudaries and peak summits positions will be saved in separate BED
1347 * Single wiggle track ( in near future? )
1349 A single wiggle track will be generated to save the scores within
1350 candidate peak regions in the 10bps resolution. The wiggle file
1351 is in fixedStep format.
1354 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1355 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1359 Fixed typo. FCSTEP -> FESTEP
1363 The 'femax' attribute bug is fixed
1365 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1366 Version 1.3.7 (Oktoberfest)
1368 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1370 Enhancements by Peter Chines:
1372 1. gzip files are supported.
1373 2. when --diag is on, user can set the increment and endpoint for
1374 fold enrichment analysis by setting --fe-step and --fe-max.
1376 Enhancements by Davide Cittaro:
1378 1. BAM and SAM formats are supported.
1379 2. small changes in the header lines of wiggle output.
1382 1. I added --fe-min option;
1383 2. Bowtie ascii output with suffix ".map" is supported.
1387 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1388 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1389 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1390 4. Some "fold change" have been changed to "fold enrichment".
1392 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1393 Version 1.3.6.1 (default parameter change)
1395 * bin/macs, lib/PeakDetect.py
1397 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1398 default. "--futurefdr" is added which can turn on the 'new' method
1399 introduced in 1.3.6. By default it's off.
1403 Fixed a bug. p-value is corrected a little bit.
1406 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1407 Version 1.3.6 (Birthday cake)
1411 "track name" is added to the header of BED output file.
1413 Now the default peak detection method is to consider 5k and 10k
1414 nearby regions in treatment data and peak location, 1k, 5k, and
1415 10k regions in control data to calculate local bias. The old
1416 method can be called through '--old' option.
1418 Information about how many total/unique tags in treatment or
1419 control will be saved in final .xls output.
1421 * lib/IO/__init__.py
1423 ".fa" will be removed from input tag alignment so only the
1424 chromosome names are kept.
1426 WigTrackI class is added for Wiggle like data structure. (not used
1429 The parser for ELAND multi PET files has been fixed. Now the 5'
1430 tag position for a pair will be kept, whereas in the previous
1431 version, the middle points are kept.
1433 * lib/IO/BinKeeper.py
1435 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1436 browser, which can quickly access certain region for values in a
1437 large wiggle like data file. (not used now)
1439 * lib/OptValidator.py
1445 Now the default peak detection method is to consider 5k and 10k
1446 nearby regions in treatment data and peak location, 1k, 5k, and
1447 10k regions in control data to calculate local bias. The old
1448 method can be called through '--old' option.
1450 Two columns have beed added to BED output file. 4th column: peak
1451 name; 5th column: peak score using -10log(10,pvalue) as score.
1455 Add support to build a Mac App through 'setup.py py2app', or a
1456 Windows executable through 'setup.py py2exe'. You need to install
1457 py2app or py2exe package in order to use these functions.
1459 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1460 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1464 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1465 in control data to calculate local lambda for each peak. Peak
1466 calling results will be slightly different with previous version,
1471 Typo fixed, ELANDParser -> ELANDResultParser
1475 Now, modeled d value will be shown on the model figure.
1477 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1478 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1480 * macs, IO/__init__.py, PeakDetect.py
1482 Add support for ELAND multi format. Add support for Pair-End
1483 experiment, in this case, 5'end and 3'end ELAND multi format files
1484 are required for treatment or control data. See 00README file for
1487 Add wigextend option.
1489 Add petdist option for Pair-End Tag experiment, which is the best
1490 distance between 5' and 3' tags.
1494 Fixed a bug which cause the end positions of every peak region
1495 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
1499 Fix bugs while generating wiggle files. The start position of
1500 wiggle file is set to 1 instead of 0.
1502 Fix a bug that every 10M bps, signals in the first 'd' range are
1503 lower than actual. ( Thanks Mali Salmon!)
1506 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1507 Version 1.3.3 (wiggle bugs fixed)
1511 Fix bugs while generating wiggle files. 1. 'span=' is added to
1512 'variableStep' line; 2. previously, every 10M bps, the coordinates
1513 were wrongly shifted to the right for 'd' basepairs.
1515 * macs, PeakDetect.py
1517 Add an option to save wiggle files on different resolution.
1519 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1520 Version 1.3.2 (tiny bugs fixed)
1524 Fix 65536 -> 65535. ( Thank Joon)
1528 Improved for binomial function with extra large number. Imported
1529 from Cistrome project.
1533 If treatment channel misses reads in some chromosome included in
1534 control channel, or vice versa, MACS will not exit. (Thank Shaun
1537 Instead, MACS will fake a tag at position -1 when calling
1538 treatment peaks vs control, but will ignore the chromosome while
1539 calling negative peaks.
1541 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1542 Version 1.3.1 (tiny bugs fixed version)
1546 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1547 binomial functions can tolerate large and small numbers.
1551 Parsers now split lines in BED/ELAND file using any
1552 whitespaces. 'track' or 'browser' lines will be regarded as
1553 comment lines. A bug fixed when throwing StrandFormatError. The
1554 maximum redundant tag number at a single position can be no less
1558 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1559 Version 1.3 (naming clarification version)
1561 * Naming clarification changes according to our manuscript:
1563 'frag_len' is changed to 'd'.
1565 'fold_change' is changed to 'fold_enrichment'.
1567 Suggest '--bw' parameter to be determined by users from the real
1570 Maximum FDR is 100% in the output file.
1572 And other clarifications in 00README file and the documents on the
1576 If the redundant tag number at a single position is over 32767,
1577 just remember 32767, instead of raising an overflow exception.
1583 Bug fixed for diagnosis report.
1586 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1591 Poisson distribution CDF and inverse CDF functions are
1592 corrected. They can produce right results even for huge lambda
1593 now. So that the p-value and FDR values in the final excel sheet
1596 IO package now can tolerate some rare cases; ELANDParser in IO
1597 package is fixed. (Thank Bogdan)
1601 Reverse paired peaks in model are rejected. So there will be no
1602 negative 'frag_len'. (Thank Bogdan)
1606 Diagnosis function is completed. Which can output a table file for
1607 users to estimate their sequencing depth.
1610 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1613 * Probe.py is added!
1615 GSL is totally removed from MACS. Instead, I have implemented the
1616 CDF and inverse CDF for poisson and binomial distribution purely
1619 * Constants.py is added!
1621 Organize constants used in MACS in the Constants.py file.
1623 * All other files are modified!
1625 Foldchange calculation is modified. Now the foldchange only be
1626 calculated at the peak summit position instead of the whole peak
1627 region. The values will be higher and more robust than before.
1631 1. MACS can save wiggle format files containing the tag number at
1632 every 10 bp along the genome. Tags are shifted according to our
1633 model before they are calculated.
1635 2. Model building and local lambda calculation can be skipped with
1638 3. A diagnosis report can be generated through '--diag'
1639 option. This report can help you get an assumption about the
1640 sequencing saturation. This funtion is only in beta stage.
1642 4. FDR calculation speed is highly improved.
1644 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1647 * TabIO, PeakModel.py ...
1648 Bug fixed to let MACS tolerate some cases while there is no tag on
1649 either plus strand or minus strand.
1652 Check the version of python. If the version is lower than 2.4,
1653 refuse to install with warning.
1656 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1657 MACS version 2.0.10 20130731 (tag:alpha)
1659 * callpeak --call-summits
1661 Fix bugs causing callpeak --call-summits option generating extra
1662 number of peaks and inconsistent peak boundaries comparing to
1663 default option. Thank Ben Levinson!
1667 Fix bugs causing bdgcmp output logLR all in positive values. Now
1668 'depletion' can be correctly represented as negative values.
1672 Fix the behavior of bdgdiff module. Now it can take four
1673 bedGraph files, then use logLR as cutoff to call differential
1674 regions. Check command line of bdgdiff for detail.
1676 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1677 MACS version 2.0.10 20130713 (tag:alpha)
1679 * fix bugs while output broadPeak and gappedPeak.
1681 Note. Those weak broad regions without any strong enrichment
1682 regions inside won't be saved in gappedPeak file.
1684 * bdgcmp -T and -C are merged into -S and description is updated.
1686 Now, you can use it to override SPMR values in your input for
1687 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1688 statistics will cause weird results ( in most cases, lower
1689 significancy), and won't be consistent with MACS2 callpeak
1690 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1691 sample size in MILLION according to 'callpeak --to-large' option.
1693 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1694 MACS version 2.0.10 20130710 (tag:alpha)
1696 * fix BED style output format of callpeak module:
1698 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1699 the output. Old BED format file won't be saved.
1701 2) with --broad: broadPeak (BED6+3) for broad region and
1702 gappedPeak (BED12+3) for chained enriched regions will be the
1703 output. Old BED format, narrowPeak format, summit file won't be
1706 * bdgcmp now can accept list of methods to calculate scores. So
1707 you can run it once to generate multiple types of scores. Thank
1708 Jon Urban for this suggestion!
1710 * C codes are re-generated through Cython 0.19.1.
1712 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1713 MACS version 2.0.10 20130520 (tag:alpha)
1715 * broad peak calling modules are modified in order to report all
1716 relexed regions even there is no strong enrichment inside.
1718 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1719 MACS version 2.0.10 20130501 (tag:alpha)
1721 * Memory usage is decreased to about 1/4-1/5 of previous usage
1722 Now, the internal data structure and algorithm are both
1723 re-organized, so that intermediate data wouldn't be saved in
1724 memory. Intead they will be calculated on the fly. New MACS2 will
1725 spend longer time (1.5 to 2 times) however it will use less memory
1726 so can be more usable on small mem servers.
1728 * --seed option is added to callpeak and randsample commands
1729 Thank Mathieu Gineste for this suggestion!
1731 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1732 MACS version 2.0.10 20130306 (tag:alpha)
1734 * diffpeak module New module to detect differential binding sites
1735 with more statistics.
1737 * Introduced --refine-peaks
1738 Calculates reads balancing to refine peak summits
1740 * Ouput file names prefix
1741 Correct encodePeak to narrowPeak, broadPeak to bed12.
1743 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1744 MACS version 2.0.10 (tag:alpha not released)
1746 * Introduced BAMPEParser
1747 Reads PE data directly, requires bedtools for now
1749 * Introduced --call-summits
1750 Uses signal processing methods to call overlapping peaks
1752 * Added --no-trackline
1753 By default, files have descriptive tracklines now
1755 * new refinepeak command (experimental)
1756 This new function will use a similar method in SPP (wtd), to
1757 analyze raw tag distribution in peak region, then redefine the
1758 peak summit where plus and minus tags are evenly distributed
1761 * Changes to output *
1762 cPeakDetect.pyx has full support for new print/write methods and
1763 --call-peaks, BAMPEParser, and use of paired-end data
1765 * Parser optimization
1767 cParser.pyx is rewritten to use io.BufferedReader to speed
1768 up. Speed is doubled.
1770 Code is reorganized -- most of functions are inherited from
1771 GenericParser class.
1773 * Use cross-correlation to calculate fragment size
1775 First, all pairs will be used in prediction for fragment
1776 size. Previously, only no more than 1000 pairs are used. Second,
1777 cross-correlation is used to find the best phase difference
1778 between + and - tag pileups.
1780 * Speed up p-value and q-value calculation
1782 This part is ten times faster now. I am using a dictionary to
1783 cache p-value results from Poisson CDF function. A bit more memory
1784 will be used to increase speed. I hope this dictionary would not
1785 explode since the possible pairs of ChIP signal and control lambda
1786 are hugely redundant. Also, I rewrited part of q-value
1789 * Speed up peak detection
1791 This part is about hundred of times faster now. Optimizations
1792 include using Numpy functions as much as possible, and making loop
1793 body as small as possible.
1795 * Post-processing on differential calls
1797 After macs2diff finds differential binding sites between two
1798 conditions, it will try to annotate the peak calls from one of two
1799 conditions, describe the changes ...
1801 * Fragment size prediction in macs2diff
1803 Now by default, macs2diff will try to use the average fragment
1804 size from both condition 1 and condition 2 for tag extension and
1805 peak calling. Previously, by default, it will use different sizes
1806 unless --nomodel is specified.
1808 Technically, I separate model building processes out. So macs2diff
1809 will build fragment sizes for condition 1 and 2 in parallel (2
1810 processes maximum), then perform 4-way comparisons in parallel (4
1815 Combine two p/qscore tracks together. At regions where condition 1
1816 is higher than condition 2, score would be positive, otherwise,
1819 * SAMParser and BAMParser
1821 Bug fixed for paired-end sequencing data.
1825 Fixed a bug while calling peaks from BedGraph file. It previously
1826 mistakenly output same peaks multiple times at the end of
1829 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1830 MACS version 2.0.9 (tag:alpha)
1832 * Auto fixation on predicted d is turned off by default!
1834 Previous --off-auto is now default. MACS will not automatically
1835 fix d less than 2 times of tag size according to
1836 --shiftsize. While tag size is getting longer nowadays, it would
1837 be easier to have d less than 2 times of tag size, however d may
1838 still be meaningful and useful. Please judge it using your own
1843 Now, the default scaling while treatment and input are unbalanced
1844 has been adjusted. By default, larger sample will be scaled down
1845 linearly to match the smaller sample. In this way, background
1846 noise will be reduced more than real signals, so we expect to have
1847 more specific results than the other way around (i.e. --to-large
1850 Also, an alternative option to randomly sample larger data
1851 (--down-sample) is provided to replace default linear
1852 scaling. However, this option will cause results irresproducible,
1857 A new script 'randsample' is added, which can randomly sample
1858 certain percentage or number of tags.
1862 Now, MACS will decide peak summits according to pileup height
1863 instead of qvalue scores. In this way, the summit may be more
1868 MACS calculate qvalue scores as differential scores. When compare
1869 two conditions (saying A and B), the maximum qscore for comparing
1870 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
1871 will be computed. If maxqscore_a2b is bigger, the diff score is
1872 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
1874 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1875 MACS version 2.0.8 (tag:alpha)
1877 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1879 New script bdgbroadcall and the extra option '--broad' for macs2
1880 script, can be used to call broad regions with a loose cutoff to
1881 link nearby significant regions. The output is represented as
1884 * MACS2/IO/cScoreTrack.pyx
1886 Fix q-value calculation to generate forcefully monotonic values.
1888 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1890 They are combined to one more powerful script called
1891 "filterdup". The script filterdup can filter duplicated reads
1892 according to sequencing depth and genome size. The script can also
1893 convert any format supported by MACS to BED format.
1895 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1896 MACS version 2.0.7 (tag:alpha)
1898 * bin/macsdiff renamed to bin/bdgdiff
1900 Now this script will work as a low-level finetuning tool as bdgcmp
1905 A new script to take treatment and control files from two
1906 condition, calculate fragment size, use local poisson to get
1907 pvalues and BH process to get qvalues, then combine 4-ways result
1908 to call differential sites.
1910 This script can use upto 4 cpus to speed up 4-ways calculation. (
1911 I am trying multiprocessing in python. )
1913 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1914 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1915 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1917 All above files are modified for the new macs2diff script.
1919 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1921 Now q-value 0.01 is the default cutoff. If -p is specified,
1922 p-value cutoff will be used instead.
1924 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1925 MACS version 2.0.6 (tag:alpha)
1929 A script to call differential regions. A naive way is introduced
1930 to find the regions where:
1932 1. signal from condition 1 is larger than input 1 and condition 2 --
1933 unique region in condition 1;
1934 2. signal from condition 2 is larger than input 2 and condition 1
1935 -- unique region in condition 2;
1936 3. signal from condition 1 is larger than input 1, signal from
1937 condition 2 is larger than input 2, however either signal from
1938 condition 1 or 2 is not larger than the other.
1940 Here 'larger' means the pvalue or qvalue from a Poisson test is
1941 under certain cutoff.
1943 (I will make another script to wrap up mulitple scripts for
1944 differential calling)
1946 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1947 MACS version 2.0.5 (tag:alpha)
1949 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1950 MACS2/IO/cPeakIO.pyx
1952 Use hash to store peak information. Add back the feature to deal
1953 with data without control.
1955 Fix bug which incorrectly allows small peaks at the end of
1958 * bin/bdgpeakcall, bin/bdgcmp
1960 Fix bugs. bdgpeakcall can output encodePeak format.
1962 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1963 MACS version 2.0.4 (tag:alpha)
1967 Fix a bug, correctly assign lambda_bg while --to-small is
1968 set. Thanks Junya Seo!
1970 Add rank and num of bp columns to pvalue-qvalue table.
1974 Fix bugs to correctly deal with peakless chromosomes. Thanks
1977 Use AFDR for independent tests instead.
1981 Now MACS can output peak coordinates together with pvalue, qvalue,
1982 summit positions in a single encodePeak format (designed for
1983 ENCODE project) file. This file can be loaded to UCSC
1984 browser. Definition of some specific columns are: 5th:
1985 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1986 -log10qvalue, 10th: relative summit position to peak start.
1989 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1990 MACS version 2.0.3 (tag:alpha)
1992 * Rich output with qvalue, fold enrichment, and pileup height
1994 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1997 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1999 Now we have a similiar xls output file as before. The differences
2000 from previous file are:
2002 1. Summit now is absolute summit, instead of relative summit
2004 2. 'Pileup' is previous 'tag' column. It's the extended fragment
2005 pileup at the peak summit;
2006 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
2007 5.00 means 1e-5, simple and less confusing.
2008 4. FDR column becomes '-log10(qvalue)' column.
2009 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
2010 the values at the peak summit.
2012 * Extra output files
2014 NAME_pqtable.txt contains pvalue and qvalue relationships.
2016 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
2017 and -log10qvalue scores in BedGraph format. Nearby regions with
2018 the same value are not merged.
2020 * Separation of FeatIO.py
2022 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
2023 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
2024 implemented to store pileup, local lambda, pvalue, and qvalue
2025 alltogether in cScoreTrack.pyx.
2027 * Experimental option --half-ext
2029 Suggested by NPS algorithm, I added an experimental option
2030 --half-ext to let MACS only extends ChIP fragment around its
2031 middle point for only 1/2 d.
2033 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
2034 MACS version 2.0.2 (tag:alpha)
2038 Add an error check to see if there is no common chromosome names
2039 from treatment file and control file
2041 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
2043 Reduce memory usage by removing deepcopy() calls.
2045 * Modify README documents and others.
2047 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
2048 MACS Version 2.0.1 (tag:alpha)
2050 * cPileup.pyx, cPeakDetect.pyx and peak calling process
2052 Jie suggested me a brilliant simple method to pileup fragments
2053 into bedGraph track. It works extremely faster than the previous
2054 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
2055 large local lambda calculation in MACSv2 now. Now I generate three
2056 bedGraphs for d-size local bias, slocal-size and llocal-size local
2057 bias, and calculate the maximum local bias as local lambda
2060 Minor: add_loc in bedGraphTrackI now can correctly merge the
2061 region with its preceding region if their value are the same.
2065 Add an option to shift control tags before extension. By default,
2066 control tags will be extended to both sides regardless of strand
2069 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
2070 MACS Version 2.0.0 (tag:alpha)
2072 * Use bedGraph type to store data internally and externally.
2074 We can have theoretically one-basepair resolution profiles. 10
2075 times smaller in filesize and even smaller after converting to
2076 bigWig for visualization.
2078 * Peak calling process modified. Better peak boundary detection.
2080 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2081 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2082 one will be averaged to d size) Then calculate the maximum value
2083 of these two tracks and a global background, to have a
2084 local-lambda bedGraph.
2086 Use -10log10poisson_pvalue as scores to generate a score track
2087 before peak calling.
2089 A general peak calling based on a score cutoff, min length of peak
2090 and max gap between nearby peaks.
2094 Wiggle file output is removed. Now we only support bedGraph
2095 output. The generation of bedGraph is highly recommended since it
2096 will not cost extra time. In other words, bedGraph generation is
2097 internally run even you don't want to save bedGraphs on disk, due
2098 to the peak calling algorithm in MACS v2.
2102 We now can calculate poisson pvalue in log space so that the score
2103 (-10*log10pvalue) will not have a upper limit of 3100 due to
2104 precision of float number.
2106 * Cython is adopted to speed up Python code.
2108 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
2111 * Replaced with a newest WigTrackI class and fixed the wignorm script.
2113 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
2114 Version 1.4.0rc2 (Valentine)
2116 * --single-wig option is renamed to --single-profile
2118 * BedGraph output with --bdg or -B option.
2120 The BedGraph output provides 1bp resolution fragment pileup
2121 profile. File size is smaller than wig file. This option can be
2122 combined with --single-profile option to produce a bedgraph file
2123 for the whole genome. This option can also make --space,
2124 --call-subpeaks invalid.
2126 * Fix the description of --shiftsize to correctly state that the
2127 value is 1/2 d (fragment size).
2129 * Fix a bug in the call to __filter_w_control_tags when control is
2132 * Fix a bug on --to-small option. Now it works as expected.
2134 * Fix a bug while counting the tags in candidate peak region, an
2135 extra tag may be included. (Thanks to Jake Biesinger!)
2137 * Fix the bug for the peaks extended outside of chromosome
2138 start. If the minus strand tag goes outside of chromosome start
2139 after extension of d, it will be thrown out.
2141 * Post-process script for a combined wig file:
2143 The "wignorm" command can be called after a full run of MACS14 as
2144 a postprocess. wignorm can calculate the local background from the
2145 control wig file from MACS14, then use either foldchange,
2146 -10*log10(pvalue) from possion test, or difference after asinh
2147 transformation as the score to build a single wig track to
2148 represent the binding strength. This script will take a
2149 significant long time to process.
2151 * --wigextend has been obsoleted.
2153 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
2154 Version 1.4.0rc1 (Starry Sky)
2156 * Duplicate reads option
2158 --keep-dup behavior is changed. Now user can specify how many
2159 reads he/she wants to keep at the same genomic location. 'auto' to
2160 let MACS decide the number based on binomial distribution, 'all'
2161 to let MACS keep all reads.
2163 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2165 By default, MACS will now scale the smaller dataset to the bigger
2166 dataset. For instance, if IP has 10 million reads, and Input has 5
2167 million, MACS will double the lambda value calculated from Input
2168 reads while calling BOTH the positive peaks and negative
2169 peaks. This will address the issue caused by unbalanced numbers of
2170 reads from IP and Input. If --to-small is turned on, MACS will
2171 scale the larger dataset to the smaller one. So from now on, if d
2172 is fixed, then the peaks from a MACS call for A vs B should be
2173 identical to the negative peaks from a B vs A.
2175 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2176 Version 1.4.0beta (summer wishes)
2182 The default behavior in the model building step is slightly
2183 changed. When MACS can't find enough pairs to build model
2184 (implemented in alpha version) or the modeled fragment length is
2185 less than 2 times of tag length (implemented in beta version),
2186 MACS will use 2 times of --shiftsize value as fragment length in
2187 the later analysis. --off-auto can turn off this default behavior.
2189 ** Redundant tag filtering
2191 The IO module is rewritten. The redundant tag filtering process
2192 becomes simpler and works as promise. The maximum allowed number
2193 of tags at the exact same location is calculated from the
2194 sequencing depth and genome size using a binomial distribution,
2195 for both TREAMENT and CONTROL separately. ( previously only
2196 TREATMENT is considered ) The exact same location means the same
2197 coordination and the same strand. Then MACS will only keep at most
2198 this number of tags at the exact same location in the following
2199 analysis. An option --keep-dup can let MACS skip the filtering and
2200 keep all the tags. However this may bring in a lot of sequencing
2201 bias, so you may get many false positive peaks.
2203 ** Single wiggle mode
2205 First thing to mention, this is not the score track that I
2206 described before. By default, MACS generates wiggle files for
2207 fragment pileup for every chromosomes separately. When you use
2208 --single-wig option, MACS will generate a single wiggle file for
2209 all the chromosomes so you will get a wig.gz for TREATMENT and
2210 another wig.gz for CONTROL if available.
2212 ** Sniff -- automatic format detection
2214 Now, by default or "-f AUTO", MACS will decide the input file
2215 format automatically. Technically, it will try to read at most
2216 1000 records for the first 10 non-comment lines. If it succeeds,
2217 the format is decided. I recommend not to use AUTO and specify the
2218 right format for your input files, unless you combine different
2219 formats in a single MACS run.
2223 --single-wig and --keep-dup are added. Check previous section in
2224 ChangeLog for detail.
2226 -f (--format) AUTO is now the default option.
2228 --slocal default: 1000
2229 --llocal default: 10000
2233 Setup script will stop the installation if python version is not
2234 python2.6 or python2.7.
2236 Local lambda calculation has been changed back. MACS will check
2237 peak_region, slocal( default 1K) and llocal (default 10K) for the
2238 local bias. The previous 200bps default will cause MACS misses
2239 some peaks where the input bias is very sharp.
2241 sam2bed.py script is corrected.
2243 Relative pos in xls output is fixed.
2245 Parser for ELAND_export is fixed to pass some of the no match
2246 lines. And elandexport2bed.py is fixed too. ( however I can't
2247 guarantee that it works on any eland_export files. )
2249 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2250 Version 1.4.0alpha2 (be smarter)
2254 --gsize now provides shortcuts for common genomes, including
2255 human, mouse, C. elegans and fruitfly.
2257 --llocal now will be 5000 bps if there is no input file, so that
2258 local lambda doesn't overkill enriched binding sites.
2260 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2261 Version 1.4alpha (be smarter)
2265 --tsize option is redesigned. MACS will use the first 10 lines of
2266 the input to decide the tag size. If user specifies --tsize, it
2267 will override the auto decided tsize.
2269 --lambdaset is replaced by --slocal and --llocal which mean the
2270 small local region and large local region.
2272 --bw has no effect on the scan-window size now. It only affects the
2273 paired-peaks model process.
2277 During the model building, MACS will pick out the enriched regions
2278 which are not too high and not too low to build the paired-peak
2279 model. Default the region is from fold 10 to fold 30. If MACS
2280 fails to build the model, by default it will use the nomodel
2281 settings, like shiftsize=100bps, to shift and extend each
2282 tags. This behavior can be turned off by '--off-auto'.
2286 An extra file including all the summit positions are saved in
2287 *_summits.bed file. An option '--call-subpeaks' will invoke
2288 PeakSplitter developed by Mali Salmon to split wide peaks into
2291 * Sniff ( will in beta )
2293 Automatically recognize the input file format, so use can combine
2294 different format in one MACS run.
2296 Not implemented features/TODO:
2298 * Algorithms ( in near future? )
2300 MACS will try to refine the peak boundaries by calculating the
2301 scores for every point in the candidate peak regions. The score
2302 will be the -10*log(10,pvalue) on a local poisson distribution. A
2303 cutoff specified by users (--pvalue) will be applied to find the
2304 precise sub-peaks in the original candidate peak region. Peak
2305 boudaries and peak summits positions will be saved in separate BED
2308 * Single wiggle track ( in near future? )
2310 A single wiggle track will be generated to save the scores within
2311 candidate peak regions in the 10bps resolution. The wiggle file
2312 is in fixedStep format.
2315 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2316 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2320 Fixed typo. FCSTEP -> FESTEP
2324 The 'femax' attribute bug is fixed
2326 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2327 Version 1.3.7 (Oktoberfest)
2329 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2331 Enhancements by Peter Chines:
2333 1. gzip files are supported.
2334 2. when --diag is on, user can set the increment and endpoint for
2335 fold enrichment analysis by setting --fe-step and --fe-max.
2337 Enhancements by Davide Cittaro:
2339 1. BAM and SAM formats are supported.
2340 2. small changes in the header lines of wiggle output.
2343 1. I added --fe-min option;
2344 2. Bowtie ascii output with suffix ".map" is supported.
2348 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2349 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2350 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2351 4. Some "fold change" have been changed to "fold enrichment".
2353 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2354 Version 1.3.6.1 (default parameter change)
2356 * bin/macs, lib/PeakDetect.py
2358 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2359 default. "--futurefdr" is added which can turn on the 'new' method
2360 introduced in 1.3.6. By default it's off.
2364 Fixed a bug. p-value is corrected a little bit.
2367 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2368 Version 1.3.6 (Birthday cake)
2372 "track name" is added to the header of BED output file.
2374 Now the default peak detection method is to consider 5k and 10k
2375 nearby regions in treatment data and peak location, 1k, 5k, and
2376 10k regions in control data to calculate local bias. The old
2377 method can be called through '--old' option.
2379 Information about how many total/unique tags in treatment or
2380 control will be saved in final .xls output.
2382 * lib/IO/__init__.py
2384 ".fa" will be removed from input tag alignment so only the
2385 chromosome names are kept.
2387 WigTrackI class is added for Wiggle like data structure. (not used
2390 The parser for ELAND multi PET files has been fixed. Now the 5'
2391 tag position for a pair will be kept, whereas in the previous
2392 version, the middle points are kept.
2394 * lib/IO/BinKeeper.py
2396 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2397 browser, which can quickly access certain region for values in a
2398 large wiggle like data file. (not used now)
2400 * lib/OptValidator.py
2406 Now the default peak detection method is to consider 5k and 10k
2407 nearby regions in treatment data and peak location, 1k, 5k, and
2408 10k regions in control data to calculate local bias. The old
2409 method can be called through '--old' option.
2411 Two columns have beed added to BED output file. 4th column: peak
2412 name; 5th column: peak score using -10log(10,pvalue) as score.
2416 Add support to build a Mac App through 'setup.py py2app', or a
2417 Windows executable through 'setup.py py2exe'. You need to install
2418 py2app or py2exe package in order to use these functions.
2420 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2421 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2425 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2426 in control data to calculate local lambda for each peak. Peak
2427 calling results will be slightly different with previous version,
2432 Typo fixed, ELANDParser -> ELANDResultParser
2436 Now, modeled d value will be shown on the model figure.
2438 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2439 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2441 * macs, IO/__init__.py, PeakDetect.py
2443 Add support for ELAND multi format. Add support for Pair-End
2444 experiment, in this case, 5'end and 3'end ELAND multi format files
2445 are required for treatment or control data. See 00README file for
2448 Add wigextend option.
2450 Add petdist option for Pair-End Tag experiment, which is the best
2451 distance between 5' and 3' tags.
2455 Fixed a bug which cause the end positions of every peak region
2456 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
2460 Fix bugs while generating wiggle files. The start position of
2461 wiggle file is set to 1 instead of 0.
2463 Fix a bug that every 10M bps, signals in the first 'd' range are
2464 lower than actual. ( Thanks Mali Salmon!)
2467 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2468 Version 1.3.3 (wiggle bugs fixed)
2472 Fix bugs while generating wiggle files. 1. 'span=' is added to
2473 'variableStep' line; 2. previously, every 10M bps, the coordinates
2474 were wrongly shifted to the right for 'd' basepairs.
2476 * macs, PeakDetect.py
2478 Add an option to save wiggle files on different resolution.
2480 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2481 Version 1.3.2 (tiny bugs fixed)
2485 Fix 65536 -> 65535. ( Thank Joon)
2489 Improved for binomial function with extra large number. Imported
2490 from Cistrome project.
2494 If treatment channel misses reads in some chromosome included in
2495 control channel, or vice versa, MACS will not exit. (Thank Shaun
2498 Instead, MACS will fake a tag at position -1 when calling
2499 treatment peaks vs control, but will ignore the chromosome while
2500 calling negative peaks.
2502 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2503 Version 1.3.1 (tiny bugs fixed version)
2507 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2508 binomial functions can tolerate large and small numbers.
2512 Parsers now split lines in BED/ELAND file using any
2513 whitespaces. 'track' or 'browser' lines will be regarded as
2514 comment lines. A bug fixed when throwing StrandFormatError. The
2515 maximum redundant tag number at a single position can be no less
2519 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2520 Version 1.3 (naming clarification version)
2522 * Naming clarification changes according to our manuscript:
2524 'frag_len' is changed to 'd'.
2526 'fold_change' is changed to 'fold_enrichment'.
2528 Suggest '--bw' parameter to be determined by users from the real
2531 Maximum FDR is 100% in the output file.
2533 And other clarifications in 00README file and the documents on the
2537 If the redundant tag number at a single position is over 32767,
2538 just remember 32767, instead of raising an overflow exception.
2544 Bug fixed for diagnosis report.
2547 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2552 Poisson distribution CDF and inverse CDF functions are
2553 corrected. They can produce right results even for huge lambda
2554 now. So that the p-value and FDR values in the final excel sheet
2557 IO package now can tolerate some rare cases; ELANDParser in IO
2558 package is fixed. (Thank Bogdan)
2562 Reverse paired peaks in model are rejected. So there will be no
2563 negative 'frag_len'. (Thank Bogdan)
2567 Diagnosis function is completed. Which can output a table file for
2568 users to estimate their sequencing depth.
2571 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2574 * Probe.py is added!
2576 GSL is totally removed from MACS. Instead, I have implemented the
2577 CDF and inverse CDF for poisson and binomial distribution purely
2580 * Constants.py is added!
2582 Organize constants used in MACS in the Constants.py file.
2584 * All other files are modified!
2586 Foldchange calculation is modified. Now the foldchange only be
2587 calculated at the peak summit position instead of the whole peak
2588 region. The values will be higher and more robust than before.
2592 1. MACS can save wiggle format files containing the tag number at
2593 every 10 bp along the genome. Tags are shifted according to our
2594 model before they are calculated.
2596 2. Model building and local lambda calculation can be skipped with
2599 3. A diagnosis report can be generated through '--diag'
2600 option. This report can help you get an assumption about the
2601 sequencing saturation. This funtion is only in beta stage.
2603 4. FDR calculation speed is highly improved.
2605 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2608 * TabIO, PeakModel.py ...
2609 Bug fixed to let MACS tolerate some cases while there is no tag on
2610 either plus strand or minus strand.
2613 Check the version of python. If the version is lower than 2.4,
2614 refuse to install with warning.