2023-07-28 Tao Liu <vladimir.liu@gmail.com>

* New features in MACS3:

1) Speed/memory optimization. Use cykhash to replace Python
dictionaries. Use a buffer (10MB) to read and parse input files (not
available for the BAM file parser). And many other optimization
tweaks. We added memory monitoring to the runtime messages.

2) Call variants in peak regions directly from BAM files. The
function was originally developed under the code name SAPPER. Now
SAPPER has been merged into MACS. Also, `simde` has been added as
a submodule in order to support the fermi-lite library under non-x64

3) HMMRATAC module is added. HMMRATAC is a dedicated tool for
analyzing ATAC-seq data. The basic idea behind HMMRATAC is to digest
ATAC-seq data according to the fragment length of read pairs into
four signal tracks: short fragments, mononucleosomal fragments,
di-nucleosomal fragments and tri-nucleosomal fragments, then
integrate the four tracks again using a Hidden Markov Model with
three hidden states: open region, nucleosomal region, and
background region. The original paper was published in 2019, and
the software was written in Java by Evan Tarbell. We implemented it
in Python/Cython and optimized the whole process using existing
MACS functions and hmmlearn. Now it runs much faster than the
original Java version. Note: evaluation of the peak calling results
is underway.

4) Code cleanup. Reorganize source code.

6) R wrappers for MACS -- MACSr

7) Switch to GitHub Actions for CI, supporting multi-arch testing
including x64, armv7, aarch64, s390x and ppc64le. We also test on

8) The MACS tag-shifting model has been refined. Now it uses a
naive peak calling approach to find ALL possible paired peaks on
the + and - strands, then uses all of them to calculate the
cross-correlation. (A related bug has been fixed, #442.)

9) BAI index and random access to BAM files are now supported. #449
Users can now use the original BAM file (instead of a subset of the
BAM file as in SAPPER) in the `callvar` command.

10) Support for Python > 3.10 #497 #498

11) The effective genome size parameters have been updated
according to deeptools. #508

12) Multiple updates regarding dependencies, Anaconda build, CI/CD

13) Cython support up to ~0.29. Cython 3 is not supported yet.

1) Missing header line when no peaks can be called #501 #502

2) Note: different versions of numpy, scipy, and sklearn may give
slightly different hmmratac results. The current standard results
for automated testing in the `/test` directory are from Numpy
1.25.1, Scipy 1.11.1, and sklearn 1.3.0.

2020-04-11 Tao Liu <vladimir.liu@gmail.com>

Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can

2020-04-10 Tao Liu <vladimir.liu@gmail.com>

1) MACS2 has been tested on multiple architectures to make sure it
generates consistent results. Currently the supported
architectures are: AMD64, ARM64, i386, PPC64LE, and S390X. Thanks
to @mr-c, @junaruga, and @tillea! Related to issues #340, #349,
#351, and #359; to PRs #348, #350, #360, #361, #367, and #370. The
lesson is that if the project is built on Cython and aims at memory
efficiency, we should explicitly define all int/float types in pyx
files, such as int8_t or uint32_t, using either libc or numpy (C
version) instead of relying on built-in C types in Cython such as
short, long, and double.

2) The MACS2 setup script will check for numpy and install it if
necessary. PR #378, issue #364

3) The `bdgbroadcall` command will correctly add the score column
(5th column). The score (5th) column contains 10 times the average
score in the broad region. PR #373, issue #362

4) The missing test for the `bdgopt` subcommand has been added. PR #363

5) The obsolete option `--ratio` has been removed from the `callpeak`
subcommand. PR #369, issue #366

6) Fixed the incorrect description in README: 'maximum length of
broad region is 4 times of d' is corrected to 'maximum gap for
merging broad regions is 4 times of tag size by default'. PR #380,

1) A CODE OF CONDUCT document has been added to MACS2 github

2019-12-12 Tao Liu <vladimir.liu@gmail.com>

1) Speed up MACS2 with some programming tricks and code cleanup.
The filter_dup function replaces separate_dups. The latter was
implemented for potentially putting back duplicate reads in
certain downstream analyses; however, such analyses haven't been
implemented. Optimize the speed of writing bedGraph files.
Optimize BAM and BAMPE parsing with pointer casting instead

2) The comment lines in the headers of BED or SAM files will be
correctly skipped. However, MACS2 won't check comment lines in the

1) Cutoff-analysis in callpeak command. #341

2) Issues related to SAMParser and three ELAND Parsers are

1) The cmdlinetest script in the test/ folder has been updated to:
1. test cutoff-analysis with the callpeak cmd; 2. output the 2 lines
before and after the error or warning message during tests; 3.
output only the first 10 lines if differences between the test
result and the standard result are found; 4. let prockreport
monitor CPU time and memory usage at 1-second intervals, which is a
bit more accurate.

2) Python 3.5 support is removed. Now MACS2 requires Python>=3.6.

2019-10-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.5 (Py3 speed up)

1) *Github code only and Not included in MACS2 release* New
testing data for performance tests. A subsampled ENCODE2 CTCF
ChIP-seq dataset, including 5 million ChIP reads and 5 million
control reads, has been included in the test folder for testing
CPU and memory usage (i.e. 5M test). Several related scripts,
including `prockreport` for reporting CPU and memory usage, and
`pyprofile` and `pyprofile_stat` for debugging and profiling MACS2
code, have

2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
The old hashtable.pyx implementation copied from Pandas (very old
version) doesn't work well in Python3+Cython: with Cython code
identical to that in v2.1.4, it slows down the pqtable checkup.
While running the 5M test, the `__getitem__` function in
hashtable.pyx took 3.5s for 37,382,037 calls in MACS2 v2.1.4, but
148.6s for the same number of calls in MACS2 v2.2.4. As a
consequence, the standard Python dictionary implementation has
replaced hashtable.pyx for the pqtable checkup. Now MACS2 runs a
bit faster than the py2 version, but uses a bit more memory. In
general, v2.2.5 can finish the 5M reads test in 20% less time than
MACS2 v2.1.4, but uses 15% more memory.

1) More Python3 related fixes, e.g. the return value of keys from

2019-10-01 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.2.4 (Python3)

1) First Python3 version of MACS2 released.

2) Version number 2.2.X will be used for MACS2 in Python3, in

3) More comprehensive test.sh script to check the consistency of
results between the Python2 and Python3 versions.

4) Simplify the setup.py script since the newest version
transparently supports Cython. When Cython is not installed by the
user, setup.py can still compile using only the C code.

5) Fix Signal.pyx to use np.array instead of np.mat.

2019-09-30 Tao Liu <vladimir.liu@gmail.com>

Github Actions is used together with Travis CI for testing and

1) #318 Random score in bdgdiff output. It turns out that sum_v was
not initialized to 0 before adding. Potential bugs of the same kind
are also fixed in other functions in the ScoreTrack and
CallPeakUnit code.

2) #321 The Cython dependency in the setup.py script is removed, and
the 'cythonize' call is placed at the correct position.

3) A typo is fixed in the Github Actions script.

2019-09-19 Tao Liu <vladimir.liu@gmail.com>

1) Support Docker auto-deploy. PR #309

2) Support Travis CI auto-testing, update unit-testing
scripts, and enable subcommand testing on small datasets.

3) Update README documents. #297 PR #306

4) `cmbreps` supports more than 2 replicates. Merged from PR #304
@Maarten-vd-Sande and PR #307 (our own chi-sq test code)

5) `--d-min` option is added in `callpeak` and `predictd` to
exclude predictions of fragment size smaller than the given
value. Merged from PR #267 @shouldsee.

6) `--buffer-size` option is added in `predictd`, `filterdup`,
`pileup` and `refinepeak` subcommands. Users can use this option
to decrease memory usage when there are a large number of contigs
in the data. Also, `callpeak`, `predictd`, `filterdup`, `pileup`
and `refinepeak` will now suggest that users tweak `--buffer-size`
when catching a MemoryError. #313 PR #314

1) #265 Fixed a bug where the pseudocount hadn't been applied
while calculating the p-value score in the ScoreTrack object.

2) Fixed bdgbroadcall so that it will report those broad peaks
without a strong peak inside, a behavior consistent with `callpeak

3) Rename COPYING to LICENSE.

2018-10-17 Tao Liu <vladimir.liu@gmail.com>

1) Added missing BEDPE support, and enabled support for BAMPE and
BEDPE formats in the 'pileup', 'filterdup' and 'randsample'
subcommands. When the format is BAMPE or BEDPE, the 'pileup'
command will pile up the whole fragment defined by the mapping
locations of the left end and right end of each read pair. Thank
@purcaro

2) Added options to the callpeak command for tweaking max-gap and
min-len during peak calling. Thank @jsh58!

3) The callpeak option "--to-large" is replaced with

4) The randsample option "-t" has been replaced with "-i".

1) Fixed memory issue related to #122 and #146

2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh

3) Fixed a bug when setting the command-line qvalue cutoff.

4) Better describe the 5th column of narrowPeak. Thank @alexbarrera

5) Fixed the calculation of average fragment length for paired-end

6) Fixed bugs caused by khash while computing p/q-values and log
likelihood ratios. Thank @jsh58

7) More spelling tweaks in source code. Thank @mr-c

2016-03-09 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.1 20160309

* Fixed spelling. Merged pull request #120. Thank @mr-c!

* Change filtering criteria for reading BAM/SAM files

Related to callpeak and filterdup commands. Now the
reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
still be read, although MACS2 may later decide that they are
duplicates. Related to old issue #33. Sorry I forgot to address it for

2016-02-26 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.1 20160226 (tag:rc Zhengyue)

1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
the former option is not supported by older GCC. Related to issues

2) Issue #108 is fixed. If no peak can be found in a chromosome,
PeakIO won't throw an error.

a) A more flexible format, BEDPE, is supported. Now users can
define the left and right positions of the ChIPed fragment, and
MACS2 will skip model building and directly pile up the
fragments. Related to issue #112.

b) The 'tempdir' can be specified to save cached pileup
tracks. Originally, the temporary files were stored in
/tmp. Thank @daler! Related to issues #97 and #105.

New operations are added to calculate the maximum or minimum
between values in a BEDGRAPH file and a given value.

A new method is added to calculate the maximum between values
defined in two BEDGRAPH files.

2015-12-22 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20151222 (tag:rc Dongzhi)

1) Fix a bug when dealing with chromosomes containing only one
read (pair). The size of the dup_plus/dup_minus arrays after
filtering dups should be +1.

2) Fix a bug related to the broad peak calling function in
previous versions. The gaps were miscalculated, so segmented weak
broad calls may be reported, and sometimes you would see peaks
with values lower than the cutoff in the output files.

3) "Potentially" Fixed issue #105 on temporary cache files, need

2015-07-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20150731 (tag:rc)

1) Fixed issue #76: information about broad/narrow cutoff will be

2) Fixed issue #79: bdgopt extparam option is fixed.

3) Fixed issue #87: the reference to cProb has been corrected to
'Prob' for the filterdup command.

4) Fixed issues #78, #88 and similar issues reported in the MACS
Google group: MACS2 can now correctly deal with multiple alignment
files for -t or -c. The 'finalize' function will be correctly
called. The multiple-files option is enabled for the filterdup,
randsample, predictd, pileup and refinepeak commands.

5) Related to #88: when BAMPE mode is used, PE pairs will be sorted
by leftmost, then rightmost, ends.

6) Fixed issue #86: a wrong use of 'ndarray' to create a Numpy
array caused 'callpeak --nolambda' to hang forever while
calculating pvalues and qvalues.

2015-04-20 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20150420 (tag:rc)

1) bdgopt: some convenient functions to modify bedGraph files.

2) cmbreps: Combine scores from two replicates, with three
methods: 1. take the maximum; 2. take the average; 3. use Fisher's
method to combine two p-value scores. After that, users can use
bdgpeakcall to call peaks on the combined scores.
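
Fisher's method (option 3 above) can be sketched as follows. This is
the textbook formula in a general k-replicate form, not the cmbreps
source; the function names are hypothetical, and the inputs are
assumed to be independent p-values in (0, 1], or their -log10
scores as stored in p-score bedGraphs.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k
    degrees of freedom. For even degrees of freedom the survival
    function has the closed form exp(-t) * sum_{j<k} t**j / j!,
    where t = X / 2.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    k = len(pvalues)
    t = -sum(math.log(p) for p in pvalues)  # t = X / 2
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= t / j
        total += term
    return min(1.0, math.exp(-t) * total)


def fisher_combine_scores(scores):
    """Same combination, taking and returning -log10(p) scores."""
    p = fisher_combine([10.0 ** (-s) for s in scores])
    return -math.log10(p)
```

For two replicates the result reduces to p1*p2*(1 - ln(p1*p2)).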

1) callpeak and bdgpeakcall can now analyze the relationship
between p-values and the number/length of peaks, then generate a
summary to help users decide an appropriate cutoff.

2) callpeak can now accept a fold-enrichment cutoff as a filter for

Now MACS2 runs about 3X as fast as the previous version, trading
clean Python code for speed... Now processing 50M ChIP vs 50M
control reads takes only 10 minutes.

1) Sampling function in BAMPE mode.

2) Callpeak when there are >= 2 input files for -t or -c.

3) When reading BAM/SAM, secondary or supplementary alignments
will be correctly skipped.

4) Fixed issue #33: an explanation is added to the callpeak
--keep-dup option that MACS2 will discard SAM/BAM alignments with
bit 1024 no matter how --keep-dup is set.

5) Fixed issue #49: setuptools is used instead of distutils

6) Fixed issue #51: fix the problem when using the --trackline
argument when the control file is absent.

7) Fixed issue #53: use the SAM/BAM CIGAR to find the 5' end of
reads mapped to the minus strand. The previous implementation found
an incorrect 5' end if there was an indel in the alignment.

8) Fixed issue #56: an incorrect sorting method used for BAMPE
mode caused incorrect filtering of duplicated reads. Now

9) Issue #63: Merged from jayhesselberth@github, extsize now can

10) Issue #71: Merged from aertslab@github, close file descriptors
after creating them with mkstemp().
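
The fix for issue #53 comes down to computing the reference span
from the CIGAR string rather than from the read length. A minimal
sketch, assuming a 0-based leftmost aligned position and a
half-open end coordinate (MACS2's internal convention may differ);
the helper names are hypothetical.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X.
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_length(cigar):
    """Number of reference bases spanned by an alignment's CIGAR string."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar) if op in _REF_CONSUMING)


def five_prime_end(pos, cigar, is_reverse):
    """5' end of a read given its 0-based leftmost aligned position.

    Plus-strand reads start at pos. For minus-strand reads the 5' end
    is the rightmost aligned base, so add the reference span
    (returned as an exclusive end coordinate).
    """
    if not is_reverse:
        return pos
    return pos + reference_length(cigar)
```

For example, a 50bp read with CIGAR 20M5D30M spans 55 reference
bases, so using the read length alone would place its minus-strand
5' end 5bp too far to the left.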

2014-06-16 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.1.0 20140616 (tag:rc)

"--ratio" is added to manually assign the scaling factor of ChIP
vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
implementing the patch file!

"--shift" is added to move cutting ends (5' ends of reads) around,
in order to process DNase-Seq data, e.g., use "--shift -100
--extsize 200" to get 200bp fragments around 5' ends. For general
ChIP-Seq data analysis, this option should always be set to 0.
Thank Xi Chen and Anshul Kundaje for the discussions in user

** Do not output negative fragment size from cross-correlation
analysis. Thank Alvin Qin for the feedback!

** --half-ext and --control-shift are removed. For complex read
shifting and extending, combine the '--shift' and '--extsize'
options. For comparing two conditions, use the 'bdgdiff' module

** a bug is fixed to output the last pileup value in bdg file

A 'dry-run' option is added to output only numbers, including the
number of allowed duplicates, the total number of reads before and
after filtering duplicates, and the estimated duplication
rate. Thank John Urban for the suggestion!
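
For the "--shift"/"--extsize" pair described in this entry, the
following Python sketch shows how a 5' cutting end could become a
fragment. The direction convention (a positive shift moves the end
toward the read's 3' direction, a negative shift moves it upstream)
and the function name are assumptions for illustration, not MACS2
code. Coordinates are half-open.

```python
def shifted_fragment(five_prime, is_reverse, shift, extsize):
    """Return (start, end) of the fragment built from a read's 5' end.

    Assumed semantics: move the 5' end by `shift` in the read's
    5'->3' direction, then extend it by `extsize` in the same
    direction.
    """
    if not is_reverse:
        start = five_prime + shift
        return start, start + extsize
    # On the minus strand the read points left, so directions flip.
    end = five_prime - shift
    return end - extsize, end
```

With "--shift -100 --extsize 200", reads on both strands yield a
200bp fragment centered on the 5' end, matching the DNase-Seq use
case above.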

2013-12-16 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20131216 (tag:alpha)

* We changed the license from Artistic License to the 3-clause BSD
license.

Yes. The simpler, the better.

* Process paired-end data with "-f BAMPE" without control

* GappedPeak output for the --broad option has been fixed again to
be consistent with the official UCSC format. We add a 1bp
pseudo-block to the left and/or right of a broad region when
necessary, so that you can successfully visualize regions without
strong enrichment inside. In downstream analyses other than
visualization, you may need to remove all 1bp blocks from the
gappedPeak file.

* The diffpeak subcommand is temporarily disabled. Till we

2013-10-28 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20131028 (tag:alpha)

* callpeak --call-summits improvement

The smoothing window length has been fixed as the fragment length
instead of the short read length. The larger smoothing window will
give better smoothing results and better sub-peak summits

* --outdir and --ofile options for almost all commands

Thank Björn Grüning for initially implementing these options!
Now, MACS2 will save results into a directory specified by the
'--outdir' option, and/or save results into a file specified by
the '--ofile' option. Note: where '--ofile' is available for a
subcommand, '-o' has now been adjusted to be the same as '--ofile'
instead of '--o-prefix'.

Here is the list of changes. For more detail, use 'macs2 xxx -h'

** callpeak: --outdir
** diffpeak: Not implemented
** bdgpeakcall: --outdir and --ofile
** bdgbroadcall: --outdir and --ofile
** bdgcmp: --outdir and --ofile. When --ofile is used, the number
and the order of arguments for --ofile must be the same as for -m.
** bdgdiff: --outdir and --ofile
** filterdup: --outdir
** randsample: --outdir
** refinepeak: --outdir and --ofile

2013-09-15 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130915 (tag:alpha)

* callpeak: added a new option --buffer-size

This option tweaks a previously hidden parameter that controls
the step size for increasing the size of arrays that store
alignment information. In some rare cases where the number of
chromosomes/contigs/scaffolds is huge, the original default
setting wastes a huge amount of memory. In these cases, we
recommend decreasing --buffer-size (e.g., to 1000) to save memory,
although the decrease will slow down reading alignment files.

* an optimization to speed up pvalue-qvalue statistics

Previously, it took an hour to prepare the p-q-table for a 65M vs
65M human TF library, and now it takes 10 minutes. It was due to a
single line of code to get a value from a numpy array ...

2013-07-31 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130731 (tag:alpha)

* callpeak --call-summits

Fix bugs causing the callpeak --call-summits option to generate an
extra number of peaks and peak boundaries inconsistent with the
default option. Thank Ben Levinson!

Fix bugs causing bdgcmp to output all logLR as positive values. Now
'depletion' can be correctly represented as negative values.

Fix the behavior of the bdgdiff module. Now it can take four
bedGraph files, then use logLR as the cutoff to call differential
regions. Check the bdgdiff command line for details.

2013-07-13 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130713 (tag:alpha)

* fix bugs when outputting broadPeak and gappedPeak.

Note: weak broad regions without any strong enrichment regions
inside won't be saved in the gappedPeak file.

* bdgcmp -T and -C are merged into -S and the description is updated.

Now, you can use it to override SPMR values in your input for
bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
statistics will cause weird results (in most cases, lower
significance), and won't be consistent with MACS2 callpeak
behavior. So if you have SPMR bedGraphs, input the smaller/larger
sample size in MILLIONS according to the 'callpeak --to-large'
option.

2013-07-10 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130710 (tag:alpha)

* fix BED style output format of callpeak module:

1) without --broad: narrowPeak (BED6+4) and BED for summits will be
the output. The old BED format file won't be saved.

2) with --broad: broadPeak (BED6+3) for broad regions and
gappedPeak (BED12+3) for chained enriched regions will be the
output. The old BED format, narrowPeak format, and summit files
won't be

* bdgcmp now can accept a list of methods to calculate scores, so
you can run it once to generate multiple types of scores. Thank
Jon Urban for this suggestion!

* C code is re-generated through Cython 0.19.1.

2013-05-21 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130520 (tag:alpha)

* broad peak calling modules are modified in order to report all
relaxed regions even when there is no strong enrichment inside.

2013-05-01 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130501 (tag:alpha)

* Memory usage is decreased to about 1/4-1/5 of previous usage

Now, the internal data structure and algorithm are both
re-organized, so that intermediate data won't be saved in
memory. Instead, they will be calculated on the fly. New MACS2 will
take longer (1.5 to 2 times); however, it will use less memory,
so it can be more usable on small-memory servers.

* --seed option is added to callpeak and randsample commands

Thank Mathieu Gineste for this suggestion!

2013-03-05 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.10 20130306 (tag:alpha)

* diffpeak module: new module to detect differential binding sites
with more statistics.

* Introduced --refine-peaks
Calculates read balance to refine peak summits

* Output file name prefixes
Correct encodePeak to narrowPeak, broadPeak to bed12.

2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.10 (tag:alpha not released)

* Introduced BAMPEParser
Reads PE data directly, requires bedtools for now

* Introduced --call-summits
Uses signal processing methods to call overlapping peaks

* Added --no-trackline
By default, files have descriptive tracklines now

* new refinepeak command (experimental)
This new function uses a method similar to that in SPP (wtd) to
analyze the raw tag distribution in the peak region, then redefines
the peak summit where plus and minus tags are evenly distributed

* Changes to output *
cPeakDetect.pyx has full support for new print/write methods and
--call-peaks, BAMPEParser, and use of paired-end data

* Parser optimization

cParser.pyx is rewritten to use io.BufferedReader to speed
up. Speed is doubled.

Code is reorganized -- most of the functions are inherited from

* Use cross-correlation to calculate fragment size

First, all pairs will be used in the prediction of fragment
size. Previously, no more than 1000 pairs were used. Second,
cross-correlation is used to find the best phase difference
between + and - tag pileups.
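
The cross-correlation step can be sketched with NumPy. This assumes
per-base + and - strand tag counts over the same region and a
hypothetical `best_shift` helper; it illustrates the idea and is
not the MACS2 PeakModel implementation.

```python
import numpy as np


def best_shift(plus, minus, max_lag):
    """Lag (in bp) that maximizes the cross-correlation of strand signals.

    plus/minus are per-base tag counts (or pileups) of the + and -
    strands over the same region. The - strand signal lags the +
    strand by roughly the fragment size, so search lags 0..max_lag.
    """
    plus = plus - plus.mean()
    minus = minus - minus.mean()
    n = len(plus)
    scores = []
    for lag in range(max_lag + 1):
        # Correlate plus[i] with minus[i + lag] over the overlap.
        scores.append(float(np.dot(plus[: n - lag], minus[lag:])))
    return int(np.argmax(scores))
```

The lag with the highest correlation serves as the estimate of the
fragment size d.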

* Speed up p-value and q-value calculation

This part is ten times faster now. I am using a dictionary to
cache p-value results from the Poisson CDF function. A bit more
memory will be used to increase speed. I hope this dictionary will
not explode, since the possible pairs of ChIP signal and control
lambda are hugely redundant. Also, I rewrote part of q-value

* Speed up peak detection

This part is about a hundred times faster now. Optimizations
include using Numpy functions as much as possible, and making loop
bodies as small as possible.

* Post-processing on differential calls

After macs2diff finds differential binding sites between two
conditions, it will try to annotate the peak calls from one of two
conditions, describe the changes ...

* Fragment size prediction in macs2diff

Now by default, macs2diff will try to use the average fragment
size from both condition 1 and condition 2 for tag extension and
peak calling. Previously, by default, it used different sizes
unless --nomodel was specified.

Technically, I separated the model building processes out. So
macs2diff will build fragment sizes for conditions 1 and 2 in
parallel (2 processes maximum), then perform 4-way comparisons in
parallel (4

Combine two p/qscore tracks together. At regions where condition 1
is higher than condition 2, the score would be positive, otherwise,

* SAMParser and BAMParser

Bug fixed for paired-end sequencing data.

Fixed a bug while calling peaks from a BedGraph file. It previously
mistakenly output the same peaks multiple times at the end of

2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.9 (tag:alpha)

* Auto-fixing of predicted d is turned off by default!

The previous --off-auto is now the default. MACS will not
automatically fix d less than 2 times the tag size according to
--shiftsize. As tag sizes are getting longer nowadays, it is
easier to have d less than 2 times the tag size; however, d may
still be meaningful and useful. Please judge it using your own

Now, the default scaling when treatment and input are unbalanced
has been adjusted. By default, the larger sample will be scaled
down linearly to match the smaller sample. In this way, background
noise will be reduced more than real signals, so we expect to have
more specific results than the other way around (i.e. --to-large

Also, an alternative option to randomly sample the larger data
(--down-sample) is provided to replace the default linear
scaling. However, this option will make results irreproducible,

A new script 'randsample' is added, which can randomly sample a
certain percentage or number of tags.

Now, MACS will decide peak summits according to pileup height
instead of qvalue scores. In this way, the summit may be more

MACS calculates qvalue scores as differential scores. When
comparing two conditions (say A and B), the maximum qscore for
comparing A to B (maxqscore_a2b) and for comparing B to A
(maxqscore_b2a) will be computed. If maxqscore_a2b is bigger, the
diff score is +maxqscore_a2b; otherwise, the diff score is
-1*maxqscore_b2a.
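
The sign convention for the differential score can be written as a
vectorized NumPy sketch over per-region arrays of maxqscore_a2b and
maxqscore_b2a; the function name `diff_scores` is hypothetical.
Ties go to the negative branch, following the "otherwise" wording
above.

```python
import numpy as np


def diff_scores(maxqscore_a2b, maxqscore_b2a):
    """Signed differential scores: positive favors A, negative favors B."""
    a2b = np.asarray(maxqscore_a2b, dtype=float)
    b2a = np.asarray(maxqscore_b2a, dtype=float)
    # +a2b where A-vs-B is stronger, otherwise -1 * b2a.
    return np.where(a2b > b2a, a2b, -1 * b2a)
```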

2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.8 (tag:alpha)

* bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx

The new script bdgbroadcall and the extra option '--broad' for the
macs2 script can be used to call broad regions with a loose cutoff
to link nearby significant regions. The output is represented as

* MACS2/IO/cScoreTrack.pyx

Fix the q-value calculation to force monotonic values.
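
For reference, a minimal Benjamini-Hochberg sketch in NumPy, in
which the running minimum from the largest p-value downward is what
makes the q-values monotonic. This is the textbook procedure on a
flat list of p-values (no Yekutieli correction), not MACS2's
pqtable-based code; `bh_qvalues` is a hypothetical name.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic in p.

    q_(i) = min over j >= i of p_(j) * m / j, where p_(1) <= ... <= p_(m).
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down enforces monotonicity.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q
```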

* bin/eland*2bed, bin/sam2bed and bin/filterdup

They are combined into one more powerful script called
"filterdup". The script filterdup can filter duplicated reads
according to sequencing depth and genome size. The script can also
convert any format supported by MACS to BED format.

2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.7 (tag:alpha)

* bin/macsdiff renamed to bin/bdgdiff

Now this script will work as a low-level fine-tuning tool as bdgcmp

A new script to take treatment and control files from two
conditions, calculate fragment size, use local Poisson to get
pvalues and the BH process to get qvalues, then combine the 4-way
results to call differential sites.

This script can use up to 4 CPUs to speed up the 4-way
calculation. (I am trying multiprocessing in Python.)

* MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
MACS2/PeakModel.py, MACS2/cPeakDetect.pyx

All the above files are modified for the new macs2diff script.

* bin/macs2, bin/macs2diff, MACS2/OptValidator.py

Now q-value 0.01 is the default cutoff. If -p is specified, the
p-value cutoff will be used instead.

2011-07-25 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.6 (tag:alpha)

A script to call differential regions. A naive way is introduced
to find the regions where:

1. signal from condition 1 is larger than input 1 and condition 2
-- unique region in condition 1;
2. signal from condition 2 is larger than input 2 and condition 1
-- unique region in condition 2;
3. signal from condition 1 is larger than input 1, signal from
condition 2 is larger than input 2, however neither signal from
condition 1 nor 2 is larger than the other.

Here 'larger' means the pvalue or qvalue from a Poisson test is
under a certain cutoff.

(I will make another script to wrap up multiple scripts for
differential calling)

2011-07-07 Tao Liu <vladimir.liu@gmail.com>
MACS version 2.0.5 (tag:alpha)

* bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,

Use a hash to store peak information. Add back the feature to deal
with data without control.

Fix a bug which incorrectly allows small peaks at the end of

* bin/bdgpeakcall, bin/bdgcmp

Fix bugs. bdgpeakcall can output encodePeak format.

2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.4 (tag:alpha)

Fix a bug to correctly assign lambda_bg when --to-small is
set. Thanks Junya Seo!

Add rank and num of bp columns to the pvalue-qvalue table.

Fix bugs to correctly deal with peakless chromosomes. Thanks

Use AFDR for independent tests instead.

Now MACS can output peak coordinates together with pvalue, qvalue,
and summit positions in a single encodePeak format file (designed
for the ENCODE project). This file can be loaded into the UCSC
browser. Definitions of some specific columns are: 5th:
int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
-log10qvalue, 10th: relative summit position to peak start.
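
A sketch of writing one encodePeak line with the column definitions
listed above. The strand '.' in column 6, the 5-decimal formatting,
and the function and peak names are assumptions for illustration,
not taken from MACS2.

```python
def encode_peak_line(chrom, start, end, name, pscore, fold_change, qscore, summit):
    """Format one encodePeak line.

    pscore/qscore are -log10 p/q values; summit is an absolute
    coordinate and is written relative to the peak start.
    """
    fields = [
        chrom, str(start), str(end), name,
        str(int(pscore * 10)),   # 5th: int(-log10pvalue*10)
        ".",                     # 6th: strand (assumed unstranded)
        f"{fold_change:.5f}",    # 7th: fold-change
        f"{pscore:.5f}",         # 8th: -log10pvalue
        f"{qscore:.5f}",         # 9th: -log10qvalue
        str(summit - start),     # 10th: summit relative to peak start
    ]
    return "\t".join(fields)
```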

2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.3 (tag:alpha)

* Rich output with qvalue, fold enrichment, and pileup height

Calculate q-values using a refined Benjamini–Hochberg–Yekutieli

http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests

Now we have a similar xls output file as before. The differences
from the previous file are:

1. Summit is now the absolute summit, instead of the relative summit;
2. 'Pileup' is the previous 'tag' column. It's the extended fragment
pileup at the peak summit;
3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)',
so 5.00 means 1e-5, simple and less confusing.
4. The FDR column becomes the '-log10(qvalue)' column.
5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
the values at the peak summit.

NAME_pqtable.txt contains pvalue and qvalue relationships.

NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
and -log10qvalue scores in BedGraph format. Nearby regions with
the same value are not merged.

* Separation of FeatIO.py

Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
cFixWidthTrack.pyx. A modified bedGraphTrackI class was
implemented to store pileup, local lambda, pvalue, and qvalue
all together in cScoreTrack.pyx.

* Experimental option --half-ext

Suggested by the NPS algorithm, I added an experimental option
--half-ext to let MACS extend a ChIP fragment around its middle
point for only 1/2 d.

2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
MACS version 2.0.2 (tag:alpha)

Add an error check to see if there are no common chromosome names
between the treatment file and the control file

* cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx

Reduce memory usage by removing deepcopy() calls.

* Modify README documents and others.
951 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
952 MACS Version 2.0.1 (tag:alpha)
954 * cPileup.pyx, cPeakDetect.pyx and peak calling process
956 Jie suggested a brilliantly simple method to pile up fragments
957 into a bedGraph track. It is much faster than the previous
958 function, i.e., faster than MACS1.3 or MACS1.4. So I can include
959 large local lambda calculation in MACSv2 now. Now I generate three
960 bedGraphs for d-size local bias, slocal-size and llocal-size local
961 bias, and calculate the maximum local bias as local lambda
964 Minor: add_loc in bedGraphTrackI now can correctly merge the
965 region with its preceding region if their values are the same.
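A minimal sketch of the sweep-style pileup described above, as we understand it: sort all fragment starts and ends, walk through them once, and emit a bedGraph interval whenever coverage changes. Adjacent intervals with equal values are merged, in the spirit of the add_loc fix. The sketch makes several assumptions. Fragments are half-open [start, end) on a single chromosome, and zero-coverage gaps are not emitted. The function names are hypothetical and are not the cPileup.pyx API.

```python
from typing import List, Tuple

def add_interval(track: List[Tuple[int, int, int]], start: int, end: int, value: int) -> None:
    """Append an interval, merging it into the previous one if contiguous and equal."""
    if track and track[-1][1] == start and track[-1][2] == value:
        track[-1] = (track[-1][0], end, value)
    else:
        track.append((start, end, value))

def pileup_bedgraph(fragments: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (start, end, depth) intervals for half-open [start, end) fragments."""
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    out: List[Tuple[int, int, int]] = []
    i = j = depth = 0
    prev = None                      # position where the current run began
    while j < len(ends):
        # next event; at equal positions, ends are processed before starts
        if i < len(starts) and starts[i] < ends[j]:
            pos, delta = starts[i], 1
            i += 1
        else:
            pos, delta = ends[j], -1
            j += 1
        if prev is not None and pos > prev and depth > 0:
            add_interval(out, prev, pos, depth)
        depth += delta
        prev = pos
    return out
```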
969 Add an option to shift control tags before extension. By default,
970 control tags will be extended to both sides regardless of strand
973 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
974 MACS Version 2.0.0 (tag:alpha)
976 * Use bedGraph type to store data internally and externally.
978 Theoretically, we can have one-basepair resolution profiles. Files are 10
979 times smaller, and even smaller after converting to
980 bigWig for visualization.
982 * Peak calling process modified. Better peak boundary detection.
984 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
985 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
986 one will be averaged to d size) Then calculate the maximum value
987 of these two tracks and a global background, to have a
988 local-lambda bedGraph.
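A per-position sketch of the local-lambda rule above. It takes the maximum of three values: the control pileup from d-extended tags, the 1,000 bp-extended control pileup rescaled to d, and a genome-wide background. Several details are our assumptions: "averaged to d size" is read as multiplying by d/1000, the control is taken to be already scaled to the treatment depth, and the helper name is hypothetical.

```python
def local_lambda(ctrl_d: float, ctrl_1k: float, total_ctrl_tags: int,
                 genome_size: float, d: int) -> float:
    """Local lambda at one position (hypothetical helper).

    ctrl_d:  control pileup with tags extended to d
    ctrl_1k: control pileup with tags extended to 1,000 bp
    """
    lambda_bg = total_ctrl_tags * d / genome_size   # expected pileup of d-long fragments
    lambda_1k = ctrl_1k * d / 1000.0                # 1 kb pileup averaged down to d size
    return max(lambda_bg, ctrl_d, lambda_1k)
```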
990 Use -10log10poisson_pvalue as scores to generate a score track
991 before peak calling.
993 A general peak calling based on a score cutoff, min length of peak
994 and max gap between nearby peaks.
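The general peak calling step above can be sketched as follows. Keep score intervals above the cutoff, join those separated by at most max_gap, then drop regions shorter than min_length. The input layout (sorted bedGraph-style tuples for one chromosome), the strict ">" comparison and the function name are assumptions made for illustration.

```python
from typing import List, Tuple

def call_peaks(track: List[Tuple[int, int, float]], cutoff: float,
               min_length: int, max_gap: int) -> List[Tuple[int, int]]:
    """Call peaks from sorted (start, end, score) intervals of one chromosome."""
    # 1. keep intervals whose score is above the cutoff
    above = [(s, e) for s, e, score in track if score > cutoff]
    # 2. merge passing intervals separated by gaps no longer than max_gap
    merged: List[List[int]] = []
    for s, e in above:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    # 3. drop merged regions shorter than min_length
    return [(s, e) for s, e in merged if e - s >= min_length]
```

With max_gap set to 0, only directly adjacent passing intervals are joined.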
998 Wiggle file output is removed. Now we only support bedGraph
999 output. The generation of bedGraph is highly recommended since it
1000 will not cost extra time. In other words, bedGraph generation is
1001 internally run even if you don't want to save bedGraphs on disk, due
1002 to the peak calling algorithm in MACS v2.
1006 We can now calculate the Poisson pvalue in log space so that the score
1007 (-10*log10pvalue) will not have an upper limit of 3100 due to the
1008 precision of floating-point numbers.
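One way to compute the Poisson score in log space is sketched below. The upper tail is accumulated with a log-sum-exp update, so the score stays finite even when the p-value itself would underflow a double. This is an illustration, not the MACS implementation. It assumes lam > 0 and that k is at or above lam (the usual peak-calling case), where the loop terminates after a few terms.

```python
import math

def poisson_upper_score(k: int, lam: float) -> float:
    """Return -10*log10(P(X >= k)) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 0.0                                   # P(X >= 0) = 1
    log_lam = math.log(lam)
    log_term = k * log_lam - lam - math.lgamma(k + 1)  # log P(X = k)
    log_sum = log_term
    i = k
    while True:
        i += 1
        log_term += log_lam - math.log(i)            # P(X=i) = P(X=i-1) * lam / i
        if log_term < log_sum - 40:                  # remaining tail is negligible
            break
        # log(exp(log_sum) + exp(log_term)) without leaving log space
        log_sum += math.log1p(math.exp(log_term - log_sum))
    return -10.0 * log_sum / math.log(10)
```

For example, k = 1000 and lam = 10 give a score far above 3100, which plain floating-point p-values cannot represent.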
1010 * Cython is adopted to speed up Python code.
1012 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1015 * Replaced with the newest WigTrackI class and fixed the wignorm script.
1017 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1018 Version 1.4.0rc2 (Valentine)
1020 * --single-wig option is renamed to --single-profile
1022 * BedGraph output with --bdg or -B option.
1024 The BedGraph output provides a 1bp resolution fragment pileup
1025 profile. The file size is smaller than a wig file. This option can be
1026 combined with the --single-profile option to produce a bedgraph file
1027 for the whole genome. This option also makes the --space and
1028 --call-subpeaks options invalid.
1030 * Fix the description of --shiftsize to correctly state that the
1031 value is 1/2 d (fragment size).
1033 * Fix a bug in the call to __filter_w_control_tags when control is
1036 * Fix a bug in the --to-small option. Now it works as expected.
1038 * Fix a bug where an extra tag could be included when counting the
1039 tags in a candidate peak region. (Thanks to Jake Biesinger!)
1041 * Fix the bug for peaks extended beyond the chromosome
1042 start. If a minus strand tag goes beyond the chromosome start
1043 after extension by d, it will be thrown out.
1045 * Post-process script for a combined wig file:
1047 The "wignorm" command can be called after a full run of MACS14 as
1048 a postprocess. wignorm can calculate the local background from the
1049 control wig file from MACS14, then use either fold change,
1050 -10*log10(pvalue) from a Poisson test, or the difference after asinh
1051 transformation as the score to build a single wig track to
1052 represent the binding strength. This script can take a
1053 significantly long time to process.
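Two of the three wignorm scores described above (fold change and the asinh difference) can be sketched per bin as follows. The Poisson option is omitted for brevity. The pseudocount and the function name are our assumptions, not wignorm's actual parameters. asinh behaves like a logarithm for large values but is still defined at zero, which is the usual reason to prefer it for count data.

```python
import math

def wignorm_scores(treat: float, ctrl: float, pseudocount: float = 1.0) -> dict:
    """Per-bin scores from treatment and control signal (hypothetical helper)."""
    return {
        # fold change; the pseudocount avoids division by zero (assumption)
        "foldchange": (treat + pseudocount) / (ctrl + pseudocount),
        # difference after asinh transformation
        "asinh_diff": math.asinh(treat) - math.asinh(ctrl),
    }
```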
1055 * --wigextend is now obsolete.
1057 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1058 Version 1.4.0rc1 (Starry Sky)
1060 * Duplicate reads option
1062 --keep-dup behavior is changed. Now users can specify how many
1063 reads they want to keep at the same genomic location: 'auto' to
1064 let MACS decide the number based on a binomial distribution, 'all'
1065 to let MACS keep all reads.
1067 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1069 By default, MACS will now scale the smaller dataset to the bigger
1070 dataset. For instance, if IP has 10 million reads, and Input has 5
1071 million, MACS will double the lambda value calculated from Input
1072 reads while calling BOTH the positive peaks and negative
1073 peaks. This will address the issue caused by unbalanced numbers of
1074 reads from IP and Input. If --to-small is turned on, MACS will
1075 scale the larger dataset to the smaller one. So from now on, if d
1076 is fixed, then the peaks from a MACS call for A vs B should be
1077 identical to the negative peaks from a B vs A.
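The scaling rule described in this entry can be sketched as a pair of multiplicative factors for treatment and control. This is a hypothetical helper, not MACS's code. Note that version 2.0.9 later changed the default direction to scaling the larger sample down.

```python
def scaling_factors(treat_total: int, ctrl_total: int, to_small: bool = False):
    """Return (treat_factor, ctrl_factor) applied to pileups / lambdas."""
    ratio = treat_total / ctrl_total
    if to_small:
        # --to-small: scale the larger dataset down to the smaller one
        return (1.0 / ratio, 1.0) if ratio > 1 else (1.0, ratio)
    # default in this entry: scale the smaller dataset up to the bigger one
    return (1.0, ratio) if ratio > 1 else (1.0 / ratio, 1.0)
```

With 10 million IP reads and 5 million Input reads, the control factor is 2.0, which doubles the lambda as in the example above.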
1079 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1080 Version 1.4.0beta (summer wishes)
1086 The default behavior in the model building step is slightly
1087 changed. When MACS can't find enough pairs to build the model
1088 (implemented in alpha version) or the modeled fragment length is
1089 less than twice the tag length (implemented in beta version),
1090 MACS will use twice the --shiftsize value as the fragment length in
1091 the later analysis. --off-auto can turn off this default behavior.
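The fallback rule above reduces to a few lines. The sketch uses a hypothetical helper that represents "not enough pairs" as `None`. What MACS does with a failed model under --off-auto is not stated here, so the sketch simply returns the model result unchanged.

```python
from typing import Optional

def choose_d(model_d: Optional[int], tag_size: int, shiftsize: int,
             off_auto: bool = False) -> Optional[int]:
    """Pick the fragment length d used for tag extension (hypothetical helper).

    model_d is None when too few paired peaks were found to build the model.
    """
    if off_auto:
        return model_d                  # no automatic fallback
    if model_d is None or model_d < 2 * tag_size:
        return 2 * shiftsize            # fallback described in this entry
    return model_d
```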
1093 ** Redundant tag filtering
1095 The IO module is rewritten. The redundant tag filtering process
1096 becomes simpler and works as promised. The maximum allowed number
1097 of tags at the exact same location is calculated from the
1098 sequencing depth and genome size using a binomial distribution,
1099 for both TREATMENT and CONTROL separately (previously only
1100 TREATMENT was considered). The exact same location means the same
1101 coordinate and the same strand. Then MACS will only keep at most
1102 this number of tags at the exact same location in the following
1103 analysis. An option --keep-dup can let MACS skip the filtering and
1104 keep all the tags. However, this may bring in a lot of sequencing
1105 bias, so you may get many false positive peaks.
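A sketch of the binomial cutoff described above. Each tag lands on a given position and strand with probability p = 1/genome_size. The cutoff k is the smallest count whose cumulative probability reaches 1 - pvalue. The tail probability of 1e-5 and the helper name are assumptions. The recurrence assumes the mean n_tags/genome_size is small (well under ~700), so that P(X = 0) does not underflow. Tags beyond k at the same coordinate and strand would then be discarded.

```python
import math

def max_dup_tags(genome_size: float, n_tags: int, pvalue: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))      # P(X = 0) = (1 - p)^n
    cdf = pmf
    k = 0
    while cdf < 1.0 - pvalue and k < n_tags:
        # P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k
```

For 10 million tags on a 2.7e9 bp genome this yields 1, i.e. one tag per position and strand.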
1107 ** Single wiggle mode
1109 First thing to mention, this is not the score track that I
1110 described before. By default, MACS generates wiggle files for
1111 fragment pileup for every chromosome separately. When you use
1112 --single-wig option, MACS will generate a single wiggle file for
1113 all the chromosomes so you will get a wig.gz for TREATMENT and
1114 another wig.gz for CONTROL if available.
1116 ** Sniff -- automatic format detection
1118 Now, by default or "-f AUTO", MACS will decide the input file
1119 format automatically. Technically, it will try to read at most
1120 1000 records for the first 10 non-comment lines. If it succeeds,
1121 the format is decided. I recommend not using AUTO and specifying the
1122 right format for your input files, unless you combine different
1123 formats in a single MACS run.
1127 --single-wig and --keep-dup are added. See the previous sections of
1128 this ChangeLog for details.
1130 -f (--format) AUTO is now the default option.
1132 --slocal default: 1000
1133 --llocal default: 10000
1137 Setup script will stop the installation if python version is not
1138 python2.6 or python2.7.
1140 Local lambda calculation has been changed back. MACS will check
1141 peak_region, slocal (default 1K) and llocal (default 10K) for the
1142 local bias. The previous 200bp default caused MACS to miss
1143 some peaks where the input bias is very sharp.
1145 sam2bed.py script is corrected.
1147 Relative pos in xls output is fixed.
1149 Parser for ELAND_export is fixed to skip some of the no-match
1150 lines. And elandexport2bed.py is fixed too. (However, I can't
1151 guarantee that it works on all eland_export files.)
1153 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1154 Version 1.4.0alpha2 (be smarter)
1158 --gsize now provides shortcuts for common genomes, including
1159 human, mouse, C. elegans and fruitfly.
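A sketch of resolving --gsize shortcuts. The numbers are the effective genome sizes quoted in the MACS2 `-g` help text (hs 2.7e9, mm 1.87e9, ce 9e7, dm 1.2e8). Whether version 1.4.0alpha2 used identical values is an assumption. The helper name is hypothetical.

```python
# effective genome sizes as listed in MACS2's -g help (assumed for 1.4 as well)
GSIZE_SHORTCUTS = {"hs": 2.7e9, "mm": 1.87e9, "ce": 9e7, "dm": 1.2e8}

def parse_gsize(value: str) -> float:
    """Accept a shortcut such as 'hs' or a number such as '2.7e9'."""
    if value in GSIZE_SHORTCUTS:
        return GSIZE_SHORTCUTS[value]
    return float(value)
```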
1161 --llocal will now be 5000 bps if there is no input file, so that
1162 local lambda doesn't over-penalize enriched binding sites.
1164 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1165 Version 1.4alpha (be smarter)
1169 --tsize option is redesigned. MACS will use the first 10 lines of
1170 the input to decide the tag size. If user specifies --tsize, it
1171 will override the auto decided tsize.
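A sketch of the tag-size detection described above, assuming BED input (length = end - start) and assuming the detected size is the average length over the first 10 usable records. Other formats would need their own length extraction. The function name is hypothetical.

```python
def detect_tag_size(lines, n: int = 10, override: int = 0) -> int:
    """Estimate tag size from the first n BED records; a user value wins."""
    if override:
        return override                     # user-specified --tsize
    lengths = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue                        # skip blank and comment lines
        fields = line.split()
        lengths.append(int(fields[2]) - int(fields[1]))
        if len(lengths) == n:
            break
    if not lengths:
        raise ValueError("no usable records to detect tag size")
    return int(sum(lengths) / len(lengths))
```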
1173 --lambdaset is replaced by --slocal and --llocal which mean the
1174 small local region and large local region.
1176 --bw has no effect on the scan-window size now. It only affects the
1177 paired-peaks model process.
1181 During the model building, MACS will pick out the enriched regions
1182 which are neither too high nor too low to build the paired-peak
1183 model. By default, the region is from fold 10 to fold 30. If MACS
1184 fails to build the model, by default it will use the nomodel
1185 settings, like shiftsize=100bps, to shift and extend each
1186 tag. This behavior can be turned off by '--off-auto'.
1190 An extra file including all the summit positions is saved in
1191 the *_summits.bed file. An option '--call-subpeaks' will invoke
1192 PeakSplitter developed by Mali Salmon to split wide peaks into
1195 * Sniff (will be in beta)
1197 Automatically recognize the input file format, so users can combine
1198 different formats in one MACS run.
1200 Not implemented features/TODO:
1202 * Algorithms ( in near future? )
1204 MACS will try to refine the peak boundaries by calculating the
1205 scores for every point in the candidate peak regions. The score
1206 will be the -10*log(10,pvalue) on a local poisson distribution. A
1207 cutoff specified by users (--pvalue) will be applied to find the
1208 precise sub-peaks in the original candidate peak region. Peak
1209 boundaries and peak summit positions will be saved in separate BED
1212 * Single wiggle track ( in near future? )
1214 A single wiggle track will be generated to save the scores within
1215 candidate peak regions in the 10bps resolution. The wiggle file
1216 is in fixedStep format.
1219 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1220 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1224 Fixed typo. FCSTEP -> FESTEP
1228 The 'femax' attribute bug is fixed
1230 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1231 Version 1.3.7 (Oktoberfest)
1233 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1235 Enhancements by Peter Chines:
1237 1. gzip files are supported.
1238 2. when --diag is on, user can set the increment and endpoint for
1239 fold enrichment analysis by setting --fe-step and --fe-max.
1241 Enhancements by Davide Cittaro:
1243 1. BAM and SAM formats are supported.
1244 2. small changes in the header lines of wiggle output.
1247 1. I added --fe-min option;
1248 2. Bowtie ascii output with suffix ".map" is supported.
1252 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1253 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1254 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1255 4. Some instances of "fold change" have been changed to "fold enrichment".
1257 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1258 Version 1.3.6.1 (default parameter change)
1260 * bin/macs, lib/PeakDetect.py
1262 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1263 default. "--futurefdr" is added which can turn on the 'new' method
1264 introduced in 1.3.6. By default it's off.
1268 Fixed a bug. p-values are corrected slightly.
1271 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1272 Version 1.3.6 (Birthday cake)
1276 "track name" is added to the header of BED output file.
1278 Now the default peak detection method is to consider 5k and 10k
1279 nearby regions in treatment data and peak location, 1k, 5k, and
1280 10k regions in control data to calculate local bias. The old
1281 method can be called through '--old' option.
1283 Information about how many total/unique tags in treatment or
1284 control will be saved in final .xls output.
1286 * lib/IO/__init__.py
1288 ".fa" will be removed from input tag alignment so only the
1289 chromosome names are kept.
1291 WigTrackI class is added for Wiggle like data structure. (not used
1294 The parser for ELAND multi PET files has been fixed. Now the 5'
1295 tag position for a pair will be kept, whereas in the previous
1296 version, the middle points are kept.
1298 * lib/IO/BinKeeper.py
1300 The BinKeeperI class is inspired by Jim Kent's library for the UCSC genome
1301 browser. It can quickly access values in a given region of a
1302 large wiggle-like data file. (not used now)
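The bin-based lookup idea can be sketched with fixed-size buckets. Points are filed under `pos // bin_size`, so a region query only scans the buckets that overlap it. This is a simplified illustration: UCSC's real scheme uses a hierarchy of bin sizes, and the class and method names are hypothetical, not the BinKeeperI API.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

class BinKeeper:
    """Store (position, value) points in fixed-size bins for fast region queries."""

    def __init__(self, bin_size: int = 10000):
        self.bin_size = bin_size
        self.bins = defaultdict(list)        # bin index -> sorted [(pos, value)]

    def add(self, pos: int, value: float) -> None:
        bucket = self.bins[pos // self.bin_size]
        # insert after any existing points at the same position
        bucket.insert(bisect_right(bucket, (pos, float("inf"))), (pos, value))

    def query(self, start: int, end: int):
        """Return (pos, value) pairs with start <= pos < end, sorted by pos."""
        out = []
        for b in range(start // self.bin_size, (end - 1) // self.bin_size + 1):
            bucket = self.bins.get(b, [])    # .get does not create empty bins
            lo = bisect_left(bucket, (start, float("-inf")))
            hi = bisect_left(bucket, (end, float("-inf")))
            out.extend(bucket[lo:hi])
        return out
```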
1304 * lib/OptValidator.py
1310 Now the default peak detection method is to consider 5k and 10k
1311 nearby regions in treatment data and peak location, 1k, 5k, and
1312 10k regions in control data to calculate local bias. The old
1313 method can be called through '--old' option.
1315 Two columns have been added to BED output file. 4th column: peak
1316 name; 5th column: peak score using -10log(10,pvalue) as score.
1320 Add support to build a Mac App through 'setup.py py2app', or a
1321 Windows executable through 'setup.py py2exe'. You need to install
1322 py2app or py2exe package in order to use these functions.
1324 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1325 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1329 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1330 in control data to calculate local lambda for each peak. Peak
1331 calling results will be slightly different from the previous version,
1336 Typo fixed, ELANDParser -> ELANDResultParser
1340 Now, modeled d value will be shown on the model figure.
1342 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1343 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1345 * macs, IO/__init__.py, PeakDetect.py
1347 Add support for ELAND multi format. Add support for Paired-End
1348 experiment, in this case, 5'end and 3'end ELAND multi format files
1349 are required for treatment or control data. See 00README file for
1352 Add wigextend option.
1354 Add petdist option for Paired-End Tag experiment, which is the best
1355 distance between 5' and 3' tags.
1359 Fixed a bug which caused the end positions of every peak region
1360 to be incorrectly incremented by 1 bp. (Thanks Mali Salmon!)
1364 Fix bugs while generating wiggle files. The start position of
1365 wiggle file is set to 1 instead of 0.
1367 Fix a bug where, every 10M bps, signals in the first 'd' range were
1368 lower than actual. (Thanks Mali Salmon!)
1371 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1372 Version 1.3.3 (wiggle bugs fixed)
1376 Fix bugs while generating wiggle files. 1. 'span=' is added to
1377 'variableStep' line; 2. previously, every 10M bps, the coordinates
1378 were wrongly shifted to the right for 'd' basepairs.
1380 * macs, PeakDetect.py
1382 Add an option to save wiggle files at different resolutions.
1384 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1385 Version 1.3.2 (tiny bugs fixed)
1389 Fix 65536 -> 65535. ( Thank Joon)
1393 Improved the binomial function for extra-large numbers. Imported
1394 from Cistrome project.
1398 If treatment channel misses reads in some chromosome included in
1399 control channel, or vice versa, MACS will not exit. (Thank Shaun
1402 Instead, MACS will fake a tag at position -1 when calling
1403 treatment peaks vs control, but will ignore the chromosome while
1404 calling negative peaks.
1406 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1407 Version 1.3.1 (tiny bugs fixed version)
1411 Hyunjin Gene Shin contributed some code to Prob.py. Now the
1412 binomial functions can tolerate large and small numbers.
1416 Parsers now split lines in BED/ELAND files using any
1417 whitespace. 'track' or 'browser' lines will be regarded as
1418 comment lines. A bug fixed when throwing StrandFormatError. The
1419 maximum redundant tag number at a single position can be no less
1423 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1424 Version 1.3 (naming clarification version)
1426 * Naming clarification changes according to our manuscript:
1428 'frag_len' is changed to 'd'.
1430 'fold_change' is changed to 'fold_enrichment'.
1432 Suggest '--bw' parameter to be determined by users from the real
1435 Maximum FDR is 100% in the output file.
1437 And other clarifications in 00README file and the documents on the
1441 If the redundant tag number at a single position is over 32767,
1442 just record 32767, instead of raising an overflow exception.
1448 Bug fixed for diagnosis report.
1451 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1456 Poisson distribution CDF and inverse CDF functions are
1457 corrected. They can produce right results even for huge lambda
1458 now. So that the p-value and FDR values in the final excel sheet
1461 IO package now can tolerate some rare cases; ELANDParser in IO
1462 package is fixed. (Thank Bogdan)
1466 Reverse paired peaks in model are rejected. So there will be no
1467 negative 'frag_len'. (Thank Bogdan)
1471 The diagnosis function is completed. It can output a table file for
1472 users to estimate their sequencing depth.
1475 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1478 * Prob.py is added!
1480 GSL is totally removed from MACS. Instead, I have implemented the
1481 CDF and inverse CDF for poisson and binomial distribution purely
1484 * Constants.py is added!
1486 Organize constants used in MACS in the Constants.py file.
1488 * All other files are modified!
1490 Foldchange calculation is modified. Now the foldchange is only
1491 calculated at the peak summit position instead of the whole peak
1492 region. The values will be higher and more robust than before.
1496 1. MACS can save wiggle format files containing the tag number at
1497 every 10 bp along the genome. Tags are shifted according to our
1498 model before they are calculated.
1500 2. Model building and local lambda calculation can be skipped with
1503 3. A diagnosis report can be generated through '--diag'
1504 option. This report can help you estimate the
1505 sequencing saturation. This function is only in beta stage.
1507 4. FDR calculation speed is highly improved.
1509 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1512 * TabIO, PeakModel.py ...
1513 Bug fixed to let MACS tolerate some cases where there is no tag on
1514 either plus strand or minus strand.
1517 Check the version of python. If the version is lower than 2.4,
1518 refuse to install with warning.
1521 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1522 MACS version 2.0.10 20130731 (tag:alpha)
1524 * callpeak --call-summits
1526 Fix bugs causing the callpeak --call-summits option to generate extra
1527 peaks and peak boundaries inconsistent with the
1528 default option. Thank Ben Levinson!
1532 Fix bugs causing bdgcmp to output only positive logLR values. Now
1533 'depletion' can be correctly represented as negative values.
1537 Fix the behavior of bdgdiff module. Now it can take four
1538 bedGraph files, then use logLR as cutoff to call differential
1539 regions. Check command line of bdgdiff for detail.
1541 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1542 MACS version 2.0.10 20130713 (tag:alpha)
1544 * fix bugs while output broadPeak and gappedPeak.
1546 Note: weak broad regions without any strong enrichment
1547 regions inside won't be saved in the gappedPeak file.
1549 * bdgcmp -T and -C are merged into -S and description is updated.
1551 Now, you can use it to override SPMR values in your input for
1552 bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1553 statistics will cause weird results (in most cases, lower
1554 significance), and won't be consistent with MACS2 callpeak
1555 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1556 sample size in MILLIONS according to the 'callpeak --to-large' option.
1558 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1559 MACS version 2.0.10 20130710 (tag:alpha)
1561 * fix BED style output format of callpeak module:
1563 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1564 the output. Old BED format file won't be saved.
1566 2) with --broad: broadPeak (BED6+3) for broad region and
1567 gappedPeak (BED12+3) for chained enriched regions will be the
1568 output. Old BED format, narrowPeak format, summit file won't be
1571 * bdgcmp can now accept a list of methods to calculate scores. So
1572 you can run it once to generate multiple types of scores. Thank
1573 Jon Urban for this suggestion!
1575 * C codes are re-generated through Cython 0.19.1.
1577 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1578 MACS version 2.0.10 20130520 (tag:alpha)
1580 * broad peak calling modules are modified in order to report all
1581 relaxed regions even if there is no strong enrichment inside.
1583 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1584 MACS version 2.0.10 20130501 (tag:alpha)
1586 * Memory usage is decreased to about 1/4-1/5 of previous usage
1587 Now, the internal data structure and algorithm are both
1588 re-organized, so that intermediate data won't be saved in
1589 memory. Instead, they will be calculated on the fly. New MACS2 will
1590 take longer (1.5 to 2 times); however, it will use less memory,
1591 so it can be more usable on servers with little memory.
1593 * --seed option is added to callpeak and randsample commands
1594 Thank Mathieu Gineste for this suggestion!
1596 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1597 MACS version 2.0.10 20130306 (tag:alpha)
1599 * diffpeak module: a new module to detect differential binding sites
1600 with more statistics.
1602 * Introduced --refine-peaks
1603 Calculates read balance to refine peak summits
1605 * Output file names prefix
1606 Correct encodePeak to narrowPeak, broadPeak to bed12.
1608 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1609 MACS version 2.0.10 (tag:alpha not released)
1611 * Introduced BAMPEParser
1612 Reads PE data directly, requires bedtools for now
1614 * Introduced --call-summits
1615 Uses signal processing methods to call overlapping peaks
1617 * Added --no-trackline
1618 By default, files have descriptive tracklines now
1620 * new refinepeak command (experimental)
1621 This new function will use a method similar to the one in SPP (wtd) to
1622 analyze the raw tag distribution in the peak region, then redefine the
1623 peak summit where plus and minus tags are evenly distributed
1626 * Changes to output *
1627 cPeakDetect.pyx has full support for new print/write methods and
1628 --call-peaks, BAMPEParser, and use of paired-end data
1630 * Parser optimization
1632 cParser.pyx is rewritten to use io.BufferedReader to speed up
1633 parsing. Speed is doubled.
1635 Code is reorganized -- most of functions are inherited from
1636 GenericParser class.
1638 * Use cross-correlation to calculate fragment size
1640 First, all pairs will be used to predict the fragment
1641 size. Previously, no more than 1000 pairs were used. Second,
1642 cross-correlation is used to find the best phase difference
1643 between + and - tag pileups.
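The cross-correlation idea can be sketched as follows: slide the minus-strand pileup against the plus-strand pileup and keep the lag with the highest Pearson correlation, which approximates d. This brute-force, pure-Python version is an illustration only. The function names, the max_lag bound and the toy pileups are hypothetical, and MACS's own implementation differs.

```python
import math

def pearson(a, b):
    """Pearson correlation; NaN when either side is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)

def estimate_d(plus, minus, max_lag):
    """Lag (in bp) maximizing corr(plus[x], minus[x + lag])."""
    best_lag, best_cc = None, -math.inf
    for lag in range(1, max_lag + 1):
        if lag >= len(plus):
            break
        cc = pearson(plus[:-lag], minus[lag:])
        if cc > best_cc:                 # NaN compares False, so it is skipped
            best_lag, best_cc = lag, cc
    return best_lag
```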
1645 * Speed up p-value and q-value calculation
1647 This part is ten times faster now. I am using a dictionary to
1648 cache p-value results from Poisson CDF function. A bit more memory
1649 will be used to increase speed. I hope this dictionary would not
1650 explode since the possible pairs of ChIP signal and control lambda
1651 are hugely redundant. Also, I rewrote part of q-value
1654 * Speed up peak detection
1656 This part is hundreds of times faster now. Optimizations
1657 include using Numpy functions as much as possible, and making loop
1658 body as small as possible.
1660 * Post-processing on differential calls
1662 After macs2diff finds differential binding sites between two
1663 conditions, it will try to annotate the peak calls from one of two
1664 conditions, describe the changes ...
1666 * Fragment size prediction in macs2diff
1668 Now by default, macs2diff will try to use the average fragment
1669 size from both condition 1 and condition 2 for tag extension and
1670 peak calling. Previously, by default, it used different sizes
1671 unless --nomodel was specified.
1673 Technically, I separate model building processes out. So macs2diff
1674 will build fragment sizes for condition 1 and 2 in parallel (2
1675 processes maximum), then perform 4-way comparisons in parallel (4
1680 Combine two p/qscore tracks together. At regions where condition 1
1681 is higher than condition 2, score would be positive, otherwise,
1684 * SAMParser and BAMParser
1686 Bug fixed for paired-end sequencing data.
1690 Fixed a bug while calling peaks from BedGraph file. It previously
1691 mistakenly output the same peaks multiple times at the end of
1694 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1695 MACS version 2.0.9 (tag:alpha)
1697 * Auto fixation on predicted d is turned off by default!
1699 Previous --off-auto is now default. MACS will not automatically
1700 fix d less than twice the tag size according to
1701 --shiftsize. As tag sizes get longer nowadays, it is more likely
1702 to have d less than twice the tag size; however, d may
1703 still be meaningful and useful. Please judge it using your own
1708 Now, the default scaling while treatment and input are unbalanced
1709 has been adjusted. By default, larger sample will be scaled down
1710 linearly to match the smaller sample. In this way, background
1711 noise will be reduced more than real signals, so we expect to have
1712 more specific results than the other way around (i.e. --to-large
1715 Also, an alternative option to randomly sample larger data
1716 (--down-sample) is provided to replace default linear
1717 scaling. However, this option will make results irreproducible,
1722 A new script 'randsample' is added, which can randomly sample
1723 certain percentage or number of tags.
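A sketch of sampling a percentage or a fixed number of tags. A seeded generator makes the draw repeatable, in the spirit of the later --seed option. The function and parameter names are hypothetical, not the randsample command-line interface. Sorting the chosen indices preserves the original tag order.

```python
import random

def rand_sample(tags, percentage=None, number=None, seed=None):
    """Randomly keep `number` tags, or `percentage` percent of them."""
    rng = random.Random(seed)            # seeded RNG -> reproducible sample
    if number is None:
        if percentage is None:
            raise ValueError("give either percentage or number")
        number = int(len(tags) * percentage / 100)
    number = min(number, len(tags))
    chosen = sorted(rng.sample(range(len(tags)), number))
    return [tags[i] for i in chosen]
```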
1727 Now, MACS will decide peak summits according to pileup height
1728 instead of qvalue scores. In this way, the summit may be more
1733 MACS calculates qvalue scores as differential scores. When comparing
1734 two conditions (say A and B), the maximum qscore for comparing
1735 A to B (maxqscore_a2b) and for comparing B to A (maxqscore_b2a)
1736 will be computed. If maxqscore_a2b is bigger, the diff score is
1737 +maxqscore_a2b; otherwise, the diff score is -1*maxqscore_b2a.
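The sign convention above, written out as a tiny hypothetical helper. Positive scores favour A and negative scores favour B. Ties fall through to the "otherwise" branch, exactly as the rule is worded.

```python
def diff_score(maxqscore_a2b: float, maxqscore_b2a: float) -> float:
    """Signed differential score from the two directional max q-scores."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```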
1739 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1740 MACS version 2.0.8 (tag:alpha)
1742 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1744 New script bdgbroadcall and the extra option '--broad' for macs2
1745 script, can be used to call broad regions with a loose cutoff to
1746 link nearby significant regions. The output is represented as
1749 * MACS2/IO/cScoreTrack.pyx
1751 Fix q-value calculation to generate forcefully monotonic values.
1753 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1755 They are combined to one more powerful script called
1756 "filterdup". The script filterdup can filter duplicated reads
1757 according to sequencing depth and genome size. The script can also
1758 convert any format supported by MACS to BED format.
1760 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1761 MACS version 2.0.7 (tag:alpha)
1763 * bin/macsdiff renamed to bin/bdgdiff
1765 Now this script will work as a low-level fine-tuning tool as bdgcmp
1770 A new script to take treatment and control files from two
1771 conditions, calculate fragment size, use local Poisson to get
1772 pvalues and the BH process to get qvalues, then combine 4-way results
1773 to call differential sites.
1775 This script can use up to 4 CPUs to speed up the 4-way calculation.
1776 (I am trying multiprocessing in Python.)
1778 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1779 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1780 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1782 All above files are modified for the new macs2diff script.
1784 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1786 Now q-value 0.01 is the default cutoff. If -p is specified,
1787 p-value cutoff will be used instead.
1789 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1790 MACS version 2.0.6 (tag:alpha)
1794 A script to call differential regions. A naive way is introduced
1795 to find the regions where:
1797 1. signal from condition 1 is larger than input 1 and condition 2 --
1798 unique region in condition 1;
1799 2. signal from condition 2 is larger than input 2 and condition 1
1800 -- unique region in condition 2;
1801 3. signal from condition 1 is larger than input 1, signal from
1802 condition 2 is larger than input 2, however either signal from
1803 condition 1 or 2 is not larger than the other.
1805 Here 'larger' means the pvalue or qvalue from a Poisson test is
1806 under certain cutoff.
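The three region classes above can be sketched as a decision on four one-sided test p-values, where "larger" means the p-value is under the cutoff. The argument names, the default cutoff of 1e-5 and the returned labels are assumptions for illustration. The third case is reached only when both conditions beat their inputs but neither beats the other.

```python
def classify_region(p1_vs_input1, p2_vs_input2, p1_vs_2, p2_vs_1, cutoff=1e-5):
    """Classify a region from one-sided Poisson test p-values ('X larger than Y')."""
    def larger(p):
        return p < cutoff
    if larger(p1_vs_input1) and larger(p1_vs_2):
        return "unique_condition1"
    if larger(p2_vs_input2) and larger(p2_vs_1):
        return "unique_condition2"
    if larger(p1_vs_input1) and larger(p2_vs_input2):
        return "common"                  # enriched in both, neither larger
    return None
```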
1808 (I will make another script to wrap up multiple scripts for
1809 differential calling)
1811 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1812 MACS version 2.0.5 (tag:alpha)
1814 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1815 MACS2/IO/cPeakIO.pyx
1817 Use hash to store peak information. Add back the feature to deal
1818 with data without control.
1820 Fix bug which incorrectly allows small peaks at the end of
1823 * bin/bdgpeakcall, bin/bdgcmp
1825 Fix bugs. bdgpeakcall can output encodePeak format.
1827 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1828 MACS version 2.0.4 (tag:alpha)
1832 Fix a bug, correctly assign lambda_bg while --to-small is
1833 set. Thanks Junya Seo!
1835 Add rank and num of bp columns to pvalue-qvalue table.
1839 Fix bugs to correctly deal with peakless chromosomes. Thanks
1842 Use AFDR for independent tests instead.
1846 Now MACS can output peak coordinates together with pvalue, qvalue,
1847 summit positions in a single encodePeak format (designed for
1848 ENCODE project) file. This file can be loaded to UCSC
1849 browser. Definitions of some specific columns: 5th:
1850 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1851 -log10qvalue, 10th: relative summit position to peak start.
1854 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1855 MACS version 2.0.3 (tag:alpha)
1857 * Rich output with qvalue, fold enrichment, and pileup height
1859 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1862 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1864 Now we have a similar xls output file as before. The differences
1865 from previous file are:
1867 1. Summit is now the absolute summit position, instead of the relative summit
1869 2. 'Pileup' is the previous 'tag' column. It's the extended fragment
1870 pileup at the peak summit;
1871 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1872 5.00 means 1e-5, simple and less confusing.
1873 4. FDR column becomes '-log10(qvalue)' column.
1874 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1875 the values at the peak summit.
1877 * Extra output files
1879 NAME_pqtable.txt contains pvalue and qvalue relationships.
1881 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1882 and -log10qvalue scores in BedGraph format. Nearby regions with
1883 the same value are not merged.
1885 * Separation of FeatIO.py
1887 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1888 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1889 implemented to store pileup, local lambda, pvalue, and qvalue
1890 altogether in cScoreTrack.pyx.
1892 * Experimental option --half-ext
1894 Inspired by the NPS algorithm, I added an experimental option
1895 --half-ext to let MACS extend each ChIP fragment only 1/2 d
1896 around its middle point.
1898 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1899 MACS version 2.0.2 (tag:alpha)
1903 Add an error check for the case where the treatment and control
1904 files share no common chromosome names.
1906 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1908 Reduce memory usage by removing deepcopy() calls.
1910 * Modify README documents and others.
1912 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1913 MACS Version 2.0.1 (tag:alpha)
1915 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1917 Jie suggested a brilliantly simple method to pile up fragments
1918 into a bedGraph track. It is much faster than the previous
1919 function, i.e., faster than MACS1.3 or MACS1.4, so I can now include
1920 the large local lambda calculation in MACSv2. I now generate three
1921 bedGraphs for the d-size, slocal-size and llocal-size local
1922 bias, and calculate the maximum local bias as local lambda
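Below is a minimal sweep-line sketch of piling up fragments into bedGraph runs. It shows a generic technique and is not claimed to be the exact method Jie suggested. The function name pileup and the half-open (start, end) fragment representation are assumptions.

```python
def pileup(fragments):
    """Pile up half-open [start, end) fragments into bedGraph-style runs.

    Returns (start, end, depth) tuples from the first fragment start to the
    last fragment end. Zero-depth gaps are kept, and adjacent runs with equal
    depth are not merged.
    """
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    runs = []
    i = j = 0
    depth = 0
    prev = None
    while j < len(ends):
        # Take the next boundary. On ties, process the end first so that
        # touching fragments [a, b) and [b, c) do not overlap at b.
        if i < len(starts) and starts[i] < ends[j]:
            pos, delta = starts[i], 1
            i += 1
        else:
            pos, delta = ends[j], -1
            j += 1
        if prev is not None and pos > prev:
            runs.append((prev, pos, depth))
        depth += delta
        prev = pos
    return runs
```

Sorting the starts and ends separately makes the cost O(n log n) regardless of fragment length, because no per-base array is filled.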
1925 Minor: add_loc in bedGraphTrackI now correctly merges a
1926 region with its preceding region if their values are the same.
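Below is a sketch of this merging rule applied to a list of bedGraph runs. add_run is a hypothetical helper and not the bedGraphTrackI.add_loc API. It also assumes that a region merges only when it starts exactly where the previous one ends.

```python
def add_run(runs, start, end, value):
    """Append a (start, end, value) region to runs. If the preceding region
    ends exactly at start and has the same value, extend it instead."""
    if runs and runs[-1][1] == start and runs[-1][2] == value:
        prev_start = runs[-1][0]
        runs[-1] = (prev_start, end, value)
    else:
        runs.append((start, end, value))
```

For example, adding (0, 5, 1), then (5, 10, 1), then (10, 15, 2) leaves two regions, (0, 10, 1) and (10, 15, 2).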
1930 Add an option to shift control tags before extension. By default,
1931 control tags will be extended to both sides regardless of strand
1934 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1935 MACS Version 2.0.0 (tag:alpha)
1937 * Use bedGraph type to store data internally and externally.
1939 In theory, we can have one-basepair resolution profiles. Files are 10
1940 times smaller, and even smaller after conversion to
1941 bigWig for visualization.
1943 * Peak calling process modified. Better peak boundary detection.
1945 Extend each ChIP tag to d, and pile up to get a ChIP bedGraph. Extend
1946 each control tag to d and to 1,000bp, and pile up to get two bedGraphs
1947 (the 1,000bp one is averaged to d size). Then take the maximum value
1948 of these two tracks and a global background to get a
1949 local-lambda bedGraph.
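The per-position local lambda described above can be sketched as follows. This rests on two assumptions: the tracks are plain per-base lists already scaled to treatment depth, and "averaged to d size" means multiplying the 1,000bp pileup by d/1000. local_lambda is a hypothetical name.

```python
def local_lambda(ctrl_d, ctrl_1k, d, genome_bg):
    """Per-base local lambda, taken as the maximum of
    - the control pileup with tags extended to d,
    - the control pileup with tags extended to 1,000bp, rescaled to d size,
    - the genome-wide background lambda.
    Assumes both control tracks are already scaled to treatment depth.
    """
    # A 1,000bp-extended tag spreads over 1000/d times more bases than a
    # d-extended one, so multiply by d/1000 to compare on the same scale.
    scale = d / 1000.0
    return [max(a, b * scale, genome_bg) for a, b in zip(ctrl_d, ctrl_1k)]
```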
1951 Use -10log10poisson_pvalue as scores to generate a score track
1952 before peak calling.
1954 A general peak calling step is based on a score cutoff, a minimum peak
1955 length and a maximum gap between nearby peaks.
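Below is a minimal sketch of peak calling over a score bedGraph using a cutoff, a minimum length and a maximum gap. Two details are assumptions made for illustration: that the cutoff is inclusive, and the name call_peaks itself.

```python
def call_peaks(runs, cutoff, min_length, max_gap):
    """Call peaks from sorted, non-overlapping bedGraph runs (start, end, score).

    Runs scoring at or above cutoff are joined when separated by at most
    max_gap bp. Merged regions shorter than min_length are dropped.
    """
    peaks = []
    cur_start = cur_end = None
    for start, end, score in runs:
        if score < cutoff:
            continue
        if cur_end is not None and start - cur_end <= max_gap:
            cur_end = end  # bridge a small gap and extend the current peak
        else:
            if cur_end is not None and cur_end - cur_start >= min_length:
                peaks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    # Flush the last open region.
    if cur_end is not None and cur_end - cur_start >= min_length:
        peaks.append((cur_start, cur_end))
    return peaks
```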
1959 Wiggle file output is removed. Now we only support bedGraph
1960 output. The generation of bedGraph is highly recommended since it
1961 will not cost extra time. In other words, bedGraph generation is
1962 run internally even if you don't want to save bedGraphs on disk, due
1963 to the peak calling algorithm in MACS v2.
1967 We can now calculate the Poisson pvalue in log space so that the score
1968 (-10*log10pvalue) no longer has an upper limit of 3100 due to
1969 floating-point precision.
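Summing Poisson terms in log space avoids the underflow that caps the score. The sketch below assumes an upper-tail test P(X >= k) with lam > 0. poisson_log10_sf is a hypothetical name, and this is not MACS's own implementation.

```python
import math


def _logsumexp(xs):
    """Natural log of sum(exp(x) for x in xs), without overflow/underflow."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def poisson_log10_sf(k, lam):
    """log10 P(X >= k) for X ~ Poisson(lam), lam > 0, computed in log space."""
    if k <= 0:
        return 0.0
    log_lam = math.log(lam)

    def log_pmf(i):
        return i * log_lam - lam - math.lgamma(i + 1)

    if k <= lam:
        # The upper tail is not tiny here, so use the complement of the lower tail.
        lower = _logsumexp([log_pmf(i) for i in range(k)])
        return math.log1p(-math.exp(lower)) / math.log(10)
    # For k > lam the terms shrink monotonically from i = k onward.
    terms = [log_pmf(k)]
    i = k + 1
    while True:
        t = log_pmf(i)
        terms.append(t)
        if t < terms[0] - 40:  # exp(-40) ~ 4e-18 relative: negligible
            break
        i += 1
    return _logsumexp(terms) / math.log(10)
```

For example, poisson_log10_sf(1000, 10.0) is about -1572, a score of about 15720 on the -10*log10 scale. Evaluating that p-value directly in double precision underflows to 0.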
1971 * Cython is adopted to speed up Python code.
1973 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1976 * Replaced with the newest WigTrackI class and fixed the wignorm script.
1978 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1979 Version 1.4.0rc2 (Valentine)
1981 * --single-wig option is renamed to --single-profile
1983 * BedGraph output with --bdg or -B option.
1985 The BedGraph output provides 1bp resolution fragment pileup
1986 profile. File size is smaller than wig file. This option can be
1987 combined with --single-profile option to produce a bedgraph file
1988 for the whole genome. This option also makes --space and
1989 --call-subpeaks invalid.
1991 * Fix the description of --shiftsize to correctly state that the
1992 value is 1/2 d (fragment size).
1994 * Fix a bug in the call to __filter_w_control_tags when control is
1997 * Fix a bug in the --to-small option. Now it works as expected.
1999 * Fix a bug where an extra tag could be included when counting the
2000 tags in a candidate peak region. (Thanks to Jake Biesinger!)
2002 * Fix the bug of peaks extending beyond the chromosome
2003 start. If a minus strand tag goes beyond the chromosome start
2004 after extension by d, it will be thrown out.
2006 * Post-process script for a combined wig file:
2008 The "wignorm" command can be called after a full run of MACS14 as
2009 a postprocess. wignorm can calculate the local background from the
2010 control wig file from MACS14, then use either foldchange,
2011 -10*log10(pvalue) from a Poisson test, or the difference after asinh
2012 transformation as the score to build a single wig track to
2013 represent the binding strength. This script will take a
2014 significantly long time to process.
2016 * --wigextend is now obsolete.
2018 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
2019 Version 1.4.0rc1 (Starry Sky)
2021 * Duplicate reads option
2023 --keep-dup behavior is changed. Now users can specify how many
2024 reads they want to keep at the same genomic location: 'auto' to
2025 let MACS decide the number based on a binomial distribution, 'all'
2026 to let MACS keep all reads.
2028 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2030 By default, MACS will now scale the smaller dataset to the bigger
2031 dataset. For instance, if IP has 10 million reads, and Input has 5
2032 million, MACS will double the lambda value calculated from Input
2033 reads while calling BOTH the positive peaks and negative
2034 peaks. This will address the issue caused by unbalanced numbers of
2035 reads from IP and Input. If --to-small is turned on, MACS will
2036 scale the larger dataset to the smaller one. So from now on, if d
2037 is fixed, then the peaks from a MACS call for A vs B should be
2038 identical to the negative peaks from a B vs A.
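The scaling rule can be sketched as a pair of multiplicative factors, one for each dataset. depth_scaling_factors and its return convention are hypothetical. The entry above supplies only the rule itself: scale the smaller dataset up by default, or the larger one down with --to-small.

```python
def depth_scaling_factors(treat_total, ctrl_total, to_small=False):
    """Return (treat_factor, ctrl_factor) that put both datasets on the
    same sequencing-depth scale.

    Default: scale the smaller dataset up to the bigger one.
    to_small=True: scale the larger dataset down to the smaller one.
    """
    bigger_is_treat = treat_total >= ctrl_total
    # Decide which dataset gets rescaled; the other keeps a factor of 1.
    scale_control = bigger_is_treat if not to_small else not bigger_is_treat
    ratio = treat_total / ctrl_total
    if scale_control:
        return 1.0, ratio
    return 1.0 / ratio, 1.0
```

With 10 million IP reads and 5 million Input reads, the default returns (1.0, 2.0), so the Input-derived lambda is doubled, as described above.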
2040 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2041 Version 1.4.0beta (summer wishes)
2047 The default behavior in the model building step is slightly
2048 changed. When MACS can't find enough pairs to build model
2049 (implemented in alpha version) or the modeled fragment length is
2050 less than 2 times the tag length (implemented in beta version),
2051 MACS will use 2 times the --shiftsize value as fragment length in
2052 the later analysis. --off-auto can turn off this default behavior.
2054 ** Redundant tag filtering
2056 The IO module is rewritten. The redundant tag filtering process
2057 becomes simpler and works as promised. The maximum allowed number
2058 of tags at the exact same location is calculated from the
2059 sequencing depth and genome size using a binomial distribution,
2060 for both TREATMENT and CONTROL separately (previously only
2061 TREATMENT was considered). The exact same location means the same
2062 coordinate and the same strand. Then MACS will only keep at most
2063 this number of tags at the exact same location in the following
2064 analysis. An option --keep-dup can let MACS skip the filtering and
2065 keep all the tags. However this may bring in a lot of sequencing
2066 bias, so you may get many false positive peaks.
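Below is a sketch of deriving the per-position cap from a binomial distribution. Its assumptions do not come from MACS source: each tag lands on a given position with probability 1/genome_size, and the cap is the smallest k with P(X > k) <= 1e-5. max_dup_tags is a hypothetical name.

```python
import math


def max_dup_tags(n_tags, genome_size, cutoff=1e-5):
    """Smallest k with P(X > k) <= cutoff, where X ~ Binomial(n_tags, p) and
    p = 1 / genome_size is the assumed chance a tag hits a given position."""
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))  # P(X = 0)
    cdf = pmf
    k = 0
    while 1.0 - cdf > cutoff and k < n_tags:
        # Recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        cdf += pmf
        k += 1
    return k
```

Under these assumptions, 10 million tags on a 2.7 Gbp genome give a cap of 1 tag per position, and 100 million tags give 2.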
2068 ** Single wiggle mode
2070 First thing to mention, this is not the score track that I
2071 described before. By default, MACS generates wiggle files for
2072 fragment pileup for every chromosome separately. When you use
2073 --single-wig option, MACS will generate a single wiggle file for
2074 all the chromosomes so you will get a wig.gz for TREATMENT and
2075 another wig.gz for CONTROL if available.
2077 ** Sniff -- automatic format detection
2079 Now, by default or "-f AUTO", MACS will decide the input file
2080 format automatically. Technically, it will try to read at most
2081 1000 records for the first 10 non-comment lines. If it succeeds,
2082 the format is decided. I recommend not using AUTO and instead specifying the
2083 right format for your input files, unless you combine different
2084 formats in a single MACS run.
2088 --single-wig and --keep-dup are added. Check previous section in
2089 ChangeLog for detail.
2091 -f (--format) AUTO is now the default option.
2093 --slocal default: 1000
2094 --llocal default: 10000
2098 Setup script will stop the installation if python version is not
2099 python2.6 or python2.7.
2101 Local lambda calculation has been changed back. MACS will check
2102 peak_region, slocal( default 1K) and llocal (default 10K) for the
2103 local bias. The previous 200bp default caused MACS to miss
2104 some peaks where the input bias is very sharp.
2106 sam2bed.py script is corrected.
2108 Relative pos in xls output is fixed.
2110 The parser for ELAND_export is fixed to skip some of the no-match
2111 lines. And elandexport2bed.py is fixed too (however, I can't
2112 guarantee that it works on all eland_export files).
2114 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2115 Version 1.4.0alpha2 (be smarter)
2119 --gsize now provides shortcuts for common genomes, including
2120 human, mouse, C. elegans and fruitfly.
2122 --llocal will now be 5000 bps if there is no input file, so that
2123 local lambda doesn't overkill enriched binding sites.
2125 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2126 Version 1.4alpha (be smarter)
2130 --tsize option is redesigned. MACS will use the first 10 lines of
2131 the input to decide the tag size. If user specifies --tsize, it
2132 will override the auto decided tsize.
2134 --lambdaset is replaced by --slocal and --llocal which mean the
2135 small local region and large local region.
2137 --bw has no effect on the scan-window size now. It only affects the
2138 paired-peaks model process.
2142 During the model building, MACS will pick out the enriched regions
2143 which are not too high and not too low to build the paired-peak
2144 model. By default, the region is from fold 10 to fold 30. If MACS
2145 fails to build the model, by default it will use the nomodel
2146 settings, like shiftsize=100bps, to shift and extend each
2147 tag. This behavior can be turned off by '--off-auto'.
2151 An extra file including all the summit positions is saved in
2152 *_summits.bed file. An option '--call-subpeaks' will invoke
2153 PeakSplitter developed by Mali Salmon to split wide peaks into
2156 * Sniff (will be in beta)
2158 Automatically recognize the input file format, so users can combine
2159 different formats in one MACS run.
2161 Not implemented features/TODO:
2163 * Algorithms ( in near future? )
2165 MACS will try to refine the peak boundaries by calculating the
2166 scores for every point in the candidate peak regions. The score
2167 will be the -10*log(10,pvalue) on a local poisson distribution. A
2168 cutoff specified by users (--pvalue) will be applied to find the
2169 precise sub-peaks in the original candidate peak region. Peak
2170 boundaries and peak summit positions will be saved in separate BED
2173 * Single wiggle track ( in near future? )
2175 A single wiggle track will be generated to save the scores within
2176 candidate peak regions in the 10bps resolution. The wiggle file
2177 is in fixedStep format.
2180 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2181 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2185 Fixed typo. FCSTEP -> FESTEP
2189 The 'femax' attribute bug is fixed
2191 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2192 Version 1.3.7 (Oktoberfest)
2194 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2196 Enhancements by Peter Chines:
2198 1. gzip files are supported.
2199 2. when --diag is on, user can set the increment and endpoint for
2200 fold enrichment analysis by setting --fe-step and --fe-max.
2202 Enhancements by Davide Cittaro:
2204 1. BAM and SAM formats are supported.
2205 2. small changes in the header lines of wiggle output.
2208 1. I added --fe-min option;
2209 2. Bowtie ascii output with suffix ".map" is supported.
2213 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2214 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2215 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2216 4. Some "fold change" have been changed to "fold enrichment".
2218 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2219 Version 1.3.6.1 (default parameter change)
2221 * bin/macs, lib/PeakDetect.py
2223 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2224 default. "--futurefdr" is added which can turn on the 'new' method
2225 introduced in 1.3.6. By default it's off.
2229 Fixed a bug. p-value is corrected a little bit.
2232 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2233 Version 1.3.6 (Birthday cake)
2237 "track name" is added to the header of BED output file.
2239 Now the default peak detection method is to consider 5k and 10k
2240 nearby regions in treatment data and peak location, 1k, 5k, and
2241 10k regions in control data to calculate local bias. The old
2242 method can be called through '--old' option.
2244 Information about how many total/unique tags in treatment or
2245 control will be saved in final .xls output.
2247 * lib/IO/__init__.py
2249 ".fa" will be removed from input tag alignment so only the
2250 chromosome names are kept.
2252 WigTrackI class is added for Wiggle like data structure. (not used
2255 The parser for ELAND multi PET files has been fixed. Now the 5'
2256 tag position for a pair will be kept, whereas in the previous
2257 version, the middle points are kept.
2259 * lib/IO/BinKeeper.py
2261 BinKeeperI class is inspired by Jim Kent's library for the UCSC genome
2262 browser, which can quickly access the values in a given region of a
2263 large wiggle-like data file. (not used now)
2265 * lib/OptValidator.py
2271 Now the default peak detection method is to consider 5k and 10k
2272 nearby regions in treatment data and peak location, 1k, 5k, and
2273 10k regions in control data to calculate local bias. The old
2274 method can be called through '--old' option.
2276 Two columns have been added to BED output file. 4th column: peak
2277 name; 5th column: peak score using -10log(10,pvalue) as score.
2281 Add support to build a Mac App through 'setup.py py2app', or a
2282 Windows executable through 'setup.py py2exe'. You need to install
2283 py2app or py2exe package in order to use these functions.
2285 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2286 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2290 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2291 in control data to calculate local lambda for each peak. Peak
2292 calling results will be slightly different with previous version,
2297 Typo fixed, ELANDParser -> ELANDResultParser
2301 Now, the modeled d value will be shown on the model figure.
2303 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2304 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2306 * macs, IO/__init__.py, PeakDetect.py
2308 Add support for ELAND multi format. Add support for Paired-End
2309 experiment, in this case, 5'end and 3'end ELAND multi format files
2310 are required for treatment or control data. See 00README file for
2313 Add wigextend option.
2315 Add petdist option for Paired-End Tag experiment, which is the best
2316 distance between 5' and 3' tags.
2320 Fixed a bug which caused the end position of every peak region to be
2321 incorrectly incremented by 1 bp. (Thanks Mali Salmon!)
2325 Fix bugs while generating wiggle files. The start position of
2326 wiggle file is set to 1 instead of 0.
2328 Fix a bug that every 10M bps, signals in the first 'd' range are
2329 lower than actual. ( Thanks Mali Salmon!)
2332 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2333 Version 1.3.3 (wiggle bugs fixed)
2337 Fix bugs while generating wiggle files. 1. 'span=' is added to
2338 'variableStep' line; 2. previously, every 10M bps, the coordinates
2339 were wrongly shifted to the right for 'd' basepairs.
2341 * macs, PeakDetect.py
2343 Add an option to save wiggle files at different resolutions.
2345 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2346 Version 1.3.2 (tiny bugs fixed)
2350 Fix 65536 -> 65535. (Thanks Joon)
2354 Improved the binomial function for extra large numbers. Imported
2355 from the Cistrome project.
2359 If treatment channel misses reads in some chromosome included in
2360 control channel, or vice versa, MACS will not exit. (Thanks Shaun
2363 Instead, MACS will fake a tag at position -1 when calling
2364 treatment peaks vs control, but will ignore the chromosome while
2365 calling negative peaks.
2367 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2368 Version 1.3.1 (tiny bugs fixed version)
2372 Hyunjin Gene Shin contributed some code to Prob.py. Now the
2373 binomial functions can tolerate large and small numbers.
2377 Parsers now split lines in BED/ELAND files on any
2378 whitespace. 'track' or 'browser' lines will be regarded as
2379 comment lines. A bug is fixed when throwing StrandFormatError. The
2380 maximum redundant tag number at a single position can be no less
2384 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2385 Version 1.3 (naming clarification version)
2387 * Naming clarification changes according to our manuscript:
2389 'frag_len' is changed to 'd'.
2391 'fold_change' is changed to 'fold_enrichment'.
2393 Suggest '--bw' parameter to be determined by users from the real
2396 Maximum FDR is 100% in the output file.
2398 And other clarifications in 00README file and the documents on the
2402 If the redundant tag number at a single position is over 32767,
2403 just remember 32767, instead of raising an overflow exception.
2409 Bug fixed for diagnosis report.
2412 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2417 Poisson distribution CDF and inverse CDF functions are
2418 corrected. They can produce correct results even for huge lambda
2419 now. So that the p-value and FDR values in the final excel sheet
2422 IO package now can tolerate some rare cases; ELANDParser in IO
2423 package is fixed. (Thanks Bogdan)
2427 Reverse paired peaks in model are rejected. So there will be no
2428 negative 'frag_len'. (Thanks Bogdan)
2432 The diagnosis function is completed. It can output a table file for
2433 users to estimate their sequencing depth.
2436 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2439 * Prob.py is added!
2441 GSL is totally removed from MACS. Instead, I have implemented the
2442 CDF and inverse CDF for poisson and binomial distribution purely
2445 * Constants.py is added!
2447 Organize constants used in MACS in the Constants.py file.
2449 * All other files are modified!
2451 Foldchange calculation is modified. Now the foldchange is only
2452 calculated at the peak summit position instead of the whole peak
2453 region. The values will be higher and more robust than before.
2457 1. MACS can save wiggle format files containing the tag number at
2458 every 10 bp along the genome. Tags are shifted according to our
2459 model before they are calculated.
2461 2. Model building and local lambda calculation can be skipped with
2464 3. A diagnosis report can be generated through '--diag'
2465 option. This report can help you get an estimate of the
2466 sequencing saturation. This function is only in beta stage.
2468 4. FDR calculation speed is highly improved.
2470 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2473 * TabIO, PeakModel.py ...
2474 Bug fixed to let MACS tolerate some cases where there is no tag on
2475 either plus strand or minus strand.
2478 Check the version of python. If the version is lower than 2.4,
2479 refuse to install with a warning.