1 2023-06-15 Tao Liu <vladimir.liu@gmail.com>
3 * New features in MACS3:
5 1) Speed/memory optimization. Use the cykhash to replace python
6 dictionary. Use buffer (10MB) to read and parse input file (not
7 available for BAM file parser). And many optimization tweaks. We
8 added memory monitoring to the runtime messages.
10 2) Code cleanup. Reorganize source codes.
14 4) R wrappers for MACS -- MACSr
16 5) Switch to Github Action for CI, support multi-arch testing
17 including x64, armv7, aarch64, s390x and ppc64le.
19 6) MACS tag-shifting model has been refined. Now it will use a
20 naive peak calling approach to find ALL possible paired peaks at +
21 and - strand, then use all of them to calculate the
22 cross-correlation. (a related bug has been fix #442)
24 7) Call variants in peak regions directly from BAM files. The
25 function was originally developed under code name SAPPER. Now
26 SAPPER has been merged into MACS. Also, `simde` has been added as
27 a submodule in order to support fermi-lite library under non-x64
30 8) BAI index and random access to BAM file now is supported. #449
31 And user can use original BAM file (instead of the subset of BAM
32 file as in SAPPER) in the `callvar` command.
34 9) Support of Python > 3.10 #497 #498
36 10) HMMRATAC module is added. HMMRATAC is a dedicated software to
37 analyze ATAC-seq data. The basic idea behind HMMRATAC is to digest
38 ATAC-seq data according to the fragment length of read pairs into
39 four signal tracks: short fragments, mononucleosomal fragments,
40 di-nucleosomal fragments and tri-nucleosomal fragments. Then
41 integrate the four tracks again using Hidden Markov Model to
42 consider three hidden states: open region, nucleosomal region, and
43 background region. The orginal paper was published in 2019 written
44 in JAVA, by Evan Tarbell. We implemented it in Python/Cython and
45 optimize the whole process using existing MACS functions and
46 hmmlearn. Now it can run much faster than the original JAVA
47 version. Note: evaluation of the peak calling results is underway.
49 11) The effective genome size parameters have been updated
50 according to deeptools. #508
52 12) Multiple updates regarding dependencies, anaconda built, CI/CD
56 1) Missing header line while no peaks can be called #501 #502
58 2020-04-11 Tao Liu <vladimir.liu@gmail.com>
63 Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
66 2020-04-10 Tao Liu <vladimir.liu@gmail.com>
71 1) MACS2 has been tested on multiple architectures to make sure it
72 can successfully generate consistent results. Currently the
73 supported architectures are: AMD64, ARM64, i386, PPC64LE, and
74 S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
75 #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
76 and #370. The lesson is that if the project is built on Cython and
77 is aimed at memory efficiency, we should specifically define all
78 int/float types in pyx files such as int8_t or uint32_t using
79 either libc or numpy (c version) instead of relying on Cython
80 types such as short, long, double.
82 2) MACS2 setup script will check numpy and install numpy if
83 necessary. PR #378, issue #364
85 3) `bdgbroadcall` command will correctly add the score column (5th
86 column). The score (5th) column contains 10 times of the average
87 score in the broad region. PR #373, issue #362
89 4) The missing test on `bdgopt` subcommand has been added. PR #363
91 5) The obsolete option `--ratio` from `callpeak` subcommand has
92 been removed. PR #369, issue #366
94 6) Fixed the incorrect description in README on the 'maximum
95 length of broad region is 4 times of d' to 'maximum gap for
96 merging broad regions is 4 times of tag size by default'. PR #380,
101 1) CODE OF CONDUCT document has been added to MACS2 github
104 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
109 1) Speed up MACS2. Some programming tricks and code cleanup. The
110 filter_dup function replaces separate_dups. The later one was
111 implemented for potentially putting back duplicate reads in
112 certain downstream analysis. However such analysis hasn't been
113 implemented. Optimize the speed of writing bedGraph
114 files. Optimize BAM and BAMPE parsing with pointer casting instead
117 2) The comment lines in the headers of BED or SAM files will be
118 correctly skipped. However, MACS2 won't check comment lines in the
123 1) Cutoff-analysis in callpeak command. #341
125 2) Issues related to SAMParser and three ELAND Parsers are
130 1) cmdlinetest script in test/ folder has been updated to: 1. test
131 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
132 and after the error or warning message during tests; 3. output
133 only the first 10 lines if the difference between test result and
134 standard result can be found; 4. prockreport monitor CPU time and
135 memory usage in 1 sec interval -- a bit more accurate.
137 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
139 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
140 MACS version 2.2.5 (Py3 speed up)
144 1) *Github code only and Not included in MACS2 release* New
145 testing data for performance test. An subsampled ENCODE2 CTCF
146 ChIP-seq dataset, including 5million ChIP reads and 5 million
147 control reads, has been included in the test folder for testing
148 CPU and memory usage (i.e. 5M test). Several related scripts ,
149 including `prockreport` for output cpu memory usage, `pyprofile`
150 and `pyprofile_stat` for debuging and profiling MACS2 codes, have
153 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
154 The old hashtable.pyx implementation copied from Pandas (very old
155 version) doesn't work well in Python3+Cython. It slows down the
156 pqtable checkup using the identical Cython codes as in
157 v2.1.4. While running 5M test, the `__getitem__` function in the
158 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
159 148.6s with the same number of calls in MACS2 v2.2.4. As a
160 consequence, the standard python dictionary implementation has
161 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
162 faster than py2 version, but uses a bit more memory. In general,
163 v2.2.5 can finish 5M reads test in 20% less time than MACS2
164 v2.1.4, but use 15% more memory.
168 1) More Python3 related fixes, e.g. the return value of keys from
172 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
173 MACS version 2.2.4 (Python3)
177 1) First Python3 version MACS2 released.
179 2) Version number 2.2.X will be used for MACS2 in Python3, in
182 3) More comprehensive test.sh script to check the consistency of
183 results from Python2 version and Python3 version.
185 4) Simplify setup.py script since the newest version transparently
186 supports cython. And when cython is not installed by the user,
187 setup.py can still compile using only C codes.
189 5) Fix Signal.pyx to use np.array instead of np.mat.
191 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
196 Github Actions is used together with Travis CI for testing and
203 1) #318 Random score in bdgdiff output. It turns out the sum_v is
204 not initialized as 0 before adding. Potential bugs are fixed in
205 other functions in ScoreTrack and CallPeakUnit codes.
207 2) #321 Cython dependency in setup.py script is removed. And place
208 'cythonzie' call to the correct position.
210 3) A typo is fixed in Github Actions script.
212 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
217 1) Support Docker auto-deploy. PR #309
219 2) Support Travis CI auto-testing, update unit-testing
220 scripts, and enable subcommand testing on small datasets.
222 3) Update README documents. #297 PR #306
224 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
225 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
227 5) `--d-min` option is added in `callpeak` and `predictd`, to
228 exclude predictions of fragment size smaller than the given
229 value. Merged from PR #267 @shouldsee.
231 6) `--buffer-size` option is added in `predictd`, `filterdup`,
232 `pileup` and `refinepeak` subcommands. Users can use this option
233 to decrease memory usage while there are a large number of contigs
234 in the data. Also, now `callpeak`, `predictd`, `filterdup`,
235 `pileup` and `refinepeak` will suggest users to tweak
236 `--buffer-size` while catching a MemoryError. #313 PR #314
240 1) #265 Fixed a bug where the pseudocount hasn't been applied
241 while calculating p-value score in ScoreTrack object.
243 2) Fixed bdgbroadcall so that it will report those broad peaks
244 without strong peak inside, a consistent behavior as `callpeak
247 3) Rename COPYING to LICENSE.
249 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
254 1) Added missing BEDPE support. And enable the support for BAMPE
255 and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
256 subcommands. When format is BAMPE or BEDPE, The 'pileup' command
257 will pile up the whole fragment defined by mapping locations of
258 the left end and right end of each read pair. Thank @purcaro
260 2) Added options to callpeak command for tweaking max-gap and
261 min-len during peak calling. Thank @jsh58!
263 3) The callpeak option "--to-large" option is replaced with
266 4) The randsample option "-t" has been replaced with "-i".
270 1) Fixed memory issue related to #122 and #146
272 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
274 3) Fixed a bug while setting commandline qvalue cutoff.
276 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
278 5) Fixed the calculation of average fragment length for paired-end
281 6) Fixed bugs caused by khash while computing p/q-value and log
282 likelihood ratios. Thank @jsh58
284 7) More spelling tweaks in source code. Thank @mr-c
286 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
287 MACS version 2.1.1 20160309
291 * Fixed spelling. Merged pull request #120. Thank @mr-c!
293 * Change filtering criteria for reading BAM/SAM files
295 Related to callpeak and filterdup commands. Now the
296 reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
297 still be read although MACS2 may decide them as duplicates
298 later. Related to old issue #33. Sorry I forgot to address it for
301 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
302 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
306 1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
307 the former option is not supported by older GCC. Related to issues
310 2) Issue #108 is fixed. If no peak can be found in a chromosome,
311 the PeakIO won't throw an error.
317 a) A more flexible format, BEDPE, is supported. Now users can
318 define the left and right position of the ChIPed fragment, and
319 MACS2 will skip model building and directly pileup the
320 fragments. Related to issue #112.
322 b) The 'tempdir' can be specified, to save cached pileup
323 tracks. Originially, the temporary files were stored in
324 /tmp. Thank @daler! Related to issues #97 and #105.
328 New operations are added, to calculate the maximum or minimum value between
329 values in BEDGRAPH and given value.
333 New method is added, to calculate the maximum value between values
334 defined in two BEDGRAPH files.
336 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
337 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
341 1) Fix a bug while dealing with some chromosomes only containing
342 one read (pair). The size of dup_plus/dup_minus arrays after
343 filtering dups should +1.
345 2) Fix a bug related to the broad peak calling function in
346 previous versions. The gaps were miscalculated, so segmented weak
347 broad calls may be reported, and sometimes you would see peaks
348 with lower than cutoff values in the output files.
350 3) "Potentially" Fixed issue #105 on temporary cache files, need
354 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
355 MACS version 2.1.0 20150731 (tag:rc)
359 1) Fixed issue #76: information about broad/narrow cutoff will be
362 2) Fixed issue #79: bdgopt extparam option is fixed.
364 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
365 for filterdup command.
367 4) Fixed issue #78, #88 and similar issue reported in MACS google
368 group: MACS2 now can correctly deal with multiple alignment files
369 for -t or -c. The 'finalize' function will be correctly
370 called. Multiple files option is enabled for filterdup,
371 randsample, predictd, pileup and refinepeak commands.
373 5) A related issue to #88, when BAMPE mode is used, PE pairs will
374 be sorted by leftmost then rightmost ends.
376 6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
377 array. This will cause 'callpeak --nolambda' hang forever while
378 calculating pvalues and qvalues.
380 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
381 MACS version 2.1.0 20150420 (tag:rc)
385 1) bdgopt: some convenient functions to modify bedGraph files.
387 2) cmbreps: Combine scores from two replicates. Including three
388 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
389 method to combine two p-value scores. After that, user can use
390 bdgpeakcall to call peaks on combined scores.
394 1) callpeak and bdgpeakcall now can try to analyze the
395 relationship between p-values and number/length of peaks then
396 generate a summary to help users decide an appropriate cutoff.
398 2) callpeak now can accept fold-enrichment cutoff as a filter for
403 Now MACS2 runs about 3X as fast as previous version. Trade
404 clean python codes for speed... Now while processing 50M ChIP vs
405 50M control, it will take only 10 minutes.
409 1) Sampling function in BAMPE mode.
411 2) Callpeak while there are >= 2 input files for -t or -c.
413 3) While reading BAM/SAM, those secondary or supplementary
414 alignments will be correctly skipped.
416 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
417 option that MACS2 will discard those SAM/BAM alignments with bit
418 1024 no matter how --keep-dup is set.
420 5) Fixed issue #49: setuptools is used intead of distutils
422 6) Fixed issue #51: fix the problem when using --trackline
423 argument when control file is absent.
425 7) Fixed issue #53: Use Use SAM/BAM CIGAR to find the 5' end of
426 read mapped to minus strand. Previous implementation will find
427 incorrect 5' end if there is indel in alignment.
429 8) Fixed issue #56: An incorrect sorting method used for BAMPE
430 mode which will cause incorrect filtering of duplicated reads. Now
433 9) Issue #63: Merged from jayhesselberth@github, extsize now can
436 10) Issue #71: Merged from aertslab@github, close file descriptor
437 after creating them with mkstemp().
439 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
440 MACS version 2.1.0 20140616 (tag:rc)
444 "--ratio" is added to manually assign the scaling factor of ChIP
445 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
446 implementing the patch file!
448 "--shift" is added to move cutting ends (5' end of reads) around,
449 in order to process DNAse-Seq data, e.g., use "--shift -100
450 --extsize 200" to get 200bps fragments around 5' ends. For general
451 ChIP-Seq data analysis, this option should be always set as
452 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
455 ** Do not output negative fragment size from cross-correlation
456 analysis. Thank Alvin Qin for the feedback!
458 ** --half-ext and --control-shift are removed. For complex read
459 shifting and extending, combine '--shift' and '--extsize'
460 options. For comparing two conditions, use 'bdgdiff' module
463 ** a bug is fixed to output the last pileup value in bdg file
468 A 'dry-run' option is added to only output numbers, including the
469 number of allowed duplicates, the total number of reads before and
470 after filtering duplicates and the estimated duplication
471 rate. Thank John Urban for the suggestion!
474 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
475 MACS version 2.0.10 20131216 (tag:alpha)
479 * We changed license from Artistic License to 3-clauses BSD license.
481 Yes. Simpler the better.
483 * Process paired-end data with "-f BAMPE" without control
485 * GappedPeak output for --broad option has been fixed again to be
486 consistent with official UCSC format. We add 1bp pseudo-block to
487 left and/or right of broad region when necessary, so that you can
488 virtualize the regions without strong enrichment inside
489 successfully. In downstream analysis except for virtualization,
490 you may need to remove all 1bps blocks from gappedPeak file.
492 * diffpeak subcommand is temporarily disabled. Till we
495 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
496 MACS version 2.0.10 20131028 (tag:alpha)
498 * callpeak --call-summits improvement
500 The smoothing window length has been fixed as fragment length
501 instead of short read length. The larger smoothing window will
502 grant better smoothing results and better sub-peak summits
505 * --outdir and --ofile options for almost all commands
507 Thank Björn Grüning for initially implementing these options!
508 Now, MACS2 will save results into a specified
509 directory by '--outdir' option, and/or save result into a
510 specified file by '--ofile' option. Note, in case '--ofile' is
511 available for a subcommand, '-o' now has been adjusted to be the
512 same as '--ofile' instead of '--o-prefix'.
514 Here is the list of changes. For more detail, use 'macs2 xxx -h'
517 ** callpeak: --outdir
518 ** diffpeak: Not implemented
519 ** bdgpeakcall: --outdir and --ofile
520 ** bdgbroadcall: --outdir and --ofile
521 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
522 and the order of arguments for --ofile must be the same as for -m.
523 ** bdgdiff: --outdir and --ofile
524 ** filterdup: --outdir
526 ** randsample: --outdir
527 ** refinepeak: --outdir and --ofile
530 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
531 MACS version 2.0.10 20130915 (tag:alpha)
533 * callpeak Added a new option --buffer-size
535 This option is to tweak a previously hidden parameter that
536 controls the steps to increase array size for storing alignment
537 information. While in some rare cases, the number of
538 chromosomes/contigs/scaffolds is huge, the original default
539 setting will cause a huge memory waste. In these cases, we
540 recommend to decrease --buffer-size (e.g., 1000) to save memory,
541 although the decrease will slow process to read alignment files.
543 * an optimization to speed up pvalue-qvalue statistics
545 Previously, it took a hour to prepare p-q-table for 65M vs 65M
546 human TF library, and now it will take 10 minutes. It was due to a
547 single line of code to get a value from a numpy array ...
551 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
552 MACS version 2.0.10 20130731 (tag:alpha)
554 * callpeak --call-summits
556 Fix bugs causing callpeak --call-summits option generating extra
557 number of peaks and inconsistent peak boundaries comparing to
558 default option. Thank Ben Levinson!
562 Fix bugs causing bdgcmp output logLR all in positive values. Now
563 'depletion' can be correctly represented as negative values.
567 Fix the behavior of bdgdiff module. Now it can take four
568 bedGraph files, then use logLR as cutoff to call differential
569 regions. Check command line of bdgdiff for detail.
571 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
572 MACS version 2.0.10 20130713 (tag:alpha)
574 * fix bugs while output broadPeak and gappedPeak.
576 Note. Those weak broad regions without any strong enrichment
577 regions inside won't be saved in gappedPeak file.
579 * bdgcmp -T and -C are merged into -S and description is updated.
581 Now, you can use it to override SPMR values in your input for
582 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
583 statistics will cause weird results ( in most cases, lower
584 significancy), and won't be consistent with MACS2 callpeak
585 behavior. So if you have SPMR bedGraphs, input the smaller/larger
586 sample size in MILLION according to 'callpeak --to-large' option.
588 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
589 MACS version 2.0.10 20130710 (tag:alpha)
591 * fix BED style output format of callpeak module:
593 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
594 the output. Old BED format file won't be saved.
596 2) with --broad: broadPeak (BED6+3) for broad region and
597 gappedPeak (BED12+3) for chained enriched regions will be the
598 output. Old BED format, narrowPeak format, summit file won't be
601 * bdgcmp now can accept list of methods to calculate scores. So
602 you can run it once to generate multiple types of scores. Thank
603 Jon Urban for this suggestion!
605 * C codes are re-generated through Cython 0.19.1.
607 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
608 MACS version 2.0.10 20130520 (tag:alpha)
610 * broad peak calling modules are modified in order to report all
611 relexed regions even there is no strong enrichment inside.
613 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
614 MACS version 2.0.10 20130501 (tag:alpha)
616 * Memory usage is decreased to about 1/4-1/5 of previous usage
617 Now, the internal data structure and algorithm are both
618 re-organized, so that intermediate data wouldn't be saved in
619 memory. Intead they will be calculated on the fly. New MACS2 will
620 spend longer time (1.5 to 2 times) however it will use less memory
621 so can be more usable on small mem servers.
623 * --seed option is added to callpeak and randsample commands
624 Thank Mathieu Gineste for this suggestion!
626 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
627 MACS version 2.0.10 20130306 (tag:alpha)
629 * diffpeak module New module to detect differential binding sites
630 with more statistics.
632 * Introduced --refine-peaks
633 Calculates reads balancing to refine peak summits
635 * Ouput file names prefix
636 Correct encodePeak to narrowPeak, broadPeak to bed12.
638 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
639 MACS version 2.0.10 (tag:alpha not released)
641 * Introduced BAMPEParser
642 Reads PE data directly, requires bedtools for now
644 * Introduced --call-summits
645 Uses signal processing methods to call overlapping peaks
647 * Added --no-trackline
648 By default, files have descriptive tracklines now
650 * new refinepeak command (experimental)
651 This new function will use a similar method in SPP (wtd), to
652 analyze raw tag distribution in peak region, then redefine the
653 peak summit where plus and minus tags are evenly distributed
656 * Changes to output *
657 cPeakDetect.pyx has full support for new print/write methods and
658 --call-peaks, BAMPEParser, and use of paired-end data
660 * Parser optimization
662 cParser.pyx is rewritten to use io.BufferedReader to speed
663 up. Speed is doubled.
665 Code is reorganized -- most of functions are inherited from
668 * Use cross-correlation to calculate fragment size
670 First, all pairs will be used in prediction for fragment
671 size. Previously, only no more than 1000 pairs are used. Second,
672 cross-correlation is used to find the best phase difference
673 between + and - tag pileups.
675 * Speed up p-value and q-value calculation
677 This part is ten times faster now. I am using a dictionary to
678 cache p-value results from Poisson CDF function. A bit more memory
679 will be used to increase speed. I hope this dictionary would not
680 explode since the possible pairs of ChIP signal and control lambda
681 are hugely redundant. Also, I rewrited part of q-value
684 * Speed up peak detection
686 This part is about hundred of times faster now. Optimizations
687 include using Numpy functions as much as possible, and making loop
688 body as small as possible.
690 * Post-processing on differential calls
692 After macs2diff finds differential binding sites between two
693 conditions, it will try to annotate the peak calls from one of two
694 conditions, describe the changes ...
696 * Fragment size prediction in macs2diff
698 Now by default, macs2diff will try to use the average fragment
699 size from both condition 1 and condition 2 for tag extension and
700 peak calling. Previously, by default, it will use different sizes
701 unless --nomodel is specified.
703 Technically, I separate model building processes out. So macs2diff
704 will build fragment sizes for condition 1 and 2 in parallel (2
705 processes maximum), then perform 4-way comparisons in parallel (4
710 Combine two p/qscore tracks together. At regions where condition 1
711 is higher than condition 2, score would be positive, otherwise,
714 * SAMParser and BAMParser
716 Bug fixed for paired-end sequencing data.
720 Fixed a bug while calling peaks from BedGraph file. It previously
721 mistakenly output same peaks multiple times at the end of
724 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
725 MACS version 2.0.9 (tag:alpha)
727 * Auto fixation on predicted d is turned off by default!
729 Previous --off-auto is now default. MACS will not automatically
730 fix d less than 2 times of tag size according to
731 --shiftsize. While tag size is getting longer nowadays, it would
732 be easier to have d less than 2 times of tag size, however d may
733 still be meaningful and useful. Please judge it using your own
738 Now, the default scaling while treatment and input are unbalanced
739 has been adjusted. By default, larger sample will be scaled down
740 linearly to match the smaller sample. In this way, background
741 noise will be reduced more than real signals, so we expect to have
742 more specific results than the other way around (i.e. --to-large
745 Also, an alternative option to randomly sample larger data
746 (--down-sample) is provided to replace default linear
747 scaling. However, this option will cause results irresproducible,
752 A new script 'randsample' is added, which can randomly sample
753 certain percentage or number of tags.
757 Now, MACS will decide peak summits according to pileup height
758 instead of qvalue scores. In this way, the summit may be more
763 MACS calculate qvalue scores as differential scores. When compare
764 two conditions (saying A and B), the maximum qscore for comparing
765 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
766 will be computed. If maxqscore_a2b is bigger, the diff score is
767 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
769 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
770 MACS version 2.0.8 (tag:alpha)
772 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
774 New script bdgbroadcall and the extra option '--broad' for macs2
775 script, can be used to call broad regions with a loose cutoff to
776 link nearby significant regions. The output is represented as
779 * MACS2/IO/cScoreTrack.pyx
781 Fix q-value calculation to generate forcefully monotonic values.
783 * bin/eland*2bed, bin/sam2bed and bin/filterdup
785 They are combined to one more powerful script called
786 "filterdup". The script filterdup can filter duplicated reads
787 according to sequencing depth and genome size. The script can also
788 convert any format supported by MACS to BED format.
790 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
791 MACS version 2.0.7 (tag:alpha)
793 * bin/macsdiff renamed to bin/bdgdiff
795 Now this script will work as a low-level finetuning tool as bdgcmp
800 A new script to take treatment and control files from two
801 condition, calculate fragment size, use local poisson to get
802 pvalues and BH process to get qvalues, then combine 4-ways result
803 to call differential sites.
805 This script can use upto 4 cpus to speed up 4-ways calculation. (
806 I am trying multiprocessing in python. )
808 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
809 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
810 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
812 All above files are modified for the new macs2diff script.
814 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
816 Now q-value 0.01 is the default cutoff. If -p is specified,
817 p-value cutoff will be used instead.
819 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
820 MACS version 2.0.6 (tag:alpha)
824 A script to call differential regions. A naive way is introduced
825 to find the regions where:
827 1. signal from condition 1 is larger than input 1 and condition 2 --
828 unique region in condition 1;
829 2. signal from condition 2 is larger than input 2 and condition 1
830 -- unique region in condition 2;
831 3. signal from condition 1 is larger than input 1, signal from
832 condition 2 is larger than input 2, however either signal from
833 condition 1 or 2 is not larger than the other.
835 Here 'larger' means the pvalue or qvalue from a Poisson test is
836 under certain cutoff.
838 (I will make another script to wrap up mulitple scripts for
839 differential calling)
841 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
842 MACS version 2.0.5 (tag:alpha)
844 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
847 Use hash to store peak information. Add back the feature to deal
848 with data without control.
850 Fix bug which incorrectly allows small peaks at the end of
853 * bin/bdgpeakcall, bin/bdgcmp
855 Fix bugs. bdgpeakcall can output encodePeak format.
857 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
858 MACS version 2.0.4 (tag:alpha)
862 Fix a bug, correctly assign lambda_bg while --to-small is
863 set. Thanks Junya Seo!
865 Add rank and num of bp columns to pvalue-qvalue table.
869 Fix bugs to correctly deal with peakless chromosomes. Thanks
872 Use AFDR for independent tests instead.
876 Now MACS can output peak coordinates together with pvalue, qvalue,
877 summit positions in a single encodePeak format (designed for
878 ENCODE project) file. This file can be loaded to UCSC
879 browser. Definition of some specific columns are: 5th:
880 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
881 -log10qvalue, 10th: relative summit position to peak start.
884 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
885 MACS version 2.0.3 (tag:alpha)
887 * Rich output with qvalue, fold enrichment, and pileup height
889 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
892 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
894 Now we have a similiar xls output file as before. The differences
895 from previous file are:
897 1. Summit now is absolute summit, instead of relative summit
899 2. 'Pileup' is previous 'tag' column. It's the extended fragment
900 pileup at the peak summit;
901 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
902 5.00 means 1e-5, simple and less confusing.
903 4. FDR column becomes '-log10(qvalue)' column.
904 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
905 the values at the peak summit.
909 NAME_pqtable.txt contains pvalue and qvalue relationships.
911 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
912 and -log10qvalue scores in BedGraph format. Nearby regions with
913 the same value are not merged.
915 * Separation of FeatIO.py
917 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
918 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
919 implemented to store pileup, local lambda, pvalue, and qvalue
920 alltogether in cScoreTrack.pyx.
922 * Experimental option --half-ext
924 Suggested by NPS algorithm, I added an experimental option
925 --half-ext to let MACS only extends ChIP fragment around its
926 middle point for only 1/2 d.
928 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
929 MACS version 2.0.2 (tag:alpha)
933 Add an error check to see if there is no common chromosome names
934 from treatment file and control file
936 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
938 Reduce memory usage by removing deepcopy() calls.
940 * Modify README documents and others.
942 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
943 MACS Version 2.0.1 (tag:alpha)
945 * cPileup.pyx, cPeakDetect.pyx and peak calling process
947 Jie suggested me a brilliant simple method to pileup fragments
948 into bedGraph track. It works extremely faster than the previous
949 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
950 large local lambda calculation in MACSv2 now. Now I generate three
951 bedGraphs for d-size local bias, slocal-size and llocal-size local
952 bias, and calculate the maximum local bias as local lambda
955 Minor: add_loc in bedGraphTrackI now can correctly merge the
956 region with its preceding region if their value are the same.
960 Add an option to shift control tags before extension. By default,
961 control tags will be extended to both sides regardless of strand
964 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
965 MACS Version 2.0.0 (tag:alpha)
967 * Use bedGraph type to store data internally and externally.
969 We can have theoretically one-basepair resolution profiles. 10
970 times smaller in filesize and even smaller after converting to
971 bigWig for visualization.
973 * Peak calling process modified. Better peak boundary detection.
975 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
976 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
977 one will be averaged to d size) Then calculate the maximum value
978 of these two tracks and a global background, to have a
979 local-lambda bedGraph.
981 Use -10log10poisson_pvalue as scores to generate a score track
984 A general peak calling based on a score cutoff, min length of peak
985 and max gap between nearby peaks.
989 Wiggle file output is removed. Now we only support bedGraph
990 output. The generation of bedGraph is highly recommended since it
991 will not cost extra time. In other words, bedGraph generation is
992 internally run even you don't want to save bedGraphs on disk, due
993 to the peak calling algorithm in MACS v2.
997 We now can calculate poisson pvalue in log space so that the score
998 (-10*log10pvalue) will not have a upper limit of 3100 due to
999 precision of float number.
1001 * Cython is adopted to speed up Python code.
1003 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1006 * Replaced with a newest WigTrackI class and fixed the wignorm script.
1008 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1009 Version 1.4.0rc2 (Valentine)
1011 * --single-wig option is renamed to --single-profile
1013 * BedGraph output with --bdg or -B option.
1015 The BedGraph output provides 1bp resolution fragment pileup
1016 profile. File size is smaller than wig file. This option can be
1017 combined with --single-profile option to produce a bedgraph file
1018 for the whole genome. This option can also make --space,
1019 --call-subpeaks invalid.
1021 * Fix the description of --shiftsize to correctly state that the
1022 value is 1/2 d (fragment size).
1024 * Fix a bug in the call to __filter_w_control_tags when control is
1027 * Fix a bug on --to-small option. Now it works as expected.
1029 * Fix a bug while counting the tags in candidate peak region, an
1030 extra tag may be included. (Thanks to Jake Biesinger!)
1032 * Fix the bug for the peaks extended outside of chromosome
1033 start. If the minus strand tag goes outside of chromosome start
1034 after extension of d, it will be thrown out.
1036 * Post-process script for a combined wig file:
1038 The "wignorm" command can be called after a full run of MACS14 as
1039 a postprocess. wignorm can calculate the local background from the
1040 control wig file from MACS14, then use either foldchange,
1041 -10*log10(pvalue) from possion test, or difference after asinh
1042 transformation as the score to build a single wig track to
1043 represent the binding strength. This script will take a
1044 significant long time to process.
1046 * --wigextend has been obsoleted.
1048 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1049 Version 1.4.0rc1 (Starry Sky)
1051 * Duplicate reads option
1053 --keep-dup behavior is changed. Now user can specify how many
1054 reads he/she wants to keep at the same genomic location. 'auto' to
1055 let MACS decide the number based on binomial distribution, 'all'
1056 to let MACS keep all reads.
1058 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1060 By default, MACS will now scale the smaller dataset to the bigger
1061 dataset. For instance, if IP has 10 million reads, and Input has 5
1062 million, MACS will double the lambda value calculated from Input
1063 reads while calling BOTH the positive peaks and negative
1064 peaks. This will address the issue caused by unbalanced numbers of
1065 reads from IP and Input. If --to-small is turned on, MACS will
1066 scale the larger dataset to the smaller one. So from now on, if d
1067 is fixed, then the peaks from a MACS call for A vs B should be
1068 identical to the negative peaks from a B vs A.
1070 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1071 Version 1.4.0beta (summer wishes)
1077 The default behavior in the model building step is slightly
1078 changed. When MACS can't find enough pairs to build model
1079 (implemented in alpha version) or the modeled fragment length is
1080 less than 2 times of tag length (implemented in beta version),
1081 MACS will use 2 times of --shiftsize value as fragment length in
1082 the later analysis. --off-auto can turn off this default behavior.
1084 ** Redundant tag filtering
1086 The IO module is rewritten. The redundant tag filtering process
1087 becomes simpler and works as promise. The maximum allowed number
1088 of tags at the exact same location is calculated from the
1089 sequencing depth and genome size using a binomial distribution,
1090 for both TREAMENT and CONTROL separately. ( previously only
1091 TREATMENT is considered ) The exact same location means the same
1092 coordination and the same strand. Then MACS will only keep at most
1093 this number of tags at the exact same location in the following
1094 analysis. An option --keep-dup can let MACS skip the filtering and
1095 keep all the tags. However this may bring in a lot of sequencing
1096 bias, so you may get many false positive peaks.
1098 ** Single wiggle mode
1100 First thing to mention, this is not the score track that I
1101 described before. By default, MACS generates wiggle files for
1102 fragment pileup for every chromosomes separately. When you use
1103 --single-wig option, MACS will generate a single wiggle file for
1104 all the chromosomes so you will get a wig.gz for TREATMENT and
1105 another wig.gz for CONTROL if available.
1107 ** Sniff -- automatic format detection
1109 Now, by default or "-f AUTO", MACS will decide the input file
1110 format automatically. Technically, it will try to read at most
1111 1000 records for the first 10 non-comment lines. If it succeeds,
1112 the format is decided. I recommend not to use AUTO and specify the
1113 right format for your input files, unless you combine different
1114 formats in a single MACS run.
1118 --single-wig and --keep-dup are added. Check previous section in
1119 ChangeLog for detail.
1121 -f (--format) AUTO is now the default option.
1123 --slocal default: 1000
1124 --llocal default: 10000
1128 Setup script will stop the installation if python version is not
1129 python2.6 or python2.7.
1131 Local lambda calculation has been changed back. MACS will check
1132 peak_region, slocal( default 1K) and llocal (default 10K) for the
1133 local bias. The previous 200bps default will cause MACS misses
1134 some peaks where the input bias is very sharp.
1136 sam2bed.py script is corrected.
1138 Relative pos in xls output is fixed.
1140 Parser for ELAND_export is fixed to pass some of the no match
1141 lines. And elandexport2bed.py is fixed too. ( however I can't
1142 guarantee that it works on any eland_export files. )
1144 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1145 Version 1.4.0alpha2 (be smarter)
1149 --gsize now provides shortcuts for common genomes, including
1150 human, mouse, C. elegans and fruitfly.
1152 --llocal now will be 5000 bps if there is no input file, so that
1153 local lambda doesn't overkill enriched binding sites.
1155 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1156 Version 1.4alpha (be smarter)
1160 --tsize option is redesigned. MACS will use the first 10 lines of
1161 the input to decide the tag size. If user specifies --tsize, it
1162 will override the auto decided tsize.
1164 --lambdaset is replaced by --slocal and --llocal which mean the
1165 small local region and large local region.
1167 --bw has no effect on the scan-window size now. It only affects the
1168 paired-peaks model process.
1172 During the model building, MACS will pick out the enriched regions
1173 which are not too high and not too low to build the paired-peak
1174 model. Default the region is from fold 10 to fold 30. If MACS
1175 fails to build the model, by default it will use the nomodel
1176 settings, like shiftsize=100bps, to shift and extend each
1177 tags. This behavior can be turned off by '--off-auto'.
1181 An extra file including all the summit positions are saved in
1182 *_summits.bed file. An option '--call-subpeaks' will invoke
1183 PeakSplitter developed by Mali Salmon to split wide peaks into
1186 * Sniff ( will in beta )
1188 Automatically recognize the input file format, so use can combine
1189 different format in one MACS run.
1191 Not implemented features/TODO:
1193 * Algorithms ( in near future? )
1195 MACS will try to refine the peak boundaries by calculating the
1196 scores for every point in the candidate peak regions. The score
1197 will be the -10*log(10,pvalue) on a local poisson distribution. A
1198 cutoff specified by users (--pvalue) will be applied to find the
1199 precise sub-peaks in the original candidate peak region. Peak
1200 boudaries and peak summits positions will be saved in separate BED
1203 * Single wiggle track ( in near future? )
1205 A single wiggle track will be generated to save the scores within
1206 candidate peak regions in the 10bps resolution. The wiggle file
1207 is in fixedStep format.
1210 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1211 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1215 Fixed typo. FCSTEP -> FESTEP
1219 The 'femax' attribute bug is fixed
1221 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1222 Version 1.3.7 (Oktoberfest)
1224 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1226 Enhancements by Peter Chines:
1228 1. gzip files are supported.
1229 2. when --diag is on, user can set the increment and endpoint for
1230 fold enrichment analysis by setting --fe-step and --fe-max.
1232 Enhancements by Davide Cittaro:
1234 1. BAM and SAM formats are supported.
1235 2. small changes in the header lines of wiggle output.
1238 1. I added --fe-min option;
1239 2. Bowtie ascii output with suffix ".map" is supported.
1243 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1244 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1245 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1246 4. Some "fold change" have been changed to "fold enrichment".
1248 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1249 Version 1.3.6.1 (default parameter change)
1251 * bin/macs, lib/PeakDetect.py
1253 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1254 default. "--futurefdr" is added which can turn on the 'new' method
1255 introduced in 1.3.6. By default it's off.
1259 Fixed a bug. p-value is corrected a little bit.
1262 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1263 Version 1.3.6 (Birthday cake)
1267 "track name" is added to the header of BED output file.
1269 Now the default peak detection method is to consider 5k and 10k
1270 nearby regions in treatment data and peak location, 1k, 5k, and
1271 10k regions in control data to calculate local bias. The old
1272 method can be called through '--old' option.
1274 Information about how many total/unique tags in treatment or
1275 control will be saved in final .xls output.
1277 * lib/IO/__init__.py
1279 ".fa" will be removed from input tag alignment so only the
1280 chromosome names are kept.
1282 WigTrackI class is added for Wiggle like data structure. (not used
1285 The parser for ELAND multi PET files has been fixed. Now the 5'
1286 tag position for a pair will be kept, whereas in the previous
1287 version, the middle points are kept.
1289 * lib/IO/BinKeeper.py
1291 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1292 browser, which can quickly access certain region for values in a
1293 large wiggle like data file. (not used now)
1295 * lib/OptValidator.py
1301 Now the default peak detection method is to consider 5k and 10k
1302 nearby regions in treatment data and peak location, 1k, 5k, and
1303 10k regions in control data to calculate local bias. The old
1304 method can be called through '--old' option.
1306 Two columns have beed added to BED output file. 4th column: peak
1307 name; 5th column: peak score using -10log(10,pvalue) as score.
1311 Add support to build a Mac App through 'setup.py py2app', or a
1312 Windows executable through 'setup.py py2exe'. You need to install
1313 py2app or py2exe package in order to use these functions.
1315 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1316 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1320 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1321 in control data to calculate local lambda for each peak. Peak
1322 calling results will be slightly different with previous version,
1327 Typo fixed, ELANDParser -> ELANDResultParser
1331 Now, modeled d value will be shown on the model figure.
1333 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1334 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1336 * macs, IO/__init__.py, PeakDetect.py
1338 Add support for ELAND multi format. Add support for Pair-End
1339 experiment, in this case, 5'end and 3'end ELAND multi format files
1340 are required for treatment or control data. See 00README file for
1343 Add wigextend option.
1345 Add petdist option for Pair-End Tag experiment, which is the best
1346 distance between 5' and 3' tags.
1350 Fixed a bug which cause the end positions of every peak region
1351 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
1355 Fix bugs while generating wiggle files. The start position of
1356 wiggle file is set to 1 instead of 0.
1358 Fix a bug that every 10M bps, signals in the first 'd' range are
1359 lower than actual. ( Thanks Mali Salmon!)
1362 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1363 Version 1.3.3 (wiggle bugs fixed)
1367 Fix bugs while generating wiggle files. 1. 'span=' is added to
1368 'variableStep' line; 2. previously, every 10M bps, the coordinates
1369 were wrongly shifted to the right for 'd' basepairs.
1371 * macs, PeakDetect.py
1373 Add an option to save wiggle files on different resolution.
1375 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1376 Version 1.3.2 (tiny bugs fixed)
1380 Fix 65536 -> 65535. ( Thank Joon)
1384 Improved for binomial function with extra large number. Imported
1385 from Cistrome project.
1389 If treatment channel misses reads in some chromosome included in
1390 control channel, or vice versa, MACS will not exit. (Thank Shaun
1393 Instead, MACS will fake a tag at position -1 when calling
1394 treatment peaks vs control, but will ignore the chromosome while
1395 calling negative peaks.
1397 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1398 Version 1.3.1 (tiny bugs fixed version)
1402 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1403 binomial functions can tolerate large and small numbers.
1407 Parsers now split lines in BED/ELAND file using any
1408 whitespaces. 'track' or 'browser' lines will be regarded as
1409 comment lines. A bug fixed when throwing StrandFormatError. The
1410 maximum redundant tag number at a single position can be no less
1414 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1415 Version 1.3 (naming clarification version)
1417 * Naming clarification changes according to our manuscript:
1419 'frag_len' is changed to 'd'.
1421 'fold_change' is changed to 'fold_enrichment'.
1423 Suggest '--bw' parameter to be determined by users from the real
1426 Maximum FDR is 100% in the output file.
1428 And other clarifications in 00README file and the documents on the
1432 If the redundant tag number at a single position is over 32767,
1433 just remember 32767, instead of raising an overflow exception.
1439 Bug fixed for diagnosis report.
1442 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1447 Poisson distribution CDF and inverse CDF functions are
1448 corrected. They can produce right results even for huge lambda
1449 now. So that the p-value and FDR values in the final excel sheet
1452 IO package now can tolerate some rare cases; ELANDParser in IO
1453 package is fixed. (Thank Bogdan)
1457 Reverse paired peaks in model are rejected. So there will be no
1458 negative 'frag_len'. (Thank Bogdan)
1462 Diagnosis function is completed. Which can output a table file for
1463 users to estimate their sequencing depth.
1466 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1469 * Probe.py is added!
1471 GSL is totally removed from MACS. Instead, I have implemented the
1472 CDF and inverse CDF for poisson and binomial distribution purely
1475 * Constants.py is added!
1477 Organize constants used in MACS in the Constants.py file.
1479 * All other files are modified!
1481 Foldchange calculation is modified. Now the foldchange only be
1482 calculated at the peak summit position instead of the whole peak
1483 region. The values will be higher and more robust than before.
1487 1. MACS can save wiggle format files containing the tag number at
1488 every 10 bp along the genome. Tags are shifted according to our
1489 model before they are calculated.
1491 2. Model building and local lambda calculation can be skipped with
1494 3. A diagnosis report can be generated through '--diag'
1495 option. This report can help you get an assumption about the
1496 sequencing saturation. This funtion is only in beta stage.
1498 4. FDR calculation speed is highly improved.
1500 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1503 * TabIO, PeakModel.py ...
1504 Bug fixed to let MACS tolerate some cases while there is no tag on
1505 either plus strand or minus strand.
1508 Check the version of python. If the version is lower than 2.4,
1509 refuse to install with warning.
1512 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1513 MACS version 2.0.10 20130731 (tag:alpha)
1515 * callpeak --call-summits
1517 Fix bugs causing callpeak --call-summits option generating extra
1518 number of peaks and inconsistent peak boundaries comparing to
1519 default option. Thank Ben Levinson!
1523 Fix bugs causing bdgcmp output logLR all in positive values. Now
1524 'depletion' can be correctly represented as negative values.
1528 Fix the behavior of bdgdiff module. Now it can take four
1529 bedGraph files, then use logLR as cutoff to call differential
1530 regions. Check command line of bdgdiff for detail.
1532 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1533 MACS version 2.0.10 20130713 (tag:alpha)
1535 * fix bugs while output broadPeak and gappedPeak.
1537 Note. Those weak broad regions without any strong enrichment
1538 regions inside won't be saved in gappedPeak file.
1540 * bdgcmp -T and -C are merged into -S and description is updated.
1542 Now, you can use it to override SPMR values in your input for
1543 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1544 statistics will cause weird results ( in most cases, lower
1545 significancy), and won't be consistent with MACS2 callpeak
1546 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1547 sample size in MILLION according to 'callpeak --to-large' option.
1549 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1550 MACS version 2.0.10 20130710 (tag:alpha)
1552 * fix BED style output format of callpeak module:
1554 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1555 the output. Old BED format file won't be saved.
1557 2) with --broad: broadPeak (BED6+3) for broad region and
1558 gappedPeak (BED12+3) for chained enriched regions will be the
1559 output. Old BED format, narrowPeak format, summit file won't be
1562 * bdgcmp now can accept list of methods to calculate scores. So
1563 you can run it once to generate multiple types of scores. Thank
1564 Jon Urban for this suggestion!
1566 * C codes are re-generated through Cython 0.19.1.
1568 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1569 MACS version 2.0.10 20130520 (tag:alpha)
1571 * broad peak calling modules are modified in order to report all
1572 relexed regions even there is no strong enrichment inside.
1574 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1575 MACS version 2.0.10 20130501 (tag:alpha)
1577 * Memory usage is decreased to about 1/4-1/5 of previous usage
1578 Now, the internal data structure and algorithm are both
1579 re-organized, so that intermediate data wouldn't be saved in
1580 memory. Intead they will be calculated on the fly. New MACS2 will
1581 spend longer time (1.5 to 2 times) however it will use less memory
1582 so can be more usable on small mem servers.
1584 * --seed option is added to callpeak and randsample commands
1585 Thank Mathieu Gineste for this suggestion!
1587 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1588 MACS version 2.0.10 20130306 (tag:alpha)
1590 * diffpeak module New module to detect differential binding sites
1591 with more statistics.
1593 * Introduced --refine-peaks
1594 Calculates reads balancing to refine peak summits
1596 * Ouput file names prefix
1597 Correct encodePeak to narrowPeak, broadPeak to bed12.
1599 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1600 MACS version 2.0.10 (tag:alpha not released)
1602 * Introduced BAMPEParser
1603 Reads PE data directly, requires bedtools for now
1605 * Introduced --call-summits
1606 Uses signal processing methods to call overlapping peaks
1608 * Added --no-trackline
1609 By default, files have descriptive tracklines now
1611 * new refinepeak command (experimental)
1612 This new function will use a similar method in SPP (wtd), to
1613 analyze raw tag distribution in peak region, then redefine the
1614 peak summit where plus and minus tags are evenly distributed
1617 * Changes to output *
1618 cPeakDetect.pyx has full support for new print/write methods and
1619 --call-peaks, BAMPEParser, and use of paired-end data
1621 * Parser optimization
1623 cParser.pyx is rewritten to use io.BufferedReader to speed
1624 up. Speed is doubled.
1626 Code is reorganized -- most of functions are inherited from
1627 GenericParser class.
1629 * Use cross-correlation to calculate fragment size
1631 First, all pairs will be used in prediction for fragment
1632 size. Previously, only no more than 1000 pairs are used. Second,
1633 cross-correlation is used to find the best phase difference
1634 between + and - tag pileups.
1636 * Speed up p-value and q-value calculation
1638 This part is ten times faster now. I am using a dictionary to
1639 cache p-value results from Poisson CDF function. A bit more memory
1640 will be used to increase speed. I hope this dictionary would not
1641 explode since the possible pairs of ChIP signal and control lambda
1642 are hugely redundant. Also, I rewrited part of q-value
1645 * Speed up peak detection
1647 This part is about hundred of times faster now. Optimizations
1648 include using Numpy functions as much as possible, and making loop
1649 body as small as possible.
1651 * Post-processing on differential calls
1653 After macs2diff finds differential binding sites between two
1654 conditions, it will try to annotate the peak calls from one of two
1655 conditions, describe the changes ...
1657 * Fragment size prediction in macs2diff
1659 Now by default, macs2diff will try to use the average fragment
1660 size from both condition 1 and condition 2 for tag extension and
1661 peak calling. Previously, by default, it will use different sizes
1662 unless --nomodel is specified.
1664 Technically, I separate model building processes out. So macs2diff
1665 will build fragment sizes for condition 1 and 2 in parallel (2
1666 processes maximum), then perform 4-way comparisons in parallel (4
1671 Combine two p/qscore tracks together. At regions where condition 1
1672 is higher than condition 2, score would be positive, otherwise,
1675 * SAMParser and BAMParser
1677 Bug fixed for paired-end sequencing data.
1681 Fixed a bug while calling peaks from BedGraph file. It previously
1682 mistakenly output same peaks multiple times at the end of
1685 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1686 MACS version 2.0.9 (tag:alpha)
1688 * Auto fixation on predicted d is turned off by default!
1690 Previous --off-auto is now default. MACS will not automatically
1691 fix d less than 2 times of tag size according to
1692 --shiftsize. While tag size is getting longer nowadays, it would
1693 be easier to have d less than 2 times of tag size, however d may
1694 still be meaningful and useful. Please judge it using your own
1699 Now, the default scaling while treatment and input are unbalanced
1700 has been adjusted. By default, larger sample will be scaled down
1701 linearly to match the smaller sample. In this way, background
1702 noise will be reduced more than real signals, so we expect to have
1703 more specific results than the other way around (i.e. --to-large
1706 Also, an alternative option to randomly sample larger data
1707 (--down-sample) is provided to replace default linear
1708 scaling. However, this option will cause results irresproducible,
1713 A new script 'randsample' is added, which can randomly sample
1714 certain percentage or number of tags.
1718 Now, MACS will decide peak summits according to pileup height
1719 instead of qvalue scores. In this way, the summit may be more
1724 MACS calculate qvalue scores as differential scores. When compare
1725 two conditions (saying A and B), the maximum qscore for comparing
1726 A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
1727 will be computed. If maxqscore_a2b is bigger, the diff score is
1728 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
1730 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1731 MACS version 2.0.8 (tag:alpha)
1733 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1735 New script bdgbroadcall and the extra option '--broad' for macs2
1736 script, can be used to call broad regions with a loose cutoff to
1737 link nearby significant regions. The output is represented as
1740 * MACS2/IO/cScoreTrack.pyx
1742 Fix q-value calculation to generate forcefully monotonic values.
1744 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1746 They are combined to one more powerful script called
1747 "filterdup". The script filterdup can filter duplicated reads
1748 according to sequencing depth and genome size. The script can also
1749 convert any format supported by MACS to BED format.
1751 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1752 MACS version 2.0.7 (tag:alpha)
1754 * bin/macsdiff renamed to bin/bdgdiff
1756 Now this script will work as a low-level finetuning tool as bdgcmp
1761 A new script to take treatment and control files from two
1762 condition, calculate fragment size, use local poisson to get
1763 pvalues and BH process to get qvalues, then combine 4-ways result
1764 to call differential sites.
1766 This script can use upto 4 cpus to speed up 4-ways calculation. (
1767 I am trying multiprocessing in python. )
1769 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1770 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1771 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1773 All above files are modified for the new macs2diff script.
1775 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1777 Now q-value 0.01 is the default cutoff. If -p is specified,
1778 p-value cutoff will be used instead.
1780 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1781 MACS version 2.0.6 (tag:alpha)
1785 A script to call differential regions. A naive way is introduced
1786 to find the regions where:
1788 1. signal from condition 1 is larger than input 1 and condition 2 --
1789 unique region in condition 1;
1790 2. signal from condition 2 is larger than input 2 and condition 1
1791 -- unique region in condition 2;
1792 3. signal from condition 1 is larger than input 1, signal from
1793 condition 2 is larger than input 2, however either signal from
1794 condition 1 or 2 is not larger than the other.
1796 Here 'larger' means the pvalue or qvalue from a Poisson test is
1797 under certain cutoff.
1799 (I will make another script to wrap up mulitple scripts for
1800 differential calling)
1802 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1803 MACS version 2.0.5 (tag:alpha)
1805 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1806 MACS2/IO/cPeakIO.pyx
1808 Use hash to store peak information. Add back the feature to deal
1809 with data without control.
1811 Fix bug which incorrectly allows small peaks at the end of
1814 * bin/bdgpeakcall, bin/bdgcmp
1816 Fix bugs. bdgpeakcall can output encodePeak format.
1818 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1819 MACS version 2.0.4 (tag:alpha)
1823 Fix a bug, correctly assign lambda_bg while --to-small is
1824 set. Thanks Junya Seo!
1826 Add rank and num of bp columns to pvalue-qvalue table.
1830 Fix bugs to correctly deal with peakless chromosomes. Thanks
1833 Use AFDR for independent tests instead.
1837 Now MACS can output peak coordinates together with pvalue, qvalue,
1838 summit positions in a single encodePeak format (designed for
1839 ENCODE project) file. This file can be loaded to UCSC
1840 browser. Definition of some specific columns are: 5th:
1841 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1842 -log10qvalue, 10th: relative summit position to peak start.
1845 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1846 MACS version 2.0.3 (tag:alpha)
1848 * Rich output with qvalue, fold enrichment, and pileup height
1850 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1853 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1855 Now we have a similiar xls output file as before. The differences
1856 from previous file are:
1858 1. Summit now is absolute summit, instead of relative summit
1860 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1861 pileup at the peak summit;
1862 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1863 5.00 means 1e-5, simple and less confusing.
1864 4. FDR column becomes '-log10(qvalue)' column.
1865 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1866 the values at the peak summit.
1868 * Extra output files
1870 NAME_pqtable.txt contains pvalue and qvalue relationships.
1872 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1873 and -log10qvalue scores in BedGraph format. Nearby regions with
1874 the same value are not merged.
1876 * Separation of FeatIO.py
1878 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1879 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1880 implemented to store pileup, local lambda, pvalue, and qvalue
1881 alltogether in cScoreTrack.pyx.
1883 * Experimental option --half-ext
1885 Suggested by NPS algorithm, I added an experimental option
1886 --half-ext to let MACS only extends ChIP fragment around its
1887 middle point for only 1/2 d.
1889 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1890 MACS version 2.0.2 (tag:alpha)
1894 Add an error check to see if there is no common chromosome names
1895 from treatment file and control file
1897 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1899 Reduce memory usage by removing deepcopy() calls.
1901 * Modify README documents and others.
1903 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1904 MACS Version 2.0.1 (tag:alpha)
1906 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1908 Jie suggested me a brilliant simple method to pileup fragments
1909 into bedGraph track. It works extremely faster than the previous
1910 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1911 large local lambda calculation in MACSv2 now. Now I generate three
1912 bedGraphs for d-size local bias, slocal-size and llocal-size local
1913 bias, and calculate the maximum local bias as local lambda
1916 Minor: add_loc in bedGraphTrackI now can correctly merge the
1917 region with its preceding region if their value are the same.
1921 Add an option to shift control tags before extension. By default,
1922 control tags will be extended to both sides regardless of strand
1925 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1926 MACS Version 2.0.0 (tag:alpha)
1928 * Use bedGraph type to store data internally and externally.
1930 We can have theoretically one-basepair resolution profiles. 10
1931 times smaller in filesize and even smaller after converting to
1932 bigWig for visualization.
1934 * Peak calling process modified. Better peak boundary detection.
1936 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1937 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1938 one will be averaged to d size) Then calculate the maximum value
1939 of these two tracks and a global background, to have a
1940 local-lambda bedGraph.
1942 Use -10log10poisson_pvalue as scores to generate a score track
1943 before peak calling.
1945 A general peak calling based on a score cutoff, min length of peak
1946 and max gap between nearby peaks.
1950 Wiggle file output is removed. Now we only support bedGraph
1951 output. The generation of bedGraph is highly recommended since it
1952 will not cost extra time. In other words, bedGraph generation is
1953 internally run even you don't want to save bedGraphs on disk, due
1954 to the peak calling algorithm in MACS v2.
1958 We now can calculate poisson pvalue in log space so that the score
1959 (-10*log10pvalue) will not have a upper limit of 3100 due to
1960 precision of float number.
1962 * Cython is adopted to speed up Python code.
1964 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1967 * Replaced with a newest WigTrackI class and fixed the wignorm script.
1969 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1970 Version 1.4.0rc2 (Valentine)
1972 * --single-wig option is renamed to --single-profile
1974 * BedGraph output with --bdg or -B option.
1976 The BedGraph output provides 1bp resolution fragment pileup
1977 profile. File size is smaller than wig file. This option can be
1978 combined with --single-profile option to produce a bedgraph file
1979 for the whole genome. This option can also make --space,
1980 --call-subpeaks invalid.
1982 * Fix the description of --shiftsize to correctly state that the
1983 value is 1/2 d (fragment size).
1985 * Fix a bug in the call to __filter_w_control_tags when control is
1988 * Fix a bug on --to-small option. Now it works as expected.
1990 * Fix a bug while counting the tags in candidate peak region, an
1991 extra tag may be included. (Thanks to Jake Biesinger!)
1993 * Fix the bug for the peaks extended outside of chromosome
1994 start. If the minus strand tag goes outside of chromosome start
1995 after extension of d, it will be thrown out.
1997 * Post-process script for a combined wig file:
1999 The "wignorm" command can be called after a full run of MACS14 as
2000 a postprocess. wignorm can calculate the local background from the
2001 control wig file from MACS14, then use either foldchange,
2002 -10*log10(pvalue) from possion test, or difference after asinh
2003 transformation as the score to build a single wig track to
2004 represent the binding strength. This script will take a
2005 significant long time to process.
2007 * --wigextend has been obsoleted.
2009 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
2010 Version 1.4.0rc1 (Starry Sky)
2012 * Duplicate reads option
2014 --keep-dup behavior is changed. Now user can specify how many
2015 reads he/she wants to keep at the same genomic location. 'auto' to
2016 let MACS decide the number based on binomial distribution, 'all'
2017 to let MACS keep all reads.
2019 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2021 By default, MACS will now scale the smaller dataset to the bigger
2022 dataset. For instance, if IP has 10 million reads, and Input has 5
2023 million, MACS will double the lambda value calculated from Input
2024 reads while calling BOTH the positive peaks and negative
2025 peaks. This will address the issue caused by unbalanced numbers of
2026 reads from IP and Input. If --to-small is turned on, MACS will
2027 scale the larger dataset to the smaller one. So from now on, if d
2028 is fixed, then the peaks from a MACS call for A vs B should be
2029 identical to the negative peaks from a B vs A.
2031 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2032 Version 1.4.0beta (summer wishes)
2038 The default behavior in the model building step is slightly
2039 changed. When MACS can't find enough pairs to build model
2040 (implemented in alpha version) or the modeled fragment length is
2041 less than 2 times of tag length (implemented in beta version),
2042 MACS will use 2 times of --shiftsize value as fragment length in
2043 the later analysis. --off-auto can turn off this default behavior.
2045 ** Redundant tag filtering
2047 The IO module is rewritten. The redundant tag filtering process
2048 becomes simpler and works as promise. The maximum allowed number
2049 of tags at the exact same location is calculated from the
2050 sequencing depth and genome size using a binomial distribution,
2051 for both TREAMENT and CONTROL separately. ( previously only
2052 TREATMENT is considered ) The exact same location means the same
2053 coordination and the same strand. Then MACS will only keep at most
2054 this number of tags at the exact same location in the following
2055 analysis. An option --keep-dup can let MACS skip the filtering and
2056 keep all the tags. However this may bring in a lot of sequencing
2057 bias, so you may get many false positive peaks.
2059 ** Single wiggle mode
2061 First thing to mention, this is not the score track that I
2062 described before. By default, MACS generates wiggle files for
2063 fragment pileup for every chromosomes separately. When you use
2064 --single-wig option, MACS will generate a single wiggle file for
2065 all the chromosomes so you will get a wig.gz for TREATMENT and
2066 another wig.gz for CONTROL if available.
2068 ** Sniff -- automatic format detection
2070 Now, by default or "-f AUTO", MACS will decide the input file
2071 format automatically. Technically, it will try to read at most
2072 1000 records for the first 10 non-comment lines. If it succeeds,
2073 the format is decided. I recommend not to use AUTO and specify the
2074 right format for your input files, unless you combine different
2075 formats in a single MACS run.
2079 --single-wig and --keep-dup are added. Check previous section in
2080 ChangeLog for detail.
2082 -f (--format) AUTO is now the default option.
2084 --slocal default: 1000
2085 --llocal default: 10000
2089 Setup script will stop the installation if python version is not
2090 python2.6 or python2.7.
2092 Local lambda calculation has been changed back. MACS will check
2093 peak_region, slocal( default 1K) and llocal (default 10K) for the
2094 local bias. The previous 200bps default will cause MACS misses
2095 some peaks where the input bias is very sharp.
2097 sam2bed.py script is corrected.
2099 Relative pos in xls output is fixed.
2101 Parser for ELAND_export is fixed to pass some of the no match
2102 lines. And elandexport2bed.py is fixed too. ( however I can't
2103 guarantee that it works on any eland_export files. )
2105 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2106 Version 1.4.0alpha2 (be smarter)
2110 --gsize now provides shortcuts for common genomes, including
2111 human, mouse, C. elegans and fruitfly.
2113 --llocal now will be 5000 bps if there is no input file, so that
2114 local lambda doesn't overkill enriched binding sites.
2116 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2117 Version 1.4alpha (be smarter)
2121 --tsize option is redesigned. MACS will use the first 10 lines of
2122 the input to decide the tag size. If user specifies --tsize, it
2123 will override the auto decided tsize.
2125 --lambdaset is replaced by --slocal and --llocal which mean the
2126 small local region and large local region.
2128 --bw has no effect on the scan-window size now. It only affects the
2129 paired-peaks model process.
2133 During the model building, MACS will pick out the enriched regions
2134 which are not too high and not too low to build the paired-peak
2135 model. Default the region is from fold 10 to fold 30. If MACS
2136 fails to build the model, by default it will use the nomodel
2137 settings, like shiftsize=100bps, to shift and extend each
2138 tags. This behavior can be turned off by '--off-auto'.
2142 An extra file including all the summit positions are saved in
2143 *_summits.bed file. An option '--call-subpeaks' will invoke
2144 PeakSplitter developed by Mali Salmon to split wide peaks into
2147 * Sniff ( will in beta )
2149 Automatically recognize the input file format, so use can combine
2150 different format in one MACS run.
2152 Not implemented features/TODO:
2154 * Algorithms ( in near future? )
2156 MACS will try to refine the peak boundaries by calculating the
2157 scores for every point in the candidate peak regions. The score
2158 will be the -10*log(10,pvalue) on a local poisson distribution. A
2159 cutoff specified by users (--pvalue) will be applied to find the
2160 precise sub-peaks in the original candidate peak region. Peak
2161 boudaries and peak summits positions will be saved in separate BED
2164 * Single wiggle track ( in near future? )
2166 A single wiggle track will be generated to save the scores within
2167 candidate peak regions in the 10bps resolution. The wiggle file
2168 is in fixedStep format.
2171 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2172 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2176 Fixed typo. FCSTEP -> FESTEP
2180 The 'femax' attribute bug is fixed
2182 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2183 Version 1.3.7 (Oktoberfest)
2185 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2187 Enhancements by Peter Chines:
2189 1. gzip files are supported.
2190 2. when --diag is on, user can set the increment and endpoint for
2191 fold enrichment analysis by setting --fe-step and --fe-max.
2193 Enhancements by Davide Cittaro:
2195 1. BAM and SAM formats are supported.
2196 2. small changes in the header lines of wiggle output.
2199 1. I added --fe-min option;
2200 2. Bowtie ascii output with suffix ".map" is supported.
2204 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2205 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2206 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2207 4. Some "fold change" have been changed to "fold enrichment".
2209 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2210 Version 1.3.6.1 (default parameter change)
2212 * bin/macs, lib/PeakDetect.py
2214 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2215 default. "--futurefdr" is added which can turn on the 'new' method
2216 introduced in 1.3.6. By default it's off.
2220 Fixed a bug. p-value is corrected a little bit.
2223 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2224 Version 1.3.6 (Birthday cake)
2228 "track name" is added to the header of BED output file.
2230 Now the default peak detection method is to consider 5k and 10k
2231 nearby regions in treatment data and peak location, 1k, 5k, and
2232 10k regions in control data to calculate local bias. The old
2233 method can be called through '--old' option.
2235 Information about how many total/unique tags in treatment or
2236 control will be saved in final .xls output.
2238 * lib/IO/__init__.py
2240 ".fa" will be removed from input tag alignment so only the
2241 chromosome names are kept.
2243 WigTrackI class is added for Wiggle like data structure. (not used
2246 The parser for ELAND multi PET files has been fixed. Now the 5'
2247 tag position for a pair will be kept, whereas in the previous
2248 version, the middle points are kept.
2250 * lib/IO/BinKeeper.py
2252 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2253 browser, which can quickly access certain region for values in a
2254 large wiggle like data file. (not used now)
2256 * lib/OptValidator.py
2262 Now the default peak detection method is to consider 5k and 10k
2263 nearby regions in treatment data and peak location, 1k, 5k, and
2264 10k regions in control data to calculate local bias. The old
2265 method can be called through '--old' option.
2267 Two columns have beed added to BED output file. 4th column: peak
2268 name; 5th column: peak score using -10log(10,pvalue) as score.
2272 Add support to build a Mac App through 'setup.py py2app', or a
2273 Windows executable through 'setup.py py2exe'. You need to install
2274 py2app or py2exe package in order to use these functions.
2276 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2277 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2281 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2282 in control data to calculate local lambda for each peak. Peak
2283 calling results will be slightly different with previous version,
2288 Typo fixed, ELANDParser -> ELANDResultParser
2292 Now, modeled d value will be shown on the model figure.
2294 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2295 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2297 * macs, IO/__init__.py, PeakDetect.py
2299 Add support for ELAND multi format. Add support for Pair-End
2300 experiment, in this case, 5'end and 3'end ELAND multi format files
2301 are required for treatment or control data. See 00README file for
2304 Add wigextend option.
2306 Add petdist option for Pair-End Tag experiment, which is the best
2307 distance between 5' and 3' tags.
2311 Fixed a bug which cause the end positions of every peak region
2312 incorrectly added by 1 bp. ( Thanks Mali Salmon!)
2316 Fix bugs while generating wiggle files. The start position of
2317 wiggle file is set to 1 instead of 0.
2319 Fix a bug that every 10M bps, signals in the first 'd' range are
2320 lower than actual. ( Thanks Mali Salmon!)
2323 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2324 Version 1.3.3 (wiggle bugs fixed)
2328 Fix bugs while generating wiggle files. 1. 'span=' is added to
2329 'variableStep' line; 2. previously, every 10M bps, the coordinates
2330 were wrongly shifted to the right for 'd' basepairs.
2332 * macs, PeakDetect.py
2334 Add an option to save wiggle files on different resolution.
2336 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2337 Version 1.3.2 (tiny bugs fixed)
2341 Fix 65536 -> 65535. ( Thank Joon)
2345 Improved for binomial function with extra large number. Imported
2346 from Cistrome project.
2350 If treatment channel misses reads in some chromosome included in
2351 control channel, or vice versa, MACS will not exit. (Thank Shaun
2354 Instead, MACS will fake a tag at position -1 when calling
2355 treatment peaks vs control, but will ignore the chromosome while
2356 calling negative peaks.
2358 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2359 Version 1.3.1 (tiny bugs fixed version)
2363 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2364 binomial functions can tolerate large and small numbers.
2368 Parsers now split lines in BED/ELAND file using any
2369 whitespaces. 'track' or 'browser' lines will be regarded as
2370 comment lines. A bug fixed when throwing StrandFormatError. The
2371 maximum redundant tag number at a single position can be no less
2375 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2376 Version 1.3 (naming clarification version)
2378 * Naming clarification changes according to our manuscript:
2380 'frag_len' is changed to 'd'.
2382 'fold_change' is changed to 'fold_enrichment'.
2384 Suggest '--bw' parameter to be determined by users from the real
2387 Maximum FDR is 100% in the output file.
2389 And other clarifications in 00README file and the documents on the
2393 If the redundant tag number at a single position is over 32767,
2394 just remember 32767, instead of raising an overflow exception.
2400 Bug fixed for diagnosis report.
2403 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2408 Poisson distribution CDF and inverse CDF functions are
2409 corrected. They can produce right results even for huge lambda
2410 now. So that the p-value and FDR values in the final excel sheet
2413 IO package now can tolerate some rare cases; ELANDParser in IO
2414 package is fixed. (Thank Bogdan)
2418 Reverse paired peaks in model are rejected. So there will be no
2419 negative 'frag_len'. (Thank Bogdan)
2423 Diagnosis function is completed. Which can output a table file for
2424 users to estimate their sequencing depth.
2427 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2430 * Probe.py is added!
2432 GSL is totally removed from MACS. Instead, I have implemented the
2433 CDF and inverse CDF for poisson and binomial distribution purely
2436 * Constants.py is added!
2438 Organize constants used in MACS in the Constants.py file.
2440 * All other files are modified!
2442 Foldchange calculation is modified. Now the foldchange only be
2443 calculated at the peak summit position instead of the whole peak
2444 region. The values will be higher and more robust than before.
2448 1. MACS can save wiggle format files containing the tag number at
2449 every 10 bp along the genome. Tags are shifted according to our
2450 model before they are calculated.
2452 2. Model building and local lambda calculation can be skipped with
2455 3. A diagnosis report can be generated through '--diag'
2456 option. This report can help you get an assumption about the
2457 sequencing saturation. This funtion is only in beta stage.
2459 4. FDR calculation speed is highly improved.
2461 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2464 * TabIO, PeakModel.py ...
2465 Bug fixed to let MACS tolerate some cases while there is no tag on
2466 either plus strand or minus strand.
2469 Check the version of python. If the version is lower than 2.4,
2470 refuse to install with warning.