1 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
6 1) Speed up MACS2. Some programming tricks and code cleanup. The
7 filter_dup function replaces separate_dups. The latter was
8 implemented for potentially putting back duplicate reads in
9 certain downstream analysis. However such analysis hasn't been
10 implemented. Optimize the speed of writing bedGraph
11 files. Optimize BAM and BAMPE parsing with pointer casting instead
14 2) The comment lines in the headers of BED or SAM files will be
15 correctly skipped. However, MACS2 won't check comment lines in the
20 1) Cutoff-analysis in callpeak command. #341
22 2) Issues related to SAMParser and three ELAND Parsers are
27 1) cmdlinetest script in test/ folder has been updated to: 1. test
28 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
29 and after the error or warning message during tests; 3. output
30 only the first 10 lines if the difference between test result and
31 standard result can be found; 4. prockreport monitors CPU time and
32 memory usage at 1 sec intervals -- a bit more accurate.
34 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
36 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
37 MACS version 2.2.5 (Py3 speed up)
41 1) *Github code only and Not included in MACS2 release* New
42 testing data for performance test. A subsampled ENCODE2 CTCF
43 ChIP-seq dataset, including 5 million ChIP reads and 5 million
44 control reads, has been included in the test folder for testing
45 CPU and memory usage (i.e. 5M test). Several related scripts,
46 including `prockreport` for reporting CPU and memory usage, `pyprofile`
47 and `pyprofile_stat` for debugging and profiling MACS2 code, have
50 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
51 The old hashtable.pyx implementation copied from Pandas (very old
52 version) doesn't work well in Python3+Cython. It slows down the
53 pqtable checkup using the identical Cython codes as in
54 v2.1.4. While running 5M test, the `__getitem__` function in the
55 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
56 148.6s with the same number of calls in MACS2 v2.2.4. As a
57 consequence, the standard python dictionary implementation has
58 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
59 faster than py2 version, but uses a bit more memory. In general,
60 v2.2.5 can finish the 5M reads test in 20% less time than MACS2
61 v2.1.4, but uses 15% more memory.
65 1) More Python3 related fixes, e.g. the return value of keys from
69 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
70 MACS version 2.2.4 (Python3)
74 1) First Python3 version MACS2 released.
76 2) Version number 2.2.X will be used for MACS2 in Python3, in
79 3) More comprehensive test.sh script to check the consistency of
80 results from Python2 version and Python3 version.
82 4) Simplify setup.py script since the newest version transparently
83 supports cython. And when cython is not installed by the user,
84 setup.py can still compile using only C codes.
86 5) Fix Signal.pyx to use np.array instead of np.mat.
88 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
93 Github Actions is used together with Travis CI for testing and
100 1) #318 Random score in bdgdiff output. It turns out the sum_v is
101 not initialized as 0 before adding. Potential bugs are fixed in
102 other functions in ScoreTrack and CallPeakUnit codes.
104 2) #321 Cython dependency in setup.py script is removed, and the
105 'cythonize' call is placed in the correct position.
107 3) A typo is fixed in Github Actions script.
109 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
114 1) Support Docker auto-deploy. PR #309
116 2) Support Travis CI auto-testing, update unit-testing
117 scripts, and enable subcommand testing on small datasets.
119 3) Update README documents. #297 PR #306
121 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
122 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
124 5) `--d-min` option is added in `callpeak` and `predictd`, to
125 exclude predictions of fragment size smaller than the given
126 value. Merged from PR #267 @shouldsee.
128 6) `--buffer-size` option is added in `predictd`, `filterdup`,
129 `pileup` and `refinepeak` subcommands. Users can use this option
130 to decrease memory usage while there are a large number of contigs
131 in the data. Also, now `callpeak`, `predictd`, `filterdup`,
132 `pileup` and `refinepeak` will suggest that users tweak
133 `--buffer-size` while catching a MemoryError. #313 PR #314
137 1) #265 Fixed a bug where the pseudocount hasn't been applied
138 while calculating p-value score in ScoreTrack object.
140 2) Fixed bdgbroadcall so that it will report those broad peaks
141 without strong peak inside, a consistent behavior as `callpeak
144 3) Rename COPYING to LICENSE.
146 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
151 1) Added missing BEDPE support. And enable the support for BAMPE
152 and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
153 subcommands. When format is BAMPE or BEDPE, the 'pileup' command
154 will pile up the whole fragment defined by mapping locations of
155 the left end and right end of each read pair. Thank @purcaro
157 2) Added options to callpeak command for tweaking max-gap and
158 min-len during peak calling. Thank @jsh58!
160 3) The callpeak option "--to-large" is replaced with
163 4) The randsample option "-t" has been replaced with "-i".
167 1) Fixed memory issue related to #122 and #146
169 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
171 3) Fixed a bug while setting commandline qvalue cutoff.
173 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
175 5) Fixed the calculation of average fragment length for paired-end
178 6) Fixed bugs caused by khash while computing p/q-value and log
179 likelihood ratios. Thank @jsh58
181 7) More spelling tweaks in source code. Thank @mr-c
183 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
184 MACS version 2.1.1 20160309
188 * Fixed spelling. Merged pull request #120. Thank @mr-c!
190 * Change filtering criteria for reading BAM/SAM files
192 Related to callpeak and filterdup commands. Now the
193 reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
194 still be read although MACS2 may decide them as duplicates
195 later. Related to old issue #33. Sorry I forgot to address it for
198 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
199 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
203 1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
204 the former option is not supported by older GCC. Related to issues
207 2) Issue #108 is fixed. If no peak can be found in a chromosome,
208 the PeakIO won't throw an error.
214 a) A more flexible format, BEDPE, is supported. Now users can
215 define the left and right position of the ChIPed fragment, and
216 MACS2 will skip model building and directly pileup the
217 fragments. Related to issue #112.
219 b) The 'tempdir' can be specified, to save cached pileup
220 tracks. Originally, the temporary files were stored in
221 /tmp. Thank @daler! Related to issues #97 and #105.
225 New operations are added, to calculate the maximum or minimum value between
226 values in BEDGRAPH and given value.
230 New method is added, to calculate the maximum value between values
231 defined in two BEDGRAPH files.
233 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
234 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
238 1) Fix a bug while dealing with some chromosomes only containing
239 one read (pair). The size of dup_plus/dup_minus arrays after
240 filtering dups should be increased by 1.
242 2) Fix a bug related to the broad peak calling function in
243 previous versions. The gaps were miscalculated, so segmented weak
244 broad calls may be reported, and sometimes you would see peaks
245 with lower than cutoff values in the output files.
247 3) "Potentially" Fixed issue #105 on temporary cache files, need
251 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
252 MACS version 2.1.0 20150731 (tag:rc)
256 1) Fixed issue #76: information about broad/narrow cutoff will be
259 2) Fixed issue #79: bdgopt extparam option is fixed.
261 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
262 for filterdup command.
264 4) Fixed issue #78, #88 and similar issue reported in MACS google
265 group: MACS2 now can correctly deal with multiple alignment files
266 for -t or -c. The 'finalize' function will be correctly
267 called. Multiple files option is enabled for filterdup,
268 randsample, predictd, pileup and refinepeak commands.
270 5) A related issue to #88, when BAMPE mode is used, PE pairs will
271 be sorted by leftmost then rightmost ends.
273 6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
274 array. This will cause 'callpeak --nolambda' to hang forever while
275 calculating pvalues and qvalues.
277 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
278 MACS version 2.1.0 20150420 (tag:rc)
282 1) bdgopt: some convenient functions to modify bedGraph files.
284 2) cmbreps: Combine scores from two replicates. Including three
285 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
286 method to combine two p-value scores. After that, user can use
287 bdgpeakcall to call peaks on combined scores.
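The three combination methods listed for cmbreps can be sketched as follows. This is a minimal illustration, not the cmbreps code: the function names and the `method` keywords are hypothetical, and scores are assumed to be -log10(p) values from two independent replicates. Fisher's method uses X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    For an even number of degrees of freedom (2k) the chi-square survival
    function has a closed form, so only the standard library is needed.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # sf(X; 2k) = exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def combine_scores(a, b, method="fisher"):
    """Combine two -log10(p) scores (hypothetical helper, not the MACS2 API)."""
    if method == "max":
        return max(a, b)
    if method == "mean":
        return (a + b) / 2.0
    if method == "fisher":
        # Note: 10 ** -score underflows for very large scores; a real
        # implementation would stay in log space.
        return -math.log10(fisher_combine([10 ** -a, 10 ** -b]))
    raise ValueError("unknown method: %s" % method)
```

With two replicates that both score 2.0 (p = 0.01), the Fisher-combined score is a little under 3.0, i.e. stronger than either replicate alone.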
291 1) callpeak and bdgpeakcall now can try to analyze the
292 relationship between p-values and number/length of peaks then
293 generate a summary to help users decide an appropriate cutoff.
295 2) callpeak now can accept fold-enrichment cutoff as a filter for
300 Now MACS2 runs about 3X as fast as previous version. Trade
301 clean python codes for speed... Now while processing 50M ChIP vs
302 50M control, it will take only 10 minutes.
306 1) Sampling function in BAMPE mode.
308 2) Callpeak while there are >= 2 input files for -t or -c.
310 3) While reading BAM/SAM, those secondary or supplementary
311 alignments will be correctly skipped.
313 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
314 option that MACS2 will discard those SAM/BAM alignments with bit
315 1024 no matter how --keep-dup is set.
317 5) Fixed issue #49: setuptools is used instead of distutils
319 6) Fixed issue #51: fix the problem when using --trackline
320 argument when control file is absent.
322 7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of a
323 read mapped to the minus strand. The previous implementation found
324 an incorrect 5' end if there was an indel in the alignment.
326 8) Fixed issue #56: An incorrect sorting method used for BAMPE
327 mode which will cause incorrect filtering of duplicated reads. Now
330 9) Issue #63: Merged from jayhesselberth@github, extsize now can
333 10) Issue #71: Merged from aertslab@github, close file descriptor
334 after creating them with mkstemp().
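The CIGAR fix for issue #53 in the list above can be illustrated with a short sketch. It is not the MACS2 parser: the function names are hypothetical, and coordinates are assumed to be 0-based with a half-open end. The point is that the 5' end of a minus-strand read depends on the number of reference bases the alignment covers, which differs from the read length when there are indels.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
_REF_CONSUMING = set("MDN=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in _CIGAR_RE.findall(cigar)
               if op in _REF_CONSUMING)

def five_prime_end(pos, cigar, is_reverse):
    """5' end of a read; `pos` is the 0-based leftmost mapped position.

    For a minus-strand read the 5' end sits at the right edge of the
    alignment, so the reference span (not the read length) is added.
    """
    if not is_reverse:
        return pos
    return pos + reference_span(cigar)
```

For a 36 bp read aligned as `30M2I4M`, the reference span is 34, so adding the read length instead would place the 5' end 2 bp too far right.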
336 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
337 MACS version 2.1.0 20140616 (tag:rc)
341 "--ratio" is added to manually assign the scaling factor of ChIP
342 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
343 implementing the patch file!
345 "--shift" is added to move cutting ends (5' end of reads) around,
346 in order to process DNase-Seq data, e.g., use "--shift -100
347 --extsize 200" to get 200 bp fragments around 5' ends. For general
348 ChIP-Seq data analysis, this option should always be set to
349 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
352 ** Do not output negative fragment size from cross-correlation
353 analysis. Thank Alvin Qin for the feedback!
355 ** --half-ext and --control-shift are removed. For complex read
356 shifting and extending, combine '--shift' and '--extsize'
357 options. For comparing two conditions, use 'bdgdiff' module
360 ** a bug is fixed to output the last pileup value in bdg file
365 A 'dry-run' option is added to only output numbers, including the
366 number of allowed duplicates, the total number of reads before and
367 after filtering duplicates and the estimated duplication
368 rate. Thank John Urban for the suggestion!
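The "--shift -100 --extsize 200" example from this release can be worked through with a small sketch. It assumes that the 5' end is first moved by the shift value along the read's own 5'->3' direction and then extended by extsize in the same direction; the function name and signature are hypothetical.

```python
def fragment_interval(five_prime, strand, shift=0, extsize=200):
    """Return (start, end) of the extended fragment for one read.

    A negative `shift` moves the 5' end backwards (3'->5') before the
    read is extended by `extsize` in the 5'->3' direction.
    """
    if strand == "+":
        start = five_prime + shift
        end = start + extsize
    else:
        end = five_prime - shift
        start = end - extsize
    return max(start, 0), end  # clip at the chromosome start
```

Under these assumptions, a cut at position 1000 yields the interval (900, 1100) on either strand, i.e. a 200 bp window centered on the 5' end.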
371 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
372 MACS version 2.0.10 20131216 (tag:alpha)
376 * We changed the license from the Artistic License to the 3-clause BSD license.
378 Yes. Simpler the better.
380 * Process paired-end data with "-f BAMPE" without control
382 * GappedPeak output for --broad option has been fixed again to be
383 consistent with official UCSC format. We add 1bp pseudo-block to
384 left and/or right of broad region when necessary, so that you can
385 visualize the regions without strong enrichment inside
386 successfully. In downstream analysis other than visualization,
387 you may need to remove all 1bp blocks from the gappedPeak file.
389 * diffpeak subcommand is temporarily disabled. Till we
392 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
393 MACS version 2.0.10 20131028 (tag:alpha)
395 * callpeak --call-summits improvement
397 The smoothing window length has been fixed as fragment length
398 instead of short read length. The larger smoothing window will
399 grant better smoothing results and better sub-peak summits
402 * --outdir and --ofile options for almost all commands
404 Thank Björn Grüning for initially implementing these options!
405 Now, MACS2 will save results into a specified
406 directory by '--outdir' option, and/or save result into a
407 specified file by '--ofile' option. Note, in case '--ofile' is
408 available for a subcommand, '-o' now has been adjusted to be the
409 same as '--ofile' instead of '--o-prefix'.
411 Here is the list of changes. For more detail, use 'macs2 xxx -h'
414 ** callpeak: --outdir
415 ** diffpeak: Not implemented
416 ** bdgpeakcall: --outdir and --ofile
417 ** bdgbroadcall: --outdir and --ofile
418 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
419 and the order of arguments for --ofile must be the same as for -m.
420 ** bdgdiff: --outdir and --ofile
421 ** filterdup: --outdir
423 ** randsample: --outdir
424 ** refinepeak: --outdir and --ofile
427 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
428 MACS version 2.0.10 20130915 (tag:alpha)
430 * callpeak Added a new option --buffer-size
432 This option is to tweak a previously hidden parameter that
433 controls the steps to increase array size for storing alignment
434 information. In some rare cases where the number of
435 chromosomes/contigs/scaffolds is huge, the original default
436 setting will cause a huge memory waste. In these cases, we
437 recommend decreasing --buffer-size (e.g., to 1000) to save memory,
438 although the decrease will slow down reading of alignment files.
440 * an optimization to speed up pvalue-qvalue statistics
442 Previously, it took an hour to prepare p-q-table for 65M vs 65M
443 human TF library, and now it will take 10 minutes. It was due to a
444 single line of code to get a value from a numpy array ...
448 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
449 MACS version 2.0.10 20130731 (tag:alpha)
451 * callpeak --call-summits
453 Fix bugs that caused the callpeak --call-summits option to generate
454 extra peaks and inconsistent peak boundaries compared to the
455 default option. Thank Ben Levinson!
459 Fix bugs that caused bdgcmp to output only positive logLR values. Now
460 'depletion' can be correctly represented as negative values.
464 Fix the behavior of bdgdiff module. Now it can take four
465 bedGraph files, then use logLR as cutoff to call differential
466 regions. Check command line of bdgdiff for detail.
468 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
469 MACS version 2.0.10 20130713 (tag:alpha)
471 * fix bugs while output broadPeak and gappedPeak.
473 Note. Those weak broad regions without any strong enrichment
474 regions inside won't be saved in gappedPeak file.
476 * bdgcmp -T and -C are merged into -S and description is updated.
478 Now, you can use it to override SPMR values in your input for
479 bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
480 statistics will cause weird results (in most cases, lower
481 significance), and won't be consistent with MACS2 callpeak
482 behavior. So if you have SPMR bedGraphs, input the smaller/larger
483 sample size in MILLION according to 'callpeak --to-large' option.
485 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
486 MACS version 2.0.10 20130710 (tag:alpha)
488 * fix BED style output format of callpeak module:
490 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
491 the output. Old BED format file won't be saved.
493 2) with --broad: broadPeak (BED6+3) for broad region and
494 gappedPeak (BED12+3) for chained enriched regions will be the
495 output. Old BED format, narrowPeak format, summit file won't be
498 * bdgcmp now can accept list of methods to calculate scores. So
499 you can run it once to generate multiple types of scores. Thank
500 Jon Urban for this suggestion!
502 * C codes are re-generated through Cython 0.19.1.
504 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
505 MACS version 2.0.10 20130520 (tag:alpha)
507 * broad peak calling modules are modified in order to report all
508 relaxed regions even if there is no strong enrichment inside.
510 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
511 MACS version 2.0.10 20130501 (tag:alpha)
513 * Memory usage is decreased to about 1/4-1/5 of previous usage
514 Now, the internal data structure and algorithm are both
515 re-organized, so that intermediate data wouldn't be saved in
516 memory. Instead they will be calculated on the fly. New MACS2 will
517 take longer (1.5 to 2 times) but will use less memory,
518 so it can be more usable on small-memory servers.
520 * --seed option is added to callpeak and randsample commands
521 Thank Mathieu Gineste for this suggestion!
523 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
524 MACS version 2.0.10 20130306 (tag:alpha)
526 * diffpeak module New module to detect differential binding sites
527 with more statistics.
529 * Introduced --refine-peaks
530 Calculates reads balancing to refine peak summits
532 * Output file names prefix
533 Correct encodePeak to narrowPeak, broadPeak to bed12.
535 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
536 MACS version 2.0.10 (tag:alpha not released)
538 * Introduced BAMPEParser
539 Reads PE data directly, requires bedtools for now
541 * Introduced --call-summits
542 Uses signal processing methods to call overlapping peaks
544 * Added --no-trackline
545 By default, files have descriptive tracklines now
547 * new refinepeak command (experimental)
548 This new function will use a similar method in SPP (wtd), to
549 analyze raw tag distribution in peak region, then redefine the
550 peak summit where plus and minus tags are evenly distributed
553 * Changes to output *
554 cPeakDetect.pyx has full support for new print/write methods and
555 --call-peaks, BAMPEParser, and use of paired-end data
557 * Parser optimization
559 cParser.pyx is rewritten to use io.BufferedReader to speed
560 up. Speed is doubled.
562 Code is reorganized -- most of functions are inherited from
565 * Use cross-correlation to calculate fragment size
567 First, all pairs will be used in prediction for fragment
568 size. Previously, only no more than 1000 pairs are used. Second,
569 cross-correlation is used to find the best phase difference
570 between + and - tag pileups.
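The idea of finding the best phase difference between the plus and minus tag pileups can be sketched as below. This is only an unnormalized, pure-Python illustration with hypothetical names, not the PeakModel implementation; it assumes both pileups are equal-length lists of per-base tag counts.

```python
def predict_fragment_size(plus, minus, max_shift):
    """Pick the lag at which the minus pileup best matches the plus pileup."""
    n = len(plus)
    best_shift, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        # Correlate plus[i] with minus[i + d]: minus tags sit d bp downstream.
        score = sum(plus[i] * minus[i + d] for i in range(n - d))
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift
```

A production version would normalize the correlation and use vectorized array operations; the loop form is kept here only to make the lag search explicit.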
572 * Speed up p-value and q-value calculation
574 This part is ten times faster now. I am using a dictionary to
575 cache p-value results from Poisson CDF function. A bit more memory
576 will be used to increase speed. I hope this dictionary would not
577 explode since the possible pairs of ChIP signal and control lambda
578 are hugely redundant. Also, I rewrote part of q-value
581 * Speed up peak detection
583 This part is about a hundred times faster now. Optimizations
584 include using Numpy functions as much as possible, and making loop
585 body as small as possible.
587 * Post-processing on differential calls
589 After macs2diff finds differential binding sites between two
590 conditions, it will try to annotate the peak calls from one of two
591 conditions, describe the changes ...
593 * Fragment size prediction in macs2diff
595 Now by default, macs2diff will try to use the average fragment
596 size from both condition 1 and condition 2 for tag extension and
597 peak calling. Previously, by default, it will use different sizes
598 unless --nomodel is specified.
600 Technically, I separate model building processes out. So macs2diff
601 will build fragment sizes for condition 1 and 2 in parallel (2
602 processes maximum), then perform 4-way comparisons in parallel (4
607 Combine two p/qscore tracks together. At regions where condition 1
608 is higher than condition 2, score would be positive, otherwise,
611 * SAMParser and BAMParser
613 Bug fixed for paired-end sequencing data.
617 Fixed a bug while calling peaks from BedGraph file. It previously
618 mistakenly output same peaks multiple times at the end of
621 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
622 MACS version 2.0.9 (tag:alpha)
624 * Auto fixation on predicted d is turned off by default!
626 Previous --off-auto is now default. MACS will not automatically
627 fix d less than 2 times of tag size according to
628 --shiftsize. While tag size is getting longer nowadays, it would
629 be easier to have d less than 2 times of tag size, however d may
630 still be meaningful and useful. Please judge it using your own
635 Now, the default scaling while treatment and input are unbalanced
636 has been adjusted. By default, larger sample will be scaled down
637 linearly to match the smaller sample. In this way, background
638 noise will be reduced more than real signals, so we expect to have
639 more specific results than the other way around (i.e. --to-large
642 Also, an alternative option to randomly sample larger data
643 (--down-sample) is provided to replace default linear
644 scaling. However, this option will make results irreproducible,
649 A new script 'randsample' is added, which can randomly sample
650 certain percentage or number of tags.
654 Now, MACS will decide peak summits according to pileup height
655 instead of qvalue scores. In this way, the summit may be more
660 MACS calculates qvalue scores as differential scores. When comparing
661 two conditions (say A and B), the maximum qscore for comparing
662 A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
663 will be computed. If maxqscore_a2b is bigger, the diff score is
664 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
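The sign rule for the differential score described above amounts to the following. The function name is hypothetical; the variable names follow the text.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive if A > B, negative otherwise."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```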
666 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
667 MACS version 2.0.8 (tag:alpha)
669 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
671 New script bdgbroadcall and the extra option '--broad' for macs2
672 script, can be used to call broad regions with a loose cutoff to
673 link nearby significant regions. The output is represented as
676 * MACS2/IO/cScoreTrack.pyx
678 Fix q-value calculation to generate forcefully monotonic values.
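What "forcefully monotonic" means can be illustrated with the textbook Benjamini-Hochberg adjustment. This is not the cScoreTrack.pyx code, and the function name is hypothetical. Raw BH values p * m / rank are not guaranteed to increase with p, so each one is capped by the values of all less significant tests.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values with enforced monotonicity."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that a q-value never exceeds
    # the q-value of any test with a larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

For p-values 0.01, 0.03, 0.04 and 0.5, the raw BH value for 0.03 (0.06) is larger than the one for 0.04 (about 0.053); the running minimum pulls it down so that the q-values stay ordered.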
680 * bin/eland*2bed, bin/sam2bed and bin/filterdup
682 They are combined to one more powerful script called
683 "filterdup". The script filterdup can filter duplicated reads
684 according to sequencing depth and genome size. The script can also
685 convert any format supported by MACS to BED format.
687 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
688 MACS version 2.0.7 (tag:alpha)
690 * bin/macsdiff renamed to bin/bdgdiff
692 Now this script will work as a low-level finetuning tool as bdgcmp
697 A new script to take treatment and control files from two
698 conditions, calculate fragment size, use local poisson to get
699 pvalues and BH process to get qvalues, then combine 4-ways result
700 to call differential sites.
702 This script can use up to 4 cpus to speed up 4-ways calculation. (
703 I am trying multiprocessing in python. )
705 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
706 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
707 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
709 All above files are modified for the new macs2diff script.
711 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
713 Now q-value 0.01 is the default cutoff. If -p is specified,
714 p-value cutoff will be used instead.
716 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
717 MACS version 2.0.6 (tag:alpha)
721 A script to call differential regions. A naive way is introduced
722 to find the regions where:
724 1. signal from condition 1 is larger than input 1 and condition 2 --
725 unique region in condition 1;
726 2. signal from condition 2 is larger than input 2 and condition 1
727 -- unique region in condition 2;
728 3. signal from condition 1 is larger than input 1, signal from
729 condition 2 is larger than input 2, however either signal from
730 condition 1 or 2 is not larger than the other.
732 Here 'larger' means the pvalue or qvalue from a Poisson test is
733 under certain cutoff.
735 (I will make another script to wrap up multiple scripts for
736 differential calling)
738 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
739 MACS version 2.0.5 (tag:alpha)
741 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
744 Use hash to store peak information. Add back the feature to deal
745 with data without control.
747 Fix bug which incorrectly allows small peaks at the end of
750 * bin/bdgpeakcall, bin/bdgcmp
752 Fix bugs. bdgpeakcall can output encodePeak format.
754 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
755 MACS version 2.0.4 (tag:alpha)
759 Fix a bug, correctly assign lambda_bg while --to-small is
760 set. Thanks Junya Seo!
762 Add rank and num of bp columns to pvalue-qvalue table.
766 Fix bugs to correctly deal with peakless chromosomes. Thanks
769 Use AFDR for independent tests instead.
773 Now MACS can output peak coordinates together with pvalue, qvalue,
774 summit positions in a single encodePeak format (designed for
775 ENCODE project) file. This file can be loaded to UCSC
776 browser. Definition of some specific columns are: 5th:
777 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
778 -log10qvalue, 10th: relative summit position to peak start.
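The column definitions above can be made concrete with a small formatter. This is a sketch only: the function name, the "." strand placeholder in the 6th column, and the number formatting are assumptions, while the 5th and 7th to 10th columns follow the definitions given in this entry.

```python
def encode_peak_line(chrom, start, end, name, fold_change,
                     mlog10_pvalue, mlog10_qvalue, summit):
    """Format one peak as a tab-delimited encodePeak-style line."""
    fields = [
        chrom, str(start), str(end), name,
        str(int(mlog10_pvalue * 10)),   # 5th: int(-log10pvalue*10)
        ".",                            # 6th: strand placeholder (assumption)
        "%.5f" % fold_change,           # 7th: fold-change
        "%.5f" % mlog10_pvalue,         # 8th: -log10pvalue
        "%.5f" % mlog10_qvalue,         # 9th: -log10qvalue
        str(summit - start),            # 10th: summit relative to peak start
    ]
    return "\t".join(fields)
```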
781 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
782 MACS version 2.0.3 (tag:alpha)
784 * Rich output with qvalue, fold enrichment, and pileup height
786 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
789 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
791 Now we have a similar xls output file as before. The differences
792 from previous file are:
794 1. Summit now is absolute summit, instead of relative summit
796 2. 'Pileup' is previous 'tag' column. It's the extended fragment
797 pileup at the peak summit;
798 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
799 5.00 means 1e-5, simple and less confusing.
800 4. FDR column becomes '-log10(qvalue)' column.
801 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
802 the values at the peak summit.
806 NAME_pqtable.txt contains pvalue and qvalue relationships.
808 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
809 and -log10qvalue scores in BedGraph format. Nearby regions with
810 the same value are not merged.
812 * Separation of FeatIO.py
814 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
815 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
816 implemented to store pileup, local lambda, pvalue, and qvalue
817 altogether in cScoreTrack.pyx.
819 * Experimental option --half-ext
821 Suggested by NPS algorithm, I added an experimental option
822 --half-ext to let MACS only extends ChIP fragment around its
823 middle point for only 1/2 d.
825 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
826 MACS version 2.0.2 (tag:alpha)
830 Add an error check to see if there is no common chromosome names
831 from treatment file and control file
833 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
835 Reduce memory usage by removing deepcopy() calls.
837 * Modify README documents and others.
839 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
840 MACS Version 2.0.1 (tag:alpha)
842 * cPileup.pyx, cPeakDetect.pyx and peak calling process
844 Jie suggested a brilliantly simple method to pile up fragments
845 into a bedGraph track. It works much faster than the previous
846 function, i.e., faster than MACS1.3 or MACS1.4. So I can include
847 large local lambda calculation in MACSv2 now. Now I generate three
848 bedGraphs for d-size local bias, slocal-size and llocal-size local
849 bias, and calculate the maximum local bias as local lambda
852 Minor: add_loc in bedGraphTrackI now can correctly merge the
853 region with its preceding region if their values are the same.
857 Add an option to shift control tags before extension. By default,
858 control tags will be extended to both sides regardless of strand
861 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
862 MACS Version 2.0.0 (tag:alpha)
864 * Use bedGraph type to store data internally and externally.
866 We can have theoretically one-basepair resolution profiles. 10
867 times smaller in filesize and even smaller after converting to
868 bigWig for visualization.
870 * Peak calling process modified. Better peak boundary detection.
872 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
873 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
874 one will be averaged to d size) Then calculate the maximum value
875 of these two tracks and a global background, to have a
876 local-lambda bedGraph.
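The local-lambda step described above can be sketched per position as follows. The function and argument names are hypothetical, and the two control pileups are assumed to be equal-length lists of per-base values rather than real bedGraph tracks.

```python
def local_lambda(ctrl_d, ctrl_1k, d, genome_bg, window=1000):
    """Per-position local lambda from two control pileups.

    `ctrl_d` is the control pileup extended to d, `ctrl_1k` the pileup
    extended to `window` bp, and `genome_bg` the global background.
    """
    scale = d / float(window)  # average the 1,000 bp pileup down to d size
    return [max(a, b * scale, genome_bg) for a, b in zip(ctrl_d, ctrl_1k)]
```

With d = 200, a 1,000 bp pileup value of 10 contributes 2.0, so the result at each position is the largest of the d-size value, that scaled value, and the global background.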
878 Use -10log10poisson_pvalue as scores to generate a score track
881 A general peak calling based on a score cutoff, min length of peak
882 and max gap between nearby peaks.
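A general peak caller driven by a score cutoff, a minimum peak length and a maximum gap can be sketched as below. It is an illustration under assumptions, not the MACS2 routine: the function name is hypothetical, input is a sorted list of (start, end, score) regions on one chromosome, and a region passes when its score is at or above the cutoff.

```python
def call_peaks(regions, cutoff, min_length, max_gap):
    """Call peaks from sorted bedGraph-like (start, end, score) regions."""
    peaks = []
    current = None  # [start, end] of the peak being built
    for start, end, score in regions:
        if score < cutoff:
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = end  # close enough: extend the current peak
        else:
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append(tuple(current))
            current = [start, end]
    # Flush the last candidate if it is long enough.
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append(tuple(current))
    return peaks
```

Two significant regions separated by a 20 bp dip are merged when max_gap is 30, while an isolated 20 bp region is dropped when min_length is 100.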
886 Wiggle file output is removed. Now we only support bedGraph
887 output. The generation of bedGraph is highly recommended since it
888 will not cost extra time. In other words, bedGraph generation is
889 internally run even if you don't want to save bedGraphs on disk, due
890 to the peak calling algorithm in MACS v2.
894 We now can calculate poisson pvalue in log space so that the score
895 (-10*log10pvalue) will not have an upper limit of 3100 due to
896 the precision of floating-point numbers.
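One way to compute a Poisson upper-tail p-value entirely in log space is sketched below. It is an assumption-laden illustration rather than the MACS code: the function name is hypothetical and `lam` must be positive. Each term is kept as a logarithm and the terms are accumulated with a log-sum-exp step, so the result does not underflow even when the p-value is far smaller than a double can hold.

```python
import math

def log10_poisson_sf(k, lam, tol=1e-12):
    """log10 of P(X >= k) for X ~ Poisson(lam), computed in log space."""
    if k <= 0:
        return 0.0
    # ln P(X = k); later terms follow from P(i) = P(i-1) * lam / i.
    log_term = -lam + k * math.log(lam) - math.lgamma(k + 1)
    log_sum = log_term
    i = k
    while True:
        i += 1
        log_term += math.log(lam) - math.log(i)
        # log-sum-exp of two values, keeping everything as logarithms
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))
        # Stop once terms are decreasing and negligible relative to the sum.
        if i > lam and log_term - log_sum < math.log(tol):
            break
    return log_sum / math.log(10)
```

Multiplying the result by -10 gives the score; for 2,000 tags against a lambda of 1 this is well beyond the old ceiling mentioned above.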
898 * Cython is adopted to speed up Python code.
900 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
903 * Replaced with a newest WigTrackI class and fixed the wignorm script.
905 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
906 Version 1.4.0rc2 (Valentine)
908 * --single-wig option is renamed to --single-profile
910 * BedGraph output with --bdg or -B option.
912 The BedGraph output provides 1bp resolution fragment pileup
913 profile. File size is smaller than wig file. This option can be
914 combined with --single-profile option to produce a bedgraph file
915 for the whole genome. This option can also make --space,
916 --call-subpeaks invalid.
918 * Fix the description of --shiftsize to correctly state that the
919 value is 1/2 d (fragment size).
921 * Fix a bug in the call to __filter_w_control_tags when control is
924 * Fix a bug on --to-small option. Now it works as expected.
926 * Fix a bug while counting the tags in candidate peak region, an
927 extra tag may be included. (Thanks to Jake Biesinger!)
929 * Fix the bug for the peaks extended outside of chromosome
930 start. If the minus strand tag goes outside of chromosome start
931 after extension of d, it will be thrown out.
933 * Post-process script for a combined wig file:
935 The "wignorm" command can be called after a full run of MACS14 as
936 a postprocess. wignorm can calculate the local background from the
937 control wig file from MACS14, then use either foldchange,
938 -10*log10(pvalue) from Poisson test, or difference after asinh
939 transformation as the score to build a single wig track to
940 represent the binding strength. This script will take a
941 significantly long time to process.
943 * --wigextend has been obsoleted.
945 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
946 Version 1.4.0rc1 (Starry Sky)
948 * Duplicate reads option
950 --keep-dup behavior is changed. Now user can specify how many
951 reads he/she wants to keep at the same genomic location. 'auto' to
952 let MACS decide the number based on binomial distribution, 'all'
953 to let MACS keep all reads.
955 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
957 By default, MACS will now scale the smaller dataset to the bigger
958 dataset. For instance, if IP has 10 million reads, and Input has 5
959 million, MACS will double the lambda value calculated from Input
960 reads while calling BOTH the positive peaks and negative
961 peaks. This will address the issue caused by unbalanced numbers of
962 reads from IP and Input. If --to-small is turned on, MACS will
963 scale the larger dataset to the smaller one. So from now on, if d
964 is fixed, then the peaks from a MACS call for A vs B should be
965 identical to the negative peaks from a B vs A.
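The scaling rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name scaling_factors and its arguments are hypothetical and are not taken from the MACS source.

```python
def scaling_factors(n_treat, n_ctrl, to_small=False):
    """Return (treatment_scale, control_scale) for two read counts.

    Hypothetical helper: by default the smaller dataset is scaled up to
    the bigger one; with to_small=True the larger one is scaled down.
    """
    if n_treat <= 0 or n_ctrl <= 0:
        raise ValueError("read counts must be positive")
    target = min(n_treat, n_ctrl) if to_small else max(n_treat, n_ctrl)
    return target / n_treat, target / n_ctrl


# IP has 10 million reads and Input has 5 million: the lambda value
# calculated from Input is doubled.
treat_scale, ctrl_scale = scaling_factors(10_000_000, 5_000_000)
scaled_lambda = 3.0 * ctrl_scale
```

Because the same pair of factors is used whichever sample is called "treatment", swapping A and B only swaps the two factors, which is what makes A vs B and B vs A symmetric when d is fixed.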
967 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
968 Version 1.4.0beta (summer wishes)
974 The default behavior in the model building step is slightly
975 changed. When MACS can't find enough pairs to build model
976 (implemented in alpha version) or the modeled fragment length is
977 less than 2 times of tag length (implemented in beta version),
978 MACS will use 2 times of --shiftsize value as fragment length in
979 the later analysis. --off-auto can turn off this default behavior.
981 ** Redundant tag filtering
983 The IO module is rewritten. The redundant tag filtering process
984 becomes simpler and works as promised. The maximum allowed number
985 of tags at the exact same location is calculated from the
986 sequencing depth and genome size using a binomial distribution,
987 for both TREATMENT and CONTROL separately. ( previously only
988 TREATMENT was considered ) The exact same location means the same
989 coordinate and the same strand. Then MACS will only keep at most
990 this number of tags at the exact same location in the following
991 analysis. An option --keep-dup can let MACS skip the filtering and
992 keep all the tags. However this may bring in a lot of sequencing
993 bias, so you may get many false positive peaks.
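A minimal sketch of how such a limit could be derived from a binomial distribution is given below. The function name, the 1e-5 threshold and the per-position probability of 1/genome_size are assumptions made for illustration; the entry above does not state the exact parameters MACS uses.

```python
import math


def max_duplicate_tags(n_tags, genome_size, pvalue=1e-5):
    """Smallest k (at least 1) such that P(X > k) < pvalue,
    with X ~ Binomial(n_tags, 1 / genome_size).

    Hypothetical helper; the threshold and the uniform per-position
    probability are assumptions, not values read from MACS.
    """
    if genome_size <= 1:
        raise ValueError("genome_size must be larger than 1")
    if n_tags <= 0:
        return 1
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))  # P(X = 0)
    cdf = pmf
    k = 0
    while 1.0 - cdf >= pvalue and k < n_tags:
        # Recurrence: P(X = k+1) = P(X = k) * (n-k)/(k+1) * p/(1-p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return max(1, k)
```

Under these assumptions, 10 million tags on a 2.7e9 bp genome allow one tag per location, and 100 million tags allow two. The limit is computed from the depth of each dataset, so TREATMENT and CONTROL can get different values.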
995 ** Single wiggle mode
997 First thing to mention, this is not the score track that I
998 described before. By default, MACS generates wiggle files for
999 fragment pileup for every chromosome separately. When you use
1000 --single-wig option, MACS will generate a single wiggle file for
1001 all the chromosomes so you will get a wig.gz for TREATMENT and
1002 another wig.gz for CONTROL if available.
1004 ** Sniff -- automatic format detection
1006 Now, by default or "-f AUTO", MACS will decide the input file
1007 format automatically. Technically, it will try to read at most
1008 1000 records for the first 10 non-comment lines. If it succeeds,
1009 the format is decided. I recommend not to use AUTO and specify the
1010 right format for your input files, unless you combine different
1011 formats in a single MACS run.
1015 --single-wig and --keep-dup are added. Check previous section in
1016 ChangeLog for detail.
1018 -f (--format) AUTO is now the default option.
1020 --slocal default: 1000
1021 --llocal default: 10000
1025 Setup script will stop the installation if python version is not
1026 python2.6 or python2.7.
1028 Local lambda calculation has been changed back. MACS will check
1029 peak_region, slocal (default 1K) and llocal (default 10K) for the
1030 local bias. The previous 200bps default would cause MACS to miss
1031 some peaks where the input bias is very sharp.
1033 sam2bed.py script is corrected.
1035 Relative pos in xls output is fixed.
1037 Parser for ELAND_export is fixed to pass some of the no match
1038 lines. And elandexport2bed.py is fixed too. ( however I can't
1039 guarantee that it works on any eland_export files. )
1041 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1042 Version 1.4.0alpha2 (be smarter)
1046 --gsize now provides shortcuts for common genomes, including
1047 human, mouse, C. elegans and fruitfly.
1049 --llocal now will be 5000 bps if there is no input file, so that
1050 local lambda doesn't overkill enriched binding sites.
1052 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1053 Version 1.4alpha (be smarter)
1057 --tsize option is redesigned. MACS will use the first 10 lines of
1058 the input to decide the tag size. If user specifies --tsize, it
1059 will override the auto decided tsize.
1061 --lambdaset is replaced by --slocal and --llocal which mean the
1062 small local region and large local region.
1064 --bw has no effect on the scan-window size now. It only affects the
1065 paired-peaks model process.
1069 During the model building, MACS will pick out the enriched regions
1070 which are not too high and not too low to build the paired-peak
1071 model. By default the region is from fold 10 to fold 30. If MACS
1072 fails to build the model, by default it will use the nomodel
1073 settings, like shiftsize=100bps, to shift and extend each
1074 tag. This behavior can be turned off by '--off-auto'.
1078 All the summit positions are saved in an extra
1079 *_summits.bed file. An option '--call-subpeaks' will invoke
1080 PeakSplitter developed by Mali Salmon to split wide peaks into
1083 * Sniff ( will be in beta )
1085 Automatically recognize the input file format, so users can combine
1086 different formats in one MACS run.
1088 Not implemented features/TODO:
1090 * Algorithms ( in near future? )
1092 MACS will try to refine the peak boundaries by calculating the
1093 scores for every point in the candidate peak regions. The score
1094 will be the -10*log(10,pvalue) on a local poisson distribution. A
1095 cutoff specified by users (--pvalue) will be applied to find the
1096 precise sub-peaks in the original candidate peak region. Peak
1097 boundaries and peak summit positions will be saved in separate BED
1100 * Single wiggle track ( in near future? )
1102 A single wiggle track will be generated to save the scores within
1103 candidate peak regions in the 10bps resolution. The wiggle file
1104 is in fixedStep format.
1107 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1108 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1112 Fixed typo. FCSTEP -> FESTEP
1116 The 'femax' attribute bug is fixed
1118 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1119 Version 1.3.7 (Oktoberfest)
1121 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1123 Enhancements by Peter Chines:
1125 1. gzip files are supported.
1126 2. when --diag is on, user can set the increment and endpoint for
1127 fold enrichment analysis by setting --fe-step and --fe-max.
1129 Enhancements by Davide Cittaro:
1131 1. BAM and SAM formats are supported.
1132 2. small changes in the header lines of wiggle output.
1135 1. I added --fe-min option;
1136 2. Bowtie ascii output with suffix ".map" is supported.
1140 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1141 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1142 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1143 4. Some "fold change" have been changed to "fold enrichment".
1145 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1146 Version 1.3.6.1 (default parameter change)
1148 * bin/macs, lib/PeakDetect.py
1150 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1151 default. "--futurefdr" is added which can turn on the 'new' method
1152 introduced in 1.3.6. By default it's off.
1156 Fixed a bug. p-value is corrected a little bit.
1159 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1160 Version 1.3.6 (Birthday cake)
1164 "track name" is added to the header of BED output file.
1166 Now the default peak detection method is to consider 5k and 10k
1167 nearby regions in treatment data and peak location, 1k, 5k, and
1168 10k regions in control data to calculate local bias. The old
1169 method can be called through '--old' option.
1171 Information about how many total/unique tags in treatment or
1172 control will be saved in final .xls output.
1174 * lib/IO/__init__.py
1176 ".fa" will be removed from input tag alignment so only the
1177 chromosome names are kept.
1179 WigTrackI class is added for Wiggle like data structure. (not used
1182 The parser for ELAND multi PET files has been fixed. Now the 5'
1183 tag position for a pair will be kept, whereas in the previous
1184 version, the middle points are kept.
1186 * lib/IO/BinKeeper.py
1188 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1189 browser, which can quickly access certain region for values in a
1190 large wiggle like data file. (not used now)
1192 * lib/OptValidator.py
1198 Now the default peak detection method is to consider 5k and 10k
1199 nearby regions in treatment data and peak location, 1k, 5k, and
1200 10k regions in control data to calculate local bias. The old
1201 method can be called through '--old' option.
1203 Two columns have been added to BED output file. 4th column: peak
1204 name; 5th column: peak score using -10log(10,pvalue) as score.
1208 Add support to build a Mac App through 'setup.py py2app', or a
1209 Windows executable through 'setup.py py2exe'. You need to install
1210 py2app or py2exe package in order to use these functions.
1212 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1213 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1217 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1218 in control data to calculate local lambda for each peak. Peak
1219 calling results will be slightly different from the previous version,
1224 Typo fixed, ELANDParser -> ELANDResultParser
1228 Now, modeled d value will be shown on the model figure.
1230 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1231 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1233 * macs, IO/__init__.py, PeakDetect.py
1235 Add support for ELAND multi format. Add support for Pair-End
1236 experiment, in this case, 5'end and 3'end ELAND multi format files
1237 are required for treatment or control data. See 00README file for
1240 Add wigextend option.
1242 Add petdist option for Pair-End Tag experiment, which is the best
1243 distance between 5' and 3' tags.
1247 Fixed a bug which caused the end position of every peak region
1248 to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
1252 Fix bugs while generating wiggle files. The start position of
1253 wiggle file is set to 1 instead of 0.
1255 Fix a bug that every 10M bps, signals in the first 'd' range are
1256 lower than actual. ( Thanks Mali Salmon!)
1259 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1260 Version 1.3.3 (wiggle bugs fixed)
1264 Fix bugs while generating wiggle files. 1. 'span=' is added to
1265 'variableStep' line; 2. previously, every 10M bps, the coordinates
1266 were wrongly shifted to the right for 'd' basepairs.
1268 * macs, PeakDetect.py
1270 Add an option to save wiggle files on different resolution.
1272 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1273 Version 1.3.2 (tiny bugs fixed)
1277 Fix 65536 -> 65535. ( Thank Joon)
1281 Improved for binomial function with extra large number. Imported
1282 from Cistrome project.
1286 If treatment channel misses reads in some chromosome included in
1287 control channel, or vice versa, MACS will not exit. (Thank Shaun
1290 Instead, MACS will fake a tag at position -1 when calling
1291 treatment peaks vs control, but will ignore the chromosome while
1292 calling negative peaks.
1294 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1295 Version 1.3.1 (tiny bugs fixed version)
1299 Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1300 binomial functions can tolerate large and small numbers.
1304 Parsers now split lines in BED/ELAND file using any
1305 whitespaces. 'track' or 'browser' lines will be regarded as
1306 comment lines. A bug fixed when throwing StrandFormatError. The
1307 maximum redundant tag number at a single position can be no less
1311 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1312 Version 1.3 (naming clarification version)
1314 * Naming clarification changes according to our manuscript:
1316 'frag_len' is changed to 'd'.
1318 'fold_change' is changed to 'fold_enrichment'.
1320 Suggest '--bw' parameter to be determined by users from the real
1323 Maximum FDR is 100% in the output file.
1325 And other clarifications in 00README file and the documents on the
1329 If the redundant tag number at a single position is over 32767,
1330 just remember 32767, instead of raising an overflow exception.
1336 Bug fixed for diagnosis report.
1339 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1344 Poisson distribution CDF and inverse CDF functions are
1345 corrected. They can produce right results even for huge lambda
1346 now. So that the p-value and FDR values in the final excel sheet
1349 IO package now can tolerate some rare cases; ELANDParser in IO
1350 package is fixed. (Thank Bogdan)
1354 Reverse paired peaks in model are rejected. So there will be no
1355 negative 'frag_len'. (Thank Bogdan)
1359 The diagnosis function is completed. It can output a table file for
1360 users to estimate their sequencing depth.
1363 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1366 * Prob.py is added!
1368 GSL is totally removed from MACS. Instead, I have implemented the
1369 CDF and inverse CDF for poisson and binomial distribution purely
1372 * Constants.py is added!
1374 Organize constants used in MACS in the Constants.py file.
1376 * All other files are modified!
1378 Foldchange calculation is modified. Now the foldchange is only
1379 calculated at the peak summit position instead of the whole peak
1380 region. The values will be higher and more robust than before.
1384 1. MACS can save wiggle format files containing the tag number at
1385 every 10 bp along the genome. Tags are shifted according to our
1386 model before they are calculated.
1388 2. Model building and local lambda calculation can be skipped with
1391 3. A diagnosis report can be generated through '--diag'
1392 option. This report can help you estimate the
1393 sequencing saturation. This function is only in beta stage.
1395 4. FDR calculation speed is highly improved.
1397 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1400 * TabIO, PeakModel.py ...
1401 Bug fixed to let MACS tolerate some cases where there is no tag on
1402 either plus strand or minus strand.
1405 Check the version of python. If the version is lower than 2.4,
1406 refuse to install with warning.
1409 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1410 MACS version 2.0.10 20130731 (tag:alpha)
1412 * callpeak --call-summits
1414 Fix bugs causing the callpeak --call-summits option to generate
1415 extra peaks and inconsistent peak boundaries compared to the
1416 default option. Thank Ben Levinson!
1420 Fix bugs causing bdgcmp to output logLR values that are all positive. Now
1421 'depletion' can be correctly represented as negative values.
1425 Fix the behavior of bdgdiff module. Now it can take four
1426 bedGraph files, then use logLR as cutoff to call differential
1427 regions. Check command line of bdgdiff for detail.
1429 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1430 MACS version 2.0.10 20130713 (tag:alpha)
1432 * fix bugs while outputting broadPeak and gappedPeak.
1434 Note. Those weak broad regions without any strong enrichment
1435 regions inside won't be saved in gappedPeak file.
1437 * bdgcmp -T and -C are merged into -S and description is updated.
1439 Now, you can use it to override SPMR values in your input for
1440 bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1441 statistics will cause weird results ( in most cases, lower
1442 significance), and won't be consistent with MACS2 callpeak
1443 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1444 sample size in MILLION according to 'callpeak --to-large' option.
1446 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1447 MACS version 2.0.10 20130710 (tag:alpha)
1449 * fix BED style output format of callpeak module:
1451 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1452 the output. Old BED format file won't be saved.
1454 2) with --broad: broadPeak (BED6+3) for broad region and
1455 gappedPeak (BED12+3) for chained enriched regions will be the
1456 output. Old BED format, narrowPeak format, summit file won't be
1459 * bdgcmp now can accept list of methods to calculate scores. So
1460 you can run it once to generate multiple types of scores. Thank
1461 Jon Urban for this suggestion!
1463 * C codes are re-generated through Cython 0.19.1.
1465 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1466 MACS version 2.0.10 20130520 (tag:alpha)
1468 * broad peak calling modules are modified in order to report all
1469 relaxed regions even if there is no strong enrichment inside.
1471 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1472 MACS version 2.0.10 20130501 (tag:alpha)
1474 * Memory usage is decreased to about 1/4-1/5 of previous usage
1475 Now, the internal data structure and algorithm are both
1476 re-organized, so that intermediate data wouldn't be saved in
1477 memory. Instead they will be calculated on the fly. New MACS2 will
1478 take longer (1.5 to 2 times) however it will use less memory
1479 so can be more usable on small mem servers.
1481 * --seed option is added to callpeak and randsample commands
1482 Thank Mathieu Gineste for this suggestion!
1484 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1485 MACS version 2.0.10 20130306 (tag:alpha)
1487 * diffpeak module New module to detect differential binding sites
1488 with more statistics.
1490 * Introduced --refine-peaks
1491 Calculates reads balancing to refine peak summits
1493 * Output file names prefix
1494 Correct encodePeak to narrowPeak, broadPeak to bed12.
1496 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1497 MACS version 2.0.10 (tag:alpha not released)
1499 * Introduced BAMPEParser
1500 Reads PE data directly, requires bedtools for now
1502 * Introduced --call-summits
1503 Uses signal processing methods to call overlapping peaks
1505 * Added --no-trackline
1506 By default, files have descriptive tracklines now
1508 * new refinepeak command (experimental)
1509 This new function will use a similar method in SPP (wtd), to
1510 analyze raw tag distribution in peak region, then redefine the
1511 peak summit where plus and minus tags are evenly distributed
1514 * Changes to output *
1515 cPeakDetect.pyx has full support for new print/write methods and
1516 --call-peaks, BAMPEParser, and use of paired-end data
1518 * Parser optimization
1520 cParser.pyx is rewritten to use io.BufferedReader to speed
1521 up. Speed is doubled.
1523 Code is reorganized -- most of functions are inherited from
1524 GenericParser class.
1526 * Use cross-correlation to calculate fragment size
1528 First, all pairs will be used in prediction for fragment
1529 size. Previously, only no more than 1000 pairs are used. Second,
1530 cross-correlation is used to find the best phase difference
1531 between + and - tag pileups.
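The idea of picking the phase difference with the highest cross-correlation can be sketched as below. This is a simplified illustration that works on raw tag 5' positions rather than on pileups, and the function name best_shift and the max_shift parameter are hypothetical.

```python
from collections import Counter


def best_shift(plus_tags, minus_tags, max_shift=500):
    """Return the shift d (0..max_shift) that maximizes the overlap
    between plus-strand positions shifted by d and minus-strand positions.

    Hypothetical helper, not the MACS implementation.
    """
    plus = Counter(plus_tags)
    minus = Counter(minus_tags)
    best_d, best_score = 0, -1
    for d in range(max_shift + 1):
        # Cross-correlation at lag d: sum of plus(pos) * minus(pos + d).
        score = sum(n * minus[pos + d] for pos, n in plus.items())
        if score > best_score:
            best_d, best_score = d, score
    return best_d


# Three plus tags, each followed by a minus tag 150 bp downstream.
fragment_size = best_shift([100, 200, 300], [250, 350, 450])
```

Every tag contributes to the score here, in line with using all pairs rather than a limited subset.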
1533 * Speed up p-value and q-value calculation
1535 This part is ten times faster now. I am using a dictionary to
1536 cache p-value results from Poisson CDF function. A bit more memory
1537 will be used to increase speed. I hope this dictionary would not
1538 explode since the possible pairs of ChIP signal and control lambda
1539 are hugely redundant. Also, I rewrote part of q-value
1542 * Speed up peak detection
1544 This part is about a hundred times faster now. Optimizations
1545 include using Numpy functions as much as possible, and making loop
1546 body as small as possible.
1548 * Post-processing on differential calls
1550 After macs2diff finds differential binding sites between two
1551 conditions, it will try to annotate the peak calls from one of two
1552 conditions, describe the changes ...
1554 * Fragment size prediction in macs2diff
1556 Now by default, macs2diff will try to use the average fragment
1557 size from both condition 1 and condition 2 for tag extension and
1558 peak calling. Previously, by default, it will use different sizes
1559 unless --nomodel is specified.
1561 Technically, I separate model building processes out. So macs2diff
1562 will build fragment sizes for condition 1 and 2 in parallel (2
1563 processes maximum), then perform 4-way comparisons in parallel (4
1568 Combine two p/qscore tracks together. At regions where condition 1
1569 is higher than condition 2, score would be positive, otherwise,
1572 * SAMParser and BAMParser
1574 Bug fixed for paired-end sequencing data.
1578 Fixed a bug while calling peaks from BedGraph file. It previously
1579 mistakenly output same peaks multiple times at the end of
1582 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
1583 MACS version 2.0.9 (tag:alpha)
1585 * Auto fixation on predicted d is turned off by default!
1587 Previous --off-auto is now default. MACS will not automatically
1588 fix d less than 2 times of tag size according to
1589 --shiftsize. While tag size is getting longer nowadays, it would
1590 be easier to have d less than 2 times of tag size, however d may
1591 still be meaningful and useful. Please judge it using your own
1596 Now, the default scaling while treatment and input are unbalanced
1597 has been adjusted. By default, larger sample will be scaled down
1598 linearly to match the smaller sample. In this way, background
1599 noise will be reduced more than real signals, so we expect to have
1600 more specific results than the other way around (i.e. --to-large
1603 Also, an alternative option to randomly sample larger data
1604 (--down-sample) is provided to replace default linear
1605 scaling. However, this option will make results irreproducible,
1610 A new script 'randsample' is added, which can randomly sample
1611 certain percentage or number of tags.
1615 Now, MACS will decide peak summits according to pileup height
1616 instead of qvalue scores. In this way, the summit may be more
1621 MACS calculates qvalue scores as differential scores. When comparing
1622 two conditions (say A and B), the maximum qscore for comparing
1623 A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
1624 will be computed. If maxqscore_a2b is bigger, the diff score is
1625 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
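The rule above translates directly into a small function. The function name diff_score is hypothetical; the two argument names follow the entry.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive when A over B dominates,
    negative otherwise (ties fall into the negative branch)."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```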
1627 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1628 MACS version 2.0.8 (tag:alpha)
1630 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1632 New script bdgbroadcall and the extra option '--broad' for macs2
1633 script, can be used to call broad regions with a loose cutoff to
1634 link nearby significant regions. The output is represented as
1637 * MACS2/IO/cScoreTrack.pyx
1639 Fix q-value calculation to generate forcefully monotonic values.
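For reference, a plain Benjamini-Hochberg adjustment with the monotonic step made explicit looks like the sketch below. It is a textbook version written for illustration, not the code in cScoreTrack.pyx, and the function name bh_qvalues is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic.

    Walking from the largest p-value down, each q-value is capped by the
    smallest q-value seen so far, so q never increases as p decreases.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```

Without the running minimum, the p-values 0.01 and 0.011 out of three tests would get raw q-values of 0.03 and 0.0165, so the smaller p-value would get the larger q-value.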
1641 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1643 They are combined to one more powerful script called
1644 "filterdup". The script filterdup can filter duplicated reads
1645 according to sequencing depth and genome size. The script can also
1646 convert any format supported by MACS to BED format.
1648 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1649 MACS version 2.0.7 (tag:alpha)
1651 * bin/macsdiff renamed to bin/bdgdiff
1653 Now this script will work as a low-level finetuning tool as bdgcmp
1658 A new script to take treatment and control files from two
1659 conditions, calculate fragment size, use local poisson to get
1660 pvalues and BH process to get qvalues, then combine the 4-way results
1661 to call differential sites.
1663 This script can use up to 4 cpus to speed up the 4-way calculation. (
1664 I am trying multiprocessing in python. )
1666 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1667 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1668 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1670 All above files are modified for the new macs2diff script.
1672 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1674 Now q-value 0.01 is the default cutoff. If -p is specified,
1675 p-value cutoff will be used instead.
1677 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1678 MACS version 2.0.6 (tag:alpha)
1682 A script to call differential regions. A naive way is introduced
1683 to find the regions where:
1685 1. signal from condition 1 is larger than input 1 and condition 2 --
1686 unique region in condition 1;
1687 2. signal from condition 2 is larger than input 2 and condition 1
1688 -- unique region in condition 2;
1689 3. signal from condition 1 is larger than input 1, signal from
1690 condition 2 is larger than input 2, however either signal from
1691 condition 1 or 2 is not larger than the other.
1693 Here 'larger' means the pvalue or qvalue from a Poisson test is
1694 under certain cutoff.
1696 (I will make another script to wrap up multiple scripts for
1697 differential calling)
1699 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1700 MACS version 2.0.5 (tag:alpha)
1702 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1703 MACS2/IO/cPeakIO.pyx
1705 Use hash to store peak information. Add back the feature to deal
1706 with data without control.
1708 Fix bug which incorrectly allows small peaks at the end of
1711 * bin/bdgpeakcall, bin/bdgcmp
1713 Fix bugs. bdgpeakcall can output encodePeak format.
1715 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1716 MACS version 2.0.4 (tag:alpha)
1720 Fix a bug, correctly assign lambda_bg while --to-small is
1721 set. Thanks Junya Seo!
1723 Add rank and num of bp columns to pvalue-qvalue table.
1727 Fix bugs to correctly deal with peakless chromosomes. Thanks
1730 Use AFDR for independent tests instead.
1734 Now MACS can output peak coordinates together with pvalue, qvalue,
1735 summit positions in a single encodePeak format (designed for
1736 ENCODE project) file. This file can be loaded to UCSC
1737 browser. Definition of some specific columns are: 5th:
1738 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1739 -log10qvalue, 10th: relative summit position to peak start.
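A sketch of how one such line could be assembled is shown below. The function name and its arguments are hypothetical. Columns 1 to 4 and 6 (chromosome, start, end, name, strand given as '.') are assumed from the usual BED6+4 layout and are not spelled out in the entry above, as is the two-decimal formatting.

```python
def encode_peak_line(chrom, start, end, name, fold_change,
                     mlog10_pvalue, mlog10_qvalue, summit):
    """Build one tab-delimited encodePeak line.

    `summit` is an absolute coordinate; the 10th column stores it
    relative to the peak start.
    """
    fields = [
        chrom,                      # 1st: chromosome (assumed)
        str(start),                 # 2nd: peak start (assumed)
        str(end),                   # 3rd: peak end (assumed)
        name,                       # 4th: peak name (assumed)
        str(int(mlog10_pvalue * 10)),  # 5th: int(-log10pvalue*10)
        ".",                        # 6th: strand, unused (assumed)
        f"{fold_change:.2f}",       # 7th: fold-change
        f"{mlog10_pvalue:.2f}",     # 8th: -log10pvalue
        f"{mlog10_qvalue:.2f}",     # 9th: -log10qvalue
        str(summit - start),        # 10th: relative summit position
    ]
    return "\t".join(fields)
```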
1742 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1743 MACS version 2.0.3 (tag:alpha)
1745 * Rich output with qvalue, fold enrichment, and pileup height
1747 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1750 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1752 Now we have a similar xls output file as before. The differences
1753 from previous file are:
1755 1. Summit now is absolute summit, instead of relative summit
1757 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1758 pileup at the peak summit;
1759 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1760 5.00 means 1e-5, simple and less confusing.
1761 4. FDR column becomes '-log10(qvalue)' column.
1762 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1763 the values at the peak summit.
1765 * Extra output files
1767 NAME_pqtable.txt contains pvalue and qvalue relationships.
1769 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1770 and -log10qvalue scores in BedGraph format. Nearby regions with
1771 the same value are not merged.
1773 * Separation of FeatIO.py
1775 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1776 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1777 implemented to store pileup, local lambda, pvalue, and qvalue
1778 all together in cScoreTrack.pyx.
1780 * Experimental option --half-ext
1782 Suggested by NPS algorithm, I added an experimental option
1783 --half-ext to let MACS only extend ChIP fragment around its
1784 middle point for only 1/2 d.
1786 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1787 MACS version 2.0.2 (tag:alpha)
1791 Add an error check to see if there is no common chromosome names
1792 from treatment file and control file
1794 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1796 Reduce memory usage by removing deepcopy() calls.
1798 * Modify README documents and others.
1800 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1801 MACS Version 2.0.1 (tag:alpha)
1803 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1805 Jie suggested a brilliantly simple method to pile up fragments
1806 into a bedGraph track. It works much faster than the previous
1807 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1808 large local lambda calculation in MACSv2 now. Now I generate three
1809 bedGraphs for d-size local bias, slocal-size and llocal-size local
1810 bias, and calculate the maximum local bias as local lambda
1813 Minor: add_loc in bedGraphTrackI now can correctly merge the
1814 region with its preceding region if their values are the same.
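The entry does not describe the method itself. One common way to pile up fragments into bedGraph-style intervals is a sweep over sorted start and end positions, sketched below as an assumption about the general approach. The function name pileup is hypothetical and this is not the code in cPileup.pyx.

```python
from collections import Counter


def pileup(fragments):
    """Turn (start, end) fragments into (start, end, depth) intervals.

    Each start adds 1 to the depth and each end subtracts 1. Regions
    with zero depth are skipped, and an interval is merged with its
    preceding interval when they touch and have the same depth.
    """
    delta = Counter()
    for start, end in fragments:
        delta[start] += 1
        delta[end] -= 1
    track = []
    depth = 0
    prev = None
    for pos in sorted(delta):
        if prev is not None and depth > 0:
            if track and track[-1][1] == prev and track[-1][2] == depth:
                track[-1] = (track[-1][0], pos, depth)  # merge equal values
            else:
                track.append((prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return track
```

The cost is dominated by one sort of the breakpoints, and the output already has the (start, end, value) shape of a bedGraph track.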
1818 Add an option to shift control tags before extension. By default,
1819 control tags will be extended to both sides regardless of strand
1822 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1823 MACS Version 2.0.0 (tag:alpha)
1825 * Use bedGraph type to store data internally and externally.
1827 We can have theoretically one-basepair resolution profiles. 10
1828 times smaller in filesize and even smaller after converting to
1829 bigWig for visualization.
1831 * Peak calling process modified. Better peak boundary detection.
1833 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1834 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1835 one will be averaged to d size) Then calculate the maximum value
1836 of these two tracks and a global background, to have a
1837 local-lambda bedGraph.
1839 Use -10log10poisson_pvalue as scores to generate a score track
1840 before peak calling.
1842 A general peak calling based on a score cutoff, min length of peak
1843 and max gap between nearby peaks.
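A general peak caller of this kind can be sketched in a few lines. The function name and the choice of 'score >= cutoff' and 'gap <= max_gap' as the comparisons are assumptions for illustration.

```python
def call_peaks(bedgraph, cutoff, min_length, max_gap):
    """Call peaks from sorted (start, end, score) intervals.

    Intervals scoring at least `cutoff` are kept, neighbours separated
    by at most `max_gap` are joined, and joined regions shorter than
    `min_length` are dropped.
    """
    regions = []
    for start, end, score in bedgraph:
        if score < cutoff:
            continue
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = end  # extend the previous region
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions if e - s >= min_length]


track = [(0, 100, 5), (100, 150, 1), (150, 300, 6), (1000, 1020, 9)]
peaks = call_peaks(track, cutoff=3, min_length=100, max_gap=50)
```

In this example the two high-scoring intervals around the 50 bp dip are joined into one peak, while the short isolated interval at 1000 is discarded.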
1847 Wiggle file output is removed. Now we only support bedGraph
1848 output. The generation of bedGraph is highly recommended since it
1849 will not cost extra time. In other words, bedGraph generation is
1850 internally run even if you don't want to save bedGraphs on disk, due
1851 to the peak calling algorithm in MACS v2.
1855 We now can calculate poisson pvalue in log space so that the score
1856 (-10*log10pvalue) will not have an upper limit of 3100 due to
1857 precision of float number.
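To illustrate why working in log space removes the limit, here is a minimal sketch of an upper-tail Poisson score that never forms the tiny p-value itself. The function name, the tolerance and the max_terms guard are hypothetical, and the sketch is aimed at enriched regions where k is not far below lambda.

```python
import math


def poisson_score(k, lam, max_terms=1_000_000):
    """Return -10*log10 P(X >= k) for X ~ Poisson(lam).

    P(X >= k) = pmf(k) * (1 + lam/(k+1) + lam^2/((k+1)(k+2)) + ...),
    so only log(pmf(k)) and the log of a modest series are needed.
    """
    if k <= 0:
        return 0.0
    log_first = k * math.log(lam) - lam - math.lgamma(k + 1)
    total = 1.0
    ratio = 1.0
    for i in range(k + 1, k + 1 + max_terms):
        ratio *= lam / i
        total += ratio
        if ratio < 1e-17 * total:
            break
    log_sf = log_first + math.log(total)
    return -10.0 * log_sf / math.log(10)
```

For k=5000 and lambda=1 the probability itself underflows to zero in double precision, yet the score computed this way stays finite and is far above 3100.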
1859 * Cython is adopted to speed up Python code.
1861 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1864 * Replaced with a newest WigTrackI class and fixed the wignorm script.
1866 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1867 Version 1.4.0rc2 (Valentine)
1869 * --single-wig option is renamed to --single-profile
1871 * BedGraph output with --bdg or -B option.
1873 The BedGraph output provides 1bp resolution fragment pileup
1874 profile. File size is smaller than wig file. This option can be
1875 combined with --single-profile option to produce a bedgraph file
1876 for the whole genome. This option can also make --space,
1877 --call-subpeaks invalid.
1879 * Fix the description of --shiftsize to correctly state that the
1880 value is 1/2 d (fragment size).
1882 * Fix a bug in the call to __filter_w_control_tags when control is
1885 * Fix a bug on --to-small option. Now it works as expected.
1887 * Fix a bug while counting the tags in candidate peak region, an
1888 extra tag may be included. (Thanks to Jake Biesinger!)
1890 * Fix the bug for the peaks extended outside of chromosome
1891 start. If the minus strand tag goes outside of chromosome start
1892 after extension of d, it will be thrown out.
1894 * Post-process script for a combined wig file:
1896 The "wignorm" command can be called after a full run of MACS14 as
1897 a postprocess. wignorm can calculate the local background from the
1898 control wig file from MACS14, then use either foldchange,
1899 -10*log10(pvalue) from Poisson test, or difference after asinh
1900 transformation as the score to build a single wig track to
1901 represent the binding strength. This script will take a
1902 long time to run.
1904 * --wigextend has been obsoleted.
1906 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1907 Version 1.4.0rc1 (Starry Sky)
1909 * Duplicate reads option
1911 --keep-dup behavior is changed. Now user can specify how many
1912 reads he/she wants to keep at the same genomic location. 'auto' to
1913 let MACS decide the number based on binomial distribution, 'all'
1914 to let MACS keep all reads.
1916 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1918 By default, MACS will now scale the smaller dataset to the bigger
1919 dataset. For instance, if IP has 10 million reads, and Input has 5
1920 million, MACS will double the lambda value calculated from Input
1921 reads while calling BOTH the positive peaks and negative
1922 peaks. This will address the issue caused by unbalanced numbers of
1923 reads from IP and Input. If --to-small is turned on, MACS will
1924 scale the larger dataset to the smaller one. So from now on, if d
1925 is fixed, then the peaks from a MACS call for A vs B should be
1926 identical to the negative peaks from a B vs A.
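The scaling rule can be illustrated with a short sketch. The function name `scaling_factors` and its arguments are hypothetical; only the behaviour (scale the smaller dataset up by default, scale the larger one down with --to-small) comes from the entry above.

```python
def scaling_factors(n_treat, n_control, to_small=False):
    """Return (treat_scale, control_scale) that equalize sequencing depth.

    Default: scale the smaller dataset up to the bigger one.
    to_small=True: scale the larger dataset down to the smaller one.
    """
    if n_treat <= 0 or n_control <= 0:
        raise ValueError("read counts must be positive")
    target = min(n_treat, n_control) if to_small else max(n_treat, n_control)
    return target / n_treat, target / n_control

# 10 million IP reads vs 5 million Input reads: Input lambda is doubled.
treat_scale, control_scale = scaling_factors(10_000_000, 5_000_000)
```

Because the factors depend only on the two read counts, swapping treatment and control swaps the factors, which is why A vs B positive peaks can match B vs A negative peaks when d is fixed.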
1928 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1929 Version 1.4.0beta (summer wishes)
1935 The default behavior in the model building step is slightly
1936 changed. When MACS can't find enough pairs to build model
1937 (implemented in alpha version) or the modeled fragment length is
1938 less than 2 times of tag length (implemented in beta version),
1939 MACS will use 2 times of --shiftsize value as fragment length in
1940 the later analysis. --off-auto can turn off this default behavior.
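A sketch of the fallback decision described above. The function name and arguments are hypothetical, and raising an error under --off-auto is an assumption; the entry only says the option turns the default fallback off.

```python
def choose_fragment_length(modeled_d, tag_size, shiftsize, off_auto=False):
    """Return the fragment length to use in the later analysis.

    modeled_d is None when too few paired peaks were found to build a model.
    """
    model_ok = modeled_d is not None and modeled_d >= 2 * tag_size
    if model_ok:
        return modeled_d
    if off_auto:
        # Assumption: with --off-auto there is no fallback, so signal failure.
        raise ValueError("model building failed and --off-auto is set")
    return 2 * shiftsize
```

With a 36 bp tag length and --shiftsize 100, a modeled d of 50 would be rejected and 200 used instead.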
1942 ** Redundant tag filtering
1944 The IO module is rewritten. The redundant tag filtering process
1945 becomes simpler and works as promised. The maximum allowed number
1946 of tags at the exact same location is calculated from the
1947 sequencing depth and genome size using a binomial distribution,
1948 for both TREATMENT and CONTROL separately (previously only
1949 TREATMENT was considered). The exact same location means the same
1950 coordinate and the same strand. Then MACS will only keep at most
1951 this number of tags at the exact same location in the following
1952 analysis. An option --keep-dup can let MACS skip the filtering and
1953 keep all the tags. However this may bring in a lot of sequencing
1954 bias, so you may get many false positive peaks.
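One way to derive such a limit is sketched below. It assumes each tag lands on a given position with probability 1/genome_size and picks the smallest count whose cumulative binomial probability reaches 1 - pvalue. The 1e-5 threshold and both function names are assumptions for illustration, not values confirmed by this ChangeLog.

```python
import math

def binomial_cdf_inv(q, n, p):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p).

    Sketch only: suited to small n * p; for large n * p the first term
    underflows to 0 and the loop degenerates.
    """
    pmf = math.exp(n * math.log1p(-p))  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < q and k < n:
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)  # P(X = k + 1) from P(X = k)
        k += 1
        cdf += pmf
    return k

def max_dup_tags(total_tags, genome_size, pvalue=1e-5):
    """Maximum number of tags to keep at one exact location (at least 1)."""
    return max(1, binomial_cdf_inv(1.0 - pvalue, total_tags, 1.0 / genome_size))
```

Running this separately on the treatment and control read counts gives each dataset its own limit, as the entry describes.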
1956 ** Single wiggle mode
1958 First thing to mention, this is not the score track that I
1959 described before. By default, MACS generates wiggle files for
1960 fragment pileup for every chromosome separately. When you use
1961 --single-wig option, MACS will generate a single wiggle file for
1962 all the chromosomes so you will get a wig.gz for TREATMENT and
1963 another wig.gz for CONTROL if available.
1965 ** Sniff -- automatic format detection
1967 Now, by default or "-f AUTO", MACS will decide the input file
1968 format automatically. Technically, it will try to read at most
1969 1000 records for the first 10 non-comment lines. If it succeeds,
1970 the format is decided. I recommend not to use AUTO and specify the
1971 right format for your input files, unless you combine different
1972 formats in a single MACS run.
1976 --single-wig and --keep-dup are added. Check previous section in
1977 ChangeLog for detail.
1979 -f (--format) AUTO is now the default option.
1981 --slocal default: 1000
1982 --llocal default: 10000
1986 Setup script will stop the installation if python version is not
1987 python2.6 or python2.7.
1989 Local lambda calculation has been changed back. MACS will check
1990 peak_region, slocal( default 1K) and llocal (default 10K) for the
1991 local bias. The previous 200bps default caused MACS to miss
1992 some peaks where the input bias is very sharp.
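A rough sketch of how several window sizes could be combined into one local lambda. Taking the maximum of the per-window estimates, each rescaled to the peak length, is an assumption here (the entry only lists which windows are checked), and `local_lambda` with its arguments is a hypothetical name.

```python
def local_lambda(control_counts, peak_len, genome_lambda):
    """Pick the most conservative background estimate for one peak.

    control_counts: dict mapping window width in bp to the number of
    control tags found in a window of that width centered on the peak.
    genome_lambda: expected tags per peak_len from the whole genome.
    """
    candidates = [genome_lambda]
    for width, count in control_counts.items():
        # Rescale the window count to the expected count per peak length.
        candidates.append(count * peak_len / width)
    return max(candidates)

# Peak region (200 bp), slocal (1K) and llocal (10K) windows.
lam = local_lambda({200: 4, 1000: 10, 10000: 50}, peak_len=200, genome_lambda=1.5)
```

Here the peak-sized window gives the largest estimate (4.0). Which window dominates for a given peak depends on how the control signal is distributed around it.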
1994 sam2bed.py script is corrected.
1996 Relative pos in xls output is fixed.
1998 Parser for ELAND_export is fixed to pass some of the no match
1999 lines. And elandexport2bed.py is fixed too. ( however I can't
2000 guarantee that it works on any eland_export files. )
2002 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2003 Version 1.4.0alpha2 (be smarter)
2007 --gsize now provides shortcuts for common genomes, including
2008 human, mouse, C. elegans and fruitfly.
2010 --llocal now will be 5000 bps if there is no input file, so that
2011 local lambda doesn't overkill enriched binding sites.
2013 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2014 Version 1.4alpha (be smarter)
2018 --tsize option is redesigned. MACS will use the first 10 lines of
2019 the input to decide the tag size. If user specifies --tsize, it
2020 will override the auto decided tsize.
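The tag size detection can be sketched for BED input as below. Averaging the record lengths and the function names are assumptions; the entry states only that the first 10 lines are used and that --tsize overrides the result.

```python
import io

def guess_tag_size(handle, max_records=10):
    """Estimate tag size from the first few BED records (integer mean length)."""
    lengths = []
    for line in handle:
        fields = line.split()
        # Skip blank lines and comment-like lines.
        if not fields or fields[0] in ("track", "browser") or line.startswith("#"):
            continue
        lengths.append(int(fields[2]) - int(fields[1]))
        if len(lengths) == max_records:
            break
    if not lengths:
        raise ValueError("no alignment records found")
    return sum(lengths) // len(lengths)

def effective_tsize(handle, user_tsize=None):
    """A user-specified tag size overrides the automatically decided one."""
    return user_tsize if user_tsize is not None else guess_tag_size(handle)

bed = "track name=demo\nchr1\t100\t136\t.\t0\t+\nchr1\t200\t236\t.\t0\t-\n"
auto_size = effective_tsize(io.StringIO(bed))
forced_size = effective_tsize(io.StringIO(bed), user_tsize=25)
```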
2022 --lambdaset is replaced by --slocal and --llocal which mean the
2023 small local region and large local region.
2025 --bw has no effect on the scan-window size now. It only affects the
2026 paired-peaks model process.
2030 During the model building, MACS will pick out the enriched regions
2031 which are not too high and not too low to build the paired-peak
2032 model. By default the region is from fold 10 to fold 30. If MACS
2033 fails to build the model, by default it will use the nomodel
2034 settings, like shiftsize=100bps, to shift and extend each
2035 tag. This behavior can be turned off by '--off-auto'.
2039 An extra file including all the summit positions is saved as the
2040 *_summits.bed file. An option '--call-subpeaks' will invoke
2041 PeakSplitter developed by Mali Salmon to split wide peaks into
2044 * Sniff ( will in beta )
2046 Automatically recognize the input file format, so users can combine
2047 different formats in one MACS run.
2049 Not implemented features/TODO:
2051 * Algorithms ( in near future? )
2053 MACS will try to refine the peak boundaries by calculating the
2054 scores for every point in the candidate peak regions. The score
2055 will be the -10*log10(pvalue) on a local Poisson distribution. A
2056 cutoff specified by users (--pvalue) will be applied to find the
2057 precise sub-peaks in the original candidate peak region. Peak
2058 boundaries and peak summit positions will be saved in separate BED
2061 * Single wiggle track ( in near future? )
2063 A single wiggle track will be generated to save the scores within
2064 candidate peak regions in the 10bps resolution. The wiggle file
2065 is in fixedStep format.
2068 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2069 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2073 Fixed typo. FCSTEP -> FESTEP
2077 The 'femax' attribute bug is fixed
2079 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2080 Version 1.3.7 (Oktoberfest)
2082 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2084 Enhancements by Peter Chines:
2086 1. gzip files are supported.
2087 2. when --diag is on, user can set the increment and endpoint for
2088 fold enrichment analysis by setting --fe-step and --fe-max.
2090 Enhancements by Davide Cittaro:
2092 1. BAM and SAM formats are supported.
2093 2. small changes in the header lines of wiggle output.
2096 1. I added --fe-min option;
2097 2. Bowtie ascii output with suffix ".map" is supported.
2101 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2102 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2103 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2104 4. Some "fold change" have been changed to "fold enrichment".
2106 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2107 Version 1.3.6.1 (default parameter change)
2109 * bin/macs, lib/PeakDetect.py
2111 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2112 default. "--futurefdr" is added which can turn on the 'new' method
2113 introduced in 1.3.6. By default it's off.
2117 Fixed a bug. p-value is corrected a little bit.
2120 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2121 Version 1.3.6 (Birthday cake)
2125 "track name" is added to the header of BED output file.
2127 Now the default peak detection method is to consider 5k and 10k
2128 nearby regions in treatment data and peak location, 1k, 5k, and
2129 10k regions in control data to calculate local bias. The old
2130 method can be called through '--old' option.
2132 Information about how many total/unique tags in treatment or
2133 control will be saved in final .xls output.
2135 * lib/IO/__init__.py
2137 ".fa" will be removed from input tag alignment so only the
2138 chromosome names are kept.
2140 WigTrackI class is added for Wiggle like data structure. (not used
2143 The parser for ELAND multi PET files has been fixed. Now the 5'
2144 tag position for a pair will be kept, whereas in the previous
2145 version, the middle points are kept.
2147 * lib/IO/BinKeeper.py
2149 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2150 browser, which can quickly access certain region for values in a
2151 large wiggle like data file. (not used now)
2153 * lib/OptValidator.py
2159 Now the default peak detection method is to consider 5k and 10k
2160 nearby regions in treatment data and peak location, 1k, 5k, and
2161 10k regions in control data to calculate local bias. The old
2162 method can be called through '--old' option.
2164 Two columns have been added to BED output file. 4th column: peak
2165 name; 5th column: peak score using -10*log10(pvalue) as score.
2169 Add support to build a Mac App through 'setup.py py2app', or a
2170 Windows executable through 'setup.py py2exe'. You need to install
2171 py2app or py2exe package in order to use these functions.
2173 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2174 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2178 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2179 in control data to calculate local lambda for each peak. Peak
2180 calling results will be slightly different from the previous version,
2185 Typo fixed, ELANDParser -> ELANDResultParser
2189 Now, modeled d value will be shown on the model figure.
2191 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2192 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2194 * macs, IO/__init__.py, PeakDetect.py
2196 Add support for ELAND multi format. Add support for Pair-End
2197 experiment, in this case, 5'end and 3'end ELAND multi format files
2198 are required for treatment or control data. See 00README file for
2201 Add wigextend option.
2203 Add petdist option for Pair-End Tag experiment, which is the best
2204 distance between 5' and 3' tags.
2208 Fixed a bug which caused the end position of every peak region
2209 to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
2213 Fix bugs while generating wiggle files. The start position of
2214 wiggle file is set to 1 instead of 0.
2216 Fix a bug that every 10M bps, signals in the first 'd' range are
2217 lower than actual. ( Thanks Mali Salmon!)
2220 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2221 Version 1.3.3 (wiggle bugs fixed)
2225 Fix bugs while generating wiggle files. 1. 'span=' is added to
2226 'variableStep' line; 2. previously, every 10M bps, the coordinates
2227 were wrongly shifted to the right for 'd' basepairs.
2229 * macs, PeakDetect.py
2231 Add an option to save wiggle files on different resolution.
2233 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2234 Version 1.3.2 (tiny bugs fixed)
2238 Fix 65536 -> 65535. ( Thank Joon)
2242 Improved the binomial function for extra large numbers. Imported
2243 from Cistrome project.
2247 If treatment channel misses reads in some chromosome included in
2248 control channel, or vice versa, MACS will not exit. (Thank Shaun
2251 Instead, MACS will fake a tag at position -1 when calling
2252 treatment peaks vs control, but will ignore the chromosome while
2253 calling negative peaks.
2255 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2256 Version 1.3.1 (tiny bugs fixed version)
2260 Hyunjin Gene Shin contributed some code to Prob.py. Now the
2261 binomial functions can tolerate large and small numbers.
2265 Parsers now split lines in BED/ELAND file using any
2266 whitespaces. 'track' or 'browser' lines will be regarded as
2267 comment lines. A bug fixed when throwing StrandFormatError. The
2268 maximum redundant tag number at a single position can be no less
2272 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2273 Version 1.3 (naming clarification version)
2275 * Naming clarification changes according to our manuscript:
2277 'frag_len' is changed to 'd'.
2279 'fold_change' is changed to 'fold_enrichment'.
2281 Suggest '--bw' parameter to be determined by users from the real
2284 Maximum FDR is 100% in the output file.
2286 And other clarifications in 00README file and the documents on the
2290 If the redundant tag number at a single position is over 32767,
2291 just remember 32767, instead of raising an overflow exception.
2297 Bug fixed for diagnosis report.
2300 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2305 Poisson distribution CDF and inverse CDF functions are
2306 corrected. They can produce right results even for huge lambda
2307 now. So that the p-value and FDR values in the final excel sheet
2310 IO package now can tolerate some rare cases; ELANDParser in IO
2311 package is fixed. (Thank Bogdan)
2315 Reverse paired peaks in model are rejected. So there will be no
2316 negative 'frag_len'. (Thank Bogdan)
2320 Diagnosis function is completed. It can output a table file for
2321 users to estimate their sequencing depth.
2324 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2327 * Prob.py is added!
2329 GSL is totally removed from MACS. Instead, I have implemented the
2330 CDF and inverse CDF for poisson and binomial distribution purely
2333 * Constants.py is added!
2335 Organize constants used in MACS in the Constants.py file.
2337 * All other files are modified!
2339 Foldchange calculation is modified. Now the foldchange is only
2340 calculated at the peak summit position instead of the whole peak
2341 region. The values will be higher and more robust than before.
2345 1. MACS can save wiggle format files containing the tag number at
2346 every 10 bp along the genome. Tags are shifted according to our
2347 model before they are calculated.
2349 2. Model building and local lambda calculation can be skipped with
2352 3. A diagnosis report can be generated through '--diag'
2353 option. This report can help you estimate the
2354 sequencing saturation. This function is only in beta stage.
2356 4. FDR calculation speed is highly improved.
2358 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2361 * TabIO, PeakModel.py ...
2362 Bug fixed to let MACS tolerate some cases where there is no tag on
2363 either plus strand or minus strand.
2366 Check the version of python. If the version is lower than 2.4,
2367 refuse to install with warning.