1 2021-02-07 Tao Liu <vladimir.liu@gmail.com>
6 1) Speed/memory optimization. Use the cykhash to replace python
7 dictionary. Use buffer (10MB) to read and parse input file (not
8 available for BAM file parser). And many optimization tweaks.
10 2) Code cleanup. Reorganize source codes.
14 4) R wrappers for MACS -- MACSr
16 5) Switch to Github Action for CI, support multi-arch testing
17 including x64, armv7, aarch64, s390x and ppc64le.
19 6) MACS tag-shifting model has been refined. Now it will use a
20 naive peak calling approach to find ALL possible paired peaks at +
21 and - strand, then use all of them to calculate the
22 cross-correlation. (a related bug has been fixed, #442)
24 7) Call variants in peak regions directly from BAM files. The
25 function was originally developed under code name SAPPER. Now
26 SAPPER has been merged into MACS. Also, `simde` has been added as
27 a submodule in order to support fermi-lite library under non-x64
30 2020-04-11 Tao Liu <vladimir.liu@gmail.com>
35 Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
38 2020-04-10 Tao Liu <vladimir.liu@gmail.com>
43 1) MACS2 has been tested on multiple architectures to make sure it
44 can successfully generate consistent results. Currently the
45 supported architectures are: AMD64, ARM64, i386, PPC64LE, and
46 S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
47 #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
48 and #370. The lesson is that if the project is built on Cython and
49 is aimed at memory efficiency, we should specifically define all
50 int/float types in pyx files such as int8_t or uint32_t using
51 either libc or numpy (c version) instead of relying on Cython
52 types such as short, long, double.
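The type-width lesson in item 1 can be illustrated from plain Python with the standard `ctypes` module. This is only a sketch of the portability issue, not MACS2 code; the names `FIXED_WIDTH`, `PLATFORM_DEPENDENT` and `type_sizes` are made up for the example.

```python
import ctypes

# Fixed-width types have the same size on every platform.
FIXED_WIDTH = {
    "int8_t": ctypes.c_int8,
    "uint32_t": ctypes.c_uint32,
    "int64_t": ctypes.c_int64,
}

# Plain C types may differ between platforms and compilers
# (for example, `long` is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux).
PLATFORM_DEPENDENT = {
    "short": ctypes.c_short,
    "long": ctypes.c_long,
    "double": ctypes.c_double,
}


def type_sizes(types):
    """Return a mapping from C type name to its size in bytes on this machine."""
    return {name: ctypes.sizeof(ctype) for name, ctype in types.items()}


if __name__ == "__main__":
    print("fixed width:", type_sizes(FIXED_WIDTH))
    print("platform dependent:", type_sizes(PLATFORM_DEPENDENT))
```

Running this on two of the architectures listed above should give identical numbers for the first dictionary, while the second one may differ.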
54 2) MACS2 setup script will check numpy and install numpy if
55 necessary. PR #378, issue #364
57 3) `bdgbroadcall` command will correctly add the score column (5th
58 column). The score (5th) column contains 10 times the average
59 score in the broad region. PR #373, issue #362
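The scoring rule in item 3 can be sketched as below. This is an illustration only: the function name `broad_region_score`, the length-weighted averaging and the truncation to an integer are assumptions, not taken from the MACS2 source.

```python
def broad_region_score(scores_and_lengths):
    """Return 10 times the average score of a broad region.

    `scores_and_lengths` is a list of (score, length_in_bp) pairs, one per
    bedGraph interval inside the region. The average is weighted by interval
    length (an assumption of this sketch).
    """
    total_length = sum(length for _, length in scores_and_lengths)
    if total_length == 0:
        raise ValueError("region has zero length")
    weighted_sum = sum(score * length for score, length in scores_and_lengths)
    average = weighted_sum / total_length
    # Truncating to an integer is an assumption made for a BED-style score column.
    return int(10 * average)
```

For a region made of two 100 bp intervals with scores 2.0 and 4.0, the average is 3.0 and the 5th column would hold 30.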
61 4) The missing test on `bdgopt` subcommand has been added. PR #363
63 5) The obsolete option `--ratio` from `callpeak` subcommand has
64 been removed. PR #369, issue #366
66 6) Fixed the incorrect description in README on the 'maximum
67 length of broad region is 4 times of d' to 'maximum gap for
68 merging broad regions is 4 times of tag size by default'. PR #380,
73 1) CODE OF CONDUCT document has been added to MACS2 github
76 2019-12-12 Tao Liu <vladimir.liu@gmail.com>
81 1) Speed up MACS2. Some programming tricks and code cleanup. The
82 filter_dup function replaces separate_dups. The latter one was
83 implemented for potentially putting back duplicate reads in
84 certain downstream analysis. However such analysis hasn't been
85 implemented. Optimize the speed of writing bedGraph
86 files. Optimize BAM and BAMPE parsing with pointer casting instead
89 2) The comment lines in the headers of BED or SAM files will be
90 correctly skipped. However, MACS2 won't check comment lines in the
95 1) Cutoff-analysis in callpeak command. #341
97 2) Issues related to SAMParser and three ELAND Parsers are
102 1) cmdlinetest script in test/ folder has been updated to: 1. test
103 cutoff-analysis with callpeak cmd; 2. output the 2 lines before
104 and after the error or warning message during tests; 3. output
105 only the first 10 lines if the difference between test result and
106 standard result can be found; 4. prockreport monitors CPU time and
107 memory usage in 1 sec interval -- a bit more accurate.
109 2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
111 2019-10-31 Tao Liu <vladimir.liu@gmail.com>
112 MACS version 2.2.5 (Py3 speed up)
116 1) *Github code only and Not included in MACS2 release* New
117 testing data for performance test. A subsampled ENCODE2 CTCF
118 ChIP-seq dataset, including 5 million ChIP reads and 5 million
119 control reads, has been included in the test folder for testing
120 CPU and memory usage (i.e. 5M test). Several related scripts,
121 including `prockreport` to output CPU and memory usage, `pyprofile`
122 and `pyprofile_stat` for debugging and profiling MACS2 codes, have
125 2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
126 The old hashtable.pyx implementation copied from Pandas (very old
127 version) doesn't work well in Python3+Cython. It slows down the
128 pqtable checkup using the identical Cython codes as in
129 v2.1.4. While running 5M test, the `__getitem__` function in the
130 hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
131 148.6s with the same number of calls in MACS2 v2.2.4. As a
132 consequence, the standard python dictionary implementation has
133 replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
134 faster than py2 version, but uses a bit more memory. In general,
135 v2.2.5 can finish the 5M reads test in 20% less time than MACS2
136 v2.1.4, but uses 15% more memory.
140 1) More Python3 related fixes, e.g. the return value of keys from
144 2019-10-01 Tao Liu <vladimir.liu@gmail.com>
145 MACS version 2.2.4 (Python3)
149 1) First Python3 version MACS2 released.
151 2) Version number 2.2.X will be used for MACS2 in Python3, in
154 3) More comprehensive test.sh script to check the consistency of
155 results from Python2 version and Python3 version.
157 4) Simplify setup.py script since the newest version transparently
158 supports cython. And when cython is not installed by the user,
159 setup.py can still compile using only C codes.
161 5) Fix Signal.pyx to use np.array instead of np.mat.
163 2019-09-30 Tao Liu <vladimir.liu@gmail.com>
168 Github Actions is used together with Travis CI for testing and
175 1) #318 Random score in bdgdiff output. It turns out the sum_v is
176 not initialized as 0 before adding. Potential bugs are fixed in
177 other functions in ScoreTrack and CallPeakUnit codes.
179 2) #321 Cython dependency in setup.py script is removed, and the
180 'cythonize' call is placed in the correct position.
182 3) A typo is fixed in Github Actions script.
184 2019-09-19 Tao Liu <vladimir.liu@gmail.com>
189 1) Support Docker auto-deploy. PR #309
191 2) Support Travis CI auto-testing, update unit-testing
192 scripts, and enable subcommand testing on small datasets.
194 3) Update README documents. #297 PR #306
196 4) `cmbreps` supports more than 2 replicates. Merged from PR #304
197 @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
199 5) `--d-min` option is added in `callpeak` and `predictd`, to
200 exclude predictions of fragment size smaller than the given
201 value. Merged from PR #267 @shouldsee.
203 6) `--buffer-size` option is added in `predictd`, `filterdup`,
204 `pileup` and `refinepeak` subcommands. Users can use this option
205 to decrease memory usage while there are a large number of contigs
206 in the data. Also, now `callpeak`, `predictd`, `filterdup`,
207 `pileup` and `refinepeak` will suggest that users tweak
208 `--buffer-size` when catching a MemoryError. #313 PR #314
212 1) #265 Fixed a bug where the pseudocount hasn't been applied
213 while calculating p-value score in ScoreTrack object.
215 2) Fixed bdgbroadcall so that it will report those broad peaks
216 without strong peak inside, a consistent behavior as `callpeak
219 3) Rename COPYING to LICENSE.
221 2018-10-17 Tao Liu <vladimir.liu@gmail.com>
226 1) Added missing BEDPE support. And enable the support for BAMPE
227 and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
228 subcommands. When format is BAMPE or BEDPE, the 'pileup' command
229 will pile up the whole fragment defined by mapping locations of
230 the left end and right end of each read pair. Thank @purcaro
232 2) Added options to callpeak command for tweaking max-gap and
233 min-len during peak calling. Thank @jsh58!
235 3) The callpeak option "--to-large" is replaced with
238 4) The randsample option "-t" has been replaced with "-i".
242 1) Fixed memory issue related to #122 and #146
244 2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
246 3) Fixed a bug while setting commandline qvalue cutoff.
248 4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
250 5) Fixed the calculation of average fragment length for paired-end
253 6) Fixed bugs caused by khash while computing p/q-value and log
254 likelihood ratios. Thank @jsh58
256 7) More spelling tweaks in source code. Thank @mr-c
258 2016-03-09 Tao Liu <vladimir.liu@gmail.com>
259 MACS version 2.1.1 20160309
263 * Fixed spelling. Merged pull request #120. Thank @mr-c!
265 * Change filtering criteria for reading BAM/SAM files
267 Related to callpeak and filterdup commands. Now the
268 reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
269 still be read although MACS2 may decide they are duplicates
270 later. Related to old issue #33. Sorry I forgot to address it for
273 2016-02-26 Tao Liu <vladimir.liu@gmail.com>
274 MACS version 2.1.1 20160226 (tag:rc Zhengyue)
278 1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
279 the former option is not supported by older GCC. Related to issues
282 2) Issue #108 is fixed. If no peak can be found in a chromosome,
283 the PeakIO won't throw an error.
289 a) A more flexible format, BEDPE, is supported. Now users can
290 define the left and right position of the ChIPed fragment, and
291 MACS2 will skip model building and directly pileup the
292 fragments. Related to issue #112.
294 b) The 'tempdir' can be specified, to save cached pileup
295 tracks. Originally, the temporary files were stored in
296 /tmp. Thank @daler! Related to issues #97 and #105.
300 New operations are added, to calculate the maximum or minimum value between
301 values in BEDGRAPH and given value.
305 New method is added, to calculate the maximum value between values
306 defined in two BEDGRAPH files.
308 2015-12-22 Tao Liu <vladimir.liu@gmail.com>
309 MACS version 2.1.0 20151222 (tag:rc Dongzhi)
313 1) Fix a bug while dealing with some chromosomes only containing
314 one read (pair). The size of dup_plus/dup_minus arrays after
315 filtering dups should be increased by 1.
317 2) Fix a bug related to the broad peak calling function in
318 previous versions. The gaps were miscalculated, so segmented weak
319 broad calls may be reported, and sometimes you would see peaks
320 with lower than cutoff values in the output files.
322 3) "Potentially" Fixed issue #105 on temporary cache files, need
326 2015-07-31 Tao Liu <vladimir.liu@gmail.com>
327 MACS version 2.1.0 20150731 (tag:rc)
331 1) Fixed issue #76: information about broad/narrow cutoff will be
334 2) Fixed issue #79: bdgopt extparam option is fixed.
336 3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
337 for filterdup command.
339 4) Fixed issue #78, #88 and similar issue reported in MACS google
340 group: MACS2 now can correctly deal with multiple alignment files
341 for -t or -c. The 'finalize' function will be correctly
342 called. Multiple files option is enabled for filterdup,
343 randsample, predictd, pileup and refinepeak commands.
345 5) A related issue to #88, when BAMPE mode is used, PE pairs will
346 be sorted by leftmost then rightmost ends.
348 6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
349 array. This will cause 'callpeak --nolambda' to hang forever while
350 calculating pvalues and qvalues.
352 2015-04-20 Tao Liu <vladimir.liu@gmail.com>
353 MACS version 2.1.0 20150420 (tag:rc)
357 1) bdgopt: some convenient functions to modify bedGraph files.
359 2) cmbreps: Combine scores from two replicates. Including three
360 methods: 1. take the maximum; 2. take the average; 3. use Fisher's
361 method to combine two p-value scores. After that, user can use
362 bdgpeakcall to call peaks on combined scores.
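The three combining methods listed for cmbreps can be sketched as below. This is a minimal illustration, not the cmbreps implementation: the function names are hypothetical, and it assumes both replicate scores are -log10(p-value) numbers. For two p-values, Fisher's statistic follows a chi-square distribution with 4 degrees of freedom, whose upper tail has the closed form exp(-x/2) * (1 + x/2), so no external library is needed.

```python
import math

LN10 = math.log(10)


def combine_max(score1, score2):
    """Method 1: take the maximum of the two scores."""
    return max(score1, score2)


def combine_mean(score1, score2):
    """Method 2: take the average of the two scores."""
    return (score1 + score2) / 2.0


def combine_fisher(mlog10_p1, mlog10_p2):
    """Method 3: Fisher's method on two -log10(p) scores; returns -log10(p)."""
    # Fisher's statistic X = -2 * (ln p1 + ln p2), and ln p = -score * ln(10).
    x = 2.0 * LN10 * (mlog10_p1 + mlog10_p2)
    half = x / 2.0
    # Chi-square (4 d.o.f.) upper tail: exp(-half) * (1 + half), kept in log space.
    ln_p = -half + math.log1p(half)
    return -ln_p / LN10
```

With two replicates at p = 0.05 each, `combine_fisher` gives a combined p-value of about 0.0175, which is less significant than the plain product 0.0025.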
366 1) callpeak and bdgpeakcall now can try to analyze the
367 relationship between p-values and number/length of peaks then
368 generate a summary to help users decide an appropriate cutoff.
370 2) callpeak now can accept fold-enrichment cutoff as a filter for
375 Now MACS2 runs about 3X as fast as previous version. Trade
376 clean python codes for speed... Now while processing 50M ChIP vs
377 50M control, it will take only 10 minutes.
381 1) Sampling function in BAMPE mode.
383 2) Callpeak while there are >= 2 input files for -t or -c.
385 3) While reading BAM/SAM, those secondary or supplementary
386 alignments will be correctly skipped.
388 4) Fixed issue #33: Explanation is added to callpeak --keep-dup
389 option that MACS2 will discard those SAM/BAM alignments with bit
390 1024 no matter how --keep-dup is set.
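The flag handling described in items 3 and 4 can be sketched with plain bit tests. The bit values come from the SAM format specification; the function `should_skip` and its `skip_duplicates` parameter are hypothetical and only mirror the behaviour described in this entry.

```python
# Bit values defined by the SAM format specification.
FLAG_SECONDARY = 0x100      # 256: secondary alignment
FLAG_DUPLICATE = 0x400      # 1024: PCR or optical duplicate
FLAG_SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def should_skip(flag, skip_duplicates=True):
    """Return True if an alignment with this FLAG would be skipped while reading."""
    # Secondary and supplementary alignments are always skipped.
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return True
    # Alignments marked as duplicates (bit 1024) are discarded as well.
    if skip_duplicates and (flag & FLAG_DUPLICATE):
        return True
    return False
```

For example, a flag of 1040 (1024 + 16, a reverse-strand read marked as duplicate) is skipped, while a flag of 16 is kept.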
392 5) Fixed issue #49: setuptools is used instead of distutils
394 6) Fixed issue #51: fix the problem when using --trackline
395 argument when control file is absent.
397 7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of
398 read mapped to minus strand. Previous implementation will find
399 incorrect 5' end if there is an indel in the alignment.
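Why the CIGAR string matters for item 7 can be shown with a short sketch. It is not the MACS2 parser: the function names are hypothetical, and the coordinates are assumed to be 0-based with the minus-strand 5' end reported as the half-open right end of the alignment. Only the CIGAR operations M, D, N, = and X advance along the reference, so the read length alone gives the wrong end when the alignment contains an insertion or deletion.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification.
_REF_CONSUMING = frozenset("MDN=X")


def reference_length(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING)


def five_prime_end(pos, cigar, is_reverse):
    """Return the 5' end of a read given its leftmost mapped position `pos`."""
    if is_reverse:
        # Minus strand: the 5' end is on the right side of the alignment.
        return pos + reference_length(cigar)
    return pos
```

For a 37 bp minus-strand read at position 100 with CIGAR `10M2I5M3D20M`, the alignment covers 38 reference bases, so the 5' end is 138 rather than 100 + 37 = 137.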
401 8) Fixed issue #56: An incorrect sorting method used for BAMPE
402 mode which will cause incorrect filtering of duplicated reads. Now
405 9) Issue #63: Merged from jayhesselberth@github, extsize now can
408 10) Issue #71: Merged from aertslab@github, close file descriptor
409 after creating them with mkstemp().
411 2014-06-16 Tao Liu <vladimir.liu@gmail.com>
412 MACS version 2.1.0 20140616 (tag:rc)
416 "--ratio" is added to manually assign the scaling factor of ChIP
417 vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
418 implementing the patch file!
420 "--shift" is added to move cutting ends (5' end of reads) around,
421 in order to process DNase-Seq data, e.g., use "--shift -100
422 --extsize 200" to get 200bps fragments around 5' ends. For general
423 ChIP-Seq data analysis, this option should be always set as
424 0. Thank Xi Chen and Anshul Kundaje for the discussions in user
427 ** Do not output negative fragment size from cross-correlation
428 analysis. Thank Alvin Qin for the feedback!
430 ** --half-ext and --control-shift are removed. For complex read
431 shifting and extending, combine '--shift' and '--extsize'
432 options. For comparing two conditions, use 'bdgdiff' module
435 ** a bug is fixed to output the last pileup value in bdg file
440 A 'dry-run' option is added to only output numbers, including the
441 number of allowed duplicates, the total number of reads before and
442 after filtering duplicates and the estimated duplication
443 rate. Thank John Urban for the suggestion!
446 2013-12-16 Tao Liu <vladimir.liu@gmail.com>
447 MACS version 2.0.10 20131216 (tag:alpha)
451 * We changed license from Artistic License to 3-clause BSD license.
453 Yes. The simpler the better.
455 * Process paired-end data with "-f BAMPE" without control
457 * GappedPeak output for --broad option has been fixed again to be
458 consistent with official UCSC format. We add 1bp pseudo-block to
459 left and/or right of broad region when necessary, so that you can
460 visualize the regions without strong enrichment inside
461 successfully. In downstream analysis except for visualization,
462 you may need to remove all 1bp blocks from gappedPeak file.
464 * diffpeak subcommand is temporarily disabled. Till we
467 2013-10-28 Tao Liu <vladimir.liu@gmail.com>
468 MACS version 2.0.10 20131028 (tag:alpha)
470 * callpeak --call-summits improvement
472 The smoothing window length has been fixed as fragment length
473 instead of short read length. The larger smoothing window will
474 grant better smoothing results and better sub-peak summits
477 * --outdir and --ofile options for almost all commands
479 Thank Björn Grüning for initially implementing these options!
480 Now, MACS2 will save results into a specified
481 directory by '--outdir' option, and/or save result into a
482 specified file by '--ofile' option. Note, in case '--ofile' is
483 available for a subcommand, '-o' now has been adjusted to be the
484 same as '--ofile' instead of '--o-prefix'.
486 Here is the list of changes. For more detail, use 'macs2 xxx -h'
489 ** callpeak: --outdir
490 ** diffpeak: Not implemented
491 ** bdgpeakcall: --outdir and --ofile
492 ** bdgbroadcall: --outdir and --ofile
493 ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
494 and the order of arguments for --ofile must be the same as for -m.
495 ** bdgdiff: --outdir and --ofile
496 ** filterdup: --outdir
498 ** randsample: --outdir
499 ** refinepeak: --outdir and --ofile
502 2013-09-15 Tao Liu <vladimir.liu@gmail.com>
503 MACS version 2.0.10 20130915 (tag:alpha)
505 * callpeak Added a new option --buffer-size
507 This option is to tweak a previously hidden parameter that
508 controls the steps to increase array size for storing alignment
509 information. While in some rare cases, the number of
510 chromosomes/contigs/scaffolds is huge, the original default
511 setting will cause a huge memory waste. In these cases, we
512 recommend decreasing --buffer-size (e.g., 1000) to save memory,
513 although the decrease will slow down the reading of alignment files.
515 * an optimization to speed up pvalue-qvalue statistics
517 Previously, it took an hour to prepare p-q-table for 65M vs 65M
518 human TF library, and now it will take 10 minutes. It was due to a
519 single line of code to get a value from a numpy array ...
523 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
524 MACS version 2.0.10 20130731 (tag:alpha)
526 * callpeak --call-summits
528 Fix bugs causing callpeak --call-summits option generating extra
529 number of peaks and inconsistent peak boundaries compared to
530 default option. Thank Ben Levinson!
534 Fix bugs causing bdgcmp output logLR all in positive values. Now
535 'depletion' can be correctly represented as negative values.
539 Fix the behavior of bdgdiff module. Now it can take four
540 bedGraph files, then use logLR as cutoff to call differential
541 regions. Check command line of bdgdiff for detail.
543 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
544 MACS version 2.0.10 20130713 (tag:alpha)
546 * fix bugs while output broadPeak and gappedPeak.
548 Note. Those weak broad regions without any strong enrichment
549 regions inside won't be saved in gappedPeak file.
551 * bdgcmp -T and -C are merged into -S and description is updated.
553 Now, you can use it to override SPMR values in your input for
554 bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
555 statistics will cause weird results (in most cases, lower
556 significance), and won't be consistent with MACS2 callpeak
557 behavior. So if you have SPMR bedGraphs, input the smaller/larger
558 sample size in MILLION according to 'callpeak --to-large' option.
560 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
561 MACS version 2.0.10 20130710 (tag:alpha)
563 * fix BED style output format of callpeak module:
565 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
566 the output. Old BED format file won't be saved.
568 2) with --broad: broadPeak (BED6+3) for broad region and
569 gappedPeak (BED12+3) for chained enriched regions will be the
570 output. Old BED format, narrowPeak format, summit file won't be
573 * bdgcmp now can accept list of methods to calculate scores. So
574 you can run it once to generate multiple types of scores. Thank
575 Jon Urban for this suggestion!
577 * C codes are re-generated through Cython 0.19.1.
579 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
580 MACS version 2.0.10 20130520 (tag:alpha)
582 * broad peak calling modules are modified in order to report all
583 relaxed regions even if there is no strong enrichment inside.
585 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
586 MACS version 2.0.10 20130501 (tag:alpha)
588 * Memory usage is decreased to about 1/4-1/5 of previous usage
589 Now, the internal data structure and algorithm are both
590 re-organized, so that intermediate data wouldn't be saved in
591 memory. Instead they will be calculated on the fly. New MACS2 will
592 take longer (1.5 to 2 times), however it will use less memory
593 so it can be more usable on small-memory servers.
595 * --seed option is added to callpeak and randsample commands
596 Thank Mathieu Gineste for this suggestion!
598 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
599 MACS version 2.0.10 20130306 (tag:alpha)
601 * diffpeak module New module to detect differential binding sites
602 with more statistics.
604 * Introduced --refine-peaks
605 Calculates reads balancing to refine peak summits
607 * Output file names prefix
608 Correct encodePeak to narrowPeak, broadPeak to bed12.
610 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
611 MACS version 2.0.10 (tag:alpha not released)
613 * Introduced BAMPEParser
614 Reads PE data directly, requires bedtools for now
616 * Introduced --call-summits
617 Uses signal processing methods to call overlapping peaks
619 * Added --no-trackline
620 By default, files have descriptive tracklines now
622 * new refinepeak command (experimental)
623 This new function will use a similar method in SPP (wtd), to
624 analyze raw tag distribution in peak region, then redefine the
625 peak summit where plus and minus tags are evenly distributed
628 * Changes to output *
629 cPeakDetect.pyx has full support for new print/write methods and
630 --call-peaks, BAMPEParser, and use of paired-end data
632 * Parser optimization
634 cParser.pyx is rewritten to use io.BufferedReader to speed
635 up. Speed is doubled.
637 Code is reorganized -- most of functions are inherited from
640 * Use cross-correlation to calculate fragment size
642 First, all pairs will be used in prediction for fragment
643 size. Previously, only no more than 1000 pairs are used. Second,
644 cross-correlation is used to find the best phase difference
645 between + and - tag pileups.
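The idea of finding the best phase difference by cross-correlation can be sketched as follows. This is a deliberately simplified illustration, not the MACS2 model: it correlates raw counts of 5' end positions instead of pileups, uses an unnormalized cross-product, and the function name `best_shift` and its parameters are hypothetical.

```python
from collections import Counter


def best_shift(plus_positions, minus_positions, max_shift):
    """Return the shift d (0..max_shift) that best aligns + tags onto - tags.

    For each candidate d, the score is the sum over positions of
    plus_count[pos] * minus_count[pos + d]; the d with the highest score wins
    (the smallest d is kept on ties).
    """
    plus = Counter(plus_positions)
    minus = Counter(minus_positions)
    best_d, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(n * minus.get(pos + d, 0) for pos, n in plus.items())
        if score > best_score:
            best_d, best_score = d, score
    return best_d


if __name__ == "__main__":
    plus_tags = [100, 200, 300, 300]
    minus_tags = [250, 350, 450, 450]
    print(best_shift(plus_tags, minus_tags, max_shift=300))
```

In this toy input every minus tag sits 150 bp downstream of a plus tag, so the estimated phase difference is 150.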
647 * Speed up p-value and q-value calculation
649 This part is ten times faster now. I am using a dictionary to
650 cache p-value results from Poisson CDF function. A bit more memory
651 will be used to increase speed. I hope this dictionary would not
652 explode since the possible pairs of ChIP signal and control lambda
653 are hugely redundant. Also, I rewrote part of q-value
656 * Speed up peak detection
658 This part is about a hundred times faster now. Optimizations
659 include using Numpy functions as much as possible, and making loop
660 body as small as possible.
662 * Post-processing on differential calls
664 After macs2diff finds differential binding sites between two
665 conditions, it will try to annotate the peak calls from one of two
666 conditions, describe the changes ...
668 * Fragment size prediction in macs2diff
670 Now by default, macs2diff will try to use the average fragment
671 size from both condition 1 and condition 2 for tag extension and
672 peak calling. Previously, by default, it will use different sizes
673 unless --nomodel is specified.
675 Technically, I separate model building processes out. So macs2diff
676 will build fragment sizes for condition 1 and 2 in parallel (2
677 processes maximum), then perform 4-way comparisons in parallel (4
682 Combine two p/qscore tracks together. At regions where condition 1
683 is higher than condition 2, score would be positive, otherwise,
686 * SAMParser and BAMParser
688 Bug fixed for paired-end sequencing data.
692 Fixed a bug while calling peaks from BedGraph file. It previously
693 mistakenly output same peaks multiple times at the end of
696 2011-11-2 Tao Liu <taoliu@jimmy.harvard.edu>
697 MACS version 2.0.9 (tag:alpha)
699 * Auto fixation on predicted d is turned off by default!
701 Previous --off-auto is now default. MACS will not automatically
702 fix d less than 2 times of tag size according to
703 --shiftsize. While tag size is getting longer nowadays, it would
704 be easier to have d less than 2 times of tag size, however d may
705 still be meaningful and useful. Please judge it using your own
710 Now, the default scaling while treatment and input are unbalanced
711 has been adjusted. By default, larger sample will be scaled down
712 linearly to match the smaller sample. In this way, background
713 noise will be reduced more than real signals, so we expect to have
714 more specific results than the other way around (i.e. --to-large
717 Also, an alternative option to randomly sample larger data
718 (--down-sample) is provided to replace default linear
719 scaling. However, this option will make results irreproducible,
724 A new script 'randsample' is added, which can randomly sample
725 certain percentage or number of tags.
729 Now, MACS will decide peak summits according to pileup height
730 instead of qvalue scores. In this way, the summit may be more
735 MACS calculates qvalue scores as differential scores. When comparing
736 two conditions (say A and B), the maximum qscore for comparing
737 A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
738 will be computed. If maxqscore_a2b is bigger, the diff score is
739 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
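The sign rule for the differential score can be written out directly. The function name `diff_score` is hypothetical; the variable names follow the description above. A tie falls into the "otherwise" branch, as the wording implies.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Return the signed differential score for a region.

    Positive when A over B gives the larger maximum qscore,
    negative (minus the B over A score) otherwise.
    """
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```

For example, `diff_score(5.0, 2.0)` returns 5.0 and `diff_score(2.0, 5.0)` returns -5.0.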
741 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
742 MACS version 2.0.8 (tag:alpha)
744 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
746 New script bdgbroadcall and the extra option '--broad' for macs2
747 script, can be used to call broad regions with a loose cutoff to
748 link nearby significant regions. The output is represented as
751 * MACS2/IO/cScoreTrack.pyx
753 Fix q-value calculation to generate forcefully monotonic values.
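What "forcefully monotonic" means can be shown with a generic Benjamini-Hochberg sketch. It is not the cScoreTrack.pyx code, and the choice of plain BH here is an assumption: raw adjusted values p * n / rank can decrease as p grows, so a running minimum taken from the largest p-value downwards forces the q-values to be non-decreasing in p.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values, in input order, forced to be monotonic."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value (rank n) down to the smallest (rank 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```

With p-values 0.01, 0.03, 0.04 and 0.5, the raw adjusted value for 0.03 (0.06) is larger than the one for 0.04 (about 0.053); after the running minimum both get the same q-value.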
755 * bin/eland*2bed, bin/sam2bed and bin/filterdup
757 They are combined to one more powerful script called
758 "filterdup". The script filterdup can filter duplicated reads
759 according to sequencing depth and genome size. The script can also
760 convert any format supported by MACS to BED format.
762 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
763 MACS version 2.0.7 (tag:alpha)
765 * bin/macsdiff renamed to bin/bdgdiff
767 Now this script will work as a low-level finetuning tool as bdgcmp
772 A new script to take treatment and control files from two
773 conditions, calculate fragment size, use local poisson to get
774 pvalues and BH process to get qvalues, then combine 4-ways result
775 to call differential sites.
777 This script can use up to 4 CPUs to speed up 4-ways calculation. (I
778 am trying multiprocessing in Python.)
780 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
781 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
782 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
784 All above files are modified for the new macs2diff script.
786 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
788 Now q-value 0.01 is the default cutoff. If -p is specified,
789 p-value cutoff will be used instead.
791 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
792 MACS version 2.0.6 (tag:alpha)
796 A script to call differential regions. A naive way is introduced
797 to find the regions where:
799 1. signal from condition 1 is larger than input 1 and condition 2 --
800 unique region in condition 1;
801 2. signal from condition 2 is larger than input 2 and condition 1
802 -- unique region in condition 2;
803 3. signal from condition 1 is larger than input 1, signal from
804 condition 2 is larger than input 2, however either signal from
805 condition 1 or 2 is not larger than the other.
807 Here 'larger' means the pvalue or qvalue from a Poisson test is
808 under certain cutoff.
810 (I will make another script to wrap up multiple scripts for
811 differential calling)
813 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
814 MACS version 2.0.5 (tag:alpha)
816 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
819 Use hash to store peak information. Add back the feature to deal
820 with data without control.
822 Fix bug which incorrectly allows small peaks at the end of
825 * bin/bdgpeakcall, bin/bdgcmp
827 Fix bugs. bdgpeakcall can output encodePeak format.
829 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
830 MACS version 2.0.4 (tag:alpha)
834 Fix a bug, correctly assign lambda_bg while --to-small is
835 set. Thanks Junya Seo!
837 Add rank and num of bp columns to pvalue-qvalue table.
841 Fix bugs to correctly deal with peakless chromosomes. Thanks
844 Use AFDR for independent tests instead.
848 Now MACS can output peak coordinates together with pvalue, qvalue,
849 summit positions in a single encodePeak format (designed for
850 ENCODE project) file. This file can be loaded to UCSC
851 browser. Definition of some specific columns are: 5th:
852 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
853 -log10qvalue, 10th: relative summit position to peak start.
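The column definitions above can be turned into a small formatter. This is a sketch, not the MACS writer: the function name, the use of "." for the 6th (strand) column, the layout of columns 1 to 4 (standard BED fields) and the number formatting are assumptions; only columns 5 and 7 to 10 follow the definitions given in this entry.

```python
def encode_peak_line(chrom, start, end, name, fold_change, mlog10_p, mlog10_q, summit):
    """Format one peak as a tab-separated, 10-column encodePeak-style line.

    `summit` is the absolute summit position; column 10 stores it relative
    to the peak start.
    """
    fields = [
        chrom,                  # 1st: chromosome (assumed BED layout)
        start,                  # 2nd: peak start
        end,                    # 3rd: peak end
        name,                   # 4th: peak name
        int(mlog10_p * 10),     # 5th: int(-log10pvalue*10)
        ".",                    # 6th: strand, unused here (assumption)
        f"{fold_change:.5f}",   # 7th: fold-change
        f"{mlog10_p:.5f}",      # 8th: -log10pvalue
        f"{mlog10_q:.5f}",      # 9th: -log10qvalue
        summit - start,         # 10th: relative summit position to peak start
    ]
    return "\t".join(str(field) for field in fields)
```

A peak at chr1:1000-1500 with -log10pvalue 12.34 and summit at 1200 gets 123 in the 5th column and 200 in the 10th.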
856 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
857 MACS version 2.0.3 (tag:alpha)
859 * Rich output with qvalue, fold enrichment, and pileup height
861 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
864 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
866 Now we have a similar xls output file as before. The differences
867 from previous file are:
869 1. Summit now is absolute summit, instead of relative summit
871 2. 'Pileup' is previous 'tag' column. It's the extended fragment
872 pileup at the peak summit;
873 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
874 5.00 means 1e-5, simple and less confusing.
875 4. FDR column becomes '-log10(qvalue)' column.
876 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
877 the values at the peak summit.
881 NAME_pqtable.txt contains pvalue and qvalue relationships.
883 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
884 and -log10qvalue scores in BedGraph format. Nearby regions with
885 the same value are not merged.
887 * Separation of FeatIO.py
889 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
890 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
891 implemented to store pileup, local lambda, pvalue, and qvalue
892 altogether in cScoreTrack.pyx.
894 * Experimental option --half-ext
896 Suggested by NPS algorithm, I added an experimental option
897 --half-ext to let MACS only extend ChIP fragment around its
898 middle point for only 1/2 d.
900 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
901 MACS version 2.0.2 (tag:alpha)
905 Add an error check to see if there is no common chromosome names
906 from treatment file and control file
908 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
910 Reduce memory usage by removing deepcopy() calls.
912 * Modify README documents and others.
914 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
915 MACS Version 2.0.1 (tag:alpha)
917 * cPileup.pyx, cPeakDetect.pyx and peak calling process
919 Jie suggested to me a brilliantly simple method to pileup fragments
920 into bedGraph track. It works much faster than the previous
921 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
922 large local lambda calculation in MACSv2 now. Now I generate three
923 bedGraphs for d-size local bias, slocal-size and llocal-size local
924 bias, and calculate the maximum local bias as local lambda
927 Minor: add_loc in bedGraphTrackI now can correctly merge the
928 region with its preceding region if their values are the same.
932 Add an option to shift control tags before extension. By default,
933 control tags will be extended to both sides regardless of strand
936 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
937 MACS Version 2.0.0 (tag:alpha)
939 * Use bedGraph type to store data internally and externally.
941 We can have theoretically one-basepair resolution profiles. 10
942 times smaller in filesize and even smaller after converting to
943 bigWig for visualization.
945 * Peak calling process modified. Better peak boundary detection.
947 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
948 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
949 one will be averaged to d size) Then calculate the maximum value
950 of these two tracks and a global background, to have a
951 local-lambda bedGraph.
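The local-lambda rule in the paragraph above can be sketched for a single position. The function and parameter names are hypothetical, and reading "averaged to d size" as scaling the 1,000bp pileup by d / 1000 is this sketch's interpretation.

```python
def local_lambda(ctrl_pileup_d, ctrl_pileup_1k, d, global_background, large_window=1000):
    """Return the local lambda at one position.

    ctrl_pileup_d:     control pileup with tags extended to d
    ctrl_pileup_1k:    control pileup with tags extended to `large_window` bp
    global_background: genome-wide background lambda
    """
    # Bring the large-window pileup to the scale of a d-sized window.
    scaled_1k = ctrl_pileup_1k * d / large_window
    # The local lambda is the maximum of the two local tracks and the background.
    return max(ctrl_pileup_d, scaled_1k, global_background)
```

With d = 200, a d-size pileup of 3.0, a 1,000bp pileup of 20.0 (scaled to 4.0) and a background of 1.5, the local lambda is 4.0.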
953 Use -10log10poisson_pvalue as scores to generate a score track
956 A general peak calling based on a score cutoff, min length of peak
957 and max gap between nearby peaks.
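As a rough illustration of how the three parameters interact, the sketch below calls regions from a per-base score list. It is not the MACS implementation. The function name call_peaks, the use of >= for the cutoff, and applying the minimum length after gap merging are all assumptions made for this example.

```python
def call_peaks(scores, cutoff, min_length, max_gap):
    """Return (start, end) half-open intervals from a per-base score list."""
    # Step 1: collect runs of consecutive positions scoring at or above the cutoff.
    runs = []
    start = None
    for pos, score in enumerate(scores):
        if score >= cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(scores)))

    # Step 2: merge runs separated by no more than max_gap positions.
    merged = []
    for run_start, run_end in runs:
        if merged and run_start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], run_end)
        else:
            merged.append((run_start, run_end))

    # Step 3: drop merged regions shorter than min_length (assumed order).
    return [(s, e) for s, e in merged if e - s >= min_length]
```

With a larger max_gap, nearby runs are chained into one region before the length filter is applied, so short runs can survive as part of a longer region.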
961 Wiggle file output is removed. Now we only support bedGraph
962 output. The generation of bedGraph is highly recommended since it
963 will not cost extra time. In other words, bedGraph generation is
964 run internally even if you don't want to save bedGraphs on disk,
965 due to the peak calling algorithm in MACS v2.
969 We can now calculate the Poisson p-value in log space so that the
970 score (-10*log10pvalue) will not have an upper limit of 3100 due
971 to the precision of floating-point numbers.
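To show why log space removes the ceiling, here is a minimal sketch that sums the Poisson upper tail without ever forming the p-value itself. It is an illustration only, not the MACS code: the names poisson_score and _log_add, the choice of P(X >= k) as the tail, and the stopping threshold of -40 are assumptions. A p-value computed directly in double precision underflows to zero somewhere near 1e-308 to 1e-324, which is where a limit of about 3100 would come from.

```python
import math

def _log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def poisson_score(k, lam):
    """-10*log10 of P(X >= k) for X ~ Poisson(lam), computed in log space."""
    if k <= 0:
        return 0.0  # P(X >= 0) is 1, so the score is 0
    # Natural log of P(X = k).
    log_term = k * math.log(lam) - lam - math.lgamma(k + 1)
    log_tail = log_term
    i = k
    # Add P(X = i + 1), P(X = i + 2), ... until the terms are past the
    # mode of the distribution and negligible relative to the running sum.
    while True:
        log_term += math.log(lam) - math.log(i + 1)
        i += 1
        log_tail = _log_add(log_tail, log_term)
        if i > lam and log_term - log_tail < -40.0:
            break
    # Convert the natural log of the tail to -10*log10.
    return -10.0 * log_tail / math.log(10.0)
```

For example, poisson_score(2000, 1.0) returns a finite score far above 3100, whereas the corresponding p-value cannot be represented as a double at all.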
973 * Cython is adopted to speed up Python code.
975 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
978 * Replaced with the newest WigTrackI class and fixed the wignorm script.
980 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
981 Version 1.4.0rc2 (Valentine)
983 * --single-wig option is renamed to --single-profile
985 * BedGraph output with --bdg or -B option.
987 The BedGraph output provides a 1bp resolution fragment pileup
988 profile. File size is smaller than a wig file. This option can be
989 combined with the --single-profile option to produce a bedgraph
990 file for the whole genome. This option also makes --space and
991 --call-subpeaks invalid.
993 * Fix the description of --shiftsize to correctly state that the
994 value is 1/2 d (fragment size).
996 * Fix a bug in the call to __filter_w_control_tags when control is
999 * Fix a bug on --to-small option. Now it works as expected.
1001 * Fix a bug while counting the tags in candidate peak region, an
1002 extra tag may be included. (Thanks to Jake Biesinger!)
1004 * Fix the bug for the peaks extended outside of chromosome
1005 start. If the minus strand tag goes outside of chromosome start
1006 after extension of d, it will be thrown out.
1008 * Post-process script for a combined wig file:
1010 The "wignorm" command can be called after a full run of MACS14 as
1011 a postprocess. wignorm can calculate the local background from the
1012 control wig file from MACS14, then use either foldchange,
1013 -10*log10(pvalue) from a Poisson test, or difference after asinh
1014 transformation as the score to build a single wig track to
1015 represent the binding strength. This script will take a
1016 significantly long time to process.
1018 * --wigextend has been obsoleted.
1020 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1021 Version 1.4.0rc1 (Starry Sky)
1023 * Duplicate reads option
1025 --keep-dup behavior is changed. Users can now specify how many
1026 reads to keep at the same genomic location. Use 'auto' to let
1027 MACS decide the number based on a binomial distribution, or 'all'
1028 to let MACS keep all reads.
1030 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1032 By default, MACS will now scale the smaller dataset to the bigger
1033 dataset. For instance, if IP has 10 million reads, and Input has 5
1034 million, MACS will double the lambda value calculated from Input
1035 reads while calling BOTH the positive peaks and negative
1036 peaks. This will address the issue caused by unbalanced numbers of
1037 reads from IP and Input. If --to-small is turned on, MACS will
1038 scale the larger dataset to the smaller one. So from now on, if d
1039 is fixed, then the peaks from a MACS call for A vs B should be
1040 identical to the negative peaks from a B vs A.
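The scaling rule can be written down in a few lines. This is only a sketch of the arithmetic described in this entry. The function name scaling_factors and its return convention are made up for illustration.

```python
def scaling_factors(n_treatment, n_control, to_small=False):
    """Return (treatment_scale, control_scale) for two read counts."""
    if to_small:
        # --to-small: scale the larger dataset down to the smaller one.
        target = min(n_treatment, n_control)
    else:
        # Default in this version: scale the smaller dataset up to the bigger one.
        target = max(n_treatment, n_control)
    return target / n_treatment, target / n_control
```

With 10 million IP reads and 5 million Input reads, the default gives (1.0, 2.0), i.e. the lambda from Input is doubled. Swapping the two samples just swaps the two factors, which is why A vs B and B vs A become mirror images when d is fixed.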
1042 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
1043 Version 1.4.0beta (summer wishes)
1049 The default behavior in the model building step is slightly
1050 changed. When MACS can't find enough pairs to build the model
1051 (implemented in alpha version) or the modeled fragment length is
1052 less than 2 times the tag length (implemented in beta version),
1053 MACS will use 2 times the --shiftsize value as the fragment length
1054 in the later analysis. --off-auto can turn off this default behavior.
1056 ** Redundant tag filtering
1058 The IO module is rewritten. The redundant tag filtering process
1059 becomes simpler and works as promised. The maximum allowed number
1060 of tags at the exact same location is calculated from the
1061 sequencing depth and genome size using a binomial distribution,
1062 for both TREATMENT and CONTROL separately (previously only
1063 TREATMENT was considered). The exact same location means the same
1064 coordinate and the same strand. Then MACS will only keep at most
1065 this number of tags at the exact same location in the following
1066 analysis. An option --keep-dup can let MACS skip the filtering and
1067 keep all the tags. However this may bring in a lot of sequencing
1068 bias, so you may get many false positive peaks.
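A minimal sketch of how such a limit could be derived from a binomial distribution follows. It assumes each tag lands on any given position with probability 1/genome_size and picks the smallest count whose upper tail falls below a threshold. The function name max_dup_tags, the 1e-5 threshold, and the floor of 1 are assumptions for illustration, not values taken from this entry.

```python
import math

def max_dup_tags(n_tags, genome_size, threshold=1e-5):
    """Smallest k with P(X > k) < threshold for X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    # P(X = 0) = (1 - p) ** n_tags, computed via logs for numerical safety.
    pmf = math.exp(n_tags * math.log1p(-p))
    cdf = pmf
    k = 0
    while cdf < 1.0 - threshold and k < n_tags:
        # Binomial recurrence: P(X = k + 1) from P(X = k).
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    # Always allow at least one tag per location.
    return max(1, k)
```

Under these assumptions, deeper sequencing on the same genome raises the limit, so TREATMENT and CONTROL can end up with different limits when their depths differ.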
1070 ** Single wiggle mode
1072 First thing to mention, this is not the score track that I
1073 described before. By default, MACS generates wiggle files for
1074 fragment pileup for every chromosome separately. When you use the
1075 --single-wig option, MACS will generate a single wiggle file for
1076 all the chromosomes so you will get a wig.gz for TREATMENT and
1077 another wig.gz for CONTROL if available.
1079 ** Sniff -- automatic format detection
1081 Now, by default or "-f AUTO", MACS will decide the input file
1082 format automatically. Technically, it will try to read at most
1083 1000 records for the first 10 non-comment lines. If it succeeds,
1084 the format is decided. I recommend not using AUTO and instead
1085 specifying the right format for your input files, unless you
1086 combine different formats in a single MACS run.
1090 --single-wig and --keep-dup are added. Check previous section in
1091 ChangeLog for detail.
1093 -f (--format) AUTO is now the default option.
1095 --slocal default: 1000
1096 --llocal default: 10000
1100 Setup script will stop the installation if python version is not
1101 python2.6 or python2.7.
1103 Local lambda calculation has been changed back. MACS will check
1104 peak_region, slocal (default 1K) and llocal (default 10K) for the
1105 local bias. The previous 200bps default caused MACS to miss
1106 some peaks where the input bias is very sharp.
1108 sam2bed.py script is corrected.
1110 Relative pos in xls output is fixed.
1112 Parser for ELAND_export is fixed to pass some of the no match
1113 lines. And elandexport2bed.py is fixed too. ( however I can't
1114 guarantee that it works on any eland_export files. )
1116 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
1117 Version 1.4.0alpha2 (be smarter)
1121 --gsize now provides shortcuts for common genomes, including
1122 human, mouse, C. elegans and fruitfly.
1124 --llocal now will be 5000 bps if there is no input file, so that
1125 local lambda doesn't overkill enriched binding sites.
1127 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
1128 Version 1.4alpha (be smarter)
1132 --tsize option is redesigned. MACS will use the first 10 lines of
1133 the input to decide the tag size. If user specifies --tsize, it
1134 will override the auto decided tsize.
1136 --lambdaset is replaced by --slocal and --llocal which mean the
1137 small local region and large local region.
1139 --bw has no effect on the scan-window size now. It only affects the
1140 paired-peaks model process.
1144 During the model building, MACS will pick out the enriched regions
1145 which are not too high and not too low to build the paired-peak
1146 model. By default, the region is from fold 10 to fold 30. If MACS
1147 fails to build the model, by default it will use the nomodel
1148 settings, like shiftsize=100bps, to shift and extend each
1149 tag. This behavior can be turned off by '--off-auto'.
1153 All the summit positions are saved in an extra
1154 *_summits.bed file. An option '--call-subpeaks' will invoke
1155 PeakSplitter developed by Mali Salmon to split wide peaks into
1158 * Sniff ( will be in beta )
1160 Automatically recognize the input file format, so users can combine
1161 different formats in one MACS run.
1163 Not implemented features/TODO:
1165 * Algorithms ( in near future? )
1167 MACS will try to refine the peak boundaries by calculating the
1168 scores for every point in the candidate peak regions. The score
1169 will be the -10*log(10,pvalue) on a local poisson distribution. A
1170 cutoff specified by users (--pvalue) will be applied to find the
1171 precise sub-peaks in the original candidate peak region. Peak
1172 boundaries and peak summit positions will be saved in separate BED
1175 * Single wiggle track ( in near future? )
1177 A single wiggle track will be generated to save the scores within
1178 candidate peak regions at 10 bp resolution. The wiggle file
1179 is in fixedStep format.
1182 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
1183 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1187 Fixed typo. FCSTEP -> FESTEP
1191 The 'femax' attribute bug is fixed
1193 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1194 Version 1.3.7 (Oktoberfest)
1196 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1198 Enhancements by Peter Chines:
1200 1. gzip files are supported.
1201 2. when --diag is on, user can set the increment and endpoint for
1202 fold enrichment analysis by setting --fe-step and --fe-max.
1204 Enhancements by Davide Cittaro:
1206 1. BAM and SAM formats are supported.
1207 2. small changes in the header lines of wiggle output.
1210 1. I added --fe-min option;
1211 2. Bowtie ascii output with suffix ".map" is supported.
1215 1. --nolambda bug is fixed. ( reported by Martin in JHU )
1216 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1217 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1218 4. Some occurrences of "fold change" have been changed to "fold enrichment".
1220 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
1221 Version 1.3.6.1 (default parameter change)
1223 * bin/macs, lib/PeakDetect.py
1225 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1226 default. "--futurefdr" is added which can turn on the 'new' method
1227 introduced in 1.3.6. By default it's off.
1231 Fixed a bug. p-value is corrected a little bit.
1234 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
1235 Version 1.3.6 (Birthday cake)
1239 "track name" is added to the header of BED output file.
1241 Now the default peak detection method is to consider 5k and 10k
1242 nearby regions in treatment data and peak location, 1k, 5k, and
1243 10k regions in control data to calculate local bias. The old
1244 method can be called through '--old' option.
1246 Information about how many total/unique tags in treatment or
1247 control will be saved in final .xls output.
1249 * lib/IO/__init__.py
1251 ".fa" will be removed from input tag alignment so only the
1252 chromosome names are kept.
1254 WigTrackI class is added for Wiggle like data structure. (not used
1257 The parser for ELAND multi PET files has been fixed. Now the 5'
1258 tag position for a pair will be kept, whereas in the previous
1259 version, the middle points are kept.
1261 * lib/IO/BinKeeper.py
1263 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1264 browser, which can quickly access certain region for values in a
1265 large wiggle like data file. (not used now)
1267 * lib/OptValidator.py
1273 Now the default peak detection method is to consider 5k and 10k
1274 nearby regions in treatment data and peak location, 1k, 5k, and
1275 10k regions in control data to calculate local bias. The old
1276 method can be called through '--old' option.
1278 Two columns have been added to BED output file. 4th column: peak
1279 name; 5th column: peak score using -10log(10,pvalue) as score.
1283 Add support to build a Mac App through 'setup.py py2app', or a
1284 Windows executable through 'setup.py py2exe'. You need to install
1285 py2app or py2exe package in order to use these functions.
1287 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
1288 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1292 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1293 in control data to calculate local lambda for each peak. Peak
1294 calling results will be slightly different from the previous version,
1299 Typo fixed, ELANDParser -> ELANDResultParser
1303 Now, modeled d value will be shown on the model figure.
1305 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
1306 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1308 * macs, IO/__init__.py, PeakDetect.py
1310 Add support for ELAND multi format. Add support for Paired-End
1311 experiments; in this case, 5'end and 3'end ELAND multi format files
1312 are required for treatment or control data. See 00README file for
1315 Add wigextend option.
1317 Add petdist option for Paired-End Tag experiment, which is the best
1318 distance between 5' and 3' tags.
1322 Fixed a bug which caused the end position of every peak region
1323 to be incorrectly increased by 1 bp. (Thanks to Mali Salmon!)
1327 Fix bugs while generating wiggle files. The start position of
1328 wiggle file is set to 1 instead of 0.
1330 Fix a bug that every 10M bps, signals in the first 'd' range are
1331 lower than actual. ( Thanks Mali Salmon!)
1334 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
1335 Version 1.3.3 (wiggle bugs fixed)
1339 Fix bugs while generating wiggle files. 1. 'span=' is added to
1340 'variableStep' line; 2. previously, every 10M bps, the coordinates
1341 were wrongly shifted to the right for 'd' basepairs.
1343 * macs, PeakDetect.py
1345 Add an option to save wiggle files on different resolution.
1347 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
1348 Version 1.3.2 (tiny bugs fixed)
1352 Fix 65536 -> 65535. (Thanks to Joon)
1356 Improved the binomial function for extra large numbers. Imported
1357 from Cistrome project.
1361 If treatment channel misses reads in some chromosome included in
1362 control channel, or vice versa, MACS will not exit. (Thank Shaun
1365 Instead, MACS will fake a tag at position -1 when calling
1366 treatment peaks vs control, but will ignore the chromosome while
1367 calling negative peaks.
1369 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
1370 Version 1.3.1 (tiny bugs fixed version)
1374 Hyunjin Gene Shin contributed some code to Prob.py. Now the
1375 binomial functions can tolerate large and small numbers.
1379 Parsers now split lines in BED/ELAND file using any
1380 whitespaces. 'track' or 'browser' lines will be regarded as
1381 comment lines. Fixed a bug when throwing StrandFormatError. The
1382 maximum redundant tag number at a single position can be no less
1386 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
1387 Version 1.3 (naming clarification version)
1389 * Naming clarification changes according to our manuscript:
1391 'frag_len' is changed to 'd'.
1393 'fold_change' is changed to 'fold_enrichment'.
1395 Suggest '--bw' parameter to be determined by users from the real
1398 Maximum FDR is 100% in the output file.
1400 And other clarifications in 00README file and the documents on the
1404 If the redundant tag number at a single position is over 32767,
1405 just remember 32767, instead of raising an overflow exception.
1411 Bug fixed for diagnosis report.
1414 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
1419 Poisson distribution CDF and inverse CDF functions are
1420 corrected. They can produce right results even for huge lambda
1421 now. So that the p-value and FDR values in the final excel sheet
1424 IO package now can tolerate some rare cases; ELANDParser in IO
1425 package is fixed. (Thank Bogdan)
1429 Reverse paired peaks in model are rejected. So there will be no
1430 negative 'frag_len'. (Thank Bogdan)
1434 The diagnosis function is completed. It can output a table file for
1435 users to estimate their sequencing depth.
1438 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
1441 * Prob.py is added!
1443 GSL is totally removed from MACS. Instead, I have implemented the
1444 CDF and inverse CDF for poisson and binomial distribution purely
1447 * Constants.py is added!
1449 Organize constants used in MACS in the Constants.py file.
1451 * All other files are modified!
1453 Foldchange calculation is modified. Now the foldchange is only
1454 calculated at the peak summit position instead of the whole peak
1455 region. The values will be higher and more robust than before.
1459 1. MACS can save wiggle format files containing the tag number at
1460 every 10 bp along the genome. Tags are shifted according to our
1461 model before they are calculated.
1463 2. Model building and local lambda calculation can be skipped with
1466 3. A diagnosis report can be generated through '--diag'
1467 option. This report can help you estimate the
1468 sequencing saturation. This function is only in beta stage.
1470 4. FDR calculation speed is highly improved.
1472 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
1475 * TabIO, PeakModel.py ...
1476 Bug fixed to let MACS tolerate cases where there is no tag on
1477 either plus strand or minus strand.
1480 Check the version of python. If the version is lower than 2.4,
1481 refuse to install with warning.
1484 2013-07-31 Tao Liu <vladimir.liu@gmail.com>
1485 MACS version 2.0.10 20130731 (tag:alpha)
1487 * callpeak --call-summits
1489 Fix bugs causing the callpeak --call-summits option to generate
1490 extra peaks and inconsistent peak boundaries compared to the
1491 default option. Thanks to Ben Levinson!
1495 Fix bugs causing bdgcmp to output logLR values that are all positive. Now
1496 'depletion' can be correctly represented as negative values.
1500 Fix the behavior of bdgdiff module. Now it can take four
1501 bedGraph files, then use logLR as cutoff to call differential
1502 regions. Check command line of bdgdiff for detail.
1504 2013-07-13 Tao Liu <vladimir.liu@gmail.com>
1505 MACS version 2.0.10 20130713 (tag:alpha)
1507 * fix bugs while output broadPeak and gappedPeak.
1509 Note. Those weak broad regions without any strong enrichment
1510 regions inside won't be saved in gappedPeak file.
1512 * bdgcmp -T and -C are merged into -S and description is updated.
1514 Now, you can use it to override SPMR values in your input for
1515 bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1516 statistics will cause odd results (in most cases, lower
1517 significance), and won't be consistent with MACS2 callpeak
1518 behavior. So if you have SPMR bedGraphs, input the smaller/larger
1519 sample size in MILLION according to 'callpeak --to-large' option.
1521 2013-07-10 Tao Liu <vladimir.liu@gmail.com>
1522 MACS version 2.0.10 20130710 (tag:alpha)
1524 * fix BED style output format of callpeak module:
1526 1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1527 the output. Old BED format file won't be saved.
1529 2) with --broad: broadPeak (BED6+3) for broad region and
1530 gappedPeak (BED12+3) for chained enriched regions will be the
1531 output. Old BED format, narrowPeak format, summit file won't be
1534 * bdgcmp can now accept a list of methods to calculate scores. So
1535 you can run it once to generate multiple types of scores. Thanks
1536 to Jon Urban for this suggestion!
1538 * C codes are re-generated through Cython 0.19.1.
1540 2013-05-21 Tao Liu <vladimir.liu@gmail.com>
1541 MACS version 2.0.10 20130520 (tag:alpha)
1543 * broad peak calling modules are modified in order to report all
1544 relaxed regions even if there is no strong enrichment inside.
1546 2013-05-01 Tao Liu <vladimir.liu@gmail.com>
1547 MACS version 2.0.10 20130501 (tag:alpha)
1549 * Memory usage is decreased to about 1/4-1/5 of previous usage
1550 Now, the internal data structure and algorithm are both
1551 re-organized, so that intermediate data are not saved in
1552 memory. Instead they are calculated on the fly. New MACS2 will
1553 take longer (1.5 to 2 times) but will use less memory,
1554 so it is more usable on servers with little memory.
1556 * --seed option is added to callpeak and randsample commands
1557 Thanks to Mathieu Gineste for this suggestion!
1559 2013-03-05 Tao Liu <vladimir.liu@gmail.com>
1560 MACS version 2.0.10 20130306 (tag:alpha)
1562 * diffpeak module
1563 New module to detect differential binding sites with more statistics.
1565 * Introduced --refine-peaks
1566 Calculates reads balancing to refine peak summits
1568 * Output file names prefix
1569 Correct encodePeak to narrowPeak, broadPeak to bed12.
1571 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>, Tao Liu <taoliu@jimmy.harvard.edu>
1572 MACS version 2.0.10 (tag:alpha not released)
1574 * Introduced BAMPEParser
1575 Reads PE data directly, requires bedtools for now
1577 * Introduced --call-summits
1578 Uses signal processing methods to call overlapping peaks
1580 * Added --no-trackline
1581 By default, files have descriptive tracklines now
1583 * new refinepeak command (experimental)
1584 This new function will use a similar method in SPP (wtd), to
1585 analyze raw tag distribution in peak region, then redefine the
1586 peak summit where plus and minus tags are evenly distributed
1589 * Changes to output *
1590 cPeakDetect.pyx has full support for new print/write methods and
1591 --call-peaks, BAMPEParser, and use of paired-end data
1593 * Parser optimization
1595 cParser.pyx is rewritten to use io.BufferedReader for faster
1596 parsing. Speed is doubled.
1598 Code is reorganized -- most of functions are inherited from
1599 GenericParser class.
1601 * Use cross-correlation to calculate fragment size
1603 First, all pairs will be used in prediction for fragment
1604 size. Previously, no more than 1000 pairs were used. Second,
1605 cross-correlation is used to find the best phase difference
1606 between + and - tag pileups.
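The idea of picking the phase difference by cross-correlation can be sketched as follows. This is a plain, unnormalized cross-correlation over two equal-length pileup lists, written for illustration only. The function name best_shift and the max_shift parameter are hypothetical and do not come from the MACS code.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift d in [0, max_shift] maximising sum(plus[i] * minus[i + d])."""
    if len(plus) != len(minus):
        raise ValueError("pileups must cover the same positions")
    best_d, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        # zip() pairs plus[i] with minus[i + d] and stops at the shorter slice.
        score = sum(p * m for p, m in zip(plus, minus[d:]))
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

If the + strand pileup peaks at positions 100 and 300 and the - strand pileup peaks at 150 and 350, the best shift is 50, which would then serve as the estimate of the fragment size.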
1608 * Speed up p-value and q-value calculation
1610 This part is ten times faster now. I am using a dictionary to
1611 cache p-value results from Poisson CDF function. A bit more memory
1612 will be used to increase speed. I hope this dictionary would not
1613 explode since the possible pairs of ChIP signal and control lambda
1614 are hugely redundant. Also, I rewrote part of q-value
1617 * Speed up peak detection
1619 This part is about a hundred times faster now. Optimizations
1620 include using Numpy functions as much as possible, and making loop
1621 body as small as possible.
1623 * Post-processing on differential calls
1625 After macs2diff finds differential binding sites between two
1626 conditions, it will try to annotate the peak calls from one of two
1627 conditions, describe the changes ...
1629 * Fragment size prediction in macs2diff
1631 Now by default, macs2diff will try to use the average fragment
1632 size from both condition 1 and condition 2 for tag extension and
1633 peak calling. Previously, by default, it will use different sizes
1634 unless --nomodel is specified.
1636 Technically, I separate model building processes out. So macs2diff
1637 will build fragment sizes for condition 1 and 2 in parallel (2
1638 processes maximum), then perform 4-way comparisons in parallel (4
1643 Combine two p/qscore tracks together. At regions where condition 1
1644 is higher than condition 2, score would be positive, otherwise,
1647 * SAMParser and BAMParser
1649 Bug fixed for paired-end sequencing data.
1653 Fixed a bug while calling peaks from BedGraph file. It previously
1654 mistakenly output same peaks multiple times at the end of
1657 2011-11-02 Tao Liu <taoliu@jimmy.harvard.edu>
1658 MACS version 2.0.9 (tag:alpha)
1660 * Auto fixation on predicted d is turned off by default!
1662 Previous --off-auto is now default. MACS will not automatically
1663 fix d less than 2 times of tag size according to
1664 --shiftsize. While tag size is getting longer nowadays, it would
1665 be easier to have d less than 2 times of tag size, however d may
1666 still be meaningful and useful. Please judge it using your own
1671 Now, the default scaling while treatment and input are unbalanced
1672 has been adjusted. By default, larger sample will be scaled down
1673 linearly to match the smaller sample. In this way, background
1674 noise will be reduced more than real signals, so we expect to have
1675 more specific results than the other way around (i.e. --to-large
1678 Also, an alternative option to randomly sample the larger data
1679 (--down-sample) is provided to replace default linear
1680 scaling. However, this option makes results irreproducible,
1685 A new script 'randsample' is added, which can randomly sample
1686 certain percentage or number of tags.
1690 Now, MACS will decide peak summits according to pileup height
1691 instead of qvalue scores. In this way, the summit may be more
1696 MACS calculates qvalue scores as differential scores. When comparing
1697 two conditions (say A and B), the maximum qscore for comparing
1698 A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
1699 will be computed. If maxqscore_a2b is bigger, the diff score is
1700 +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
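The sign rule above amounts to the following small function. The variable names follow the text. The function name diff_score is made up for this sketch, and ties are treated as the "otherwise" case because the text says so.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score for one region, following the rule in the text."""
    if maxqscore_a2b > maxqscore_b2a:
        # A is more enriched than B: report a positive score.
        return maxqscore_a2b
    # Otherwise (including ties): report the B-over-A score as negative.
    return -1 * maxqscore_b2a
```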
1702 2011-09-15 Tao Liu <taoliu@jimmy.harvard.edu>
1703 MACS version 2.0.8 (tag:alpha)
1705 * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1707 New script bdgbroadcall and the extra option '--broad' for macs2
1708 script, can be used to call broad regions with a loose cutoff to
1709 link nearby significant regions. The output is represented as
1712 * MACS2/IO/cScoreTrack.pyx
1714 Fix q-value calculation to generate forcefully monotonic values.
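For readers unfamiliar with what "forcefully monotonic" means here, the sketch below shows the textbook Benjamini-Hochberg adjustment with a running minimum, so that a smaller p-value never receives a larger q-value. It is the standard procedure on a plain list of p-values, not the code in cScoreTrack.pyx. The function name bh_qvalues is an assumption.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the p-values in ascending order; rank 1 is the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by the
    # q-values of all larger p-values. This enforces monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```

Without the running minimum, p-values of 0.01 and 0.011 would get raw adjusted values of 0.02 and 0.011, which is out of order. With it, both become 0.011.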
1716 * bin/eland*2bed, bin/sam2bed and bin/filterdup
1718 They are combined to one more powerful script called
1719 "filterdup". The script filterdup can filter duplicated reads
1720 according to sequencing depth and genome size. The script can also
1721 convert any format supported by MACS to BED format.
1723 2011-08-21 Tao Liu <taoliu@jimmy.harvard.edu>
1724 MACS version 2.0.7 (tag:alpha)
1726 * bin/macsdiff renamed to bin/bdgdiff
1728 Now this script will work as a low-level finetuning tool as bdgcmp
1733 A new script to take treatment and control files from two
1734 conditions, calculate fragment size, use local Poisson to get
1735 pvalues and BH process to get qvalues, then combine the 4-way results
1736 to call differential sites.
1738 This script can use up to 4 cpus to speed up the 4-way calculation. (
1739 I am trying multiprocessing in python. )
1741 * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1742 MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1743 MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1745 All above files are modified for the new macs2diff script.
1747 * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1749 Now q-value 0.01 is the default cutoff. If -p is specified,
1750 p-value cutoff will be used instead.
1752 2011-07-25 Tao Liu <vladimir.liu@gmail.com>
1753 MACS version 2.0.6 (tag:alpha)
1757 A script to call differential regions. A naive way is introduced
1758 to find the regions where:
1760 1. signal from condition 1 is larger than input 1 and condition 2 --
1761 unique region in condition 1;
1762 2. signal from condition 2 is larger than input 2 and condition 1
1763 -- unique region in condition 2;
1764 3. signal from condition 1 is larger than input 1, signal from
1765 condition 2 is larger than input 2, however either signal from
1766 condition 1 or 2 is not larger than the other.
1768 Here 'larger' means the pvalue or qvalue from a Poisson test is
1769 under certain cutoff.
1771 (I will make another script to wrap up multiple scripts for
1772 differential calling)
1774 2011-07-07 Tao Liu <vladimir.liu@gmail.com>
1775 MACS version 2.0.5 (tag:alpha)
1777 * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1778 MACS2/IO/cPeakIO.pyx
1780 Use hash to store peak information. Add back the feature to deal
1781 with data without control.
1783 Fix bug which incorrectly allows small peaks at the end of
1786 * bin/bdgpeakcall, bin/bdgcmp
1788 Fix bugs. bdgpeakcall can output encodePeak format.
1790 2011-06-22 Tao Liu <taoliu@jimmy.harvard.edu>
1791 MACS version 2.0.4 (tag:alpha)
1795 Fix a bug, correctly assign lambda_bg while --to-small is
1796 set. Thanks Junya Seo!
1798 Add rank and num of bp columns to pvalue-qvalue table.
1802 Fix bugs to correctly deal with peakless chromosomes. Thanks
1805 Use AFDR for independent tests instead.
1809 Now MACS can output peak coordinates together with pvalue, qvalue,
1810 summit positions in a single encodePeak format (designed for
1811 ENCODE project) file. This file can be loaded to UCSC
1812 browser. Definitions of some specific columns are: 5th:
1813 int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1814 -log10qvalue, 10th: relative summit position to peak start.
1817 2011-06-19 Tao Liu <taoliu@jimmy.harvard.edu>
1818 MACS version 2.0.3 (tag:alpha)
1820 * Rich output with qvalue, fold enrichment, and pileup height
1822 Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1825 http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1827 Now we have a similar xls output file as before. The differences
1828 from previous file are:
1830 1. Summit now is absolute summit, instead of relative summit
1832 2. 'Pileup' is previous 'tag' column. It's the extended fragment
1833 pileup at the peak summit;
1834 3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1835 5.00 means 1e-5, simple and less confusing.
1836 4. FDR column becomes '-log10(qvalue)' column.
1837 5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1838 the values at the peak summit.
1840 * Extra output files
1842 NAME_pqtable.txt contains pvalue and qvalue relationships.
1844 NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1845 and -log10qvalue scores in BedGraph format. Nearby regions with
1846 the same value are not merged.
1848 * Separation of FeatIO.py
1850 Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1851 cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1852 implemented to store pileup, local lambda, pvalue, and qvalue
1853 altogether in cScoreTrack.pyx.
1855 * Experimental option --half-ext
1857 Suggested by NPS algorithm, I added an experimental option
1858 --half-ext to let MACS extend each ChIP fragment around its
1859 middle point by only 1/2 d.
1861 2011-06-12 Tao Liu <taoliu@jimmy.harvard.edu>
1862 MACS version 2.0.2 (tag:alpha)
1866 Add an error check to see if there is no common chromosome names
1867 from treatment file and control file
1869 * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1871 Reduce memory usage by removing deepcopy() calls.
1873 * Modify README documents and others.
1875 2011-05-19 Tao Liu <taoliu@jimmy.harvard.edu>
1876 MACS Version 2.0.1 (tag:alpha)
1878 * cPileup.pyx, cPeakDetect.pyx and peak calling process
1880 Jie suggested me a brilliant simple method to pileup fragments
1881 into bedGraph track. It works extremely faster than the previous
1882 function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1883 large local lambda calculation in MACSv2 now. Now I generate three
1884 bedGraphs for d-size local bias, slocal-size and llocal-size local
1885 bias, and calculate the maximum local bias as local lambda
1888 Minor: add_loc in bedGraphTrackI now can correctly merge the
1889 region with its preceding region if their value are the same.
1893 Add an option to shift control tags before extension. By default,
1894 control tags will be extended to both sides regardless of strand
1897 2011-05-17 Tao Liu <taoliu@jimmy.harvard.edu>
1898 MACS Version 2.0.0 (tag:alpha)
1900 * Use bedGraph type to store data internally and externally.
1902 We can have theoretically one-basepair resolution profiles. 10
1903 times smaller in filesize and even smaller after converting to
1904 bigWig for visualization.
1906 * Peak calling process modified. Better peak boundary detection.
1908 Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1909 Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1910 one will be averaged to d size) Then calculate the maximum value
1911 of these two tracks and a global background, to have a
1912 local-lambda bedGraph.
1914 Use -10log10poisson_pvalue as scores to generate a score track
1915 before peak calling.
1917 A general peak calling based on a score cutoff, min length of peak
1918 and max gap between nearby peaks.
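The general peak calling step can be sketched as follows. This is an
illustration under stated assumptions, not the MACS code: the score track is
assumed to be a sorted list of (start, end, score) rows for one chromosome,
and the name `call_peaks` and its parameters are hypothetical.

```python
def call_peaks(track, cutoff, min_length, max_gap):
    """Call peaks from a sorted list of (start, end, score) rows.

    Rows scoring below `cutoff` are ignored; passing regions closer than
    `max_gap` are merged; merged regions shorter than `min_length` are
    dropped. Hypothetical helper for illustration only.
    """
    peaks = []
    cur = None  # [start, end] of the region being built
    for start, end, score in track:
        if score < cutoff:
            continue
        if cur is not None and start - cur[1] <= max_gap:
            cur[1] = end  # close enough: extend the current region
        else:
            if cur is not None and cur[1] - cur[0] >= min_length:
                peaks.append(tuple(cur))
            cur = [start, end]
    if cur is not None and cur[1] - cur[0] >= min_length:
        peaks.append(tuple(cur))
    return peaks
```

With a cutoff of 50, a min length of 100 and a max gap of 50, two
high-scoring regions separated by a 50bp low-scoring stretch are reported as
one peak, and an isolated 20bp region is discarded.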
1922 Wiggle file output is removed. Now we only support bedGraph
1923 output. The generation of bedGraph is highly recommended since it
1924 will not cost extra time. In other words, bedGraph generation is
1925 run internally even if you don't want to save bedGraphs on disk, due
1926 to the peak calling algorithm in MACS v2.
1930 We now can calculate poisson pvalue in log space so that the score
1931 (-10*log10pvalue) will not have an upper limit of 3100 due to
1932 precision of float number.
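As an illustration of why log space helps: a p-value stored as a float
underflows to zero when it gets small enough, which caps the score. The
sketch below sums the Poisson upper tail entirely in log space, using only
the Python standard library. It is a minimal example written for this note
and the function names are hypothetical; it is not the routine used in MACS.

```python
import math


def log10_poisson_sf(k, lam):
    """log10 of P(X >= k) for X ~ Poisson(lam), with k >= 1 and lam > 0.

    The tail is accumulated in log space, so the result stays finite even
    when the p-value itself would underflow a float.
    """
    log_lam = math.log(lam)
    # log of the first tail term: exp(-lam) * lam**k / k!
    log_term = -lam + k * log_lam - math.lgamma(k + 1)
    log_sum = log_term
    i = k
    while True:
        i += 1
        log_term += log_lam - math.log(i)
        # log(a + b) = log(a) + log1p(exp(log(b) - log(a)))
        new_sum = log_sum + math.log1p(math.exp(log_term - log_sum))
        if new_sum == log_sum and i > lam:
            break  # remaining terms no longer change the sum
        log_sum = new_sum
    return log_sum / math.log(10)


def poisson_score(k, lam):
    """Score defined as -10 * log10(p-value)."""
    return -10.0 * log10_poisson_sf(k, lam)
```

For example, `poisson_score(5000, 1.0)` returns a finite score far above
3100, whereas computing the p-value first would give 0.0.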
1934 * Cython is adopted to speed up Python code.
1936 2011-02-28 Tao Liu <taoliu@jimmy.harvard.edu>
1939 * Replaced with the newest WigTrackI class and fixed the wignorm script.
1941 2011-02-21 Tao Liu <taoliu@jimmy.harvard.edu>
1942 Version 1.4.0rc2 (Valentine)
1944 * --single-wig option is renamed to --single-profile
1946 * BedGraph output with --bdg or -B option.
1948 The BedGraph output provides 1bp resolution fragment pileup
1949 profile. File size is smaller than wig file. This option can be
1950 combined with --single-profile option to produce a bedgraph file
1951 for the whole genome. This option also makes --space and
1952 --call-subpeaks invalid.
1954 * Fix the description of --shiftsize to correctly state that the
1955 value is 1/2 d (fragment size).
1957 * Fix a bug in the call to __filter_w_control_tags when control is
1960 * Fix a bug on --to-small option. Now it works as expected.
1962 * Fix a bug while counting the tags in candidate peak region, an
1963 extra tag may be included. (Thanks to Jake Biesinger!)
1965 * Fix the bug for the peaks extended outside of chromosome
1966 start. If the minus strand tag goes outside of chromosome start
1967 after extension of d, it will be thrown out.
1969 * Post-process script for a combined wig file:
1971 The "wignorm" command can be called after a full run of MACS14 as
1972 a postprocess. wignorm can calculate the local background from the
1973 control wig file from MACS14, then use either foldchange,
1974 -10*log10(pvalue) from Poisson test, or difference after asinh
1975 transformation as the score to build a single wig track to
1976 represent the binding strength. This script will take a
1977 significantly long time to process.
1979 * --wigextend has been obsoleted.
1981 2010-09-21 Tao Liu <taoliu@jimmy.harvard.edu>
1982 Version 1.4.0rc1 (Starry Sky)
1984 * Duplicate reads option
1986 --keep-dup behavior is changed. Now user can specify how many
1987 reads he/she wants to keep at the same genomic location. 'auto' to
1988 let MACS decide the number based on binomial distribution, 'all'
1989 to let MACS keep all reads.
1991 * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1993 By default, MACS will now scale the smaller dataset to the bigger
1994 dataset. For instance, if IP has 10 million reads, and Input has 5
1995 million, MACS will double the lambda value calculated from Input
1996 reads while calling BOTH the positive peaks and negative
1997 peaks. This will address the issue caused by unbalanced numbers of
1998 reads from IP and Input. If --to-small is turned on, MACS will
1999 scale the larger dataset to the smaller one. So from now on, if d
2000 is fixed, then the peaks from a MACS call for A vs B should be
2001 identical to the negative peaks from a B vs A.
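The scaling rule can be written down as a small helper. This is only a
sketch of the arithmetic described above; the function name
`scaling_factors` and its signature are hypothetical and do not correspond
to any MACS function.

```python
def scaling_factors(n_treat, n_control, to_small=False):
    """Return (treatment_factor, control_factor) for two read counts.

    Default: scale the smaller dataset up to the bigger one.
    to_small=True: scale the larger dataset down to the smaller one.
    Hypothetical helper for illustration only.
    """
    if n_treat == n_control:
        return 1.0, 1.0
    treat_is_larger = n_treat > n_control
    if to_small:
        if treat_is_larger:
            return n_control / n_treat, 1.0
        return 1.0, n_treat / n_control
    if treat_is_larger:
        return 1.0, n_treat / n_control
    return n_control / n_treat, 1.0
```

With 10 million IP reads and 5 million Input reads, the default gives a
control factor of 2.0 (the lambda from Input is doubled), while
`to_small=True` gives a treatment factor of 0.5.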
2003 2010-09-01 Tao Liu <taoliu@jimmy.harvard.edu>
2004 Version 1.4.0beta (summer wishes)
2010 The default behavior in the model building step is slightly
2011 changed. When MACS can't find enough pairs to build model
2012 (implemented in alpha version) or the modeled fragment length is
2013 less than 2 times of tag length (implemented in beta version),
2014 MACS will use 2 times of --shiftsize value as fragment length in
2015 the later analysis. --off-auto can turn off this default behavior.
2017 ** Redundant tag filtering
2019 The IO module is rewritten. The redundant tag filtering process
2020 becomes simpler and works as promised. The maximum allowed number
2021 of tags at the exact same location is calculated from the
2022 sequencing depth and genome size using a binomial distribution,
2023 for both TREATMENT and CONTROL separately. ( previously only
2024 TREATMENT was considered ) The exact same location means the same
2025 coordinate and the same strand. Then MACS will only keep at most
2026 this number of tags at the exact same location in the following
2027 analysis. An option --keep-dup can let MACS skip the filtering and
2028 keep all the tags. However this may bring in a lot of sequencing
2029 bias, so you may get many false positive peaks.
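One way such a limit could be derived is sketched below. The details are
assumptions made for illustration, not taken from the MACS source: each tag
is assumed to land on a given location with probability 1/genome_size, the
limit is the smallest count k for which P(X > k) falls below a cutoff, and
the cutoff of 1e-5 and the name `max_dup_tags` are hypothetical.

```python
import math


def max_dup_tags(n_tags, genome_size, p_cutoff=1e-5):
    """Smallest k (at least 1) with P(X > k) < p_cutoff, X ~ Binomial(n_tags, 1/genome_size).

    Hypothetical helper; the per-location probability and the cutoff are
    assumptions for illustration.
    """
    p = 1.0 / genome_size
    log_p = math.log(p)
    log_q = math.log1p(-p)

    def pmf(k):
        # Binomial probability mass, evaluated through logs for large n_tags.
        log_comb = (math.lgamma(n_tags + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_tags - k + 1))
        return math.exp(log_comb + k * log_p + (n_tags - k) * log_q)

    cdf = 0.0
    k = 0
    while True:
        cdf += pmf(k)
        if 1.0 - cdf < p_cutoff:
            return max(k, 1)
        k += 1
```

Under these assumptions, 10 million tags on a 2.7e9 bp genome allow a single
tag per location, and deeper sequencing raises the limit.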
2031 ** Single wiggle mode
2033 First thing to mention, this is not the score track that I
2034 described before. By default, MACS generates wiggle files for
2035 fragment pileup for every chromosome separately. When you use
2036 --single-wig option, MACS will generate a single wiggle file for
2037 all the chromosomes so you will get a wig.gz for TREATMENT and
2038 another wig.gz for CONTROL if available.
2040 ** Sniff -- automatic format detection
2042 Now, by default or "-f AUTO", MACS will decide the input file
2043 format automatically. Technically, it will try to read at most
2044 1000 records for the first 10 non-comment lines. If it succeeds,
2045 the format is decided. I recommend not to use AUTO and specify the
2046 right format for your input files, unless you combine different
2047 formats in a single MACS run.
2051 --single-wig and --keep-dup are added. Check previous section in
2052 ChangeLog for detail.
2054 -f (--format) AUTO is now the default option.
2056 --slocal default: 1000
2057 --llocal default: 10000
2061 Setup script will stop the installation if python version is not
2062 python2.6 or python2.7.
2064 Local lambda calculation has been changed back. MACS will check
2065 peak_region, slocal( default 1K) and llocal (default 10K) for the
2066 local bias. The previous 200bps default caused MACS to miss
2067 some peaks where the input bias is very sharp.
2069 sam2bed.py script is corrected.
2071 Relative pos in xls output is fixed.
2073 Parser for ELAND_export is fixed to pass some of the no match
2074 lines. And elandexport2bed.py is fixed too. ( however I can't
2075 guarantee that it works on any eland_export files. )
2077 2010-06-04 Tao Liu <taoliu@jimmy.harvard.edu>
2078 Version 1.4.0alpha2 (be smarter)
2082 --gsize now provides shortcuts for common genomes, including
2083 human, mouse, C. elegans and fruitfly.
2085 --llocal now will be 5000 bps if there is no input file, so that
2086 local lambda doesn't overkill enriched binding sites.
2088 2010-06-02 Tao Liu <taoliu@jimmy.harvard.edu>
2089 Version 1.4alpha (be smarter)
2093 --tsize option is redesigned. MACS will use the first 10 lines of
2094 the input to decide the tag size. If user specifies --tsize, it
2095 will override the auto decided tsize.
2097 --lambdaset is replaced by --slocal and --llocal which mean the
2098 small local region and large local region.
2100 --bw has no effect on the scan-window size now. It only affects the
2101 paired-peaks model process.
2105 During the model building, MACS will pick out the enriched regions
2106 which are not too high and not too low to build the paired-peak
2107 model. Default the region is from fold 10 to fold 30. If MACS
2108 fails to build the model, by default it will use the nomodel
2109 settings, like shiftsize=100bps, to shift and extend each
2110 tag. This behavior can be turned off by '--off-auto'.
2114 An extra file including all the summit positions is saved as
2115 the *_summits.bed file. An option '--call-subpeaks' will invoke
2116 PeakSplitter developed by Mali Salmon to split wide peaks into
2119 * Sniff ( will be in beta )
2121 Automatically recognize the input file format, so users can combine
2122 different formats in one MACS run.
2124 Not implemented features/TODO:
2126 * Algorithms ( in near future? )
2128 MACS will try to refine the peak boundaries by calculating the
2129 scores for every point in the candidate peak regions. The score
2130 will be the -10*log(10,pvalue) on a local poisson distribution. A
2131 cutoff specified by users (--pvalue) will be applied to find the
2132 precise sub-peaks in the original candidate peak region. Peak
2133 boundaries and peak summit positions will be saved in separate BED
2136 * Single wiggle track ( in near future? )
2138 A single wiggle track will be generated to save the scores within
2139 candidate peak regions in the 10bps resolution. The wiggle file
2140 is in fixedStep format.
2143 2009-10-16 Tao Liu <taoliu@jimmy.harvard.edu>
2144 Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2148 Fixed typo. FCSTEP -> FESTEP
2152 The 'femax' attribute bug is fixed
2154 2009-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2155 Version 1.3.7 (Oktoberfest)
2157 * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2159 Enhancements by Peter Chines:
2161 1. gzip files are supported.
2162 2. when --diag is on, user can set the increment and endpoint for
2163 fold enrichment analysis by setting --fe-step and --fe-max.
2165 Enhancements by Davide Cittaro:
2167 1. BAM and SAM formats are supported.
2168 2. small changes in the header lines of wiggle output.
2171 1. I added --fe-min option;
2172 2. Bowtie ascii output with suffix ".map" is supported.
2176 1. --nolambda bug is fixed. ( reported by Martin in JHU )
2177 2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2178 3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2179 4. Some "fold change" have been changed to "fold enrichment".
2181 2009-06-10 Tao Liu <taoliu@jimmy.harvard.edu>
2182 Version 1.3.6.1 (default parameter change)
2184 * bin/macs, lib/PeakDetect.py
2186 "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2187 default. "--futurefdr" is added which can turn on the 'new' method
2188 introduced in 1.3.6. By default it's off.
2192 Fixed a bug. p-value is corrected a little bit.
2195 2009-05-11 Tao Liu <taoliu@jimmy.harvard.edu>
2196 Version 1.3.6 (Birthday cake)
2200 "track name" is added to the header of BED output file.
2202 Now the default peak detection method is to consider 5k and 10k
2203 nearby regions in treatment data and peak location, 1k, 5k, and
2204 10k regions in control data to calculate local bias. The old
2205 method can be called through '--old' option.
2207 Information about how many total/unique tags in treatment or
2208 control will be saved in final .xls output.
2210 * lib/IO/__init__.py
2212 ".fa" will be removed from input tag alignment so only the
2213 chromosome names are kept.
2215 WigTrackI class is added for Wiggle like data structure. (not used
2218 The parser for ELAND multi PET files has been fixed. Now the 5'
2219 tag position for a pair will be kept, whereas in the previous
2220 version, the middle points are kept.
2222 * lib/IO/BinKeeper.py
2224 BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2225 browser, which can quickly access certain region for values in a
2226 large wiggle like data file. (not used now)
2228 * lib/OptValidator.py
2234 Now the default peak detection method is to consider 5k and 10k
2235 nearby regions in treatment data and peak location, 1k, 5k, and
2236 10k regions in control data to calculate local bias. The old
2237 method can be called through '--old' option.
2239 Two columns have been added to BED output file. 4th column: peak
2240 name; 5th column: peak score using -10log(10,pvalue) as score.
2244 Add support to build a Mac App through 'setup.py py2app', or a
2245 Windows executable through 'setup.py py2exe'. You need to install
2246 py2app or py2exe package in order to use these functions.
2248 2009-02-12 Tao Liu <taoliu@jimmy.harvard.edu>
2249 Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2253 Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2254 in control data to calculate local lambda for each peak. Peak
2255 calling results will be slightly different from the previous version,
2260 Typo fixed, ELANDParser -> ELANDResultParser
2264 Now, modeled d value will be shown on the model figure.
2266 2009-01-06 Tao Liu <taoliu@jimmy.harvard.edu>
2267 Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2269 * macs, IO/__init__.py, PeakDetect.py
2271 Add support for ELAND multi format. Add support for Pair-End
2272 experiment, in this case, 5'end and 3'end ELAND multi format files
2273 are required for treatment or control data. See 00README file for
2276 Add wigextend option.
2278 Add petdist option for Pair-End Tag experiment, which is the best
2279 distance between 5' and 3' tags.
2283 Fixed a bug which caused the end position of every peak region
2284 to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
2288 Fix bugs while generating wiggle files. The start position of
2289 wiggle file is set to 1 instead of 0.
2291 Fix a bug that every 10M bps, signals in the first 'd' range are
2292 lower than actual. ( Thanks Mali Salmon!)
2295 2008-12-03 Tao Liu <taoliu@jimmy.harvard.edu>
2296 Version 1.3.3 (wiggle bugs fixed)
2300 Fix bugs while generating wiggle files. 1. 'span=' is added to
2301 'variableStep' line; 2. previously, every 10M bps, the coordinates
2302 were wrongly shifted to the right for 'd' basepairs.
2304 * macs, PeakDetect.py
2306 Add an option to save wiggle files on different resolution.
2308 2008-10-02 Tao Liu <taoliu@jimmy.harvard.edu>
2309 Version 1.3.2 (tiny bugs fixed)
2313 Fix 65536 -> 65535. ( Thank Joon)
2317 Improved for binomial function with extra large number. Imported
2318 from Cistrome project.
2322 If treatment channel misses reads in some chromosome included in
2323 control channel, or vice versa, MACS will not exit. (Thank Shaun
2326 Instead, MACS will fake a tag at position -1 when calling
2327 treatment peaks vs control, but will ignore the chromosome while
2328 calling negative peaks.
2330 2008-09-04 Tao Liu <taoliu@jimmy.harvard.edu>
2331 Version 1.3.1 (tiny bugs fixed version)
2335 Hyunjin Gene Shin contributed some code to Prob.py. Now the
2336 binomial functions can tolerate large and small numbers.
2340 Parsers now split lines in BED/ELAND file using any
2341 whitespaces. 'track' or 'browser' lines will be regarded as
2342 comment lines. A bug fixed when throwing StrandFormatError. The
2343 maximum redundant tag number at a single position can be no less
2347 2008-07-15 Tao Liu <taoliu@jimmy.harvard.edu>
2348 Version 1.3 (naming clarification version)
2350 * Naming clarification changes according to our manuscript:
2352 'frag_len' is changed to 'd'.
2354 'fold_change' is changed to 'fold_enrichment'.
2356 Suggest '--bw' parameter to be determined by users from the real
2359 Maximum FDR is 100% in the output file.
2361 And other clarifications in 00README file and the documents on the
2365 If the redundant tag number at a single position is over 32767,
2366 just remember 32767, instead of raising an overflow exception.
2372 Bug fixed for diagnosis report.
2375 2008-07-10 Tao Liu <taoliu@jimmy.harvard.edu>
2380 Poisson distribution CDF and inverse CDF functions are
2381 corrected. They can produce right results even for huge lambda
2382 now. So that the p-value and FDR values in the final excel sheet
2385 IO package now can tolerate some rare cases; ELANDParser in IO
2386 package is fixed. (Thank Bogdan)
2390 Reverse paired peaks in model are rejected. So there will be no
2391 negative 'frag_len'. (Thank Bogdan)
2395 Diagnosis function is completed. It can output a table file for
2396 users to estimate their sequencing depth.
2399 2008-06-30 Tao Liu <taoliu@jimmy.harvard.edu>
2402 * Probe.py is added!
2404 GSL is totally removed from MACS. Instead, I have implemented the
2405 CDF and inverse CDF for poisson and binomial distribution purely
2408 * Constants.py is added!
2410 Organize constants used in MACS in the Constants.py file.
2412 * All other files are modified!
2414 Foldchange calculation is modified. Now the foldchange is only
2415 calculated at the peak summit position instead of the whole peak
2416 region. The values will be higher and more robust than before.
2420 1. MACS can save wiggle format files containing the tag number at
2421 every 10 bp along the genome. Tags are shifted according to our
2422 model before they are calculated.
2424 2. Model building and local lambda calculation can be skipped with
2427 3. A diagnosis report can be generated through '--diag'
2428 option. This report can help you estimate the
2429 sequencing saturation. This function is only in beta stage.
2431 4. FDR calculation speed is highly improved.
2433 2008-05-28 Tao Liu <taoliu@jimmy.harvard.edu>
2436 * TabIO, PeakModel.py ...
2437 Bug fixed to let MACS tolerate cases where there is no tag on
2438 either plus strand or minus strand.
2441 Check the version of python. If the version is lower than 2.4,
2442 refuse to install with warning.