ChangeLog

   1 2024-02-19  Tao Liu  <vladimir.liu@gmail.com>
   2         MACS 3.0.1
   3
   4         * Bugs fixed
   5
   6         1) Fixed a bug where the `hmmratac` subcommand couldn't
   7         correctly save the digested signal files. #605 #611
   8
   9         2) Applied a patch to remove the Cython requirement from the
  10         installed system (it is needed only at build time). #606 #612
  11
  12         3) Relaxed the testing script when comparing the peaks called by
  13         the current code with the standard peaks. To implement this, we
  14         added an 'intersection' function to the 'Regions' class to find
  15         the intersecting regions of two Regions objects (similar to PeakIO
  16         but only recording chromosome, start and end positions). We also
  17         updated the unit test 'test_Region.py' and implemented a script
  18         'jaccard.py' to compute the Jaccard Index of two peak files. If
  19         the JI > 0.99, we consider the called peaks and the standard
  20         peaks to be similar. This avoids the problem caused by
  21         different Numpy/SciPy/scikit-learn versions, where certain peak
  22         coordinates may differ by 10 bp. #615 #619
  23
  24         4) Due to the changes in scikit-learn 1.3.0
  25         (https://scikit-learn.org/1.3/whats_new/v1.3.html), the way hmmlearn
  26         0.3 uses KMeans ends up with inconsistent results between
  27         sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
  28         hmm.GaussianHMM and adjusted the standard output from the `hmmratac`
  29         subcommand. The change is based on
  30         https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
  31         the random seeding of KMeans 10 times. Now the `hmmratac` results
  32         should be more consistent (at least JI>0.99). #615 #620
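
             The Jaccard Index check described in item 3 can be sketched as
             follows. This is a minimal illustration, not the 'jaccard.py'
             script itself: the function names and the dict-of-intervals
             input are assumptions made for the example, and intervals are
             treated as half-open (start, end) pairs.

```python
def merge(intervals):
    """Sort and merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a, b):
    """Intersect two sorted, merged interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append([start, end])
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def jaccard_index(peaks_a, peaks_b):
    """peaks_*: dict mapping chromosome name -> list of (start, end)."""
    inter = union = 0
    for chrom in set(peaks_a) | set(peaks_b):
        a = merge(peaks_a.get(chrom, []))
        b = merge(peaks_b.get(chrom, []))
        shared = total_length(intersect(a, b))
        inter += shared
        union += total_length(a) + total_length(b) - shared
    # Two empty peak sets are treated as identical (an arbitrary choice).
    return inter / union if union else 1.0


called = {"chr1": [(100, 200), (300, 400)]}
standard = {"chr1": [(100, 200), (310, 400)]}
ji = jaccard_index(called, standard)  # 190 shared bases / 200 in the union
```

             Here a single peak boundary shifted by 10 bp in a tiny test set
             already lowers the JI to 0.95; on a realistic peak file with
             many peaks the same shift barely moves the index, which is why
             a threshold such as 0.99 tolerates small coordinate drift.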
  33
  34         * Other
  35
  36         1) We added some dependencies to MACS3. The `hmmratac` subcommand
  37         needs the `hmmlearn` library, `hmmlearn` needs `scikit-learn`, and
  38         `scikit-learn` needs `scipy`. Since major releases have happened
  39         for both `scipy` and `scikit-learn`, we have to set specific
  40         version requirements for them in order to make sure the output
  41         results from `hmmratac` are consistent.
  42
  43         2) We updated our documentation website using
  44         Sphinx. https://macs3-project.github.io/MACS/
  45
  46 2023-11-15  Tao Liu  <vladimir.liu@gmail.com>
  47         MACS 3.0.0
  48
  49         1) Call variants in peak regions directly from BAM files. The
  50         function was originally developed under code name SAPPER. Now
  51         SAPPER has been merged into MACS as the `callvar` command. It can
  52         be used to call SNVs and small INDELs directly from alignment
  53         files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
  54         the DNA sequence at the enriched genomic regions (binding sites or
  55         accessible DNA) and to refine the alignment when necessary. We
  56         added `simde` as a submodule in order to support fermi-lite
  57         library under non-x64 architectures.
  58
  59         2) The HMMRATAC module is added as the subcommand `hmmratac`.
  60         HMMRATAC is dedicated software for analyzing ATAC-seq data. The
  61         basic idea behind HMMRATAC is to digest ATAC-seq data according to
  62         the fragment length of read pairs into four signal tracks: short
  63         fragments, mono-nucleosomal fragments, di-nucleosomal fragments
  64         and tri-nucleosomal fragments. It then integrates the four tracks
  65         using a Hidden Markov Model with three hidden states:
  66         open region, nucleosomal region, and background region. The
  67         original software, written in Java by Evan Tarbell, was published
  68         in 2019. We implemented it in Python/Cython and optimized the whole
  69         process using existing MACS functions and hmmlearn. Now it can run
  70         much faster than the original Java version. Note: evaluation of
  71         the peak calling results is still underway.
  72
  73         3) Speed/memory optimization. Use cykhash to replace the Python
  74         dictionary. Use a buffer (10MB) to read and parse input files (not
  75         available for the BAM file parser). And many optimization tweaks. We
  76         added memory monitoring to the runtime messages.
  77
  78         4) R wrappers for MACS -- MACSr for bioconductor.
  79
  80         5) Code cleanup. Reorganize source codes.
  81
  82         6) Unit testing.
  83
  84         7) Switch to GitHub Actions for CI, support multi-arch testing
  85         including x64, armv7, aarch64, s390x and ppc64le. We also test on
  86         Mac OS 12.
  87
  88         8) MACS tag-shifting model has been refined. Now it will use a
  89         naive peak calling approach to find ALL possible paired peaks at +
  90         and - strand, then use all of them to calculate the
  91         cross-correlation. (A related bug has been fixed:
  92         [#442](https://github.com/macs3-project/MACS/issues/442))
  93
  94         9) BAI index and random access to BAM files are now
  95         supported. [#449](https://github.com/macs3-project/MACS/issues/449).
  96
  97         10) Support of Python > 3.10
  98         [#498](https://github.com/macs3-project/MACS/issues/498)
  99
 100         11) The effective genome size parameters have been updated
 101         according to
 102         deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
 103
 104         12) Multiple updates regarding dependencies, Anaconda builds, CI/CD
 105         process.
 106
 107         13) Cython 3 is supported.
 108
 109         14) Documentation for each subcommand can be found under /docs
 110
 111         *Other*
 112
 113         1) Missing header line while no peaks can be called
 114         [#501](https://github.com/macs3-project/MACS/issues/501)
 115         [#502](https://github.com/macs3-project/MACS/issues/502)
 116
 117         2) Note: different versions of numpy, scipy, and sklearn may give
 118         slightly different hmmratac results. The current standard
 119         results for automated testing in `/test` directory are from Numpy
 120         1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
 121
 122 2020-04-11  Tao Liu  <vladimir.liu@gmail.com>
 123         MACS version 2.2.7.1
 124
 125         * hotfix:
 126
 127         Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
 128         work.
 129
 130 2020-04-10  Tao Liu  <vladimir.liu@gmail.com>
 131         MACS version 2.2.7
 132
 133         * Bugs fixed
 134
 135         1) MACS2 has been tested on multiple architectures to make sure it
 136         can successfully generate consistent results. Currently the
 137         supported architectures are: AMD64, ARM64, i386, PPC64LE, and
 138         S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
 139         #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
 140         and #370. The lesson is that if the project is built on Cython and
 141         is aimed at memory efficiency, we should specifically define all
 142         int/float types in pyx files such as int8_t or uint32_t using
 143         either libc or numpy (c version) instead of relying on Cython
 144         types such as short, long, double.
 145
 146         2) MACS2 setup script will check numpy and install numpy if
 147         necessary. PR #378, issue #364
 148
 149         3) `bdgbroadcall` command will correctly add the score column (5th
 150         column). The score (5th) column contains 10 times the average
 151         score in the broad region. PR #373, issue #362
 152
 153         4) The missing test on `bdgopt` subcommand has been added. PR #363
 154
 155         5) The obsolete option `--ratio` from `callpeak` subcommand has
 156         been removed. PR #369, issue #366
 157
 158         6) Fixed the incorrect description in README on the 'maximum
 159         length of broad region is 4 times of d' to 'maximum gap for
 160         merging broad regions is 4 times of tag size by default'. PR #380,
 161         issue #365.
 162
 163         * Other
 164
 165         1) CODE OF CONDUCT document has been added to MACS2 github
 166         repository. PR #358
 167
 168 2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
 169         MACS version 2.2.6
 170
 171         * New Features
 172
 173         1) Speed up MACS2. Some programming tricks and code cleanup. The
 174         filter_dup function replaces separate_dups. The latter was
 175         implemented for potentially putting back duplicate reads in
 176         certain downstream analysis. However such analysis hasn't been
 177         implemented. Optimize the speed of writing bedGraph
 178         files. Optimize BAM and BAMPE parsing with pointer casting instead
 179         of python unpack.
 180
 181         2) The comment lines in the headers of BED or SAM files will be
 182         correctly skipped. However, MACS2 won't check comment lines in the
 183         middle of the file.
 184
 185         * Bugs fixed
 186
 187         1) Cutoff-analysis in callpeak command. #341
 188
 189         2) Issues related to SAMParser and three ELAND Parsers are
 190         fixed. #347
 191
 192         * Other
 193
 194         1) cmdlinetest script in test/ folder has been updated to: 1. test
 195         cutoff-analysis with callpeak cmd; 2. output the 2 lines before
 196         and after the error or warning message during tests; 3. output
 197         only the first 10 lines if the difference between test result and
 198         standard result can be found; 4. prockreport monitor CPU time and
 199         memory usage in 1 sec interval -- a bit more accurate.
 200
 201         2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
 202
 203 2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
 204         MACS version 2.2.5 (Py3 speed up)
 205
 206         * Features added
 207
 208         1) *Github code only and Not included in MACS2 release* New
 209         testing data for performance test. A subsampled ENCODE2 CTCF
 210         ChIP-seq dataset, including 5 million ChIP reads and 5 million
 211         control reads, has been included in the test folder for testing
 212         CPU and memory usage (i.e. 5M test). Several related scripts,
 213         including `prockreport` for reporting CPU and memory usage, `pyprofile`
 214         and `pyprofile_stat` for debugging and profiling MACS2 code, have
 215         been included.
 216
 217         2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
 218         The old hashtable.pyx implementation copied from Pandas (very old
 219         version) doesn't work well in Python3+Cython. It slows down the
 220         pqtable checkup using the identical Cython codes as in
 221         v2.1.4. While running 5M test, the `__getitem__` function in the
 222         hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
 223         148.6s with the same number of calls in MACS2 v2.2.4. As a
 224         consequence, the standard python dictionary implementation has
 225         replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
 226         faster than py2 version, but uses a bit more memory. In general,
 227         v2.2.5 can finish 5M reads test in 20% less time than MACS2
 228         v2.1.4, but use 15% more memory.
 229
 230         * Bug fixed
 231
 232         1) More Python3 related fixes, e.g. the return value of keys from
 233         py3 dict. #333 #337
 234
 235
 236 2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
 237         MACS version 2.2.4 (Python3)
 238
 239         * Features added
 240
 241         1) First Python3 version MACS2 released.
 242
 243         2) Version number 2.2.X will be used for MACS2 in Python3, in
 244         parallel to 2.1.X.
 245
 246         3) More comprehensive test.sh script to check the consistency of
 247         results from Python2 version and Python3 version.
 248
 249         4) Simplify setup.py script since the newest version transparently
 250         supports cython. And when cython is not installed by the user,
 251         setup.py can still compile using only C codes.
 252
 253         5) Fix Signal.pyx to use np.array instead of np.mat.
 254
 255 2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
 256         MACS version 2.1.4
 257
 258         * Features added
 259
 260         Github Actions is used together with Travis CI for testing and
 261         deployment.
 262
 263         * Bugs fixed
 264
 265         PR #322:
 266
 267         1) #318 Random score in bdgdiff output. It turns out the sum_v is
 268         not initialized as 0 before adding. Potential bugs are fixed in
 269         other functions in ScoreTrack and CallPeakUnit codes.
 270
 271         2) #321 Cython dependency in setup.py script is removed, and the
 272         'cythonize' call is placed in the correct position.
 273
 274         3) A typo is fixed in Github Actions script.
 275
 276 2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
 277         MACS version 2.1.3.3
 278
 279         * Features added
 280
 281         1) Support Docker auto-deploy. PR #309
 282
 283         2) Support Travis CI auto-testing, update unit-testing
 284         scripts, and enable subcommand testing on small datasets.
 285
 286         3) Update README documents. #297 PR #306
 287
 288         4) `cmbreps` supports more than 2 replicates. Merged from PR #304
 289         @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
 290
 291         5) `--d-min` option is added in `callpeak` and `predictd`, to
 292         exclude predictions of fragment size smaller than the given
 293         value. Merged from PR #267 @shouldsee.
 294
 295         6) `--buffer-size` option is added in `predictd`, `filterdup`,
 296         `pileup` and `refinepeak` subcommands. Users can use this option
 297         to decrease memory usage while there are a large number of contigs
 298         in the data. Also, now `callpeak`, `predictd`, `filterdup`,
 299         `pileup` and `refinepeak` will suggest users to tweak
 300         `--buffer-size` while catching a MemoryError. #313 PR #314
 301
 302         * Bugs fixed
 303
 304         1) #265 Fixed a bug where the pseudocount hasn't been applied
 305         while calculating p-value score in ScoreTrack object.
 306
 307         2) Fixed bdgbroadcall so that it will report those broad peaks
 308         without strong peak inside, a consistent behavior as `callpeak
 309         --broad`.
 310
 311         3) Rename COPYING to LICENSE.
 312
 313 2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
 314         MACS version 2.1.2
 315
 316         * New features
 317
 318         1) Added missing BEDPE support. And enable the support for BAMPE
 319         and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
 320         subcommands. When format is BAMPE or BEDPE, the 'pileup' command
 321         will pile up the whole fragment defined by mapping locations of
 322         the left end and right end of each read pair. Thank @purcaro
 323
 324         2) Added options to callpeak command for tweaking max-gap and
 325         min-len during peak calling. Thank @jsh58!
 326
 327         3) The callpeak option "--to-large" is replaced with
 328         "--scale-to large".
 329
 330         4) The randsample option "-t" has been replaced with "-i".
 331
 332         * Bug fixes
 333
 334         1) Fixed memory issue related to #122 and #146
 335
 336         2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
 337
 338         3) Fixed a bug while setting commandline qvalue cutoff.
 339
 340         4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
 341
 342         5) Fixed the calculation of average fragment length for paired-end
 343         data. Thank @jsh58
 344
 345         6) Fixed bugs caused by khash while computing p/q-value and log
 346         likelihood ratios. Thank @jsh58
 347
 348         7) More spelling tweaks in source code. Thank @mr-c
 349
 350 2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
 351         MACS version 2.1.1 20160309
 352
 353         * Retire the tag:rc.
 354
 355         * Fixed spelling. Merged pull request #120. Thank @mr-c!
 356
 357         * Change filtering criteria for reading BAM/SAM files
 358
 359         Related to callpeak and filterdup commands. Now the
 360         reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
 361         still be read although MACS2 may decide they are duplicates
 362         later. Related to old issue #33. Sorry I forgot to address it for
 363         years!
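
             As background for the flag value above: the SAM FLAG field is a
             bit field in which 1024 (0x400) marks a PCR or optical duplicate
             and 4 (0x4) marks an unmapped read, so 1028 is both bits set.
             The helper below is a hypothetical illustration of testing that
             bit, not a function from MACS2.

```python
FLAG_UNMAPPED = 0x4      # 4: segment unmapped
FLAG_DUPLICATE = 0x400   # 1024: PCR or optical duplicate


def is_flagged_duplicate(flag):
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & FLAG_DUPLICATE)


def is_unmapped(flag):
    """Return True if the unmapped bit is set in a SAM FLAG value."""
    return bool(flag & FLAG_UNMAPPED)


# 1028 = 1024 + 4, so both bits are set.
both = is_flagged_duplicate(1028) and is_unmapped(1028)
```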
 364
 365 2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
 366         MACS version 2.1.1 20160226 (tag:rc Zhengyue)
 367
 368         * Bug fixes
 369
 370         1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
 371         the former option is not supported by older GCC. Related to issues
 372         #91, #109.
 373
 374         2) Issue #108 is fixed. If no peak can be found in a chromosome,
 375         the PeakIO won't throw an error.
 376
 377         * New features
 378
 379         1) callpeak
 380
 381         a) A more flexible format, BEDPE, is supported. Now users can
 382         define the left and right position of the ChIPed fragment, and
 383         MACS2 will skip model building and directly pileup the
 384         fragments. Related to issue #112.
 385
 386         b) The 'tempdir' can be specified, to save cached pileup
 387         tracks. Originally, the temporary files were stored in
 388         /tmp. Thank @daler! Related to issues #97 and #105.
 389
 390         2) bdgopt
 391
 392         New operations are added, to calculate the maximum or minimum value between
 393         values in BEDGRAPH and given value.
 394
 395         3) bdgcmp
 396
 397         New method is added, to calculate the maximum value between values
 398         defined in two BEDGRAPH files.
 399
 400 2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
 401         MACS version 2.1.0 20151222 (tag:rc Dongzhi)
 402
 403         * Bug fixes
 404
 405         1) Fix a bug while dealing with some chromosomes only containing
 406         one read (pair). The size of dup_plus/dup_minus arrays after
 407         filtering dups should +1.
 408
 409         2) Fix a bug related to the broad peak calling function in
 410         previous versions. The gaps were miscalculated, so segmented weak
 411         broad calls may be reported, and sometimes you would see peaks
 412         with lower than cutoff values in the output files.
 413
 414         3) "Potentially" Fixed issue #105 on temporary cache files, need
 415         further followup.
 416
 417
 418 2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
 419         MACS version 2.1.0 20150731 (tag:rc)
 420
 421         * Bug fixes
 422
 423         1) Fixed issue #76: information about broad/narrow cutoff will be
 424         correctly displayed.
 425
 426         2) Fixed issue #79: bdgopt extparam option is fixed.
 427
 428         3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
 429         for filterdup command.
 430
 431         4) Fixed issue #78, #88 and similar issue reported in MACS google
 432         group: MACS2 now can correctly deal with multiple alignment files
 433         for -t or -c. The 'finalize' function will be correctly
 434         called. Multiple files option is enabled for filterdup,
 435         randsample, predictd, pileup and refinepeak commands.
 436
 437         5) A related issue to #88, when BAMPE mode is used, PE pairs will
 438         be sorted by leftmost then rightmost ends.
 439
 440         6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
 441         array. This will cause 'callpeak --nolambda' hang forever while
 442         calculating pvalues and qvalues.
 443
 444 2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
 445         MACS version 2.1.0 20150420 (tag:rc)
 446
 447         * New commands
 448
 449         1) bdgopt: some convenient functions to modify bedGraph files.
 450
 451         2) cmbreps: Combine scores from two replicates. Including three
 452         methods: 1. take the maximum; 2. take the average; 3. use Fisher's
 453         method to combine two p-value scores. After that, user can use
 454         bdgpeakcall to call peaks on combined scores.
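
             For reference, Fisher's method for two p-values has a closed
             form, sketched below. The statistic X = -2 * (ln p1 + ln p2)
             follows a chi-square distribution with 4 degrees of freedom
             under the null hypothesis, whose upper tail is
             exp(-X/2) * (1 + X/2). The function name is an assumption for
             the example, and it takes plain p-values; how cmbreps encodes
             its scores is not covered here.

```python
import math


def fisher_combine(p1, p2):
    """Combine two p-values (0 < p <= 1) with Fisher's method."""
    x = -2.0 * (math.log(p1) + math.log(p2))
    # Upper tail of a chi-square distribution with 4 degrees of freedom.
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


combined = fisher_combine(0.05, 0.05)
```

             Two replicates that each reach p = 0.05 combine to roughly
             0.0175, so agreement between replicates strengthens the
             evidence.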
 455
 456         * New features
 457
 458         1) callpeak and bdgpeakcall now can try to analyze the
 459         relationship between p-values and number/length of peaks then
 460         generate a summary to help users decide an appropriate cutoff.
 461
 462         2) callpeak now can accept fold-enrichment cutoff as a filter for
 463         final peak calls.
 464
 465         * Performance
 466
 467         Now MACS2 runs about 3X as fast as previous version. Trade
 468         clean python codes for speed... Now while processing 50M ChIP vs
 469         50M control, it will take only 10 minutes.
 470
 471         * Bug fixes
 472
 473         1) Sampling function in BAMPE mode.
 474
 475         2) Callpeak while there are >= 2 input files for -t or -c.
 476
 477         3) While reading BAM/SAM, those secondary or supplementary
 478         alignments will be correctly skipped.
 479
 480         4) Fixed issue #33: Explanation is added to callpeak --keep-dup
 481         option that MACS2 will discard those SAM/BAM alignments with bit
 482         1024 no matter how --keep-dup is set.
 483
 484         5) Fixed issue #49: setuptools is used instead of distutils
 485
 486         6) Fixed issue #51: fix the problem when using --trackline
 487         argument when control file is absent.
 488
 489         7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of
 490         a read mapped to the minus strand. The previous implementation
 491         found an incorrect 5' end if there was an indel in the alignment.
 492
 493         8) Fixed issue #56: An incorrect sorting method used for BAMPE
 494         mode which will cause incorrect filtering of duplicated reads. Now
 495         fixed.
 496
 497         9) Issue #63: Merged from jayhesselberth@github, extsize now can
 498         be 1.
 499
 500         10) Issue #71: Merged from aertslab@github, close file descriptor
 501         after creating them with mkstemp().
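
             The CIGAR-based fix in item 7 can be illustrated as follows.
             This is a sketch rather than the MACS2 parser code: it assumes
             a 0-based leftmost position and reports the minus-strand 5' end
             as the exclusive right end of the alignment. Only the CIGAR
             operations M, D, N, = and X consume reference bases, so
             insertions and deletions change the answer relative to simply
             adding the read length.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_length(cigar):
    """Number of reference bases spanned by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar)
               if op in REF_CONSUMING)


def five_prime_end(pos, cigar, is_reverse):
    """5' end of a read: leftmost position on the plus strand,
    exclusive right end of the alignment on the minus strand."""
    if not is_reverse:
        return pos
    return pos + reference_length(cigar)


plain = five_prime_end(100, "50M", True)          # 150
with_del = five_prime_end(100, "20M5D30M", True)  # 155, deletion adds 5
with_ins = five_prime_end(100, "20M5I25M", True)  # 145, insertion adds 0
```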
 502
 503 2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
 504         MACS version 2.1.0 20140616 (tag:rc)
 505
 506         * callpeak module
 507
 508         "--ratio" is added to manually assign the scaling factor of ChIP
 509         vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
 510         implementing the patch file!
 511
 512         "--shift" is added to move cutting ends (5' end of reads) around,
 513         in order to process DNAse-Seq data, e.g., use "--shift -100
 514         --extsize 200" to get 200bps fragments around 5' ends. For general
 515         ChIP-Seq data analysis, this option should be always set as
 516         0. Thank Xi Chen and Anshul Kundaje for the discussions in user
 517         group!
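
             The arithmetic behind "--shift -100 --extsize 200" can be
             sketched like this. The function is a hypothetical
             illustration, not MACS2 code: it assumes the 5' end is first
             moved by `shift` in the read's own 5'->3' direction and then
             extended by `extsize` in that same direction, mirrored for
             minus-strand reads.

```python
def fragment(five_prime, strand, shift, extsize):
    """Return the (start, end) fragment for a read's 5' cutting end."""
    if strand == "+":
        start = five_prime + shift
        return start, start + extsize
    # Minus strand: 5'->3' runs toward smaller coordinates.
    end = five_prime - shift
    return end - extsize, end


# A 200 bp window centered on the cutting end at position 1000.
plus_window = fragment(1000, "+", -100, 200)   # (900, 1100)
minus_window = fragment(1000, "-", -100, 200)  # (900, 1100)
```

             With a shift of 0 the same function gives the usual ChIP-Seq
             extension, (1000, 1200) for a plus-strand read and (800, 1000)
             for a minus-strand read.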
 518
 519         ** Do not output negative fragment size from cross-correlation
 520         analysis. Thank Alvin Qin for the feedback!
 521
 522         ** --half-ext and --control-shift are removed. For complex read
 523         shifting and extending, combine '--shift' and '--extsize'
 524         options. For comparing two conditions, use 'bdgdiff' module
 525         instead.
 526
 527         ** a bug is fixed to output the last pileup value in bdg file
 528         correctly.
 529
 530         * filterdup
 531
 532         A 'dry-run' option is added to only output numbers, including the
 533         number of allowed duplicates, the total number of reads before and
 534         after filtering duplicates and the estimated duplication
 535         rate. Thank John Urban for the suggestion!
 536
 537
 538 2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
 539         MACS version 2.0.10 20131216 (tag:alpha)
 540
 541         bug fixes and tweaks
 542
 543         * We changed license from Artistic License to 3-clause BSD license.
 544
 545         Yes. Simpler the better.
 546
 547         * Process paired-end data with "-f BAMPE" without control
 548
 549         * GappedPeak output for --broad option has been fixed again to be
 550         consistent with official UCSC format. We add 1bp pseudo-block to
 551         left and/or right of broad region when necessary, so that you can
 552         visualize the regions without strong enrichment inside
 553         successfully. In downstream analysis except for visualization,
 554         you may need to remove all 1bp blocks from gappedPeak file.
 555
 556         * diffpeak subcommand is temporarily disabled. Till we
 557         re-implement it.
 558
 559 2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
 560         MACS version 2.0.10 20131028 (tag:alpha)
 561
 562         * callpeak --call-summits improvement
 563
 564         The smoothing window length has been fixed as fragment length
 565         instead of short read length. The larger smoothing window will
 566         grant better smoothing results and better sub-peak summits
 567         detection.
 568
 569         * --outdir and --ofile options for almost all commands
 570
 571         Thank Björn Grüning for initially implementing these options!
 572         Now, MACS2 will save results into a specified
 573         directory by '--outdir' option, and/or save result into a
 574         specified file by '--ofile' option. Note, in case '--ofile' is
 575         available for a subcommand, '-o' now has been adjusted to be the
 576         same as '--ofile' instead of '--o-prefix'.
 577
 578         Here is the list of changes. For more detail, use 'macs2 xxx -h'
 579         for each subcommand:
 580
 581         ** callpeak: --outdir
 582         ** diffpeak: Not implemented
 583         ** bdgpeakcall: --outdir and --ofile
 584         ** bdgbroadcall: --outdir and --ofile
 585         ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
 586         and the order of arguments for --ofile must be the same as for -m.
 587         ** bdgdiff: --outdir and --ofile
 588         ** filterdup: --outdir
 589         ** pileup: --outdir
 590         ** randsample: --outdir
 591         ** refinepeak: --outdir and --ofile
 592
 593
 594 2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
 595         MACS version 2.0.10 20130915 (tag:alpha)
 596
 597         * callpeak Added a new option --buffer-size
 598
 599         This option is to tweak a previously hidden parameter that
 600         controls the steps to increase array size for storing alignment
 601         information. While in some rare cases, the number of
 602         chromosomes/contigs/scaffolds is huge, the original default
 603         setting will cause a huge memory waste. In these cases, we
 604         recommend decreasing --buffer-size (e.g., 1000) to save memory,
 605         although the decrease will slow down reading alignment files.
 606
 607         * an optimization to speed up pvalue-qvalue statistics
 608
 609         Previously, it took an hour to prepare p-q-table for 65M vs 65M
 610         human TF library, and now it will take 10 minutes. It was due to a
 611         single line of code to get a value from a numpy array ...
 612
 613         * fixed logLR bugs.
 614
 615 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
 616         MACS version 2.0.10 20130731 (tag:alpha)
 617
 618         * callpeak --call-summits
 619
 620         Fix bugs causing callpeak --call-summits option generating extra
 621         number of peaks and inconsistent peak boundaries comparing to
 622         default option. Thank Ben Levinson!
 623
 624         * bdgcmp output
 625
 626         Fix bugs causing bdgcmp output logLR all in positive values. Now
 627         'depletion' can be correctly represented as negative values.
 628
 629         * bdgdiff
 630
 631         Fix the behavior of bdgdiff module. Now it can take four
 632         bedGraph files, then use logLR as cutoff to call differential
 633         regions. Check command line of bdgdiff for detail.
 634
 635 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
 636         MACS version 2.0.10 20130713 (tag:alpha)
 637
 638         * fix bugs while output broadPeak and gappedPeak.
 639
 640         Note. Those weak broad regions without any strong enrichment
 641         regions inside won't be saved in gappedPeak file.
 642
 643         * bdgcmp -T and -C are merged into -S and description is updated.
 644
 645         Now, you can use it to override SPMR values in your input for
 646         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
 647         statistics will cause weird results (in most cases, lower
 648         significance), and won't be consistent with MACS2 callpeak
 649         behavior. So if you have SPMR bedGraphs, input the smaller/larger
 650         sample size in MILLION according to 'callpeak --to-large' option.
 651
 652 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
 653         MACS version 2.0.10 20130710 (tag:alpha)
 654
 655         * fix BED style output format of callpeak module:
 656
 657         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
 658         the output. Old BED format file won't be saved.
 659
 660         2) with --broad: broadPeak (BED6+3) for broad region and
 661         gappedPeak (BED12+3) for chained enriched regions will be the
 662         output. Old BED format, narrowPeak format, summit file won't be
 663         saved.
 664
 665         * bdgcmp now can accept list of methods to calculate scores. So
 666         you can run it once to generate multiple types of scores. Thank
 667         John Urban for this suggestion!
 668
 669         * C codes are re-generated through Cython 0.19.1.
 670
 671 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
 672         MACS version 2.0.10 20130520 (tag:alpha)
 673
 674         * broad peak calling modules are modified in order to report all
 675         relaxed regions even if there is no strong enrichment inside.
 676
 677 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
 678         MACS version 2.0.10 20130501 (tag:alpha)
 679
 680         * Memory usage is decreased to about 1/4-1/5 of previous usage
 681         Now, the internal data structure and algorithm are both
 682         re-organized, so that intermediate data wouldn't be saved in
 683         memory. Instead they will be calculated on the fly. New MACS2 will
 684         spend longer time (1.5 to 2 times) however it will use less memory
 685         so can be more usable on small mem servers.
 686
 687         * --seed option is added to callpeak and randsample commands
 688         Thank Mathieu Gineste for this suggestion!
 689
 690 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
 691         MACS version 2.0.10 20130306 (tag:alpha)
 692
 693         * diffpeak module New module to detect differential binding sites
 694         with more statistics.
 695
 696         * Introduced --refine-peaks
 697         Calculates reads balancing to refine peak summits
 698
 699         * Output file names prefix
 700         Correct encodePeak to narrowPeak, broadPeak to bed12.
 701
 702 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
 703         MACS version 2.0.10 (tag:alpha not released)
 704
 705         * Introduced BAMPEParser
 706         Reads PE data directly, requires bedtools for now
 707
 708         * Introduced --call-summits
 709         Uses signal processing methods to call overlapping peaks
 710
 711         * Added --no-trackline
 712         By default, files have descriptive tracklines now
 713
 714         * new refinepeak command (experimental)
 715         This new function will use a similar method in SPP (wtd), to
 716         analyze raw tag distribution in peak region, then redefine the
 717         peak summit where plus and minus tags are evenly distributed
 718         around.
 719
 720         * Changes to output *
 721         cPeakDetect.pyx has full support for new print/write methods and
 722         --call-peaks, BAMPEParser, and use of paired-end data
 723
 724         * Parser optimization
 725
 726         cParser.pyx is rewritten to use io.BufferedReader to speed
 727         up. Speed is doubled.
 728
 729         Code is reorganized -- most of functions are inherited from
 730         GenericParser class.
 731
 732         * Use cross-correlation to calculate fragment size
 733
 734         First, all pairs will be used in prediction for fragment
 735         size. Previously, only no more than 1000 pairs are used. Second,
 736         cross-correlation is used to find the best phase difference
 737         between + and - tag pileups.
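
             A minimal sketch of the cross-correlation idea, assuming the
             + and - strand pileups are already available as equal-length
             lists. The function name and the toy data are hypothetical and
             are not taken from the MACS sources.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift d (0..max_shift) that maximises the
    cross-correlation sum(plus[i] * minus[i + d])."""
    best_d, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        # zip() stops at the shorter sequence, so minus[d:] bounds the sum.
        score = sum(p * m for p, m in zip(plus, minus[d:]))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


# Toy pileups: the minus-strand bump sits 4 positions right of the plus one.
plus_pileup = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0, 0, 0]
minus_pileup = [0, 0, 0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
shift = best_shift(plus_pileup, minus_pileup, max_shift=8)
```

             The shift with the highest score is taken as the phase
             difference between the two strands.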
 738
 739         * Speed up p-value and q-value calculation
 740
 741         This part is ten times faster now. I am using a dictionary to
 742         cache p-value results from Poisson CDF function. A bit more memory
 743         will be used to increase speed. I hope this dictionary would not
 744         explode since the possible pairs of ChIP signal and control lambda
 745         are hugely redundant. Also, I rewrote part of the q-value
 746         calculation.
 747
 748         * Speed up peak detection
 749
 750         This part is about a hundred times faster now.  Optimizations
 751         include using Numpy functions as much as possible, and making loop
 752         body as small as possible.
 753
 754         * Post-processing on differential calls
 755
 756         After macs2diff finds differential binding sites between two
 757         conditions, it will try to annotate the peak calls from one of two
 758         conditions, describe the changes ...
 759
 760         * Fragment size prediction in macs2diff
 761
 762         Now by default, macs2diff will try to use the average fragment
 763         size from both condition 1 and condition 2 for tag extension and
 764         peak calling. Previously, by default, it will use different sizes
 765         unless --nomodel is specified.
 766
 767         Technically, I separate model building processes out. So macs2diff
 768         will build fragment sizes for condition 1 and 2 in parallel (2
 769         processes maximum), then perform 4-way comparisons in parallel (4
 770         processes maximum).
 771
 772         * Diff score
 773
 774         Combine two p/qscore tracks together. At regions where condition 1
 775         is higher than condition 2, score would be positive, otherwise,
 776         negative.
 777
 778         * SAMParser and BAMParser
 779
 780         Bug fixed for paired-end sequencing data.
 781
 782         * BedGraph.pyx
 783
 784         Fixed a bug while calling peaks from BedGraph file. It previously
 785         mistakenly output same peaks multiple times at the end of
 786         chromosome.
 787
 788 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
 789         MACS version 2.0.9 (tag:alpha)
 790
 791         * Auto fixation on predicted d is turned off by default!
 792
 793         Previous --off-auto is now default. MACS will not automatically
 794         fix d less than 2 times of tag size according to
 795         --shiftsize. While tag size is getting longer nowadays, it would
 796         be easier to have d less than 2 times of tag size, however d may
 797         still be meaningful and useful. Please judge it using your own
 798         wisdom.
 799
 800         * Scaling issue
 801
 802         Now, the default scaling while treatment and input are unbalanced
 803         has been adjusted. By default, larger sample will be scaled down
 804         linearly to match the smaller sample. In this way, background
 805         noise will be reduced more than real signals, so we expect to have
 806         more specific results than the other way around (i.e. --to-large
 807         is set).
 808
 809         Also, an alternative option to randomly sample larger data
 810         (--down-sample) is provided to replace default linear
 811         scaling. However, this option will make results irreproducible,
 812         so be careful.
 813
 814         * randsample script
 815
 816         A new script 'randsample'  is added, which can randomly sample
 817         certain percentage or number of tags.
 818
 819         * Peak summit
 820
 821         Now, MACS will decide peak summits according to pileup height
 822         instead of qvalue scores. In this way, the summit may be more
 823         accurate.
 824
 825         * Diff score
 826
 827         MACS calculates qvalue scores as differential scores. When comparing
 828         two conditions (say A and B), the maximum qscore for comparing
 829         A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
 830         will be computed. If maxqscore_a2b is bigger, the diff score is
 831         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
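
             The sign rule above can be written as a small function. The
             function name is hypothetical; the two arguments follow the
             names used in this entry.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive when A vs B dominates,
    negative otherwise (ties fall into the 'otherwise' branch)."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```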
 832
 833 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 834         MACS version 2.0.8 (tag:alpha)
 835
 836         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 837
 838         New script bdgbroadcall and the extra option '--broad' for macs2
 839         script, can be used to call broad regions with a loose cutoff to
 840         link nearby significant regions. The output is represented as
 841         BED12 format.
 842
 843         * MACS2/IO/cScoreTrack.pyx
 844
 845         Fix q-value calculation to generate forcefully monotonic values.
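
             One common way to force monotonic q-values is to carry a
             running minimum over the Benjamini-Hochberg estimates. The
             sketch below assumes that approach; it is not taken from
             cScoreTrack.pyx, and the function name is hypothetical.

```python
def monotonic_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be non-decreasing in p."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, carrying the smallest q seen so far.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues
```

             Without the running minimum, a larger p-value could receive a
             smaller q-value than one ranked before it.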
 846
 847         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 848
 849         They are combined to one more powerful script called
 850         "filterdup". The script filterdup can filter duplicated reads
 851         according to sequencing depth and genome size. The script can also
 852         convert any format supported by MACS to BED format.
 853
 854 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 855         MACS version 2.0.7 (tag:alpha)
 856
 857         * bin/macsdiff renamed to bin/bdgdiff
 858
 859         Now this script will work as a low-level finetuning tool as bdgcmp
 860         and bdgpeakcall.
 861
 862         * bin/macs2diff
 863
 864         A new script to take treatment and control files from two
 865         conditions, calculate fragment size, use local poisson to get
 866         pvalues and BH process to get qvalues, then combine 4-ways result
 867         to call differential sites.
 868
 869         This script can use up to 4 cpus to speed up 4-ways calculation. (
 870         I am trying multiprocessing in python. )
 871
 872         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 873         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 874         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 875
 876         All above files are modified for the new macs2diff script.
 877
 878         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 879
 880         Now q-value 0.01 is the default cutoff. If -p is specified,
 881         p-value cutoff will be used instead.
 882
 883 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 884         MACS version 2.0.6 (tag:alpha)
 885
 886         * bin/macsdiff
 887
 888         A script to call differential regions. A naive way is introduced
 889         to find the regions where:
 890
 891         1. signal from condition 1 is larger than input 1 and condition 2 --
 892         unique region in condition 1;
 893         2. signal from condition 2 is larger than input 2 and condition 1
 894         -- unique region in condition 2;
 895         3. signal from condition 1 is larger than input 1, signal from
 896         condition 2 is larger than input 2, however either signal from
 897         condition 1 or 2 is not larger than the other.
 898
 899         Here 'larger' means the pvalue or qvalue from a Poisson test is
 900         under certain cutoff.
 901
 902         (I will make another script to wrap up multiple scripts for
 903         differential calling)
 904
 905 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 906         MACS version 2.0.5 (tag:alpha)
 907
 908         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 909         MACS2/IO/cPeakIO.pyx
 910
 911         Use hash to store peak information. Add back the feature to deal
 912         with data without control.
 913
 914         Fix bug which incorrectly allows small peaks at the end of
 915         chromosomes.
 916
 917         * bin/bdgpeakcall, bin/bdgcmp
 918
 919         Fix bugs. bdgpeakcall can output encodePeak format.
 920
 921 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
 922         MACS version 2.0.4 (tag:alpha)
 923
 924         * cPeakDetect.py
 925
 926         Fix a bug, correctly assign lambda_bg while --to-small is
 927         set. Thanks Junya Seo!
 928
 929         Add rank and num of bp columns to pvalue-qvalue table.
 930
 931         * cScoreTrack.py
 932
 933         Fix bugs to correctly deal with peakless chromosomes. Thanks
 934         Vaibhav Jain!
 935
 936         Use AFDR for independent tests instead.
 937
 938         * encodePeak
 939
 940         Now MACS can output peak coordinates together with pvalue, qvalue,
 941         summit positions in a single encodePeak format (designed for
 942         ENCODE project) file. This file can be loaded to UCSC
 943         browser. Definition of some specific columns are: 5th:
 944         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
 945         -log10qvalue, 10th: relative summit position to peak start.
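
             The column definitions above can be illustrated with a small
             formatter. The function name is hypothetical, and the "." in
             the 6th (strand) column is an assumption, since this entry
             does not define that column.

```python
def encode_peak_line(chrom, start, end, name,
                     fold_change, mlog10p, mlog10q, summit):
    """Build one tab-separated encodePeak line.

    `summit` is an absolute coordinate; the 10th column stores it
    relative to the peak start.
    """
    score = int(mlog10p * 10)          # 5th column
    fields = [chrom, start, end, name, score,
              ".",                     # 6th column: assumed placeholder
              fold_change,             # 7th column
              mlog10p,                 # 8th column
              mlog10q,                 # 9th column
              summit - start]          # 10th column
    return "\t".join(str(f) for f in fields)


line = encode_peak_line("chr1", 100, 300, "peak_1", 4.5, 12.34, 9.87, 180)
```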
 946
 947
 948 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 949         MACS version 2.0.3 (tag:alpha)
 950
 951         * Rich output with qvalue, fold enrichment, and pileup height
 952
 953         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
 954         procedure:
 955
 956         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
 957
 958         Now we have a similar xls output file as before. The differences
 959         from previous file are:
 960
 961         1. Summit now is absolute summit, instead of relative summit
 962            position;
 963         2. 'Pileup' is previous 'tag' column. It's the extended fragment
 964            pileup at the peak summit;
 965         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
 966            5.00 means 1e-5, simple and less confusing.
 967         4. FDR column becomes '-log10(qvalue)' column.
 968         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
 969            the values at the peak summit.
 970
 971         * Extra output files
 972
 973         NAME_pqtable.txt contains pvalue and qvalue relationships.
 974
 975         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
 976         and -log10qvalue scores in BedGraph format. Nearby regions with
 977         the same value are not merged.
 978
 979         * Separation of FeatIO.py
 980
 981         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
 982         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
 983         implemented to store pileup, local lambda, pvalue, and qvalue
 984         altogether in cScoreTrack.pyx.
 985
 986         * Experimental option --half-ext
 987
 988         Suggested by NPS algorithm, I added an experimental option
 989         --half-ext to let MACS only extends ChIP fragment around its
 990         middle point for only 1/2 d.
 991
 992 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
 993         MACS version 2.0.2 (tag:alpha)
 994
 995         * macs2
 996
 997         Add an error check to see if there is no common chromosome names
 998         from treatment file and control file
 999
1000         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1001
1002         Reduce memory usage by removing deepcopy() calls.
1003
1004         * Modify README documents and others.
1005
1006 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1007         MACS Version 2.0.1 (tag:alpha)
1008
1009         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1010
1011         Jie suggested to me a brilliantly simple method to pile up
1012         fragments into a bedGraph track. It works much faster than the
1013         previous function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1014         large local lambda calculation in MACSv2 now. Now I generate three
1015         bedGraphs for d-size local bias, slocal-size and llocal-size local
1016         bias, and calculate the maximum local bias as local lambda
1017         bedGraph track.
1018
1019         Minor: add_loc in bedGraphTrackI now can correctly merge the
1020         region with its preceding region if their values are the same.
1021
1022         * macs2
1023
1024         Add an option to shift control tags before extension. By default,
1025         control tags will be extended to both sides regardless of strand
1026         information.
1027
1028 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1029         MACS Version 2.0.0 (tag:alpha)
1030
1031         * Use bedGraph type to store data internally and externally.
1032
1033         We can have theoretically one-basepair resolution profiles. 10
1034         times smaller in filesize and even smaller after converting to
1035         bigWig for visualization.
1036
1037         * Peak calling process modified. Better peak boundary detection.
1038
1039         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1040         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1041         one will be averaged to d size) Then calculate the maximum value
1042         of these two tracks and a global background, to have a
1043         local-lambda bedGraph.
1044
1045         Use -10log10poisson_pvalue as scores to generate a score track
1046         before peak calling.
1047
1048         A general peak calling based on a score cutoff, min length of peak
1049         and max gap between nearby peaks.
1050
1051         * Option changes.
1052
1053         Wiggle file output is removed. Now we only support bedGraph
1054         output. The generation of bedGraph is highly recommended since it
1055         will not cost extra time. In other words, bedGraph generation is
1056         internally run even if you don't want to save bedGraphs on disk, due
1057         to the peak calling algorithm in MACS v2.
1058
1059         * cProb.pyx
1060
1061         We now can calculate poisson pvalue in log space so that the score
1062         (-10*log10pvalue) will not have an upper limit of 3100 due to
1063         the precision of floating-point numbers.
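
             A sketch of a log-space tail calculation, to show why the
             score no longer saturates. It is not the cProb.pyx code: the
             function name, the P(X > k) convention and the term limit are
             assumptions.

```python
import math


def log10_poisson_sf(k, lam, max_terms=10000):
    """log10 of P(X > k) for X ~ Poisson(lam), summed in log space."""
    # Natural log of the first tail term, P(X = k + 1).
    i = k + 1
    log_term = -lam + i * math.log(lam) - math.lgamma(i + 1)
    log_sum = log_term
    for _ in range(max_terms):
        i += 1
        # log P(X = i) from log P(X = i - 1).
        log_term += math.log(lam) - math.log(i)
        # log(exp(a) + exp(b)) without leaving log space.
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        new_sum = hi + math.log1p(math.exp(lo - hi))
        if new_sum == log_sum:
            break  # further terms no longer change the sum
        log_sum = new_sum
    return log_sum / math.log(10)


score = -10 * log10_poisson_sf(5000, 1.0)
```

             Here the p-value itself would underflow to zero as a float,
             but its logarithm, and hence the score, stays finite.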
1064
1065         * Cython is adopted to speed up Python code.
1066
1067 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1068         Small fixes
1069
1070         * Replaced with a newest WigTrackI class and fixed the wignorm script.
1071
1072 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1073         Version 1.4.0rc2 (Valentine)
1074
1075         * --single-wig option is renamed to --single-profile
1076
1077         * BedGraph output with --bdg or -B option.
1078
1079         The BedGraph output provides 1bp resolution fragment pileup
1080         profile. File size is smaller than wig file. This option can be
1081         combined with --single-profile option to produce a bedgraph file
1082         for the whole genome. This option can also make --space,
1083         --call-subpeaks invalid.
1084
1085         * Fix the description of --shiftsize to correctly state that the
1086         value is 1/2 d (fragment size).
1087
1088         * Fix a bug in the call to __filter_w_control_tags when control is
1089         not available.
1090
1091         * Fix a bug on --to-small option. Now it works as expected.
1092
1093         * Fix a bug while counting the tags in candidate peak region, an
1094         extra tag may be included. (Thanks to Jake Biesinger!)
1095
1096         * Fix the bug for the peaks extended outside of chromosome
1097         start. If the minus strand tag goes outside of chromosome start
1098         after extension of d, it will be thrown out.
1099
1100         * Post-process script for a combined wig file:
1101
1102         The "wignorm" command can be called after a full run of MACS14 as
1103         a postprocess. wignorm can calculate the local background from the
1104         control wig file from MACS14, then use either foldchange,
1105         -10*log10(pvalue) from Poisson test, or difference after asinh
1106         transformation as the score to build a single wig track to
1107         represent the binding strength. This script will take a
1108         significantly long time to process.
1109
1110         * --wigextend has been obsoleted.
1111
1112 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1113         Version 1.4.0rc1 (Starry Sky)
1114
1115         * Duplicate reads option
1116
1117         --keep-dup behavior is changed. Now user can specify how many
1118         reads he/she wants to keep at the same genomic location. 'auto' to
1119         let MACS decide the number based on binomial distribution, 'all'
1120         to let MACS keep all reads.
1121
1122         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1123
1124         By default, MACS will now scale the smaller dataset to the bigger
1125         dataset. For instance, if IP has 10 million reads, and Input has 5
1126         million, MACS will double the lambda value calculated from Input
1127         reads while calling BOTH the positive peaks and negative
1128         peaks. This will address the issue caused by unbalanced numbers of
1129         reads from IP and Input. If --to-small is turned on, MACS will
1130         scale the larger dataset to the smaller one. So from now on, if d
1131         is fixed, then the peaks from a MACS call for A vs B should be
1132         identical to the negative peaks from a B vs A.
1133
1134 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1135         Version 1.4.0beta (summer wishes)
1136
1137         * New features
1138
1139         ** Model building
1140
1141         The default behavior in the model building step is slightly
1142         changed. When MACS can't find enough pairs to build model
1143         (implemented in alpha version) or the modeled fragment length is
1144         less than 2 times of tag length (implemented in beta version),
1145         MACS will use 2 times of --shiftsize value as fragment length in
1146         the later analysis. --off-auto can turn off this default behavior.
1147
1148         ** Redundant tag filtering
1149
1150         The IO module is rewritten. The redundant tag filtering process
1151         becomes simpler and works as promised. The maximum allowed number
1152         of tags at the exact same location is calculated from the
1153         sequencing depth and genome size using a binomial distribution,
1154         for both TREATMENT and CONTROL separately. ( previously only
1155         TREATMENT is considered ) The exact same location means the same
1156         coordinate and the same strand. Then MACS will only keep at most
1157         this number of tags at the exact same location in the following
1158         analysis. An option --keep-dup can let MACS skip the filtering and
1159         keep all the tags. However this may bring in a lot of sequencing
1160         bias, so you may get many false positive peaks.
1161
1162         ** Single wiggle mode
1163
1164         First thing to mention, this is not the score track that I
1165         described before. By default, MACS generates wiggle files for
1166         fragment pileup for every chromosome separately. When you use
1167         --single-wig option, MACS will generate a single wiggle file for
1168         all the chromosomes so you will get a wig.gz for TREATMENT and
1169         another wig.gz for CONTROL if available.
1170
1171         ** Sniff -- automatic format detection
1172
1173         Now, by default or "-f AUTO", MACS will decide the input file
1174         format automatically. Technically, it will try to read at most
1175         1000 records for the first 10 non-comment lines. If it succeeds,
1176         the format is decided. I recommend not to use AUTO and specify the
1177         right format for your input files, unless you combine different
1178         formats in a single MACS run.
1179
1180         * Options changes
1181
1182         --single-wig and --keep-dup are added. Check previous section in
1183         ChangeLog for detail.
1184
1185         -f (--format) AUTO is now the default option.
1186
1187         --slocal default: 1000
1188         --llocal default: 10000
1189
1190         * Bug fixed
1191
1192         Setup script will stop the installation if python version is not
1193         python2.6 or python2.7.
1194
1195         Local lambda calculation has been changed back. MACS will check
1196         peak_region, slocal( default 1K) and llocal (default 10K) for the
1197         local bias. The previous 200bps default caused MACS to miss
1198         some peaks where the input bias is very sharp.
1199
1200         sam2bed.py script is corrected.
1201
1202         Relative pos in xls output is fixed.
1203
1204         Parser for ELAND_export is fixed to pass some of the no match
1205         lines. And elandexport2bed.py is fixed too. ( however I can't
1206         guarantee that it works on any eland_export files. )
1207
1208 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1209         Version 1.4.0alpha2 (be smarter)
1210
1211         * Options changes
1212
1213         --gsize now provides shortcuts for common genomes, including
1214         human, mouse, C. elegans and fruitfly.
1215
1216         --llocal now will be 5000 bps if there is no input file, so that
1217         local lambda doesn't overkill enriched binding sites.
1218
1219 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1220         Version 1.4alpha (be smarter)
1221
1222         * Options changes
1223
1224         --tsize option is redesigned. MACS will use the first 10 lines of
1225         the input to decide the tag size. If user specifies --tsize, it
1226         will override the auto decided tsize.
1227
1228         --lambdaset is replaced by --slocal and --llocal which mean the
1229         small local region and large local region.
1230
1231         --bw has no effect on the scan-window size now. It only affects the
1232         paired-peaks model process.
1233
1234         * Model building
1235
1236         During the model building, MACS will pick out the enriched regions
1237         which are not too high and not too low to build the paired-peak
1238         model. Default the region is from fold 10 to fold 30. If MACS
1239         fails to build the model, by default it will use the nomodel
1240         settings, like shiftsize=100bps, to shift and extend each
1241         tag. This behavior can be turned off by '--off-auto'.
1242
1243         * Output files
1244
1245         An extra file including all the summit positions is saved as
1246         *_summits.bed. An option '--call-subpeaks' will invoke
1247         PeakSplitter developed by Mali Salmon to split wide peaks into
1248         smaller subpeaks.
1249
1250         * Sniff ( will in beta )
1251
1252         Automatically recognize the input file format, so users can combine
1253         different formats in one MACS run.
1254
1255         Not implemented features/TODO:
1256
1257         * Algorithms ( in near future? )
1258
1259         MACS will try to refine the peak boundaries by calculating the
1260         scores for every point in the candidate peak regions. The score
1261         will be the -10*log(10,pvalue) on a local poisson distribution. A
1262         cutoff specified by users (--pvalue) will be applied to find the
1263         precise sub-peaks in the original candidate peak region. Peak
1264         boundaries and peak summit positions will be saved in separate BED
1265         files.
1266
1267         * Single wiggle track ( in near future? )
1268
1269         A single wiggle track will be generated to save the scores within
1270         candidate peak regions in the 10bps resolution. The wiggle file
1271         is in fixedStep format.
1272
1273
1274 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1275         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1276
1277         * bin/Constants.py
1278
1279         Fixed typo. FCSTEP -> FESTEP
1280
1281         * lib/PeakDetect.py
1282
1283         The 'femax' attribute bug is fixed
1284
1285 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1286         Version 1.3.7 (Oktoberfest)
1287
1288         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1289
1290         Enhancements by Peter Chines:
1291
1292         1. gzip files are supported.
1293         2. when --diag is on, user can set the increment and endpoint for
1294         fold enrichment analysis by setting --fe-step and --fe-max.
1295
1296         Enhancements by Davide Cittaro:
1297
1298         1. BAM and SAM formats are supported.
1299         2. small changes in the header lines of wiggle output.
1300
1301         Enhancements by Me:
1302         1. I added --fe-min option;
1303         2. Bowtie ascii output with suffix ".map" is supported.
1304
1305         Bug fixed:
1306
1307         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1308         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1309         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1310         4. Some "fold change" have been changed to "fold enrichment".
1311
1312 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1313         Version 1.3.6.1 (default parameter change)
1314
1315         * bin/macs, lib/PeakDetect.py
1316
1317         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1318         default. "--futurefdr" is added which can turn on the 'new' method
1319         introduced in 1.3.6. By default it's off.
1320
1321         * lib/PeakDetect.py
1322
1323         Fixed a bug. p-value is corrected a little bit.
1324
1325
1326 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1327         Version 1.3.6 (Birthday cake)
1328
1329         * bin/macs
1330
1331         "track name" is added to the header of BED output file.
1332
1333         Now the default peak detection method is to consider 5k and 10k
1334         nearby regions in treatment data and peak location, 1k, 5k, and
1335         10k regions in control data to calculate local bias. The old
1336         method can be called through '--old' option.
1337
1338         Information about how many total/unique tags in treatment or
1339         control will be saved in final .xls output.
1340
1341         * lib/IO/__init__.py
1342
1343         ".fa" will be removed from input tag alignment so only the
1344         chromosome names are kept.
1345
1346         WigTrackI class is added for Wiggle like data structure. (not used
1347         now)
1348
1349         The parser for ELAND multi PET files has been fixed. Now the 5'
1350         tag position for a pair will be kept, whereas in the previous
1351         version, the middle points are kept.
1352
1353         * lib/IO/BinKeeper.py
1354
1355         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1356         browser, which can quickly access certain region for values in a
1357         large wiggle like data file. (not used now)
1358
1359         * lib/OptValidator.py
1360
1361         typo fixed.
1362
1363         * lib/PeakDetect.py
1364
1365         Now the default peak detection method is to consider 5k and 10k
1366         nearby regions in treatment data and peak location, 1k, 5k, and
1367         10k regions in control data to calculate local bias. The old
1368         method can be called through '--old' option.
1369
1370         Two columns have been added to BED output file. 4th column: peak
1371         name; 5th column: peak score using -10log(10,pvalue) as score.
1372
1373         * setup.py
1374
1375         Add support to build a Mac App through 'setup.py py2app', or a
1376         Windows executable through 'setup.py py2exe'. You need to install
1377         py2app or py2exe package in order to use these functions.
1378
1379 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1380         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1381
1382         * PeakDetect.py
1383
1384         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1385         in control data to calculate local lambda for each peak. Peak
1386         calling results will be slightly different with previous version,
1387         beware!
1388
1389         * OptValidator.py
1390
1391         Typo fixed, ELANDParser -> ELANDResultParser
1392
1393         * OutputWriter.py
1394
1395         Now, modeled d value will be shown on the model figure.
1396
1397 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1398         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1399
1400         * macs, IO/__init__.py, PeakDetect.py
1401
1402         Add support for ELAND multi format. Add support for Pair-End
1403         experiment, in this case, 5'end and 3'end ELAND multi format files
1404         are required for treatment or control data. See 00README file for
1405         detail.
1406
1407         Add wigextend option.
1408
1409         Add petdist option for Pair-End Tag experiment, which is the best
1410         distance between 5' and 3' tags.
1411
1412         * PeakDetect.py
1413
1414         Fixed a bug which caused the end positions of every peak region
1415         to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
1416
1417         * OutputWriter.py
1418
1419         Fix bugs while generating wiggle files. The start position of
1420         wiggle file is set to 1 instead of 0.
1421
1422         Fix a bug that every 10M bps, signals in the first 'd' range are
1423         lower than actual. ( Thanks Mali Salmon!)
1424
1425
1426 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1427         Version 1.3.3 (wiggle bugs fixed)
1428
1429         * OutputWriter.py
1430
1431         Fix bugs while generating wiggle files. 1. 'span=' is added to
1432         'variableStep' line; 2. previously, every 10M bps, the coordinates
1433         were wrongly shifted to the right for 'd' basepairs.
1434
1435         * macs, PeakDetect.py
1436
1437         Add an option to save wiggle files on different resolution.
1438
1439 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1440         Version 1.3.2 (tiny bugs fixed)
1441
1442         * IO/__init__.py
1443
1444         Fix 65536 -> 65535. ( Thank Joon)
1445
1446         * Prob.py
1447
1448         Improved for binomial function with extra large number. Imported
1449         from Cistrome project.
1450
1451         * PeakDetect.py
1452
1453         If treatment channel misses reads in some chromosome included in
1454         control channel, or vice versa, MACS will not exit. (Thank Shaun
1455         Mahony)
1456
1457         Instead, MACS will fake a tag at position -1 when calling
1458         treatment peaks vs control, but will ignore the chromosome while
1459         calling negative peaks.
1460
1461 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1462         Version 1.3.1 (tiny bugs fixed version)
1463
1464         * Prob.py
1465
1466         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1467         binomial functions can tolerate large and small numbers.
1468
1469         * IO/__init__.py
1470
1471         Parsers now split lines in BED/ELAND files on any
1472         whitespace. 'track' or 'browser' lines will be regarded as
1473         comment lines. Fixed a bug when throwing StrandFormatError. The
1474         maximum redundant tag number at a single position can be no less
1475         than 65536.
1476
1477
1478 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1479         Version 1.3 (naming clarification version)
1480
1481         * Naming clarification changes according to our manuscript:
1482
1483         'frag_len' is changed to 'd'.
1484
1485         'fold_change' is changed to 'fold_enrichment'.
1486
1487         Suggest '--bw' parameter to be determined by users from the real
1488         sonication size.
1489
1490         Maximum FDR is 100% in the output file.
1491
1492         And other clarifications in 00README file and the documents on the
1493         website.
1494
1495         * IO/__init__.py
1496         If the redundant tag number at a single position is over 32767,
1497         just remember 32767, instead of raising an overflow exception.
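
             The capped counting described above can be sketched as
             follows. This is only an illustration: the names MAX_COUNT
             and add_tag are hypothetical and are not taken from
             IO/__init__.py.

```python
MAX_COUNT = 32767  # largest value of a signed 16-bit counter


def add_tag(counts, position):
    """Count one more tag at `position`, saturating at MAX_COUNT."""
    current = counts.get(position, 0)
    # Stop incrementing at the cap instead of overflowing.
    counts[position] = min(current + 1, MAX_COUNT)
    return counts[position]


counts = {}
for _ in range(3):
    add_tag(counts, 1000)
```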
1498
1499         * setup.py
1500         fixed a typo.
1501
1502         * PeakDetect.py
1503         Bug fixed for diagnosis report.
1504
1505
1506 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1507         Version 1.2.2gamma
1508
1509         * Serious bug fixes:
1510
1511         Poisson distribution CDF and inverse CDF functions are
1512         corrected. They now produce correct results even for huge
1513         lambda, so the p-value and FDR values in the final Excel sheet
1514         are correct.
1515
1516         IO package now can tolerate some rare cases; ELANDParser in IO
1517         package is fixed. (Thanks Bogdan)
1518
1519         * Improvement:
1520
1521         Reverse paired peaks in model are rejected. So there will be no
1522         negative 'frag_len'. (Thanks Bogdan)
1523
1524         * Features added:
1525
1526         Diagnosis function is completed. It can output a table file for
1527         users to estimate their sequencing depth.
1528
1529
1530 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1531         Version 1.2
1532
1533         * Prob.py is added!
1534
1535         GSL is totally removed from MACS. Instead, I have implemented the
1536         CDF and inverse CDF for poisson and binomial distribution purely
1537         in python.
1538
1539         * Constants.py is added!
1540
1541         Organize constants used in MACS in the Constants.py file.
1542
1543         * All other files are modified!
1544
1545         Foldchange calculation is modified. Now the foldchange is only
1546         calculated at the peak summit position instead of the whole peak
1547         region. The values will be higher and more robust than before.
1548
1549         Features added:
1550
1551         1. MACS can save wiggle format files containing the tag number at
1552         every 10 bp along the genome. Tags are shifted according to our
1553         model before they are calculated.
1554
1555         2. Model building and local lambda calculation can be skipped with
1556         certain options.
1557
1558         3. A diagnosis report can be generated through '--diag'
1559         option. This report can help you estimate the
1560         sequencing saturation. This function is only in beta stage.
1561
1562         4. FDR calculation speed is highly improved.
1563
1564 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1565         Version 1.1
1566
1567         * TabIO, PeakModel.py ...
1568         Bug fixed to let MACS tolerate cases where there is no tag on
1569         either plus strand or minus strand.
1570
1571         * setup.py
1572         Check the version of python. If the version is lower than 2.4,
1573         refuse to install with warning.
1574
1575
1576 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1577         MACS version 2.0.10 20130731 (tag:alpha)
1578
1579         * callpeak --call-summits
1580
1581         Fix bugs that caused the callpeak --call-summits option to generate
1582         extra peaks and peak boundaries inconsistent with the
1583         default option. Thanks Ben Levinson!
1584
1585         * bdgcmp output
1586
1587         Fix bugs that caused bdgcmp to output only positive logLR values. Now
1588         'depletion' can be correctly represented as negative values.
1589
1590         * bdgdiff
1591
1592         Fix the behavior of bdgdiff module. Now it can take four
1593         bedGraph files, then use logLR as cutoff to call differential
1594         regions. Check command line of bdgdiff for detail.
1595
1596 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1597         MACS version 2.0.10 20130713 (tag:alpha)
1598
1599         * Fix bugs while outputting broadPeak and gappedPeak.
1600
1601         Note. Those weak broad regions without any strong enrichment
1602         regions inside won't be saved in gappedPeak file.
1603
1604         * bdgcmp -T and -C are merged into -S and description is updated.
1605
1606         Now, you can use it to override SPMR values in your input for
1607         bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1608         statistics will cause weird results (in most cases, lower
1609         significance), and won't be consistent with MACS2 callpeak
1610         behavior. So if you have SPMR bedGraphs, input the smaller/larger
1611         sample size in MILLION according to 'callpeak --to-large' option.
1612
1613 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1614         MACS version 2.0.10 20130710 (tag:alpha)
1615
1616         * fix BED style output format of callpeak module:
1617
1618         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1619         the output. Old BED format file won't be saved.
1620
1621         2) with --broad: broadPeak (BED6+3) for broad region and
1622         gappedPeak (BED12+3) for chained enriched regions will be the
1623         output. Old BED format, narrowPeak format, summit file won't be
1624         saved.
1625
1626         * bdgcmp now can accept list of methods to calculate scores. So
1627         you can run it once to generate multiple types of scores. Thanks
1628         Jon Urban for this suggestion!
1629
1630         * C codes are re-generated through Cython 0.19.1.
1631
1632 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1633         MACS version 2.0.10 20130520 (tag:alpha)
1634
1635         * Broad peak calling modules are modified in order to report all
1636         relaxed regions even if there is no strong enrichment inside.
1637
1638 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1639         MACS version 2.0.10 20130501 (tag:alpha)
1640
1641         * Memory usage is decreased to about 1/4-1/5 of previous usage.
1642         Now, the internal data structure and algorithm are both
1643         re-organized, so that intermediate data isn't saved in
1644         memory. Instead, it is calculated on the fly. New MACS2 will
1645         take longer (1.5 to 2 times), but it will use less memory,
1646         so it can be more usable on small-memory servers.
1647
1648         * --seed option is added to callpeak and randsample commands.
1649         Thanks Mathieu Gineste for this suggestion!
1650
1651 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1652         MACS version 2.0.10 20130306 (tag:alpha)
1653
1654         * diffpeak module: a new module to detect differential binding sites
1655         with more statistics.
1656
1657         * Introduced --refine-peaks
1658         Calculates reads balancing to refine peak summits
1659
1660         * Output file names prefix
1661         Correct encodePeak to narrowPeak, broadPeak to bed12.
1662
1663 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1664         MACS version 2.0.10 (tag:alpha not released)
1665
1666         * Introduced BAMPEParser
1667         Reads PE data directly, requires bedtools for now
1668
1669         * Introduced --call-summits
1670         Uses signal processing methods to call overlapping peaks
1671
1672         * Added --no-trackline
1673         By default, files have descriptive tracklines now
1674
1675         * new refinepeak command (experimental)
1676         This new function will use a similar method in SPP (wtd), to
1677         analyze raw tag distribution in peak region, then redefine the
1678         peak summit where plus and minus tags are evenly distributed
1679         around.
1680
1681         * Changes to output
1682         cPeakDetect.pyx has full support for new print/write methods and
1683         --call-peaks, BAMPEParser, and use of paired-end data
1684
1685         * Parser optimization
1686
1687         cParser.pyx is rewritten to use io.BufferedReader to speed
1688         up. Speed is doubled.
1689
1690         Code is reorganized -- most of functions are inherited from
1691         GenericParser class.
1692
1693         * Use cross-correlation to calculate fragment size
1694
1695         First, all pairs will be used in prediction for fragment
1696         size. Previously, no more than 1000 pairs were used. Second,
1697         cross-correlation is used to find the best phase difference
1698         between + and - tag pileups.
1699
1700         * Speed up p-value and q-value calculation
1701
1702         This part is ten times faster now. I am using a dictionary to
1703         cache p-value results from Poisson CDF function. A bit more memory
1704         will be used to increase speed. I hope this dictionary would not
1705         explode since the possible pairs of ChIP signal and control lambda
1706         are hugely redundant. Also, I rewrote part of the q-value
1707         calculation.
1708
1709         * Speed up peak detection
1710
1711         This part is hundreds of times faster now. Optimizations
1712         include using Numpy functions as much as possible, and making loop
1713         body as small as possible.
1714
1715         * Post-processing on differential calls
1716
1717         After macs2diff finds differential binding sites between two
1718         conditions, it will try to annotate the peak calls from one of two
1719         conditions, describe the changes ...
1720
1721         * Fragment size prediction in macs2diff
1722
1723         Now by default, macs2diff will try to use the average fragment
1724         size from both condition 1 and condition 2 for tag extension and
1725         peak calling. Previously, by default, it will use different sizes
1726         unless --nomodel is specified.
1727
1728         Technically, I separate model building processes out. So macs2diff
1729         will build fragment sizes for condition 1 and 2 in parallel (2
1730         processes maximum), then perform 4-way comparisons in parallel (4
1731         processes maximum).
1732
1733         * Diff score
1734
1735         Combine two p/qscore tracks together. At regions where condition 1
1736         is higher than condition 2, score would be positive, otherwise,
1737         negative.
1738
1739         * SAMParser and BAMParser
1740
1741         Bug fixed for paired-end sequencing data.
1742
1743         * BedGraph.pyx
1744
1745         Fixed a bug while calling peaks from BedGraph file. It previously
1746         mistakenly output same peaks multiple times at the end of
1747         chromosome.
1748
1749 2011-11-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1750         MACS version 2.0.9 (tag:alpha)
1751
1752         * Auto fixation on predicted d is turned off by default!
1753
1754         Previous --off-auto is now default. MACS will not automatically
1755         fix d less than 2 times of tag size according to
1756         --shiftsize. As tag sizes are getting longer nowadays, it is
1757         easier to have d less than 2 times the tag size; however, d may
1758         still be meaningful and useful. Please judge it using your own
1759         wisdom.
1760
1761         * Scaling issue
1762
1763         Now, the default scaling while treatment and input are unbalanced
1764         has been adjusted. By default, larger sample will be scaled down
1765         linearly to match the smaller sample. In this way, background
1766         noise will be reduced more than real signals, so we expect to have
1767         more specific results than the other way around (i.e. --to-large
1768         is set).
1769
1770         Also, an alternative option to randomly sample larger data
1771         (--down-sample) is provided to replace default linear
1772         scaling. However, this option will make results irreproducible,
1773         so be careful.
1774
1775         * randsample script
1776
1777         A new script 'randsample' is added, which can randomly sample a
1778         certain percentage or number of tags.
1779
1780         * Peak summit
1781
1782         Now, MACS will decide peak summits according to pileup height
1783         instead of qvalue scores. In this way, the summit may be more
1784         accurate.
1785
1786         * Diff score
1787
1788         MACS calculates qvalue scores as differential scores. When comparing
1789         two conditions (say A and B), the maximum qscore for comparing
1790         A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
1791         will be computed. If maxqscore_a2b is bigger, the diff score is
1792         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
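
             As a small sketch of the rule above (the function name
             diff_score is hypothetical; only the two maxqscore names
             come from this entry), the signed score could be chosen
             like this. Ties are assumed to fall under "otherwise".

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score for comparing conditions A and B."""
    if maxqscore_a2b > maxqscore_b2a:
        # A is more enriched than B: positive score.
        return maxqscore_a2b
    # Otherwise B wins (ties included, by assumption): negative score.
    return -1 * maxqscore_b2a
```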
1793
1794 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1795         MACS version 2.0.8 (tag:alpha)
1796
1797         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1798
1799         New script bdgbroadcall and the extra option '--broad' for macs2
1800         script, can be used to call broad regions with a loose cutoff to
1801         link nearby significant regions. The output is represented as
1802         BED12 format.
1803
1804         * MACS2/IO/cScoreTrack.pyx
1805
1806         Fix q-value calculation to generate forcefully monotonic values.
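
             For illustration only, forcing monotonic q-values can be
             done with a running minimum over a generic
             Benjamini-Hochberg adjustment. This is a sketch under that
             assumption, not the code in cScoreTrack.pyx, and the name
             bh_qvalues is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values that never decrease as p-values grow."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, keeping a
    # running minimum so a smaller p-value never gets a larger q-value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        raw = pvalues[i] * n / rank
        running_min = min(running_min, raw)
        qvalues[i] = running_min
    return qvalues
```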
1807
1808         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1809
1810         They are combined to one more powerful script called
1811         "filterdup". The script filterdup can filter duplicated reads
1812         according to sequencing depth and genome size. The script can also
1813         convert any format supported by MACS to BED format.
1814
1815 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1816         MACS version 2.0.7 (tag:alpha)
1817
1818         * bin/macsdiff renamed to bin/bdgdiff
1819
1820         Now this script will work as a low-level finetuning tool as bdgcmp
1821         and bdgpeakcall.
1822
1823         * bin/macs2diff
1824
1825         A new script to take treatment and control files from two
1826         conditions, calculate fragment size, use local poisson to get
1827         pvalues and BH process to get qvalues, then combine 4-ways result
1828         to call differential sites.
1829
1830         This script can use up to 4 CPUs to speed up 4-way calculation.
1831         (I am trying multiprocessing in Python.)
1832
1833         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1834         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1835         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1836
1837         All above files are modified for the new macs2diff script.
1838
1839         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1840
1841         Now q-value 0.01 is the default cutoff. If -p is specified,
1842         p-value cutoff will be used instead.
1843
1844 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1845         MACS version 2.0.6 (tag:alpha)
1846
1847         * bin/macsdiff
1848
1849         A script to call differential regions. A naive way is introduced
1850         to find the regions where:
1851
1852         1. signal from condition 1 is larger than input 1 and condition 2 --
1853         unique region in condition 1;
1854         2. signal from condition 2 is larger than input 2 and condition 1
1855         -- unique region in condition 2;
1856         3. signal from condition 1 is larger than input 1, signal from
1857         condition 2 is larger than input 2, however either signal from
1858         condition 1 or 2 is not larger than the other.
1859
1860         Here 'larger' means the pvalue or qvalue from a Poisson test is
1861         under certain cutoff.
1862
1863         (I will make another script to wrap up multiple scripts for
1864         differential calling)
1865
1866 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1867         MACS version 2.0.5 (tag:alpha)
1868
1869         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1870         MACS2/IO/cPeakIO.pyx
1871
1872         Use hash to store peak information. Add back the feature to deal
1873         with data without control.
1874
1875         Fix bug which incorrectly allows small peaks at the end of
1876         chromosomes.
1877
1878         * bin/bdgpeakcall, bin/bdgcmp
1879
1880         Fix bugs. bdgpeakcall can output encodePeak format.
1881
1882 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1883         MACS version 2.0.4 (tag:alpha)
1884
1885         * cPeakDetect.py
1886
1887         Fix a bug, correctly assign lambda_bg while --to-small is
1888         set. Thanks Junya Seo!
1889
1890         Add rank and num of bp columns to pvalue-qvalue table.
1891
1892         * cScoreTrack.py
1893
1894         Fix bugs to correctly deal with peakless chromosomes. Thanks
1895         Vaibhav Jain!
1896
1897         Use AFDR for independent tests instead.
1898
1899         * encodePeak
1900
1901         Now MACS can output peak coordinates together with pvalue, qvalue,
1902         summit positions in a single encodePeak format (designed for
1903         ENCODE project) file. This file can be loaded to UCSC
1904         browser. Definitions of some specific columns are: 5th:
1905         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1906         -log10qvalue, 10th: relative summit position to peak start.
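
             The column layout above can be sketched as below. The
             function name, the '.' strand placeholder in column 6 and
             the five-decimal formatting are assumptions made for this
             example; columns 1-4 follow the usual BED order (chrom,
             start, end, name).

```python
def format_encodepeak(chrom, start, end, name, summit,
                      neg_log10_pvalue, fold_change, neg_log10_qvalue):
    """Build one tab-separated encodePeak line for a single peak."""
    fields = [
        chrom, start, end, name,
        int(neg_log10_pvalue * 10),   # 5th: integer score
        ".",                          # 6th: strand (not used here)
        "%.5f" % fold_change,         # 7th: fold-change
        "%.5f" % neg_log10_pvalue,    # 8th: -log10pvalue
        "%.5f" % neg_log10_qvalue,    # 9th: -log10qvalue
        summit - start,               # 10th: summit relative to start
    ]
    return "\t".join(str(f) for f in fields)


line = format_encodepeak("chr1", 100, 300, "peak_1", 180, 12.5, 8.2, 10.1)
```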
1907
1908
1909 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1910         MACS version 2.0.3 (tag:alpha)
1911
1912         * Rich output with qvalue, fold enrichment, and pileup height
1913
1914         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1915         procedure:
1916
1917         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1918
1919         Now we have a similar xls output file as before. The differences
1920         from previous file are:
1921
1922         1. Summit now is absolute summit, instead of relative summit
1923            position;
1924         2. 'Pileup' is previous 'tag' column. It's the extended fragment
1925            pileup at the peak summit;
1926         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1927            5.00 means 1e-5, simple and less confusing.
1928         4. FDR column becomes '-log10(qvalue)' column.
1929         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1930            the values at the peak summit.
1931
1932         * Extra output files
1933
1934         NAME_pqtable.txt contains pvalue and qvalue relationships.
1935
1936         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1937         and -log10qvalue scores in BedGraph format. Nearby regions with
1938         the same value are not merged.
1939
1940         * Separation of FeatIO.py
1941
1942         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1943         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1944         implemented to store pileup, local lambda, pvalue, and qvalue
1945         all together in cScoreTrack.pyx.
1946
1947         * Experimental option --half-ext
1948
1949         As suggested by the NPS algorithm, I added an experimental option
1950         --half-ext to let MACS extend ChIP fragments around their
1951         middle point for only 1/2 d.
1952
1953 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1954         MACS version 2.0.2 (tag:alpha)
1955
1956         * macs2
1957
1958         Add an error check for the case where treatment and control files
1959         share no common chromosome names.
1960
1961         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1962
1963         Reduce memory usage by removing deepcopy() calls.
1964
1965         * Modify README documents and others.
1966
1967 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1968         MACS Version 2.0.1 (tag:alpha)
1969
1970         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1971
1972         Jie suggested a brilliantly simple method to pile up fragments
1973         into a bedGraph track. It works much faster than the previous
1974         function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1975         large local lambda calculation in MACSv2 now. Now I generate three
1976         bedGraphs for d-size local bias, slocal-size and llocal-size local
1977         bias, and calculate the maximum local bias as local lambda
1978         bedGraph track.
1979
1980         Minor: add_loc in bedGraphTrackI now can correctly merge the
1981         region with its preceding region if their values are the same.
1982
1983         * macs2
1984
1985         Add an option to shift control tags before extension. By default,
1986         control tags will be extended to both sides regardless of strand
1987         information.
1988
1989 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1990         MACS Version 2.0.0 (tag:alpha)
1991
1992         * Use bedGraph type to store data internally and externally.
1993
1994         We can have theoretically one-basepair resolution profiles. 10
1995         times smaller in filesize and even smaller after converting to
1996         bigWig for visualization.
1997
1998         * Peak calling process modified. Better peak boundary detection.
1999
2000         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2001         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2002         one will be averaged to d size) Then calculate the maximum value
2003         of these two tracks and a global background, to have a
2004         local-lambda bedGraph.
2005
2006         Use -10log10poisson_pvalue as scores to generate a score track
2007         before peak calling.
2008
2009         A general peak calling based on a score cutoff, min length of peak
2010         and max gap between nearby peaks.
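
             A minimal sketch of such cutoff-based peak calling on a
             per-base score list is given below. The function and
             parameter names are hypothetical, and the real
             implementation works on bedGraph intervals rather than a
             plain Python list.

```python
def call_peaks(scores, cutoff, min_length, max_gap):
    """Return half-open (start, end) peaks from per-base scores."""
    # 1. Collect runs of consecutive positions scoring at or above cutoff.
    runs, start = [], None
    for pos, score in enumerate(scores):
        if score >= cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, len(scores)))
    # 2. Merge runs separated by no more than max_gap positions.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    # 3. Drop merged regions shorter than min_length.
    return [(s, e) for s, e in merged if e - s >= min_length]
```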
2011
2012         * Option changes.
2013
2014         Wiggle file output is removed. Now we only support bedGraph
2015         output. The generation of bedGraph is highly recommended since it
2016         will not cost extra time. In other words, bedGraph generation is
2017         internally run even if you don't want to save bedGraphs on disk, due
2018         to the peak calling algorithm in MACS v2.
2019
2020         * cProb.pyx
2021
2022         We now can calculate poisson pvalue in log space so that the score
2023         (-10*log10pvalue) will not have an upper limit of 3100 due to
2024         the precision of floating-point numbers.
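
             To illustrate why log space removes the limit, here is a
             hedged sketch that sums the Poisson upper tail relative to
             its first term using math.lgamma. It is not the code in
             cProb.pyx; the function names are hypothetical, and the
             sketch only handles the enriched case k > lambda.

```python
import math


def log10_poisson_upper_tail(k, lam):
    """Return log10 of P(X >= k) for X ~ Poisson(lam), for k > lam > 0."""
    if not (lam > 0 and k > lam):
        raise ValueError("this sketch only handles k > lam > 0")
    # Natural log of the first tail term: lam^k * exp(-lam) / k!
    log_first = k * math.log(lam) - lam - math.lgamma(k + 1)
    # Add the later terms relative to the first one. Because k > lam,
    # every ratio lam / i is below 1 and the series shrinks quickly.
    relative, total, i = 1.0, 1.0, k
    while relative > 1e-17 * total:
        i += 1
        relative *= lam / i
        total += relative
    return (log_first + math.log(total)) / math.log(10)


def score(k, lam):
    """-10*log10(pvalue), computed without ever forming the p-value."""
    return -10 * log10_poisson_upper_tail(k, lam)
```

             With a pileup of 5000 against lambda 1, the p-value itself
             would underflow to zero as a float, while the log-space
             score stays finite.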
2025
2026         * Cython is adopted to speed up Python code.
2027
2028 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2029         Small fixes
2030
2031         * Replaced with the newest WigTrackI class and fixed the wignorm script.
2032
2033 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2034         Version 1.4.0rc2 (Valentine)
2035
2036         * --single-wig option is renamed to --single-profile
2037
2038         * BedGraph output with --bdg or -B option.
2039
2040         The BedGraph output provides 1bp resolution fragment pileup
2041         profile. File size is smaller than wig file. This option can be
2042         combined with --single-profile option to produce a bedgraph file
2043         for the whole genome. This option also makes --space and
2044         --call-subpeaks invalid.
2045
2046         * Fix the description of --shiftsize to correctly state that the
2047         value is 1/2 d (fragment size).
2048
2049         * Fix a bug in the call to __filter_w_control_tags when control is
2050         not available.
2051
2052         * Fix a bug on --to-small option. Now it works as expected.
2053
2054         * Fix a bug while counting the tags in candidate peak region, an
2055         extra tag may be included. (Thanks to Jake Biesinger!)
2056
2057         * Fix the bug for the peaks extended outside of chromosome
2058         start. If the minus strand tag goes outside of chromosome start
2059         after extension of d, it will be thrown out.
2060
2061         * Post-process script for a combined wig file:
2062
2063         The "wignorm" command can be called after a full run of MACS14 as
2064         a postprocess. wignorm can calculate the local background from the
2065         control wig file from MACS14, then use either foldchange,
2066         -10*log10(pvalue) from Poisson test, or difference after asinh
2067         transformation as the score to build a single wig track to
2068         represent the binding strength. This script will take a
2069         significantly long time to process.
2070
2071         * --wigextend has been obsoleted.
2072
2073 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2074         Version 1.4.0rc1 (Starry Sky)
2075
2076         * Duplicate reads option
2077
2078         --keep-dup behavior is changed. Now user can specify how many
2079         reads he/she wants to keep at the same genomic location. 'auto' to
2080         let MACS decide the number based on binomial distribution, 'all'
2081         to let MACS keep all reads.
2082
2083         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2084
2085         By default, MACS will now scale the smaller dataset to the bigger
2086         dataset. For instance, if IP has 10 million reads, and Input has 5
2087         million, MACS will double the lambda value calculated from Input
2088         reads while calling BOTH the positive peaks and negative
2089         peaks. This will address the issue caused by unbalanced numbers of
2090         reads from IP and Input. If --to-small is turned on, MACS will
2091         scale the larger dataset to the smaller one. So from now on, if d
2092         is fixed, then the peaks from a MACS call for A vs B should be
2093         identical to the negative peaks from a B vs A.
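
             The arithmetic of the example above (IP 10 million, Input 5
             million) can be sketched as follows. The function name and
             return shape are hypothetical and chosen for this
             illustration only.

```python
def scaling_factors(n_treatment, n_control, to_small=False):
    """Return (treatment_scale, control_scale) for unbalanced read counts."""
    if to_small:
        # --to-small: scale the larger dataset down to the smaller one.
        target = min(n_treatment, n_control)
    else:
        # Default in this version: scale the smaller dataset up.
        target = max(n_treatment, n_control)
    return target / n_treatment, target / n_control


default_scales = scaling_factors(10_000_000, 5_000_000)
small_scales = scaling_factors(10_000_000, 5_000_000, to_small=True)
```

             In the default case the control factor is 2.0, which matches
             doubling the lambda calculated from Input reads.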
2094
2095 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
2096         Version 1.4.0beta (summer wishes)
2097
2098         * New features
2099
2100         ** Model building
2101
2102         The default behavior in the model building step is slightly
2103         changed. When MACS can't find enough pairs to build model
2104         (implemented in alpha version) or the modeled fragment length is
2105         less than 2 times of tag length (implemented in beta version),
2106         MACS will use 2 times of --shiftsize value as fragment length in
2107         the later analysis. --off-auto can turn off this default behavior.
2108
2109         ** Redundant tag filtering
2110
2111         The IO module is rewritten. The redundant tag filtering process
2112         becomes simpler and works as promised. The maximum allowed number
2113         of tags at the exact same location is calculated from the
2114         sequencing depth and genome size using a binomial distribution,
2115         for both TREATMENT and CONTROL separately. (Previously only
2116         TREATMENT was considered.) The exact same location means the same
2117         coordinate and the same strand. Then MACS will only keep at most
2118         this number of tags at the exact same location in the following
2119         analysis. An option --keep-dup can let MACS skip the filtering and
2120         keep all the tags. However this may bring in a lot of sequencing
2121         bias, so you may get many false positive peaks.
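
             One way to picture the binomial calculation is the sketch
             below: each tag is assumed to land on a given position with
             probability 1/genome_size, and the limit is the smallest
             count whose cumulative probability reaches 1 - pvalue. The
             1e-5 threshold, the floor of 1 and the function name are
             assumptions for this example, not values stated in this
             entry.

```python
import math


def max_duplicate_tags(total_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(n, 1/gsize)."""
    p = 1.0 / genome_size
    # P(X = 0), via log1p so that a tiny p does not lose precision.
    pmf = math.exp(total_tags * math.log1p(-p))
    if pmf == 0.0:
        raise ValueError("depth too high for this simple sketch")
    cdf, k = pmf, 0
    while cdf < 1.0 - pvalue:
        # Recurrence from P(X = k) to P(X = k + 1).
        pmf *= (total_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return max(1, k)
```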
2122
2123         ** Single wiggle mode
2124
2125         First thing to mention, this is not the score track that I
2126         described before. By default, MACS generates wiggle files for
2127         fragment pileup for every chromosome separately. When you use
2128         --single-wig option, MACS will generate a single wiggle file for
2129         all the chromosomes so you will get a wig.gz for TREATMENT and
2130         another wig.gz for CONTROL if available.
2131
2132         ** Sniff -- automatic format detection
2133
2134         Now, by default or "-f AUTO", MACS will decide the input file
2135         format automatically. Technically, it will try to read at most
2136         1000 records for the first 10 non-comment lines. If it succeeds,
2137         the format is decided. I recommend not to use AUTO and specify the
2138         right format for your input files, unless you combine different
2139         formats in a single MACS run.
2140
2141         * Options changes
2142
2143         --single-wig and --keep-dup are added. Check previous section in
2144         ChangeLog for detail.
2145
2146         -f (--format) AUTO is now the default option.
2147
2148         --slocal default: 1000
2149         --llocal default: 10000
2150
2151         * Bug fixed
2152
2153         Setup script will stop the installation if python version is not
2154         python2.6 or python2.7.
2155
2156         Local lambda calculation has been changed back. MACS will check
2157         peak_region, slocal (default 1K) and llocal (default 10K) for the
2158         local bias. The previous 200bps default caused MACS to miss
2159         some peaks where the input bias is very sharp.
2160
2161         sam2bed.py script is corrected.
2162
2163         Relative pos in xls output is fixed.
2164
2165         Parser for ELAND_export is fixed to pass some of the no match
2166         lines. And elandexport2bed.py is fixed too. ( however I can't
2167         guarantee that it works on any eland_export files. )
2168
2169 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2170         Version 1.4.0alpha2 (be smarter)
2171
2172         * Options changes
2173
2174         --gsize now provides shortcuts for common genomes, including
2175         human, mouse, C. elegans and fruitfly.
2176
2177         --llocal now will be 5000 bps if there is no input file, so that
2178         local lambda doesn't overkill enriched binding sites.
2179
2180 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2181         Version 1.4alpha (be smarter)
2182
2183         * Options changes
2184
2185         --tsize option is redesigned. MACS will use the first 10 lines of
2186         the input to decide the tag size. If user specifies --tsize, it
2187         will override the auto decided tsize.
2188
2189         --lambdaset is replaced by --slocal and --llocal which mean the
2190         small local region and large local region.
2191
2192         --bw has no effect on the scan-window size now. It only affects the
2193         paired-peaks model process.
2194
2195         * Model building
2196
2197         During the model building, MACS will pick out the enriched regions
2198         which are not too high and not too low to build the paired-peak
2199         model. By default, the region is from fold 10 to fold 30. If MACS
2200         fails to build the model, by default it will use the nomodel
2201         settings, like shiftsize=100bps, to shift and extend each
2202         tag. This behavior can be turned off by '--off-auto'.
2203
2204         * Output files
2205
2206         An extra file including all the summit positions is saved as a
2207         *_summits.bed file. An option '--call-subpeaks' will invoke
2208         PeakSplitter developed by Mali Salmon to split wide peaks into
2209         smaller subpeaks.
2210
2211         * Sniff ( will be in beta )
2212
2213         Automatically recognize the input file format, so users can combine
2214         different formats in one MACS run.
2215
2216         Not implemented features/TODO:
2217
2218         * Algorithms ( in near future? )
2219
2220         MACS will try to refine the peak boundaries by calculating the
2221         scores for every point in the candidate peak regions. The score
2222         will be the -10*log(10,pvalue) on a local poisson distribution. A
2223         cutoff specified by users (--pvalue) will be applied to find the
2224         precise sub-peaks in the original candidate peak region. Peak
2225         boundaries and peak summit positions will be saved in separate BED
2226         files.
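
        For reference, the score formula above can be written as a
        small Python helper. This is a sketch that reads
        "-10*log(10,pvalue)" as -10 times the base-10 logarithm of the
        p-value; the name `pvalue_score` is hypothetical.

```python
import math


def pvalue_score(pvalue):
    """Return -10 * log10(pvalue); smaller p-values give larger scores."""
    if not 0.0 < pvalue <= 1.0:
        raise ValueError("pvalue must be in (0, 1]")
    return -10.0 * math.log10(pvalue)
```

        Under this reading, a p-value of 1e-5 maps to a score of 50.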
2227
2228         * Single wiggle track ( in near future? )
2229
2230         A single wiggle track will be generated to save the scores within
2231         candidate peak regions in the 10bps resolution. The wiggle file
2232         is in fixedStep format.
2233
2234
2235 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2236         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2237
2238         * bin/Constants.py
2239
2240         Fixed typo. FCSTEP -> FESTEP
2241
2242         * lib/PeakDetect.py
2243
2244         The 'femax' attribute bug is fixed
2245
2246 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2247         Version 1.3.7 (Oktoberfest)
2248
2249         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2250
2251         Enhancements by Peter Chines:
2252
2253         1. gzip files are supported.
2254         2. when --diag is on, user can set the increment and endpoint for
2255         fold enrichment analysis by setting --fe-step and --fe-max.
2256
2257         Enhancements by Davide Cittaro:
2258
2259         1. BAM and SAM formats are supported.
2260         2. small changes in the header lines of wiggle output.
2261
2262         Enhancements by Me:
2263         1. I added --fe-min option;
2264         2. Bowtie ascii output with suffix ".map" is supported.
2265
2266         Bug fixed:
2267
2268         1. --nolambda bug is fixed. ( reported by Martin in JHU )
2269         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2270         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2271         4. Some "fold change" have been changed to "fold enrichment".
2272
2273 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2274         Version 1.3.6.1 (default parameter change)
2275
2276         * bin/macs, lib/PeakDetect.py
2277
2278         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2279         default. "--futurefdr" is added which can turn on the 'new' method
2280         introduced in 1.3.6. By default it's off.
2281
2282         * lib/PeakDetect.py
2283
2284         Fixed a bug. p-value is corrected a little bit.
2285
2286
2287 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
2288         Version 1.3.6 (Birthday cake)
2289
2290         * bin/macs
2291
2292         "track name" is added to the header of BED output file.
2293
2294         Now the default peak detection method is to consider 5k and 10k
2295         nearby regions in treatment data and peak location, 1k, 5k, and
2296         10k regions in control data to calculate local bias. The old
2297         method can be called through '--old' option.
2298
2299         Information about how many total/unique tags in treatment or
2300         control will be saved in final .xls output.
2301
2302         * lib/IO/__init__.py
2303
2304         ".fa" will be removed from input tag alignment so only the
2305         chromosome names are kept.
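
        A minimal sketch of this kind of suffix removal is shown below.
        The function name `strip_fa` is hypothetical and this is not
        the MACS code. Note that Python's `str.rstrip(".fa")` is not a
        substitute here, since it strips any trailing '.', 'f' or 'a'
        characters rather than the literal suffix.

```python
def strip_fa(name):
    """Remove a trailing '.fa' from a chromosome name, if present."""
    if name.endswith(".fa"):
        return name[:-3]
    return name
```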
2306
2307         WigTrackI class is added for Wiggle like data structure. (not used
2308         now)
2309
2310         The parser for ELAND multi PET files has been fixed. Now the 5'
2311         tag position for a pair will be kept, whereas in the previous
2312         version, the middle points are kept.
2313
2314         * lib/IO/BinKeeper.py
2315
2316         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2317         browser, which can quickly access certain region for values in a
2318         large wiggle like data file. (not used now)
2319
2320         * lib/OptValidator.py
2321
2322         typo fixed.
2323
2324         * lib/PeakDetect.py
2325
2326         Now the default peak detection method is to consider 5k and 10k
2327         nearby regions in treatment data and peak location, 1k, 5k, and
2328         10k regions in control data to calculate local bias. The old
2329         method can be called through '--old' option.
2330
2331         Two columns have been added to BED output file. 4th column: peak
2332         name; 5th column: peak score using -10log(10,pvalue) as score.
2333
2334         * setup.py
2335
2336         Add support to build a Mac App through 'setup.py py2app', or a
2337         Windows executable through 'setup.py py2exe'. You need to install
2338         py2app or py2exe package in order to use these functions.
2339
2340 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2341         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2342
2343         * PeakDetect.py
2344
2345         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2346         in control data to calculate local lambda for each peak. Peak
2347         calling results will be slightly different from the previous version,
2348         beware!
2349
2350         * OptValidator.py
2351
2352         Typo fixed, ELANDParser -> ELANDResultParser
2353
2354         * OutputWriter.py
2355
2356         Now, modeled d value will be shown on the model figure.
2357
2358 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
2359         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2360
2361         * macs, IO/__init__.py, PeakDetect.py
2362
2363         Add support for ELAND multi format. Add support for Pair-End
2364         experiments; in this case, 5'end and 3'end ELAND multi format files
2365         are required for treatment or control data. See 00README file for
2366         details.
2367
2368         Add wigextend option.
2369
2370         Add petdist option for Pair-End Tag experiment, which is the best
2371         distance between 5' and 3' tags.
2372
2373         * PeakDetect.py
2374
2375         Fixed a bug which caused the end position of every peak region
2376         to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
2377
2378         * OutputWriter.py
2379
2380         Fix bugs while generating wiggle files. The start position of
2381         wiggle file is set to 1 instead of 0.
2382
2383         Fix a bug that every 10M bps, signals in the first 'd' range are
2384         lower than actual. ( Thanks Mali Salmon!)
2385
2386
2387 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
2388         Version 1.3.3 (wiggle bugs fixed)
2389
2390         * OutputWriter.py
2391
2392         Fix bugs while generating wiggle files. 1. 'span=' is added to
2393         'variableStep' line; 2. previously, every 10M bps, the coordinates
2394         were wrongly shifted to the right for 'd' basepairs.
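
        For readers unfamiliar with the format, a 'variableStep'
        declaration line with 'span=' looks like the output of the
        sketch below. The function name `variablestep_lines` and the
        default span of 10 are assumptions for illustration; this is
        not the OutputWriter.py code.

```python
def variablestep_lines(chrom, points, span=10):
    """Yield wiggle variableStep lines for (position, value) pairs.

    Positions are expected to be 1-based, as in the wiggle format.
    """
    yield "variableStep chrom=%s span=%d" % (chrom, span)
    for position, value in points:
        yield "%d %s" % (position, value)
```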
2395
2396         * macs, PeakDetect.py
2397
2398         Add an option to save wiggle files on different resolution.
2399
2400 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2401         Version 1.3.2 (tiny bugs fixed)
2402
2403         * IO/__init__.py
2404
2405         Fix 65536 -> 65535. ( Thanks Joon)
2406
2407         * Prob.py
2408
2409         Improved for binomial function with extra large number. Imported
2410         from Cistrome project.
2411
2412         * PeakDetect.py
2413
2414         If treatment channel misses reads in some chromosome included in
2415         control channel, or vice versa, MACS will not exit. (Thanks Shaun
2416         Mahony)
2417
2418         Instead, MACS will fake a tag at position -1 when calling
2419         treatment peaks vs control, but will ignore the chromosome while
2420         calling negative peaks.
2421
2422 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2423         Version 1.3.1 (tiny bugs fixed version)
2424
2425         * Prob.py
2426
2427         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2428         binomial functions can tolerate large and small numbers.
2429
2430         * IO/__init__.py
2431
2432         Parsers now split lines in BED/ELAND file using any
2433         whitespaces. 'track' or 'browser' lines will be regarded as
2434         comment lines. A bug fixed when throwing StrandFormatError. The
2435         maximum redundant tag number at a single position can be no more
2436         than 65536.
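
        The line handling described above (split on any whitespace,
        treat 'track' and 'browser' lines as comments) can be sketched
        as follows. `parse_bed_line` is a hypothetical name, and the
        column layout (chromosome, start, end, with strand in the 6th
        column) follows the standard BED convention rather than
        anything stated in this entry.

```python
def parse_bed_line(line):
    """Parse one BED line; return None for blank or comment lines."""
    fields = line.split()  # split on any run of whitespace
    if not fields or fields[0] in ("track", "browser"):
        return None
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    strand = fields[5] if len(fields) > 5 else None
    return chrom, start, end, strand
```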
2437
2438
2439 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
2440         Version 1.3 (naming clarification version)
2441
2442         * Naming clarification changes according to our manuscript:
2443
2444         'frag_len' is changed to 'd'.
2445
2446         'fold_change' is changed to 'fold_enrichment'.
2447
2448         Suggest '--bw' parameter to be determined by users from the real
2449         sonication size.
2450
2451         Maximum FDR is 100% in the output file.
2452
2453         And other clarifications in 00README file and the documents on the
2454         website.
2455
2456         * IO/__init__.py
2457         If the redundant tag number at a single position is over 32767,
2458         just remember 32767, instead of raising an overflow exception.
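
        The capping behaviour can be illustrated with a saturating
        counter. This is a sketch only; `count_tags` and
        `MAX_REDUNDANT` are hypothetical names, not MACS identifiers.

```python
MAX_REDUNDANT = 32767  # cap from the entry above (largest signed 16-bit value)


def count_tags(positions, cap=MAX_REDUNDANT):
    """Count tags per position, holding each count at `cap` once reached."""
    counts = {}
    for pos in positions:
        current = counts.get(pos, 0)
        if current < cap:
            counts[pos] = current + 1
    return counts
```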
2459
2460         * setup.py
2461         fixed a typo.
2462
2463         * PeakDetect.py
2464         Bug fixed for diagnosis report.
2465
2466
2467 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2468         Version 1.2.2gamma
2469
2470         * Serious bug fixes:
2471
2472         Poisson distribution CDF and inverse CDF functions are
2473         corrected. They can produce right results even for huge lambda
2474         now, so the p-value and FDR values in the final excel sheet
2475         are corrected.
2476
2477         IO package now can tolerate some rare cases; ELANDParser in IO
2478         package is fixed. (Thanks Bogdan)
2479
2480         * Improvement:
2481
2482         Reverse paired peaks in model are rejected. So there will be no
2483         negative 'frag_len'. (Thanks Bogdan)
2484
2485         * Features added:
2486
2487         Diagnosis function is completed. It can output a table file for
2488         users to estimate their sequencing depth.
2489
2490
2491 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
2492         Version 1.2
2493
2494         * Prob.py is added!
2495
2496         GSL is totally removed from MACS. Instead, I have implemented the
2497         CDF and inverse CDF for poisson and binomial distribution purely
2498         in python.
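
        To give an idea of what a pure Python Poisson CDF can look
        like, here is a minimal sketch. It is not the code in MACS; it
        sums the terms in log space with `math.lgamma` so that large
        lambda values do not overflow, and `poisson_cdf` is a
        hypothetical name.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam), with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0
    # Log of each term lam**i * exp(-lam) / i!, computed without overflow.
    log_terms = [i * math.log(lam) - lam - math.lgamma(i + 1)
                 for i in range(int(k) + 1)]
    peak = max(log_terms)
    # Factor out the largest term so the remaining exponentials stay in range.
    total = math.exp(peak) * math.fsum(math.exp(t - peak) for t in log_terms)
    return min(1.0, total)
```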
2499
2500         * Constants.py is added!
2501
2502         Organize constants used in MACS in the Constants.py file.
2503
2504         * All other files are modified!
2505
2506         Foldchange calculation is modified. Now the foldchange will only be
2507         calculated at the peak summit position instead of the whole peak
2508         region. The values will be higher and more robust than before.
2509
2510         Features added:
2511
2512         1. MACS can save wiggle format files containing the tag number at
2513         every 10 bp along the genome. Tags are shifted according to our
2514         model before they are calculated.
2515
2516         2. Model building and local lambda calculation can be skipped with
2517         certain options.
2518
2519         3. A diagnosis report can be generated through '--diag'
2520         option. This report can help you get an estimate of the
2521         sequencing saturation. This function is only in beta stage.
2522
2523         4. FDR calculation speed is highly improved.
2524
2525 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2526         Version 1.1
2527
2528         * TabIO, PeakModel.py ...
2529         Bug fixed to let MACS tolerate some cases where there is no tag on
2530         either plus strand or minus strand.
2531
2532         * setup.py
2533         Check the version of python. If the version is lower than 2.4,
2534         refuse to install with warning.
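
        A version check of this kind might look like the sketch below.
        The helper name `python_is_supported` and the message text are
        assumptions; the actual setup.py may differ.

```python
import sys


def python_is_supported(version_info=None, minimum=(2, 4)):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    sys.stderr.write("Python 2.4 or later is required; refusing to install.\n")
    sys.exit(1)
```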
2535