ChangeLog

   1 2024-09-06  Tao Liu  <vladimir.liu@gmail.com>
   2         MACS 3.0.2
   3
   4         * Features added
   5
   6         1) Introduce a new emission model for the `hmmratac` function. Now
   7         users can choose the simpler Poisson emission `--hmm-type poisson`
   8         instead of the default Gaussian emission. As a consequence, the
   9         saved HMM model file in json will include the hmm-type information
  10         as well. Note that in order to be compatible with the HMM model
  11         file from previous version, if there is no hmm-type information in
  12         the model file, the hmm-type will be assigned as gaussian. #635
  13
  14         2) `hmmratac` now output narrowPeak format output. The summit
  15         position and the peak score columns reported in the narrowPeak
  16         output represents the position with highest foldchange value
  17         (pileup vs average background).
  18
  19         3) Add `--cutoff-analysis-steps` and `--cutoff-analysis-max` for
  20         `callpeak`, `bdgpeakcall`, and `hmmratac` so that we can
  21         have finer resolution of the cutoff analysis report. #636  #642
  22
  23         4) Reduce memory usage of `hmmratac` during decoding step, by
  24         writing decoding results to a temporary file on disk (file
  25         location depends on the environmental TEMP setting), then loading
  26         it back while identifying state pathes. This change will decrease
  27         the memory usage dramatically. #628 #640
  28
  29         5) Fix instructions for preparing narrowPeak files for uploading
  30         to UCSC browser, with the `--trackline` option in `callpeak`. #653
  31
  32         6) For gappedPeak output, set thickStart and thickEnd columns as
  33         0, according to UCSC definition.
  34
  35         * Bugs fixed
  36
  37         1) Use `-O3` instead of `-Ofast` for compatibility. #637
  38
  39         * Documentation
  40
  41         1) Update instruction to install macs3 through conda/bioconda
  42
  43         2) Reorganize MACS3 docs and publish through
  44         https://macs3-project.github.io/MACS
  45
  46         3) Description on various file formats used in MACS3.
  47
  48 2024-02-19  Tao Liu  <vladimir.liu@gmail.com>
  49         MACS 3.0.1
  50
  51         * Bugs fixed
  52
  53         1) Fixed a bug that the `hmmatac` can't correctly save the
  54         digested signal files. #605 #611
  55
  56         2) Applied a patch to remove cython requirement from the installed
  57         system. (it's needed for building the package). #606 #612
  58
  59         3) Relax the testing script while comparing the peaks called from
  60         current codes and the standard peaks. To implement this, we added
  61         'intersection' function to 'Regions' class to find the
  62         intersecting regions of two Regions object (similar to PeakIO but
  63         only recording chromosome, start and end positions). And we
  64         updated the unit test 'test_Region.py' then implemented a script
  65         'jaccard.py' to compute the Jaccard Index of two peak files. If
  66         the JI > 0.99 we would think the peaks called and the standard
  67         peaks are similar. This is to avoid the problem caused by
  68         different Numpy/SciPy/sci-kit learn libraries, when certain peak
  69         coordinates may have 10bps difference. #615 #619
  70
  71         4) Due to the changes in scikit-learn 1.3.0:
  72         https://scikit-learn.org/1.3/whats_new/v1.3.html: The way hmmlearn
  73         0.3 uses Kmeans will end up with inconsistent results between
  74         sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
  75         hmm.GaussianHMM and adjusted the standard output from `hmmratac`
  76         subcommand. The change is based on
  77         https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
  78         the random seeding of KMeans 10 times. Now the `hmmratac` results
  79         should be more consistent (at least JI>0.99). #615 #620
  80
  81         * Other
  82
  83         1) We added some dependencies to MACS3. `hmmratc` subcommand needs
  84         `hmmlearn` library, `hmmlearn` needs `scikit-learn` and
  85         `scikit-learn` needs `scipy`. Since major releases have happened
  86         for both`scipy` and `scikit-learn`, we have to set specific
  87         version requirements for them in order to make sure the output
  88         results from `hmmratac` are consistent.
  89
  90         2) We updated our documentation website using
  91         Sphinx. https://macs3-project.github.io/MACS/
  92
  93 2023-11-15  Tao Liu  <vladimir.liu@gmail.com>
  94         MACS 3.0.0
  95
  96         1) Call variants in peak regions directly from BAM files. The
  97         function was originally developed under code name SAPPER. Now
  98         SAPPER has been merged into MACS as the `callvar` command. It can
  99         be used to call SNVs and small INDELs directly from alignment
 100         files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
 101         the DNA sequence at the enriched genomic regions (binding sites or
 102         accessible DNA) and to refine the alignment when necessary. We
 103         added `simde` as a submodule in order to support fermi-lite
 104         library under non-x64 architectures.
 105
 106         2) HMMRATAC module is added as subcommand `hmmratac`. HMMRATAC is
 107         a dedicated software to analyze ATAC-seq data. The basic idea
 108         behind HMMRATAC is to digest ATAC-seq data according to the
 109         fragment length of read pairs into four signal tracks: short
 110         fragments, mono-nucleosomal fragments, di-nucleosomal fragments
 111         and tri-nucleosomal fragments. Then integrate the four tracks
 112         again using Hidden Markov Model to consider three hidden states:
 113         open region, nucleosomal region, and background region. The
 114         orginal paper was published in 2019 written in JAVA, by Evan
 115         Tarbell. We implemented it in Python/Cython and optimize the whole
 116         process using existing MACS functions and hmmlearn. Now it can run
 117         much faster than the original JAVA version. Note: evaluation of
 118         the peak calling results is still underway.
 119
 120         3) Speed/memory optimization.  Use the cykhash to replace python
 121         dictionary. Use buffer (10MB) to read and parse input file (not
 122         available for BAM file parser). And many optimization tweaks. We
 123         added memory monitoring to the runtime messages.
 124
 125         4) R wrappers for MACS -- MACSr for bioconductor.
 126
 127         5) Code cleanup. Reorganize source codes.
 128
 129         6) Unit testing.
 130
 131         7) Switch to Github Action for CI, support multi-arch testing
 132         including x64, armv7, aarch64, s390x and ppc64le. We also test on
 133         Mac OS 12.
 134
 135         8) MACS tag-shifting model has been refined. Now it will use a
 136         naive peak calling approach to find ALL possible paired peaks at +
 137         and - strand, then use all of them to calculate the
 138         cross-correlation. (a related bug has been fix
 139         [#442](https://github.com/macs3-project/MACS/issues/442))
 140
 141         9) BAI index and random access to BAM file now is
 142         supported. [#449](https://github.com/macs3-project/MACS/issues/449).
 143
 144         10) Support of Python > 3.10
 145         [#498](https://github.com/macs3-project/MACS/issues/498)
 146
 147         11) The effective genome size parameters have been updated
 148         according to
 149         deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
 150
 151         12) Multiple updates regarding dependencies, anaconda built, CI/CD
 152         process.
 153
 154         13) Cython 3 is supported.
 155
 156         14) Documentations for each subcommand can be found under /docs
 157
 158         *Other*
 159
 160         1) Missing header line while no peaks can be called
 161         [#501](https://github.com/macs3-project/MACS/issues/501)
 162         [#502](https://github.com/macs3-project/MACS/issues/502)
 163
 164         2) Note: different numpy, scipy, sklearn may give slightly
 165         different results for hmmratac results. The current standard
 166         results for automated testing in `/test` directory are from Numpy
 167         1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
 168
 169 2020-04-11  Tao Liu  <vladimir.liu@gmail.com>
 170         MACS version 2.2.7.1
 171
 172         * hotfix:
 173
 174         Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
 175         work.
 176
 177 2020-04-10  Tao Liu  <vladimir.liu@gmail.com>
 178         MACS version 2.2.7
 179
 180         * Bugs fixed
 181
 182         1) MACS2 has been tested on multiple architectures to make sure it
 183         can successfully generate consistent results. Currently the
 184         supported architectures are: AMD64, ARM64, i386, PPC64LE, and
 185         S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
 186         #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
 187         and #370. The lesson is that if the project is built on Cython and
 188         is aimed at memory efficiency, we should specifically define all
 189         int/float types in pyx files such as int8_t or uint32_t using
 190         either libc or numpy (c version) instead of relying on Cython
 191         types such as short, long, double.
 192
 193         2) MACS2 setup script will check numpy and install numpy if
 194         necessary. PR #378, issue #364
 195
 196         3) `bdgbroadcall` command will correctly add the score column (5th
 197         column). The score (5th) column contains 10 times of the average
 198         score in the broad region. PR #373, issue #362
 199
 200         4) The missing test on `bdgopt` subcommand has been added. PR #363
 201
 202         5) The obsolete option `--ratio` from `callpeak` subcommand has
 203         been removed. PR #369, issue #366
 204
 205         6) Fixed the incorrect description in README on the 'maximum
 206         length of broad region is 4 times of d' to 'maximum gap for
 207         merging broad regions is 4 times of tag size by default'. PR #380,
 208         issue #365.
 209
 210         * Other
 211
 212         1) CODE OF CONDUCT document has been added to MACS2 github
 213         repository. PR #358
 214
 215 2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
 216         MACS version 2.2.6
 217
 218         * New Features
 219
 220         1) Speed up MACS2. Some programming tricks and code cleanup. The
 221         filter_dup function replaces separate_dups. The later one was
 222         implemented for potentially putting back duplicate reads in
 223         certain downstream analysis. However such analysis hasn't been
 224         implemented. Optimize the speed of writing bedGraph
 225         files. Optimize BAM and BAMPE parsing with pointer casting instead
 226         of python unpack.
 227
 228         2) The comment lines in the headers of BED or SAM files will be
 229         correctly skipped. However, MACS2 won't check comment lines in the
 230         middle of the file.
 231
 232         * Bugs fixed
 233
 234         1) Cutoff-analysis in callpeak command. #341
 235
 236         2) Issues related to SAMParser and three ELAND Parsers are
 237         fixed. #347
 238
 239         * Other
 240
 241         1) cmdlinetest script in test/ folder has been updated to: 1. test
 242         cutoff-analysis with callpeak cmd; 2. output the 2 lines before
 243         and after the error or warning message during tests; 3. output
 244         only the first 10 lines if the difference between test result and
 245         standard result can be found; 4. prockreport monitor CPU time and
 246         memory usage in 1 sec interval -- a bit more accurate.
 247
 248         2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
 249
 250 2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
 251         MACS version 2.2.5 (Py3 speed up)
 252
 253         * Features added
 254
 255         1) *Github code only and Not included in MACS2 release* New
 256         testing data for performance test. An subsampled ENCODE2 CTCF
 257         ChIP-seq dataset, including 5million ChIP reads and 5 million
 258         control reads, has been included in the test folder for testing
 259         CPU and memory usage (i.e. 5M test). Several related scripts ,
 260         including `prockreport` for output cpu memory usage, `pyprofile`
 261         and `pyprofile_stat` for debuging and profiling MACS2 codes, have
 262         been included.
 263
 264         2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
 265         The old hashtable.pyx implementation copied from Pandas (very old
 266         version) doesn't work well in Python3+Cython. It slows down the
 267         pqtable checkup using the identical Cython codes as in
 268         v2.1.4. While running 5M test, the `__getitem__` function in the
 269         hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
 270         148.6s with the same number of calls in MACS2 v2.2.4. As a
 271         consequence, the standard python dictionary implementation has
 272         replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
 273         faster than py2 version, but uses a bit more memory. In general,
 274         v2.2.5 can finish 5M reads test in 20% less time than MACS2
 275         v2.1.4, but use 15% more memory.
 276
 277         * Bug fixed
 278
 279         1) More Python3 related fixes, e.g. the return value of keys from
 280         py3 dict. #333 #337
 281
 282
 283 2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
 284         MACS version 2.2.4 (Python3)
 285
 286         * Features added
 287
 288         1) First Python3 version MACS2 released.
 289
 290         2) Version number 2.2.X will be used for MACS2 in Python3, in
 291         parallel to 2.1.X.
 292
 293         3) More comprehensive test.sh script to check the consistency of
 294         results from Python2 version and Python3 version.
 295
 296         4) Simplify setup.py script since the newest version transparently
 297         supports cython. And when cython is not installed by the user,
 298         setup.py can still compile using only C codes.
 299
 300         5) Fix Signal.pyx to use np.array instead of np.mat.
 301
 302 2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
 303         MACS version 2.1.4
 304
 305         * Features added
 306
 307         Github Actions is used together with Travis CI for testing and
 308         deployment.
 309
 310         * Bugs fixed
 311
 312         PR #322:
 313
 314         1) #318 Random score in bdgdiff output. It turns out the sum_v is
 315         not initialized as 0 before adding. Potential bugs are fixed in
 316         other functions in ScoreTrack and CallPeakUnit codes.
 317
 318         2) #321 Cython dependency in setup.py script is removed. And place
 319         'cythonzie' call to the correct position.
 320
 321         3) A typo is fixed in Github Actions script.
 322
 323 2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
 324         MACS version 2.1.3.3
 325
 326         * Features added
 327
 328         1) Support Docker auto-deploy. PR #309
 329
 330         2) Support Travis CI auto-testing, update unit-testing
 331         scripts, and enable subcommand testing on small datasets.
 332
 333         3) Update README documents. #297 PR #306
 334
 335         4) `cmbreps` supports more than 2 replicates. Merged from PR #304
 336         @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
 337
 338         5) `--d-min` option is added in `callpeak` and `predictd`, to
 339         exclude predictions of fragment size smaller than the given
 340         value. Merged from PR #267 @shouldsee.
 341
 342         6) `--buffer-size` option is added in `predictd`, `filterdup`,
 343         `pileup` and `refinepeak` subcommands. Users can use this option
 344         to decrease memory usage while there are a large number of contigs
 345         in the data. Also, now `callpeak`, `predictd`, `filterdup`,
 346         `pileup` and `refinepeak` will suggest users to tweak
 347         `--buffer-size` while catching a MemoryError. #313 PR #314
 348
 349         * Bugs fixed
 350
 351         1) #265 Fixed a bug where the pseudocount hasn't been applied
 352         while calculating p-value score in ScoreTrack object.
 353
 354         2) Fixed bdgbroadcall so that it will report those broad peaks
 355         without strong peak inside, a consistent behavior as `callpeak
 356         --broad`.
 357
 358         3) Rename COPYING to LICENSE.
 359
 360 2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
 361         MACS version 2.1.2
 362
 363         * New features
 364
 365         1) Added missing BEDPE support. And enable the support for BAMPE
 366         and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
 367         subcommands. When format is BAMPE or BEDPE, The 'pileup' command
 368         will pile up the whole fragment defined by mapping locations of
 369         the left end and right end of each read pair. Thank @purcaro
 370
 371         2) Added options to callpeak command for tweaking max-gap and
 372         min-len during peak calling. Thank @jsh58!
 373
 374         3) The callpeak option "--to-large" option is replaced with
 375         "--scale-to large".
 376
 377         4) The randsample option "-t" has been replaced with "-i".
 378
 379         * Bug fixes
 380
 381         1) Fixed memory issue related to #122 and #146
 382
 383         2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
 384
 385         3) Fixed a bug while setting commandline qvalue cutoff.
 386
 387         4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
 388
 389         5) Fixed the calculation of average fragment length for paired-end
 390         data. Thank @jsh58
 391
 392         6) Fixed bugs caused by khash while computing p/q-value and log
 393         likelihood ratios. Thank @jsh58
 394
 395         7) More spelling tweaks in source code. Thank @mr-c
 396
 397 2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
 398         MACS version 2.1.1 20160309
 399
 400         * Retire the tag:rc.
 401
 402         * Fixed spelling. Merged pull request #120. Thank @mr-c!
 403
 404         * Change filtering criteria for reading BAM/SAM files
 405
 406         Related to callpeak and filterdup commands. Now the
 407         reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
 408         still be read although MACS2 may decide them as duplicates
 409         later. Related to old issue #33. Sorry I forgot to address it for
 410         years!
 411
 412 2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
 413         MACS version 2.1.1 20160226 (tag:rc Zhengyue)
 414
 415         * Bug fixes
 416
 417         1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
 418         the former option is not supported by older GCC. Related to issues
 419         #91, #109.
 420
 421         2) Issue #108 is fixed. If no peak can be found in a chromosome,
 422         the PeakIO won't throw an error.
 423
 424         * New features
 425
 426         1) callpeak
 427
 428         a) A more flexible format, BEDPE, is supported. Now users can
 429         define the left and right position of the ChIPed fragment, and
 430         MACS2 will skip model building and directly pileup the
 431         fragments. Related to issue #112.
 432
 433         b) The 'tempdir' can be specified, to save cached pileup
 434         tracks. Originially, the temporary files were stored in
 435         /tmp. Thank @daler! Related to issues #97 and #105.
 436
 437         2) bdgopt
 438
 439         New operations are added, to calculate the maximum or minimum value between
 440         values in BEDGRAPH and given value.
 441
 442         3) bdgcmp
 443
 444         New method is added, to calculate the maximum value between values
 445         defined in two BEDGRAPH files.
 446
 447 2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
 448         MACS version 2.1.0 20151222 (tag:rc Dongzhi)
 449
 450         * Bug fixes
 451
 452         1) Fix a bug while dealing with some chromosomes only containing
 453         one read (pair). The size of dup_plus/dup_minus arrays after
 454         filtering dups should +1.
 455
 456         2) Fix a bug related to the broad peak calling function in
 457         previous versions. The gaps were miscalculated, so segmented weak
 458         broad calls may be reported, and sometimes you would see peaks
 459         with lower than cutoff values in the output files.
 460
 461         3) "Potentially" Fixed issue #105 on temporary cache files, need
 462         further followup.
 463
 464
 465 2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
 466         MACS version 2.1.0 20150731 (tag:rc)
 467
 468         * Bug fixes
 469
 470         1) Fixed issue #76: information about broad/narrow cutoff will be
 471         correctly displayed.
 472
 473         2) Fixed issue #79: bdgopt extparam option is fixed.
 474
 475         3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
 476         for filterdup command.
 477
 478         4) Fixed issue #78, #88 and similar issue reported in MACS google
 479         group: MACS2 now can correctly deal with multiple alignment files
 480         for -t or -c. The 'finalize' function will be correctly
 481         called. Multiple files option is enabled for filterdup,
 482         randsample, predictd, pileup and refinepeak commands.
 483
 484         5) A related issue to #88, when BAMPE mode is used, PE pairs will
 485         be sorted by leftmost then rightmost ends.
 486
 487         6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
 488         array. This will cause 'callpeak --nolambda' hang forever while
 489         calculating pvalues and qvalues.
 490
 491 2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
 492         MACS version 2.1.0 20150420 (tag:rc)
 493
 494         * New commands
 495
 496         1) bdgopt: some convenient functions to modify bedGraph files.
 497
 498         2) cmbreps: Combine scores from two replicates. Including three
 499         methods: 1. take the maximum; 2. take the average; 3. use Fisher's
 500         method to combine two p-value scores. After that, user can use
 501         bdgpeakcall to call peaks on combined scores.
 502
 503         * New features
 504
 505         1) callpeak and bdgpeakcall now can try to analyze the
 506         relationship between p-values and number/length of peaks then
 507         generate a summary to help users decide an appropriate cutoff.
 508
 509         2) callpeak now can accept fold-enrichment cutoff as a filter for
 510         final peak calls.
 511
 512         * Performance
 513
 514         Now MACS2 runs about 3X as fast as previous version. Trade
 515         clean python codes for speed... Now while processing 50M ChIP vs
 516         50M control, it will take only 10 minutes.
 517
 518         * Bug fixes
 519
 520         1) Sampling function in BAMPE mode.
 521
 522         2) Callpeak while there are >= 2 input files for -t or -c.
 523
 524         3) While reading BAM/SAM, those secondary or supplementary
 525         alignments will be correctly skipped.
 526
 527         4) Fixed issue #33: Explanation is added to callpeak --keep-dup
 528         option that MACS2 will discard those SAM/BAM alignments with bit
 529         1024 no matter how --keep-dup is set.
 530
 531         5) Fixed issue #49: setuptools is used intead of distutils
 532
 533         6) Fixed issue #51: fix the problem when using --trackline
 534         argument when control file is absent.
 535
 536         7) Fixed issue #53: Use Use SAM/BAM CIGAR to find the 5' end of
 537         read mapped to minus strand. Previous implementation will find
 538         incorrect 5' end if there is indel in alignment.
 539
 540         8) Fixed issue #56: An incorrect sorting method used for BAMPE
 541         mode which will cause incorrect filtering of duplicated reads. Now
 542         fixed.
 543
 544         9) Issue #63: Merged from jayhesselberth@github, extsize now can
 545         be 1.
 546
 547         10) Issue #71: Merged from aertslab@github, close file descriptor
 548         after creating them with mkstemp().
 549
 550 2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
 551         MACS version 2.1.0 20140616 (tag:rc)
 552
 553         * callpeak module
 554
 555         "--ratio" is added to manually assign the scaling factor of ChIP
 556         vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
 557         implementing the patch file!
 558
 559         "--shift" is added to move cutting ends (5' end of reads) around,
 560         in order to process DNAse-Seq data, e.g., use "--shift -100
 561         --extsize 200" to get 200bps fragments around 5' ends. For general
 562         ChIP-Seq data analysis, this option should be always set as
 563         0. Thank Xi Chen and Anshul Kundaje for the discussions in user
 564         group!
 565
 566         ** Do not output negative fragment size from cross-correlation
 567         analysis. Thank Alvin Qin for the feedback!
 568
 569         ** --half-ext and --control-shift are removed. For complex read
 570         shifting and extending, combine '--shift' and '--extsize'
 571         options. For comparing two conditions, use 'bdgdiff' module
 572         instead.
 573
 574         ** a bug is fixed to output the last pileup value in bdg file
 575         correctly.
 576
 577         * filterdup
 578
 579         A 'dry-run' option is added to only output numbers, including the
 580         number of allowed duplicates, the total number of reads before and
 581         after filtering duplicates and the estimated duplication
 582         rate. Thank John Urban for the suggestion!
 583
 584
 585 2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
 586         MACS version 2.0.10 20131216 (tag:alpha)
 587
 588         bug fixes and tweaks
 589
 590         * We changed license from Artistic License to 3-clauses BSD license.
 591
 592         Yes. Simpler the better.
 593
 594         * Process paired-end data with "-f BAMPE" without control
 595
 596         * GappedPeak output for --broad option has been fixed again to be
 597         consistent with official UCSC format. We add 1bp pseudo-block to
 598         left and/or right of broad region when necessary, so that you can
 599         virtualize the regions without strong enrichment inside
 600         successfully. In downstream analysis except for virtualization,
 601         you may need to remove all 1bps blocks from gappedPeak file.
 602
 603         * diffpeak subcommand is temporarily disabled. Till we
 604         re-implement it.
 605
 606 2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
 607         MACS version 2.0.10 20131028 (tag:alpha)
 608
 609         * callpeak --call-summits improvement
 610
 611         The smoothing window length has been fixed as fragment length
 612         instead of short read length. The larger smoothing window will
 613         grant better smoothing results and better sub-peak summits
 614         detection.
 615
 616         * --outdir and --ofile options for almost all commands
 617
 618         Thank Björn Grüning for initially implementing these options!
 619         Now, MACS2 will save results into a specified
 620         directory by '--outdir' option, and/or save result into a
 621         specified file by '--ofile' option. Note, in case '--ofile' is
 622         available for a subcommand, '-o' now has been adjusted to be the
 623         same as '--ofile' instead of '--o-prefix'.
 624
 625         Here is the list of changes. For more detail, use 'macs2 xxx -h'
 626         for each subcommand:
 627
 628         ** callpeak: --outdir
 629         ** diffpeak: Not implemented
 630         ** bdgpeakcall: --outdir and --ofile
 631         ** bdgbroadcall: --outdir and --ofile
 632         ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
 633         and the order of arguments for --ofile must be the same as for -m.
 634         ** bdgdiff: --outdir and --ofile
 635         ** filterdup: --outdir
 636         ** pileup: --outdir
 637         ** randsample: --outdir
 638         ** refinepeak: --outdir and --ofile
 639
 640
 641 2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
 642         MACS version 2.0.10 20130915 (tag:alpha)
 643
 644         * callpeak Added a new option --buffer-size
 645
 646         This option is to tweak a previously hidden parameter that
 647         controls the steps to increase array size for storing alignment
 648         information. While in some rare cases, the number of
 649         chromosomes/contigs/scaffolds is huge, the original default
 650         setting will cause a huge memory waste. In these cases, we
 651         recommend to decrease --buffer-size (e.g., 1000) to save memory,
 652         although the decrease will slow process to read alignment files.
 653
 654         * an optimization to speed up pvalue-qvalue statistics
 655
 656         Previously, it took a hour to prepare p-q-table for 65M vs 65M
 657         human TF library, and now it will take 10 minutes. It was due to a
 658         single line of code to get a value from a numpy array ...
 659
 660         * fixed logLR bugs.
 661
 662 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
 663         MACS version 2.0.10 20130731 (tag:alpha)
 664
 665         * callpeak --call-summits
 666
 667         Fix bugs causing callpeak --call-summits option generating extra
 668         number of peaks and inconsistent peak boundaries comparing to
 669         default option. Thank Ben Levinson!
 670
 671         * bdgcmp output
 672
 673         Fix bugs causing bdgcmp output logLR all in positive values. Now
 674         'depletion' can be correctly represented as negative values.
 675
 676         * bdgdiff
 677
 678         Fix the behavior of bdgdiff module. Now it can take four
 679         bedGraph files, then use logLR as cutoff to call differential
 680         regions. Check command line of bdgdiff for detail.
 681
 682 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
 683         MACS version 2.0.10 20130713 (tag:alpha)
 684
 685         * fix bugs while output broadPeak and gappedPeak.
 686
 687         Note. Those weak broad regions without any strong enrichment
 688         regions inside won't be saved in gappedPeak file.
 689
 690         * bdgcmp -T and -C are merged into -S and description is updated.
 691
 692         Now, you can use it to override SPMR values in your input for
 693         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
 694         statistics will cause weird results ( in most cases, lower
 695         significancy), and won't be consistent with MACS2 callpeak
 696         behavior. So if you have SPMR bedGraphs, input the smaller/larger
 697         sample size in MILLION according to 'callpeak --to-large' option.
 698
 699 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
 700         MACS version 2.0.10 20130710 (tag:alpha)
 701
 702         * fix BED style output format of callpeak module:
 703
 704         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
 705         the output. Old BED format file won't be saved.
 706
 707         2) with --broad: broadPeak (BED6+3) for broad region and
 708         gappedPeak (BED12+3) for chained enriched regions will be the
 709         output. Old BED format, narrowPeak format, summit file won't be
 710         saved.
 711
 712         * bdgcmp now can accept list of methods to calculate scores. So
 713         you can run it once to generate multiple types of scores. Thank
 714         Jon Urban for this suggestion!
 715
 716         * C codes are re-generated through Cython 0.19.1.
 717
 718 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
 719         MACS version 2.0.10 20130520 (tag:alpha)
 720
 721         * broad peak calling modules are modified in order to report all
 722         relexed regions even there is no strong enrichment inside.
 723
 724 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
 725         MACS version 2.0.10 20130501 (tag:alpha)
 726
 727         * Memory usage is decreased to about 1/4-1/5 of previous usage
 728         Now, the internal data structure and algorithm are both
 729         re-organized, so that intermediate data wouldn't be saved in
 730         memory. Intead they will be calculated on the fly. New MACS2 will
 731         spend longer time (1.5 to 2 times) however it will use less memory
 732         so can be more usable on small mem servers.
 733
 734         * --seed option is added to callpeak and randsample commands
 735         Thank Mathieu Gineste for this suggestion!
 736
 737 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
 738         MACS version 2.0.10 20130306 (tag:alpha)
 739
 740         * diffpeak module New module to detect differential binding sites
 741         with more statistics.
 742
 743         * Introduced --refine-peaks
 744         Calculates reads balancing to refine peak summits
 745
 746         * Ouput file names prefix
 747         Correct encodePeak to narrowPeak, broadPeak to bed12.
 748
 749 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
 750         MACS version 2.0.10 (tag:alpha not released)
 751
 752         * Introduced BAMPEParser
 753         Reads PE data directly, requires bedtools for now
 754
 755         * Introduced --call-summits
 756         Uses signal processing methods to call overlapping peaks
 757
 758         * Added --no-trackline
 759         By default, files have descriptive tracklines now
 760
 761         * new refinepeak command (experimental)
 762         This new function will use a similar method in SPP (wtd), to
 763         analyze raw tag distribution in peak region, then redefine the
 764         peak summit where plus and minus tags are evenly distributed
 765         around.
 766
 767         * Changes to output *
 768         cPeakDetect.pyx has full support for new print/write methods and
 769         --call-peaks, BAMPEParser, and use of paired-end data
 770
 771         * Parser optimization
 772
 773         cParser.pyx is rewritten to use io.BufferedReader to speed
 774         up. Speed is doubled.
 775
 776         Code is reorganized -- most of functions are inherited from
 777         GenericParser class.
 778
 779         * Use cross-correlation to calculate fragment size
 780
 781         First, all pairs will be used in prediction for fragment
 782         size. Previously, only no more than 1000 pairs are used. Second,
 783         cross-correlation is used to find the best phase difference
 784         between + and - tag pileups.
 785
 786         * Speed up p-value and q-value calculation
 787
 788         This part is ten times faster now. I am using a dictionary to
 789         cache p-value results from Poisson CDF function. A bit more memory
 790         will be used to increase speed. I hope this dictionary would not
 791         explode since the possible pairs of ChIP signal and control lambda
 792         are hugely redundant. Also, I rewrited part of q-value
 793         calculation.
 794
 795         * Speed up peak detection
 796
 797         This part is about hundred of times faster now.  Optimizations
 798         include using Numpy functions as much as possible, and making loop
 799         body as small as possible.
 800
 801         * Post-processing on differential calls
 802
 803         After macs2diff finds differential binding sites between two
 804         conditions, it will try to annotate the peak calls from one of two
 805         conditions, describe the changes ...
 806
 807         * Fragment size prediction in macs2diff
 808
 809         Now by default, macs2diff will try to use the average fragment
 810         size from both condition 1 and condition 2 for tag extension and
 811         peak calling. Previously, by default, it will use different sizes
 812         unless --nomodel is specified.
 813
 814         Technically, I separate model building processes out. So macs2diff
 815         will build fragment sizes for condition 1 and 2 in parallel (2
 816         processes maximum), then perform 4-way comparisons in parallel (4
 817         processes maximum).
 818
 819         * Diff score
 820
 821         Combine two p/qscore tracks together. At regions where condition 1
 822         is higher than condition 2, score would be positive, otherwise,
 823         negative.
 824
 825         * SAMParser and BAMParser
 826
 827         Bug fixed for paired-end sequencing data.
 828
 829         * BedGraph.pyx
 830
 831         Fixed a bug while calling peaks from BedGraph file. It previously
 832         mistakenly output same peaks multiple times at the end of
 833         chromosome.
 834
 835 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
 836         MACS version 2.0.9 (tag:alpha)
 837
 838         * Auto fixation on predicted d is turned off by default!
 839
 840         Previous --off-auto is now default. MACS will not automatically
 841         fix d less than 2 times of tag size according to
 842         --shiftsize. While tag size is getting longer nowadays, it would
 843         be easier to have d less than 2 times of tag size, however d may
 844         still be meaningful and useful. Please judge it using your own
 845         wisdom.
 846
 847         * Scaling issue
 848
 849         Now, the default scaling while treatment and input are unbalanced
 850         has been adjusted. By default, larger sample will be scaled down
 851         linearly to match the smaller sample. In this way, background
 852         noise will be reduced more than real signals, so we expect to have
 853         more specific results than the other way around (i.e. --to-large
 854         is set).
 855
 856         Also, an alternative option to randomly sample larger data
 857         (--down-sample) is provided to replace default linear
 858         scaling. However, this option will cause results irresproducible,
 859         so be careful.
 860
 861         * randsample script
 862
 863         A new script 'randsample'  is added, which can randomly sample
 864         certain percentage or number of tags.
 865
 866         * Peak summit
 867
 868         Now, MACS will decide peak summits according to pileup height
 869         instead of qvalue scores. In this way, the summit may be more
 870         accurate.
 871
 872         * Diff score
 873
 874         MACS calculate qvalue scores as differential scores. When compare
 875         two conditions (saying A and B), the maximum qscore for comparing
 876         A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
 877         will be computed. If maxqscore_a2b is bigger, the diff score is
 878         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
 879
 880 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 881         MACS version 2.0.8 (tag:alpha)
 882
 883         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 884
 885         New script bdgbroadcall and the extra option '--broad' for macs2
 886         script, can be used to call broad regions with a loose cutoff to
 887         link nearby significant regions. The output is represented as
 888         BED12 format.
 889
 890         * MACS2/IO/cScoreTrack.pyx
 891
 892         Fix q-value calculation to generate forcefully monotonic values.
 893
 894         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 895
 896         They are combined to one more powerful script called
 897         "filterdup". The script filterdup can filter duplicated reads
 898         according to sequencing depth and genome size. The script can also
 899         convert any format supported by MACS to BED format.
 900
 901 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 902         MACS version 2.0.7 (tag:alpha)
 903
 904         * bin/macsdiff renamed to bin/bdgdiff
 905
 906         Now this script will work as a low-level finetuning tool as bdgcmp
 907         and bdgpeakcall.
 908
 909         * bin/macs2diff
 910
 911         A new script to take treatment and control files from two
 912         condition, calculate fragment size, use local poisson to get
 913         pvalues and BH process to get qvalues, then combine 4-ways result
 914         to call differential sites.
 915
 916         This script can use upto 4 cpus to speed up 4-ways calculation. (
 917         I am trying multiprocessing in python. )
 918
 919         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 920         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 921         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 922
 923         All above files are modified for the new macs2diff script.
 924
 925         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 926
 927         Now q-value 0.01 is the default cutoff. If -p is specified,
 928         p-value cutoff will be used instead.
 929
 930 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 931         MACS version 2.0.6 (tag:alpha)
 932
 933         * bin/macsdiff
 934
 935         A script to call differential regions. A naive way is introduced
 936         to find the regions where:
 937
 938         1. signal from condition 1 is larger than input 1 and condition 2 --
 939         unique region in condition 1;
 940         2. signal from condition 2 is larger than input 2 and condition 1
 941         -- unique region in condition 2;
 942         3. signal from condition 1 is larger than input 1, signal from
 943         condition 2 is larger than input 2, however either signal from
 944         condition 1 or 2 is not larger than the other.
 945
 946         Here 'larger' means the pvalue or qvalue from a Poisson test is
 947         under certain cutoff.
 948
 949         (I will make another script to wrap up mulitple scripts for
 950         differential calling)
 951
 952 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 953         MACS version 2.0.5 (tag:alpha)
 954
 955         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 956         MACS2/IO/cPeakIO.pyx
 957
 958         Use hash to store peak information. Add back the feature to deal
 959         with data without control.
 960
 961         Fix bug which incorrectly allows small peaks at the end of
 962         chromosomes.
 963
 964         * bin/bdgpeakcall, bin/bdgcmp
 965
 966         Fix bugs. bdgpeakcall can output encodePeak format.
 967
 968 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
 969         MACS version 2.0.4 (tag:alpha)
 970
 971         * cPeakDetect.py
 972
 973         Fix a bug, correctly assign lambda_bg while --to-small is
 974         set. Thanks Junya Seo!
 975
 976         Add rank and num of bp columns to pvalue-qvalue table.
 977
 978         * cScoreTrack.py
 979
 980         Fix bugs to correctly deal with peakless chromosomes. Thanks
 981         Vaibhav Jain!
 982
 983         Use AFDR for independent tests instead.
 984
 985         * encodePeak
 986
 987         Now MACS can output peak coordinates together with pvalue, qvalue,
 988         summit positions in a single encodePeak format (designed for
 989         ENCODE project) file. This file can be loaded to UCSC
 990         browser. Definition of some specific columns are: 5th:
 991         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
 992         -log10qvalue, 10th: relative summit position to peak start.
 993
 994
 995 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 996         MACS version 2.0.3 (tag:alpha)
 997
 998         * Rich output with qvalue, fold enrichment, and pileup height
 999
1000         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1001         procedure:
1002
1003         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1004
1005         Now we have a similiar xls output file as before. The differences
1006         from previous file are:
1007
1008         1. Summit now is absolute summit, instead of relative summit
1009            position;
1010         2. 'Pileup' is previous 'tag' column. It's the extended fragment
1011            pileup at the peak summit;
1012         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1013            5.00 means 1e-5, simple and less confusing.
1014         4. FDR column becomes '-log10(qvalue)' column.
1015         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1016            the values at the peak summit.
1017
1018         * Extra output files
1019
1020         NAME_pqtable.txt contains pvalue and qvalue relationships.
1021
1022         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1023         and -log10qvalue scores in BedGraph format. Nearby regions with
1024         the same value are not merged.
1025
1026         * Separation of FeatIO.py
1027
1028         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1029         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1030         implemented to store pileup, local lambda, pvalue, and qvalue
1031         alltogether in cScoreTrack.pyx.
1032
1033         * Experimental option --half-ext
1034
1035         Suggested by NPS algorithm, I added an experimental option
1036         --half-ext to let MACS only extends ChIP fragment around its
1037         middle point for only 1/2 d.
1038
1039 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1040         MACS version 2.0.2 (tag:alpha)
1041
1042         * macs2
1043
1044         Add an error check to see if there is no common chromosome names
1045         from treatment file and control file
1046
1047         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1048
1049         Reduce memory usage by removing deepcopy() calls.
1050
1051         * Modify README documents and others.
1052
1053 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1054         MACS Version 2.0.1 (tag:alpha)
1055
1056         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1057
1058         Jie suggested me a brilliant simple method to pileup fragments
1059         into bedGraph track. It works extremely faster than the previous
1060         function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1061         large local lambda calculation in MACSv2 now. Now I generate three
1062         bedGraphs for d-size local bias, slocal-size and llocal-size local
1063         bias, and calculate the maximum local bias as local lambda
1064         bedGraph track.
1065
1066         Minor: add_loc in bedGraphTrackI now can correctly merge the
1067         region with its preceding region if their value are the same.
1068
1069         * macs2
1070
1071         Add an option to shift control tags before extension. By default,
1072         control tags will be extended to both sides regardless of strand
1073         information.
1074
1075 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1076         MACS Version 2.0.0 (tag:alpha)
1077
1078         * Use bedGraph type to store data internally and externally.
1079
1080         We can have theoretically one-basepair resolution profiles. 10
1081         times smaller in filesize and even smaller after converting to
1082         bigWig for visualization.
1083
1084         * Peak calling process modified. Better peak boundary detection.
1085
1086         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1087         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1088         one will be averaged to d size) Then calculate the maximum value
1089         of these two tracks and a global background, to have a
1090         local-lambda bedGraph.
1091
1092         Use -10log10poisson_pvalue as scores to generate a score track
1093         before peak calling.
1094
1095         A general peak calling based on a score cutoff, min length of peak
1096         and max gap between nearby peaks.
1097
1098         * Option changes.
1099
1100         Wiggle file output is removed. Now we only support bedGraph
1101         output. The generation of bedGraph is highly recommended since it
1102         will not cost extra time. In other words, bedGraph generation is
1103         internally run even you don't want to save bedGraphs on disk, due
1104         to the peak calling algorithm in MACS v2.
1105
1106         * cProb.pyx
1107
1108         We now can calculate poisson pvalue in log space so that the score
1109         (-10*log10pvalue) will not have a upper limit of 3100 due to
1110         precision of float number.
1111
1112         * Cython is adopted to speed up Python code.
1113
1114 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1115         Small fixes
1116
1117         * Replaced with a newest WigTrackI class and fixed the wignorm script.
1118
1119 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1120         Version 1.4.0rc2 (Valentine)
1121
1122         * --single-wig option is renamed to --single-profile
1123
1124         * BedGraph output with --bdg or -B option.
1125
1126         The BedGraph output provides 1bp resolution fragment pileup
1127         profile. File size is smaller than wig file. This option can be
1128         combined with --single-profile option to produce a bedgraph file
1129         for the whole genome. This option can also make --space,
1130         --call-subpeaks invalid.
1131
1132         * Fix the description of --shiftsize to correctly state that the
1133         value is 1/2 d (fragment size).
1134
1135         * Fix a bug in the call to __filter_w_control_tags when control is
1136         not available.
1137
1138         * Fix a bug on --to-small option. Now it works as expected.
1139
1140         * Fix a bug while counting the tags in candidate peak region, an
1141         extra tag may be included. (Thanks to Jake Biesinger!)
1142
1143         * Fix the bug for the peaks extended outside of chromosome
1144         start. If the minus strand tag goes outside of chromosome start
1145         after extension of d, it will be thrown out.
1146
1147         * Post-process script for a combined wig file:
1148
1149         The "wignorm" command can be called after a full run of MACS14 as
1150         a postprocess. wignorm can calculate the local background from the
1151         control wig file from MACS14, then use either foldchange,
1152         -10*log10(pvalue) from possion test, or difference after asinh
1153         transformation as the score to build a single wig track to
1154         represent the binding strength. This script will take a
1155         significant long time to process.
1156
1157         * --wigextend has been obsoleted.
1158
1159 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1160         Version 1.4.0rc1 (Starry Sky)
1161
1162         * Duplicate reads option
1163
1164         --keep-dup behavior is changed. Now user can specify how many
1165         reads he/she wants to keep at the same genomic location. 'auto' to
1166         let MACS decide the number based on binomial distribution, 'all'
1167         to let MACS keep all reads.
1168
1169         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1170
1171         By default, MACS will now scale the smaller dataset to the bigger
1172         dataset. For instance, if IP has 10 million reads, and Input has 5
1173         million, MACS will double the lambda value calculated from Input
1174         reads while calling BOTH the positive peaks and negative
1175         peaks. This will address the issue caused by unbalanced numbers of
1176         reads from IP and Input. If --to-small is turned on, MACS will
1177         scale the larger dataset to the smaller one. So from now on, if d
1178         is fixed, then the peaks from a MACS call for A vs B should be
1179         identical to the negative peaks from a B vs A.
1180
1181 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1182         Version 1.4.0beta (summer wishes)
1183
1184         * New features
1185
1186         ** Model building
1187
1188         The default behavior in the model building step is slightly
1189         changed. When MACS can't find enough pairs to build model
1190         (implemented in alpha version) or the modeled fragment length is
1191         less than 2 times of tag length (implemented in beta version),
1192         MACS will use 2 times of --shiftsize value as fragment length in
1193         the later analysis. --off-auto can turn off this default behavior.
1194
1195         ** Redundant tag filtering
1196
1197         The IO module is rewritten. The redundant tag filtering process
1198         becomes simpler and works as promise. The maximum allowed number
1199         of tags at the exact same location is calculated from the
1200         sequencing depth and genome size using a binomial distribution,
1201         for both TREAMENT and CONTROL separately. ( previously only
1202         TREATMENT is considered ) The exact same location means the same
1203         coordination and the same strand. Then MACS will only keep at most
1204         this number of tags at the exact same location in the following
1205         analysis. An option --keep-dup can let MACS skip the filtering and
1206         keep all the tags. However this may bring in a lot of sequencing
1207         bias, so you may get many false positive peaks.
1208
1209         ** Single wiggle mode
1210
1211         First thing to mention, this is not the score track that I
1212         described before. By default, MACS generates wiggle files for
1213         fragment pileup for every chromosomes separately. When you use
1214         --single-wig option, MACS will generate a single wiggle file for
1215         all the chromosomes so you will get a wig.gz for TREATMENT and
1216         another wig.gz for CONTROL if available.
1217
1218         ** Sniff -- automatic format detection
1219
1220         Now, by default or "-f AUTO", MACS will decide the input file
1221         format automatically. Technically, it will try to read at most
1222         1000 records for the first 10 non-comment lines. If it succeeds,
1223         the format is decided. I recommend not to use AUTO and specify the
1224         right format for your input files, unless you combine different
1225         formats in a single MACS run.
1226
1227         * Options changes
1228
1229         --single-wig and --keep-dup are added. Check previous section in
1230         ChangeLog for detail.
1231
1232         -f (--format) AUTO is now the default option.
1233
1234         --slocal default: 1000
1235         --llocal default: 10000
1236
1237         * Bug fixed
1238
1239         Setup script will stop the installation if python version is not
1240         python2.6 or python2.7.
1241
1242         Local lambda calculation has been changed back. MACS will check
1243         peak_region, slocal( default 1K) and llocal (default 10K) for the
1244         local bias. The previous 200bps default will cause MACS misses
1245         some peaks where the input bias is very sharp.
1246
1247         sam2bed.py script is corrected.
1248
1249         Relative pos in xls output is fixed.
1250
1251         Parser for ELAND_export is fixed to pass some of the no match
1252         lines. And elandexport2bed.py is fixed too. ( however I can't
1253         guarantee that it works on any eland_export files. )
1254
1255 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1256         Version 1.4.0alpha2 (be smarter)
1257
1258         * Options changes
1259
1260         --gsize now provides shortcuts for common genomes, including
1261         human, mouse, C. elegans and fruitfly.
1262
1263         --llocal now will be 5000 bps if there is no input file, so that
1264         local lambda doesn't overkill enriched binding sites.
1265
1266 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1267         Version 1.4alpha (be smarter)
1268
1269         * Options changes
1270
1271         --tsize option is redesigned. MACS will use the first 10 lines of
1272         the input to decide the tag size. If user specifies --tsize, it
1273         will override the auto decided tsize.
1274
1275         --lambdaset is replaced by --slocal and --llocal which mean the
1276         small local region and large local region.
1277
1278         --bw has no effect on the scan-window size now. It only affects the
1279         paired-peaks model process.
1280
1281         * Model building
1282
1283         During the model building, MACS will pick out the enriched regions
1284         which are not too high and not too low to build the paired-peak
1285         model. Default the region is from fold 10 to fold 30. If MACS
1286         fails to build the model, by default it will use the nomodel
1287         settings, like shiftsize=100bps, to shift and extend each
1288         tags. This behavior can be turned off by '--off-auto'.
1289
1290         * Output files
1291
1292         An extra file including all the summit positions are saved in
1293         *_summits.bed file. An option '--call-subpeaks' will invoke
1294         PeakSplitter developed by Mali Salmon to split wide peaks into
1295         smaller subpeaks.
1296
1297         * Sniff ( will in beta )
1298
1299         Automatically recognize the input file format, so use can combine
1300         different format in one MACS run.
1301
1302         Not implemented features/TODO:
1303
1304         * Algorithms ( in near future? )
1305
1306         MACS will try to refine the peak boundaries by calculating the
1307         scores for every point in the candidate peak regions. The score
1308         will be the -10*log(10,pvalue) on a local poisson distribution. A
1309         cutoff specified by users (--pvalue) will be applied to find the
1310         precise sub-peaks in the original candidate peak region. Peak
1311         boudaries and peak summits positions will be saved in separate BED
1312         files.
1313
1314         * Single wiggle track ( in near future? )
1315
1316         A single wiggle track will be generated to save the scores within
1317         candidate peak regions in the 10bps resolution. The wiggle file
1318         is in fixedStep format.
1319
1320
1321 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1322         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1323
1324         * bin/Constants.py
1325
1326         Fixed typo. FCSTEP -> FESTEP
1327
1328         * lib/PeakDetect.py
1329
1330         The 'femax' attribute bug is fixed
1331
1332 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1333         Version 1.3.7 (Oktoberfest)
1334
1335         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1336
1337         Enhancements by Peter Chines:
1338
1339         1. gzip files are supported.
1340         2. when --diag is on, user can set the increment and endpoint for
1341         fold enrichment analysis by setting --fe-step and --fe-max.
1342
1343         Enhancements by Davide Cittaro:
1344
1345         1. BAM and SAM formats are supported.
1346         2. small changes in the header lines of wiggle output.
1347
1348         Enhancements by Me:
1349         1. I added --fe-min option;
1350         2. Bowtie ascii output with suffix ".map" is supported.
1351
1352         Bug fixed:
1353
1354         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1355         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1356         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1357         4. Some "fold change" have been changed to "fold enrichment".
1358
1359 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1360         Version 1.3.6.1 (default parameter change)
1361
1362         * bin/macs, lib/PeakDetect.py
1363
1364         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1365         default. "--futurefdr" is added which can turn on the 'new' method
1366         introduced in 1.3.6. By default it's off.
1367
1368         * lib/PeakDetect.py
1369
1370         Fixed a bug. p-value is corrected a little bit.
1371
1372
1373 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1374         Version 1.3.6 (Birthday cake)
1375
1376         * bin/macs
1377
1378         "track name" is added to the header of BED output file.
1379
1380         Now the default peak detection method is to consider 5k and 10k
1381         nearby regions in treatment data and peak location, 1k, 5k, and
1382         10k regions in control data to calculate local bias. The old
1383         method can be called through '--old' option.
1384
1385         Information about how many total/unique tags in treatment or
1386         control will be saved in final .xls output.
1387
1388         * lib/IO/__init__.py
1389
1390         ".fa" will be removed from input tag alignment so only the
1391         chromosome names are kept.
1392
1393         WigTrackI class is added for Wiggle like data structure. (not used
1394         now)
1395
1396         The parser for ELAND multi PET files has been fixed. Now the 5'
1397         tag position for a pair will be kept, whereas in the previous
1398         version, the middle points are kept.
1399
1400         * lib/IO/BinKeeper.py
1401
1402         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1403         browser, which can quickly access certain region for values in a
1404         large wiggle like data file. (not used now)
1405
1406         * lib/OptValidator.py
1407
1408         typo fixed.
1409
1410         * lib/PeakDetect.py
1411
1412         Now the default peak detection method is to consider 5k and 10k
1413         nearby regions in treatment data and peak location, 1k, 5k, and
1414         10k regions in control data to calculate local bias. The old
1415         method can be called through '--old' option.
1416
1417         Two columns have beed added to BED output file. 4th column: peak
1418         name; 5th column: peak score using -10log(10,pvalue) as score.
1419
1420         * setup.py
1421
1422         Add support to build a Mac App through 'setup.py py2app', or a
1423         Windows executable through 'setup.py py2exe'. You need to install
1424         py2app or py2exe package in order to use these functions.
1425
1426 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1427         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1428
1429         * PeakDetect.py
1430
1431         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1432         in control data to calculate local lambda for each peak. Peak
1433         calling results will be slightly different with previous version,
1434         beware!
1435
1436         * OptValidator.py
1437
1438         Typo fixed, ELANDParser -> ELANDResultParser
1439
1440         * OutputWriter.py
1441
1442         Now, modeled d value will be shown on the model figure.
1443
1444 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1445         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1446
1447         * macs, IO/__init__.py, PeakDetect.py
1448
1449         Add support for ELAND multi format. Add support for Pair-End
1450         experiment, in this case, 5'end and 3'end ELAND multi format files
1451         are required for treatment or control data. See 00README file for
1452         detail.
1453
1454         Add wigextend option.
1455
1456         Add petdist option for Pair-End Tag experiment, which is the best
1457         distance between 5' and 3' tags.
1458
1459         * PeakDetect.py
1460
1461         Fixed a bug which cause the end positions of every peak region
1462         incorrectly added by 1 bp. ( Thanks Mali Salmon!)
1463
1464         * OutputWriter.py
1465
1466         Fix bugs while generating wiggle files. The start position of
1467         wiggle file is set to 1 instead of 0.
1468
1469         Fix a bug that every 10M bps, signals in the first 'd' range are
1470         lower than actual. ( Thanks Mali Salmon!)
1471
1472
1473 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1474         Version 1.3.3 (wiggle bugs fixed)
1475
1476         * OutputWriter.py
1477
1478         Fix bugs while generating wiggle files. 1. 'span=' is added to
1479         'variableStep' line; 2. previously, every 10M bps, the coordinates
1480         were wrongly shifted to the right for 'd' basepairs.
1481
1482         * macs, PeakDetect.py
1483
1484         Add an option to save wiggle files on different resolution.
1485
1486 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1487         Version 1.3.2 (tiny bugs fixed)
1488
1489         * IO/__init__.py
1490
1491         Fix 65536 -> 65535. ( Thank Joon)
1492
1493         * Prob.py
1494
1495         Improved for binomial function with extra large number. Imported
1496         from Cistrome project.
1497
1498         * PeakDetect.py
1499
1500         If treatment channel misses reads in some chromosome included in
1501         control channel, or vice versa, MACS will not exit. (Thank Shaun
1502         Mahony)
1503
1504         Instead, MACS will fake a tag at position -1 when calling
1505         treatment peaks vs control, but will ignore the chromosome while
1506         calling negative peaks.
1507
1508 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1509         Version 1.3.1 (tiny bugs fixed version)
1510
1511         * Prob.py
1512
1513         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1514         binomial functions can tolerate large and small numbers.
1515
1516         * IO/__init__.py
1517
1518         Parsers now split lines in BED/ELAND file using any
1519         whitespaces. 'track' or 'browser' lines will be regarded as
1520         comment lines. A bug fixed when throwing StrandFormatError. The
1521         maximum redundant tag number at a single position can be no less
1522         than 65536.
1523
1524
1525 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1526         Version 1.3 (naming clarification version)
1527
1528         * Naming clarification changes according to our manuscript:
1529
1530         'frag_len' is changed to 'd'.
1531
1532         'fold_change' is changed to 'fold_enrichment'.
1533
1534         Suggest '--bw' parameter to be determined by users from the real
1535         sonication size.
1536
1537         Maximum FDR is 100% in the output file.
1538
1539         And other clarifications in 00README file and the documents on the
1540         website.
1541
1542         * IO/__init__.py
1543         If the redundant tag number at a single position is over 32767,
1544         just remember 32767, instead of raising an overflow exception.
1545
1546         * setup.py
1547         fixed a typo.
1548
1549         * PeakDetect.py
1550         Bug fixed for diagnosis report.
1551
1552
1553 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1554         Version 1.2.2gamma
1555
1556         * Serious bugs fix:
1557
1558         Poisson distribution CDF and inverse CDF functions are
1559         corrected. They can produce right results even for huge lambda
1560         now. So that the p-value and FDR values in the final excel sheet
1561         are corrected.
1562
1563         IO package now can tolerate some rare cases; ELANDParser in IO
1564         package is fixed. (Thank Bogdan)
1565
1566         * Improvement:
1567
1568         Reverse paired peaks in model are rejected. So there will be no
1569         negative 'frag_len'. (Thank Bogdan)
1570
1571         * Features added:
1572
1573         Diagnosis function is completed. Which can output a table file for
1574         users to estimate their sequencing depth.
1575
1576
1577 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1578         Version 1.2
1579
1580         * Probe.py is added!
1581
1582         GSL is totally removed from MACS. Instead, I have implemented the
1583         CDF and inverse CDF for poisson and binomial distribution purely
1584         in python.
1585
1586         * Constants.py is added!
1587
1588         Organize constants used in MACS in the Constants.py file.
1589
1590         * All other files are modified!
1591
1592         Foldchange calculation is modified. Now the foldchange only be
1593         calculated at the peak summit position instead of the whole peak
1594         region. The values will be higher and more robust than before.
1595
1596         Features added:
1597
1598         1. MACS can save wiggle format files containing the tag number at
1599         every 10 bp along the genome. Tags are shifted according to our
1600         model before they are calculated.
1601
1602         2. Model building and local lambda calculation can be skipped with
1603         certain options.
1604
1605         3. A diagnosis report can be generated through '--diag'
1606         option. This report can help you get an assumption about the
1607         sequencing saturation. This funtion is only in beta stage.
1608
1609         4. FDR calculation speed is highly improved.
1610
1611 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1612         Version 1.1
1613
1614         * TabIO, PeakModel.py ...
1615         Bug fixed to let MACS tolerate some cases while there is no tag on
1616         either plus strand or minus strand.
1617
1618         * setup.py
1619         Check the version of python. If the version is lower than 2.4,
1620         refuse to install with warning.
1621
1622
1623 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1624         MACS version 2.0.10 20130731 (tag:alpha)
1625
1626         * callpeak --call-summits
1627
1628         Fix bugs causing callpeak --call-summits option generating extra
1629         number of peaks and inconsistent peak boundaries comparing to
1630         default option. Thank Ben Levinson!
1631
1632         * bdgcmp output
1633
1634         Fix bugs causing bdgcmp output logLR all in positive values. Now
1635         'depletion' can be correctly represented as negative values.
1636
1637         * bdgdiff
1638
1639         Fix the behavior of bdgdiff module. Now it can take four
1640         bedGraph files, then use logLR as cutoff to call differential
1641         regions. Check command line of bdgdiff for detail.
1642
1643 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1644         MACS version 2.0.10 20130713 (tag:alpha)
1645
1646         * fix bugs while output broadPeak and gappedPeak.
1647
1648         Note. Those weak broad regions without any strong enrichment
1649         regions inside won't be saved in gappedPeak file.
1650
1651         * bdgcmp -T and -C are merged into -S and description is updated.
1652
1653         Now, you can use it to override SPMR values in your input for
1654         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1655         statistics will cause weird results ( in most cases, lower
1656         significancy), and won't be consistent with MACS2 callpeak
1657         behavior. So if you have SPMR bedGraphs, input the smaller/larger
1658         sample size in MILLION according to 'callpeak --to-large' option.
1659
1660 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1661         MACS version 2.0.10 20130710 (tag:alpha)
1662
1663         * fix BED style output format of callpeak module:
1664
1665         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1666         the output. Old BED format file won't be saved.
1667
1668         2) with --broad: broadPeak (BED6+3) for broad region and
1669         gappedPeak (BED12+3) for chained enriched regions will be the
1670         output. Old BED format, narrowPeak format, summit file won't be
1671         saved.
1672
1673         * bdgcmp now can accept list of methods to calculate scores. So
1674         you can run it once to generate multiple types of scores. Thank
1675         Jon Urban for this suggestion!
1676
1677         * C codes are re-generated through Cython 0.19.1.
1678
1679 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1680         MACS version 2.0.10 20130520 (tag:alpha)
1681
1682         * broad peak calling modules are modified in order to report all
1683         relexed regions even there is no strong enrichment inside.
1684
1685 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1686         MACS version 2.0.10 20130501 (tag:alpha)
1687
1688         * Memory usage is decreased to about 1/4-1/5 of previous usage
1689         Now, the internal data structure and algorithm are both
1690         re-organized, so that intermediate data wouldn't be saved in
1691         memory. Intead they will be calculated on the fly. New MACS2 will
1692         spend longer time (1.5 to 2 times) however it will use less memory
1693         so can be more usable on small mem servers.
1694
1695         * --seed option is added to callpeak and randsample commands
1696         Thank Mathieu Gineste for this suggestion!
1697
1698 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1699         MACS version 2.0.10 20130306 (tag:alpha)
1700
1701         * diffpeak module New module to detect differential binding sites
1702         with more statistics.
1703
1704         * Introduced --refine-peaks
1705         Calculates reads balancing to refine peak summits
1706
1707         * Ouput file names prefix
1708         Correct encodePeak to narrowPeak, broadPeak to bed12.
1709
1710 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1711         MACS version 2.0.10 (tag:alpha not released)
1712
1713         * Introduced BAMPEParser
1714         Reads PE data directly, requires bedtools for now
1715
1716         * Introduced --call-summits
1717         Uses signal processing methods to call overlapping peaks
1718
1719         * Added --no-trackline
1720         By default, files have descriptive tracklines now
1721
1722         * new refinepeak command (experimental)
1723         This new function will use a similar method in SPP (wtd), to
1724         analyze raw tag distribution in peak region, then redefine the
1725         peak summit where plus and minus tags are evenly distributed
1726         around.
1727
1728         * Changes to output *
1729         cPeakDetect.pyx has full support for new print/write methods and
1730         --call-peaks, BAMPEParser, and use of paired-end data
1731
1732         * Parser optimization
1733
1734         cParser.pyx is rewritten to use io.BufferedReader to speed
1735         up. Speed is doubled.
1736
1737         Code is reorganized -- most of functions are inherited from
1738         GenericParser class.
1739
1740         * Use cross-correlation to calculate fragment size
1741
1742         First, all pairs will be used in prediction for fragment
1743         size. Previously, only no more than 1000 pairs are used. Second,
1744         cross-correlation is used to find the best phase difference
1745         between + and - tag pileups.
1746
1747         * Speed up p-value and q-value calculation
1748
1749         This part is ten times faster now. I am using a dictionary to
1750         cache p-value results from Poisson CDF function. A bit more memory
1751         will be used to increase speed. I hope this dictionary would not
1752         explode since the possible pairs of ChIP signal and control lambda
1753         are hugely redundant. Also, I rewrited part of q-value
1754         calculation.
1755
1756         * Speed up peak detection
1757
1758         This part is about hundred of times faster now.  Optimizations
1759         include using Numpy functions as much as possible, and making loop
1760         body as small as possible.
1761
1762         * Post-processing on differential calls
1763
1764         After macs2diff finds differential binding sites between two
1765         conditions, it will try to annotate the peak calls from one of two
1766         conditions, describe the changes ...
1767
1768         * Fragment size prediction in macs2diff
1769
1770         Now by default, macs2diff will try to use the average fragment
1771         size from both condition 1 and condition 2 for tag extension and
1772         peak calling. Previously, by default, it will use different sizes
1773         unless --nomodel is specified.
1774
1775         Technically, I separate model building processes out. So macs2diff
1776         will build fragment sizes for condition 1 and 2 in parallel (2
1777         processes maximum), then perform 4-way comparisons in parallel (4
1778         processes maximum).
1779
1780         * Diff score
1781
1782         Combine two p/qscore tracks together. At regions where condition 1
1783         is higher than condition 2, score would be positive, otherwise,
1784         negative.
1785
1786         * SAMParser and BAMParser
1787
1788         Bug fixed for paired-end sequencing data.
1789
1790         * BedGraph.pyx
1791
1792         Fixed a bug while calling peaks from BedGraph file. It previously
1793         mistakenly output same peaks multiple times at the end of
1794         chromosome.
1795
1796 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
1797         MACS version 2.0.9 (tag:alpha)
1798
1799         * Auto fixation on predicted d is turned off by default!
1800
1801         Previous --off-auto is now default. MACS will not automatically
1802         fix d less than 2 times of tag size according to
1803         --shiftsize. While tag size is getting longer nowadays, it would
1804         be easier to have d less than 2 times of tag size, however d may
1805         still be meaningful and useful. Please judge it using your own
1806         wisdom.
1807
1808         * Scaling issue
1809
1810         Now, the default scaling while treatment and input are unbalanced
1811         has been adjusted. By default, larger sample will be scaled down
1812         linearly to match the smaller sample. In this way, background
1813         noise will be reduced more than real signals, so we expect to have
1814         more specific results than the other way around (i.e. --to-large
1815         is set).
1816
1817         Also, an alternative option to randomly sample larger data
1818         (--down-sample) is provided to replace default linear
1819         scaling. However, this option will cause results irresproducible,
1820         so be careful.
1821
1822         * randsample script
1823
1824         A new script 'randsample'  is added, which can randomly sample
1825         certain percentage or number of tags.
1826
1827         * Peak summit
1828
1829         Now, MACS will decide peak summits according to pileup height
1830         instead of qvalue scores. In this way, the summit may be more
1831         accurate.
1832
1833         * Diff score
1834
1835         MACS calculate qvalue scores as differential scores. When compare
1836         two conditions (saying A and B), the maximum qscore for comparing
1837         A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
1838         will be computed. If maxqscore_a2b is bigger, the diff score is
1839         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
1840
1841 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1842         MACS version 2.0.8 (tag:alpha)
1843
1844         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1845
1846         New script bdgbroadcall and the extra option '--broad' for macs2
1847         script, can be used to call broad regions with a loose cutoff to
1848         link nearby significant regions. The output is represented as
1849         BED12 format.
1850
1851         * MACS2/IO/cScoreTrack.pyx
1852
1853         Fix q-value calculation to generate forcefully monotonic values.
1854
1855         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1856
1857         They are combined to one more powerful script called
1858         "filterdup". The script filterdup can filter duplicated reads
1859         according to sequencing depth and genome size. The script can also
1860         convert any format supported by MACS to BED format.
1861
1862 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1863         MACS version 2.0.7 (tag:alpha)
1864
1865         * bin/macsdiff renamed to bin/bdgdiff
1866
1867         Now this script will work as a low-level finetuning tool as bdgcmp
1868         and bdgpeakcall.
1869
1870         * bin/macs2diff
1871
1872         A new script to take treatment and control files from two
1873         condition, calculate fragment size, use local poisson to get
1874         pvalues and BH process to get qvalues, then combine 4-ways result
1875         to call differential sites.
1876
1877         This script can use upto 4 cpus to speed up 4-ways calculation. (
1878         I am trying multiprocessing in python. )
1879
1880         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1881         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1882         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1883
1884         All above files are modified for the new macs2diff script.
1885
1886         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1887
1888         Now q-value 0.01 is the default cutoff. If -p is specified,
1889         p-value cutoff will be used instead.
1890
1891 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1892         MACS version 2.0.6 (tag:alpha)
1893
1894         * bin/macsdiff
1895
1896         A script to call differential regions. A naive way is introduced
1897         to find the regions where:
1898
1899         1. signal from condition 1 is larger than input 1 and condition 2 --
1900         unique region in condition 1;
1901         2. signal from condition 2 is larger than input 2 and condition 1
1902         -- unique region in condition 2;
1903         3. signal from condition 1 is larger than input 1, signal from
1904         condition 2 is larger than input 2, however either signal from
1905         condition 1 or 2 is not larger than the other.
1906
1907         Here 'larger' means the pvalue or qvalue from a Poisson test is
1908         under certain cutoff.
1909
1910         (I will make another script to wrap up mulitple scripts for
1911         differential calling)
1912
1913 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1914         MACS version 2.0.5 (tag:alpha)
1915
1916         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1917         MACS2/IO/cPeakIO.pyx
1918
1919         Use hash to store peak information. Add back the feature to deal
1920         with data without control.
1921
1922         Fix bug which incorrectly allows small peaks at the end of
1923         chromosomes.
1924
1925         * bin/bdgpeakcall, bin/bdgcmp
1926
1927         Fix bugs. bdgpeakcall can output encodePeak format.
1928
1929 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1930         MACS version 2.0.4 (tag:alpha)
1931
1932         * cPeakDetect.py
1933
1934         Fix a bug, correctly assign lambda_bg while --to-small is
1935         set. Thanks Junya Seo!
1936
1937         Add rank and num of bp columns to pvalue-qvalue table.
1938
1939         * cScoreTrack.py
1940
1941         Fix bugs to correctly deal with peakless chromosomes. Thanks
1942         Vaibhav Jain!
1943
1944         Use AFDR for independent tests instead.
1945
1946         * encodePeak
1947
1948         Now MACS can output peak coordinates together with pvalue, qvalue,
1949         summit positions in a single encodePeak format (designed for
1950         ENCODE project) file. This file can be loaded to UCSC
1951         browser. Definition of some specific columns are: 5th:
1952         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1953         -log10qvalue, 10th: relative summit position to peak start.
1954
1955
1956 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1957         MACS version 2.0.3 (tag:alpha)
1958
1959         * Rich output with qvalue, fold enrichment, and pileup height
1960
1961         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1962         procedure:
1963
1964         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1965
1966         Now we have a similiar xls output file as before. The differences
1967         from previous file are:
1968
1969         1. Summit now is absolute summit, instead of relative summit
1970            position;
1971         2. 'Pileup' is previous 'tag' column. It's the extended fragment
1972            pileup at the peak summit;
1973         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1974            5.00 means 1e-5, simple and less confusing.
1975         4. FDR column becomes '-log10(qvalue)' column.
1976         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1977            the values at the peak summit.
1978
1979         * Extra output files
1980
1981         NAME_pqtable.txt contains pvalue and qvalue relationships.
1982
1983         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1984         and -log10qvalue scores in BedGraph format. Nearby regions with
1985         the same value are not merged.
1986
1987         * Separation of FeatIO.py
1988
1989         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1990         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1991         implemented to store pileup, local lambda, pvalue, and qvalue
1992         alltogether in cScoreTrack.pyx.
1993
1994         * Experimental option --half-ext
1995
1996         Suggested by NPS algorithm, I added an experimental option
1997         --half-ext to let MACS only extends ChIP fragment around its
1998         middle point for only 1/2 d.
1999
2000 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2001         MACS version 2.0.2 (tag:alpha)
2002
2003         * macs2
2004
2005         Add an error check to see if there is no common chromosome names
2006         from treatment file and control file
2007
2008         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
2009
2010         Reduce memory usage by removing deepcopy() calls.
2011
2012         * Modify README documents and others.
2013
2014 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
2015         MACS Version 2.0.1 (tag:alpha)
2016
2017         * cPileup.pyx, cPeakDetect.pyx and peak calling process
2018
2019         Jie suggested me a brilliant simple method to pileup fragments
2020         into bedGraph track. It works extremely faster than the previous
2021         function, i.e, faster than MACS1.3 or MACS1.4. So I can include
2022         large local lambda calculation in MACSv2 now. Now I generate three
2023         bedGraphs for d-size local bias, slocal-size and llocal-size local
2024         bias, and calculate the maximum local bias as local lambda
2025         bedGraph track.
2026
2027         Minor: add_loc in bedGraphTrackI now can correctly merge the
2028         region with its preceding region if their value are the same.
2029
2030         * macs2
2031
2032         Add an option to shift control tags before extension. By default,
2033         control tags will be extended to both sides regardless of strand
2034         information.
2035
2036 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
2037         MACS Version 2.0.0 (tag:alpha)
2038
2039         * Use bedGraph type to store data internally and externally.
2040
2041         We can have theoretically one-basepair resolution profiles. 10
2042         times smaller in filesize and even smaller after converting to
2043         bigWig for visualization.
2044
2045         * Peak calling process modified. Better peak boundary detection.
2046
2047         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2048         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2049         one will be averaged to d size) Then calculate the maximum value
2050         of these two tracks and a global background, to have a
2051         local-lambda bedGraph.
2052
2053         Use -10log10poisson_pvalue as scores to generate a score track
2054         before peak calling.
2055
2056         A general peak calling based on a score cutoff, min length of peak
2057         and max gap between nearby peaks.
2058
2059         * Option changes.
2060
2061         Wiggle file output is removed. Now we only support bedGraph
2062         output. The generation of bedGraph is highly recommended since it
2063         will not cost extra time. In other words, bedGraph generation is
2064         internally run even you don't want to save bedGraphs on disk, due
2065         to the peak calling algorithm in MACS v2.
2066
2067         * cProb.pyx
2068
2069         We now can calculate poisson pvalue in log space so that the score
2070         (-10*log10pvalue) will not have a upper limit of 3100 due to
2071         precision of float number.
2072
2073         * Cython is adopted to speed up Python code.
2074
2075 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2076         Small fixes
2077
2078         * Replaced with a newest WigTrackI class and fixed the wignorm script.
2079
2080 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2081         Version 1.4.0rc2 (Valentine)
2082
2083         * --single-wig option is renamed to --single-profile
2084
2085         * BedGraph output with --bdg or -B option.
2086
2087         The BedGraph output provides 1bp resolution fragment pileup
2088         profile. File size is smaller than wig file. This option can be
2089         combined with --single-profile option to produce a bedgraph file
2090         for the whole genome. This option can also make --space,
2091         --call-subpeaks invalid.
2092
2093         * Fix the description of --shiftsize to correctly state that the
2094         value is 1/2 d (fragment size).
2095
2096         * Fix a bug in the call to __filter_w_control_tags when control is
2097         not available.
2098
2099         * Fix a bug on --to-small option. Now it works as expected.
2100
2101         * Fix a bug while counting the tags in candidate peak region, an
2102         extra tag may be included. (Thanks to Jake Biesinger!)
2103
2104         * Fix the bug for the peaks extended outside of chromosome
2105         start. If the minus strand tag goes outside of chromosome start
2106         after extension of d, it will be thrown out.
2107
2108         * Post-process script for a combined wig file:
2109
2110         The "wignorm" command can be called after a full run of MACS14 as
2111         a postprocess. wignorm can calculate the local background from the
2112         control wig file from MACS14, then use either foldchange,
2113         -10*log10(pvalue) from possion test, or difference after asinh
2114         transformation as the score to build a single wig track to
2115         represent the binding strength. This script will take a
2116         significant long time to process.
2117
2118         * --wigextend has been obsoleted.
2119
2120 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2121         Version 1.4.0rc1 (Starry Sky)
2122
2123         * Duplicate reads option
2124
2125         --keep-dup behavior is changed. Now user can specify how many
2126         reads he/she wants to keep at the same genomic location. 'auto' to
2127         let MACS decide the number based on binomial distribution, 'all'
2128         to let MACS keep all reads.
2129
2130         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2131
2132         By default, MACS will now scale the smaller dataset to the bigger
2133         dataset. For instance, if IP has 10 million reads, and Input has 5
2134         million, MACS will double the lambda value calculated from Input
2135         reads while calling BOTH the positive peaks and negative
2136         peaks. This will address the issue caused by unbalanced numbers of
2137         reads from IP and Input. If --to-small is turned on, MACS will
2138         scale the larger dataset to the smaller one. So from now on, if d
2139         is fixed, then the peaks from a MACS call for A vs B should be
2140         identical to the negative peaks from a B vs A.
2141
2142 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
2143         Version 1.4.0beta (summer wishes)
2144
2145         * New features
2146
2147         ** Model building
2148
2149         The default behavior in the model building step is slightly
2150         changed. When MACS can't find enough pairs to build model
2151         (implemented in alpha version) or the modeled fragment length is
2152         less than 2 times of tag length (implemented in beta version),
2153         MACS will use 2 times of --shiftsize value as fragment length in
2154         the later analysis. --off-auto can turn off this default behavior.
2155
2156         ** Redundant tag filtering
2157
2158         The IO module is rewritten. The redundant tag filtering process
2159         becomes simpler and works as promise. The maximum allowed number
2160         of tags at the exact same location is calculated from the
2161         sequencing depth and genome size using a binomial distribution,
2162         for both TREAMENT and CONTROL separately. ( previously only
2163         TREATMENT is considered ) The exact same location means the same
2164         coordination and the same strand. Then MACS will only keep at most
2165         this number of tags at the exact same location in the following
2166         analysis. An option --keep-dup can let MACS skip the filtering and
2167         keep all the tags. However this may bring in a lot of sequencing
2168         bias, so you may get many false positive peaks.
2169
2170         ** Single wiggle mode
2171
2172         First thing to mention, this is not the score track that I
2173         described before. By default, MACS generates wiggle files for
2174         fragment pileup for every chromosomes separately. When you use
2175         --single-wig option, MACS will generate a single wiggle file for
2176         all the chromosomes so you will get a wig.gz for TREATMENT and
2177         another wig.gz for CONTROL if available.
2178
2179         ** Sniff -- automatic format detection
2180
2181         Now, by default or "-f AUTO", MACS will decide the input file
2182         format automatically. Technically, it will try to read at most
2183         1000 records for the first 10 non-comment lines. If it succeeds,
2184         the format is decided. I recommend not to use AUTO and specify the
2185         right format for your input files, unless you combine different
2186         formats in a single MACS run.
2187
2188         * Options changes
2189
2190         --single-wig and --keep-dup are added. Check previous section in
2191         ChangeLog for detail.
2192
2193         -f (--format) AUTO is now the default option.
2194
2195         --slocal default: 1000
2196         --llocal default: 10000
2197
2198         * Bug fixed
2199
2200         Setup script will stop the installation if python version is not
2201         python2.6 or python2.7.
2202
2203         Local lambda calculation has been changed back. MACS will check
2204         peak_region, slocal( default 1K) and llocal (default 10K) for the
2205         local bias. The previous 200bps default will cause MACS misses
2206         some peaks where the input bias is very sharp.
2207
2208         sam2bed.py script is corrected.
2209
2210         Relative pos in xls output is fixed.
2211
2212         Parser for ELAND_export is fixed to pass some of the no match
2213         lines. And elandexport2bed.py is fixed too. ( however I can't
2214         guarantee that it works on any eland_export files. )
2215
2216 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2217         Version 1.4.0alpha2 (be smarter)
2218
2219         * Options changes
2220
2221         --gsize now provides shortcuts for common genomes, including
2222         human, mouse, C. elegans and fruitfly.
2223
2224         --llocal now will be 5000 bps if there is no input file, so that
2225         local lambda doesn't overkill enriched binding sites.
2226
2227 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2228         Version 1.4alpha (be smarter)
2229
2230         * Options changes
2231
2232         --tsize option is redesigned. MACS will use the first 10 lines of
2233         the input to decide the tag size. If user specifies --tsize, it
2234         will override the auto decided tsize.
2235
2236         --lambdaset is replaced by --slocal and --llocal which mean the
2237         small local region and large local region.
2238
2239         --bw has no effect on the scan-window size now. It only affects the
2240         paired-peaks model process.
2241
2242         * Model building
2243
2244         During the model building, MACS will pick out the enriched regions
2245         which are not too high and not too low to build the paired-peak
2246         model. Default the region is from fold 10 to fold 30. If MACS
2247         fails to build the model, by default it will use the nomodel
2248         settings, like shiftsize=100bps, to shift and extend each
2249         tags. This behavior can be turned off by '--off-auto'.
2250
2251         * Output files
2252
2253         An extra file including all the summit positions are saved in
2254         *_summits.bed file. An option '--call-subpeaks' will invoke
2255         PeakSplitter developed by Mali Salmon to split wide peaks into
2256         smaller subpeaks.
2257
2258         * Sniff ( will in beta )
2259
2260         Automatically recognize the input file format, so use can combine
2261         different format in one MACS run.
2262
2263         Not implemented features/TODO:
2264
2265         * Algorithms ( in near future? )
2266
2267         MACS will try to refine the peak boundaries by calculating the
2268         scores for every point in the candidate peak regions. The score
2269         will be the -10*log(10,pvalue) on a local poisson distribution. A
2270         cutoff specified by users (--pvalue) will be applied to find the
2271         precise sub-peaks in the original candidate peak region. Peak
2272         boudaries and peak summits positions will be saved in separate BED
2273         files.
2274
2275         * Single wiggle track ( in near future? )
2276
2277         A single wiggle track will be generated to save the scores within
2278         candidate peak regions in the 10bps resolution. The wiggle file
2279         is in fixedStep format.
2280
2281
2282 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2283         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2284
2285         * bin/Constants.py
2286
2287         Fixed typo. FCSTEP -> FESTEP
2288
2289         * lib/PeakDetect.py
2290
2291         The 'femax' attribute bug is fixed
2292
2293 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2294         Version 1.3.7 (Oktoberfest)
2295
2296         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2297
2298         Enhancements by Peter Chines:
2299
2300         1. gzip files are supported.
2301         2. when --diag is on, user can set the increment and endpoint for
2302         fold enrichment analysis by setting --fe-step and --fe-max.
2303
2304         Enhancements by Davide Cittaro:
2305
2306         1. BAM and SAM formats are supported.
2307         2. small changes in the header lines of wiggle output.
2308
2309         Enhancements by Me:
2310         1. I added --fe-min option;
2311         2. Bowtie ascii output with suffix ".map" is supported.
2312
2313         Bug fixed:
2314
2315         1. --nolambda bug is fixed. ( reported by Martin in JHU )
2316         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2317         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2318         4. Some "fold change" have been changed to "fold enrichment".
2319
2320 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2321         Version 1.3.6.1 (default parameter change)
2322
2323         * bin/macs, lib/PeakDetect.py
2324
2325         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2326         default. "--futurefdr" is added which can turn on the 'new' method
2327         introduced in 1.3.6. By default it's off.
2328
2329         * lib/PeakDetect.py
2330
2331         Fixed a bug. p-value is corrected a little bit.
2332
2333
2334 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
2335         Version 1.3.6 (Birthday cake)
2336
2337         * bin/macs
2338
2339         "track name" is added to the header of BED output file.
2340
2341         Now the default peak detection method is to consider 5k and 10k
2342         nearby regions in treatment data and peak location, 1k, 5k, and
2343         10k regions in control data to calculate local bias. The old
2344         method can be called through '--old' option.
2345
2346         Information about how many total/unique tags in treatment or
2347         control will be saved in final .xls output.
2348
2349         * lib/IO/__init__.py
2350
2351         ".fa" will be removed from input tag alignment so only the
2352         chromosome names are kept.
2353
2354         WigTrackI class is added for Wiggle like data structure. (not used
2355         now)
2356
2357         The parser for ELAND multi PET files has been fixed. Now the 5'
2358         tag position for a pair will be kept, whereas in the previous
2359         version, the middle points are kept.
2360
2361         * lib/IO/BinKeeper.py
2362
2363         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2364         browser, which can quickly access certain region for values in a
2365         large wiggle like data file. (not used now)
2366
2367         * lib/OptValidator.py
2368
2369         typo fixed.
2370
2371         * lib/PeakDetect.py
2372
2373         Now the default peak detection method is to consider 5k and 10k
2374         nearby regions in treatment data and peak location, 1k, 5k, and
2375         10k regions in control data to calculate local bias. The old
2376         method can be called through '--old' option.
2377
2378         Two columns have beed added to BED output file. 4th column: peak
2379         name; 5th column: peak score using -10log(10,pvalue) as score.
2380
2381         * setup.py
2382
2383         Add support to build a Mac App through 'setup.py py2app', or a
2384         Windows executable through 'setup.py py2exe'. You need to install
2385         py2app or py2exe package in order to use these functions.
2386
2387 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2388         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2389
2390         * PeakDetect.py
2391
2392         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2393         in control data to calculate local lambda for each peak. Peak
2394         calling results will be slightly different with previous version,
2395         beware!
2396
2397         * OptValidator.py
2398
2399         Typo fixed, ELANDParser -> ELANDResultParser
2400
2401         * OutputWriter.py
2402
2403         Now, modeled d value will be shown on the model figure.
2404
2405 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
2406         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2407
2408         * macs, IO/__init__.py, PeakDetect.py
2409
2410         Add support for ELAND multi format. Add support for Pair-End
2411         experiment, in this case, 5'end and 3'end ELAND multi format files
2412         are required for treatment or control data. See 00README file for
2413         detail.
2414
2415         Add wigextend option.
2416
2417         Add petdist option for Pair-End Tag experiment, which is the best
2418         distance between 5' and 3' tags.
2419
2420         * PeakDetect.py
2421
2422         Fixed a bug which cause the end positions of every peak region
2423         incorrectly added by 1 bp. ( Thanks Mali Salmon!)
2424
2425         * OutputWriter.py
2426
2427         Fix bugs while generating wiggle files. The start position of
2428         wiggle file is set to 1 instead of 0.
2429
2430         Fix a bug that every 10M bps, signals in the first 'd' range are
2431         lower than actual. ( Thanks Mali Salmon!)
2432
2433
2434 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
2435         Version 1.3.3 (wiggle bugs fixed)
2436
2437         * OutputWriter.py
2438
2439         Fix bugs while generating wiggle files. 1. 'span=' is added to
2440         'variableStep' line; 2. previously, every 10M bps, the coordinates
2441         were wrongly shifted to the right for 'd' basepairs.
2442
2443         * macs, PeakDetect.py
2444
2445         Add an option to save wiggle files on different resolution.
2446
2447 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2448         Version 1.3.2 (tiny bugs fixed)
2449
2450         * IO/__init__.py
2451
2452         Fix 65536 -> 65535. ( Thank Joon)
2453
2454         * Prob.py
2455
2456         Improved for binomial function with extra large number. Imported
2457         from Cistrome project.
2458
2459         * PeakDetect.py
2460
2461         If treatment channel misses reads in some chromosome included in
2462         control channel, or vice versa, MACS will not exit. (Thank Shaun
2463         Mahony)
2464
2465         Instead, MACS will fake a tag at position -1 when calling
2466         treatment peaks vs control, but will ignore the chromosome while
2467         calling negative peaks.
2468
2469 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2470         Version 1.3.1 (tiny bugs fixed version)
2471
2472         * Prob.py
2473
2474         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2475         binomial functions can tolerate large and small numbers.
2476
2477         * IO/__init__.py
2478
2479         Parsers now split lines in BED/ELAND file using any
2480         whitespaces. 'track' or 'browser' lines will be regarded as
2481         comment lines. A bug fixed when throwing StrandFormatError. The
2482         maximum redundant tag number at a single position can be no less
2483         than 65536.
2484
2485
2486 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
2487         Version 1.3 (naming clarification version)
2488
2489         * Naming clarification changes according to our manuscript:
2490
2491         'frag_len' is changed to 'd'.
2492
2493         'fold_change' is changed to 'fold_enrichment'.
2494
2495         Suggest '--bw' parameter to be determined by users from the real
2496         sonication size.
2497
2498         Maximum FDR is 100% in the output file.
2499
2500         And other clarifications in 00README file and the documents on the
2501         website.
2502
2503         * IO/__init__.py
2504         If the redundant tag number at a single position is over 32767,
2505         just remember 32767, instead of raising an overflow exception.
2506
2507         * setup.py
2508         fixed a typo.
2509
2510         * PeakDetect.py
2511         Bug fixed for diagnosis report.
2512
2513
2514 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2515         Version 1.2.2gamma
2516
2517         * Serious bugs fix:
2518
2519         Poisson distribution CDF and inverse CDF functions are
2520         corrected. They can produce right results even for huge lambda
2521         now. So that the p-value and FDR values in the final excel sheet
2522         are corrected.
2523
2524         IO package now can tolerate some rare cases; ELANDParser in IO
2525         package is fixed. (Thank Bogdan)
2526
2527         * Improvement:
2528
2529         Reverse paired peaks in model are rejected. So there will be no
2530         negative 'frag_len'. (Thank Bogdan)
2531
2532         * Features added:
2533
2534         Diagnosis function is completed. Which can output a table file for
2535         users to estimate their sequencing depth.
2536
2537
2538 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
2539         Version 1.2
2540
2541         * Probe.py is added!
2542
2543         GSL is totally removed from MACS. Instead, I have implemented the
2544         CDF and inverse CDF for poisson and binomial distribution purely
2545         in python.
2546
2547         * Constants.py is added!
2548
2549         Organize constants used in MACS in the Constants.py file.
2550
2551         * All other files are modified!
2552
2553         Foldchange calculation is modified. Now the foldchange only be
2554         calculated at the peak summit position instead of the whole peak
2555         region. The values will be higher and more robust than before.
2556
2557         Features added:
2558
2559         1. MACS can save wiggle format files containing the tag number at
2560         every 10 bp along the genome. Tags are shifted according to our
2561         model before they are calculated.
2562
2563         2. Model building and local lambda calculation can be skipped with
2564         certain options.
2565
2566         3. A diagnosis report can be generated through '--diag'
2567         option. This report can help you get an assumption about the
2568         sequencing saturation. This funtion is only in beta stage.
2569
2570         4. FDR calculation speed is highly improved.
2571
2572 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2573         Version 1.1
2574
2575         * TabIO, PeakModel.py ...
2576         Bug fixed to let MACS tolerate some cases while there is no tag on
2577         either plus strand or minus strand.
2578
2579         * setup.py
2580         Check the version of python. If the version is lower than 2.4,
2581         refuse to install with warning.
2582