ChangeLog

   1 2024-10-08  Tao Liu  <vladimir.liu@gmail.com>
   2         MACS 3.0.3b
   3
   4         * Features added
   5
   6         1) We implemented the IO module for reading the fragment files
   7         usually used in single-cell ATAC-seq experiment
   8         `Parser.FragParser`. And we implemented a new
   9         `PairedEndTrack.PETrackII` to store the data in fragment file,
  10         including the barcodes and counts information. In the `PETrackII`
  11         class, we are able to extract a subset using a list of barcodes,
  12         which enables us to call peaks only on a pool (pseudo-bulk) of
  13         cells.
  14
  15         2) We extensively rewrote the `pyx` codes into `py` codes. In
  16         another words, we now apply the 'pure python style' with PEP-484
  17         type annotations to our previous Cython style codes. So that, the
  18         source codes can be more compatible to Python programming tools
  19         such as `flake8`. During rewritting, we cleaned the source codes
  20         even more, and removed unnecessary dependencies during
  21         compilation.
  22
  23         * Bug fixed
  24
  25         1) Fix issues in big-endian system in `Parser.py` codes. Enable
  26         big-endian support in `BAM.py` codes for accessig certain
  27         alignment records that overlap with givin genomic
  28         coordinates using BAM/BAI files.
  29
  30         * Doc
  31
  32         1) Explanation on the filtering criteria on SAM/BAM/BAMPE files.
  33
  34 2024-09-06  Tao Liu  <vladimir.liu@gmail.com>
  35         MACS 3.0.2
  36
  37         * Features added
  38
  39         1) Introduce a new emission model for the `hmmratac` function. Now
  40         users can choose the simpler Poisson emission `--hmm-type poisson`
  41         instead of the default Gaussian emission. As a consequence, the
  42         saved HMM model file in json will include the hmm-type information
  43         as well. Note that in order to be compatible with the HMM model
  44         file from previous version, if there is no hmm-type information in
  45         the model file, the hmm-type will be assigned as gaussian. #635
  46
  47         2) `hmmratac` now output narrowPeak format output. The summit
  48         position and the peak score columns reported in the narrowPeak
  49         output represents the position with highest foldchange value
  50         (pileup vs average background).
  51
  52         3) Add `--cutoff-analysis-steps` and `--cutoff-analysis-max` for
  53         `callpeak`, `bdgpeakcall`, and `hmmratac` so that we can
  54         have finer resolution of the cutoff analysis report. #636  #642
  55
  56         4) Reduce memory usage of `hmmratac` during decoding step, by
  57         writing decoding results to a temporary file on disk (file
  58         location depends on the environmental TEMP setting), then loading
  59         it back while identifying state pathes. This change will decrease
  60         the memory usage dramatically. #628 #640
  61
  62         5) Fix instructions for preparing narrowPeak files for uploading
  63         to UCSC browser, with the `--trackline` option in `callpeak`. #653
  64
  65         6) For gappedPeak output, set thickStart and thickEnd columns as
  66         0, according to UCSC definition.
  67
  68         * Bugs fixed
  69
  70         1) Use `-O3` instead of `-Ofast` for compatibility. #637
  71
  72         * Documentation
  73
  74         1) Update instruction to install macs3 through conda/bioconda
  75
  76         2) Reorganize MACS3 docs and publish through
  77         https://macs3-project.github.io/MACS
  78
  79         3) Description on various file formats used in MACS3.
  80
  81 2024-02-19  Tao Liu  <vladimir.liu@gmail.com>
  82         MACS 3.0.1
  83
  84         * Bugs fixed
  85
  86         1) Fixed a bug that the `hmmatac` can't correctly save the
  87         digested signal files. #605 #611
  88
  89         2) Applied a patch to remove cython requirement from the installed
  90         system. (it's needed for building the package). #606 #612
  91
  92         3) Relax the testing script while comparing the peaks called from
  93         current codes and the standard peaks. To implement this, we added
  94         'intersection' function to 'Regions' class to find the
  95         intersecting regions of two Regions object (similar to PeakIO but
  96         only recording chromosome, start and end positions). And we
  97         updated the unit test 'test_Region.py' then implemented a script
  98         'jaccard.py' to compute the Jaccard Index of two peak files. If
  99         the JI > 0.99 we would think the peaks called and the standard
 100         peaks are similar. This is to avoid the problem caused by
 101         different Numpy/SciPy/sci-kit learn libraries, when certain peak
 102         coordinates may have 10bps difference. #615 #619
 103
 104         4) Due to the changes in scikit-learn 1.3.0:
 105         https://scikit-learn.org/1.3/whats_new/v1.3.html: The way hmmlearn
 106         0.3 uses Kmeans will end up with inconsistent results between
 107         sklearn <1.3 and sklearn >=1.3. Therefore, we patched the class
 108         hmm.GaussianHMM and adjusted the standard output from `hmmratac`
 109         subcommand. The change is based on
 110         https://github.com/hmmlearn/hmmlearn/pull/545. The idea is to do
 111         the random seeding of KMeans 10 times. Now the `hmmratac` results
 112         should be more consistent (at least JI>0.99). #615 #620
 113
 114         * Other
 115
 116         1) We added some dependencies to MACS3. `hmmratc` subcommand needs
 117         `hmmlearn` library, `hmmlearn` needs `scikit-learn` and
 118         `scikit-learn` needs `scipy`. Since major releases have happened
 119         for both`scipy` and `scikit-learn`, we have to set specific
 120         version requirements for them in order to make sure the output
 121         results from `hmmratac` are consistent.
 122
 123         2) We updated our documentation website using
 124         Sphinx. https://macs3-project.github.io/MACS/
 125
 126 2023-11-15  Tao Liu  <vladimir.liu@gmail.com>
 127         MACS 3.0.0
 128
 129         1) Call variants in peak regions directly from BAM files. The
 130         function was originally developed under code name SAPPER. Now
 131         SAPPER has been merged into MACS as the `callvar` command. It can
 132         be used to call SNVs and small INDELs directly from alignment
 133         files for ChIP-seq or ATAC-seq. We call `fermi-lite` to assemble
 134         the DNA sequence at the enriched genomic regions (binding sites or
 135         accessible DNA) and to refine the alignment when necessary. We
 136         added `simde` as a submodule in order to support fermi-lite
 137         library under non-x64 architectures.
 138
 139         2) HMMRATAC module is added as subcommand `hmmratac`. HMMRATAC is
 140         a dedicated software to analyze ATAC-seq data. The basic idea
 141         behind HMMRATAC is to digest ATAC-seq data according to the
 142         fragment length of read pairs into four signal tracks: short
 143         fragments, mono-nucleosomal fragments, di-nucleosomal fragments
 144         and tri-nucleosomal fragments. Then integrate the four tracks
 145         again using Hidden Markov Model to consider three hidden states:
 146         open region, nucleosomal region, and background region. The
 147         orginal paper was published in 2019 written in JAVA, by Evan
 148         Tarbell. We implemented it in Python/Cython and optimize the whole
 149         process using existing MACS functions and hmmlearn. Now it can run
 150         much faster than the original JAVA version. Note: evaluation of
 151         the peak calling results is still underway.
 152
 153         3) Speed/memory optimization.  Use the cykhash to replace python
 154         dictionary. Use buffer (10MB) to read and parse input file (not
 155         available for BAM file parser). And many optimization tweaks. We
 156         added memory monitoring to the runtime messages.
 157
 158         4) R wrappers for MACS -- MACSr for bioconductor.
 159
 160         5) Code cleanup. Reorganize source codes.
 161
 162         6) Unit testing.
 163
 164         7) Switch to Github Action for CI, support multi-arch testing
 165         including x64, armv7, aarch64, s390x and ppc64le. We also test on
 166         Mac OS 12.
 167
 168         8) MACS tag-shifting model has been refined. Now it will use a
 169         naive peak calling approach to find ALL possible paired peaks at +
 170         and - strand, then use all of them to calculate the
 171         cross-correlation. (a related bug has been fix
 172         [#442](https://github.com/macs3-project/MACS/issues/442))
 173
 174         9) BAI index and random access to BAM file now is
 175         supported. [#449](https://github.com/macs3-project/MACS/issues/449).
 176
 177         10) Support of Python > 3.10
 178         [#498](https://github.com/macs3-project/MACS/issues/498)
 179
 180         11) The effective genome size parameters have been updated
 181         according to
 182         deeptools. [#508](https://github.com/macs3-project/MACS/issues/508)
 183
 184         12) Multiple updates regarding dependencies, anaconda built, CI/CD
 185         process.
 186
 187         13) Cython 3 is supported.
 188
 189         14) Documentations for each subcommand can be found under /docs
 190
 191         *Other*
 192
 193         1) Missing header line while no peaks can be called
 194         [#501](https://github.com/macs3-project/MACS/issues/501)
 195         [#502](https://github.com/macs3-project/MACS/issues/502)
 196
 197         2) Note: different numpy, scipy, sklearn may give slightly
 198         different results for hmmratac results. The current standard
 199         results for automated testing in `/test` directory are from Numpy
 200         1.25.1, Scipy 1.11.1, and sklearn 1.3.0.
 201
 202 2020-04-11  Tao Liu  <vladimir.liu@gmail.com>
 203         MACS version 2.2.7.1
 204
 205         * hotfix:
 206
 207         Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
 208         work.
 209
 210 2020-04-10  Tao Liu  <vladimir.liu@gmail.com>
 211         MACS version 2.2.7
 212
 213         * Bugs fixed
 214
 215         1) MACS2 has been tested on multiple architectures to make sure it
 216         can successfully generate consistent results. Currently the
 217         supported architectures are: AMD64, ARM64, i386, PPC64LE, and
 218         S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issue
 219         #340, #349, #351, and #359; to PR #348, #350, #360, #361, #367,
 220         and #370. The lesson is that if the project is built on Cython and
 221         is aimed at memory efficiency, we should specifically define all
 222         int/float types in pyx files such as int8_t or uint32_t using
 223         either libc or numpy (c version) instead of relying on Cython
 224         types such as short, long, double.
 225
 226         2) MACS2 setup script will check numpy and install numpy if
 227         necessary. PR #378, issue #364
 228
 229         3) `bdgbroadcall` command will correctly add the score column (5th
 230         column). The score (5th) column contains 10 times of the average
 231         score in the broad region. PR #373, issue #362
 232
 233         4) The missing test on `bdgopt` subcommand has been added. PR #363
 234
 235         5) The obsolete option `--ratio` from `callpeak` subcommand has
 236         been removed. PR #369, issue #366
 237
 238         6) Fixed the incorrect description in README on the 'maximum
 239         length of broad region is 4 times of d' to 'maximum gap for
 240         merging broad regions is 4 times of tag size by default'. PR #380,
 241         issue #365.
 242
 243         * Other
 244
 245         1) CODE OF CONDUCT document has been added to MACS2 github
 246         repository. PR #358
 247
 248 2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
 249         MACS version 2.2.6
 250
 251         * New Features
 252
 253         1) Speed up MACS2. Some programming tricks and code cleanup. The
 254         filter_dup function replaces separate_dups. The later one was
 255         implemented for potentially putting back duplicate reads in
 256         certain downstream analysis. However such analysis hasn't been
 257         implemented. Optimize the speed of writing bedGraph
 258         files. Optimize BAM and BAMPE parsing with pointer casting instead
 259         of python unpack.
 260
 261         2) The comment lines in the headers of BED or SAM files will be
 262         correctly skipped. However, MACS2 won't check comment lines in the
 263         middle of the file.
 264
 265         * Bugs fixed
 266
 267         1) Cutoff-analysis in callpeak command. #341
 268
 269         2) Issues related to SAMParser and three ELAND Parsers are
 270         fixed. #347
 271
 272         * Other
 273
 274         1) cmdlinetest script in test/ folder has been updated to: 1. test
 275         cutoff-analysis with callpeak cmd; 2. output the 2 lines before
 276         and after the error or warning message during tests; 3. output
 277         only the first 10 lines if the difference between test result and
 278         standard result can be found; 4. prockreport monitor CPU time and
 279         memory usage in 1 sec interval -- a bit more accurate.
 280
 281         2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
 282
 283 2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
 284         MACS version 2.2.5 (Py3 speed up)
 285
 286         * Features added
 287
 288         1) *Github code only and Not included in MACS2 release* New
 289         testing data for performance test. An subsampled ENCODE2 CTCF
 290         ChIP-seq dataset, including 5million ChIP reads and 5 million
 291         control reads, has been included in the test folder for testing
 292         CPU and memory usage (i.e. 5M test). Several related scripts ,
 293         including `prockreport` for output cpu memory usage, `pyprofile`
 294         and `pyprofile_stat` for debuging and profiling MACS2 codes, have
 295         been included.
 296
 297         2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
 298         The old hashtable.pyx implementation copied from Pandas (very old
 299         version) doesn't work well in Python3+Cython. It slows down the
 300         pqtable checkup using the identical Cython codes as in
 301         v2.1.4. While running 5M test, the `__getitem__` function in the
 302         hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
 303         148.6s with the same number of calls in MACS2 v2.2.4. As a
 304         consequence, the standard python dictionary implementation has
 305         replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
 306         faster than py2 version, but uses a bit more memory. In general,
 307         v2.2.5 can finish 5M reads test in 20% less time than MACS2
 308         v2.1.4, but use 15% more memory.
 309
 310         * Bug fixed
 311
 312         1) More Python3 related fixes, e.g. the return value of keys from
 313         py3 dict. #333 #337
 314
 315
 316 2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
 317         MACS version 2.2.4 (Python3)
 318
 319         * Features added
 320
 321         1) First Python3 version MACS2 released.
 322
 323         2) Version number 2.2.X will be used for MACS2 in Python3, in
 324         parallel to 2.1.X.
 325
 326         3) More comprehensive test.sh script to check the consistency of
 327         results from Python2 version and Python3 version.
 328
 329         4) Simplify setup.py script since the newest version transparently
 330         supports cython. And when cython is not installed by the user,
 331         setup.py can still compile using only C codes.
 332
 333         5) Fix Signal.pyx to use np.array instead of np.mat.
 334
 335 2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
 336         MACS version 2.1.4
 337
 338         * Features added
 339
 340         Github Actions is used together with Travis CI for testing and
 341         deployment.
 342
 343         * Bugs fixed
 344
 345         PR #322:
 346
 347         1) #318 Random score in bdgdiff output. It turns out the sum_v is
 348         not initialized as 0 before adding. Potential bugs are fixed in
 349         other functions in ScoreTrack and CallPeakUnit codes.
 350
 351         2) #321 Cython dependency in setup.py script is removed. And place
 352         'cythonzie' call to the correct position.
 353
 354         3) A typo is fixed in Github Actions script.
 355
 356 2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
 357         MACS version 2.1.3.3
 358
 359         * Features added
 360
 361         1) Support Docker auto-deploy. PR #309
 362
 363         2) Support Travis CI auto-testing, update unit-testing
 364         scripts, and enable subcommand testing on small datasets.
 365
 366         3) Update README documents. #297 PR #306
 367
 368         4) `cmbreps` supports more than 2 replicates. Merged from PR #304
 369         @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
 370
 371         5) `--d-min` option is added in `callpeak` and `predictd`, to
 372         exclude predictions of fragment size smaller than the given
 373         value. Merged from PR #267 @shouldsee.
 374
 375         6) `--buffer-size` option is added in `predictd`, `filterdup`,
 376         `pileup` and `refinepeak` subcommands. Users can use this option
 377         to decrease memory usage while there are a large number of contigs
 378         in the data. Also, now `callpeak`, `predictd`, `filterdup`,
 379         `pileup` and `refinepeak` will suggest users to tweak
 380         `--buffer-size` while catching a MemoryError. #313 PR #314
 381
 382         * Bugs fixed
 383
 384         1) #265 Fixed a bug where the pseudocount hasn't been applied
 385         while calculating p-value score in ScoreTrack object.
 386
 387         2) Fixed bdgbroadcall so that it will report those broad peaks
 388         without strong peak inside, a consistent behavior as `callpeak
 389         --broad`.
 390
 391         3) Rename COPYING to LICENSE.
 392
 393 2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
 394         MACS version 2.1.2
 395
 396         * New features
 397
 398         1) Added missing BEDPE support. And enable the support for BAMPE
 399         and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
 400         subcommands. When format is BAMPE or BEDPE, The 'pileup' command
 401         will pile up the whole fragment defined by mapping locations of
 402         the left end and right end of each read pair. Thank @purcaro
 403
 404         2) Added options to callpeak command for tweaking max-gap and
 405         min-len during peak calling. Thank @jsh58!
 406
 407         3) The callpeak option "--to-large" option is replaced with
 408         "--scale-to large".
 409
 410         4) The randsample option "-t" has been replaced with "-i".
 411
 412         * Bug fixes
 413
 414         1) Fixed memory issue related to #122 and #146
 415
 416         2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
 417
 418         3) Fixed a bug while setting commandline qvalue cutoff.
 419
 420         4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
 421
 422         5) Fixed the calculation of average fragment length for paired-end
 423         data. Thank @jsh58
 424
 425         6) Fixed bugs caused by khash while computing p/q-value and log
 426         likelihood ratios. Thank @jsh58
 427
 428         7) More spelling tweaks in source code. Thank @mr-c
 429
 430 2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
 431         MACS version 2.1.1 20160309
 432
 433         * Retire the tag:rc.
 434
 435         * Fixed spelling. Merged pull request #120. Thank @mr-c!
 436
 437         * Change filtering criteria for reading BAM/SAM files
 438
 439         Related to callpeak and filterdup commands. Now the
 440         reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
 441         still be read although MACS2 may decide them as duplicates
 442         later. Related to old issue #33. Sorry I forgot to address it for
 443         years!
 444
 445 2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
 446         MACS version 2.1.1 20160226 (tag:rc Zhengyue)
 447
 448         * Bug fixes
 449
 450         1) Now "-Ofast" has been replaced by "-O3 --ffast-math", because
 451         the former option is not supported by older GCC. Related to issues
 452         #91, #109.
 453
 454         2) Issue #108 is fixed. If no peak can be found in a chromosome,
 455         the PeakIO won't throw an error.
 456
 457         * New features
 458
 459         1) callpeak
 460
 461         a) A more flexible format, BEDPE, is supported. Now users can
 462         define the left and right position of the ChIPed fragment, and
 463         MACS2 will skip model building and directly pileup the
 464         fragments. Related to issue #112.
 465
 466         b) The 'tempdir' can be specified, to save cached pileup
 467         tracks. Originially, the temporary files were stored in
 468         /tmp. Thank @daler! Related to issues #97 and #105.
 469
 470         2) bdgopt
 471
 472         New operations are added, to calculate the maximum or minimum value between
 473         values in BEDGRAPH and given value.
 474
 475         3) bdgcmp
 476
 477         New method is added, to calculate the maximum value between values
 478         defined in two BEDGRAPH files.
 479
 480 2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
 481         MACS version 2.1.0 20151222 (tag:rc Dongzhi)
 482
 483         * Bug fixes
 484
 485         1) Fix a bug while dealing with some chromosomes only containing
 486         one read (pair). The size of dup_plus/dup_minus arrays after
 487         filtering dups should +1.
 488
 489         2) Fix a bug related to the broad peak calling function in
 490         previous versions. The gaps were miscalculated, so segmented weak
 491         broad calls may be reported, and sometimes you would see peaks
 492         with lower than cutoff values in the output files.
 493
 494         3) "Potentially" Fixed issue #105 on temporary cache files, need
 495         further followup.
 496
 497
 498 2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
 499         MACS version 2.1.0 20150731 (tag:rc)
 500
 501         * Bug fixes
 502
 503         1) Fixed issue #76: information about broad/narrow cutoff will be
 504         correctly displayed.
 505
 506         2) Fixed issue #79: bdgopt extparam option is fixed.
 507
 508         3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
 509         for filterdup command.
 510
 511         4) Fixed issue #78, #88 and similar issue reported in MACS google
 512         group: MACS2 now can correctly deal with multiple alignment files
 513         for -t or -c. The 'finalize' function will be correctly
 514         called. Multiple files option is enabled for filterdup,
 515         randsample, predictd, pileup and refinepeak commands.
 516
 517         5) A related issue to #88, when BAMPE mode is used, PE pairs will
 518         be sorted by leftmost then rightmost ends.
 519
 520         6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
 521         array. This will cause 'callpeak --nolambda' hang forever while
 522         calculating pvalues and qvalues.
 523
 524 2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
 525         MACS version 2.1.0 20150420 (tag:rc)
 526
 527         * New commands
 528
 529         1) bdgopt: some convenient functions to modify bedGraph files.
 530
 531         2) cmbreps: Combine scores from two replicates. Including three
 532         methods: 1. take the maximum; 2. take the average; 3. use Fisher's
 533         method to combine two p-value scores. After that, user can use
 534         bdgpeakcall to call peaks on combined scores.
 535
 536         * New features
 537
 538         1) callpeak and bdgpeakcall now can try to analyze the
 539         relationship between p-values and number/length of peaks then
 540         generate a summary to help users decide an appropriate cutoff.
 541
 542         2) callpeak now can accept fold-enrichment cutoff as a filter for
 543         final peak calls.
 544
 545         * Performance
 546
 547         Now MACS2 runs about 3X as fast as previous version. Trade
 548         clean python codes for speed... Now while processing 50M ChIP vs
 549         50M control, it will take only 10 minutes.
 550
 551         * Bug fixes
 552
 553         1) Sampling function in BAMPE mode.
 554
 555         2) Callpeak while there are >= 2 input files for -t or -c.
 556
 557         3) While reading BAM/SAM, those secondary or supplementary
 558         alignments will be correctly skipped.
 559
 560         4) Fixed issue #33: Explanation is added to callpeak --keep-dup
 561         option that MACS2 will discard those SAM/BAM alignments with bit
 562         1024 no matter how --keep-dup is set.
 563
 564         5) Fixed issue #49: setuptools is used intead of distutils
 565
 566         6) Fixed issue #51: fix the problem when using --trackline
 567         argument when control file is absent.
 568
 569         7) Fixed issue #53: Use Use SAM/BAM CIGAR to find the 5' end of
 570         read mapped to minus strand. Previous implementation will find
 571         incorrect 5' end if there is indel in alignment.
 572
 573         8) Fixed issue #56: An incorrect sorting method used for BAMPE
 574         mode which will cause incorrect filtering of duplicated reads. Now
 575         fixed.
 576
 577         9) Issue #63: Merged from jayhesselberth@github, extsize now can
 578         be 1.
 579
 580         10) Issue #71: Merged from aertslab@github, close file descriptor
 581         after creating them with mkstemp().
 582
 583 2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
 584         MACS version 2.1.0 20140616 (tag:rc)
 585
 586         * callpeak module
 587
 588         "--ratio" is added to manually assign the scaling factor of ChIP
 589         vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
 590         implementing the patch file!
 591
 592         "--shift" is added to move cutting ends (5' end of reads) around,
 593         in order to process DNAse-Seq data, e.g., use "--shift -100
 594         --extsize 200" to get 200bps fragments around 5' ends. For general
 595         ChIP-Seq data analysis, this option should be always set as
 596         0. Thank Xi Chen and Anshul Kundaje for the discussions in user
 597         group!
 598
 599         ** Do not output negative fragment size from cross-correlation
 600         analysis. Thank Alvin Qin for the feedback!
 601
 602         ** --half-ext and --control-shift are removed. For complex read
 603         shifting and extending, combine '--shift' and '--extsize'
 604         options. For comparing two conditions, use 'bdgdiff' module
 605         instead.
 606
 607         ** a bug is fixed to output the last pileup value in bdg file
 608         correctly.
 609
 610         * filterdup
 611
 612         A 'dry-run' option is added to only output numbers, including the
 613         number of allowed duplicates, the total number of reads before and
 614         after filtering duplicates and the estimated duplication
 615         rate. Thank John Urban for the suggestion!
 616
 617
 618 2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
 619         MACS version 2.0.10 20131216 (tag:alpha)
 620
 621         bug fixes and tweaks
 622
 623         * We changed license from Artistic License to 3-clauses BSD license.
 624
 625         Yes. Simpler the better.
 626
 627         * Process paired-end data with "-f BAMPE" without control
 628
 629         * GappedPeak output for --broad option has been fixed again to be
 630         consistent with official UCSC format. We add 1bp pseudo-block to
 631         left and/or right of broad region when necessary, so that you can
 632         virtualize the regions without strong enrichment inside
 633         successfully. In downstream analysis except for virtualization,
 634         you may need to remove all 1bps blocks from gappedPeak file.
 635
 636         * diffpeak subcommand is temporarily disabled. Till we
 637         re-implement it.
 638
 639 2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
 640         MACS version 2.0.10 20131028 (tag:alpha)
 641
 642         * callpeak --call-summits improvement
 643
 644         The smoothing window length has been fixed as fragment length
 645         instead of short read length. The larger smoothing window will
 646         grant better smoothing results and better sub-peak summits
 647         detection.
 648
 649         * --outdir and --ofile options for almost all commands
 650
 651         Thank Björn Grüning for initially implementing these options!
 652         Now, MACS2 will save results into a specified
 653         directory by '--outdir' option, and/or save result into a
 654         specified file by '--ofile' option. Note, in case '--ofile' is
 655         available for a subcommand, '-o' now has been adjusted to be the
 656         same as '--ofile' instead of '--o-prefix'.
 657
 658         Here is the list of changes. For more detail, use 'macs2 xxx -h'
 659         for each subcommand:
 660
 661         ** callpeak: --outdir
 662         ** diffpeak: Not implemented
 663         ** bdgpeakcall: --outdir and --ofile
 664         ** bdgbroadcall: --outdir and --ofile
 665         ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
 666         and the order of arguments for --ofile must be the same as for -m.
 667         ** bdgdiff: --outdir and --ofile
 668         ** filterdup: --outdir
 669         ** pileup: --outdir
 670         ** randsample: --outdir
 671         ** refinepeak: --outdir and --ofile
 672
 673
 674 2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
 675         MACS version 2.0.10 20130915 (tag:alpha)
 676
 677         * callpeak Added a new option --buffer-size
 678
 679         This option is to tweak a previously hidden parameter that
 680         controls the steps to increase array size for storing alignment
 681         information. While in some rare cases, the number of
 682         chromosomes/contigs/scaffolds is huge, the original default
 683         setting will cause a huge memory waste. In these cases, we
 684         recommend to decrease --buffer-size (e.g., 1000) to save memory,
 685         although the decrease will slow process to read alignment files.
 686
 687         * an optimization to speed up pvalue-qvalue statistics
 688
 689         Previously, it took a hour to prepare p-q-table for 65M vs 65M
 690         human TF library, and now it will take 10 minutes. It was due to a
 691         single line of code to get a value from a numpy array ...
 692
 693         * fixed logLR bugs.
 694
 695 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
 696         MACS version 2.0.10 20130731 (tag:alpha)
 697
 698         * callpeak --call-summits
 699
 700         Fix bugs causing callpeak --call-summits option generating extra
 701         number of peaks and inconsistent peak boundaries comparing to
 702         default option. Thank Ben Levinson!
 703
 704         * bdgcmp output
 705
 706         Fix bugs causing bdgcmp output logLR all in positive values. Now
 707         'depletion' can be correctly represented as negative values.
 708
 709         * bdgdiff
 710
 711         Fix the behavior of bdgdiff module. Now it can take four
 712         bedGraph files, then use logLR as cutoff to call differential
 713         regions. Check command line of bdgdiff for detail.
 714
 715 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
 716         MACS version 2.0.10 20130713 (tag:alpha)
 717
 718         * fix bugs while output broadPeak and gappedPeak.
 719
 720         Note. Those weak broad regions without any strong enrichment
 721         regions inside won't be saved in gappedPeak file.
 722
 723         * bdgcmp -T and -C are merged into -S and description is updated.
 724
 725         Now, you can use it to override SPMR values in your input for
 726         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
 727         statistics will cause weird results ( in most cases, lower
 728         significancy), and won't be consistent with MACS2 callpeak
 729         behavior. So if you have SPMR bedGraphs, input the smaller/larger
 730         sample size in MILLION according to 'callpeak --to-large' option.
 731
 732 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
 733         MACS version 2.0.10 20130710 (tag:alpha)
 734
 735         * fix BED style output format of callpeak module:
 736
 737         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
 738         the output. Old BED format file won't be saved.
 739
 740         2) with --broad: broadPeak (BED6+3) for broad region and
 741         gappedPeak (BED12+3) for chained enriched regions will be the
 742         output. Old BED format, narrowPeak format, summit file won't be
 743         saved.
 744
 745         * bdgcmp now can accept list of methods to calculate scores. So
 746         you can run it once to generate multiple types of scores. Thank
 747         Jon Urban for this suggestion!
 748
 749         * C codes are re-generated through Cython 0.19.1.
 750
 751 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
 752         MACS version 2.0.10 20130520 (tag:alpha)
 753
 754         * broad peak calling modules are modified in order to report all
 755         relexed regions even there is no strong enrichment inside.
 756
 757 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
 758         MACS version 2.0.10 20130501 (tag:alpha)
 759
 760         * Memory usage is decreased to about 1/4-1/5 of previous usage
 761         Now, the internal data structure and algorithm are both
 762         re-organized, so that intermediate data wouldn't be saved in
 763         memory. Intead they will be calculated on the fly. New MACS2 will
 764         spend longer time (1.5 to 2 times) however it will use less memory
 765         so can be more usable on small mem servers.
 766
 767         * --seed option is added to callpeak and randsample commands
 768         Thank Mathieu Gineste for this suggestion!
 769
 770 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
 771         MACS version 2.0.10 20130306 (tag:alpha)
 772
 773         * diffpeak module New module to detect differential binding sites
 774         with more statistics.
 775
 776         * Introduced --refine-peaks
 777         Calculates reads balancing to refine peak summits
 778
 779         * Ouput file names prefix
 780         Correct encodePeak to narrowPeak, broadPeak to bed12.
 781
 782 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
 783         MACS version 2.0.10 (tag:alpha not released)
 784
 785         * Introduced BAMPEParser
 786         Reads PE data directly, requires bedtools for now
 787
 788         * Introduced --call-summits
 789         Uses signal processing methods to call overlapping peaks
 790
 791         * Added --no-trackline
 792         By default, files have descriptive tracklines now
 793
 794         * new refinepeak command (experimental)
 795         This new function will use a similar method in SPP (wtd), to
 796         analyze raw tag distribution in peak region, then redefine the
 797         peak summit where plus and minus tags are evenly distributed
 798         around.
 799
 800         * Changes to output *
 801         cPeakDetect.pyx has full support for new print/write methods and
 802         --call-peaks, BAMPEParser, and use of paired-end data
 803
 804         * Parser optimization
 805
 806         cParser.pyx is rewritten to use io.BufferedReader to speed
 807         up. Speed is doubled.
 808
 809         Code is reorganized -- most of functions are inherited from
 810         GenericParser class.
 811
 812         * Use cross-correlation to calculate fragment size
 813
 814         First, all pairs will be used in prediction for fragment
 815         size. Previously, only no more than 1000 pairs are used. Second,
 816         cross-correlation is used to find the best phase difference
 817         between + and - tag pileups.
 818
 819         * Speed up p-value and q-value calculation
 820
 821         This part is ten times faster now. I am using a dictionary to
 822         cache p-value results from Poisson CDF function. A bit more memory
 823         will be used to increase speed. I hope this dictionary would not
 824         explode since the possible pairs of ChIP signal and control lambda
 825         are hugely redundant. Also, I rewrited part of q-value
 826         calculation.
 827
 828         * Speed up peak detection
 829
 830         This part is about hundred of times faster now.  Optimizations
 831         include using Numpy functions as much as possible, and making loop
 832         body as small as possible.
 833
 834         * Post-processing on differential calls
 835
 836         After macs2diff finds differential binding sites between two
 837         conditions, it will try to annotate the peak calls from one of two
 838         conditions, describe the changes ...
 839
 840         * Fragment size prediction in macs2diff
 841
 842         Now by default, macs2diff will try to use the average fragment
 843         size from both condition 1 and condition 2 for tag extension and
 844         peak calling. Previously, by default, it will use different sizes
 845         unless --nomodel is specified.
 846
 847         Technically, I separate model building processes out. So macs2diff
 848         will build fragment sizes for condition 1 and 2 in parallel (2
 849         processes maximum), then perform 4-way comparisons in parallel (4
 850         processes maximum).
 851
 852         * Diff score
 853
 854         Combine two p/qscore tracks together. At regions where condition 1
 855         is higher than condition 2, score would be positive, otherwise,
 856         negative.
 857
 858         * SAMParser and BAMParser
 859
 860         Bug fixed for paired-end sequencing data.
 861
 862         * BedGraph.pyx
 863
 864         Fixed a bug while calling peaks from BedGraph file. It previously
 865         mistakenly output same peaks multiple times at the end of
 866         chromosome.
 867
 868 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
 869         MACS version 2.0.9 (tag:alpha)
 870
 871         * Auto fixation on predicted d is turned off by default!
 872
 873         Previous --off-auto is now default. MACS will not automatically
 874         fix d less than 2 times of tag size according to
 875         --shiftsize. While tag size is getting longer nowadays, it would
 876         be easier to have d less than 2 times of tag size, however d may
 877         still be meaningful and useful. Please judge it using your own
 878         wisdom.
 879
 880         * Scaling issue
 881
 882         Now, the default scaling while treatment and input are unbalanced
 883         has been adjusted. By default, larger sample will be scaled down
 884         linearly to match the smaller sample. In this way, background
 885         noise will be reduced more than real signals, so we expect to have
 886         more specific results than the other way around (i.e. --to-large
 887         is set).
 888
 889         Also, an alternative option to randomly sample larger data
 890         (--down-sample) is provided to replace default linear
 891         scaling. However, this option will cause results irresproducible,
 892         so be careful.
 893
 894         * randsample script
 895
 896         A new script 'randsample'  is added, which can randomly sample
 897         certain percentage or number of tags.
 898
 899         * Peak summit
 900
 901         Now, MACS will decide peak summits according to pileup height
 902         instead of qvalue scores. In this way, the summit may be more
 903         accurate.
 904
 905         * Diff score
 906
 907         MACS calculate qvalue scores as differential scores. When compare
 908         two conditions (saying A and B), the maximum qscore for comparing
 909         A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
 910         will be computed. If maxqscore_a2b is bigger, the diff score is
 911         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
 912
 913 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 914         MACS version 2.0.8 (tag:alpha)
 915
 916         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 917
 918         New script bdgbroadcall and the extra option '--broad' for macs2
 919         script, can be used to call broad regions with a loose cutoff to
 920         link nearby significant regions. The output is represented as
 921         BED12 format.
 922
 923         * MACS2/IO/cScoreTrack.pyx
 924
 925         Fix q-value calculation to generate forcefully monotonic values.
 926
 927         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 928
 929         They are combined to one more powerful script called
 930         "filterdup". The script filterdup can filter duplicated reads
 931         according to sequencing depth and genome size. The script can also
 932         convert any format supported by MACS to BED format.
 933
 934 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 935         MACS version 2.0.7 (tag:alpha)
 936
 937         * bin/macsdiff renamed to bin/bdgdiff
 938
 939         Now this script will work as a low-level finetuning tool as bdgcmp
 940         and bdgpeakcall.
 941
 942         * bin/macs2diff
 943
 944         A new script to take treatment and control files from two
 945         condition, calculate fragment size, use local poisson to get
 946         pvalues and BH process to get qvalues, then combine 4-ways result
 947         to call differential sites.
 948
 949         This script can use upto 4 cpus to speed up 4-ways calculation. (
 950         I am trying multiprocessing in python. )
 951
 952         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 953         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 954         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 955
 956         All above files are modified for the new macs2diff script.
 957
 958         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 959
 960         Now q-value 0.01 is the default cutoff. If -p is specified,
 961         p-value cutoff will be used instead.
 962
 963 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 964         MACS version 2.0.6 (tag:alpha)
 965
 966         * bin/macsdiff
 967
 968         A script to call differential regions. A naive way is introduced
 969         to find the regions where:
 970
 971         1. signal from condition 1 is larger than input 1 and condition 2 --
 972         unique region in condition 1;
 973         2. signal from condition 2 is larger than input 2 and condition 1
 974         -- unique region in condition 2;
 975         3. signal from condition 1 is larger than input 1, signal from
 976         condition 2 is larger than input 2, however either signal from
 977         condition 1 or 2 is not larger than the other.
 978
 979         Here 'larger' means the pvalue or qvalue from a Poisson test is
 980         under certain cutoff.
 981
 982         (I will make another script to wrap up mulitple scripts for
 983         differential calling)
 984
 985 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 986         MACS version 2.0.5 (tag:alpha)
 987
 988         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 989         MACS2/IO/cPeakIO.pyx
 990
 991         Use hash to store peak information. Add back the feature to deal
 992         with data without control.
 993
 994         Fix bug which incorrectly allows small peaks at the end of
 995         chromosomes.
 996
 997         * bin/bdgpeakcall, bin/bdgcmp
 998
 999         Fix bugs. bdgpeakcall can output encodePeak format.
1000
1001 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1002         MACS version 2.0.4 (tag:alpha)
1003
1004         * cPeakDetect.py
1005
1006         Fix a bug, correctly assign lambda_bg while --to-small is
1007         set. Thanks Junya Seo!
1008
1009         Add rank and num of bp columns to pvalue-qvalue table.
1010
1011         * cScoreTrack.py
1012
1013         Fix bugs to correctly deal with peakless chromosomes. Thanks
1014         Vaibhav Jain!
1015
1016         Use AFDR for independent tests instead.
1017
1018         * encodePeak
1019
1020         Now MACS can output peak coordinates together with pvalue, qvalue,
1021         summit positions in a single encodePeak format (designed for
1022         ENCODE project) file. This file can be loaded to UCSC
1023         browser. Definition of some specific columns are: 5th:
1024         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1025         -log10qvalue, 10th: relative summit position to peak start.
1026
1027
1028 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1029         MACS version 2.0.3 (tag:alpha)
1030
1031         * Rich output with qvalue, fold enrichment, and pileup height
1032
1033         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1034         procedure:
1035
1036         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1037
1038         Now we have a similiar xls output file as before. The differences
1039         from previous file are:
1040
1041         1. Summit now is absolute summit, instead of relative summit
1042            position;
1043         2. 'Pileup' is previous 'tag' column. It's the extended fragment
1044            pileup at the peak summit;
1045         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1046            5.00 means 1e-5, simple and less confusing.
1047         4. FDR column becomes '-log10(qvalue)' column.
1048         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1049            the values at the peak summit.
1050
1051         * Extra output files
1052
1053         NAME_pqtable.txt contains pvalue and qvalue relationships.
1054
1055         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1056         and -log10qvalue scores in BedGraph format. Nearby regions with
1057         the same value are not merged.
1058
1059         * Separation of FeatIO.py
1060
1061         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1062         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1063         implemented to store pileup, local lambda, pvalue, and qvalue
1064         alltogether in cScoreTrack.pyx.
1065
1066         * Experimental option --half-ext
1067
1068         Suggested by NPS algorithm, I added an experimental option
1069         --half-ext to let MACS only extends ChIP fragment around its
1070         middle point for only 1/2 d.
1071
1072 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1073         MACS version 2.0.2 (tag:alpha)
1074
1075         * macs2
1076
1077         Add an error check to see if there is no common chromosome names
1078         from treatment file and control file
1079
1080         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1081
1082         Reduce memory usage by removing deepcopy() calls.
1083
1084         * Modify README documents and others.
1085
1086 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1087         MACS Version 2.0.1 (tag:alpha)
1088
1089         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1090
1091         Jie suggested me a brilliant simple method to pileup fragments
1092         into bedGraph track. It works extremely faster than the previous
1093         function, i.e, faster than MACS1.3 or MACS1.4. So I can include
1094         large local lambda calculation in MACSv2 now. Now I generate three
1095         bedGraphs for d-size local bias, slocal-size and llocal-size local
1096         bias, and calculate the maximum local bias as local lambda
1097         bedGraph track.
1098
1099         Minor: add_loc in bedGraphTrackI now can correctly merge the
1100         region with its preceding region if their value are the same.
1101
1102         * macs2
1103
1104         Add an option to shift control tags before extension. By default,
1105         control tags will be extended to both sides regardless of strand
1106         information.
1107
1108 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1109         MACS Version 2.0.0 (tag:alpha)
1110
1111         * Use bedGraph type to store data internally and externally.
1112
1113         We can have theoretically one-basepair resolution profiles. 10
1114         times smaller in filesize and even smaller after converting to
1115         bigWig for visualization.
1116
1117         * Peak calling process modified. Better peak boundary detection.
1118
1119         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1120         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1121         one will be averaged to d size) Then calculate the maximum value
1122         of these two tracks and a global background, to have a
1123         local-lambda bedGraph.
1124
1125         Use -10log10poisson_pvalue as scores to generate a score track
1126         before peak calling.
1127
1128         A general peak calling based on a score cutoff, min length of peak
1129         and max gap between nearby peaks.
1130
1131         * Option changes.
1132
1133         Wiggle file output is removed. Now we only support bedGraph
1134         output. The generation of bedGraph is highly recommended since it
1135         will not cost extra time. In other words, bedGraph generation is
1136         internally run even you don't want to save bedGraphs on disk, due
1137         to the peak calling algorithm in MACS v2.
1138
1139         * cProb.pyx
1140
1141         We now can calculate poisson pvalue in log space so that the score
1142         (-10*log10pvalue) will not have a upper limit of 3100 due to
1143         precision of float number.
1144
1145         * Cython is adopted to speed up Python code.
1146
1147 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1148         Small fixes
1149
1150         * Replaced with a newest WigTrackI class and fixed the wignorm script.
1151
1152 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1153         Version 1.4.0rc2 (Valentine)
1154
1155         * --single-wig option is renamed to --single-profile
1156
1157         * BedGraph output with --bdg or -B option.
1158
1159         The BedGraph output provides 1bp resolution fragment pileup
1160         profile. File size is smaller than wig file. This option can be
1161         combined with --single-profile option to produce a bedgraph file
1162         for the whole genome. This option can also make --space,
1163         --call-subpeaks invalid.
1164
1165         * Fix the description of --shiftsize to correctly state that the
1166         value is 1/2 d (fragment size).
1167
1168         * Fix a bug in the call to __filter_w_control_tags when control is
1169         not available.
1170
1171         * Fix a bug on --to-small option. Now it works as expected.
1172
1173         * Fix a bug while counting the tags in candidate peak region, an
1174         extra tag may be included. (Thanks to Jake Biesinger!)
1175
1176         * Fix the bug for the peaks extended outside of chromosome
1177         start. If the minus strand tag goes outside of chromosome start
1178         after extension of d, it will be thrown out.
1179
1180         * Post-process script for a combined wig file:
1181
1182         The "wignorm" command can be called after a full run of MACS14 as
1183         a postprocess. wignorm can calculate the local background from the
1184         control wig file from MACS14, then use either foldchange,
1185         -10*log10(pvalue) from possion test, or difference after asinh
1186         transformation as the score to build a single wig track to
1187         represent the binding strength. This script will take a
1188         significant long time to process.
1189
1190         * --wigextend has been obsoleted.
1191
1192 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1193         Version 1.4.0rc1 (Starry Sky)
1194
1195         * Duplicate reads option
1196
1197         --keep-dup behavior is changed. Now user can specify how many
1198         reads he/she wants to keep at the same genomic location. 'auto' to
1199         let MACS decide the number based on binomial distribution, 'all'
1200         to let MACS keep all reads.
1201
1202         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1203
1204         By default, MACS will now scale the smaller dataset to the bigger
1205         dataset. For instance, if IP has 10 million reads, and Input has 5
1206         million, MACS will double the lambda value calculated from Input
1207         reads while calling BOTH the positive peaks and negative
1208         peaks. This will address the issue caused by unbalanced numbers of
1209         reads from IP and Input. If --to-small is turned on, MACS will
1210         scale the larger dataset to the smaller one. So from now on, if d
1211         is fixed, then the peaks from a MACS call for A vs B should be
1212         identical to the negative peaks from a B vs A.
1213
1214 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1215         Version 1.4.0beta (summer wishes)
1216
1217         * New features
1218
1219         ** Model building
1220
1221         The default behavior in the model building step is slightly
1222         changed. When MACS can't find enough pairs to build model
1223         (implemented in alpha version) or the modeled fragment length is
1224         less than 2 times of tag length (implemented in beta version),
1225         MACS will use 2 times of --shiftsize value as fragment length in
1226         the later analysis. --off-auto can turn off this default behavior.
1227
1228         ** Redundant tag filtering
1229
1230         The IO module is rewritten. The redundant tag filtering process
1231         becomes simpler and works as promise. The maximum allowed number
1232         of tags at the exact same location is calculated from the
1233         sequencing depth and genome size using a binomial distribution,
1234         for both TREAMENT and CONTROL separately. ( previously only
1235         TREATMENT is considered ) The exact same location means the same
1236         coordination and the same strand. Then MACS will only keep at most
1237         this number of tags at the exact same location in the following
1238         analysis. An option --keep-dup can let MACS skip the filtering and
1239         keep all the tags. However this may bring in a lot of sequencing
1240         bias, so you may get many false positive peaks.
1241
1242         ** Single wiggle mode
1243
1244         First thing to mention, this is not the score track that I
1245         described before. By default, MACS generates wiggle files for
1246         fragment pileup for every chromosomes separately. When you use
1247         --single-wig option, MACS will generate a single wiggle file for
1248         all the chromosomes so you will get a wig.gz for TREATMENT and
1249         another wig.gz for CONTROL if available.
1250
1251         ** Sniff -- automatic format detection
1252
1253         Now, by default or "-f AUTO", MACS will decide the input file
1254         format automatically. Technically, it will try to read at most
1255         1000 records for the first 10 non-comment lines. If it succeeds,
1256         the format is decided. I recommend not to use AUTO and specify the
1257         right format for your input files, unless you combine different
1258         formats in a single MACS run.
1259
1260         * Options changes
1261
1262         --single-wig and --keep-dup are added. Check previous section in
1263         ChangeLog for detail.
1264
1265         -f (--format) AUTO is now the default option.
1266
1267         --slocal default: 1000
1268         --llocal default: 10000
1269
1270         * Bug fixed
1271
1272         Setup script will stop the installation if python version is not
1273         python2.6 or python2.7.
1274
1275         Local lambda calculation has been changed back. MACS will check
1276         peak_region, slocal( default 1K) and llocal (default 10K) for the
1277         local bias. The previous 200bps default will cause MACS misses
1278         some peaks where the input bias is very sharp.
1279
1280         sam2bed.py script is corrected.
1281
1282         Relative pos in xls output is fixed.
1283
1284         Parser for ELAND_export is fixed to pass some of the no match
1285         lines. And elandexport2bed.py is fixed too. ( however I can't
1286         guarantee that it works on any eland_export files. )
1287
1288 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1289         Version 1.4.0alpha2 (be smarter)
1290
1291         * Options changes
1292
1293         --gsize now provides shortcuts for common genomes, including
1294         human, mouse, C. elegans and fruitfly.
1295
1296         --llocal now will be 5000 bps if there is no input file, so that
1297         local lambda doesn't overkill enriched binding sites.
1298
1299 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1300         Version 1.4alpha (be smarter)
1301
1302         * Options changes
1303
1304         --tsize option is redesigned. MACS will use the first 10 lines of
1305         the input to decide the tag size. If user specifies --tsize, it
1306         will override the auto decided tsize.
1307
1308         --lambdaset is replaced by --slocal and --llocal which mean the
1309         small local region and large local region.
1310
1311         --bw has no effect on the scan-window size now. It only affects the
1312         paired-peaks model process.
1313
1314         * Model building
1315
1316         During the model building, MACS will pick out the enriched regions
1317         which are not too high and not too low to build the paired-peak
1318         model. Default the region is from fold 10 to fold 30. If MACS
1319         fails to build the model, by default it will use the nomodel
1320         settings, like shiftsize=100bps, to shift and extend each
1321         tags. This behavior can be turned off by '--off-auto'.
1322
1323         * Output files
1324
1325         An extra file including all the summit positions are saved in
1326         *_summits.bed file. An option '--call-subpeaks' will invoke
1327         PeakSplitter developed by Mali Salmon to split wide peaks into
1328         smaller subpeaks.
1329
1330         * Sniff ( will in beta )
1331
1332         Automatically recognize the input file format, so use can combine
1333         different format in one MACS run.
1334
1335         Not implemented features/TODO:
1336
1337         * Algorithms ( in near future? )
1338
1339         MACS will try to refine the peak boundaries by calculating the
1340         scores for every point in the candidate peak regions. The score
1341         will be the -10*log(10,pvalue) on a local poisson distribution. A
1342         cutoff specified by users (--pvalue) will be applied to find the
1343         precise sub-peaks in the original candidate peak region. Peak
1344         boudaries and peak summits positions will be saved in separate BED
1345         files.
1346
1347         * Single wiggle track ( in near future? )
1348
1349         A single wiggle track will be generated to save the scores within
1350         candidate peak regions in the 10bps resolution. The wiggle file
1351         is in fixedStep format.
1352
1353
1354 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1355         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1356
1357         * bin/Constants.py
1358
1359         Fixed typo. FCSTEP -> FESTEP
1360
1361         * lib/PeakDetect.py
1362
1363         The 'femax' attribute bug is fixed
1364
1365 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1366         Version 1.3.7 (Oktoberfest)
1367
1368         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1369
1370         Enhancements by Peter Chines:
1371
1372         1. gzip files are supported.
1373         2. when --diag is on, user can set the increment and endpoint for
1374         fold enrichment analysis by setting --fe-step and --fe-max.
1375
1376         Enhancements by Davide Cittaro:
1377
1378         1. BAM and SAM formats are supported.
1379         2. small changes in the header lines of wiggle output.
1380
1381         Enhancements by Me:
1382         1. I added --fe-min option;
1383         2. Bowtie ascii output with suffix ".map" is supported.
1384
1385         Bug fixed:
1386
1387         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1388         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1389         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1390         4. Some "fold change" have been changed to "fold enrichment".
1391
1392 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1393         Version 1.3.6.1 (default parameter change)
1394
1395         * bin/macs, lib/PeakDetect.py
1396
1397         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1398         default. "--futurefdr" is added which can turn on the 'new' method
1399         introduced in 1.3.6. By default it's off.
1400
1401         * lib/PeakDetect.py
1402
1403         Fixed a bug. p-value is corrected a little bit.
1404
1405
1406 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1407         Version 1.3.6 (Birthday cake)
1408
1409         * bin/macs
1410
1411         "track name" is added to the header of BED output file.
1412
1413         Now the default peak detection method is to consider 5k and 10k
1414         nearby regions in treatment data and peak location, 1k, 5k, and
1415         10k regions in control data to calculate local bias. The old
1416         method can be called through '--old' option.
1417
1418         Information about how many total/unique tags in treatment or
1419         control will be saved in final .xls output.
1420
1421         * lib/IO/__init__.py
1422
1423         ".fa" will be removed from input tag alignment so only the
1424         chromosome names are kept.
1425
1426         WigTrackI class is added for Wiggle like data structure. (not used
1427         now)
1428
1429         The parser for ELAND multi PET files has been fixed. Now the 5'
1430         tag position for a pair will be kept, whereas in the previous
1431         version, the middle points are kept.
1432
1433         * lib/IO/BinKeeper.py
1434
1435         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1436         browser, which can quickly access certain region for values in a
1437         large wiggle like data file. (not used now)
1438
1439         * lib/OptValidator.py
1440
1441         typo fixed.
1442
1443         * lib/PeakDetect.py
1444
1445         Now the default peak detection method is to consider 5k and 10k
1446         nearby regions in treatment data and peak location, 1k, 5k, and
1447         10k regions in control data to calculate local bias. The old
1448         method can be called through '--old' option.
1449
1450         Two columns have beed added to BED output file. 4th column: peak
1451         name; 5th column: peak score using -10log(10,pvalue) as score.
1452
1453         * setup.py
1454
1455         Add support to build a Mac App through 'setup.py py2app', or a
1456         Windows executable through 'setup.py py2exe'. You need to install
1457         py2app or py2exe package in order to use these functions.
1458
1459 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1460         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1461
1462         * PeakDetect.py
1463
1464         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1465         in control data to calculate local lambda for each peak. Peak
1466         calling results will be slightly different with previous version,
1467         beware!
1468
1469         * OptValidator.py
1470
1471         Typo fixed, ELANDParser -> ELANDResultParser
1472
1473         * OutputWriter.py
1474
1475         Now, modeled d value will be shown on the model figure.
1476
1477 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1478         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1479
1480         * macs, IO/__init__.py, PeakDetect.py
1481
1482         Add support for ELAND multi format. Add support for Pair-End
1483         experiment, in this case, 5'end and 3'end ELAND multi format files
1484         are required for treatment or control data. See 00README file for
1485         detail.
1486
1487         Add wigextend option.
1488
1489         Add petdist option for Pair-End Tag experiment, which is the best
1490         distance between 5' and 3' tags.
1491
1492         * PeakDetect.py
1493
1494         Fixed a bug which cause the end positions of every peak region
1495         incorrectly added by 1 bp. ( Thanks Mali Salmon!)
1496
1497         * OutputWriter.py
1498
1499         Fix bugs while generating wiggle files. The start position of
1500         wiggle file is set to 1 instead of 0.
1501
1502         Fix a bug that every 10M bps, signals in the first 'd' range are
1503         lower than actual. ( Thanks Mali Salmon!)
1504
1505
1506 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1507         Version 1.3.3 (wiggle bugs fixed)
1508
1509         * OutputWriter.py
1510
1511         Fix bugs while generating wiggle files. 1. 'span=' is added to
1512         'variableStep' line; 2. previously, every 10M bps, the coordinates
1513         were wrongly shifted to the right for 'd' basepairs.
1514
1515         * macs, PeakDetect.py
1516
1517         Add an option to save wiggle files on different resolution.
1518
1519 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1520         Version 1.3.2 (tiny bugs fixed)
1521
1522         * IO/__init__.py
1523
1524         Fix 65536 -> 65535. ( Thank Joon)
1525
1526         * Prob.py
1527
1528         Improved for binomial function with extra large number. Imported
1529         from Cistrome project.
1530
1531         * PeakDetect.py
1532
1533         If treatment channel misses reads in some chromosome included in
1534         control channel, or vice versa, MACS will not exit. (Thank Shaun
1535         Mahony)
1536
1537         Instead, MACS will fake a tag at position -1 when calling
1538         treatment peaks vs control, but will ignore the chromosome while
1539         calling negative peaks.
1540
1541 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1542         Version 1.3.1 (tiny bugs fixed version)
1543
1544         * Prob.py
1545
1546         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
1547         binomial functions can tolerate large and small numbers.
1548
1549         * IO/__init__.py
1550
1551         Parsers now split lines in BED/ELAND file using any
1552         whitespaces. 'track' or 'browser' lines will be regarded as
1553         comment lines. A bug fixed when throwing StrandFormatError. The
1554         maximum redundant tag number at a single position can be no less
1555         than 65536.
1556
1557
1558 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1559         Version 1.3 (naming clarification version)
1560
1561         * Naming clarification changes according to our manuscript:
1562
1563         'frag_len' is changed to 'd'.
1564
1565         'fold_change' is changed to 'fold_enrichment'.
1566
1567         Suggest '--bw' parameter to be determined by users from the real
1568         sonication size.
1569
1570         Maximum FDR is 100% in the output file.
1571
1572         And other clarifications in 00README file and the documents on the
1573         website.
1574
1575         * IO/__init__.py
1576         If the redundant tag number at a single position is over 32767,
1577         just remember 32767, instead of raising an overflow exception.
1578
1579         * setup.py
1580         fixed a typo.
1581
1582         * PeakDetect.py
1583         Bug fixed for diagnosis report.
1584
1585
1586 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1587         Version 1.2.2gamma
1588
1589         * Serious bugs fix:
1590
1591         Poisson distribution CDF and inverse CDF functions are
1592         corrected. They can produce right results even for huge lambda
1593         now. So that the p-value and FDR values in the final excel sheet
1594         are corrected.
1595
1596         IO package now can tolerate some rare cases; ELANDParser in IO
1597         package is fixed. (Thank Bogdan)
1598
1599         * Improvement:
1600
1601         Reverse paired peaks in model are rejected. So there will be no
1602         negative 'frag_len'. (Thank Bogdan)
1603
1604         * Features added:
1605
1606         Diagnosis function is completed. Which can output a table file for
1607         users to estimate their sequencing depth.
1608
1609
1610 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1611         Version 1.2
1612
1613         * Probe.py is added!
1614
1615         GSL is totally removed from MACS. Instead, I have implemented the
1616         CDF and inverse CDF for poisson and binomial distribution purely
1617         in python.
1618
1619         * Constants.py is added!
1620
1621         Organize constants used in MACS in the Constants.py file.
1622
1623         * All other files are modified!
1624
1625         Foldchange calculation is modified. Now the foldchange only be
1626         calculated at the peak summit position instead of the whole peak
1627         region. The values will be higher and more robust than before.
1628
1629         Features added:
1630
1631         1. MACS can save wiggle format files containing the tag number at
1632         every 10 bp along the genome. Tags are shifted according to our
1633         model before they are calculated.
1634
1635         2. Model building and local lambda calculation can be skipped with
1636         certain options.
1637
1638         3. A diagnosis report can be generated through '--diag'
1639         option. This report can help you get an assumption about the
1640         sequencing saturation. This funtion is only in beta stage.
1641
1642         4. FDR calculation speed is highly improved.
1643
1644 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1645         Version 1.1
1646
1647         * TabIO, PeakModel.py ...
1648         Bug fixed to let MACS tolerate some cases while there is no tag on
1649         either plus strand or minus strand.
1650
1651         * setup.py
1652         Check the version of python. If the version is lower than 2.4,
1653         refuse to install with warning.
1654
1655
1656 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1657         MACS version 2.0.10 20130731 (tag:alpha)
1658
1659         * callpeak --call-summits
1660
1661         Fix bugs causing callpeak --call-summits option generating extra
1662         number of peaks and inconsistent peak boundaries comparing to
1663         default option. Thank Ben Levinson!
1664
1665         * bdgcmp output
1666
1667         Fix bugs causing bdgcmp output logLR all in positive values. Now
1668         'depletion' can be correctly represented as negative values.
1669
1670         * bdgdiff
1671
1672         Fix the behavior of bdgdiff module. Now it can take four
1673         bedGraph files, then use logLR as cutoff to call differential
1674         regions. Check command line of bdgdiff for detail.
1675
1676 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1677         MACS version 2.0.10 20130713 (tag:alpha)
1678
1679         * fix bugs while output broadPeak and gappedPeak.
1680
1681         Note. Those weak broad regions without any strong enrichment
1682         regions inside won't be saved in gappedPeak file.
1683
1684         * bdgcmp -T and -C are merged into -S and description is updated.
1685
1686         Now, you can use it to override SPMR values in your input for
1687         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1688         statistics will cause weird results ( in most cases, lower
1689         significancy), and won't be consistent with MACS2 callpeak
1690         behavior. So if you have SPMR bedGraphs, input the smaller/larger
1691         sample size in MILLION according to 'callpeak --to-large' option.
1692
1693 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1694         MACS version 2.0.10 20130710 (tag:alpha)
1695
1696         * fix BED style output format of callpeak module:
1697
1698         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1699         the output. Old BED format file won't be saved.
1700
1701         2) with --broad: broadPeak (BED6+3) for broad region and
1702         gappedPeak (BED12+3) for chained enriched regions will be the
1703         output. Old BED format, narrowPeak format, summit file won't be
1704         saved.
1705
1706         * bdgcmp now can accept list of methods to calculate scores. So
1707         you can run it once to generate multiple types of scores. Thank
1708         Jon Urban for this suggestion!
1709
1710         * C codes are re-generated through Cython 0.19.1.
1711
1712 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1713         MACS version 2.0.10 20130520 (tag:alpha)
1714
1715         * broad peak calling modules are modified in order to report all
1716         relexed regions even there is no strong enrichment inside.
1717
1718 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1719         MACS version 2.0.10 20130501 (tag:alpha)
1720
1721         * Memory usage is decreased to about 1/4-1/5 of previous usage
1722         Now, the internal data structure and algorithm are both
1723         re-organized, so that intermediate data wouldn't be saved in
1724         memory. Intead they will be calculated on the fly. New MACS2 will
1725         spend longer time (1.5 to 2 times) however it will use less memory
1726         so can be more usable on small mem servers.
1727
1728         * --seed option is added to callpeak and randsample commands
1729         Thank Mathieu Gineste for this suggestion!
1730
1731 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1732         MACS version 2.0.10 20130306 (tag:alpha)
1733
1734         * diffpeak module New module to detect differential binding sites
1735         with more statistics.
1736
1737         * Introduced --refine-peaks
1738         Calculates reads balancing to refine peak summits
1739
1740         * Ouput file names prefix
1741         Correct encodePeak to narrowPeak, broadPeak to bed12.
1742
1743 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1744         MACS version 2.0.10 (tag:alpha not released)
1745
1746         * Introduced BAMPEParser
1747         Reads PE data directly, requires bedtools for now
1748
1749         * Introduced --call-summits
1750         Uses signal processing methods to call overlapping peaks
1751
1752         * Added --no-trackline
1753         By default, files have descriptive tracklines now
1754
1755         * new refinepeak command (experimental)
1756         This new function will use a similar method in SPP (wtd), to
1757         analyze raw tag distribution in peak region, then redefine the
1758         peak summit where plus and minus tags are evenly distributed
1759         around.
1760
1761         * Changes to output *
1762         cPeakDetect.pyx has full support for new print/write methods and
1763         --call-peaks, BAMPEParser, and use of paired-end data
1764
1765         * Parser optimization
1766
1767         cParser.pyx is rewritten to use io.BufferedReader to speed
1768         up. Speed is doubled.
1769
1770         Code is reorganized -- most of functions are inherited from
1771         GenericParser class.
1772
1773         * Use cross-correlation to calculate fragment size
1774
1775         First, all pairs will be used in prediction for fragment
1776         size. Previously, only no more than 1000 pairs are used. Second,
1777         cross-correlation is used to find the best phase difference
1778         between + and - tag pileups.
1779
1780         * Speed up p-value and q-value calculation
1781
1782         This part is ten times faster now. I am using a dictionary to
1783         cache p-value results from Poisson CDF function. A bit more memory
1784         will be used to increase speed. I hope this dictionary would not
1785         explode since the possible pairs of ChIP signal and control lambda
1786         are hugely redundant. Also, I rewrited part of q-value
1787         calculation.
1788
1789         * Speed up peak detection
1790
1791         This part is about hundred of times faster now.  Optimizations
1792         include using Numpy functions as much as possible, and making loop
1793         body as small as possible.
1794
1795         * Post-processing on differential calls
1796
1797         After macs2diff finds differential binding sites between two
1798         conditions, it will try to annotate the peak calls from one of two
1799         conditions, describe the changes ...
1800
1801         * Fragment size prediction in macs2diff
1802
1803         Now by default, macs2diff will try to use the average fragment
1804         size from both condition 1 and condition 2 for tag extension and
1805         peak calling. Previously, by default, it will use different sizes
1806         unless --nomodel is specified.
1807
1808         Technically, I separate model building processes out. So macs2diff
1809         will build fragment sizes for condition 1 and 2 in parallel (2
1810         processes maximum), then perform 4-way comparisons in parallel (4
1811         processes maximum).
1812
1813         * Diff score
1814
1815         Combine two p/qscore tracks together. At regions where condition 1
1816         is higher than condition 2, score would be positive, otherwise,
1817         negative.
1818
1819         * SAMParser and BAMParser
1820
1821         Bug fixed for paired-end sequencing data.
1822
1823         * BedGraph.pyx
1824
1825         Fixed a bug while calling peaks from BedGraph file. It previously
1826         mistakenly output same peaks multiple times at the end of
1827         chromosome.
1828
1829 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
1830         MACS version 2.0.9 (tag:alpha)
1831
1832         * Auto fixation on predicted d is turned off by default!
1833
1834         Previous --off-auto is now default. MACS will not automatically
1835         fix d less than 2 times of tag size according to
1836         --shiftsize. While tag size is getting longer nowadays, it would
1837         be easier to have d less than 2 times of tag size, however d may
1838         still be meaningful and useful. Please judge it using your own
1839         wisdom.
1840
1841         * Scaling issue
1842
1843         Now, the default scaling while treatment and input are unbalanced
1844         has been adjusted. By default, larger sample will be scaled down
1845         linearly to match the smaller sample. In this way, background
1846         noise will be reduced more than real signals, so we expect to have
1847         more specific results than the other way around (i.e. --to-large
1848         is set).
1849
1850         Also, an alternative option to randomly sample larger data
1851         (--down-sample) is provided to replace default linear
1852         scaling. However, this option will cause results irresproducible,
1853         so be careful.
1854
1855         * randsample script
1856
1857         A new script 'randsample'  is added, which can randomly sample
1858         certain percentage or number of tags.
1859
1860         * Peak summit
1861
1862         Now, MACS will decide peak summits according to pileup height
1863         instead of qvalue scores. In this way, the summit may be more
1864         accurate.
1865
1866         * Diff score
1867
1868         MACS calculate qvalue scores as differential scores. When compare
1869         two conditions (saying A and B), the maximum qscore for comparing
1870         A to B -- maxqscore_a2b, and for comparing B to A --maxqscore_b2a
1871         will be computed. If maxqscore_a2b is bigger, the diff score is
1872         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
1873
1874 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1875         MACS version 2.0.8 (tag:alpha)
1876
1877         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1878
1879         New script bdgbroadcall and the extra option '--broad' for macs2
1880         script, can be used to call broad regions with a loose cutoff to
1881         link nearby significant regions. The output is represented as
1882         BED12 format.
1883
1884         * MACS2/IO/cScoreTrack.pyx
1885
1886         Fix q-value calculation to generate forcefully monotonic values.
1887
1888         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1889
1890         They are combined to one more powerful script called
1891         "filterdup". The script filterdup can filter duplicated reads
1892         according to sequencing depth and genome size. The script can also
1893         convert any format supported by MACS to BED format.
1894
1895 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1896         MACS version 2.0.7 (tag:alpha)
1897
1898         * bin/macsdiff renamed to bin/bdgdiff
1899
1900         Now this script will work as a low-level finetuning tool as bdgcmp
1901         and bdgpeakcall.
1902
1903         * bin/macs2diff
1904
1905         A new script to take treatment and control files from two
1906         condition, calculate fragment size, use local poisson to get
1907         pvalues and BH process to get qvalues, then combine 4-ways result
1908         to call differential sites.
1909
1910         This script can use upto 4 cpus to speed up 4-ways calculation. (
1911         I am trying multiprocessing in python. )
1912
1913         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1914         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1915         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1916
1917         All above files are modified for the new macs2diff script.
1918
1919         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1920
1921         Now q-value 0.01 is the default cutoff. If -p is specified,
1922         p-value cutoff will be used instead.
1923
1924 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1925         MACS version 2.0.6 (tag:alpha)
1926
1927         * bin/macsdiff
1928
1929         A script to call differential regions. A naive way is introduced
1930         to find the regions where:
1931
1932         1. signal from condition 1 is larger than input 1 and condition 2 --
1933         unique region in condition 1;
1934         2. signal from condition 2 is larger than input 2 and condition 1
1935         -- unique region in condition 2;
1936         3. signal from condition 1 is larger than input 1, signal from
1937         condition 2 is larger than input 2, however either signal from
1938         condition 1 or 2 is not larger than the other.
1939
1940         Here 'larger' means the pvalue or qvalue from a Poisson test is
1941         under certain cutoff.
1942
1943         (I will make another script to wrap up mulitple scripts for
1944         differential calling)
1945
1946 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1947         MACS version 2.0.5 (tag:alpha)
1948
1949         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1950         MACS2/IO/cPeakIO.pyx
1951
1952         Use hash to store peak information. Add back the feature to deal
1953         with data without control.
1954
1955         Fix bug which incorrectly allows small peaks at the end of
1956         chromosomes.
1957
1958         * bin/bdgpeakcall, bin/bdgcmp
1959
1960         Fix bugs. bdgpeakcall can output encodePeak format.
1961
1962 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1963         MACS version 2.0.4 (tag:alpha)
1964
1965         * cPeakDetect.py
1966
1967         Fix a bug, correctly assign lambda_bg while --to-small is
1968         set. Thanks Junya Seo!
1969
1970         Add rank and num of bp columns to pvalue-qvalue table.
1971
1972         * cScoreTrack.py
1973
1974         Fix bugs to correctly deal with peakless chromosomes. Thanks
1975         Vaibhav Jain!
1976
1977         Use AFDR for independent tests instead.
1978
1979         * encodePeak
1980
1981         Now MACS can output peak coordinates together with pvalue, qvalue,
1982         summit positions in a single encodePeak format (designed for
1983         ENCODE project) file. This file can be loaded to UCSC
1984         browser. Definition of some specific columns are: 5th:
1985         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1986         -log10qvalue, 10th: relative summit position to peak start.
1987
1988
1989 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1990         MACS version 2.0.3 (tag:alpha)
1991
1992         * Rich output with qvalue, fold enrichment, and pileup height
1993
1994         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1995         procedure:
1996
1997         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1998
1999         Now we have a similiar xls output file as before. The differences
2000         from previous file are:
2001
2002         1. Summit now is absolute summit, instead of relative summit
2003            position;
2004         2. 'Pileup' is previous 'tag' column. It's the extended fragment
2005            pileup at the peak summit;
2006         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
2007            5.00 means 1e-5, simple and less confusing.
2008         4. FDR column becomes '-log10(qvalue)' column.
2009         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
2010            the values at the peak summit.
2011
2012         * Extra output files
2013
2014         NAME_pqtable.txt contains pvalue and qvalue relationships.
2015
2016         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
2017         and -log10qvalue scores in BedGraph format. Nearby regions with
2018         the same value are not merged.
2019
2020         * Separation of FeatIO.py
2021
2022         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
2023         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
2024         implemented to store pileup, local lambda, pvalue, and qvalue
2025         alltogether in cScoreTrack.pyx.
2026
2027         * Experimental option --half-ext
2028
2029         Suggested by NPS algorithm, I added an experimental option
2030         --half-ext to let MACS only extends ChIP fragment around its
2031         middle point for only 1/2 d.
2032
2033 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2034         MACS version 2.0.2 (tag:alpha)
2035
2036         * macs2
2037
2038         Add an error check to see if there is no common chromosome names
2039         from treatment file and control file
2040
2041         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
2042
2043         Reduce memory usage by removing deepcopy() calls.
2044
2045         * Modify README documents and others.
2046
2047 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
2048         MACS Version 2.0.1 (tag:alpha)
2049
2050         * cPileup.pyx, cPeakDetect.pyx and peak calling process
2051
2052         Jie suggested me a brilliant simple method to pileup fragments
2053         into bedGraph track. It works extremely faster than the previous
2054         function, i.e, faster than MACS1.3 or MACS1.4. So I can include
2055         large local lambda calculation in MACSv2 now. Now I generate three
2056         bedGraphs for d-size local bias, slocal-size and llocal-size local
2057         bias, and calculate the maximum local bias as local lambda
2058         bedGraph track.
2059
2060         Minor: add_loc in bedGraphTrackI now can correctly merge the
2061         region with its preceding region if their value are the same.
2062
2063         * macs2
2064
2065         Add an option to shift control tags before extension. By default,
2066         control tags will be extended to both sides regardless of strand
2067         information.
2068
2069 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
2070         MACS Version 2.0.0 (tag:alpha)
2071
2072         * Use bedGraph type to store data internally and externally.
2073
2074         We can have theoretically one-basepair resolution profiles. 10
2075         times smaller in filesize and even smaller after converting to
2076         bigWig for visualization.
2077
2078         * Peak calling process modified. Better peak boundary detection.
2079
2080         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
2081         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
2082         one will be averaged to d size) Then calculate the maximum value
2083         of these two tracks and a global background, to have a
2084         local-lambda bedGraph.
2085
2086         Use -10log10poisson_pvalue as scores to generate a score track
2087         before peak calling.
2088
2089         A general peak calling based on a score cutoff, min length of peak
2090         and max gap between nearby peaks.
2091
2092         * Option changes.
2093
2094         Wiggle file output is removed. Now we only support bedGraph
2095         output. The generation of bedGraph is highly recommended since it
2096         will not cost extra time. In other words, bedGraph generation is
2097         internally run even you don't want to save bedGraphs on disk, due
2098         to the peak calling algorithm in MACS v2.
2099
2100         * cProb.pyx
2101
2102         We now can calculate poisson pvalue in log space so that the score
2103         (-10*log10pvalue) will not have a upper limit of 3100 due to
2104         precision of float number.
2105
2106         * Cython is adopted to speed up Python code.
2107
2108 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2109         Small fixes
2110
2111         * Replaced with a newest WigTrackI class and fixed the wignorm script.
2112
2113 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2114         Version 1.4.0rc2 (Valentine)
2115
2116         * --single-wig option is renamed to --single-profile
2117
2118         * BedGraph output with --bdg or -B option.
2119
2120         The BedGraph output provides 1bp resolution fragment pileup
2121         profile. File size is smaller than wig file. This option can be
2122         combined with --single-profile option to produce a bedgraph file
2123         for the whole genome. This option can also make --space,
2124         --call-subpeaks invalid.
2125
2126         * Fix the description of --shiftsize to correctly state that the
2127         value is 1/2 d (fragment size).
2128
2129         * Fix a bug in the call to __filter_w_control_tags when control is
2130         not available.
2131
2132         * Fix a bug on --to-small option. Now it works as expected.
2133
2134         * Fix a bug while counting the tags in candidate peak region, an
2135         extra tag may be included. (Thanks to Jake Biesinger!)
2136
2137         * Fix the bug for the peaks extended outside of chromosome
2138         start. If the minus strand tag goes outside of chromosome start
2139         after extension of d, it will be thrown out.
2140
2141         * Post-process script for a combined wig file:
2142
2143         The "wignorm" command can be called after a full run of MACS14 as
2144         a postprocess. wignorm can calculate the local background from the
2145         control wig file from MACS14, then use either foldchange,
2146         -10*log10(pvalue) from possion test, or difference after asinh
2147         transformation as the score to build a single wig track to
2148         represent the binding strength. This script will take a
2149         significant long time to process.
2150
2151         * --wigextend has been obsoleted.
2152
2153 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2154         Version 1.4.0rc1 (Starry Sky)
2155
2156         * Duplicate reads option
2157
2158         --keep-dup behavior is changed. Now user can specify how many
2159         reads he/she wants to keep at the same genomic location. 'auto' to
2160         let MACS decide the number based on binomial distribution, 'all'
2161         to let MACS keep all reads.
2162
2163         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2164
2165         By default, MACS will now scale the smaller dataset to the bigger
2166         dataset. For instance, if IP has 10 million reads, and Input has 5
2167         million, MACS will double the lambda value calculated from Input
2168         reads while calling BOTH the positive peaks and negative
2169         peaks. This will address the issue caused by unbalanced numbers of
2170         reads from IP and Input. If --to-small is turned on, MACS will
2171         scale the larger dataset to the smaller one. So from now on, if d
2172         is fixed, then the peaks from a MACS call for A vs B should be
2173         identical to the negative peaks from a B vs A.
2174
2175 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
2176         Version 1.4.0beta (summer wishes)
2177
2178         * New features
2179
2180         ** Model building
2181
2182         The default behavior in the model building step is slightly
2183         changed. When MACS can't find enough pairs to build model
2184         (implemented in alpha version) or the modeled fragment length is
2185         less than 2 times of tag length (implemented in beta version),
2186         MACS will use 2 times of --shiftsize value as fragment length in
2187         the later analysis. --off-auto can turn off this default behavior.
2188
2189         ** Redundant tag filtering
2190
2191         The IO module is rewritten. The redundant tag filtering process
2192         becomes simpler and works as promise. The maximum allowed number
2193         of tags at the exact same location is calculated from the
2194         sequencing depth and genome size using a binomial distribution,
2195         for both TREAMENT and CONTROL separately. ( previously only
2196         TREATMENT is considered ) The exact same location means the same
2197         coordination and the same strand. Then MACS will only keep at most
2198         this number of tags at the exact same location in the following
2199         analysis. An option --keep-dup can let MACS skip the filtering and
2200         keep all the tags. However this may bring in a lot of sequencing
2201         bias, so you may get many false positive peaks.
2202
2203         ** Single wiggle mode
2204
2205         First thing to mention, this is not the score track that I
2206         described before. By default, MACS generates wiggle files for
2207         fragment pileup for every chromosomes separately. When you use
2208         --single-wig option, MACS will generate a single wiggle file for
2209         all the chromosomes so you will get a wig.gz for TREATMENT and
2210         another wig.gz for CONTROL if available.
2211
2212         ** Sniff -- automatic format detection
2213
2214         Now, by default or "-f AUTO", MACS will decide the input file
2215         format automatically. Technically, it will try to read at most
2216         1000 records for the first 10 non-comment lines. If it succeeds,
2217         the format is decided. I recommend not to use AUTO and specify the
2218         right format for your input files, unless you combine different
2219         formats in a single MACS run.
2220
2221         * Options changes
2222
2223         --single-wig and --keep-dup are added. Check previous section in
2224         ChangeLog for detail.
2225
2226         -f (--format) AUTO is now the default option.
2227
2228         --slocal default: 1000
2229         --llocal default: 10000
2230
2231         * Bug fixed
2232
2233         Setup script will stop the installation if python version is not
2234         python2.6 or python2.7.
2235
2236         Local lambda calculation has been changed back. MACS will check
2237         peak_region, slocal( default 1K) and llocal (default 10K) for the
2238         local bias. The previous 200bps default will cause MACS misses
2239         some peaks where the input bias is very sharp.
2240
2241         sam2bed.py script is corrected.
2242
2243         Relative pos in xls output is fixed.
2244
2245         Parser for ELAND_export is fixed to pass some of the no match
2246         lines. And elandexport2bed.py is fixed too. ( however I can't
2247         guarantee that it works on any eland_export files. )
2248
2249 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2250         Version 1.4.0alpha2 (be smarter)
2251
2252         * Options changes
2253
2254         --gsize now provides shortcuts for common genomes, including
2255         human, mouse, C. elegans and fruitfly.
2256
2257         --llocal now will be 5000 bps if there is no input file, so that
2258         local lambda doesn't overkill enriched binding sites.
2259
2260 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2261         Version 1.4alpha (be smarter)
2262
2263         * Options changes
2264
2265         --tsize option is redesigned. MACS will use the first 10 lines of
2266         the input to decide the tag size. If user specifies --tsize, it
2267         will override the auto decided tsize.
2268
2269         --lambdaset is replaced by --slocal and --llocal which mean the
2270         small local region and large local region.
2271
2272         --bw has no effect on the scan-window size now. It only affects the
2273         paired-peaks model process.
2274
2275         * Model building
2276
2277         During the model building, MACS will pick out the enriched regions
2278         which are not too high and not too low to build the paired-peak
2279         model. Default the region is from fold 10 to fold 30. If MACS
2280         fails to build the model, by default it will use the nomodel
2281         settings, like shiftsize=100bps, to shift and extend each
2282         tags. This behavior can be turned off by '--off-auto'.
2283
2284         * Output files
2285
2286         An extra file including all the summit positions are saved in
2287         *_summits.bed file. An option '--call-subpeaks' will invoke
2288         PeakSplitter developed by Mali Salmon to split wide peaks into
2289         smaller subpeaks.
2290
2291         * Sniff ( will in beta )
2292
2293         Automatically recognize the input file format, so use can combine
2294         different format in one MACS run.
2295
2296         Not implemented features/TODO:
2297
2298         * Algorithms ( in near future? )
2299
2300         MACS will try to refine the peak boundaries by calculating the
2301         scores for every point in the candidate peak regions. The score
2302         will be the -10*log(10,pvalue) on a local poisson distribution. A
2303         cutoff specified by users (--pvalue) will be applied to find the
2304         precise sub-peaks in the original candidate peak region. Peak
2305         boudaries and peak summits positions will be saved in separate BED
2306         files.
2307
2308         * Single wiggle track ( in near future? )
2309
2310         A single wiggle track will be generated to save the scores within
2311         candidate peak regions in the 10bps resolution. The wiggle file
2312         is in fixedStep format.
2313
2314
2315 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2316         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2317
2318         * bin/Constants.py
2319
2320         Fixed typo. FCSTEP -> FESTEP
2321
2322         * lib/PeakDetect.py
2323
2324         The 'femax' attribute bug is fixed
2325
2326 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2327         Version 1.3.7 (Oktoberfest)
2328
2329         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2330
2331         Enhancements by Peter Chines:
2332
2333         1. gzip files are supported.
2334         2. when --diag is on, user can set the increment and endpoint for
2335         fold enrichment analysis by setting --fe-step and --fe-max.
2336
2337         Enhancements by Davide Cittaro:
2338
2339         1. BAM and SAM formats are supported.
2340         2. small changes in the header lines of wiggle output.
2341
2342         Enhancements by Me:
2343         1. I added --fe-min option;
2344         2. Bowtie ascii output with suffix ".map" is supported.
2345
2346         Bug fixed:
2347
2348         1. --nolambda bug is fixed. ( reported by Martin in JHU )
2349         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2350         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2351         4. Some "fold change" have been changed to "fold enrichment".
2352
2353 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2354         Version 1.3.6.1 (default parameter change)
2355
2356         * bin/macs, lib/PeakDetect.py
2357
2358         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2359         default. "--futurefdr" is added which can turn on the 'new' method
2360         introduced in 1.3.6. By default it's off.
2361
2362         * lib/PeakDetect.py
2363
2364         Fixed a bug. p-value is corrected a little bit.
2365
2366
2367 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
2368         Version 1.3.6 (Birthday cake)
2369
2370         * bin/macs
2371
2372         "track name" is added to the header of BED output file.
2373
2374         Now the default peak detection method is to consider 5k and 10k
2375         nearby regions in treatment data and peak location, 1k, 5k, and
2376         10k regions in control data to calculate local bias. The old
2377         method can be called through '--old' option.
2378
2379         Information about how many total/unique tags in treatment or
2380         control will be saved in final .xls output.
2381
2382         * lib/IO/__init__.py
2383
2384         ".fa" will be removed from input tag alignment so only the
2385         chromosome names are kept.
2386
2387         WigTrackI class is added for Wiggle like data structure. (not used
2388         now)
2389
2390         The parser for ELAND multi PET files has been fixed. Now the 5'
2391         tag position for a pair will be kept, whereas in the previous
2392         version, the middle points are kept.
2393
2394         * lib/IO/BinKeeper.py
2395
2396         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2397         browser, which can quickly access certain region for values in a
2398         large wiggle like data file. (not used now)
2399
2400         * lib/OptValidator.py
2401
2402         typo fixed.
2403
2404         * lib/PeakDetect.py
2405
2406         Now the default peak detection method is to consider 5k and 10k
2407         nearby regions in treatment data and peak location, 1k, 5k, and
2408         10k regions in control data to calculate local bias. The old
2409         method can be called through '--old' option.
2410
2411         Two columns have beed added to BED output file. 4th column: peak
2412         name; 5th column: peak score using -10log(10,pvalue) as score.
2413
2414         * setup.py
2415
2416         Add support to build a Mac App through 'setup.py py2app', or a
2417         Windows executable through 'setup.py py2exe'. You need to install
2418         py2app or py2exe package in order to use these functions.
2419
2420 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2421         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2422
2423         * PeakDetect.py
2424
2425         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2426         in control data to calculate local lambda for each peak. Peak
2427         calling results will be slightly different with previous version,
2428         beware!
2429
2430         * OptValidator.py
2431
2432         Typo fixed, ELANDParser -> ELANDResultParser
2433
2434         * OutputWriter.py
2435
2436         Now, modeled d value will be shown on the model figure.
2437
2438 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
2439         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2440
2441         * macs, IO/__init__.py, PeakDetect.py
2442
2443         Add support for ELAND multi format. Add support for Pair-End
2444         experiment, in this case, 5'end and 3'end ELAND multi format files
2445         are required for treatment or control data. See 00README file for
2446         detail.
2447
2448         Add wigextend option.
2449
2450         Add petdist option for Pair-End Tag experiment, which is the best
2451         distance between 5' and 3' tags.
2452
2453         * PeakDetect.py
2454
2455         Fixed a bug which cause the end positions of every peak region
2456         incorrectly added by 1 bp. ( Thanks Mali Salmon!)
2457
2458         * OutputWriter.py
2459
2460         Fix bugs while generating wiggle files. The start position of
2461         wiggle file is set to 1 instead of 0.
2462
2463         Fix a bug that every 10M bps, signals in the first 'd' range are
2464         lower than actual. ( Thanks Mali Salmon!)
2465
2466
2467 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
2468         Version 1.3.3 (wiggle bugs fixed)
2469
2470         * OutputWriter.py
2471
2472         Fix bugs while generating wiggle files. 1. 'span=' is added to
2473         'variableStep' line; 2. previously, every 10M bps, the coordinates
2474         were wrongly shifted to the right for 'd' basepairs.
2475
2476         * macs, PeakDetect.py
2477
2478         Add an option to save wiggle files on different resolution.
2479
2480 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2481         Version 1.3.2 (tiny bugs fixed)
2482
2483         * IO/__init__.py
2484
2485         Fix 65536 -> 65535. ( Thank Joon)
2486
2487         * Prob.py
2488
2489         Improved for binomial function with extra large number. Imported
2490         from Cistrome project.
2491
2492         * PeakDetect.py
2493
2494         If treatment channel misses reads in some chromosome included in
2495         control channel, or vice versa, MACS will not exit. (Thank Shaun
2496         Mahony)
2497
2498         Instead, MACS will fake a tag at position -1 when calling
2499         treatment peaks vs control, but will ignore the chromosome while
2500         calling negative peaks.
2501
2502 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2503         Version 1.3.1 (tiny bugs fixed version)
2504
2505         * Prob.py
2506
2507         Hyunjin Gene Shin contributed some codes to Prob.py. Now the
2508         binomial functions can tolerate large and small numbers.
2509
2510         * IO/__init__.py
2511
2512         Parsers now split lines in BED/ELAND file using any
2513         whitespaces. 'track' or 'browser' lines will be regarded as
2514         comment lines. A bug fixed when throwing StrandFormatError. The
2515         maximum redundant tag number at a single position can be no less
2516         than 65536.
2517
2518
2519 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
2520         Version 1.3 (naming clarification version)
2521
2522         * Naming clarification changes according to our manuscript:
2523
2524         'frag_len' is changed to 'd'.
2525
2526         'fold_change' is changed to 'fold_enrichment'.
2527
2528         Suggest '--bw' parameter to be determined by users from the real
2529         sonication size.
2530
2531         Maximum FDR is 100% in the output file.
2532
2533         And other clarifications in 00README file and the documents on the
2534         website.
2535
2536         * IO/__init__.py
2537         If the redundant tag number at a single position is over 32767,
2538         just remember 32767, instead of raising an overflow exception.
2539
2540         * setup.py
2541         fixed a typo.
2542
2543         * PeakDetect.py
2544         Bug fixed for diagnosis report.
2545
2546
2547 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2548         Version 1.2.2gamma
2549
2550         * Serious bugs fix:
2551
2552         Poisson distribution CDF and inverse CDF functions are
2553         corrected. They can produce right results even for huge lambda
2554         now. So that the p-value and FDR values in the final excel sheet
2555         are corrected.
2556
2557         IO package now can tolerate some rare cases; ELANDParser in IO
2558         package is fixed. (Thank Bogdan)
2559
2560         * Improvement:
2561
2562         Reverse paired peaks in model are rejected. So there will be no
2563         negative 'frag_len'. (Thank Bogdan)
2564
2565         * Features added:
2566
2567         Diagnosis function is completed. Which can output a table file for
2568         users to estimate their sequencing depth.
2569
2570
2571 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
2572         Version 1.2
2573
2574         * Probe.py is added!
2575
2576         GSL is totally removed from MACS. Instead, I have implemented the
2577         CDF and inverse CDF for poisson and binomial distribution purely
2578         in python.
2579
2580         * Constants.py is added!
2581
2582         Organize constants used in MACS in the Constants.py file.
2583
2584         * All other files are modified!
2585
2586         Foldchange calculation is modified. Now the foldchange only be
2587         calculated at the peak summit position instead of the whole peak
2588         region. The values will be higher and more robust than before.
2589
2590         Features added:
2591
2592         1. MACS can save wiggle format files containing the tag number at
2593         every 10 bp along the genome. Tags are shifted according to our
2594         model before they are calculated.
2595
2596         2. Model building and local lambda calculation can be skipped with
2597         certain options.
2598
2599         3. A diagnosis report can be generated through '--diag'
2600         option. This report can help you get an assumption about the
2601         sequencing saturation. This funtion is only in beta stage.
2602
2603         4. FDR calculation speed is highly improved.
2604
2605 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2606         Version 1.1
2607
2608         * TabIO, PeakModel.py ...
2609         Bug fixed to let MACS tolerate some cases while there is no tag on
2610         either plus strand or minus strand.
2611
2612         * setup.py
2613         Check the version of python. If the version is lower than 2.4,
2614         refuse to install with warning.
2615