ChangeLog

2023-07-28  Tao Liu  <vladimir.liu@gmail.com>
        MACS 3.0.0b3

        * New features in MACS3:

        1) Speed/memory optimization. Use cykhash to replace Python
        dictionaries. Use a buffer (10MB) to read and parse input files
        (not available for the BAM file parser). Many other optimization
        tweaks. We added memory monitoring to the runtime messages.

        2) Call variants in peak regions directly from BAM files. The
        function was originally developed under the code name SAPPER. Now
        SAPPER has been merged into MACS. Also, `simde` has been added as
        a submodule in order to support the fermi-lite library on non-x64
        architectures.

        3) HMMRATAC module is added. HMMRATAC is dedicated software to
        analyze ATAC-seq data. The basic idea behind HMMRATAC is to digest
        ATAC-seq data according to the fragment length of read pairs into
        four signal tracks: short fragments, mononucleosomal fragments,
        di-nucleosomal fragments and tri-nucleosomal fragments, and then
        to integrate the four tracks again using a Hidden Markov Model
        with three hidden states: open region, nucleosomal region, and
        background region. The original paper by Evan Tarbell was
        published in 2019, with software written in JAVA. We implemented
        it in Python/Cython and optimized the whole process using existing
        MACS functions and hmmlearn. Now it can run much faster than the
        original JAVA version. Note: evaluation of the peak calling
        results is underway.

        4) Code cleanup. Reorganize source code.

        5) Unit testing.

        6) R wrappers for MACS -- MACSr

        7) Switch to GitHub Actions for CI, with multi-arch testing
        including x64, armv7, aarch64, s390x and ppc64le. We also test on
        macOS 12.

        8) MACS tag-shifting model has been refined. Now it uses a naive
        peak calling approach to find ALL possible paired peaks on the +
        and - strands, then uses all of them to calculate the
        cross-correlation. (A related bug has been fixed, #442.)

        9) BAI index and random access to BAM files are now supported
        (#449). Users can also use the original BAM file (instead of the
        subset of the BAM file as in SAPPER) in the `callvar` command.

        10) Support for Python > 3.10 #497 #498

        11) The effective genome size parameters have been updated
        according to deeptools. #508

        12) Multiple updates regarding dependencies, the Anaconda build,
        and the CI/CD process.

        13) Cython support up to ~0.29. Cython 3 is not supported yet.

        * Other:
        1) Missing header line when no peaks can be called #501 #502

        2) Note: different versions of numpy, scipy, and sklearn may give
        slightly different hmmratac results. The current standard results
        for automated testing in the `/test` directory are from Numpy
        1.25.1, Scipy 1.11.1, and sklearn 1.3.0.

2020-04-11  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.7.1

        * hotfix:

        Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
        work.

2020-04-10  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.7

        * Bugs fixed

        1) MACS2 has been tested on multiple architectures to make sure it
        can successfully generate consistent results. Currently the
        supported architectures are: AMD64, ARM64, i386, PPC64LE, and
        S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issues
        #340, #349, #351, and #359, and to PRs #348, #350, #360, #361,
        #367, and #370. The lesson is that if the project is built on
        Cython and is aimed at memory efficiency, we should explicitly
        define all int/float types in pyx files, such as int8_t or
        uint32_t, using either libc or numpy (C version) instead of
        relying on types such as short, long, double.

        2) MACS2 setup script will check for numpy and install numpy if
        necessary. PR #378, issue #364

        3) `bdgbroadcall` command will correctly add the score column (5th
        column). The score (5th) column contains 10 times the average
        score in the broad region. PR #373, issue #362

        4) The missing test for the `bdgopt` subcommand has been added.
        PR #363

        5) The obsolete option `--ratio` from `callpeak` subcommand has
        been removed. PR #369, issue #366

        6) Fixed the incorrect description in README from 'maximum
        length of broad region is 4 times of d' to 'maximum gap for
        merging broad regions is 4 times of tag size by default'. PR #380,
        issue #365.

        * Other

        1) CODE OF CONDUCT document has been added to MACS2 GitHub
        repository. PR #358

2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.6

        * New Features

        1) Speed up MACS2 with some programming tricks and code cleanup.
        The filter_dup function replaces separate_dups. The latter was
        implemented for potentially putting back duplicate reads in
        certain downstream analyses; however, such analysis hasn't been
        implemented. Optimize the speed of writing bedGraph files.
        Optimize BAM and BAMPE parsing with pointer casting instead of
        Python unpack.

        2) The comment lines in the headers of BED or SAM files will be
        correctly skipped. However, MACS2 won't check comment lines in the
        middle of the file.
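
        A minimal sketch of the duplicate-filtering idea behind
        filter_dup, for illustration only: it assumes reads are given as
        hypothetical (chrom, 5' position, strand) tuples and keeps at most
        `max_dup` reads at the same location and strand. It uses a
        dictionary counter and is not the MACS2 implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation: (chromosome, 5' position, strand).
Read = Tuple[str, int, str]


def filter_dup(reads: Iterable[Read], max_dup: int = 1) -> List[Read]:
    """Keep at most max_dup reads per (chrom, position, strand)."""
    kept_at: Dict[Read, int] = defaultdict(int)  # reads kept so far per location
    kept: List[Read] = []
    for read in reads:
        if kept_at[read] < max_dup:
            kept_at[read] += 1
            kept.append(read)
    return kept


reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 100, "-"), ("chr1", 250, "+")]
print(filter_dup(reads))  # [('chr1', 100, '+'), ('chr1', 100, '-'), ('chr1', 250, '+')]
```

        Extra duplicates are dropped rather than stored separately, which
        is the difference from the old separate_dups approach.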

        * Bugs fixed

        1) Cutoff-analysis in callpeak command. #341

        2) Issues related to SAMParser and three ELAND parsers are
        fixed. #347

        * Other

        1) The cmdlinetest script in the test/ folder has been updated
        to: 1. test cutoff-analysis with the callpeak cmd; 2. output the 2
        lines before and after the error or warning message during tests;
        3. output only the first 10 lines if a difference between the test
        result and the standard result is found; 4. let prockreport
        monitor CPU time and memory usage at 1 sec intervals -- a bit more
        accurate.

        2) Python 3.5 support is removed. Now MACS2 requires Python>=3.6.

2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.5 (Py3 speed up)

        * Features added

        1) *GitHub code only and not included in MACS2 release* New
        testing data for performance tests. A subsampled ENCODE2 CTCF
        ChIP-seq dataset, including 5 million ChIP reads and 5 million
        control reads, has been included in the test folder for testing
        CPU and memory usage (i.e. the 5M test). Several related scripts,
        including `prockreport` for reporting CPU and memory usage, and
        `pyprofile` and `pyprofile_stat` for debugging and profiling MACS2
        code, have been included.

        2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
        The old hashtable.pyx implementation copied from Pandas (a very
        old version) doesn't work well in Python3+Cython: it slows down
        the pqtable checkup even with the same Cython code as in
        v2.1.4. While running the 5M test, the `__getitem__` function in
        hashtable.pyx took 3.5s for 37,382,037 calls in MACS2 v2.1.4, but
        148.6s for the same number of calls in MACS2 v2.2.4. As a
        consequence, the standard Python dictionary implementation has
        replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
        faster than the py2 version, but uses a bit more memory. In
        general, v2.2.5 can finish the 5M reads test in 20% less time than
        MACS2 v2.1.4, but uses 15% more memory.

        * Bugs fixed

        1) More Python3 related fixes, e.g. the return value of keys()
        from a py3 dict. #333 #337


2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.4 (Python3)

        * Features added

        1) First Python3 version of MACS2 released.

        2) Version number 2.2.X will be used for MACS2 in Python3, in
        parallel to 2.1.X.

        3) More comprehensive test.sh script to check the consistency of
        results from the Python2 and Python3 versions.

        4) Simplify the setup.py script since the newest version
        transparently supports Cython. When Cython is not installed by
        the user, setup.py can still compile using only the C code.

        5) Fix Signal.pyx to use np.array instead of np.mat.

2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.4

        * Features added

        GitHub Actions is used together with Travis CI for testing and
        deployment.

        * Bugs fixed

        PR #322:

        1) #318 Random score in bdgdiff output. It turns out sum_v was
        not initialized to 0 before adding. Potential bugs are also fixed
        in other functions in ScoreTrack and CallPeakUnit code.

        2) #321 Cython dependency in setup.py script is removed, and the
        'cythonize' call is placed in the correct position.

        3) A typo is fixed in the GitHub Actions script.

2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.3.3

        * Features added

        1) Support Docker auto-deploy. PR #309

        2) Support Travis CI auto-testing, update unit-testing
        scripts, and enable subcommand testing on small datasets.

        3) Update README documents. #297 PR #306

        4) `cmbreps` supports more than 2 replicates. Merged from PR #304
        @Maarten-vd-Sande and PR #307 (our own chi-sq test code)

        5) `--d-min` option is added in `callpeak` and `predictd`, to
        exclude predictions of fragment size smaller than the given
        value. Merged from PR #267 @shouldsee.

        6) `--buffer-size` option is added in `predictd`, `filterdup`,
        `pileup` and `refinepeak` subcommands. Users can use this option
        to decrease memory usage when there are a large number of contigs
        in the data. Also, now `callpeak`, `predictd`, `filterdup`,
        `pileup` and `refinepeak` will suggest that users tweak
        `--buffer-size` when catching a MemoryError. #313 PR #314

        * Bugs fixed

        1) #265 Fixed a bug where the pseudocount wasn't applied when
        calculating the p-value score in the ScoreTrack object.

        2) Fixed bdgbroadcall so that it will report those broad peaks
        without a strong peak inside, consistent with the behavior of
        `callpeak --broad`.

        3) Rename COPYING to LICENSE.

2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.2

        * New features

        1) Added missing BEDPE support, and enabled support for BAMPE
        and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
        subcommands. When the format is BAMPE or BEDPE, the 'pileup'
        command will pile up the whole fragment defined by the mapping
        locations of the left end and right end of each read pair.
        Thanks @purcaro

        2) Added options to callpeak command for tweaking max-gap and
        min-len during peak calling. Thanks @jsh58!

        3) The callpeak option "--to-large" is replaced with
        "--scale-to large".

        4) The randsample option "-t" has been replaced with "-i".
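
        The fragment pileup in item 1 can be sketched as a sweep over
        fragment start and end events. This assumes fragments on one
        chromosome are given as hypothetical half-open (left, right)
        pairs, and emits bedGraph-like (start, end, depth) segments,
        skipping zero-depth gaps; it is an illustration, not MACS2's
        pileup code.

```python
from typing import List, Tuple


def pileup_fragments(fragments: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (start, end, depth) segments covered by the given fragments."""
    events = []
    for left, right in fragments:
        events.append((left, 1))    # depth rises where a fragment starts
        events.append((right, -1))  # and falls where it ends
    events.sort()  # by position; ties are harmless since zero-length segments are skipped
    segments: List[Tuple[int, int, int]] = []
    depth = 0
    prev_pos = None
    for pos, delta in events:
        if prev_pos is not None and pos > prev_pos and depth > 0:
            segments.append((prev_pos, pos, depth))
        depth += delta
        prev_pos = pos
    return segments


print(pileup_fragments([(100, 300), (200, 400)]))
# [(100, 200, 1), (200, 300, 2), (300, 400, 1)]
```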

        * Bug fixes

        1) Fixed memory issue related to #122 and #146

        2) Fixed a bug caused by a typo. Related to #249, Thanks @shengqh

        3) Fixed a bug when setting the command-line qvalue cutoff.

        4) Better description of the 5th column of narrowPeak. Thanks
        @alexbarrera

        5) Fixed the calculation of average fragment length for paired-end
        data. Thanks @jsh58

        6) Fixed bugs caused by khash when computing p/q-values and log
        likelihood ratios. Thanks @jsh58

        7) More spelling tweaks in source code. Thanks @mr-c

2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.1 20160309

        * Retire the tag:rc.

        * Fixed spelling. Merged pull request #120. Thanks @mr-c!

        * Change filtering criteria for reading BAM/SAM files

        Related to callpeak and filterdup commands. Now the
        reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
        still be read although MACS2 may later decide they are
        duplicates. Related to old issue #33. Sorry I forgot to address it
        for years!
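
        The filtering rule above can be illustrated with SAM FLAG bits.
        The bit values come from the SAM specification; the function
        name, and the choice to also skip unmapped reads, are assumptions
        for this sketch rather than MACS2's actual parser logic.

```python
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4         # 4
SECONDARY = 0x100      # 256
DUPLICATE = 0x400      # 1024, "PCR or optical duplicate"
SUPPLEMENTARY = 0x800  # 2048

# DUPLICATE is deliberately left out of the skip mask.
SKIP_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY


def keep_alignment(flag: int) -> bool:
    """Hypothetical filter: skip unmapped, secondary and supplementary
    alignments, but keep duplicate-flagged reads so that duplicates can be
    judged later (e.g. according to --keep-dup)."""
    return (flag & SKIP_MASK) == 0


print(keep_alignment(16))    # True: primary alignment on the minus strand
print(keep_alignment(1024))  # True: the duplicate bit alone does not drop a read
print(keep_alignment(256))   # False: secondary alignment
```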

2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.1 20160226 (tag:rc Zhengyue)

        * Bug fixes

        1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
        the former option is not supported by older GCC. Related to issues
        #91, #109.

        2) Issue #108 is fixed. If no peak can be found in a chromosome,
        PeakIO won't throw an error.

        * New features

        1) callpeak

        a) A more flexible format, BEDPE, is supported. Now users can
        define the left and right positions of the ChIPed fragment, and
        MACS2 will skip model building and directly pile up the
        fragments. Related to issue #112.

        b) The 'tempdir' can be specified, to save cached pileup
        tracks. Originally, the temporary files were stored in
        /tmp. Thanks @daler! Related to issues #97 and #105.

        2) bdgopt

        New operations are added to calculate the maximum or minimum of
        the values in a BEDGRAPH file and a given value.

        3) bdgcmp

        A new method is added to calculate the maximum of the values
        defined in two BEDGRAPH files.

2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20151222 (tag:rc Dongzhi)

        * Bug fixes

        1) Fix a bug when dealing with chromosomes containing only one
        read (pair). The size of the dup_plus/dup_minus arrays after
        filtering dups should be increased by 1.

        2) Fix a bug related to the broad peak calling function in
        previous versions. The gaps were miscalculated, so segmented weak
        broad calls might be reported, and sometimes you would see peaks
        with values lower than the cutoff in the output files.

        3) "Potentially" fixed issue #105 on temporary cache files; needs
        further follow-up.


2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20150731 (tag:rc)

        * Bug fixes

        1) Fixed issue #76: information about broad/narrow cutoff will be
        correctly displayed.

        2) Fixed issue #79: bdgopt extparam option is fixed.

        3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
        for filterdup command.

        4) Fixed issues #78, #88 and a similar issue reported in the MACS
        Google group: MACS2 can now correctly deal with multiple alignment
        files for -t or -c. The 'finalize' function will be correctly
        called. Multiple files option is enabled for filterdup,
        randsample, predictd, pileup and refinepeak commands.

        5) A related issue to #88: when BAMPE mode is used, PE pairs will
        be sorted by leftmost then rightmost ends.

        6) Fixed issue #86: A wrong use of 'ndarray' to create a Numpy
        array, which caused 'callpeak --nolambda' to hang forever while
        calculating pvalues and qvalues.

2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20150420 (tag:rc)

        * New commands

        1) bdgopt: some convenient functions to modify bedGraph files.

        2) cmbreps: Combine scores from two replicates, with three
        methods: 1. take the maximum; 2. take the average; 3. use Fisher's
        method to combine two p-value scores. After that, users can use
        bdgpeakcall to call peaks on combined scores.
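
        Fisher's method, as used by cmbreps, can be sketched as follows,
        assuming each replicate score is a -log10(p-value) as in MACS2
        p-value bedGraphs. The function name is hypothetical. For k
        p-values the statistic X = -2 * sum(ln p_i) follows a chi-square
        distribution with 2k degrees of freedom, whose survival function
        has a closed form for even degrees of freedom.

```python
import math
from typing import Sequence


def fisher_combine(neg_log10_pvalues: Sequence[float]) -> float:
    """Combine -log10(p) scores with Fisher's method; return -log10(p)."""
    k = len(neg_log10_pvalues)
    # X = -2 * sum(ln p_i), with ln p_i = -ln(10) * score_i
    x = 2.0 * math.log(10) * sum(neg_log10_pvalues)
    half = x / 2.0
    # Chi-square survival function with 2k d.f.:
    #   P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    # Work with logs so very small p-values do not underflow to 0.
    ln_p = -half + math.log(series)
    return -ln_p / math.log(10)


print(round(fisher_combine([1.0, 1.0]), 3))  # two p-values of 0.1 -> 1.251
```

        With a single score the result equals the input, a quick sanity
        check that the conversion between -log10 and ln is consistent.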

        * New features

        1) callpeak and bdgpeakcall now can try to analyze the
        relationship between p-values and number/length of peaks, then
        generate a summary to help users decide an appropriate cutoff.

        2) callpeak now can accept a fold-enrichment cutoff as a filter
        for final peak calls.

        * Performance

        Now MACS2 runs about 3X as fast as the previous version, trading
        clean Python code for speed... Now when processing 50M ChIP vs
        50M control reads, it takes only 10 minutes.

        * Bug fixes

        1) Sampling function in BAMPE mode.

        2) Callpeak when there are >= 2 input files for -t or -c.

        3) When reading BAM/SAM, secondary or supplementary alignments
        will be correctly skipped.

        4) Fixed issue #33: An explanation is added to the callpeak
        --keep-dup option that MACS2 will discard SAM/BAM alignments with
        bit 1024 no matter how --keep-dup is set.

        5) Fixed issue #49: setuptools is used instead of distutils

        6) Fixed issue #51: fix the problem when using the --trackline
        argument when the control file is absent.

        7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of
        reads mapped to the minus strand. The previous implementation
        found an incorrect 5' end if there was an indel in the alignment.

        8) Fixed issue #56: An incorrect sorting method used for BAMPE
        mode caused incorrect filtering of duplicated reads. Now
        fixed.

        9) Issue #63: Merged from jayhesselberth@github, extsize now can
        be 1.

        10) Issue #71: Merged from aertslab@github, close file descriptors
        after creating them with mkstemp().
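
        Item 7 can be illustrated with a short sketch: for a minus-strand
        read the 5' end is the rightmost aligned reference base, so its
        coordinate must come from the reference-consuming CIGAR
        operations (M, D, N, =, X per the SAM specification) rather than
        from the read length. The function name and the 0-based,
        inclusive coordinate convention are assumptions for illustration.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_end(pos: int, cigar: str, reverse: bool) -> int:
    """0-based 5' end of an alignment whose leftmost mapped base is pos."""
    if not reverse:
        return pos  # plus strand: the 5' end is the leftmost base
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + ref_len - 1  # minus strand: last reference base covered


# A 50 bp read with a 5 bp deletion spans 55 reference bases.
print(five_prime_end(1000, "20M5D30M", reverse=True))  # 1054
```

        Using the read length alone would give 1049 here, which is the
        kind of error an indel caused before the fix.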

2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20140616 (tag:rc)

        * callpeak module

        "--ratio" is added to manually assign the scaling factor of ChIP
        vs control, e.g. from NCIS. Thanks Colin D and Dietmar Rieder for
        implementing the patch file!

        "--shift" is added to move cutting ends (5' ends of reads) around,
        in order to process DNase-Seq data, e.g., use "--shift -100
        --extsize 200" to get 200bp fragments around 5' ends. For general
        ChIP-Seq data analysis, this option should always be set to
        0. Thanks Xi Chen and Anshul Kundaje for the discussions in the
        user group!
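
        A sketch of how "--shift" and "--extsize" might turn a read's 5'
        end into an interval, assuming the shift is applied in the read's
        5'->3' direction (rightward for + strand, leftward for - strand)
        and the extension follows in the same direction. The function is
        hypothetical and uses half-open coordinates.

```python
from typing import Tuple


def shifted_fragment(five_prime: int, strand: str, shift: int, extsize: int) -> Tuple[int, int]:
    """Return the (start, end) interval derived from a read's 5' end."""
    if strand == "+":
        start = five_prime + shift  # move along 5'->3', i.e. rightward
        return start, start + extsize
    end = five_prime - shift  # 5'->3' on the minus strand is leftward
    return end - extsize, end


# "--shift -100 --extsize 200" centers a 200 bp window on the 5' end.
print(shifted_fragment(1000, "+", -100, 200))  # (900, 1100)
print(shifted_fragment(1000, "-", -100, 200))  # (900, 1100)
```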

        ** Do not output negative fragment size from cross-correlation
        analysis. Thanks Alvin Qin for the feedback!

        ** --half-ext and --control-shift are removed. For complex read
        shifting and extending, combine the '--shift' and '--extsize'
        options. For comparing two conditions, use the 'bdgdiff' module
        instead.

        ** A bug is fixed so that the last pileup value is output
        correctly in the bdg file.

        * filterdup

        A 'dry-run' option is added to only output numbers, including the
        number of allowed duplicates, the total number of reads before and
        after filtering duplicates, and the estimated duplication
        rate. Thanks John Urban for the suggestion!


2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20131216 (tag:alpha)

        bug fixes and tweaks

        * We changed the license from the Artistic License to the 3-clause
        BSD license.

        Yes. The simpler the better.

        * Process paired-end data with "-f BAMPE" without control

        * GappedPeak output for --broad option has been fixed again to be
        consistent with the official UCSC format. We add a 1bp
        pseudo-block to the left and/or right of a broad region when
        necessary, so that you can successfully visualize the regions
        without strong enrichment inside. In downstream analysis other
        than visualization, you may need to remove all 1bp blocks from the
        gappedPeak file.

        * diffpeak subcommand is temporarily disabled until we
        re-implement it.

2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20131028 (tag:alpha)

        * callpeak --call-summits improvement

        The smoothing window length has been fixed as the fragment length
        instead of the short read length. The larger smoothing window will
        give better smoothing results and better sub-peak summit
        detection.

        * --outdir and --ofile options for almost all commands

        Thanks Björn Grüning for initially implementing these options!
        Now, MACS2 will save results into a directory specified by the
        '--outdir' option, and/or save results into a file specified by
        the '--ofile' option. Note: when '--ofile' is available for a
        subcommand, '-o' now has been adjusted to be the same as '--ofile'
        instead of '--o-prefix'.

        Here is the list of changes. For more detail, use 'macs2 xxx -h'
        for each subcommand:

        ** callpeak: --outdir
        ** diffpeak: Not implemented
        ** bdgpeakcall: --outdir and --ofile
        ** bdgbroadcall: --outdir and --ofile
        ** bdgcmp: --outdir and --ofile. When --ofile is used, the number
        and the order of arguments for --ofile must be the same as for -m.
        ** bdgdiff: --outdir and --ofile
        ** filterdup: --outdir
        ** pileup: --outdir
        ** randsample: --outdir
        ** refinepeak: --outdir and --ofile


2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130915 (tag:alpha)

        * callpeak: Added a new option --buffer-size

        This option is to tweak a previously hidden parameter that
        controls the step by which the array size grows when storing
        alignment information. In some rare cases, when the number of
        chromosomes/contigs/scaffolds is huge, the original default
        setting will cause a huge waste of memory. In these cases, we
        recommend decreasing --buffer-size (e.g., 1000) to save memory,
        although the decrease will slow down reading alignment files.

        * an optimization to speed up pvalue-qvalue statistics

        Previously, it took an hour to prepare the p-q-table for a 65M vs
        65M human TF library, and now it takes 10 minutes. It was due to a
        single line of code to get a value from a numpy array ...

        * fixed logLR bugs.

2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130731 (tag:alpha)

        * callpeak --call-summits

        Fix bugs causing the callpeak --call-summits option to generate an
        extra number of peaks and inconsistent peak boundaries compared to
        the default option. Thanks Ben Levinson!

        * bdgcmp output

        Fix bugs causing bdgcmp to output logLR all as positive values.
        Now 'depletion' can be correctly represented as negative values.

        * bdgdiff

        Fix the behavior of the bdgdiff module. Now it can take four
        bedGraph files, then use logLR as the cutoff to call differential
        regions. Check the command line of bdgdiff for details.

2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130713 (tag:alpha)

        * fix bugs when outputting broadPeak and gappedPeak.

        Note: weak broad regions without any strong enrichment regions
        inside won't be saved in the gappedPeak file.

        * bdgcmp -T and -C are merged into -S and the description is
        updated.

        Now, you can use it to override SPMR values in your input for
        bdgcmp. Using SPMR (from 'callpeak --SPMR -B') when calculating
        statistics will cause weird results (in most cases, lower
        significance), and won't be consistent with MACS2 callpeak
        behavior. So if you have SPMR bedGraphs, input the smaller/larger
        sample size in MILLIONS according to the 'callpeak --to-large'
        option.

2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130710 (tag:alpha)

        * fix BED style output format of callpeak module:

        1) without --broad: narrowPeak (BED6+4) and BED for summits will
        be the output. Old BED format file won't be saved.

        2) with --broad: broadPeak (BED6+3) for broad regions and
        gappedPeak (BED12+3) for chained enriched regions will be the
        output. Old BED format, narrowPeak format, and summit files won't
        be saved.

        * bdgcmp now can accept a list of methods to calculate scores, so
        you can run it once to generate multiple types of scores. Thanks
        Jon Urban for this suggestion!

        * C codes are re-generated through Cython 0.19.1.

2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130520 (tag:alpha)

        * broad peak calling modules are modified in order to report all
        relaxed regions even if there is no strong enrichment inside.

2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130501 (tag:alpha)

        * Memory usage is decreased to about 1/4-1/5 of previous usage
        Now, the internal data structure and algorithm are both
        re-organized, so that intermediate data won't be saved in
        memory. Instead they will be calculated on the fly. The new MACS2
        takes longer (1.5 to 2 times); however, it uses less memory, so
        it can be more usable on servers with little memory.

        * --seed option is added to callpeak and randsample commands
        Thanks Mathieu Gineste for this suggestion!

2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130306 (tag:alpha)

        * diffpeak module: New module to detect differential binding sites
        with more statistics.

        * Introduced --refine-peaks
        Calculates read balance to refine peak summits

        * Output file name prefixes
        Correct encodePeak to narrowPeak, broadPeak to bed12.

2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
        MACS version 2.0.10 (tag:alpha not released)

        * Introduced BAMPEParser
        Reads PE data directly, requires bedtools for now

        * Introduced --call-summits
        Uses signal processing methods to call overlapping peaks

        * Added --no-trackline
        By default, files have descriptive tracklines now

        * new refinepeak command (experimental)
        This new function uses a method similar to the one in SPP (wtd) to
        analyze the raw tag distribution in a peak region, then redefines
        the peak summit as the position where plus and minus tags are
        evenly distributed around it.

        * Changes to output *
        cPeakDetect.pyx has full support for new print/write methods and
        --call-peaks, BAMPEParser, and use of paired-end data

        * Parser optimization

        cParser.pyx is rewritten to use io.BufferedReader for speed.
        Speed is doubled.

        Code is reorganized -- most functions are inherited from the
        GenericParser class.

        * Use cross-correlation to calculate fragment size

        First, all pairs will be used in the prediction of fragment
        size. Previously, no more than 1000 pairs were used. Second,
        cross-correlation is used to find the best phase difference
        between + and - tag pileups.
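
        The cross-correlation step can be sketched as follows: + strand
        tags pile up upstream of a binding site and - strand tags
        downstream, so the lag that best aligns the two pileups estimates
        the fragment size d. This raw, unnormalized version assumes both
        pileups are sampled at the same positions and only tries
        non-negative lags; MACS2's own computation may normalize or
        restrict the lags differently.

```python
from typing import List


def best_lag(plus: List[float], minus: List[float], max_lag: int) -> int:
    """Return the lag d maximizing sum_i plus[i] * minus[i + d]."""
    n = min(len(plus), len(minus))
    best_d, best_score = 0, float("-inf")
    for d in range(max_lag + 1):
        score = sum(plus[i] * minus[i + d] for i in range(n - d))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


plus = [0, 5, 1, 0, 0, 0, 0, 0]   # + strand pileup peaks at position 1
minus = [0, 0, 0, 0, 1, 5, 0, 0]  # - strand pileup peaks at position 5
print(best_lag(plus, minus, max_lag=6))  # 4
```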

        * Speed up p-value and q-value calculation

        This part is ten times faster now. I am using a dictionary to
        cache p-value results from the Poisson CDF function. A bit more
        memory will be used to increase speed. I hope this dictionary will
        not explode, since the possible pairs of ChIP signal and control
        lambda are hugely redundant. Also, I rewrote part of the q-value
        calculation.
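
        The caching idea can be sketched with functools.lru_cache standing
        in for the hand-written dictionary: the -log10 Poisson upper-tail
        p-value is memoized per (observed pileup, control lambda) pair, so
        repeated pairs cost a single lookup. The function name is
        hypothetical, lambda is assumed to be positive, and this is not
        MACS2's implementation.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def poisson_score(observed: int, lam: float) -> float:
    """-log10 P(X >= observed) for X ~ Poisson(lam), with lam > 0."""
    if observed <= 0:
        return 0.0  # P(X >= 0) = 1
    if observed > lam:
        # Sum the upper tail relative to its first term p_k; later terms
        # shrink geometrically because i > lam.
        log_pk = -lam + observed * math.log(lam) - math.lgamma(observed + 1)
        ratio_sum, term, i = 1.0, 1.0, observed
        while term > 1e-16 * ratio_sum:
            i += 1
            term *= lam / i
            ratio_sum += term
        ln_p = log_pk + math.log(ratio_sum)
    else:
        # Here the upper tail is not small, so 1 - P(X <= k - 1) is accurate.
        lower = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
                    for i in range(observed))
        ln_p = math.log(max(1.0 - lower, 1e-300))
    return -ln_p / math.log(10)


print(round(poisson_score(3, 1.0), 3))  # 1.095
print(round(poisson_score(3, 1.0), 3))  # 1.095 again, served from the cache
```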
 692
 693         * Speed up peak detection
 694
 695         This part is about hundred of times faster now.  Optimizations
 696         include using Numpy functions as much as possible, and making loop
 697         body as small as possible.
 698
 699         * Post-processing on differential calls
 700
 701         After macs2diff finds differential binding sites between two
 702         conditions, it will try to annotate the peak calls from one of two
 703         conditions, describe the changes ...
 704
 705         * Fragment size prediction in macs2diff
 706
 707         Now by default, macs2diff will try to use the average fragment
 708         size from both condition 1 and condition 2 for tag extension and
 709         peak calling. Previously, by default, it will use different sizes
 710         unless --nomodel is specified.
 711
 712         Technically, I separate model building processes out. So macs2diff
 713         will build fragment sizes for condition 1 and 2 in parallel (2
 714         processes maximum), then perform 4-way comparisons in parallel (4
 715         processes maximum).
 716
 717         * Diff score
 718
 719         Combine two p/qscore tracks together. At regions where condition 1
 720         is higher than condition 2, score would be positive, otherwise,
 721         negative.
 722
 723         * SAMParser and BAMParser
 724
 725         Bug fixed for paired-end sequencing data.
 726
 727         * BedGraph.pyx
 728
 729         Fixed a bug while calling peaks from a BedGraph file. It previously
 730         mistakenly output the same peaks multiple times at the end of a
 731         chromosome.
 732
 733 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
 734         MACS version 2.0.9 (tag:alpha)
 735
 736         * Automatic fixing of the predicted d is turned off by default!
 737
 738         The previous --off-auto behavior is now the default: MACS will no
 739         longer automatically fix a d smaller than 2 times the tag size
 740         according to --shiftsize. As tags get longer nowadays, d is more
 741         likely to be less than 2 times the tag size, yet such a d may
 742         still be meaningful and useful. Please use your own judgment to
 743         decide.
 744
 745         * Scaling issue
 746
 747         Now, the default scaling when treatment and input are unbalanced
 748         has been adjusted. By default, the larger sample will be scaled
 749         down linearly to match the smaller sample. In this way, background
 750         noise will be reduced more than real signals, so we expect more
 751         specific results than the other way around (i.e. when --to-large
 752         is set).
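A small sketch of linear depth scaling. It assumes each sample is summarized by its total tag count, and that scaling means multiplying that sample's pileup (or lambda) by the returned factor. scaling_factors and its to_large flag are hypothetical. Only the default direction (scale the larger sample down) and the --to-large alternative come from the entry above.

```python
def scaling_factors(treat_total: int, ctrl_total: int, to_large: bool = False) -> tuple:
    """Return (treatment_factor, control_factor) for linear depth scaling.

    By default the larger sample is scaled down to the smaller one; with
    to_large=True the smaller sample is scaled up to the larger one instead.
    """
    target = max(treat_total, ctrl_total) if to_large else min(treat_total, ctrl_total)
    return target / treat_total, target / ctrl_total
```

For example, with 10 million treatment tags and 5 million control tags, the default returns (0.5, 1.0), while to_large=True returns (1.0, 2.0).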
 753
 754         Also, an alternative option to randomly sample the larger data
 755         (--down-sample) is provided to replace the default linear
 756         scaling. However, this option will make results irreproducible,
 757         so be careful.
 758
 759         * randsample script
 760
 761         A new script 'randsample' is added, which can randomly sample a
 762         certain percentage or number of tags.
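A sketch of random subsampling by percentage or by count, in the spirit of randsample. The function and its parameters are hypothetical, not randsample's actual options. The optional seed is an illustrative assumption: a seeded generator makes a subsample repeatable, which addresses the reproducibility concern raised for --down-sample.

```python
import random
from typing import Optional, Sequence


def sample_tags(tags: Sequence, percent: Optional[float] = None,
                number: Optional[int] = None, seed: Optional[int] = None) -> list:
    """Randomly keep either `percent` percent of the tags or exactly `number` tags."""
    if (percent is None) == (number is None):
        raise ValueError("specify exactly one of percent or number")
    k = round(len(tags) * percent / 100) if percent is not None else number
    if not 0 <= k <= len(tags):
        raise ValueError("requested number of tags is out of range")
    rng = random.Random(seed)  # a fixed seed makes the subsample repeatable
    # Sample positions without replacement, then sort them to preserve input order.
    keep = sorted(rng.sample(range(len(tags)), k))
    return [tags[i] for i in keep]
```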
 763
 764         * Peak summit
 765
 766         Now, MACS will decide peak summits according to pileup height
 767         instead of qvalue scores. In this way, the summit may be more
 768         accurate.
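A minimal sketch of choosing a summit by pileup height. It assumes the pileup inside a peak is available as a list with one value per base, starting at the peak start. summit_position is a hypothetical helper, and the tie rule (leftmost maximum) is an assumption.

```python
def summit_position(peak_start: int, pileup: list) -> int:
    """Return the coordinate of the highest fragment pileup inside a peak.

    pileup[i] is assumed to be the pileup at position peak_start + i; on ties,
    the first (leftmost) maximum wins.
    """
    best = max(range(len(pileup)), key=pileup.__getitem__)
    return peak_start + best
```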
 769
 770         * Diff score
 771
 772         MACS calculates qvalue scores as differential scores. When
 773         comparing two conditions (say A and B), the maximum qscore for
 774         comparing A to B (maxqscore_a2b) and for comparing B to A
 775         (maxqscore_b2a) will be computed. If maxqscore_a2b is bigger, the
 776         diff score is +maxqscore_a2b; otherwise, it is -1*maxqscore_b2a.
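The rule above can be written as a tiny Python function for clarity; the function name is hypothetical. Note that when the two maxima are equal, the literal rule gives the negative value.

```python
def diff_score(maxqscore_a2b: float, maxqscore_b2a: float) -> float:
    """Signed differential score: positive favours A, negative favours B."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```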
 777
 778 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 779         MACS version 2.0.8 (tag:alpha)
 780
 781         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 782
 783         The new script bdgbroadcall and the extra option '--broad' for the
 784         macs2 script can be used to call broad regions with a loose cutoff
 785         to link nearby significant regions. The output is in BED12
 786         format.
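A simplified sketch of two-cutoff broad calling. It assumes per-position scores, and treats a broad region as a run above the loose cutoff that contains at least one run above the strict cutoff. The function names are hypothetical, and the real bdgbroadcall may also merge across gaps. Each returned tuple carries its strong sub-regions, which would map naturally onto BED12 blocks.

```python
def runs_above(scores: list, cutoff: float) -> list:
    """Half-open (start, end) index runs where scores >= cutoff."""
    runs, start = [], None
    for i, s in enumerate(scores):
        if s >= cutoff and start is None:
            start = i
        elif s < cutoff and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(scores)))
    return runs


def broad_regions(scores: list, strong_cutoff: float, weak_cutoff: float) -> list:
    """Link strong runs that sit inside one loose-cutoff run.

    Returns (broad_start, broad_end, strong_blocks) tuples; loose runs with no
    strong block inside are dropped. Assumes strong_cutoff >= weak_cutoff.
    """
    strong = runs_above(scores, strong_cutoff)
    result = []
    for b_start, b_end in runs_above(scores, weak_cutoff):
        blocks = [(s, e) for s, e in strong if b_start <= s and e <= b_end]
        if blocks:
            result.append((b_start, b_end, blocks))
    return result
```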
 787
 788         * MACS2/IO/cScoreTrack.pyx
 789
 790         Fix the q-value calculation to enforce monotonic values.
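For reference, the following is a textbook Benjamini-Hochberg sketch that makes the q-values monotonic. It walks from the largest p-value down and keeps a running minimum. It illustrates the monotonicity fix and is not the cScoreTrack.pyx code; bh_qvalues is a hypothetical name.

```python
def bh_qvalues(pvalues: list) -> list:
    """Benjamini-Hochberg q-values, forced to be monotonic in the p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # From the largest p-value down, each q-value is the minimum of p * m / rank
    # over its own rank and every larger rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```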
 791
 792         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 793
 794         They are combined into one more powerful script called
 795         "filterdup". The script filterdup can filter duplicate reads
 796         according to sequencing depth and genome size. The script can also
 797         convert any format supported by MACS to BED format.
 798
 799 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 800         MACS version 2.0.7 (tag:alpha)
 801
 802         * bin/macsdiff renamed to bin/bdgdiff
 803
 804         Now this script works as a low-level fine-tuning tool, like bdgcmp
 805         and bdgpeakcall.
 806
 807         * bin/macs2diff
 808
 809         A new script to take treatment and control files from two
 810         conditions, calculate fragment size, use local Poisson to get
 811         pvalues and the BH process to get qvalues, then combine the 4-way
 812         results to call differential sites.
 813
 814         This script can use up to 4 CPUs to speed up the 4-way
 815         calculation. (I am trying multiprocessing in Python.)
 816
 817         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 818         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 819         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 820
 821         All above files are modified for the new macs2diff script.
 822
 823         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 824
 825         Now q-value 0.01 is the default cutoff. If -p is specified,
 826         p-value cutoff will be used instead.
 827
 828 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 829         MACS version 2.0.6 (tag:alpha)
 830
 831         * bin/macsdiff
 832
 833         A script to call differential regions. A naive way is introduced
 834         to find the regions where:
 835
 836         1. signal from condition 1 is larger than input 1 and condition 2 --
 837         unique region in condition 1;
 838         2. signal from condition 2 is larger than input 2 and condition 1
 839         -- unique region in condition 2;
 840         3. signal from condition 1 is larger than input 1 and signal from
 841         condition 2 is larger than input 2, but neither the signal from
 842         condition 1 nor that from condition 2 is larger than the other.
 843
 844         Here 'larger' means the pvalue or qvalue from a Poisson test is
 845         under a certain cutoff.
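The three cases above can be sketched as a classifier over four one-sided Poisson test results. The argument names, the labels, and the default cutoff of 0.01 are assumptions for illustration. Each argument is the p- or q-value of the test that the first signal is larger than the second.

```python
def classify_region(q_t1_c1: float, q_t2_c2: float, q_t1_t2: float,
                    q_t2_t1: float, cutoff: float = 0.01) -> str:
    """Classify one region from four hypothetical one-sided test results."""
    up1 = q_t1_c1 < cutoff        # condition 1 larger than input 1
    up2 = q_t2_c2 < cutoff        # condition 2 larger than input 2
    t1_gt_t2 = q_t1_t2 < cutoff   # condition 1 larger than condition 2
    t2_gt_t1 = q_t2_t1 < cutoff   # condition 2 larger than condition 1
    if up1 and t1_gt_t2:
        return "unique to condition 1"
    if up2 and t2_gt_t1:
        return "unique to condition 2"
    if up1 and up2 and not t1_gt_t2 and not t2_gt_t1:
        return "enriched in both, not different"
    return "not called"
```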
 846
 847         (I will make another script to wrap up multiple scripts for
 848         differential calling.)
 849
 850 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 851         MACS version 2.0.5 (tag:alpha)
 852
 853         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 854         MACS2/IO/cPeakIO.pyx
 855
 856         Use a hash to store peak information. Add back the feature to deal
 857         with data without control.
 858
 859         Fix a bug which incorrectly allowed small peaks at the end of
 860         chromosomes.
 861
 862         * bin/bdgpeakcall, bin/bdgcmp
 863
 864         Fix bugs. bdgpeakcall can now output encodePeak format.
 865
 866 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
 867         MACS version 2.0.4 (tag:alpha)
 868
 869         * cPeakDetect.py
 870
 871         Fix a bug: lambda_bg is now correctly assigned when --to-small is
 872         set. Thanks Junya Seo!
 873
 874         Add rank and number-of-bp columns to the pvalue-qvalue table.
 875
 876         * cScoreTrack.py
 877
 878         Fix bugs to correctly deal with peakless chromosomes. Thanks
 879         Vaibhav Jain!
 880
 881         Use AFDR for independent tests instead.
 882
 883         * encodePeak
 884
 885         Now MACS can output peak coordinates together with pvalue, qvalue,
 886         and summit positions in a single encodePeak format (designed for
 887         the ENCODE project) file. This file can be loaded into the UCSC
 888         browser. Definitions of some specific columns are: 5th:
 889         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
 890         -log10qvalue, 10th: relative summit position to peak start.
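A sketch that formats one peak using the column definitions listed above. Columns 1 to 4 (chromosome, start, end, name) follow the usual BED convention. The '.' strand in column 6, the five-decimal formatting, and the function name are assumptions. Scores are passed in already as -log10 values.

```python
def encode_peak_line(chrom: str, start: int, end: int, name: str, summit: int,
                     neg_log10_p: float, neg_log10_q: float, fold_change: float) -> str:
    """Format one peak as a 10-column, tab-separated encodePeak line."""
    fields = [
        chrom, str(start), str(end), name,
        str(int(neg_log10_p * 10)),   # 5th: int(-log10pvalue*10)
        ".",                          # 6th: strand, assumed unknown
        f"{fold_change:.5f}",         # 7th: fold-change
        f"{neg_log10_p:.5f}",         # 8th: -log10pvalue
        f"{neg_log10_q:.5f}",         # 9th: -log10qvalue
        str(summit - start),          # 10th: summit position relative to peak start
    ]
    return "\t".join(fields)
```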
 891
 892
 893 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 894         MACS version 2.0.3 (tag:alpha)
 895
 896         * Rich output with qvalue, fold enrichment, and pileup height
 897
 898         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
 899         procedure:
 900
 901         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
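For reference, the textbook Benjamini-Hochberg-Yekutieli adjustment for m sorted p-values p_(1) <= ... <= p_(m) is given below. Setting c(m) = 1 recovers plain Benjamini-Hochberg. The entry describes a refined version, so MACS's exact procedure may differ.

```latex
q_{(i)} = \min_{j \ge i} \min\!\left(1,\; \frac{m\,c(m)}{j}\, p_{(j)}\right),
\qquad c(m) = \sum_{k=1}^{m} \frac{1}{k}
```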
 902
 903         Now we have a similar xls output file as before. The differences
 904         from the previous file are:
 905
 906         1. Summit now is absolute summit, instead of relative summit
 907            position;
 908         2. 'Pileup' is previous 'tag' column. It's the extended fragment
 909            pileup at the peak summit;
 910         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
 911            5.00 means 1e-5, simple and less confusing.
 912         4. FDR column becomes '-log10(qvalue)' column.
 913         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
 914            the values at the peak summit.
 915
 916         * Extra output files
 917
 918         NAME_pqtable.txt contains pvalue and qvalue relationships.
 919
 920         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
 921         and -log10qvalue scores in BedGraph format. Nearby regions with
 922         the same value are not merged.
 923
 924         * Separation of FeatIO.py
 925
 926         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
 927         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
 928         implemented in cScoreTrack.pyx to store pileup, local lambda,
 929         pvalue, and qvalue all together.
 930
 931         * Experimental option --half-ext
 932
 933         As suggested by the NPS algorithm, I added an experimental option
 934         --half-ext to let MACS extend each ChIP fragment around its
 935         middle point for only 1/2 d.
 936
 937 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
 938         MACS version 2.0.2 (tag:alpha)
 939
 940         * macs2
 941
 942         Add an error check for the case where the treatment file and the
 943         control file share no common chromosome names.
 944
 945         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
 946
 947         Reduce memory usage by removing deepcopy() calls.
 948
 949         * Modify README documents and others.
 950
 951 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 952         MACS Version 2.0.1 (tag:alpha)
 953
 954         * cPileup.pyx, cPeakDetect.pyx and peak calling process
 955
 956         Jie suggested to me a brilliantly simple method to pile up
 957         fragments into a bedGraph track. It is much faster than the
 958         previous function, i.e., faster than MACS1.3 or MACS1.4, so I can
 959         now include the large local lambda calculation in MACSv2. I now
 960         generate three bedGraphs for the d-size, slocal-size, and
 961         llocal-size local bias, and take the maximum local bias as the
 962         local lambda bedGraph track.
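The entry does not spell out Jie's method. One common technique with the properties described (fast, bedGraph output) is a single sweep over sorted fragment starts and ends. The sweep adds 1 at each start, subtracts 1 at each end, and emits an interval whenever the coordinate advances. The sketch below assumes half-open (start, end) fragments with start < end; pileup_bedgraph is a hypothetical name.

```python
def pileup_bedgraph(fragments: list) -> list:
    """Pile up half-open (start, end) fragments into (start, end, depth) intervals.

    Zero-depth gaps are omitted and adjacent equal-depth intervals are not merged.
    """
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    intervals = []
    depth = 0
    prev = None
    i = j = 0
    while j < len(ends):
        # Take a start only if it is strictly before the next end, so an end and
        # a start at the same coordinate close the old interval first.
        if i < len(starts) and starts[i] < ends[j]:
            pos, delta = starts[i], 1
            i += 1
        else:
            pos, delta = ends[j], -1
            j += 1
        if prev is not None and pos > prev and depth > 0:
            intervals.append((prev, pos, depth))
        depth += delta
        prev = pos
    return intervals
```

Each coordinate is visited once after sorting, so the cost is dominated by the two sorts.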
 963
 964         Minor: add_loc in bedGraphTrackI can now correctly merge a
 965         region with its preceding region if their values are the same.
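A minimal sketch of the merging behaviour. It uses a hypothetical BedGraphTrack class rather than MACS's bedGraphTrackI. It assumes regions are appended in coordinate order and merges only when the new region starts exactly where the previous one ends.

```python
class BedGraphTrack:
    """Hypothetical minimal bedGraph track for one chromosome."""

    def __init__(self) -> None:
        self.intervals = []  # list of [start, end, value], in coordinate order

    def add_loc(self, start: int, end: int, value: float) -> None:
        """Append a region, merging it into the preceding one when they touch and share a value."""
        if self.intervals:
            last = self.intervals[-1]
            if last[1] == start and last[2] == value:
                last[1] = end  # extend the preceding region instead of adding a new one
                return
        self.intervals.append([start, end, value])
```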
 966
 967         * macs2
 968
 969         Add an option to shift control tags before extension. By default,
 970         control tags will be extended to both sides regardless of strand
 971         information.
 972
 973 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
 974         MACS Version 2.0.0 (tag:alpha)
 975
 976         * Use bedGraph type to store data internally and externally.
 977
 978         Profiles can theoretically have one-basepair resolution, and the
 979         files are 10 times smaller, even smaller after converting to
 980         bigWig for visualization.
 981
 982         * Peak calling process modified. Better peak boundary detection.
 983
 984         Extend ChIP tags to d and pile them up to get a ChIP bedGraph.
 985         Extend control tags to d and to 1,000bp and pile them up into two
 986         bedGraphs (the 1,000bp one is averaged to d size). Then take the
 987         maximum value of these two tracks and a global background to get
 988         a local-lambda bedGraph.
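A sketch of the maximum-of-tracks local lambda. It assumes each control pileup is a NumPy array over the same positions, and that averaging the 1,000bp track to d size means multiplying it by d/1000. local_lambda is a hypothetical name. The dictionary also accepts further widths if more tracks are added.

```python
import numpy as np


def local_lambda(control_pileups: dict, d: int, lambda_bg: float) -> np.ndarray:
    """Per-position local lambda as the maximum of several control estimates.

    control_pileups maps an extension width w (for example d and 1000) to the
    control pileup made by extending each control tag to w bp. Each track is
    rescaled by d / w so it estimates the expected count in a d-sized window;
    the genome-wide background lambda_bg acts as a floor.
    """
    if not control_pileups:
        raise ValueError("need at least one control pileup")
    first = next(iter(control_pileups.values()))
    lam = np.full(len(first), lambda_bg, dtype=float)
    for width, pileup in control_pileups.items():
        lam = np.maximum(lam, np.asarray(pileup, dtype=float) * (d / width))
    return lam
```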
 989
 990         Use -10log10poisson_pvalue as scores to generate a score track
 991         before peak calling.
 992
 993         Peaks are called generically, based on a score cutoff, a minimum
 994         peak length, and a maximum gap between nearby peaks.
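A sketch of the general peak caller over bedGraph intervals. It assumes sorted, non-overlapping (start, end, score) intervals, where the score is the -10log10 Poisson p-value. call_peaks is hypothetical. Whether the cutoff is inclusive, and how the gap is measured, are both assumptions.

```python
def call_peaks(track: list, cutoff: float, min_length: int, max_gap: int) -> list:
    """Return (start, end) peaks from sorted, non-overlapping (start, end, score) intervals."""
    peaks = []
    for start, end, score in track:
        if score < cutoff:
            continue
        if peaks and start - peaks[-1][1] <= max_gap:
            peaks[-1][1] = end        # bridge a small gap (or direct adjacency)
        else:
            peaks.append([start, end])
    # Discard merged regions shorter than the minimum peak length.
    return [(s, e) for s, e in peaks if e - s >= min_length]
```

With a cutoff of 5, a minimum length of 150, and a maximum gap of 50, the intervals (100, 200) and (230, 400) above the cutoff merge into one peak (100, 400). A 20bp region further away is dropped as too short.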
 995
 996         * Option changes.
 997
 998         Wiggle file output is removed; we now only support bedGraph
 999         output. Saving bedGraphs is highly recommended since it costs no
1000         extra time: because of the peak calling algorithm in MACS v2,
1001         bedGraphs are generated internally even if you don't save them
1002         to disk.
1003
1004         * cProb.pyx
1005
1006         We can now calculate the Poisson pvalue in log space, so that the
1007         score (-10*log10pvalue) no longer has an upper limit of 3100 due
1008         to the precision of floating-point numbers.
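A sketch of a log-space Poisson upper tail. It assumes the score is built from P(X >= k) with lambda > 0. Terms are combined with log-sum-exp, so the result stays finite where a linear-space sum would underflow to 0. The function name is hypothetical and this is not the cProb.pyx code. Multiplying the returned value by -10 gives a -10*log10pvalue score.

```python
import math


def log10_poisson_upper(k: int, lam: float) -> float:
    """Return log10 P(X >= k) for X ~ Poisson(lam), with lam > 0.

    Each term ln P(X = i) = -lam + i*ln(lam) - lgamma(i + 1) is combined by
    log-sum-exp, so tail probabilities far below 1e-308 stay finite.
    """
    if k <= 0:
        return 0.0
    log_term = -lam + k * math.log(lam) - math.lgamma(k + 1)  # ln P(X = k)
    log_sum = log_term
    i = k
    while True:
        i += 1
        log_term += math.log(lam / i)  # ln P(X = i) from ln P(X = i - 1)
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))  # ln(e^hi + e^lo)
        # Past the mode, terms only shrink; stop once they no longer matter.
        if i > lam and log_term - log_sum < -40:
            break
    return log_sum / math.log(10)
```

When k is far below lambda, the loop runs for roughly lambda iterations. In the peak-calling case of interest, where k is above lambda, it stops after a few steps.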
1009
1010         * Cython is adopted to speed up Python code.
1011
1012 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1013         Small fixes
1014
1015         * Replaced with the newest WigTrackI class and fixed the wignorm script.
1016
1017 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1018         Version 1.4.0rc2 (Valentine)
1019
1020         * --single-wig option is renamed to --single-profile
1021
1022         * BedGraph output with --bdg or -B option.
1023
1024         The BedGraph output provides a 1bp-resolution fragment pileup
1025         profile, and the file is smaller than a wig file. This option can
1026         be combined with the --single-profile option to produce a bedGraph
1027         file for the whole genome. This option also makes --space and
1028         --call-subpeaks invalid.
1029
1030         * Fix the description of --shiftsize to correctly state that the
1031         value is 1/2 d (fragment size).
1032
1033         * Fix a bug in the call to __filter_w_control_tags when control is
1034         not available.
1035
1036         * Fix a bug on --to-small option. Now it works as expected.
1037
1038         * Fix a bug where an extra tag could be included when counting the
1039         tags in a candidate peak region. (Thanks to Jake Biesinger!)
1040
1041         * Fix the bug for peaks extended past the chromosome
1042         start. If a minus-strand tag goes past the chromosome start
1043         after extension by d, it will be thrown out.
1044
1045         * Post-process script for a combined wig file:
1046
1047         The "wignorm" command can be run after a full MACS14 run as a
1048         post-processing step. wignorm can calculate the local background
1049         from the control wig file from MACS14, then use fold change,
1050         -10*log10(pvalue) from a Poisson test, or the difference after
1051         asinh transformation as the score to build a single wig track
1052         representing the binding strength. This script will take a
1053         significantly long time to run.
1054
1055         * --wigextend has been obsoleted.
1056
1057 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1058         Version 1.4.0rc1 (Starry Sky)
1059
1060         * Duplicate reads option
1061
1062         --keep-dup behavior is changed. Users can now specify how many
1063         reads to keep at the same genomic location: 'auto' lets MACS
1064         decide the number based on a binomial distribution, and 'all'
1065         lets MACS keep all reads.
1066
1067         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1068
1069         By default, MACS will now scale the smaller dataset to the bigger
1070         dataset. For instance, if IP has 10 million reads, and Input has 5
1071         million, MACS will double the lambda value calculated from Input
1072         reads while calling BOTH the positive peaks and negative
1073         peaks. This will address the issue caused by unbalanced numbers of
1074         reads from IP and Input. If --to-small is turned on, MACS will
1075         scale the larger dataset to the smaller one. So from now on, if d
1076         is fixed, then the peaks from a MACS call for A vs B should be
1077         identical to the negative peaks from a B vs A.
1078
1079 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1080         Version 1.4.0beta (summer wishes)
1081
1082         * New features
1083
1084         ** Model building
1085
1086         The default behavior in the model building step is slightly
1087         changed. When MACS can't find enough pairs to build the model
1088         (implemented in the alpha version) or the modeled fragment length
1089         is less than 2 times the tag length (implemented in the beta
1090         version), MACS will use 2 times the --shiftsize value as the
1091         fragment length in later analysis; --off-auto turns this off.
1092
1093         ** Redundant tag filtering
1094
1095         The IO module is rewritten. The redundant tag filtering process
1096         becomes simpler and works as promised. The maximum allowed number
1097         of tags at the exact same location is calculated from the
1098         sequencing depth and genome size using a binomial distribution,
1099         for both TREATMENT and CONTROL separately (previously only
1100         TREATMENT was considered). The exact same location means the same
1101         coordinate and the same strand. MACS will then keep at most
1102         this number of tags at the exact same location in the following
1103         analysis. The --keep-dup option lets MACS skip the filtering and
1104         keep all the tags. However, this may bring in a lot of sequencing
1105         bias, so you may get many false positive peaks.
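A sketch of the binomial duplicate cap. It assumes each tag lands on a given position with probability 1/genome_size, and that the cap is the smallest k with P(X > k) <= 1e-5. Both the probability model and the 1e-5 threshold are assumptions; max_dup_tags is a hypothetical name.

```python
import math


def max_dup_tags(genome_size: float, total_tags: int, pvalue: float = 1e-5) -> int:
    """Smallest k with P(X > k) <= pvalue for X ~ Binomial(total_tags, 1/genome_size).

    genome_size must be greater than 1.
    """
    p = 1.0 / genome_size
    prob = math.exp(total_tags * math.log1p(-p))  # P(X = 0), stable for tiny p
    cdf = prob
    k = 0
    while k < total_tags and 1.0 - cdf > pvalue:
        # Binomial recurrence: P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        prob *= (total_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += prob
    return k
```

Under these assumptions, 10 million tags on a 2.7e9 bp genome give a cap of 1.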
1106
1107         ** Single wiggle mode
1108
1109         First, note that this is not the score track that I
1110         described before. By default, MACS generates wiggle files of
1111         fragment pileup for every chromosome separately. When you use the
1112         --single-wig option, MACS will generate a single wiggle file for
1113         all the chromosomes, so you will get one wig.gz for TREATMENT and
1114         another wig.gz for CONTROL if available.
1115
1116         ** Sniff -- automatic format detection
1117
1118         Now, by default or "-f AUTO", MACS will decide the input file
1119         format automatically. Technically, it will try to read at most
1120         1000 records for the first 10 non-comment lines. If it succeeds,
1121         the format is decided. I recommend not using AUTO and instead
1122         specifying the right format for your input files, unless you
1123         combine different formats in a single MACS run.
1124
1125         * Options changes
1126
1127         --single-wig and --keep-dup are added. Check the previous section
1128         in this ChangeLog for details.
1129
1130         -f (--format) AUTO is now the default option.
1131
1132         --slocal default: 1000
1133         --llocal default: 10000
1134
1135         * Bug fixed
1136
1137         The setup script will stop the installation if the Python version
1138         is not 2.6 or 2.7.
1139
1140         Local lambda calculation has been changed back. MACS will check
1141         peak_region, slocal (default 1K), and llocal (default 10K) for the
1142         local bias. The previous 200bps default caused MACS to miss some
1143         peaks where the input bias is very sharp.
1144
1145         sam2bed.py script is corrected.
1146
1147         Relative pos in xls output is fixed.
1148
1149         The parser for ELAND_export is fixed to skip some of the no-match
1150         lines, and elandexport2bed.py is fixed too. (However, I can't
1151         guarantee that it works on all eland_export files.)
1152
1153 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1154         Version 1.4.0alpha2 (be smarter)
1155
1156         * Options changes
1157
1158         --gsize now provides shortcuts for common genomes, including
1159         human, mouse, C. elegans and fruitfly.
1160
1161         --llocal is now 5000 bps if there is no input file, so that the
1162         local lambda doesn't over-penalize enriched binding sites.
1163
1164 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1165         Version 1.4alpha (be smarter)
1166
1167         * Options changes
1168
1169         The --tsize option is redesigned. MACS will use the first 10 lines
1170         of the input to decide the tag size. If the user specifies --tsize,
1171         it will override the automatically decided tsize.
1172
1173         --lambdaset is replaced by --slocal and --llocal which mean the
1174         small local region and large local region.
1175
1176         --bw has no effect on the scan-window size now. It only affects the
1177         paired-peaks model process.
1178
1179         * Model building
1180
1181         During model building, MACS will pick out enriched regions that
1182         are neither too high nor too low to build the paired-peak
1183         model. By default, the fold enrichment range is from 10 to 30. If
1184         MACS fails to build the model, by default it will use the nomodel
1185         settings, like shiftsize=100bps, to shift and extend each
1186         tag. This behavior can be turned off by '--off-auto'.
1187
1188         * Output files
1189
1190         An extra file, *_summits.bed, containing all the summit positions
1191         is saved. An option '--call-subpeaks' will invoke
1192         PeakSplitter developed by Mali Salmon to split wide peaks into
1193         smaller subpeaks.
1194
1195         * Sniff (will be in beta)
1196
1197         Automatically recognize the input file format, so users can combine
1198         different formats in one MACS run.
1199
1200         Not implemented features/TODO:
1201
1202         * Algorithms ( in near future? )
1203
1204         MACS will try to refine the peak boundaries by calculating the
1205         scores for every point in the candidate peak regions. The score
1206         will be the -10*log(10,pvalue) on a local poisson distribution. A
1207         cutoff specified by users (--pvalue) will be applied to find the
1208         precise sub-peaks in the original candidate peak region. Peak
1209         boundaries and peak summit positions will be saved in separate BED
1210         files.
1211
1212         * Single wiggle track ( in near future? )
1213
1214         A single wiggle track will be generated to save the scores within
1215         candidate peak regions at 10bp resolution. The wiggle file
1216         is in fixedStep format.
1217
1218
1219 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1220         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1221
1222         * bin/Constants.py
1223
1224         Fixed typo. FCSTEP -> FESTEP
1225
1226         * lib/PeakDetect.py
1227
1228         The 'femax' attribute bug is fixed
1229
1230 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1231         Version 1.3.7 (Oktoberfest)
1232
1233         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1234
1235         Enhancements by Peter Chines:
1236
1237         1. gzip files are supported.
1238         2. When --diag is on, users can set the increment and endpoint for
1239         fold enrichment analysis by setting --fe-step and --fe-max.
1240
1241         Enhancements by Davide Cittaro:
1242
1243         1. BAM and SAM formats are supported.
1244         2. small changes in the header lines of wiggle output.
1245
1246         Enhancements by Me:
1247         1. I added --fe-min option;
1248         2. Bowtie ascii output with suffix ".map" is supported.
1249
1250         Bug fixed:
1251
1252         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1253         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1254         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1255         4. Some "fold change" have been changed to "fold enrichment".
1256
1257 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1258         Version 1.3.6.1 (default parameter change)
1259
1260         * bin/macs, lib/PeakDetect.py
1261
1262         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1263         default. "--futurefdr" is added which can turn on the 'new' method
1264         introduced in 1.3.6. By default it's off.
1265
1266         * lib/PeakDetect.py
1267
1268         Fixed a bug. p-value is corrected a little bit.
1269
1270
1271 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1272         Version 1.3.6 (Birthday cake)
1273
1274         * bin/macs
1275
1276         "track name" is added to the header of BED output file.
1277
1278         Now the default peak detection method is to consider 5k and 10k
1279         nearby regions in treatment data and peak location, 1k, 5k, and
1280         10k regions in control data to calculate local bias. The old
1281         method can be called through '--old' option.
1282
1283         Information about how many total/unique tags in treatment or
1284         control will be saved in final .xls output.
1285
1286         * lib/IO/__init__.py
1287
1288         ".fa" will be removed from input tag alignment so only the
1289         chromosome names are kept.
1290
1291         WigTrackI class is added for Wiggle like data structure. (not used
1292         now)
1293
1294         The parser for ELAND multi PET files has been fixed. Now the 5'
1295         tag position for a pair will be kept, whereas in the previous
1296         version, the middle points were kept.
1297
1298         * lib/IO/BinKeeper.py
1299
1300         The BinKeeperI class, inspired by Jim Kent's library for the UCSC
1301         genome browser, can quickly access the values in a given region
1302         of a large wiggle-like data file. (not used now)
1303
1304         * lib/OptValidator.py
1305
1306         typo fixed.
1307
1308         * lib/PeakDetect.py
1309
1310         Now the default peak detection method is to consider 5k and 10k
1311         nearby regions in treatment data and peak location, 1k, 5k, and
1312         10k regions in control data to calculate local bias. The old
1313         method can be called through '--old' option.
1314
1315         Two columns have been added to the BED output file. 4th column: peak
1316         name; 5th column: peak score using -10log(10,pvalue) as score.
1317
1318         * setup.py
1319
1320         Add support to build a Mac App through 'setup.py py2app', or a
1321         Windows executable through 'setup.py py2exe'. You need to install
1322         the py2app or py2exe package in order to use these functions.
1323
1324 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1325         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1326
1327         * PeakDetect.py
1328
1329         Now, besides 1k, 5k, 10k, MACS will also consider the peak-size
1330         region in control data to calculate local lambda for each peak. Peak
1331         calling results will be slightly different from the previous
1332         version, beware!
1333
1334         * OptValidator.py
1335
1336         Typo fixed, ELANDParser -> ELANDResultParser
1337
1338         * OutputWriter.py
1339
1340         Now, modeled d value will be shown on the model figure.
1341
1342 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1343         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1344
1345         * macs, IO/__init__.py, PeakDetect.py
1346
1347         Add support for ELAND multi format. Add support for Pair-End
1348         experiments; in this case, 5'-end and 3'-end ELAND multi format
1349         files are required for treatment or control data. See the 00README
1350         file for details.
1351
1352         Add wigextend option.
1353
1354         Add petdist option for Pair-End Tag experiment, which is the best
1355         distance between 5' and 3' tags.
1356
1357         * PeakDetect.py
1358
1359         Fixed a bug that caused the end position of every peak region to
1360         be incorrectly increased by 1 bp. (Thanks Mali Salmon!)
1361
1362         * OutputWriter.py
1363
1364         Fix bugs while generating wiggle files. The start position of
1365         wiggle file is set to 1 instead of 0.
1366
1367         Fix a bug where, every 10M bps, signals in the first 'd' range were
1368         lower than actual. (Thanks Mali Salmon!)
1369
1370
1371 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1372         Version 1.3.3 (wiggle bugs fixed)
1373
1374         * OutputWriter.py
1375
1376         Fix bugs while generating wiggle files. 1. 'span=' is added to
1377         'variableStep' line; 2. previously, every 10M bps, the coordinates
1378         were wrongly shifted to the right for 'd' basepairs.
1379
1380         * macs, PeakDetect.py
1381
1382         Add an option to save wiggle files at different resolutions.
1383
1384 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1385         Version 1.3.2 (tiny bugs fixed)
1386
1387         * IO/__init__.py
1388
1389         Fix 65536 -> 65535. (Thanks Joon)
1390
1391         * Prob.py
1392
1393         Improved the binomial function for extra large numbers. Imported
1394         from the Cistrome project.
1395
1396         * PeakDetect.py
1397
1398         If the treatment channel has no reads on a chromosome included in
1399         the control channel, or vice versa, MACS will not exit. (Thanks
1400         Shaun Mahony)
1401
1402         Instead, MACS will fake a tag at position -1 when calling
1403         treatment peaks vs control, but will ignore the chromosome while
1404         calling negative peaks.
1405
1406 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1407         Version 1.3.1 (tiny bugs fixed version)
1408
1409         * Prob.py
1410
1411         Hyunjin Gene Shin contributed some code to Prob.py. Now the
1412         binomial functions can tolerate large and small numbers.
1413
1414         * IO/__init__.py
1415
1416         Parsers now split lines in BED/ELAND files on any
1417         whitespace. 'track' or 'browser' lines will be regarded as
1418         comment lines. A bug is fixed when throwing StrandFormatError. The
1419         maximum redundant tag number at a single position can be no less
1420         than 65536.
1421
1422
1423 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1424         Version 1.3 (naming clarification version)
1425
1426         * Naming clarification changes according to our manuscript:
1427
1428         'frag_len' is changed to 'd'.
1429
1430         'fold_change' is changed to 'fold_enrichment'.
1431
1432         Suggest '--bw' parameter to be determined by users from the real
1433         sonication size.
1434
1435         Maximum FDR is 100% in the output file.
1436
1437         And other clarifications in 00README file and the documents on the
1438         website.
1439
1440         * IO/__init__.py
1441         If the redundant tag number at a single position is over 32767,
1442         just record 32767, instead of raising an overflow exception.
1443
1444         * setup.py
1445         fixed a typo.
1446
1447         * PeakDetect.py
1448         Bug fixed for diagnosis report.
1449
1450
1451 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1452         Version 1.2.2gamma
1453
1454         * Serious bugs fix:
1455
1456         Poisson distribution CDF and inverse CDF functions are
1457         corrected. They now produce correct results even for huge
1458         lambda, so the p-value and FDR values in the final Excel sheet
1459         are corrected.
1460
1461         The IO package can now tolerate some rare cases; ELANDParser in the
1462         IO package is fixed. (Thanks Bogdan)
1463
1464         * Improvement:
1465
1466         Reverse paired peaks in the model are rejected, so there will be no
1467         negative 'frag_len'. (Thanks Bogdan)
1468
1469         * Features added:
1470
1471         The diagnosis function is completed. It can output a table file
1472         for users to estimate their sequencing depth.
1473
1474
1475 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1476         Version 1.2
1477
1478         * Prob.py is added!
1479
1480         GSL is totally removed from MACS. Instead, I have implemented the
1481         CDF and inverse CDF for the Poisson and binomial distributions
1482         purely in Python.
1483
1484         * Constants.py is added!
1485
1486         Organize constants used in MACS in the Constants.py file.
1487
1488         * All other files are modified!
1489
1490         Fold change calculation is modified. Now the fold change is only
1491         calculated at the peak summit position instead of the whole peak
1492         region. The values will be higher and more robust than before.
1493
1494         Features added:
1495
1496         1. MACS can save wiggle format files containing the tag count at
1497         every 10 bp along the genome. Tags are shifted according to our
1498         model before they are counted.
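        The following is a minimal sketch of the shift-then-count step. It
        assumes tags are given as (0-based 5' position, strand) pairs and
        that d is the fragment length from the model. Plus-strand tags
        move d/2 downstream and minus-strand tags move d/2 upstream. The
        shifted positions are then counted per 10 bp bin and written as a
        fixedStep wiggle. The function names are hypothetical.

```python
from collections import Counter


def shifted_bin_counts(tags, d, step=10):
    """Count shifted tag positions in fixed-size bins.

    tags: iterable of (pos, strand), pos = 0-based 5' end, strand '+' or '-'.
    Plus-strand tags move d // 2 downstream and minus-strand tags d // 2
    upstream, so both land near the centre of their fragment.
    """
    shift = d // 2
    counts = Counter()
    for pos, strand in tags:
        centre = pos + shift if strand == "+" else pos - shift
        if centre >= 0:  # drop tags shifted past the chromosome start
            counts[centre // step] += 1
    return counts


def fixed_step_wig(chrom, counts, step=10):
    """Render bin counts as fixedStep wiggle lines (1-based start)."""
    if not counts:
        return []
    header = f"fixedStep chrom={chrom} start=1 step={step} span={step}"
    return [header] + [str(counts.get(b, 0)) for b in range(max(counts) + 1)]
```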
1499
1500         2. Model building and local lambda calculation can be skipped with
1501         certain options.
1502
1503         3. A diagnosis report can be generated with the '--diag'
1504         option. This report can help you estimate the sequencing
1505         saturation. This function is still in beta.
1506
1507         4. FDR calculation speed is highly improved.
1508
1509 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1510         Version 1.1
1511
1512         * TabIO, PeakModel.py ...
1513         Fixed a bug so that MACS tolerates cases where there are no tags on
1514         either the plus strand or the minus strand.
1515
1516         * setup.py
1517         Check the Python version. If it is lower than 2.4, refuse to
1518         install with a warning.
1519
1520
1521 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1522         MACS version 2.0.10 20130731 (tag:alpha)
1523
1524         * callpeak --call-summits
1525
1526         Fix bugs causing the callpeak --call-summits option to generate extra
1527         peaks and peak boundaries inconsistent with those from the
1528         default option. Thanks to Ben Levinson!
1529
1530         * bdgcmp output
1531
1532         Fix bugs causing bdgcmp to output all logLR values as positive. Now
1533         'depletion' is correctly represented by negative values.
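        Here is one way to get a signed score, shown as a sketch. The
        Poisson log-likelihood ratio of rate x versus rate y at count x,
        x*ln(x/y) + y - x, is never negative. The direction therefore has
        to be restored explicitly from whether treatment x is above or
        below control y. The sketch assumes positive inputs (e.g. pileups
        with a pseudocount) and is not necessarily bdgcmp's exact formula.

```python
import math


def signed_log10_lr(x: float, y: float) -> float:
    """Signed log10 likelihood ratio of Poisson rate x vs rate y at count x.

    The magnitude x*ln(x/y) + y - x is >= 0 for any positive x, y; the sign
    is + for enrichment (x > y) and - for depletion (x < y).
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive (add a pseudocount)")
    magnitude = (x * (math.log(x) - math.log(y)) + y - x) / math.log(10)
    if x > y:
        return magnitude
    if x < y:
        return -magnitude
    return 0.0
```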
1534
1535         * bdgdiff
1536
1537         Fix the behavior of the bdgdiff module. It now takes four
1538         bedGraph files, then uses logLR as the cutoff to call differential
1539         regions. Check the bdgdiff command line help for details.
1540
1541 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1542         MACS version 2.0.10 20130713 (tag:alpha)
1543
1544         * Fix bugs when writing broadPeak and gappedPeak output.
1545
1546         Note: weak broad regions without any strong enrichment region
1547         inside won't be saved in the gappedPeak file.
1548
1549         * bdgcmp -T and -C are merged into -S, and the description is updated.
1550
1551         Now you can use it to override SPMR values in your input for
1552         bdgcmp. Using SPMR (from 'callpeak --SPMR -B') while calculating
1553         statistics will cause weird results (in most cases, lower
1554         significance) and won't be consistent with MACS2 callpeak
1555         behavior. So if you have SPMR bedGraphs, input the smaller/larger
1556         sample size in MILLIONS, according to the 'callpeak --to-large' option.
1557
1558 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1559         MACS version 2.0.10 20130710 (tag:alpha)
1560
1561         * Fix the BED-style output formats of the callpeak module:
1562
1563         1) Without --broad: narrowPeak (BED6+4) and a BED file of summits
1564         will be the output. The old BED format file won't be saved.
1565
1566         2) With --broad: broadPeak (BED6+3) for broad regions and
1567         gappedPeak (BED12+3) for chained enriched regions will be the
1568         output. The old BED format, narrowPeak format, and summit files
1569         won't be saved.
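        For reference, the sketch below formats one narrowPeak (BED6+4)
        record following the UCSC column layout: chrom, start, end, name,
        score, strand, signalValue, pValue, qValue, and peak (the summit
        offset from start). The helper name and the example values are
        hypothetical. The caller supplies the score and the three
        statistics.

```python
def narrow_peak_line(chrom, start, end, name, score, fold, log10p, log10q, summit):
    """Format one narrowPeak (BED6+4) record as a tab-separated line.

    Coordinates are 0-based, half-open as in BED; the last column is the
    summit position relative to the peak start.
    """
    if not start <= summit < end:
        raise ValueError("summit must lie inside the peak")
    fields = [chrom, str(start), str(end), name, str(int(score)), ".",
              f"{fold:.5f}", f"{log10p:.5f}", f"{log10q:.5f}",
              str(summit - start)]
    return "\t".join(fields)
```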
1570
1571         * bdgcmp can now accept a list of methods to calculate scores, so
1572         you can run it once to generate multiple types of scores. Thanks to
1573         Jon Urban for this suggestion!
1574
1575         * C codes are re-generated through Cython 0.19.1.
1576
1577 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1578         MACS version 2.0.10 20130520 (tag:alpha)
1579
1580         * Broad peak calling modules are modified in order to report all
1581         relaxed regions, even if there is no strong enrichment inside.
1582
1583 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1584         MACS version 2.0.10 20130501 (tag:alpha)
1585
1586         * Memory usage is decreased to about 1/4 to 1/5 of previous usage.
1587         The internal data structures and algorithms are both
1588         reorganized so that intermediate data are not kept in
1589         memory. Instead, they are calculated on the fly. The new MACS2 will
1590         take longer (1.5 to 2 times) but will use less memory,
1591         so it is more usable on servers with little memory.
1592
1593         * --seed option is added to the callpeak and randsample commands.
1594         Thanks to Mathieu Gineste for this suggestion!
1595
1596 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1597         MACS version 2.0.10 20130306 (tag:alpha)
1598
1599         * diffpeak module: a new module to detect differential binding sites
1600         with more statistics.
1601
1602         * Introduced --refine-peaks
1603         Calculates reads balancing to refine peak summits
1604
1605         * Output file name prefix
1606         Rename encodePeak to narrowPeak, and broadPeak to bed12.
1607
1608 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1609         MACS version 2.0.10 (tag:alpha not released)
1610
1611         * Introduced BAMPEParser
1612         Reads PE data directly, requires bedtools for now
1613
1614         * Introduced --call-summits
1615         Uses signal processing methods to call overlapping peaks
1616
1617         * Added --no-trackline
1618         By default, files have descriptive tracklines now
1619
1620         * new refinepeak command (experimental)
1621         This new function uses a method similar to that in SPP (wtd) to
1622         analyze the raw tag distribution in a peak region, then redefines the
1623         peak summit as the position around which plus and minus tags are
1624         evenly distributed.
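        The idea can be sketched roughly as follows. The sketch assumes
        tag 5' positions per strand and a fixed window on each side of a
        candidate position p. It counts plus-strand tags left and right
        of p (wl, wr) and minus-strand tags left and right of p (cl, cr),
        then scores 2*sqrt(wl*cr) - wr - cl, which rewards plus tags
        upstream and minus tags downstream. The window size and this
        exact score are assumptions in the spirit of SPP's WTD, not the
        refinepeak code.

```python
import bisect
import math


def _count(sorted_pos, lo, hi):
    """Number of positions p with lo <= p < hi in a sorted list."""
    return bisect.bisect_left(sorted_pos, hi) - bisect.bisect_left(sorted_pos, lo)


def refine_summit(plus, minus, start, end, window=50):
    """Pick the position in [start, end) where strands are most balanced.

    wl/wr: plus-strand tags in the window left/right of p;
    cl/cr: minus-strand tags left/right of p. A good summit has plus tags
    on its left and minus tags on its right.
    """
    if start >= end:
        raise ValueError("empty region")
    plus, minus = sorted(plus), sorted(minus)
    best_score, best = -math.inf, []
    for p in range(start, end):
        wl = _count(plus, p - window, p)
        wr = _count(plus, p, p + window)
        cl = _count(minus, p - window, p)
        cr = _count(minus, p, p + window)
        score = 2 * math.sqrt(wl * cr) - wr - cl
        if score > best_score:
            best_score, best = score, [p]
        elif score == best_score:
            best.append(p)
    return best[len(best) // 2]  # middle of the tied best positions
```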
1625
1626         * Changes to output *
1627         cPeakDetect.pyx has full support for new print/write methods and
1628         --call-peaks, BAMPEParser, and use of paired-end data
1629
1630         * Parser optimization
1631
1632         cParser.pyx is rewritten to use io.BufferedReader for faster
1633         parsing. Speed is doubled.
1634
1635         Code is reorganized -- most functions are inherited from the
1636         GenericParser class.
1637
1638         * Use cross-correlation to calculate fragment size
1639
1640         First, all pairs are used to predict the fragment
1641         size; previously, no more than 1000 pairs were used. Second,
1642         cross-correlation is used to find the best phase difference
1643         between + and - tag pileups.
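        A NumPy sketch of the cross-correlation step follows. It assumes
        the plus-strand and minus-strand 5' tag pileups are equal-length
        arrays over the same coordinates. For each lag, it correlates the
        plus profile with the minus profile moved back by that lag and
        keeps the lag with the highest Pearson correlation. The lag range
        and names are assumptions; this is not the MACS implementation.

```python
import numpy as np


def estimate_fragment_size(plus, minus, max_lag=500):
    """Lag (in bp) that best aligns the minus-strand pileup onto the plus one."""
    plus = np.asarray(plus, dtype=float)
    minus = np.asarray(minus, dtype=float)
    n = len(plus)
    best_lag, best_corr = 0, -np.inf
    for lag in range(1, min(max_lag, n - 2) + 1):
        a = plus[: n - lag]   # plus pileup at position i
        b = minus[lag:]       # minus pileup at position i + lag
        if a.std() == 0 or b.std() == 0:
            continue          # correlation undefined for flat profiles
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```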
1644
1645         * Speed up p-value and q-value calculation
1646
1647         This part is ten times faster now. I am using a dictionary to
1648         cache p-value results from the Poisson CDF function. A bit more memory
1649         is used to increase speed. I hope this dictionary will not
1650         explode, since the possible pairs of ChIP signal and control lambda
1651         are highly redundant. Also, I rewrote part of the q-value
1652         calculation.
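        The caching idea can be sketched with functools.lru_cache, used
        here as a stand-in for the hand-written dictionary in this entry.
        The score is memoized on the (count, lambda) pair, so repeated
        pairs along a pileup are computed only once. The naive lower-tail
        sum inside is adequate only for modest lambda and is an
        assumption of the sketch.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def poisson_score(count: int, lam: float) -> float:
    """-log10 P(X >= count) for X ~ Poisson(lam), memoized on (count, lam).

    Uses 1 - P(X < count), which is fine for modest values only.
    """
    term = math.exp(-lam)
    lower = 0.0
    for i in range(count):
        lower += term
        term *= lam / (i + 1)
    upper = max(1.0 - lower, 1e-300)  # floor avoids log10(0)
    return -math.log10(upper)
```

Along a real pileup most (count, lambda) pairs repeat, so the cache hit rate is high, which is the redundancy the entry relies on.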
1653
1654         * Speed up peak detection
1655
1656         This part is about a hundred times faster now. Optimizations
1657         include using Numpy functions as much as possible and keeping loop
1658         bodies as small as possible.
1659
1660         * Post-processing on differential calls
1661
1662         After macs2diff finds differential binding sites between two
1663         conditions, it will try to annotate the peak calls from either of
1664         the two conditions and describe the changes ...
1665
1666         * Fragment size prediction in macs2diff
1667
1668         Now by default, macs2diff will try to use the average fragment
1669         size from both condition 1 and condition 2 for tag extension and
1670         peak calling. Previously, by default, it used different sizes
1671         unless --nomodel was specified.
1672
1673         Technically, I separated out the model building processes, so
1674         macs2diff will estimate fragment sizes for conditions 1 and 2 in
1675         parallel (2 processes maximum), then perform 4-way comparisons in
1676         parallel (4 processes maximum).
1677
1678         * Diff score
1679
1680         Combine two p/q-score tracks. In regions where condition 1
1681         is higher than condition 2, the score is positive; otherwise, it
1682         is negative.
1683
1684         * SAMParser and BAMParser
1685
1686         Bug fixed for paired-end sequencing data.
1687
1688         * BedGraph.pyx
1689
1690         Fixed a bug when calling peaks from a BedGraph file. It previously
1691         output the same peaks multiple times at the end of a
1692         chromosome.
1693
1694 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
1695         MACS version 2.0.9 (tag:alpha)
1696
1697         * Automatic correction of the predicted d is turned off by default!
1698
1699         The previous --off-auto behavior is now the default. MACS will not
1700         automatically reset a d smaller than 2 times the tag size according
1701         to --shiftsize. As tags get longer nowadays, d is more likely to be
1702         less than 2 times the tag size, yet such a d may
1703         still be meaningful and useful. Please judge it using your own
1704         wisdom.
1705
1706         * Scaling issue
1707
1708         Now, the default scaling when treatment and input are unbalanced
1709         has been adjusted. By default, the larger sample will be scaled down
1710         linearly to match the smaller sample. In this way, background
1711         noise will be reduced more than real signals, so we expect
1712         more specific results than the other way around (i.e. when
1713         --to-large is set).
1714
1715         Also, an alternative option to randomly sample the larger dataset
1716         (--down-sample) is provided to replace the default linear
1717         scaling. However, this option makes results irreproducible,
1718         so be careful.
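        Below is a sketch of the linear scaling choice. It assumes only
        the two read totals are known. By default both samples are scaled
        to the smaller total; with to_large=True they are scaled to the
        larger one. The function name and arguments are hypothetical. The
        returned factors would multiply pileup or lambda values.

```python
def scaling_factors(treat_total: int, control_total: int, to_large: bool = False):
    """Return (treat_factor, control_factor) for linear depth scaling.

    Default: scale the larger sample down to the smaller one.
    to_large=True: scale the smaller sample up to the larger one.
    """
    if treat_total <= 0 or control_total <= 0:
        raise ValueError("read totals must be positive")
    if to_large:
        target = max(treat_total, control_total)
    else:
        target = min(treat_total, control_total)
    return target / treat_total, target / control_total
```

With 10 million treatment reads and 5 million control reads, the default halves the treatment signal. The to_large variant instead doubles the control lambda, as in the 1.4.0rc1 entry further down.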
1719
1720         * randsample script
1721
1722         A new script 'randsample' is added, which can randomly sample a
1723         certain percentage or number of tags.
1724
1725         * Peak summit
1726
1727         Now, MACS will decide peak summits according to pileup height
1728         instead of qvalue scores. In this way, the summit may be more
1729         accurate.
1730
1731         * Diff score
1732
1733         MACS calculates qvalue scores as differential scores. When comparing
1734         two conditions (say A and B), the maximum qscore for comparing
1735         A to B (maxqscore_a2b) and for comparing B to A (maxqscore_b2a)
1736         will be computed. If maxqscore_a2b is bigger, the diff score is
1737         +maxqscore_a2b; otherwise, the diff score is -1*maxqscore_b2a.
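        The rule above can be vectorized over many regions with NumPy, as
        in this sketch (the array names are hypothetical). It takes
        +maxqscore_a2b where that is larger and -maxqscore_b2a otherwise,
        so ties go to the negative branch exactly as the text states.

```python
import numpy as np


def diff_scores(max_q_a2b, max_q_b2a):
    """Signed differential scores: +a2b where a2b > b2a, else -b2a."""
    a2b = np.asarray(max_q_a2b, dtype=float)
    b2a = np.asarray(max_q_b2a, dtype=float)
    return np.where(a2b > b2a, a2b, -b2a)
```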
1738
1739 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1740         MACS version 2.0.8 (tag:alpha)
1741
1742         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1743
1744         The new script bdgbroadcall and the extra option '--broad' for the
1745         macs2 script can be used to call broad regions with a loose cutoff
1746         to link nearby significant regions. The output is in
1747         BED12 format.
1748
1749         * MACS2/IO/cScoreTrack.pyx
1750
1751         Fix the q-value calculation to enforce monotonic values.
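        The textbook Benjamini-Hochberg step produces monotonic q-values
        when a running minimum is taken from the largest p-value
        downward. The sketch below shows that standard procedure on a
        plain array of p-values. It is not the cScoreTrack.pyx code,
        which works on score tracks.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic in p."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                        # ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)  # p_(i) * n / i
    # running minimum from the largest p downwards: q_(i) = min_{j >= i}
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)           # back to input order, cap at 1
    return q
```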
1752
1753         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1754
1755         They are combined into one more powerful script called
1756         "filterdup". The script filterdup can filter duplicate reads
1757         according to sequencing depth and genome size. The script can also
1758         convert any format supported by MACS to BED format.
1759
1760 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1761         MACS version 2.0.7 (tag:alpha)
1762
1763         * bin/macsdiff renamed to bin/bdgdiff
1764
1765         Now this script works as a low-level fine-tuning tool, like bdgcmp
1766         and bdgpeakcall.
1767
1768         * bin/macs2diff
1769
1770         A new script to take treatment and control files from two
1771         conditions, calculate fragment size, use a local Poisson test to get
1772         pvalues and the BH procedure to get qvalues, then combine the 4-way
1773         results to call differential sites.
1774
1775         This script can use up to 4 CPUs to speed up the 4-way calculation.
1776         (I am trying multiprocessing in Python.)
1777
1778         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1779         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1780         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1781
1782         All above files are modified for the new macs2diff script.
1783
1784         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1785
1786         Now q-value 0.01 is the default cutoff. If -p is specified,
1787         p-value cutoff will be used instead.
1788
1789 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1790         MACS version 2.0.6 (tag:alpha)
1791
1792         * bin/macsdiff
1793
1794         A script to call differential regions. A naive way is introduced
1795         to find the regions where:
1796
1797         1. signal from condition 1 is larger than input 1 and condition 2 --
1798         unique region in condition 1;
1799         2. signal from condition 2 is larger than input 2 and condition 1
1800         -- unique region in condition 2;
1801         3. signal from condition 1 is larger than input 1, signal from
1802         condition 2 is larger than input 2, however neither signal from
1803         condition 1 nor signal from condition 2 is larger than the other.
1804
1805         Here 'larger' means the pvalue or qvalue from a Poisson test is
1806         below a certain cutoff.
1807
1808         (I will make another script to wrap up multiple scripts for
1809         differential calling)
1810
1811 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1812         MACS version 2.0.5 (tag:alpha)
1813
1814         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1815         MACS2/IO/cPeakIO.pyx
1816
1817         Use a hash to store peak information. Add back the ability to deal
1818         with data without a control.
1819
1820         Fix a bug which incorrectly allowed small peaks at the end of
1821         chromosomes.
1822
1823         * bin/bdgpeakcall, bin/bdgcmp
1824
1825         Fix bugs. bdgpeakcall can output encodePeak format.
1826
1827 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1828         MACS version 2.0.4 (tag:alpha)
1829
1830         * cPeakDetect.py
1831
1832         Fix a bug to correctly assign lambda_bg when --to-small is
1833         set. Thanks to Junya Seo!
1834
1835         Add rank and num of bp columns to pvalue-qvalue table.
1836
1837         * cScoreTrack.py
1838
1839         Fix bugs to correctly deal with peakless chromosomes. Thanks to
1840         Vaibhav Jain!
1841
1842         Use AFDR for independent tests instead.
1843
1844         * encodePeak
1845
1846         Now MACS can output peak coordinates together with pvalue, qvalue,
1847         and summit positions in a single encodePeak format (designed for the
1848         ENCODE project) file. This file can be loaded into the UCSC
1849         browser. Some specific columns are defined as: 5th:
1850         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1851         -log10qvalue, 10th: summit position relative to peak start.
1852
1853
1854 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1855         MACS version 2.0.3 (tag:alpha)
1856
1857         * Rich output with qvalue, fold enrichment, and pileup height
1858
1859         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1860         procedure:
1861
1862         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1863
1864         Now we have an xls output file similar to the previous one. The
1865         differences from the previous file are:
1866
1867         1. Summit is now the absolute summit position instead of the
1868            relative summit position;
1869         2. 'Pileup' is the previous 'tag' column. It's the extended fragment
1870            pileup at the peak summit;
1871         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1872            5.00 means 1e-5, simple and less confusing.
1873         4. The FDR column becomes the '-log10(qvalue)' column.
1874         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1875            the values at the peak summit.
1876
1877         * Extra output files
1878
1879         NAME_pqtable.txt contains pvalue and qvalue relationships.
1880
1881         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1882         and -log10qvalue scores in BedGraph format. Nearby regions with
1883         the same value are not merged.
1884
1885         * Separation of FeatIO.py
1886
1887         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1888         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1889         implemented to store pileup, local lambda, pvalue, and qvalue
1890         all together in cScoreTrack.pyx.
1891
1892         * Experimental option --half-ext
1893
1894         Inspired by the NPS algorithm, I added an experimental option
1895         --half-ext to let MACS extend each ChIP fragment only 1/2 d
1896         around its middle point.
1897
1898 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1899         MACS version 2.0.2 (tag:alpha)
1900
1901         * macs2
1902
1903         Add an error check for the case where the treatment and control
1904         files have no chromosome names in common.
1905
1906         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1907
1908         Reduce memory usage by removing deepcopy() calls.
1909
1910         * Modify README documents and others.
1911
1912 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1913         MACS Version 2.0.1 (tag:alpha)
1914
1915         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1916
1917         Jie suggested a brilliant, simple method to pile up fragments
1918         into a bedGraph track. It is much faster than the previous
1919         function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1920         large local lambda calculation in MACSv2 now. Now I generate three
1921         bedGraphs for d-size, slocal-size and llocal-size local
1922         bias, and take the maximum local bias as the local lambda
1923         bedGraph track.
1924
1925         Minor: add_loc in bedGraphTrackI can now correctly merge a
1926         region with its preceding region if their values are the same.
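        The entry does not spell out the method, so the following is an
        assumed reconstruction. A common fast way to pile up fragments is
        to turn each fragment into a +1 event at its start and a -1 event
        at its end. The events are sorted and swept once while tracking
        the depth, and a bedGraph interval is emitted whenever the depth
        changes. The sketch also merges a new interval into the preceding
        one when their values match, mirroring the add_loc fix.

```python
def pileup_bedgraph(fragments):
    """Pile up half-open [start, end) fragments into bedGraph intervals.

    Returns (start, end, depth) tuples for positions with depth > 0;
    adjacent intervals with equal depth are merged.
    """
    events = []
    for start, end in fragments:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # at equal positions, -1 sorts before +1
    out = []
    depth = 0
    prev = None
    for pos, delta in events:
        if prev is not None and pos > prev and depth > 0:
            if out and out[-1][1] == prev and out[-1][2] == depth:
                out[-1] = (out[-1][0], pos, depth)  # merge with preceding region
            else:
                out.append((prev, pos, depth))
        depth += delta
        prev = pos
    return out
```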
1927
1928         * macs2
1929
1930         Add an option to shift control tags before extension. By default,
1931         control tags will be extended to both sides regardless of strand
1932         information.
1933
1934 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1935         MACS Version 2.0.0 (tag:alpha)
1936
1937         * Use bedGraph type to store data internally and externally.
1938
1939         We can theoretically have one-basepair resolution profiles. Files
1940         are 10 times smaller, and even smaller after converting to
1941         bigWig for visualization.
1942
1943         * Peak calling process modified. Better peak boundary detection.
1944
1945         Extend ChIP tags to d, and pile them up into a ChIP bedGraph. Extend
1946         control tags to d and to 1,000bp, and pile them up into two
1947         bedGraphs. (The 1000bp one will be averaged to d size.) Then take
1948         the maximum value of these two tracks and a global background to
1949         obtain a local-lambda bedGraph.
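        A NumPy sketch of the per-base maximum follows. It assumes the two
        control pileups are already arrays on the same coordinates and
        reads "averaged to d size" as multiplying the 1,000bp pileup by
        d/1000. genome_bg is the global background, e.g. total control
        tags * d / genome size. All names are hypothetical.

```python
import numpy as np


def local_lambda(ctrl_d, ctrl_1k, d, genome_bg):
    """Per-base local lambda: max of d pileup, 1 kb pileup scaled to d, and background."""
    ctrl_d = np.asarray(ctrl_d, dtype=float)
    ctrl_1k_scaled = np.asarray(ctrl_1k, dtype=float) * d / 1000.0
    return np.maximum(np.maximum(ctrl_d, ctrl_1k_scaled), genome_bg)
```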
1950
1951         Use -10log10poisson_pvalue as scores to generate a score track
1952         before peak calling.
1953
1954         A general peak calling method based on a score cutoff, a minimum
1955         peak length, and a maximum gap between nearby peaks.
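        Here is a minimal sketch of such a caller on a per-base score
        list. Positions at or above the cutoff form runs. Runs separated
        by at most max_gap bases are merged, and merged regions shorter
        than min_length are dropped. The parameter names are hypothetical
        rather than MACS option names.

```python
def call_peaks(scores, cutoff, min_length, max_gap):
    """Return half-open (start, end) regions from a per-base score list."""
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s >= cutoff and start is None:
            start = i                       # a run above the cutoff begins
        elif s < cutoff and start is not None:
            regions.append([start, i])      # the run ends before i
            start = None
    if start is not None:
        regions.append([start, len(scores)])
    # merge runs separated by at most max_gap positions
    merged = []
    for r in regions:
        if merged and r[0] - merged[-1][1] <= max_gap:
            merged[-1][1] = r[1]
        else:
            merged.append(r)
    return [tuple(r) for r in merged if r[1] - r[0] >= min_length]
```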
1956
1957         * Option changes.
1958
1959         Wiggle file output is removed. Now we only support bedGraph
1960         output. The generation of bedGraph is highly recommended since it
1961         will not cost extra time. In other words, bedGraph generation
1962         runs internally even if you don't save bedGraphs on disk, due
1963         to the peak calling algorithm in MACS v2.
1964
1965         * cProb.pyx
1966
1967         We can now calculate the Poisson pvalue in log space, so the score
1968         (-10*log10pvalue) will not have an upper limit of 3100 due to the
1969         precision of floating-point numbers.
1970
1971         * Cython is adopted to speed up Python code.
1972
1973 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1974         Small fixes
1975
1976         * Replaced with a newest WigTrackI class and fixed the wignorm script.
1977
1978 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1979         Version 1.4.0rc2 (Valentine)
1980
1981         * --single-wig option is renamed to --single-profile
1982
1983         * BedGraph output with --bdg or -B option.
1984
1985         The BedGraph output provides a 1bp resolution fragment pileup
1986         profile. The file size is smaller than a wig file. This option can
1987         be combined with the --single-profile option to produce a bedGraph
1988         file for the whole genome. This option also makes --space and
1989         --call-subpeaks invalid.
1990
1991         * Fix the description of --shiftsize to correctly state that the
1992         value is 1/2 d (fragment size).
1993
1994         * Fix a bug in the call to __filter_w_control_tags when control is
1995         not available.
1996
1997         * Fix a bug on --to-small option. Now it works as expected.
1998
1999         * Fix a bug when counting the tags in a candidate peak region, where
2000         an extra tag may be included. (Thanks to Jake Biesinger!)
2001
2002         * Fix the bug for peaks extending beyond the chromosome
2003         start. If a minus-strand tag goes beyond the chromosome start
2004         after extension by d, it will be thrown out.
2005
2006         * Post-process script for a combined wig file:
2007
2008         The "wignorm" command can be called after a full run of MACS14 as
2009         a postprocess. wignorm can calculate the local background from the
2010         control wig file from MACS14, then use either fold change,
2011         -10*log10(pvalue) from a Poisson test, or the difference after asinh
2012         transformation as the score to build a single wig track to
2013         represent the binding strength. This script takes a
2014         significantly long time to run.
2015
2016         * --wigextend has been obsoleted.
2017
2018 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
2019         Version 1.4.0rc1 (Starry Sky)
2020
2021         * Duplicate reads option
2022
2023         --keep-dup behavior is changed. Now users can specify how many
2024         reads they want to keep at the same genomic location: 'auto'
2025         lets MACS decide the number based on a binomial distribution, and
2026         'all' lets MACS keep all reads.
2027
2028         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
2029
2030         By default, MACS will now scale the smaller dataset to the bigger
2031         dataset. For instance, if IP has 10 million reads, and Input has 5
2032         million, MACS will double the lambda value calculated from Input
2033         reads while calling BOTH the positive peaks and negative
2034         peaks. This will address the issue caused by unbalanced numbers of
2035         reads from IP and Input. If --to-small is turned on, MACS will
2036         scale the larger dataset to the smaller one. So from now on, if d
2037         is fixed, then the peaks from a MACS call for A vs B should be
2038         identical to the negative peaks from a B vs A.
2039
2040 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
2041         Version 1.4.0beta (summer wishes)
2042
2043         * New features
2044
2045         ** Model building
2046
2047         The default behavior in the model building step is slightly
2048         changed. When MACS can't find enough pairs to build the model
2049         (implemented in the alpha version) or the modeled fragment length is
2050         less than 2 times the tag length (implemented in the beta version),
2051         MACS will use 2 times the --shiftsize value as the fragment length in
2052         the later analysis. --off-auto can turn off this default behavior.
2053
2054         ** Redundant tag filtering
2055
2056         The IO module is rewritten. The redundant tag filtering process
2057         becomes simpler and works as promised. The maximum allowed number
2058         of tags at the exact same location is calculated from the
2059         sequencing depth and genome size using a binomial distribution,
2060         for both TREATMENT and CONTROL separately (previously only
2061         TREATMENT was considered). The exact same location means the same
2062         coordinate and the same strand. Then MACS will only keep at most
2063         this number of tags at the exact same location in the following
2064         analysis. An option --keep-dup can let MACS skip the filtering and
2065         keep all the tags. However this may bring in a lot of sequencing
2066         bias, so you may get many false positive peaks.
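        A sketch of how such a cap can be derived, under stated
        assumptions. Each tag falls on a given position with probability
        p = 1/genome_size, so the count at one position is
        Binomial(n_tags, p). The cap is the smallest k with
        P(X <= k) >= 1 - pcut. The value pcut = 1e-5 and the choice to
        ignore the strand factor are assumptions for illustration, not
        confirmed MACS constants.

```python
import math


def max_dup_tags(n_tags: int, genome_size: float, pcut: float = 1e-5) -> int:
    """Smallest k with P(X <= k) >= 1 - pcut, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    log_p, log_q = math.log(p), math.log1p(-p)
    log_pmf = n_tags * log_q              # log P(X = 0)
    cdf = math.exp(log_pmf)
    k = 0
    while cdf < 1.0 - pcut and k < n_tags:
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        log_pmf += math.log(n_tags - k) - math.log(k + 1) + log_p - log_q
        cdf += math.exp(log_pmf)
        k += 1
    return k
```

With 10 million tags on a 2.7e9 bp genome this sketch allows 1 tag per position. With 1 billion tags it allows 5, so deeper sequencing tolerates more stacked reads.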
2067
2068         ** Single wiggle mode
2069
2070         First, note that this is not the score track that I
2071         described before. By default, MACS generates wiggle files of
2072         fragment pileup for every chromosome separately. When you use
2073         --single-wig option, MACS will generate a single wiggle file for
2074         all the chromosomes so you will get a wig.gz for TREATMENT and
2075         another wig.gz for CONTROL if available.
2076
2077         ** Sniff -- automatic format detection
2078
2079         Now, by default or "-f AUTO", MACS will decide the input file
2080         format automatically. Technically, it will try to read at most
2081         1000 records for the first 10 non-comment lines. If it succeeds,
2082         the format is decided. I recommend not using AUTO and specifying the
2083         right format for your input files, unless you combine different
2084         formats in a single MACS run.
2085
2086         * Options changes
2087
2088         --single-wig and --keep-dup are added. Check previous section in
2089         ChangeLog for detail.
2090
2091         -f (--format) AUTO is now the default option.
2092
2093         --slocal default: 1000
2094         --llocal default: 10000
2095
2096         * Bug fixed
2097
2098         The setup script will stop the installation if the Python version
2099         is not 2.6 or 2.7.
2100
2101         Local lambda calculation has been changed back. MACS will check
2102         peak_region, slocal (default 1K) and llocal (default 10K) for the
2103         local bias. The previous 200bps default caused MACS to miss
2104         some peaks where the input bias is very sharp.
2105
2106         sam2bed.py script is corrected.
2107
2108         Relative pos in xls output is fixed.
2109
2110         The parser for ELAND_export is fixed to skip some of the no-match
2111         lines, and elandexport2bed.py is fixed too. (However, I can't
2112         guarantee that it works on all eland_export files.)
2113
2114 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2115         Version 1.4.0alpha2 (be smarter)
2116
2117         * Options changes
2118
2119         --gsize now provides shortcuts for common genomes, including
2120         human, mouse, C. elegans and fruitfly.
2121
2122         --llocal will now be 5000 bps if there is no input file, so that
2123         local lambda doesn't over-penalize enriched binding sites.
2124
2125 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2126         Version 1.4alpha (be smarter)
2127
2128         * Options changes
2129
2130         --tsize option is redesigned. MACS will use the first 10 lines of
2131         the input to decide the tag size. If the user specifies --tsize, it
2132         will override the automatically determined tsize.
2133
2134         --lambdaset is replaced by --slocal and --llocal which mean the
2135         small local region and large local region.
2136
2137         --bw has no effect on the scan-window size now. It only affects the
2138         paired-peaks model process.
2139
2140         * Model building
2141
2142         During the model building, MACS will pick out the enriched regions
2143         which are not too high and not too low to build the paired-peak
2144         model. By default, the range is from fold 10 to fold 30. If MACS
2145         fails to build the model, by default it will use the nomodel
2146         settings, like shiftsize=100bps, to shift and extend each
2147         tag. This behavior can be turned off by '--off-auto'.
2148
2149         * Output files
2150
2151         An extra file including all the summit positions is saved as the
2152         *_summits.bed file. An option '--call-subpeaks' will invoke
2153         PeakSplitter developed by Mali Salmon to split wide peaks into
2154         smaller subpeaks.
2155
2156         * Sniff (will be in beta)
2157
2158         Automatically recognize the input file format, so users can combine
2159         different formats in one MACS run.
2160
2161         Not implemented features/TODO:
2162
2163         * Algorithms ( in near future? )
2164
2165         MACS will try to refine the peak boundaries by calculating the
2166         scores for every point in the candidate peak regions. The score
2167         will be the -10*log(10,pvalue) on a local poisson distribution. A
2168         cutoff specified by users (--pvalue) will be applied to find the
2169         precise sub-peaks in the original candidate peak region. Peak
2170         boundaries and peak summit positions will be saved in separate BED
2171         files.
2172
2173         * Single wiggle track ( in near future? )
2174
2175         A single wiggle track will be generated to save the scores within
2176         candidate peak regions at 10bp resolution. The wiggle file
2177         is in fixedStep format.
2178
2179
2180 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2181         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2182
2183         * bin/Constants.py
2184
2185         Fixed typo. FCSTEP -> FESTEP
2186
2187         * lib/PeakDetect.py
2188
2189         The 'femax' attribute bug is fixed
2190
2191 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2192         Version 1.3.7 (Oktoberfest)
2193
2194         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2195
2196         Enhancements by Peter Chines:
2197
2198         1. gzip files are supported.
2199         2. when --diag is on, users can set the increment and endpoint for
2200         fold enrichment analysis by setting --fe-step and --fe-max.
2201
2202         Enhancements by Davide Cittaro:
2203
2204         1. BAM and SAM formats are supported.
2205         2. small changes in the header lines of wiggle output.
2206
2207         Enhancements by Me:
2208         1. I added --fe-min option;
2209         2. Bowtie ascii output with suffix ".map" is supported.
2210
2211         Bug fixed:
2212
2213         1. --nolambda bug is fixed. (reported by Martin at JHU)
2214         2. --diag bug is fixed. (reported by Bogdan Tanasa)
2215         3. Function to remove suffix '.fa' is fixed. (reported by Jeff Johnston)
2216         4. Some "fold change" have been changed to "fold enrichment".
2217
2218 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2219         Version 1.3.6.1 (default parameter change)
2220
2221         * bin/macs, lib/PeakDetect.py
2222
2223         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2224         default. "--futurefdr" is added which can turn on the 'new' method
2225         introduced in 1.3.6. By default it's off.
2226
2227         * lib/PeakDetect.py
2228
2229         Fixed a bug. p-value is corrected a little bit.
2230
2231
2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.6 (Birthday cake)

        * bin/macs

        A "track name" is added to the header line of the BED output file.

        Now the default peak detection method considers the 5k and 10k
        regions around the peak in the treatment data, and the peak region
        plus the 1k, 5k, and 10k regions in the control data, to calculate
        the local bias. The old method can be called with the '--old'
        option.

        The numbers of total and unique tags in treatment and control are
        now saved in the final .xls output.

        * lib/IO/__init__.py

        The ".fa" suffix is removed from chromosome names in the input tag
        alignments, so only the chromosome names are kept.
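        Illustration (not the MACS code): a minimal Python sketch of removing
        the ".fa" suffix from a chromosome name. The helper name
        strip_fa_suffix is hypothetical. It tests for the exact suffix,
        because str.rstrip(".fa") strips any trailing run of '.', 'f' and
        'a' characters rather than the literal string.

```python
def strip_fa_suffix(name: str) -> str:
    """Drop a trailing ".fa" from a chromosome name, if present."""
    # Compare the exact suffix; str.rstrip(".fa") would instead strip any
    # run of trailing '.', 'f' and 'a' characters.
    if name.endswith(".fa"):
        return name[: -len(".fa")]
    return name


print(strip_fa_suffix("chr1.fa"))        # chr1
print(strip_fa_suffix("scaffold_a.fa"))  # scaffold_a
print("scaffold_a.fa".rstrip(".fa"))     # scaffold_  (the rstrip pitfall)
```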

        The WigTrackI class is added for Wiggle-like data structures.
        (not used yet)

        The parser for ELAND multi PET files is fixed. The 5' tag position
        of each pair is now kept, whereas the previous version kept the
        midpoint.

        * lib/IO/BinKeeper.py

        The BinKeeperI class, inspired by Jim Kent's library for the UCSC
        Genome Browser, provides fast access to the values in a given
        region of a large wiggle-like data file. (not used yet)

        * lib/OptValidator.py

        Fixed a typo.

        * lib/PeakDetect.py

        Now the default peak detection method considers the 5k and 10k
        regions around the peak in the treatment data, and the peak region
        plus the 1k, 5k, and 10k regions in the control data, to calculate
        the local bias. The old method can be called with the '--old'
        option.
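        Illustration (not the MACS code): a minimal Python sketch of the
        local-bias rule above. All names are hypothetical, and several
        details are assumptions not stated in this entry: tags are sorted
        positions, windows are centred on the peak, both channels have the
        same sequencing depth, and the local lambda is the maximum of the
        genome-wide background and every window estimate.

```python
from bisect import bisect_left


def count_tags(positions, start, end):
    """Number of sorted tag positions p with start <= p < end."""
    return bisect_left(positions, end) - bisect_left(positions, start)


def window_lambda(positions, center, width, d):
    """Expected tags in a d-bp window, estimated from a width-bp window."""
    half = width // 2
    return count_tags(positions, center - half, center + half) * d / width


def local_lambda(peak_start, peak_end, treat, ctrl, d, lambda_bg):
    """Hypothetical local bias for one peak (assumes equal depth in both channels)."""
    center = (peak_start + peak_end) // 2
    estimates = [lambda_bg]
    # Treatment: 5k and 10k regions around the peak.
    for width in (5000, 10000):
        estimates.append(window_lambda(treat, center, width, d))
    # Control: the peak region itself, then 1k, 5k and 10k regions.
    peak_len = peak_end - peak_start
    estimates.append(count_tags(ctrl, peak_start, peak_end) * d / peak_len)
    for width in (1000, 5000, 10000):
        estimates.append(window_lambda(ctrl, center, width, d))
    return max(estimates)


# Ten control tags piled inside a 200 bp peak dominate the estimate.
print(local_lambda(1000, 1200, [], [1100] * 10, d=200, lambda_bg=0.5))  # 10.0
```

        Taking the maximum is meant to keep the estimate conservative: a
        candidate peak has to stand out against the strongest nearby
        background, not just an average one.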

        Two columns have been added to the BED output file. 4th column:
        peak name; 5th column: peak score, computed as -10*log10(pvalue).
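        Illustration: a minimal Python sketch of the new 5-column BED line,
        with the score computed as -10*log10(pvalue). The helper name and
        the two-decimal formatting are assumptions; for example, a p-value
        of 1e-5 gives a score of 50.00.

```python
import math


def bed_peak_line(chrom, start, end, name, pvalue):
    """Format a 5-column BED line whose score column is -10*log10(pvalue)."""
    # max() turns the -0.0 produced for pvalue == 1 into 0.0
    score = max(0.0, -10 * math.log10(pvalue))
    return f"{chrom}\t{start}\t{end}\t{name}\t{score:.2f}"


print(bed_peak_line("chr1", 100, 300, "peak_1", 1e-5))
```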

        * setup.py

        Added support for building a Mac app through 'setup.py py2app', or
        a Windows executable through 'setup.py py2exe'. The py2app or
        py2exe package must be installed to use these functions.

2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)

        * PeakDetect.py

        Besides the 1k, 5k, and 10k regions, MACS now also considers the
        peak-sized region in the control data to calculate the local lambda
        for each peak. Peak calling results will differ slightly from the
        previous version, beware!

        * OptValidator.py

        Fixed a typo: ELANDParser -> ELANDResultParser

        * OutputWriter.py

        The modeled d value is now shown on the model figure.

2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)

        * macs, IO/__init__.py, PeakDetect.py

        Added support for the ELAND multi format. Added support for
        paired-end experiments; in this case, 5'-end and 3'-end ELAND multi
        format files are required for the treatment or control data. See
        the 00README file for details.

        Added the wigextend option.

        Added the petdist option for paired-end tag experiments, which
        specifies the best distance between the 5' and 3' tags.

        * PeakDetect.py

        Fixed a bug that caused the end position of every peak region to be
        incorrectly increased by 1 bp. (Thanks Mali Salmon!)

        * OutputWriter.py

        Fixed bugs in wiggle file generation. The start position of the
        wiggle file is set to 1 instead of 0.

        Fixed a bug where, every 10M bps, the signals in the first 'd'
        range were lower than the actual values. (Thanks Mali Salmon!)


2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.3 (wiggle bugs fixed)

        * OutputWriter.py

        Fixed bugs in wiggle file generation: 1. 'span=' is added to the
        'variableStep' line; 2. previously, every 10M bps, the coordinates
        were wrongly shifted to the right by 'd' base pairs.

        * macs, PeakDetect.py

        Added an option to save wiggle files at different resolutions.

2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.2 (tiny bugs fixed)

        * IO/__init__.py

        Fixed 65536 -> 65535. (Thanks Joon)

        * Prob.py

        Improved the binomial function for extra large numbers. Imported
        from the Cistrome project.

        * PeakDetect.py

        If the treatment channel has no reads on a chromosome that is
        present in the control channel, or vice versa, MACS no longer
        exits. (Thanks Shaun Mahony)

        Instead, MACS fakes a tag at position -1 when calling treatment
        peaks vs. control, but ignores the chromosome when calling negative
        peaks.

2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3.1 (tiny bugs fixed version)

        * Prob.py

        Hyunjin Gene Shin contributed code to Prob.py. The binomial
        functions can now handle very large and very small numbers.

        * IO/__init__.py

        Parsers now split lines in BED/ELAND files on any whitespace.
        'track' and 'browser' lines are treated as comment lines. Fixed a
        bug when raising StrandFormatError. The maximum redundant tag number
        at a single position can now be as high as 65536.
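        Illustration (not the MACS parser): a minimal Python sketch of the
        parsing rules above for 6-column BED input. str.split() with no
        argument splits on any run of whitespace; 'track' and 'browser'
        lines are skipped like comments. Assumptions: the strand is in
        column 6, the 5' end of a '-' strand tag is its end coordinate, and
        StrandFormatError is a local stand-in class.

```python
class StrandFormatError(ValueError):
    """Local stand-in for the parser's strand error (hypothetical here)."""


def parse_bed_tags(lines):
    """Yield (chrom, 5' position, strand) for each tag in 6-column BED lines."""
    for line in lines:
        fields = line.split()  # any run of spaces or tabs separates fields
        if not fields or fields[0] in ("track", "browser"):
            continue  # blank, 'track' and 'browser' lines are skipped
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5]
        if strand == "+":
            yield chrom, start, "+"
        elif strand == "-":
            yield chrom, end, "-"  # the 5' end of a '-' strand tag
        else:
            raise StrandFormatError(f"unexpected strand {strand!r}: {line.rstrip()}")


bed = ["track name=test\n", "chr1 100  125\tr1 0 +\n", "chr1\t200\t225\tr2\t0\t-\n"]
print(list(parse_bed_tags(bed)))  # [('chr1', 100, '+'), ('chr1', 225, '-')]
```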


2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.3 (naming clarification version)

        * Naming clarification changes according to our manuscript:

        'frag_len' is changed to 'd'.

        'fold_change' is changed to 'fold_enrichment'.

        Users are advised to determine the '--bw' parameter from the real
        sonication size.

        FDR values in the output file are capped at 100%.

        Other clarifications were made in the 00README file and the
        documents on the website.

        * IO/__init__.py
        If the redundant tag number at a single position is over 32767,
        just keep 32767 instead of raising an overflow exception.
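        Illustration: a minimal Python sketch of capping a per-position
        counter at 32767 instead of overflowing. The signed 16-bit
        array('h') storage and the add_tag helper are assumptions chosen to
        show the overflow; they are not taken from MACS.

```python
from array import array

MAX_COUNT = 32767  # largest value a signed 16-bit ('h') array element can hold


def add_tag(counts, pos):
    """Count one more tag at pos, saturating at MAX_COUNT instead of overflowing."""
    if counts[pos] < MAX_COUNT:
        counts[pos] += 1


counts = array("h", [0] * 4)
for _ in range(40000):
    add_tag(counts, 2)
print(counts[2])  # 32767; a plain += would raise OverflowError at 32768
```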

        * setup.py
        Fixed a typo.

        * PeakDetect.py
        Fixed a bug in the diagnosis report.


2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.2.2gamma

        * Serious bug fixes:

        The Poisson distribution CDF and inverse CDF functions are
        corrected. They now produce correct results even for huge lambda,
        so the p-values and FDR values in the final Excel sheet are
        corrected.
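        Illustration (not the MACS implementation): why a naive Poisson CDF
        fails for huge lambda, and a minimal Python sketch that stays
        accurate by summing in log space. exp(-lambda) underflows to 0.0 in
        double precision once lambda is more than about 745, so the naive
        sum exp(-lambda) * lambda**j / j! collapses; poisson_log_cdf is a
        hypothetical helper.

```python
import math


def poisson_log_cdf(k, lam):
    """Natural log of P(X <= k) for X ~ Poisson(lam), with integer k >= 0."""
    if k < 0:
        return float("-inf")
    log_lam = math.log(lam)
    # log P(X = k): the first term of a downward sum over j = k, k-1, ..., 0
    log_term = -lam + k * log_lam - math.lgamma(k + 1)
    log_sum = log_term
    for j in range(k, 0, -1):
        # P(X = j - 1) = P(X = j) * j / lam, applied in log space
        log_term += math.log(j) - log_lam
        # log(exp(a) + exp(b)) without leaving log space
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))
        # below the mode the terms only shrink; stop once they are negligible
        if j - 1 < lam and log_term < log_sum - 40:
            break
    return log_sum


print(math.exp(-1e5))                           # 0.0: the naive first factor underflows
print(math.exp(poisson_log_cdf(100000, 1e5)))  # slightly above 0.5
```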

        The IO package can now tolerate some rare cases; ELANDParser in the
        IO package is fixed. (Thanks Bogdan)

        * Improvement:

        Reverse-paired peaks in the model are rejected, so there will be no
        negative 'frag_len'. (Thanks Bogdan)

        * Features added:

        The diagnosis function is completed. It can output a table file
        that helps users evaluate their sequencing depth.


2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.2

        * Prob.py is added!

        GSL is totally removed from MACS. Instead, I have implemented the
        CDF and inverse CDF for the Poisson and binomial distributions
        purely in Python.

        * Constants.py is added!

        Constants used in MACS are organized in the Constants.py file.

        * All other files are modified!

        The fold change calculation is modified. The fold change is now
        calculated only at the peak summit position instead of over the
        whole peak region. The values will be higher and more robust than
        before.
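        Illustration: a minimal Python sketch contrasting fold enrichment
        at the peak summit with a value averaged over the whole peak
        region. The pileup values and helper names are hypothetical, and
        averaging is only one assumed way to compute a region-wide value.

```python
def summit_fold_enrichment(pileup, local_lambda):
    """Return (summit index, fold enrichment at the highest pileup position)."""
    summit = max(range(len(pileup)), key=pileup.__getitem__)
    return summit, pileup[summit] / local_lambda


def region_fold_enrichment(pileup, local_lambda):
    """For comparison: the pileup averaged over the whole peak region."""
    return sum(pileup) / len(pileup) / local_lambda


pileup = [1, 3, 8, 3, 1]  # hypothetical per-position tag pileup across one peak
print(summit_fold_enrichment(pileup, 2.0))  # (2, 4.0)
print(region_fold_enrichment(pileup, 2.0))  # 1.6
```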

        Features added:

        1. MACS can save wiggle format files containing the tag number at
        every 10 bp along the genome. Tags are shifted according to our
        model before they are counted.

        2. Model building and local lambda calculation can be skipped with
        certain options.

        3. A diagnosis report can be generated with the '--diag' option.
        This report can help you estimate the sequencing saturation. This
        function is still in beta.

        4. FDR calculation is much faster.
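        Illustration (not the MACS writer): a minimal Python sketch of
        item 1, counting model-shifted tags in 10 bp bins. Assumptions: each
        tag is shifted d/2 bp toward its 3' end, positions are 0-based on
        input, and the output already follows the later fixes in this log
        ('span=' on the 'variableStep' line, coordinates starting at 1).

```python
from collections import Counter


def shifted_wiggle(chrom, plus, minus, d, step=10):
    """Count model-shifted tags in step-bp bins and format a variableStep track.

    plus/minus hold 0-based 5' tag positions; each tag is moved d/2 bp toward
    its 3' end, which approximates the centre of the sequenced fragment.
    """
    half = d // 2
    centers = [p + half for p in plus] + [m - half for m in minus]
    bins = Counter(c // step for c in centers if c >= 0)
    lines = [f"variableStep chrom={chrom} span={step}"]
    for b in sorted(bins):
        lines.append(f"{b * step + 1} {bins[b]}")  # wiggle coordinates are 1-based
    return "\n".join(lines)


print(shifted_wiggle("chr1", plus=[0, 5], minus=[210], d=200))
# variableStep chrom=chr1 span=10
# 101 2
# 111 1
```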

2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
        Version 1.1

        * TabIO, PeakModel.py ...
        Fixed a bug so that MACS tolerates cases where there are no tags on
        either the plus strand or the minus strand.

        * setup.py
        Check the Python version. If it is lower than 2.4, refuse to install
        with a warning.
