ChangeLog

2021-02-07  Tao Liu  <vladimir.liu@gmail.com>
        MACS 3.0.0a6

        * New features:

        1) Speed/memory optimization.  Use cykhash to replace the Python
        dictionary. Use a 10MB buffer to read and parse input files (not
        available for the BAM file parser). Plus many other optimization
        tweaks.

        2) Code cleanup. Reorganize source code.

        3) Unit testing.

        4) R wrappers for MACS -- MACSr

        5) Switch to GitHub Actions for CI; support multi-arch testing
        including x64, armv7, aarch64, s390x and ppc64le.

        6) MACS tag-shifting model has been refined. Now it will use a
        naive peak calling approach to find ALL possible paired peaks on
        the + and - strands, then use all of them to calculate the
        cross-correlation. (A related bug has been fixed, #442.)

        7) Call variants in peak regions directly from BAM files. The
        function was originally developed under code name SAPPER. Now
        SAPPER has been merged into MACS. Also, `simde` has been added as
        a submodule in order to support fermi-lite library under non-x64
        architectures.

2020-04-11  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.7.1

        * hotfix:

        Add 'wheel' and 'pip' to pyproject.toml so that `pip install` can
        work.

2020-04-10  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.7

        * Bugs fixed

        1) MACS2 has been tested on multiple architectures to make sure it
        can successfully generate consistent results. Currently the
        supported architectures are: AMD64, ARM64, i386, PPC64LE, and
        S390X. Thanks to @mr-c, @junaruga, and @tillea! Related to issues
        #340, #349, #351, and #359; to PRs #348, #350, #360, #361, #367,
        and #370. The lesson is that if the project is built on Cython and
        is aimed at memory efficiency, we should explicitly define all
        int/float types in pyx files, such as int8_t or uint32_t, using
        either libc or numpy (C version) instead of relying on Cython
        types such as short, long, double.

        2) MACS2 setup script will check numpy and install numpy if
        necessary. PR #378, issue #364

        3) `bdgbroadcall` command will correctly add the score column (5th
        column). The score (5th) column contains 10 times the average
        score in the broad region. PR #373, issue #362

        4) The missing test on `bdgopt` subcommand has been added. PR #363

        5) The obsolete option `--ratio` from `callpeak` subcommand has
        been removed. PR #369, issue #366

        6) Fixed the incorrect description in README, changing 'maximum
        length of broad region is 4 times of d' to 'maximum gap for
        merging broad regions is 4 times of tag size by default'. PR #380,
        issue #365.

        * Other

        1) CODE OF CONDUCT document has been added to MACS2 GitHub
        repository. PR #358

2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.6

        * New Features

        1) Speed up MACS2. Some programming tricks and code cleanup. The
        filter_dup function replaces separate_dups. The latter was
        implemented for potentially putting back duplicate reads in
        certain downstream analyses. However, such analyses haven't been
        implemented. Optimize the speed of writing bedGraph files.
        Optimize BAM and BAMPE parsing with pointer casting instead of
        Python unpack.

        2) The comment lines in the headers of BED or SAM files will be
        correctly skipped. However, MACS2 won't check comment lines in the
        middle of the file.

        * Bugs fixed

        1) Cutoff-analysis in callpeak command. #341

        2) Issues related to SAMParser and three ELAND Parsers are
        fixed. #347

        * Other

        1) cmdlinetest script in test/ folder has been updated to: 1. test
        cutoff-analysis with callpeak cmd; 2. output the 2 lines before
        and after the error or warning message during tests; 3. output
        only the first 10 lines if the difference between test result and
        standard result can be found; 4. prockreport monitors CPU time and
        memory usage at 1-second intervals -- a bit more accurate.

        2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.

2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.5 (Py3 speed up)

        * Features added

        1) *GitHub code only, not included in MACS2 release* New
        testing data for performance test. A subsampled ENCODE2 CTCF
        ChIP-seq dataset, including 5 million ChIP reads and 5 million
        control reads, has been included in the test folder for testing
        CPU and memory usage (i.e. 5M test). Several related scripts,
        including `prockreport` for reporting CPU and memory usage, and
        `pyprofile` and `pyprofile_stat` for debugging and profiling MACS2
        code, have been included.

        2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
        The old hashtable.pyx implementation copied from Pandas (very old
        version) doesn't work well in Python3+Cython. It slowed down the
        pqtable checkup even though the Cython code was identical to
        v2.1.4. While running the 5M test, the `__getitem__` function in
        hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
        148.6s with the same number of calls in MACS2 v2.2.4. As a
        consequence, the standard Python dictionary implementation has
        replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
        faster than the py2 version, but uses a bit more memory. In
        general, v2.2.5 can finish the 5M reads test in 20% less time than
        MACS2 v2.1.4, but uses 15% more memory.

        * Bug fixed

        1) More Python3 related fixes, e.g. the return value of keys from
        py3 dict. #333 #337

2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.2.4 (Python3)

        * Features added

        1) First Python3 version of MACS2 released.

        2) Version number 2.2.X will be used for MACS2 in Python3, in
        parallel to 2.1.X.

        3) More comprehensive test.sh script to check the consistency of
        results from Python2 version and Python3 version.

        4) Simplify setup.py script since the newest version transparently
        supports Cython. And when Cython is not installed by the user,
        setup.py can still compile using only the C code.

        5) Fix Signal.pyx to use np.array instead of np.mat.

2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.4

        * Features added

        GitHub Actions is used together with Travis CI for testing and
        deployment.

        * Bugs fixed

        PR #322:

        1) #318 Random score in bdgdiff output. It turned out that sum_v
        was not initialized to 0 before adding. Potential bugs are fixed
        in other functions in ScoreTrack and CallPeakUnit code.

        2) #321 Cython dependency in setup.py script is removed, and the
        'cythonize' call is placed at the correct position.

        3) A typo is fixed in GitHub Actions script.

2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.3.3

        * Features added

        1) Support Docker auto-deploy. PR #309

        2) Support Travis CI auto-testing, update unit-testing
        scripts, and enable subcommand testing on small datasets.

        3) Update README documents. #297 PR #306

        4) `cmbreps` supports more than 2 replicates. Merged from PR #304
        @Maarten-vd-Sande and PR #307 (our own chi-sq test code)

        5) `--d-min` option is added in `callpeak` and `predictd`, to
        exclude predictions of fragment size smaller than the given
        value. Merged from PR #267 @shouldsee.

        6) `--buffer-size` option is added in `predictd`, `filterdup`,
        `pileup` and `refinepeak` subcommands. Users can use this option
        to decrease memory usage when there are a large number of contigs
        in the data. Also, now `callpeak`, `predictd`, `filterdup`,
        `pileup` and `refinepeak` will suggest that users tweak
        `--buffer-size` when catching a MemoryError. #313 PR #314

        * Bugs fixed

        1) #265 Fixed a bug where the pseudocount had not been applied
        while calculating p-value score in ScoreTrack object.

        2) Fixed bdgbroadcall so that it will report broad peaks without
        strong peaks inside, consistent with the behavior of `callpeak
        --broad`.

        3) Rename COPYING to LICENSE.

2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.2

        * New features

        1) Added missing BEDPE support, and enabled support for BAMPE
        and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
        subcommands. When the format is BAMPE or BEDPE, the 'pileup'
        command will pile up the whole fragment defined by mapping
        locations of the left end and right end of each read pair. Thanks
        @purcaro!

        2) Added options to callpeak command for tweaking max-gap and
        min-len during peak calling. Thanks @jsh58!

        3) The callpeak option "--to-large" is replaced with
        "--scale-to large".

        4) The randsample option "-t" has been replaced with "-i".

        * Bug fixes

        1) Fixed memory issue related to #122 and #146

        2) Fixed a bug caused by a typo. Related to #249. Thanks @shengqh!

        3) Fixed a bug when setting the command-line qvalue cutoff.

        4) Better describe the 5th column of narrowPeak. Thanks @alexbarrera!

        5) Fixed the calculation of average fragment length for paired-end
        data. Thanks @jsh58!

        6) Fixed bugs caused by khash while computing p/q-value and log
        likelihood ratios. Thanks @jsh58!

        7) More spelling tweaks in source code. Thanks @mr-c!

2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.1 20160309

        * Retire the tag:rc.

        * Fixed spelling. Merged pull request #120. Thanks @mr-c!

        * Change filtering criteria for reading BAM/SAM files

        Related to callpeak and filterdup commands. Now the
        reads/alignments flagged with 1028 or 'PCR/Optical duplicate' will
        still be read, although MACS2 may later decide that they are
        duplicates. Related to old issue #33. Sorry I forgot to address it
        for years!
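
        As a minimal sketch of this filtering rule (not the MACS2 parser
        itself), the check below uses the standard SAM FLAG bits from the
        SAM specification. The function name `keep_alignment` is
        hypothetical. Skipping secondary and supplementary alignments
        follows the 2015-04-20 entry; skipping unmapped reads (bit 4) is an
        assumption of this sketch, so a read carrying both bit 4 and bit
        1024 (flag 1028) is dropped here because it has no mapped position.
        The duplicate bit on its own no longer causes a read to be skipped.

```python
# Standard SAM FLAG bits (SAM specification).
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400       # 1024: PCR or optical duplicate
FLAG_SUPPLEMENTARY = 0x800

# Bits that cause an alignment to be skipped in this sketch.
# FLAG_DUPLICATE is deliberately NOT part of the mask.
SKIP_MASK = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def keep_alignment(flag: int) -> bool:
    """Hypothetical filter: skip unmapped, secondary and supplementary
    alignments, but keep reads marked as duplicates (FLAG_DUPLICATE) so
    that duplicate handling can be decided later, e.g. by --keep-dup."""
    return not (flag & SKIP_MASK)


# A duplicate-marked read is kept on either strand (16 = reverse strand);
# a secondary alignment is skipped.
print(keep_alignment(1024), keep_alignment(1024 | 16), keep_alignment(256))
# prints: True True False
```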

2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.1 20160226 (tag:rc Zhengyue)

        * Bug fixes

        1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
        the former option is not supported by older GCC. Related to issues
        #91, #109.

        2) Issue #108 is fixed. If no peak can be found in a chromosome,
        the PeakIO won't throw an error.

        * New features

        1) callpeak

        a) A more flexible format, BEDPE, is supported. Now users can
        define the left and right position of the ChIPed fragment, and
        MACS2 will skip model building and directly pile up the
        fragments. Related to issue #112.

        b) The 'tempdir' can be specified, to save cached pileup
        tracks. Originally, the temporary files were stored in
        /tmp. Thanks @daler! Related to issues #97 and #105.

        2) bdgopt

        New operations are added to calculate the maximum or minimum of
        each value in a bedGraph file and a given value.

        3) bdgcmp

        A new method is added to calculate the maximum of the values
        defined in two bedGraph files.

2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20151222 (tag:rc Dongzhi)

        * Bug fixes

        1) Fix a bug when dealing with chromosomes containing only one
        read (pair). The size of dup_plus/dup_minus arrays after filtering
        dups should be increased by 1.

        2) Fix a bug related to the broad peak calling function in
        previous versions. The gaps were miscalculated, so segmented weak
        broad calls may be reported, and sometimes you would see peaks
        with lower than cutoff values in the output files.

        3) "Potentially" fixed issue #105 on temporary cache files; needs
        further follow-up.

2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20150731 (tag:rc)

        * Bug fixes

        1) Fixed issue #76: information about broad/narrow cutoff will be
        correctly displayed.

        2) Fixed issue #79: bdgopt extparam option is fixed.

        3) Fixed issue #87: reference to cProb has been changed to 'Prob'
        for filterdup command.

        4) Fixed issues #78, #88 and a similar issue reported in the MACS
        Google group: MACS2 can now correctly deal with multiple alignment
        files for -t or -c. The 'finalize' function will be correctly
        called. Multiple files option is enabled for filterdup,
        randsample, predictd, pileup and refinepeak commands.

        5) Related to #88: when BAMPE mode is used, PE pairs will be
        sorted by leftmost then rightmost ends.

        6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
        array. This caused 'callpeak --nolambda' to hang forever while
        calculating pvalues and qvalues.

2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20150420 (tag:rc)

        * New commands

        1) bdgopt: some convenient functions to modify bedGraph files.

        2) cmbreps: Combine scores from two replicates, with three
        methods: 1. take the maximum; 2. take the average; 3. use Fisher's
        method to combine two p-value scores. After that, users can use
        bdgpeakcall to call peaks on combined scores.
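
        A rough sketch of the three combination methods, assuming each
        replicate track is a list of -log10 p-value scores at the same
        positions. `combine` and `fisher_combine` are hypothetical names,
        and this is not the cmbreps source. Fisher's method uses the
        statistic X = -2 * sum(ln p_i), which follows a chi-square
        distribution with 2k degrees of freedom for k independent p-values;
        for even degrees of freedom its upper tail has a closed form, so
        only the standard library is needed.

```python
import math


def fisher_combine(pscores):
    """Combine -log10 p-value scores with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{i<k} (X/2)**i / i!, so no special functions are needed.
    Returns the combined p-value as a -log10 score.
    """
    k = len(pscores)
    half_x = sum(s * math.log(10) for s in pscores)   # X/2 = -sum(ln p_i)
    tail = sum(half_x ** i / math.factorial(i) for i in range(k))
    log_p = -half_x + math.log(tail)                   # ln(combined p)
    return -log_p / math.log(10)


def combine(scores_a, scores_b, method):
    """Combine two replicate score tracks position by position."""
    pairs = list(zip(scores_a, scores_b))
    if method == "max":
        return [max(a, b) for a, b in pairs]
    if method == "mean":
        return [(a + b) / 2 for a, b in pairs]
    if method == "fisher":
        return [fisher_combine([a, b]) for a, b in pairs]
    raise ValueError(f"unknown method: {method}")
```

        Because the tail formula holds for any k, `fisher_combine` also
        accepts more than two scores, e.g. one per replicate.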

        * New features

        1) callpeak and bdgpeakcall can now analyze the relationship
        between p-value cutoffs and the number/length of peaks, then
        generate a summary to help users decide an appropriate cutoff.

        2) callpeak can now accept a fold-enrichment cutoff as a filter
        for final peak calls.
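
        The idea behind this cutoff analysis can be illustrated with a
        small sketch: sweep a list of score cutoffs over one chromosome's
        bedGraph-style (start, end, score) records and report, for each
        cutoff, how many regions pass and their total length. The function
        name, the merging of directly adjacent records and the
        `min_length` filter are assumptions of this sketch, not a
        description of the callpeak implementation.

```python
def cutoff_analysis(intervals, cutoffs, min_length=0):
    """For each cutoff, report (cutoff, number of regions, total length).

    `intervals` is a sorted list of (start, end, score) records from one
    chromosome. Directly adjacent records whose scores pass the cutoff are
    merged into one region; regions shorter than `min_length` are dropped.
    """
    report = []
    for cutoff in cutoffs:
        regions = []
        current = None  # [start, end] of the region being extended
        for start, end, score in intervals:
            if score >= cutoff:
                if current is not None and current[1] == start:
                    current[1] = end            # extend the open region
                else:
                    if current is not None:
                        regions.append(current)
                    current = [start, end]      # open a new region
            elif current is not None:
                regions.append(current)         # close the open region
                current = None
        if current is not None:
            regions.append(current)
        kept = [r for r in regions if r[1] - r[0] >= min_length]
        report.append((cutoff, len(kept), sum(r[1] - r[0] for r in kept)))
    return report
```

        Reading such a report from high to low cutoffs shows how quickly
        the number and total length of peaks grow as the threshold is
        relaxed.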

        * Performance

        Now MACS2 runs about 3X as fast as the previous version. We traded
        clean Python code for speed... Now processing 50M ChIP vs 50M
        control takes only 10 minutes.

        * Bug fixes

        1) Sampling function in BAMPE mode.

        2) Callpeak when there are >= 2 input files for -t or -c.

        3) While reading BAM/SAM, secondary or supplementary alignments
        will be correctly skipped.

        4) Fixed issue #33: An explanation is added to the callpeak
        --keep-dup option that MACS2 will discard SAM/BAM alignments with
        bit 1024 no matter how --keep-dup is set.

        5) Fixed issue #49: setuptools is used instead of distutils

        6) Fixed issue #51: fix the problem when using --trackline
        argument when control file is absent.

        7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of a
        read mapped to the minus strand. The previous implementation found
        an incorrect 5' end if there was an indel in the alignment.

        8) Fixed issue #56: An incorrect sorting method used for BAMPE
        mode caused incorrect filtering of duplicated reads. Now fixed.

        9) Issue #63: Merged from jayhesselberth@github, extsize now can
        be 1.

        10) Issue #71: Merged from aertslab@github, close file descriptors
        after creating them with mkstemp().
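
        A minimal sketch of the CIGAR-based fix for issue #53, assuming a
        0-based leftmost position and a CIGAR string as stored in SAM/BAM.
        Only the operations that consume reference bases (M, D, N, =, X
        per the SAM specification) are summed, so insertions (I) and soft
        clips (S) no longer distort the minus-strand 5' end the way the
        read length would. `five_prime_end` is a hypothetical name.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set("MDN=X")


def five_prime_end(pos: int, cigar: str, is_reverse: bool) -> int:
    """Return the 5' end of an alignment.

    `pos` is the 0-based leftmost mapped position. For a plus-strand read
    the 5' end is `pos`; for a minus-strand read it is the end-exclusive
    rightmost reference coordinate, pos + reference span from the CIGAR.
    """
    if not is_reverse:
        return pos
    span = sum(int(length)
               for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)
    return pos + span


# 20M5I25M: read length 50, but only 45 reference bases are covered.
print(five_prime_end(100, "20M5I25M", is_reverse=True))  # 145, not 150
```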

2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.1.0 20140616 (tag:rc)

        * callpeak module

        "--ratio" is added to manually assign the scaling factor of ChIP
        vs control, e.g. from NCIS. Thanks to Colin D and Dietmar Rieder
        for implementing the patch file!

        "--shift" is added to move cutting ends (5' end of reads) around,
        in order to process DNase-Seq data, e.g., use "--shift -100
        --extsize 200" to get 200bp fragments around 5' ends. For general
        ChIP-Seq data analysis, this option should always be set to
        0. Thanks to Xi Chen and Anshul Kundaje for the discussions in the
        user group!
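
        A minimal sketch of how a shift and an extension size could place
        a fragment around a 5' cut site, assuming `shift` is measured along
        each read's own strand (positive values move toward the read's 3'
        direction) and coordinates are 0-based and end-exclusive.
        `fragment_from_cut` is a hypothetical helper, and clamping the
        start at 0 is an assumption.

```python
def fragment_from_cut(cut: int, is_reverse: bool, shift: int, extsize: int):
    """Place a fragment of `extsize` bp relative to a read's 5' cut site.

    The cut site is first moved by `shift` along the read's own strand,
    then the fragment extends `extsize` bp further in the read's 3'
    direction. Returns (start, end), with start clamped at 0.
    """
    if not is_reverse:
        start = cut + shift
        end = start + extsize
    else:
        end = cut - shift
        start = end - extsize
    return max(0, start), end


# With shift -100 and extsize 200, both strands give the same window
# centered on the cut site.
print(fragment_from_cut(1000, False, -100, 200))  # (900, 1100)
print(fragment_from_cut(1000, True, -100, 200))   # (900, 1100)
```

        With shift 0 the same function reproduces ordinary tag extension
        toward each read's 3' end.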

        ** Do not output negative fragment size from cross-correlation
        analysis. Thanks to Alvin Qin for the feedback!

        ** --half-ext and --control-shift are removed. For complex read
        shifting and extending, combine '--shift' and '--extsize'
        options. For comparing two conditions, use 'bdgdiff' module
        instead.

        ** A bug is fixed so that the last pileup value is written to the
        bdg file correctly.

        * filterdup

        A 'dry-run' option is added to only output numbers, including the
        number of allowed duplicates, the total number of reads before and
        after filtering duplicates and the estimated duplication
        rate. Thanks to John Urban for the suggestion!
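
        A minimal sketch of the dry-run numbers, assuming reads are
        represented as (chromosome, 5' position, strand) tuples and at most
        `max_dup` reads are kept per such key. `dry_run` is a hypothetical
        name, and defining the duplication rate as 1 - after/before is an
        assumption of this sketch.

```python
from collections import Counter


def dry_run(reads, max_dup=1):
    """Count reads before and after keeping at most `max_dup` reads per
    (chromosome, 5' position, strand) key, and estimate duplication."""
    counts = Counter(reads)                       # key -> number of reads
    before = sum(counts.values())
    after = sum(min(n, max_dup) for n in counts.values())
    dup_rate = 1 - after / before if before else 0.0
    return {"max_dup": max_dup, "before": before, "after": after,
            "dup_rate": dup_rate}


reads = [("chr1", 100, "+")] * 3 + [("chr1", 200, "-")] + [("chr2", 50, "+")] * 2
print(dry_run(reads))  # 6 reads before, 3 after, duplication rate 0.5
```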

2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20131216 (tag:alpha)

        bug fixes and tweaks

        * We changed license from Artistic License to 3-clause BSD license.

        Yes. The simpler, the better.

        * Process paired-end data with "-f BAMPE" without control

        * GappedPeak output for --broad option has been fixed again to be
        consistent with official UCSC format. We add a 1bp pseudo-block to
        the left and/or right of a broad region when necessary, so that
        you can successfully visualize the regions without strong
        enrichment inside. In downstream analysis other than
        visualization, you may need to remove all 1bp blocks from the
        gappedPeak file.

        * diffpeak subcommand is temporarily disabled until we
        re-implement it.

2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20131028 (tag:alpha)

        * callpeak --call-summits improvement

        The smoothing window length has been fixed as fragment length
        instead of short read length. The larger smoothing window will
        give better smoothing results and better sub-peak summit
        detection.

        * --outdir and --ofile options for almost all commands

        Thanks to Björn Grüning for initially implementing these options!
        Now, MACS2 will save results into the directory specified by the
        '--outdir' option, and/or save results into the file specified by
        the '--ofile' option. Note, in case '--ofile' is available for a
        subcommand, '-o' now has been adjusted to be the same as '--ofile'
        instead of '--o-prefix'.

        Here is the list of changes. For more detail, use 'macs2 xxx -h'
        for each subcommand:

        ** callpeak: --outdir
        ** diffpeak: Not implemented
        ** bdgpeakcall: --outdir and --ofile
        ** bdgbroadcall: --outdir and --ofile
        ** bdgcmp: --outdir and --ofile. When --ofile is used, the number
        and the order of arguments for --ofile must be the same as for -m.
        ** bdgdiff: --outdir and --ofile
        ** filterdup: --outdir
        ** pileup: --outdir
        ** randsample: --outdir
        ** refinepeak: --outdir and --ofile

2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130915 (tag:alpha)

        * callpeak: added a new option --buffer-size

        This option tweaks a previously hidden parameter that controls the
        step by which the array size for storing alignment information is
        increased. In some rare cases where the number of
        chromosomes/contigs/scaffolds is huge, the original default
        setting wastes a lot of memory. In these cases, we recommend
        decreasing --buffer-size (e.g., 1000) to save memory, although the
        decrease will slow down reading alignment files.
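
        A sketch of why the growth step matters, using a hypothetical
        per-contig store (not MACS2's track classes) in which every newly
        seen contig immediately reserves `buffer_size` slots and full
        arrays grow by another `buffer_size` slots. With many small
        contigs, most reserved slots stay empty; a smaller step wastes less
        memory but needs more resize operations.

```python
from array import array


class PositionStore:
    """Hypothetical per-contig position store grown in fixed steps."""

    def __init__(self, buffer_size=100000):
        self.buffer_size = buffer_size
        self.data = {}   # contig -> preallocated array of positions
        self.used = {}   # contig -> number of filled slots

    def add(self, contig, pos):
        if contig not in self.data:
            # Reserve one full step of slots for a newly seen contig.
            self.data[contig] = array("i", [0]) * self.buffer_size
            self.used[contig] = 0
        buf = self.data[contig]
        n = self.used[contig]
        if n == len(buf):
            # Array is full: grow it by another step.
            buf.extend(array("i", [0]) * self.buffer_size)
        buf[n] = pos
        self.used[contig] = n + 1

    def positions(self, contig):
        return self.data[contig][: self.used[contig]].tolist()

    def allocated_slots(self):
        return sum(len(buf) for buf in self.data.values())


store = PositionStore(buffer_size=4)
for i in range(6):
    store.add("chr1", i * 10)
store.add("chrUn_1", 5)       # one read, but a full step is reserved
print(store.allocated_slots())  # 12 slots for 7 stored positions
```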

        * An optimization to speed up pvalue-qvalue statistics

        Previously, it took an hour to prepare the p-q-table for a 65M vs
        65M human TF library; now it takes 10 minutes. The slowdown was
        due to a single line of code to get a value from a numpy array ...

        * Fixed logLR bugs.

2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130731 (tag:alpha)

        * callpeak --call-summits

        Fix bugs causing the callpeak --call-summits option to generate
        extra peaks and peak boundaries inconsistent with the default
        option. Thanks to Ben Levinson!

        * bdgcmp output

        Fix bugs causing bdgcmp to output logLR only as positive values.
        Now 'depletion' can be correctly represented as negative values.
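
        A sketch of one way to give a log-likelihood ratio a sign,
        assuming Poisson models for a ChIP value x and a control value y
        (both positive after pseudocounts). The magnitude
        x*ln(x/y) - x + y is never negative; the sign then marks
        enrichment (x > y) or depletion (x < y). This is a textbook
        formulation written for illustration; `signed_loglr` is a
        hypothetical name, and MACS2's exact formula and pseudocount
        handling are not given in this entry.

```python
import math


def signed_loglr(x: float, y: float) -> float:
    """Signed Poisson log-likelihood ratio (log10 scale) of ChIP value x
    versus control value y; both must be > 0."""
    llr = x * math.log(x / y) - x + y   # >= 0 for any x, y > 0
    llr /= math.log(10)                 # convert from natural log to log10
    return llr if x >= y else -llr      # negative sign marks depletion


print(round(signed_loglr(10.0, 2.0), 3))  # enrichment: positive
print(round(signed_loglr(2.0, 10.0), 3))  # depletion: negative
```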

        * bdgdiff

        Fix the behavior of the bdgdiff module. Now it can take four
        bedGraph files, then use logLR as cutoff to call differential
        regions. Check the bdgdiff command line help for details.

2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130713 (tag:alpha)

        * Fix bugs when outputting broadPeak and gappedPeak.

        Note: weak broad regions without any strong enrichment
        regions inside won't be saved in the gappedPeak file.

        * bdgcmp -T and -C are merged into -S and description is updated.

        Now, you can use it to override SPMR values in your input for
        bdgcmp. Using SPMR (from 'callpeak --SPMR -B') when calculating
        statistics will cause weird results (in most cases, lower
        significance), and won't be consistent with MACS2 callpeak
        behavior. So if you have SPMR bedGraphs, input the smaller/larger
        sample size in MILLIONS according to the 'callpeak --to-large'
        option.

2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130710 (tag:alpha)

        * Fix BED style output format of callpeak module:

        1) without --broad: narrowPeak (BED6+4) and BED for summit will be
        the output. Old BED format file won't be saved.

        2) with --broad: broadPeak (BED6+3) for broad region and
        gappedPeak (BED12+3) for chained enriched regions will be the
        output. Old BED format, narrowPeak format, summit file won't be
        saved.

        * bdgcmp can now accept a list of methods to calculate scores, so
        you can run it once to generate multiple types of scores. Thanks
        to Jon Urban for this suggestion!

        * C code is re-generated through Cython 0.19.1.

2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130520 (tag:alpha)

        * Broad peak calling modules are modified in order to report all
        relaxed regions even when there is no strong enrichment inside.

2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130501 (tag:alpha)

        * Memory usage is decreased to about 1/4-1/5 of previous usage
        Now, the internal data structure and algorithm are both
        re-organized, so that intermediate data won't be saved in
        memory. Instead, they will be calculated on the fly. New MACS2
        will take longer (1.5 to 2 times); however, it will use less
        memory, so it is more usable on servers with little memory.

        * --seed option is added to callpeak and randsample commands
        Thanks to Mathieu Gineste for this suggestion!

2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
        MACS version 2.0.10 20130306 (tag:alpha)

        * diffpeak module: a new module to detect differential binding
        sites with more statistics.

        * Introduced --refine-peaks
        Calculates read balance to refine peak summits

        * Output file name prefix
        Correct encodePeak to narrowPeak, broadPeak to bed12.

2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
        MACS version 2.0.10 (tag:alpha not released)

        * Introduced BAMPEParser
        Reads PE data directly, requires bedtools for now

        * Introduced --call-summits
        Uses signal processing methods to call overlapping peaks

        * Added --no-trackline
        By default, files have descriptive tracklines now

        * New refinepeak command (experimental)
        This new function uses a method similar to that in SPP (wtd) to
        analyze the raw tag distribution in a peak region, then redefine
        the peak summit where plus and minus tags are evenly distributed
        around.

        * Changes to output
        cPeakDetect.pyx has full support for new print/write methods and
        --call-peaks, BAMPEParser, and use of paired-end data

        * Parser optimization

        cParser.pyx is rewritten to use io.BufferedReader to speed
        up. Speed is doubled.

        Code is reorganized -- most functions are inherited from the
        GenericParser class.

        * Use cross-correlation to calculate fragment size

        First, all pairs will be used in prediction for fragment
        size. Previously, no more than 1000 pairs were used. Second,
        cross-correlation is used to find the best phase difference
        between + and - tag pileups.
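
        A minimal sketch of picking the phase difference by
        cross-correlation, assuming the + and - strand tag pileups are
        given as equal-length lists over one region. `best_shift` is a
        hypothetical name, and the raw (unnormalized) correlation used here
        is a simplification; this is not the MACS2 model-building code.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift d in 0..max_shift that maximizes the
    cross-correlation sum_i plus[i] * minus[i + d].

    Minus-strand tags sit downstream of plus-strand tags by roughly the
    fragment length, so the best d is an estimate of fragment size.
    """
    best_d, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        # zip stops at the shorter list, pairing plus[i] with minus[i + d].
        score = sum(p * m for p, m in zip(plus, minus[d:]))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


plus = [0] * 50
minus = [0] * 50
plus[10], plus[11] = 5, 3     # + strand pileup
minus[30], minus[31] = 5, 3   # - strand pileup, 20 bp downstream
print(best_shift(plus, minus, 25))  # 20
```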

        * Speed up p-value and q-value calculation

        This part is ten times faster now. I am using a dictionary to
        cache p-value results from the Poisson CDF function. A bit more
        memory will be used to increase speed. I hope this dictionary will
        not explode, since the possible pairs of ChIP signal and control
        lambda are hugely redundant. Also, I rewrote part of the q-value
        calculation.

        * Speed up peak detection

        This part is about a hundred times faster now. Optimizations
        include using Numpy functions as much as possible, and making loop
        bodies as small as possible.

        * Post-processing on differential calls

        After macs2diff finds differential binding sites between two
        conditions, it will try to annotate the peak calls from one of two
        conditions, describe the changes ...

        * Fragment size prediction in macs2diff

        Now by default, macs2diff will try to use the average fragment
        size from both condition 1 and condition 2 for tag extension and
        peak calling. Previously, by default, it used different sizes
        unless --nomodel was specified.

        Technically, I separated the model building processes out. So
        macs2diff will build fragment sizes for condition 1 and 2 in
        parallel (2 processes maximum), then perform 4-way comparisons in
        parallel (4 processes maximum).

        * Diff score

        Combine two p/qscore tracks together. At regions where condition 1
        is higher than condition 2, the score is positive; otherwise, it
        is negative.

        * SAMParser and BAMParser

        Bug fixed for paired-end sequencing data.

        * BedGraph.pyx

        Fixed a bug when calling peaks from a BedGraph file. It previously
        mistakenly output the same peaks multiple times at the end of a
        chromosome.

2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
        MACS version 2.0.9 (tag:alpha)

        * Auto fixation on predicted d is turned off by default!

        Previous --off-auto is now default. MACS will not automatically
        fix d less than 2 times the tag size according to
        --shiftsize. As tags are getting longer nowadays, d is more likely
        to be less than 2 times the tag size; however, d may still be
        meaningful and useful. Please judge it using your own wisdom.

        * Scaling issue

        Now, the default scaling when treatment and input are unbalanced
        has been adjusted. By default, the larger sample will be scaled
        down linearly to match the smaller sample. In this way, background
        noise will be reduced more than real signals, so we expect to have
        more specific results than the other way around (i.e. when
        --to-large is set).

        Also, an alternative option to randomly sample the larger data
        (--down-sample) is provided to replace default linear
        scaling. However, this option will make results irreproducible,
        so be careful.
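
        The two strategies can be contrasted in a short sketch, assuming
        per-sample total read counts are known. Both function names are
        hypothetical: `scale_to_smaller` returns linear factors to apply
        to each sample's pileup, while `down_sample` draws a random subset
        of reads. Passing a fixed seed makes the random subset repeatable,
        which is the kind of reproducibility a random seed provides.

```python
import random


def scale_to_smaller(treat_total, ctrl_total):
    """Return (treatment_factor, control_factor) that linearly scale the
    larger sample down to the depth of the smaller one."""
    if treat_total > ctrl_total:
        return ctrl_total / treat_total, 1.0
    return 1.0, treat_total / ctrl_total


def down_sample(reads, target, seed=None):
    """Alternative to linear scaling: randomly keep `target` reads.
    Without a fixed seed, the subset changes from run to run."""
    return random.Random(seed).sample(reads, target)


print(scale_to_smaller(20_000_000, 10_000_000))  # (0.5, 1.0)
```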
 721
 722         * randsample script
 723
 724         A new script 'randsample'  is added, which can randomly sample
 725         certain percentage or number of tags.
 726
 727         * Peak summit
 728
 729         Now, MACS will decide peak summits according to pileup height
 730         instead of qvalue scores. In this way, the summit may be more
 731         accurate.
 732
 733         * Diff score
 734
 735         MACS calculates q-value scores as differential scores. When comparing
 736         two conditions (say A and B), it computes the maximum q-score for
 737         comparing A to B (maxqscore_a2b) and for comparing B to A (maxqscore_b2a).
 738         If maxqscore_a2b is bigger, the diff score is +maxqscore_a2b;
 739         otherwise, the diff score is -1*maxqscore_b2a.
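The rule above fits in a tiny Python function. Ties fall into the "otherwise" branch, exactly as the wording implies. The function name is illustrative only.

```python
def diff_score(maxqscore_a2b: float, maxqscore_b2a: float) -> float:
    """Signed differential score from the two directional max q-scores."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b        # A vs B is stronger: positive score
    return -1 * maxqscore_b2a       # otherwise: negative score from B vs A


print(diff_score(8.0, 3.0))  # 8.0
print(diff_score(2.0, 5.0))  # -5.0
```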
 740
 741 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 742         MACS version 2.0.8 (tag:alpha)
 743
 744         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 745
 746         The new script bdgbroadcall and the new '--broad' option of the macs2
 747         script can be used to call broad regions, using a loose cutoff to
 748         link nearby significant regions. The output is in
 749         BED12 format.
 750
 751         * MACS2/IO/cScoreTrack.pyx
 752
 753         Fix the q-value calculation so that the values are forced to be monotonic.
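A textbook Benjamini-Hochberg sketch in Python. It forces the q-values to be monotonic by taking a running minimum from the largest p-value downward. This is an assumption about what "forced to be monotonic" means here. It is not MACS's own implementation, which operates on its score tracks rather than on a plain list of p-values.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg q-values, forced to be monotonic in p."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest. Each q is the minimum
    # of p * m / rank over its own rank and all larger ranks, so a smaller
    # p-value can never receive a larger q-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(bh_qvalues([0.001, 0.04, 0.045]))
```

Without the running minimum, the middle p-value here would get 0.04 * 3 / 2 = 0.06, which is larger than the 0.045 assigned to the largest p-value.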
 754
 755         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 756
 757         They are combined into one more powerful script called
 758         "filterdup". The script filterdup can filter duplicate reads
 759         according to sequencing depth and genome size. It can also
 760         convert any format supported by MACS to BED format.
 761
 762 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 763         MACS version 2.0.7 (tag:alpha)
 764
 765         * bin/macsdiff renamed to bin/bdgdiff
 766
 767         This script now works as a low-level fine-tuning tool, like bdgcmp
 768         and bdgpeakcall.
 769
 770         * bin/macs2diff
 771
 772         A new script that takes treatment and control files from two
 773         conditions, calculates the fragment size, uses local Poisson to get
 774         p-values and the BH procedure to get q-values, then combines the
 775         4-way results to call differential sites.
 776
 777         This script can use up to 4 CPUs to speed up the 4-way calculation.
 778         (I am trying multiprocessing in Python.)
 779
 780         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 781         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 782         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 783
 784         All of the above files are modified for the new macs2diff script.
 785
 786         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 787
 788         A q-value of 0.01 is now the default cutoff. If -p is specified, a
 789         p-value cutoff will be used instead.
 790
 791 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 792         MACS version 2.0.6 (tag:alpha)
 793
 794         * bin/macsdiff
 795
 796         A script to call differential regions. A naive way is introduced
 797         to find the regions where:
 798
 799         1. signal from condition 1 is larger than input 1 and condition 2 --
 800         unique region in condition 1;
 801         2. signal from condition 2 is larger than input 2 and condition 1
 802         -- unique region in condition 2;
 803         3. signal from condition 1 is larger than input 1, signal from
 804         condition 2 is larger than input 2, but neither signal from
 805         condition 1 nor from condition 2 is larger than the other.
 806
 807         Here 'larger' means that the p-value or q-value from a Poisson test
 808         is below a certain cutoff.
 809
 810         (I will make another script to wrap multiple scripts for
 811         differential calling.)
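The three cases listed for macsdiff can be expressed as a small Python decision function. Each boolean stands for one "larger" test, meaning a Poisson p-value or q-value below the cutoff. The returned labels and the function names are illustrative and are not macsdiff's actual output.

```python
def larger(pvalue: float, cutoff: float) -> bool:
    """'Larger' in the sense above: the Poisson test passes the cutoff."""
    return pvalue < cutoff


def classify_region(c1_gt_input1: bool, c2_gt_input2: bool,
                    c1_gt_c2: bool, c2_gt_c1: bool) -> str:
    if c1_gt_input1 and c1_gt_c2:
        return "unique to condition 1"      # case 1
    if c2_gt_input2 and c2_gt_c1:
        return "unique to condition 2"      # case 2
    if c1_gt_input1 and c2_gt_input2:
        return "enriched in both"           # case 3: neither beats the other
    return "not called"


print(classify_region(larger(1e-8, 1e-5), larger(0.2, 1e-5),
                      larger(1e-6, 1e-5), larger(0.9, 1e-5)))
# unique to condition 1
```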
 812
 813 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 814         MACS version 2.0.5 (tag:alpha)
 815
 816         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 817         MACS2/IO/cPeakIO.pyx
 818
 819         Use a hash to store peak information. Add back the feature to handle
 820         data without control.
 821
 822         Fix a bug that incorrectly allowed small peaks at the end of
 823         chromosomes.
 824
 825         * bin/bdgpeakcall, bin/bdgcmp
 826
 827         Fix bugs. bdgpeakcall can now output encodePeak format.
 828
 829 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
 830         MACS version 2.0.4 (tag:alpha)
 831
 832         * cPeakDetect.py
 833
 834         Fix a bug to correctly assign lambda_bg when --to-small is
 835         set. Thanks Junya Seo!
 836
 837         Add rank and number-of-bp columns to the pvalue-qvalue table.
 838
 839         * cScoreTrack.py
 840
 841         Fix bugs to correctly deal with peakless chromosomes. Thanks
 842         Vaibhav Jain!
 843
 844         Use AFDR for independent tests instead.
 845
 846         * encodePeak
 847
 848         MACS can now output peak coordinates together with p-values,
 849         q-values, and summit positions in a single encodePeak format file
 850         (designed for the ENCODE project). This file can be loaded into
 851         the UCSC browser. Some specific columns are defined as: 5th:
 852         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
 853         -log10qvalue, 10th: summit position relative to peak start.
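A Python sketch that assembles one such line. Columns 1-4 and 6 (chromosome, start, end, name, strand) follow the standard encodePeak layout, and the remaining columns follow the definitions above. The five-decimal formatting and the function name are assumptions.

```python
def encode_peak_line(chrom: str, start: int, end: int, name: str,
                     summit: int, fold_change: float,
                     neg_log10_p: float, neg_log10_q: float) -> str:
    """Build one tab-separated encodePeak record (10 columns)."""
    fields = [
        chrom, str(start), str(end), name,  # 1st-4th: location and name
        str(int(neg_log10_p * 10)),         # 5th: int(-log10pvalue*10)
        ".",                                # 6th: strand (not used)
        f"{fold_change:.5f}",               # 7th: fold-change
        f"{neg_log10_p:.5f}",               # 8th: -log10pvalue
        f"{neg_log10_q:.5f}",               # 9th: -log10qvalue
        str(summit - start),                # 10th: summit relative to start
    ]
    return "\t".join(fields)


print(encode_peak_line("chr1", 1000, 1400, "peak_1", 1250, 4.2, 12.5, 9.1))
```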
 854
 855
 856 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 857         MACS version 2.0.3 (tag:alpha)
 858
 859         * Rich output with qvalue, fold enrichment, and pileup height
 860
 861         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
 862         procedure:
 863
 864         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
 865
 866         The xls output file is now similar to the previous one. The
 867         differences from the previous file are:
 868
 869         1. Summit is now the absolute summit position, instead of the
 870            relative summit position;
 871         2. 'Pileup' is the previous 'tag' column. It is the extended
 872            fragment pileup at the peak summit;
 873         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
 874            5.00 means 1e-5, which is simpler and less confusing;
 875         4. The FDR column becomes the '-log10(qvalue)' column;
 876         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
 877            the values at the peak summit.
 878
 879         * Extra output files
 880
 881         NAME_pqtable.txt contains pvalue and qvalue relationships.
 882
 883         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
 884         and -log10qvalue scores in BedGraph format. Nearby regions with
 885         the same value are not merged.
 886
 887         * Separation of FeatIO.py
 888
 889         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
 890         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
 891         implemented to store pileup, local lambda, pvalue, and qvalue
 892         all together in cScoreTrack.pyx.
 893
 894         * Experimental option --half-ext
 895
 896         Inspired by the NPS algorithm, I added an experimental option,
 897         --half-ext, to let MACS extend each ChIP fragment to only 1/2 d
 898         around its middle point.
 899
 900 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
 901         MACS version 2.0.2 (tag:alpha)
 902
 903         * macs2
 904
 905         Add an error check for the case where the treatment file and the
 906         control file share no common chromosome names.
 907
 908         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
 909
 910         Reduce memory usage by removing deepcopy() calls.
 911
 912         * Modify README documents and others.
 913
 914 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 915         MACS Version 2.0.1 (tag:alpha)
 916
 917         * cPileup.pyx, cPeakDetect.pyx and peak calling process
 918
 919         Jie suggested to me a brilliantly simple method to pile up fragments
 920         into a bedGraph track. It is much faster than the previous
 921         function, i.e., faster than MACS1.3 or MACS1.4, so I can now include
 922         the large local lambda calculation in MACSv2. I now generate three
 923         bedGraphs for the d-size, slocal-size and llocal-size local
 924         biases, and take their maximum as the local lambda
 925         bedGraph track.
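The entry does not spell out Jie's method. This Python sketch assumes one common approach with the same effect: sort all fragment start and end positions separately, then sweep through them once, adding 1 at each start and subtracting 1 at each end. The cost is O(n log n) for the two sorts plus a single linear pass. Fragments are treated as half-open [start, end) intervals, and the function name is hypothetical.

```python
from typing import List, Tuple


def pileup(fragments: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Pile up half-open [start, end) fragments into (start, end, depth)
    bedGraph intervals, skipping zero-depth gaps."""
    starts = sorted(s for s, _ in fragments)
    ends = sorted(e for _, e in fragments)
    intervals = []
    i = j = 0
    depth = 0
    prev = None
    while j < len(ends):
        # Take the next event. On a tie, process the end first so that
        # touching fragments do not overlap (half-open intervals).
        if i < len(starts) and starts[i] < ends[j]:
            pos, delta = starts[i], 1
            i += 1
        else:
            pos, delta = ends[j], -1
            j += 1
        if prev is not None and pos > prev and depth > 0:
            intervals.append((prev, pos, depth))
        depth += delta
        prev = pos
    return intervals


print(pileup([(0, 10), (5, 15)]))  # [(0, 5, 1), (5, 10, 2), (10, 15, 1)]
```

Adjacent intervals with equal depth are not merged in this sketch.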
 926
 927         Minor: add_loc in bedGraphTrackI now correctly merges a
 928         region with its preceding region if their values are the same.
 929
 930         * macs2
 931
 932         Add an option to shift control tags before extension. By default,
 933         control tags will be extended to both sides regardless of strand
 934         information.
 935
 936 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
 937         MACS Version 2.0.0 (tag:alpha)
 938
 939         * Use bedGraph type to store data internally and externally.
 940
 941         In theory, profiles can have one-basepair resolution. Files are 10
 942         times smaller, and even smaller after conversion to
 943         bigWig for visualization.
 944
 945         * Peak calling process modified. Better peak boundary detection.
 946
 947         Extend ChIP tags to d and pile them up to get a ChIP bedGraph. Extend
 948         control tags to d and to 1,000bp, and pile them up into two bedGraphs
 949         (the 1000bp one is averaged to d size). Then take the maximum value
 950         of these two tracks and a global background to get a
 951         local-lambda bedGraph.
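A per-position Python sketch of the local lambda step. It assumes both control pileups are sampled at the same positions and that "averaged to d size" means multiplying the 1,000bp pileup by d/1000. The function and parameter names are hypothetical.

```python
from typing import List


def local_lambda(ctrl_d: List[float], ctrl_1k: List[float],
                 d: int, lambda_bg: float, window: int = 1000) -> List[float]:
    """Maximum of the d-extended control pileup, the 1kb pileup rescaled
    to d, and the global background, position by position."""
    if len(ctrl_d) != len(ctrl_1k):
        raise ValueError("control tracks must cover the same positions")
    scale = d / window
    return [max(a, b * scale, lambda_bg) for a, b in zip(ctrl_d, ctrl_1k)]


print(local_lambda([2.0, 0.5, 0.0], [10.0, 1.0, 30.0], d=200, lambda_bg=1.0))
```

Taking the maximum means a region is tested against the strongest of the three background estimates, so local input bias cannot make a peak look more significant than the global background would.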
 952
 953         Use -10log10(poisson_pvalue) as the score to generate a score track
 954         before peak calling.
 955
 956         Peaks are called in a general way based on a score cutoff, a minimum
 957         peak length, and a maximum gap between nearby peaks.
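A minimal Python sketch of that general procedure. It assumes the score track is a sorted list of (start, end, score) bedGraph intervals and that a score at or above the cutoff counts as passing. The function and parameter names are illustrative.

```python
from typing import List, Tuple


def call_peaks(track: List[Tuple[int, int, float]], cutoff: float,
               min_length: int, max_gap: int) -> List[Tuple[int, int]]:
    """Merge passing intervals separated by <= max_gap bp and keep merged
    regions at least min_length bp long. `track` must be sorted."""
    peaks = []
    current = None  # [start, end] of the region being grown
    for start, end, score in track:
        if score < cutoff:
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = end  # close enough: extend the current region
        else:
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append((current[0], current[1]))
            current = [start, end]
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append((current[0], current[1]))
    return peaks


track = [(0, 100, 5.0), (100, 150, 1.0), (150, 300, 6.0), (1000, 1050, 7.0)]
print(call_peaks(track, cutoff=3.0, min_length=200, max_gap=50))  # [(0, 300)]
```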
 958
 959         * Option changes.
 960
 961         Wiggle file output is removed; only bedGraph output is supported
 962         now. Saving bedGraphs is highly recommended since it costs no
 963         extra time: because of the peak calling algorithm in MACS v2,
 964         bedGraphs are generated internally even if you don't save them to
 965         disk.
 966
 967         * cProb.pyx
 968
 969         We can now calculate Poisson p-values in log space, so the score
 970         (-10*log10pvalue) no longer has an upper limit of 3100 due to the
 971         precision of floating-point numbers.
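A Python sketch of the idea. It assumes the upper-tail p-value P(X >= k) is accumulated term by term with a log-sum-exp update; MACS's cProb.pyx may use a different series or stopping rule. Working in log space keeps -10*log10(pvalue) finite even when the p-value itself would underflow a double. The function name is hypothetical.

```python
import math


def log10_poisson_upper(k: int, lam: float) -> float:
    """log10 P(X >= k) for X ~ Poisson(lam), accumulated in log space."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 0.0  # P(X >= 0) = 1

    def log_pmf(i: int) -> float:
        # ln P(X = i) = i*ln(lam) - lam - ln(i!)
        return i * math.log(lam) - lam - math.lgamma(i + 1)

    log_sum = log_pmf(k)
    i = k
    while True:
        i += 1
        lp = log_pmf(i)
        # Log-sum-exp update: ln(e^log_sum + e^lp) without leaving log space
        hi, lo = max(log_sum, lp), min(log_sum, lp)
        log_sum = hi + math.log1p(math.exp(lo - hi))
        # Past the mode the terms only shrink; stop once they are negligible.
        if i > lam and lp - log_sum < -40:
            break
    return log_sum / math.log(10)


score = -10 * log10_poisson_upper(500, 10.0)
print(score > 3100)  # True
```

Here P(X >= 500) for lambda 10 is around 1e-638, which a double cannot hold, yet its log-space score is finite.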
 972
 973         * Cython is adopted to speed up Python code.
 974
 975 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
 976         Small fixes
 977
 978         * Replaced with the newest WigTrackI class and fixed the wignorm script.
 979
 980 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 981         Version 1.4.0rc2 (Valentine)
 982
 983         * --single-wig option is renamed to --single-profile
 984
 985         * BedGraph output with --bdg or -B option.
 986
 987         The BedGraph output provides a 1bp resolution fragment pileup
 988         profile. Files are smaller than wig files. This option can be
 989         combined with the --single-profile option to produce a bedGraph file
 990         for the whole genome. This option also makes --space and
 991         --call-subpeaks invalid.
 992
 993         * Fix the description of --shiftsize to correctly state that the
 994         value is 1/2 d (fragment size).
 995
 996         * Fix a bug in the call to __filter_w_control_tags when control is
 997         not available.
 998
 999         * Fix a bug on --to-small option. Now it works as expected.
1000
1001         * Fix a bug in counting the tags in a candidate peak region, where an
1002         extra tag might be included. (Thanks to Jake Biesinger!)
1003
1004         * Fix the bug for peaks extending beyond the chromosome
1005         start. If a minus strand tag goes beyond the chromosome start
1006         after extension by d, it will be thrown out.
1007
1008         * Post-process script for a combined wig file:
1009
1010         The "wignorm" command can be called after a full run of MACS14 as
1011         a post-processing step. wignorm calculates the local background from
1012         the MACS14 control wig file, then uses either fold change,
1013         -10*log10(pvalue) from a Poisson test, or the difference after asinh
1014         transformation as the score to build a single wig track
1015         representing the binding strength. This script takes a
1016         significantly long time to run.
1017
1018         * --wigextend is now obsolete.
1019
1020 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1021         Version 1.4.0rc1 (Starry Sky)
1022
1023         * Duplicate reads option
1024
1025         The --keep-dup behavior has changed. Users can now specify how many
1026         reads to keep at the same genomic location: 'auto' lets
1027         MACS decide the number based on a binomial distribution, and 'all'
1028         lets MACS keep all reads.
1029
1030         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1031
1032         By default, MACS will now scale the smaller dataset to the bigger
1033         dataset. For instance, if IP has 10 million reads and Input has 5
1034         million, MACS will double the lambda value calculated from Input
1035         reads when calling BOTH the positive and negative
1036         peaks. This addresses the issue caused by unbalanced numbers of
1037         reads in IP and Input. If --to-small is turned on, MACS will
1038         scale the larger dataset to the smaller one. So from now on, if d
1039         is fixed, the peaks from a MACS call for A vs B should be
1040         identical to the negative peaks from a call for B vs A.
1041
1042 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1043         Version 1.4.0beta (summer wishes)
1044
1045         * New features
1046
1047         ** Model building
1048
1049         The default behavior in the model building step has changed
1050         slightly. When MACS can't find enough pairs to build the model
1051         (implemented in the alpha version) or the modeled fragment length is
1052         less than 2 times the tag length (implemented in the beta version),
1053         MACS will use 2 times the --shiftsize value as the fragment length in
1054         later analysis. --off-auto can turn off this default behavior.
1055
1056         ** Redundant tag filtering
1057
1058         The IO module has been rewritten. The redundant tag filtering process
1059         is simpler and works as promised. The maximum allowed number
1060         of tags at the exact same location is calculated from the
1061         sequencing depth and genome size using a binomial distribution,
1062         for TREATMENT and CONTROL separately (previously only
1063         TREATMENT was considered). The exact same location means the same
1064         coordinate and the same strand. MACS then keeps at most
1065         this number of tags at the exact same location in the following
1066         analysis. The --keep-dup option lets MACS skip the filtering and
1067         keep all the tags. However, this may introduce a lot of sequencing
1068         bias, so you may get many false positive peaks.
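A Python sketch of the binomial cutoff. It assumes each tag lands on one of genome_size positions with probability 1/genome_size, and that the limit is the smallest k with P(X > k) <= 1e-5. The 1e-5 threshold and the function name are assumptions for illustration; this entry does not state them.

```python
import math


def max_dup_tags(genome_size: int, n_tags: int, cutoff: float = 1e-5) -> int:
    """Smallest k with P(X > k) <= cutoff, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n_tags + 1)
    cdf = 0.0
    for k in range(n_tags + 1):
        # ln C(n, k) + k ln p + (n - k) ln(1 - p), evaluated with lgamma
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n_tags - k + 1)
                   + k * log_p + (n_tags - k) * log_q)
        cdf += math.exp(log_pmf)
        if 1.0 - cdf <= cutoff:
            return k
    return n_tags


# Human-sized genome (2.7 Gbp)
print(max_dup_tags(2_700_000_000, 10_000_000))  # 1
print(max_dup_tags(2_700_000_000, 20_000_000))  # 2
```

Deeper sequencing raises the expected number of tags per position, so more duplicates are tolerated before they are treated as bias.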
1069
1070         ** Single wiggle mode
1071
1072         First, note that this is not the score track I
1073         described before. By default, MACS generates fragment pileup wiggle
1074         files for each chromosome separately. When you use the
1075         --single-wig option, MACS generates a single wiggle file for
1076         all chromosomes, so you get one wig.gz for TREATMENT and
1077         another wig.gz for CONTROL if available.
1078
1079         ** Sniff -- automatic format detection
1080
1081         Now, by default or with "-f AUTO", MACS decides the input file
1082         format automatically. Technically, it will try to read at most
1083         1000 records for the first 10 non-comment lines. If it succeeds,
1084         the format is decided. I recommend not using AUTO and specifying the
1085         right format for your input files, unless you combine different
1086         formats in a single MACS run.
1087
1088         * Options changes
1089
1090         --single-wig and --keep-dup are added. Check the previous sections of
1091         this ChangeLog for details.
1092
1093         -f (--format) AUTO is now the default option.
1094
1095         --slocal default: 1000
1096         --llocal default: 10000
1097
1098         * Bug fixed
1099
1100         The setup script will stop the installation if the Python version is
1101         not python2.6 or python2.7.
1102
1103         The local lambda calculation has been changed back. MACS will check
1104         peak_region, slocal (default 1K) and llocal (default 10K) for the
1105         local bias. The previous 200bp default caused MACS to miss
1106         some peaks where the input bias is very sharp.
1107
1108         sam2bed.py script is corrected.
1109
1110         The relative position in the xls output is fixed.
1111
1112         The parser for ELAND_export is fixed to skip some of the no-match
1113         lines, and elandexport2bed.py is fixed too (however, I can't
1114         guarantee that it works on all eland_export files).
1115
1116 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1117         Version 1.4.0alpha2 (be smarter)
1118
1119         * Options changes
1120
1121         --gsize now provides shortcuts for common genomes, including
1122         human, mouse, C. elegans and fruitfly.
1123
1124         --llocal will now be 5000 bps if there is no input file, so that
1125         local lambda doesn't overly suppress enriched binding sites.
1126
1127 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1128         Version 1.4alpha (be smarter)
1129
1130         * Options changes
1131
1132         The --tsize option is redesigned. MACS will use the first 10 lines of
1133         the input to decide the tag size. If the user specifies --tsize, it
1134         will override the automatically decided tag size.
1135
1136         --lambdaset is replaced by --slocal and --llocal, which define the
1137         small local region and the large local region.
1138
1139         --bw has no effect on the scan-window size now. It only affects the
1140         paired-peaks model process.
1141
1142         * Model building
1143
1144         During model building, MACS will pick out the enriched regions
1145         that are neither too high nor too low to build the paired-peak
1146         model. By default, the range is from fold 10 to fold 30. If MACS
1147         fails to build the model, by default it will use the nomodel
1148         settings, such as shiftsize=100bps, to shift and extend each
1149         tag. This behavior can be turned off by '--off-auto'.
1150
1151         * Output files
1152
1153         An extra file, *_summits.bed, containing all the summit positions
1154         is saved. The option '--call-subpeaks' will invoke
1155         PeakSplitter, developed by Mali Salmon, to split wide peaks into
1156         smaller subpeaks.
1157
1158         * Sniff (will be in beta)
1159
1160         Automatically recognize the input file format, so users can combine
1161         different formats in one MACS run.
1162
1163         Not implemented features/TODO:
1164
1165         * Algorithms (in the near future?)
1166
1167         MACS will try to refine the peak boundaries by calculating the
1168         scores for every point in the candidate peak regions. The score
1169         will be -10*log(10,pvalue) on a local Poisson distribution. A
1170         cutoff specified by users (--pvalue) will be applied to find the
1171         precise sub-peaks in the original candidate peak region. Peak
1172         boundaries and peak summit positions will be saved in separate BED
1173         files.
1174
1175         * Single wiggle track (in the near future?)
1176
1177         A single wiggle track will be generated to save the scores within
1178         candidate peak regions at 10bp resolution. The wiggle file
1179         is in fixedStep format.
1180
1181
1182 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1183         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1184
1185         * bin/Constants.py
1186
1187         Fixed typo. FCSTEP -> FESTEP
1188
1189         * lib/PeakDetect.py
1190
1191         The 'femax' attribute bug is fixed
1192
1193 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1194         Version 1.3.7 (Oktoberfest)
1195
1196         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1197
1198         Enhancements by Peter Chines:
1199
1200         1. gzip files are supported.
1201         2. When --diag is on, users can set the increment and endpoint for
1202         fold enrichment analysis with --fe-step and --fe-max.
1203
1204         Enhancements by Davide Cittaro:
1205
1206         1. BAM and SAM formats are supported.
1207         2. small changes in the header lines of wiggle output.
1208
1209         Enhancements by me:
1210         1. I added the --fe-min option;
1211         2. Bowtie ASCII output with suffix ".map" is supported.
1212
1213         Bugs fixed:
1214
1215         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1216         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1217         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1218         4. Some "fold change" have been changed to "fold enrichment".
1219
1220 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1221         Version 1.3.6.1 (default parameter change)
1222
1223         * bin/macs, lib/PeakDetect.py
1224
1225         "--oldfdr" is removed. The 'oldfdr' behaviour becomes the
1226         default. "--futurefdr" is added to turn on the 'new' method
1227         introduced in 1.3.6. By default it's off.
1228
1229         * lib/PeakDetect.py
1230
1231         Fixed a bug. The p-value is corrected a little bit.
1232
1233
1234 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1235         Version 1.3.6 (Birthday cake)
1236
1237         * bin/macs
1238
1239         "track name" is added to the header of the BED output file.
1240
1241         Now the default peak detection method is to consider 5k and 10k
1242         nearby regions in treatment data and peak location, 1k, 5k, and
1243         10k regions in control data to calculate local bias. The old
1244         method can be called through '--old' option.
1245
1246         Information about the numbers of total/unique tags in treatment and
1247         control will be saved in the final .xls output.
1248
1249         * lib/IO/__init__.py
1250
1251         ".fa" will be removed from input tag alignment so only the
1252         chromosome names are kept.
1253
1254         WigTrackI class is added for Wiggle like data structure. (not used
1255         now)
1256
1257         The parser for ELAND multi PET files has been fixed. Now the 5'
1258         tag position for a pair will be kept, whereas in the previous
1259         version, the middle points were kept.
1260
1261         * lib/IO/BinKeeper.py
1262
1263         The BinKeeperI class is inspired by Jim Kent's library for the UCSC
1264         genome browser; it can quickly access values in a certain region of
1265         a large wiggle-like data file. (not used now)
1266
1267         * lib/OptValidator.py
1268
1269         Typo fixed.
1270
1271         * lib/PeakDetect.py
1272
1273         Now the default peak detection method is to consider 5k and 10k
1274         nearby regions in treatment data and peak location, 1k, 5k, and
1275         10k regions in control data to calculate local bias. The old
1276         method can be called through '--old' option.
1277
1278         Two columns have been added to the BED output file. 4th column: peak
1279         name; 5th column: peak score using -10log(10,pvalue) as score.
1280
1281         * setup.py
1282
1283         Add support to build a Mac App through 'setup.py py2app', or a
1284         Windows executable through 'setup.py py2exe'. You need to install
1285         py2app or py2exe package in order to use these functions.
1286
1287 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1288         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1289
1290         * PeakDetect.py
1291
1292         Now, besides 1k, 5k, 10k, MACS will also consider the peak-size region
1293         in control data to calculate the local lambda for each peak. Peak
1294         calling results will be slightly different from the previous version,
1295         beware!
1296
1297         * OptValidator.py
1298
1299         Typo fixed, ELANDParser -> ELANDResultParser
1300
1301         * OutputWriter.py
1302
1303         The modeled d value will now be shown on the model figure.
1304
1305 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1306         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1307
1308         * macs, IO/__init__.py, PeakDetect.py
1309
1310         Add support for ELAND multi format. Add support for Paired-End
1311         experiments; in this case, 5' end and 3' end ELAND multi format files
1312         are required for treatment or control data. See the 00README file for
1313         details.
1314
1315         Add the wigextend option.
1316
1317         Add the petdist option for Paired-End Tag experiments, which sets the
1318         best distance between 5' and 3' tags.
1319
1320         * PeakDetect.py
1321
1322         Fixed a bug that caused the end position of every peak region to be
1323         incorrectly increased by 1 bp. (Thanks Mali Salmon!)
1324
1325         * OutputWriter.py
1326
1327         Fix bugs in generating wiggle files. The start position of
1328         wiggle files is set to 1 instead of 0.
1329
1330         Fix a bug where, every 10M bps, signals in the first 'd' range were
1331         lower than actual. (Thanks Mali Salmon!)
1332
1333
1334 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1335         Version 1.3.3 (wiggle bugs fixed)
1336
1337         * OutputWriter.py
1338
1339         Fix bugs in generating wiggle files: 1. 'span=' is added to the
1340         'variableStep' line; 2. previously, every 10M bps, the coordinates
1341         were wrongly shifted to the right by 'd' basepairs.
1342
1343         * macs, PeakDetect.py
1344
1345         Add an option to save wiggle files at different resolutions.
1346
1347 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1348         Version 1.3.2 (tiny bugs fixed)
1349
1350         * IO/__init__.py
1351
1352         Fix 65536 -> 65535. (Thanks Joon)
1353
1354         * Prob.py
1355
1356         Improved the binomial function for extra large numbers. Imported
1357         from the Cistrome project.
1358
1359         * PeakDetect.py
1360
1361         If the treatment channel lacks reads on a chromosome included in the
1362         control channel, or vice versa, MACS will not exit. (Thanks Shaun
1363         Mahony)
1364
1365         Instead, MACS will fake a tag at position -1 when calling
1366         treatment peaks vs control, but will ignore the chromosome while
1367         calling negative peaks.
1368
1369 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1370         Version 1.3.1 (tiny bugs fixed version)
1371
1372         * Prob.py
1373
1374         Hyunjin Gene Shin contributed some code to Prob.py. Now the
1375         binomial functions can tolerate large and small numbers.
1376
1377         * IO/__init__.py
1378
1379         Parsers now split lines in BED/ELAND files using any
1380         whitespace. 'track' or 'browser' lines will be regarded as
1381         comment lines. A bug is fixed when throwing StrandFormatError. The
1382         maximum redundant tag number at a single position can be no less
1383         than 65536.
1384
1385
1386 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1387         Version 1.3 (naming clarification version)
1388
1389         * Naming clarification changes according to our manuscript:
1390
1391         'frag_len' is changed to 'd'.
1392
1393         'fold_change' is changed to 'fold_enrichment'.
1394
1395         We suggest that users determine the '--bw' parameter from the real
1396         sonication size.
1397
1398         Maximum FDR is 100% in the output file.
1399
1400         And other clarifications in 00README file and the documents on the
1401         website.
1402
1403         * IO/__init__.py
1404         If the redundant tag number at a single position is over 32767,
1405         just record 32767, instead of raising an overflow exception.
1406
1407         * setup.py
1408         Fixed a typo.
1409
1410         * PeakDetect.py
1411         Bug fixed for diagnosis report.
1412
1413
1414 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1415         Version 1.2.2gamma
1416
1417         * Serious bugs fix:
1418
1419         Poisson distribution CDF and inverse CDF functions are
1420         corrected. They now produce correct results even for huge lambda,
1421         so the p-value and FDR values in the final Excel sheet
1422         are corrected.
1423
1424         The IO package can now tolerate some rare cases; ELANDParser in the
1425         IO package is fixed. (Thanks Bogdan)
1426
1427         * Improvement:
1428
1429         Reversed paired peaks in the model are rejected, so there will be no
1430         negative 'frag_len'. (Thanks Bogdan)
1431
1432         * Features added:
1433
1434         The diagnosis function is completed. It can output a table file for
1435         users to estimate their sequencing depth.
1436
1437
1438 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1439         Version 1.2
1440
1441         * Prob.py is added!
1442
1443         GSL is totally removed from MACS. Instead, I have implemented the
1444         CDF and inverse CDF for Poisson and binomial distributions purely
1445         in Python.
1446
1447         * Constants.py is added!
1448
1449         Organize constants used in MACS in the Constants.py file.
1450
1451         * All other files are modified!
1452
1453         The fold change calculation is modified. The fold change is now only
1454         calculated at the peak summit position instead of the whole peak
1455         region. The values will be higher and more robust than before.
1456
1457         Features added:
1458
1459         1. MACS can save wiggle format files containing the tag number at
1460         every 10 bp along the genome. Tags are shifted according to our
1461         model before they are counted.
1462
1463         2. Model building and local lambda calculation can be skipped with
1464         certain options.
1465
1466         3. A diagnosis report can be generated through the '--diag'
1467         option. This report can help you estimate the
1468         sequencing saturation. This function is only in beta stage.
1469
1470         4. FDR calculation speed is greatly improved.
1471
1472 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1473         Version 1.1
1474
1475         * TabIO, PeakModel.py ...
1476         Bug fixed to let MACS tolerate cases where there is no tag on
1477         either the plus strand or the minus strand.
1478
1479         * setup.py
1480         Check the version of Python. If the version is lower than 2.4,
1481         refuse to install with a warning.
1482
1483
1484 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1485         MACS version 2.0.10 20130731 (tag:alpha)
1486
1487         * callpeak --call-summits
1488
1489         Fix bugs causing the callpeak --call-summits option to generate
1490         extra peaks and peak boundaries inconsistent with the
1491         default option. Thanks Ben Levinson!
1492
1493         * bdgcmp output
1494
1495         Fix bugs causing bdgcmp to output all logLR values as positive. Now
1496         'depletion' can be correctly represented as negative values.
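A Python sketch of a signed score with that behavior. It assumes the Poisson log-likelihood ratio of a treatment value x against a control value y, x*ln(x/y) + y - x, which is never negative, converted to log10 and given the sign of x - y. This illustrates the fixed behavior; it is not copied from bdgcmp, and the function name is hypothetical.

```python
import math


def signed_loglr(x: float, y: float) -> float:
    """Signed Poisson log10 likelihood ratio: >0 enrichment, <0 depletion."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    llr = (x * (math.log(x) - math.log(y)) + y - x) / math.log(10)
    if x > y:
        return llr
    if x < y:
        return -llr
    return 0.0


print(signed_loglr(10.0, 5.0) > 0)  # True: enrichment
print(signed_loglr(5.0, 10.0) < 0)  # True: depletion
```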
1497
1498         * bdgdiff
1499
1500         Fix the behavior of the bdgdiff module. It can now take four
1501         bedGraph files, then use logLR as the cutoff to call differential
1502         regions. Check the bdgdiff command line for details.
1503
1504 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1505         MACS version 2.0.10 20130713 (tag:alpha)
1506
1507         * Fix bugs in the broadPeak and gappedPeak output.
1508
1509         Note: weak broad regions without any strongly enriched
1510         region inside won't be saved in the gappedPeak file.
1511
1512         * bdgcmp -T and -C are merged into -S, and the description is updated.
1513
1514         Now you can use it to override the SPMR values in your bdgcmp
1515         input. Using SPMR values (from 'callpeak --SPMR -B') to calculate
1516         statistics gives odd results (in most cases, lower significance)
1517         that are inconsistent with MACS2 callpeak behavior. So if you have
1518         SPMR bedGraphs, input the smaller sample size in MILLIONS of reads
1519         (or the larger one if 'callpeak --to-large' was used).
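
        The -S arithmetic can be sketched like this. It assumes SPMR
        (signal per million reads) is the raw pileup divided by the
        sample size in millions, so multiplying by the -S value recovers
        the raw pileup. The helper name is hypothetical and is not part
        of bdgcmp.

```python
def spmr_to_pileup(spmr_values, effective_reads):
    """Hypothetical helper: undo SPMR scaling.

    Assumes SPMR = raw pileup / (effective reads / 1e6), so the value
    you would pass to bdgcmp -S is effective_reads / 1e6.
    """
    s = effective_reads / 1_000_000  # sample size in millions
    return [v * s for v in spmr_values]

# e.g. 20,000,000 effective reads in the smaller sample -> -S 20.0
print(spmr_to_pileup([0.5, 2.0], 20_000_000))  # [10.0, 40.0]
```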
1520
1521 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1522         MACS version 2.0.10 20130710 (tag:alpha)
1523
1524         * Fix the BED-style output formats of the callpeak module:
1525
1526         1) without --broad: narrowPeak (BED6+4) and a BED file of summits
1527         will be written. The old BED format file won't be saved.
1528
1529         2) with --broad: broadPeak (BED6+3) for broad region and
1530         gappedPeak (BED12+3) for chained enriched regions will be
1531         written. The old BED, narrowPeak, and summit files won't be
1532         saved.
1533
1534         * bdgcmp can now accept a list of methods to calculate scores, so
1535         you can run it once to generate multiple types of scores. Thanks to
1536         Jon Urban for this suggestion!
1537
1538         * C code is regenerated with Cython 0.19.1.
1539
1540 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1541         MACS version 2.0.10 20130520 (tag:alpha)
1542
1543         * Broad peak calling modules are modified to report all
1544         relaxed regions even when there is no strong enrichment inside.
1545
1546 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1547         MACS version 2.0.10 20130501 (tag:alpha)
1548
1549         * Memory usage is decreased to about 1/4-1/5 of previous usage.
1550         Now the internal data structures and algorithms are both
1551         reorganized so that intermediate data isn't kept in memory;
1552         instead it is calculated on the fly. The new MACS2 takes longer
1553         (1.5 to 2 times) but uses less memory, so it is more usable on
1554         servers with little memory.
1555
1556         * --seed option is added to callpeak and randsample commands
1557         Thanks to Mathieu Gineste for this suggestion!
1558
1559 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1560         MACS version 2.0.10 20130306 (tag:alpha)
1561
1562         * diffpeak module: a new module to detect differential binding sites
1563         with more statistics.
1564
1565         * Introduced --refine-peaks
1566         Calculates read balance to refine peak summits.
1567
1568         * Output file names
1569         Correct encodePeak to narrowPeak, and broadPeak to bed12.
1570
1571 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1572         MACS version 2.0.10 (tag:alpha not released)
1573
1574         * Introduced BAMPEParser
1575         Reads paired-end (PE) data directly; requires bedtools for now.
1576
1577         * Introduced --call-summits
1578         Uses signal processing methods to call overlapping peaks
1579
1580         * Added --no-trackline
1581         By default, files have descriptive tracklines now
1582
1583         * new refinepeak command (experimental)
1584         This new function uses a method similar to SPP's (wtd) to
1585         analyze the raw tag distribution in peak regions, then redefines
1586         the peak summit as the position around which plus and minus tags
1587         are evenly distributed.
1588
1589         * Changes to output
1590         cPeakDetect.pyx has full support for the new print/write methods,
1591         --call-peaks, BAMPEParser, and the use of paired-end data.
1592
1593         * Parser optimization
1594
1595         cParser.pyx is rewritten to use io.BufferedReader for speed;
1596         parsing is now about twice as fast.
1597
1598         Code is reorganized: most functions are inherited from the
1599         GenericParser class.
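
        A minimal sketch of this layout: the base class owns the buffered
        reading loop, and each format subclass only parses one line. The
        class names echo the entry, but their methods, the 1 MB buffer
        size, and the skipped header prefixes are assumptions for
        illustration, not the actual cParser.pyx API.

```python
import io

class GenericParser:
    """Hypothetical base class: shared buffered loop, per-line hook."""

    def __init__(self, raw, buffer_size=1 << 20):
        # raw: any readable binary stream, e.g. open(path, "rb", buffering=0)
        self.fh = io.BufferedReader(raw, buffer_size=buffer_size)

    def parse_line(self, line):
        raise NotImplementedError  # each format overrides this

    def tags(self):
        for line in self.fh:  # iterating a BufferedReader yields bytes lines
            if not line.strip() or line.startswith((b"#", b"track", b"browser")):
                continue  # skip blank, comment and header lines
            tag = self.parse_line(line.rstrip(b"\r\n"))
            if tag is not None:
                yield tag

class BEDParser(GenericParser):
    def parse_line(self, line):
        fields = line.split(b"\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else b"+"
        # keep the 5' end: start for + reads, end for - reads
        return (chrom, start if strand == b"+" else end, strand)

data = b"track name=demo\nchr1\t100\t136\tr1\t0\t+\nchr1\t200\t236\tr2\t0\t-\n"
print(list(BEDParser(io.BytesIO(data)).tags()))
```

        With this design, adding a format means writing one parse_line()
        method, and every parser shares the same buffering.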
1600
1601         * Use cross-correlation to calculate fragment size
1602
1603         First, all pairs are now used to predict the fragment
1604         size; previously, no more than 1000 pairs were used. Second,
1605         cross-correlation is used to find the best phase difference
1606         between + and - tag pileups.
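
        The cross-correlation step can be sketched with NumPy as follows.
        It assumes per-base + and - strand tag pileups around paired
        peaks, and takes the shift with the highest Pearson correlation as
        the estimate of d. The function name and the max_shift bound are
        hypothetical.

```python
import numpy as np

def best_shift(plus, minus, max_shift):
    """Return the shift d in [0, max_shift] that maximizes the Pearson
    correlation between plus[i] and minus[i + d]."""
    plus = np.asarray(plus, dtype=float)
    minus = np.asarray(minus, dtype=float)
    best_d, best_r = 0, -np.inf
    for d in range(max_shift + 1):
        a = plus[: len(plus) - d]  # + strand pileup
        b = minus[d:]              # - strand pileup moved left by d
        if a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined for a flat track
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_d, best_r = d, r
    return best_d

x = np.arange(1000)
plus = np.exp(-((x - 300) ** 2) / 800.0)   # + strand mode at 300
minus = np.exp(-((x - 450) ** 2) / 800.0)  # - strand mode at 450
print(best_shift(plus, minus, 300))  # 150
```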
1607
1608         * Speed up p-value and q-value calculation
1609
1610         This part is ten times faster now. I am using a dictionary to
1611         cache p-value results from the Poisson CDF function. A bit more
1612         memory is used to gain speed. I hope this dictionary won't
1613         explode, since the possible pairs of ChIP signal and control
1614         lambda are highly redundant. Also, I rewrote part of the q-value
1615         calculation.
1616
1617         * Speed up peak detection
1618
1619         This part is about a hundred times faster now. Optimizations
1620         include using NumPy functions as much as possible and keeping
1621         loop bodies as small as possible.
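
        As an illustration of replacing a per-base Python loop with NumPy
        calls, the runs of a score array above a cutoff can be found with
        a single np.diff over a padded boolean mask. This is a
        hypothetical helper, not the cPeakDetect.pyx code.

```python
import numpy as np

def regions_above(score, cutoff):
    """Return half-open (start, end) index pairs where score > cutoff."""
    above = np.asarray(score) > cutoff
    # pad with False so every run has a rising (+1) and a falling (-1) edge
    edges = np.diff(np.concatenate(([False], above, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

print(regions_above([0, 5, 6, 0, 0, 7, 0], 4))  # [(1, 3), (5, 6)]
```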
1622
1623         * Post-processing on differential calls
1624
1625         After macs2diff finds differential binding sites between two
1626         conditions, it will try to annotate the peak calls from one of the two
1627         conditions, describe the changes ...
1628
1629         * Fragment size prediction in macs2diff
1630
1631         Now by default, macs2diff will try to use the average fragment
1632         size from both condition 1 and condition 2 for tag extension and
1633         peak calling. Previously, by default, it used different sizes
1634         unless --nomodel was specified.
1635
1636         Technically, I separated out the model building process, so
1637         macs2diff builds fragment sizes for conditions 1 and 2 in parallel
1638         (2 processes maximum), then performs the 4-way comparisons in
1639         parallel (4 processes maximum).
1640
1641         * Diff score
1642
1643         Combine two p/q-score tracks into one. Where condition 1 is
1644         higher than condition 2, the score is positive; otherwise it is
1645         negative.
1646
1647         * SAMParser and BAMParser
1648
1649         Fixed a bug with paired-end sequencing data.
1650
1651         * BedGraph.pyx
1652
1653         Fixed a bug when calling peaks from a BedGraph file: previously it
1654         mistakenly output the same peaks multiple times at the end of a
1655         chromosome.
1656
1657 2011-11-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1658         MACS version 2.0.9 (tag:alpha)
1659
1660         * Automatic fixing of the predicted d is turned off by default!
1661
1662         The previous --off-auto behavior is now the default: MACS
1663         will no longer automatically replace a d smaller than 2 times
1664         the tag size with a value based on --shiftsize. As tags get
1665         longer nowadays, d is more likely to be less than 2 times the
1666         tag size, yet it may still be meaningful and useful. Please
1667         use your own judgment.
1668
1669         * Scaling issue
1670
1671         Now the default scaling when treatment and input are unbalanced
1672         has been adjusted. By default, the larger sample is scaled down
1673         linearly to match the smaller sample. In this way, background
1674         noise is reduced more than real signal, so we expect more
1675         specific results than with the other way around (i.e. when
1676         --to-large is set).
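
        The linear scaling choice can be sketched as a pair of
        multiplicative factors, one per sample. The function and its
        to_large flag are hypothetical stand-ins for the default and
        --to-large behaviors described above. The sketch assumes only the
        sample being adjusted changes.

```python
def linear_scaling_factors(treat_total, control_total, to_large=False):
    """Return (treatment_factor, control_factor) for linear depth scaling.

    Default: scale the larger sample down to the smaller one.
    to_large=True: scale the smaller sample up to the larger one instead.
    """
    if treat_total > control_total:
        if to_large:
            return 1.0, treat_total / control_total  # scale control up
        return control_total / treat_total, 1.0      # scale treatment down
    if to_large:
        return control_total / treat_total, 1.0      # scale treatment up
    return 1.0, treat_total / control_total          # scale control down

print(linear_scaling_factors(20_000_000, 10_000_000))  # (0.5, 1.0)
```

        For example, a treatment pileup of 8 at some position becomes 4
        after multiplying by 0.5.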
1677
1678         Also, an alternative option to randomly sample the larger dataset
1679         (--down-sample) is provided to replace the default linear
1680         scaling. However, this option makes results irreproducible,
1681         so be careful.
1682
1683         * randsample script
1684
1685         A new script 'randsample' is added, which can randomly sample a
1686         given percentage or number of tags.
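
        A minimal sketch of sampling a percentage or a fixed number of
        tags without replacement. The function signature is hypothetical,
        not the randsample command line. Passing a fixed seed makes the
        draw repeatable, which addresses the reproducibility concern
        above.

```python
import random

def randsample(tags, percent=None, number=None, seed=None):
    """Hypothetical sketch: keep `number` tags, or `percent`% of them."""
    if (percent is None) == (number is None):
        raise ValueError("give exactly one of percent or number")
    if number is None:
        number = round(len(tags) * percent / 100)
    rng = random.Random(seed)  # private RNG; a fixed seed repeats the draw
    keep = sorted(rng.sample(range(len(tags)), number))  # preserve input order
    return [tags[i] for i in keep]

tags = list(range(1000))
print(len(randsample(tags, percent=10, seed=1)))  # 100
```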
1687
1688         * Peak summit
1689
1690         Now MACS decides peak summits according to pileup height
1691         instead of q-value scores. In this way, the summit may be more
1692         accurate.
1693
1694         * Diff score
1695
1696         MACS uses q-value scores as differential scores. When comparing
1697         two conditions (say A and B), the maximum q-score for comparing
1698         A to B (maxqscore_a2b) and for comparing B to A (maxqscore_b2a)
1699         are computed. If maxqscore_a2b is bigger, the diff score is
1700         +maxqscore_a2b; otherwise, the diff score is -1*maxqscore_b2a.
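
        The rule above, applied to many regions at once, can be sketched
        with numpy.where. The function name is hypothetical. The inputs
        are assumed to be per-region maximum q-scores in each direction.

```python
import numpy as np

def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: +a2b where A wins, else -1 * b2a."""
    a2b = np.asarray(maxqscore_a2b, dtype=float)
    b2a = np.asarray(maxqscore_b2a, dtype=float)
    # ties fall to the "otherwise" branch, as in the rule above
    return np.where(a2b > b2a, a2b, -1 * b2a)

print(diff_score([5.0, 1.0], [2.0, 3.0]).tolist())  # [5.0, -3.0]
```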
1701
1702 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1703         MACS version 2.0.8 (tag:alpha)
1704
1705         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1706
1707         The new script bdgbroadcall and the extra option '--broad' for the
1708         macs2 script can be used to call broad regions with a loose cutoff
1709         that links nearby significant regions. The output is in
1710         BED12 format.
1711
1712         * MACS2/IO/cScoreTrack.pyx
1713
1714         Fix q-value calculation to enforce monotonic values.
1715
1716         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1717
1718         They are combined into one more powerful script called
1719         "filterdup", which can filter duplicate reads
1720         according to sequencing depth and genome size. It can also
1721         convert any format supported by MACS to BED format.
1722
1723 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1724         MACS version 2.0.7 (tag:alpha)
1725
1726         * bin/macsdiff renamed to bin/bdgdiff
1727
1728         Now this script works as a low-level fine-tuning tool, like bdgcmp
1729         and bdgpeakcall.
1730
1731         * bin/macs2diff
1732
1733         A new script that takes treatment and control files from two
1734         conditions, calculates the fragment size, uses a local Poisson test
1735         to get p-values and the BH procedure to get q-values, then combines
1736         the 4-way results to call differential sites.
1737
1738         This script can use up to 4 CPUs to speed up the 4-way calculation.
1739         (I am trying multiprocessing in Python.)
1740
1741         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1742         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1743         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1744
1745         All the above files are modified for the new macs2diff script.
1746
1747         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1748
1749         Now a q-value of 0.01 is the default cutoff. If -p is specified,
1750         the p-value cutoff is used instead.
1751
1752 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1753         MACS version 2.0.6 (tag:alpha)
1754
1755         * bin/macsdiff
1756
1757         A script to call differential regions. A naive way is introduced
1758         to find the regions where:
1759
1760         1. signal from condition 1 is larger than input 1 and condition 2 --
1761         unique region in condition 1;
1762         2. signal from condition 2 is larger than input 2 and condition 1
1763         -- unique region in condition 2;
1764         3. signal from condition 1 is larger than input 1, signal from
1765         condition 2 is larger than input 2, but neither signal from
1766         condition 1 nor signal from condition 2 is larger than the other.
1767
1768         Here 'larger' means the pvalue or qvalue from a Poisson test is
1769         under certain cutoff.
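
        The three rules can be written as a small classifier over
        significance flags. It assumes each flag already records whether
        the Poisson test's p-value or q-value passed the cutoff. The
        function and its return labels are hypothetical.

```python
def classify_region(c1_gt_input1, c2_gt_input2, c1_gt_c2, c2_gt_c1):
    """Each flag is True when the first signal is significantly 'larger'
    than the second (its Poisson p/q-value passed the cutoff)."""
    if c1_gt_input1 and c1_gt_c2:
        return "unique to condition 1"          # rule 1
    if c2_gt_input2 and c2_gt_c1:
        return "unique to condition 2"          # rule 2
    if c1_gt_input1 and c2_gt_input2:
        return "enriched in both, no difference"  # rule 3
    return None  # not enriched over input in either condition

print(classify_region(True, True, False, False))
```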
1770
1771         (I will make another script to wrap up multiple scripts for
1772         differential calling.)
1773
1774 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1775         MACS version 2.0.5 (tag:alpha)
1776
1777         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1778         MACS2/IO/cPeakIO.pyx
1779
1780         Use a hash to store peak information. Add back the feature to deal
1781         with data without control.
1782
1783         Fix a bug that incorrectly allowed small peaks at the end of
1784         chromosomes.
1785
1786         * bin/bdgpeakcall, bin/bdgcmp
1787
1788         Fix bugs. bdgpeakcall can now output encodePeak format.
1789
1790 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1791         MACS version 2.0.4 (tag:alpha)
1792
1793         * cPeakDetect.py
1794
1795         Fix a bug to correctly assign lambda_bg when --to-small is
1796         set. Thanks to Junya Seo!
1797
1798         Add rank and number-of-bp columns to the pvalue-qvalue table.
1799
1800         * cScoreTrack.py
1801
1802         Fix bugs to correctly deal with peakless chromosomes. Thanks to
1803         Vaibhav Jain!
1804
1805         Use AFDR for independent tests instead.
1806
1807         * encodePeak
1808
1809         Now MACS can output peak coordinates together with pvalue, qvalue,
1810         and summit positions in a single encodePeak format file (designed
1811         for the ENCODE project). This file can be loaded into the UCSC
1812         browser. Some columns are defined as follows: 5th:
1813         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1814         -log10qvalue, 10th: summit position relative to peak start.
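
        Under the column definitions above, one output line can be
        sketched as follows. The 6th (strand) column set to '.', the
        five-decimal formatting, and the function itself are assumptions
        for illustration.

```python
def encode_peak_line(chrom, start, end, name, neg_log10_p, fold,
                     neg_log10_q, summit):
    """Format one encodePeak line; summit is an absolute position."""
    fields = [
        chrom, start, end, name,
        int(neg_log10_p * 10),  # 5th: int(-log10pvalue*10)
        ".",                    # 6th: strand (assumed unstranded)
        f"{fold:.5f}",          # 7th: fold-change
        f"{neg_log10_p:.5f}",   # 8th: -log10pvalue
        f"{neg_log10_q:.5f}",   # 9th: -log10qvalue
        summit - start,         # 10th: summit relative to peak start
    ]
    return "\t".join(str(f) for f in fields)

print(encode_peak_line("chr1", 100, 400, "peak_1", 12.345, 4.2, 10.1, 250))
```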
1815
1816
1817 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1818         MACS version 2.0.3 (tag:alpha)
1819
1820         * Rich output with qvalue, fold enrichment, and pileup height
1821
1822         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1823         procedure:
1824
1825         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
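
        For reference, a sketch of the textbook Benjamini-Hochberg-Yekutieli
        adjustment follows. It is the standard procedure, not MACS's
        refined variant, and the function name is hypothetical. The
        cumulative minimum keeps q-values monotonic in the p-values.

```python
def by_qvalues(pvalues):
    """Benjamini-Hochberg-Yekutieli adjusted q-values, in input order."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # factor for dependent tests
    # walk from the largest p-value (rank m) down to the smallest (rank 1)
    order = sorted(range(m), key=lambda idx: pvalues[idx], reverse=True)
    q = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    for pos, idx in enumerate(order):
        rank = m - pos
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        q[idx] = running_min  # cumulative minimum enforces monotonicity
    return q

print(by_qvalues([0.01, 0.04, 0.03]))
```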
1826
1827         Now we have an xls output file similar to the previous one. The
1828         differences from the previous file are:
1829
1830         1. The summit is now the absolute summit position instead of the
1831            relative summit position;
1832         2. 'Pileup' is the previous 'tag' column. It's the extended fragment
1833            pileup at the peak summit;
1834         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1835            5.00 means 1e-5, simple and less confusing.
1836         4. FDR column becomes '-log10(qvalue)' column.
1837         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1838            the values at the peak summit.
1839
1840         * Extra output files
1841
1842         NAME_pqtable.txt contains pvalue and qvalue relationships.
1843
1844         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1845         and -log10qvalue scores in BedGraph format. Nearby regions with
1846         the same value are not merged.
1847
1848         * Separation of FeatIO.py
1849
1850         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1851         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1852         implemented to store pileup, local lambda, pvalue, and qvalue
1853         all together in cScoreTrack.pyx.
1854
1855         * Experimental option --half-ext
1856
1857         Inspired by the NPS algorithm, I added an experimental option
1858         --half-ext to let MACS extend each ChIP fragment around its
1859         middle point for only 1/2 d.
1860
1861 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1862         MACS version 2.0.2 (tag:alpha)
1863
1864         * macs2
1865
1866         Add an error check for the case where the treatment file and
1867         control file share no common chromosome names.
1868
1869         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1870
1871         Reduce memory usage by removing deepcopy() calls.
1872
1873         * Modify README documents and others.
1874
1875 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1876         MACS Version 2.0.1 (tag:alpha)
1877
1878         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1879
1880         Jie suggested a brilliantly simple method to pile up fragments
1881         into a bedGraph track. It is much faster than the previous
1882         function, i.e., faster than MACS1.3 or MACS1.4, so I can now include
1883         large local lambda calculation in MACSv2. Now I generate three
1884         bedGraphs for d-size local bias, slocal-size and llocal-size local
1885         bias, and take the maximum local bias as the local lambda
1886         bedGraph track.
1887
1888         Minor: add_loc in bedGraphTrackI can now correctly merge a
1889         region with its preceding region if their values are the same.
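
        One way to pile up fragments into bedGraph runs is a single sweep
        over sorted start and end positions, merging adjacent runs of
        equal depth as described for add_loc. This sketch assumes it
        resembles Jie's method. It is not the cPileup.pyx code, and the
        function name is hypothetical. Zero-depth gaps between fragment
        clusters are kept as runs.

```python
def pileup(starts, ends):
    """Pile up half-open fragments [start, end) into (start, end, depth)."""
    starts, ends = sorted(starts), sorted(ends)
    i = j = depth = 0
    prev = None
    runs = []
    while j < len(ends):
        # next breakpoint: the smaller of the next start and the next end
        pos = min(starts[i], ends[j]) if i < len(starts) else ends[j]
        if prev is not None and pos > prev:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                runs[-1] = (runs[-1][0], pos, depth)  # same value: extend
            else:
                runs.append((prev, pos, depth))
        while i < len(starts) and starts[i] == pos:
            depth += 1  # a fragment opens here
            i += 1
        while j < len(ends) and ends[j] == pos:
            depth -= 1  # a fragment closes here
            j += 1
        prev = pos
    return runs

print(pileup([0, 5, 3], [5, 10, 8]))  # [(0, 3, 1), (3, 8, 2), (8, 10, 1)]
```

        After sorting, the sweep is linear in the number of fragments.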
1890
1891         * macs2
1892
1893         Add an option to shift control tags before extension. By default,
1894         control tags will be extended to both sides regardless of strand
1895         information.
1896
1897 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1898         MACS Version 2.0.0 (tag:alpha)
1899
1900         * Use bedGraph type to store data internally and externally.
1901
1902         Theoretically, we can have one-basepair resolution profiles. Files
1903         are 10 times smaller, and even smaller after converting to
1904         bigWig for visualization.
1905
1906         * Peak calling process modified. Better peak boundary detection.
1907
1908         Extend ChIP tags to d, and pile them up into a ChIP bedGraph. Extend
1909         control tags to d and to 1,000bp, and pile them up into two
1910         bedGraphs (the 1,000bp one is averaged to d size). Then take the
1911         maximum value of these two tracks and a global background to get a
1912         local-lambda bedGraph.
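
        The per-position local lambda can be sketched with NumPy. It
        assumes "averaged to d size" means multiplying the 1,000bp pileup
        by d/1000, and that all tracks are already aligned per base. The
        function is hypothetical.

```python
import numpy as np

def local_lambda(ctrl_d, ctrl_1k, d, lambda_bg):
    """max(d-window control pileup, 1 kb pileup rescaled to d, background)."""
    ctrl_d = np.asarray(ctrl_d, dtype=float)
    ctrl_1k_scaled = np.asarray(ctrl_1k, dtype=float) * d / 1000.0
    return np.maximum(np.maximum(ctrl_d, ctrl_1k_scaled), lambda_bg)

print(local_lambda([1, 5], [20, 10], 200, 2.0).tolist())  # [4.0, 5.0]
```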
1913
1914         Use -10log10poisson_pvalue as scores to generate a score track
1915         before peak calling.
1916
1917         A general peak calling step based on a score cutoff, a minimum peak
1918         length, and a maximum gap between nearby peaks.
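
        A minimal sketch of that step on one chromosome's score track.
        It assumes sorted (start, end, score) intervals, and that "at or
        above cutoff" counts as passing. The function name and parameters
        are hypothetical.

```python
def call_peaks(bedgraph, cutoff, min_length, max_gap):
    """Merge passing intervals closer than max_gap; drop short regions."""
    merged = []
    for start, end, score in bedgraph:
        if score < cutoff:
            continue
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1][1] = end          # close enough: extend current peak
        else:
            merged.append([start, end])  # start a new peak
    return [(s, e) for s, e in merged if e - s >= min_length]

track = [(0, 10, 1), (10, 30, 6), (30, 35, 2), (35, 60, 7),
         (60, 100, 1), (100, 105, 9)]
print(call_peaks(track, cutoff=5, min_length=20, max_gap=10))  # [(10, 60)]
```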
1919
1920         * Option changes.
1921
1922         Wiggle file output is removed. Now we only support bedGraph
1923         output. Generating bedGraph is highly recommended since it
1924         costs no extra time: because of the peak calling algorithm in MACS
1925         v2, bedGraphs are generated internally even if you don't save them
1926         to disk.
1927
1928         * cProb.pyx
1929
1930         We can now calculate Poisson p-values in log space, so the score
1931         (-10*log10pvalue) no longer has an upper limit of 3100 due to the
1932         precision of floating-point numbers.
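
        Working in log space can be sketched as a log-sum-exp over the
        Poisson upper tail. Each term comes from the ratio
        pmf(i) / pmf(i-1) = lam / i, so no intermediate value underflows.
        This is a textbook approach and may differ from cProb.pyx. The
        function name and the 40-nat stopping threshold are assumptions.
        The sketch also assumes lam > 0.

```python
import math

def poisson_log10_upper_tail(k, lam):
    """log10 P(X >= k) for X ~ Poisson(lam), summed in natural-log space."""
    if k <= 0:
        return 0.0  # P(X >= 0) = 1
    log_lam = math.log(lam)
    log_term = k * log_lam - lam - math.lgamma(k + 1)  # log pmf(k)
    log_sum = log_term
    i = k
    while True:
        i += 1
        log_term += log_lam - math.log(i)  # log pmf(i) from log pmf(i-1)
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))  # log(e^a + e^b)
        # stop once terms are falling and negligible relative to the sum
        if i > lam and log_term < log_sum - 40:
            break
    return log_sum / math.log(10)

score = -10 * poisson_log10_upper_tail(200, 1.0)
print(round(score))  # well above the old ~3100 ceiling
```

        Computing P(X >= 200) for lam = 1 directly in floating point
        underflows to 0, while the log-space sum stays finite.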
1933
1934         * Cython is adopted to speed up Python code.
1935
1936 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1937         Small fixes
1938
1939         * Replaced with the newest WigTrackI class and fixed the wignorm script.
1940
1941 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1942         Version 1.4.0rc2 (Valentine)
1943
1944         * --single-wig option is renamed to --single-profile
1945
1946         * BedGraph output with --bdg or -B option.
1947
1948         The BedGraph output provides 1bp resolution fragment pileup
1949         profile. File size is smaller than wig file. This option can be
1950         combined with --single-profile option to produce a bedgraph file
1951         for the whole genome. This option also makes --space and
1952         --call-subpeaks invalid.
1953
1954         * Fix the description of --shiftsize to correctly state that the
1955         value is 1/2 d (fragment size).
1956
1957         * Fix a bug in the call to __filter_w_control_tags when control is
1958         not available.
1959
1960         * Fix a bug on --to-small option. Now it works as expected.
1961
1962         * Fix a bug when counting the tags in a candidate peak region, where
1963         an extra tag might be included. (Thanks to Jake Biesinger!)
1964
1965         * Fix the bug for peaks extending beyond the chromosome
1966         start. If a minus strand tag goes beyond the chromosome start
1967         after extension by d, it will be thrown out.
1968
1969         * Post-process script for a combined wig file:
1970
1971         The "wignorm" command can be called after a full run of MACS14 as
1972         a post-processing step. wignorm calculates the local background from
1973         the control wig file from MACS14, then uses either fold change,
1974         -10*log10(pvalue) from a Poisson test, or the difference after asinh
1975         transformation as the score to build a single wig track to
1976         represent the binding strength. This script takes a
1977         significantly long time to run.
1978
1979         * --wigextend has been obsoleted.
1980
1981 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1982         Version 1.4.0rc1 (Starry Sky)
1983
1984         * Duplicate reads option
1985
1986         --keep-dup behavior is changed. Now users can specify how many
1987         reads they want to keep at the same genomic location: 'auto' to
1988         let MACS decide the number based on a binomial distribution, or
1989         'all' to let MACS keep all reads.
1990
1991         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1992
1993         By default, MACS will now scale the smaller dataset to the bigger
1994         dataset. For instance, if IP has 10 million reads, and Input has 5
1995         million, MACS will double the lambda value calculated from Input
1996         reads while calling BOTH the positive peaks and negative
1997         peaks. This will address the issue caused by unbalanced numbers of
1998         reads from IP and Input. If --to-small is turned on, MACS will
1999         scale the larger dataset to the smaller one. So from now on, if d
2000         is fixed, then the peaks from a MACS call for A vs B should be
2001         identical to the negative peaks from a B vs A.
2002
2003 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
2004         Version 1.4.0beta (summer wishes)
2005
2006         * New features
2007
2008         ** Model building
2009
2010         The default behavior in the model building step is slightly
2011         changed. When MACS can't find enough pairs to build the model
2012         (implemented in alpha version) or the modeled fragment length is
2013         less than 2 times the tag length (implemented in beta version),
2014         MACS will use 2 times the --shiftsize value as the fragment length in
2015         the later analysis. --off-auto can turn off this default behavior.
2016
2017         ** Redundant tag filtering
2018
2019         The IO module is rewritten. The redundant tag filtering process
2020         becomes simpler and works as promised. The maximum allowed number
2021         of tags at the exact same location is calculated from the
2022         sequencing depth and genome size using a binomial distribution,
2023         for both TREATMENT and CONTROL separately (previously only
2024         TREATMENT was considered). The exact same location means the same
2025         coordinate and the same strand. Then MACS will only keep at most
2026         this number of tags at the exact same location in the following
2027         analysis. The --keep-dup option lets MACS skip the filtering and
2028         keep all the tags. However, this may bring in a lot of sequencing
2029         bias, so you may get many false positive peaks.
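
        The binomial limit can be sketched as the smallest k whose
        cumulative probability reaches 1 - pvalue. This sketch assumes
        each tag lands on a given location with probability 1/genome_size
        and uses a 1e-5 cutoff. Both are assumptions, and the function
        name is hypothetical.

```python
import math

def max_dup_tags(n_tags, genome_size, pvalue=1e-5):
    """Smallest k with P(X <= k) >= 1 - pvalue, X ~ Binomial(n_tags, 1/genome_size).

    Tags beyond k at one exact location (same coordinate and strand)
    would be treated as redundant.
    """
    p = 1.0 / genome_size
    pmf = math.exp(n_tags * math.log1p(-p))  # P(X = 0), computed stably
    cdf = pmf
    k = 0
    while cdf < 1.0 - pvalue:
        # recurrence: P(X = k+1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    return k

print(max_dup_tags(10_000_000, 2_700_000_000))  # 1
```

        Deeper sequencing relative to genome size raises the limit. For
        example, 100 million tags on a 100 Mb genome allow 8 per location
        under these assumptions.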
2030
2031         ** Single wiggle mode
2032
2033         First thing to mention: this is not the score track that I
2034         described before. By default, MACS generates wiggle files of
2035         fragment pileup for every chromosome separately. When you use the
2036         --single-wig option, MACS will generate a single wiggle file for
2037         all the chromosomes, so you will get one wig.gz for TREATMENT and
2038         another wig.gz for CONTROL if available.
2039
2040         ** Sniff -- automatic format detection
2041
2042         Now, by default or with "-f AUTO", MACS will decide the input file
2043         format automatically. Technically, it will try to read at most
2044         1000 records for the first 10 non-comment lines. If it succeeds,
2045         the format is decided. I recommend not using AUTO and instead
2046         specifying the right format for your input files, unless you
2047         combine different formats in a single MACS run.
2048
2049         * Options changes
2050
2051         --single-wig and --keep-dup are added. Check the previous section of
2052         the ChangeLog for details.
2053
2054         -f (--format) AUTO is now the default option.
2055
2056         --slocal default: 1000
2057         --llocal default: 10000
2058
2059         * Bug fixed
2060
2061         The setup script will stop the installation if the Python version is
2062         not python2.6 or python2.7.
2063
2064         Local lambda calculation has been changed back. MACS will check
2065         peak_region, slocal (default 1K) and llocal (default 10K) for the
2066         local bias. The previous 200bps default caused MACS to miss
2067         some peaks where the input bias is very sharp.
2068
2069         The sam2bed.py script is corrected.
2070
2071         The relative position in the xls output is fixed.
2072
2073         The ELAND_export parser is fixed to skip some of the no-match
2074         lines, and elandexport2bed.py is fixed too. (However, I can't
2075         guarantee that it works on every eland_export file.)
2076
2077 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2078         Version 1.4.0alpha2 (be smarter)
2079
2080         * Options changes
2081
2082         --gsize now provides shortcuts for common genomes, including
2083         human, mouse, C. elegans and fruitfly.
2084
2085         --llocal is now 5000 bps if there is no input file, so that
2086         local lambda doesn't over-penalize enriched binding sites.
2087
2088 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2089         Version 1.4alpha (be smarter)
2090
2091         * Options changes
2092
2093         The --tsize option is redesigned. MACS will use the first 10 lines
2094         of the input to decide the tag size. If the user specifies --tsize,
2095         it will override the automatically detected tag size.
2096
2097         --lambdaset is replaced by --slocal and --llocal which mean the
2098         small local region and large local region.
2099
2100         --bw has no effect on the scan-window size now. It only affects the
2101         paired-peaks model process.
2102
2103         * Model building
2104
2105         During model building, MACS will pick out the enriched regions
2106         that are neither too high nor too low to build the paired-peak
2107         model. By default, the range is from fold 10 to fold 30. If MACS
2108         fails to build the model, by default it will use the nomodel
2109         settings, like shiftsize=100bps, to shift and extend each
2110         tag. This behavior can be turned off by '--off-auto'.
2111
2112         * Output files
2113
2114         An extra file containing all the summit positions is saved as the
2115         *_summits.bed file. The option '--call-subpeaks' will invoke
2116         PeakSplitter developed by Mali Salmon to split wide peaks into
2117         smaller subpeaks.
2118
2119         * Sniff (will be in beta)
2120
2121         Automatically recognize the input file format, so users can combine
2122         different formats in one MACS run.
2123
2124         Not implemented features/TODO:
2125
2126         * Algorithms ( in near future? )
2127
2128         MACS will try to refine the peak boundaries by calculating the
2129         scores for every point in the candidate peak regions. The score
2130         will be the -10*log(10,pvalue) on a local poisson distribution. A
2131         cutoff specified by users (--pvalue) will be applied to find the
2132         precise sub-peaks in the original candidate peak region. Peak
2133         boundaries and peak summit positions will be saved in separate BED
2134         files.
2135
2136         * Single wiggle track ( in near future? )
2137
2138         A single wiggle track will be generated to save the scores within
2139         candidate peak regions at 10 bp resolution. The wiggle file
2140         is in fixedStep format.
2141
2142
2143 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2144         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2145
2146         * bin/Constants.py
2147
2148         Fixed typo. FCSTEP -> FESTEP
2149
2150         * lib/PeakDetect.py
2151
2152         The 'femax' attribute bug is fixed
2153
2154 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2155         Version 1.3.7 (Oktoberfest)
2156
2157         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2158
2159         Enhancements by Peter Chines:
2160
2161         1. gzip files are supported.
2162         2. when --diag is on, the user can set the increment and endpoint for
2163         fold enrichment analysis by setting --fe-step and --fe-max.
2164
2165         Enhancements by Davide Cittaro:
2166
2167         1. BAM and SAM formats are supported.
2168         2. small changes in the header lines of wiggle output.
2169
2170         Enhancements by Me:
2171         1. I added --fe-min option;
2172         2. Bowtie ascii output with suffix ".map" is supported.
2173
2174         Bug fixed:
2175
2176         1. --nolambda bug is fixed. ( reported by Martin in JHU )
2177         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2178         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2179         4. Some instances of "fold change" have been changed to "fold enrichment".
2180
2181 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2182         Version 1.3.6.1 (default parameter change)
2183
2184         * bin/macs, lib/PeakDetect.py
2185
2186         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2187         default. "--futurefdr" is added which can turn on the 'new' method
2188         introduced in 1.3.6. By default it's off.
2189
2190         * lib/PeakDetect.py
2191
2192         Fixed a bug; the p-value is corrected slightly.
2193
2194
2195 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
2196         Version 1.3.6 (Birthday cake)
2197
2198         * bin/macs
2199
2200         "track name" is added to the header of BED output file.
2201
2202         Now the default peak detection method is to consider 5k and 10k
2203         nearby regions in treatment data and peak location, 1k, 5k, and
2204         10k regions in control data to calculate local bias. The old
2205         method can be called through '--old' option.
2206
2207         Information about how many total/unique tags in treatment or
2208         control will be saved in final .xls output.
2209
2210         * lib/IO/__init__.py
2211
2212         ".fa" will be removed from input tag alignment so only the
2213         chromosome names are kept.
2214
2215         WigTrackI class is added for Wiggle like data structure. (not used
2216         now)
2217
2218         The parser for ELAND multi PET files has been fixed. Now the 5'
2219         tag position for a pair will be kept, whereas in the previous
2220         version, the middle points are kept.
2221
2222         * lib/IO/BinKeeper.py
2223
2224         The BinKeeperI class is inspired by Jim Kent's library for the UCSC
2225         genome browser; it can quickly access values in a given region of a
2226         large wiggle-like data file. (not used now)
2227
2228         * lib/OptValidator.py
2229
2230         typo fixed.
2231
2232         * lib/PeakDetect.py
2233
2234         Now the default peak detection method is to consider 5k and 10k
2235         nearby regions in treatment data and peak location, 1k, 5k, and
2236         10k regions in control data to calculate local bias. The old
2237         method can be called through '--old' option.
2238
2239         Two columns have been added to the BED output file. 4th column: peak
2240         name; 5th column: peak score using -10log(10,pvalue) as score.
2241
2242         * setup.py
2243
2244         Add support to build a Mac App through 'setup.py py2app', or a
2245         Windows executable through 'setup.py py2exe'. You need to install
2246         py2app or py2exe package in order to use these functions.
2247
2248 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2249         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2250
2251         * PeakDetect.py
2252
2253         Now, besides the 1k, 5k, and 10k regions, MACS also considers the
2254         peak-sized region in control data when calculating the local
2255         lambda for each peak. Peak calling results will differ slightly
2256         from the previous version, beware!
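        A minimal Python sketch of a local lambda computed from several
        control windows. It assumes the local lambda is the maximum of a
        genome background lambda and the lambdas estimated from the
        peak-sized, 1k, 5k, and 10k windows around the peak, each rescaled
        to the peak width; the function local_lambda and its parameters are
        hypothetical, and MACS's own code may differ in details such as
        window centering.

```python
from bisect import bisect_left


def local_lambda(control_tags, summit, peak_size, bg_lambda,
                 windows=(1000, 5000, 10000)):
    """Local Poisson lambda for one peak (hypothetical helper).

    control_tags: sorted control tag positions on one chromosome
    summit:       position the windows are centered on
    peak_size:    width of the peak region, in bp
    bg_lambda:    genome-wide expected tag count per peak_size bp

    Counts control tags in the peak-sized window and in each larger
    window, rescales every count to the peak width so the values are
    comparable, and returns the largest one (never below bg_lambda).
    """
    candidates = [bg_lambda]
    for width in (peak_size,) + tuple(windows):
        start = summit - width // 2      # half-open window [start, end)
        end = start + width
        count = bisect_left(control_tags, end) - bisect_left(control_tags, start)
        candidates.append(count * peak_size / width)
    return max(candidates)
```

        Taking the maximum makes the estimate conservative: a local pile of
        control tags near the peak raises lambda and therefore lowers the
        peak's significance.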
2257
2258         * OptValidator.py
2259
2260         Typo fixed, ELANDParser -> ELANDResultParser
2261
2262         * OutputWriter.py
2263
2264         The modeled d value is now shown on the model figure.
2265
2266 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
2267         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2268
2269         * macs, IO/__init__.py, PeakDetect.py
2270
2271         Add support for the ELAND multi format. Add support for Paired-End
2272         experiments; in this case, 5'-end and 3'-end ELAND multi format
2273         files are required for treatment or control data. See the 00README
2274         file for details.
2275
2276         Add wigextend option.
2277
2278         Add the petdist option for Paired-End Tag experiments, which sets
2279         the best distance between 5' and 3' tags.
2280
2281         * PeakDetect.py
2282
2283         Fixed a bug that caused the end position of every peak region to
2284         be incorrectly increased by 1 bp. (Thanks Mali Salmon!)
2285
2286         * OutputWriter.py
2287
2288         Fix bugs in generating wiggle files. The start position in wiggle
2289         files is now 1 instead of 0.
2290
2291         Fix a bug where, every 10M bp, the signal in the first 'd' range
2292         was lower than the actual value. (Thanks Mali Salmon!)
2293
2294
2295 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
2296         Version 1.3.3 (wiggle bugs fixed)
2297
2298         * OutputWriter.py
2299
2300         Fix bugs in generating wiggle files: 1. 'span=' is added to the
2301         'variableStep' line; 2. previously, every 10M bp, the coordinates
2302         were wrongly shifted to the right by 'd' base pairs.
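        A minimal Python sketch of a variableStep wiggle writer that puts
        'span=' on the declaration line and writes 1-based positions. How
        the signal is computed is an assumption for illustration: each tag
        is shifted d/2 toward its 3' end and counted in a fixed-size bin,
        which may not match MACS's actual pileup. wiggle_lines is a
        hypothetical name.

```python
def wiggle_lines(chrom, plus_tags, minus_tags, d, step=10):
    """Yield 'variableStep' wiggle lines of shifted tag counts (a sketch).

    plus_tags / minus_tags: 0-based 5' tag positions on each strand
    d:                      estimated fragment size
    step:                   bin width, also written as 'span='

    Each tag is shifted d/2 toward its 3' end and counted in the
    step-bp bin containing the shifted position. Output positions are
    1-based, as the wiggle format expects.
    """
    half = d // 2
    counts = {}
    shifted = [p + half for p in plus_tags] + [p - half for p in minus_tags]
    for pos in shifted:
        if pos < 0:
            continue  # shifted off the start of the chromosome
        b = pos // step
        counts[b] = counts.get(b, 0) + 1
    yield "variableStep chrom=%s span=%d" % (chrom, step)
    for b in sorted(counts):
        # bin b covers 0-based [b*step, (b+1)*step); +1 makes it 1-based
        yield "%d %d" % (b * step + 1, counts[b])
```

        Without 'span=', a genome browser assumes each value covers a single
        base, so 10 bp bins would be drawn as isolated 1 bp spikes.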
2303
2304         * macs, PeakDetect.py
2305
2306         Add an option to save wiggle files at different resolutions.
2307
2308 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2309         Version 1.3.2 (tiny bugs fixed)
2310
2311         * IO/__init__.py
2312
2313         Fix 65536 -> 65535. (Thanks Joon)
2314
2315         * Prob.py
2316
2317         Improved the binomial functions to handle extra-large numbers.
2318         Imported from the Cistrome project.
2319
2320         * PeakDetect.py
2321
2322         If the treatment channel has no reads on a chromosome that is
2323         present in the control channel, or vice versa, MACS no longer
2324         exits. (Thanks Shaun Mahony)
2325
2326         Instead, MACS adds a fake tag at position -1 when calling
2327         treatment peaks vs. control, but ignores the chromosome when
2328         calling negative peaks.
2329
2330 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2331         Version 1.3.1 (tiny bugs fixed version)
2332
2333         * Prob.py
2334
2335         Hyunjin Gene Shin contributed some code to Prob.py. The binomial
2336         functions can now handle very large and very small numbers.
2337
2338         * IO/__init__.py
2339
2340         Parsers now split lines in BED/ELAND files on any whitespace.
2341         'track' and 'browser' lines are treated as comment lines. Fixed a
2342         bug when raising StrandFormatError. The maximum redundant tag
2343         number stored at a single position can now be as high as
2344         65536.
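        A minimal Python sketch of the parsing rules above: str.split() with
        no argument splits on any run of spaces or tabs, and 'track' and
        'browser' lines are skipped. The generator parse_bed_tags is
        hypothetical, and a plain ValueError stands in for MACS's
        StrandFormatError.

```python
def parse_bed_tags(lines):
    """Yield (chrom, start, end, strand) tuples from BED lines (a sketch).

    Fields are split on any run of whitespace, and 'track' and 'browser'
    lines, as well as blank lines, are skipped as comments. ValueError
    stands in for MACS's StrandFormatError here.
    """
    for line in lines:
        fields = line.split()  # no argument: spaces and tabs alike
        if not fields or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 6:
            raise ValueError("need at least 6 BED columns: %r" % line)
        strand = fields[5]
        if strand not in ("+", "-"):
            raise ValueError("bad strand %r in line %r" % (strand, line))
        yield fields[0], int(fields[1]), int(fields[2]), strand
```

        Splitting on any whitespace lets files that mix tabs and spaces parse
        the same way as strictly tab-delimited BED.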
2345
2346
2347 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
2348         Version 1.3 (naming clarification version)
2349
2350         * Naming clarification changes according to our manuscript:
2351
2352         'frag_len' is changed to 'd'.
2353
2354         'fold_change' is changed to 'fold_enrichment'.
2355
2356         We suggest that users set the '--bw' parameter according to the
2357         actual sonication size.
2358
2359         FDR values in the output file are capped at 100%.
2360
2361         Other clarifications were made in the 00README file and the
2362         documentation on the website.
2363
2364         * IO/__init__.py
2365         If the redundant tag number at a single position exceeds 32767,
2366         it is capped at 32767 instead of raising an overflow exception.
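        A minimal Python sketch of a saturating counter. It assumes the
        per-position counts live in signed 16-bit integers (the 32767 limit
        suggests this, but the entry does not say so); Python's array type
        'h' raises OverflowError when such an item would exceed 32767.
        add_redundant_tag and SHORT_MAX are hypothetical names.

```python
from array import array

SHORT_MAX = 32767  # largest value a signed 16-bit ('h') array item can hold


def add_redundant_tag(counts, index):
    """Increment counts[index], saturating at SHORT_MAX (a sketch).

    Without this check, storing 32768 into an 'h' array raises
    OverflowError; here the count simply stays at the cap.
    """
    if counts[index] < SHORT_MAX:
        counts[index] += 1


counts = array("h", [SHORT_MAX - 1])
add_redundant_tag(counts, 0)
add_redundant_tag(counts, 0)  # already at the cap, stays at 32767
```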
2367
2368         * setup.py
2369         Fixed a typo.
2370
2371         * PeakDetect.py
2372         Fixed a bug in the diagnosis report.
2373
2374
2375 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2376         Version 1.2.2gamma
2377
2378         * Serious bug fixes:
2379
2380         The Poisson distribution CDF and inverse CDF functions are
2381         corrected. They now produce correct results even for huge lambda,
2382         so the p-values and FDR values in the final Excel sheet are now
2383         correct.
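        Summing Poisson terms directly fails for huge lambda because
        exp(-lambda) underflows to 0.0 once lambda exceeds roughly 745. A
        common remedy, sketched below in Python, keeps every term as a
        logarithm (using math.lgamma for ln(i!)) and combines them with the
        log-sum-exp trick. This illustrates the general technique, not
        necessarily what Prob.py does; poisson_log_sf and its helpers are
        hypothetical names.

```python
import math


def _log_pmf(i, lam):
    """ln of the Poisson pmf e**-lam * lam**i / i!, via lgamma(i + 1) = ln(i!)."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def _log_sum_exp(values):
    """ln(sum(exp(v))) without overflow or underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def poisson_log_sf(k, lam, cutoff=40.0):
    """ln P(X >= k) for X ~ Poisson(lam), with lam > 0 and integer k.

    Terms are kept as logarithms, so exp(-lam) never underflows to 0.0
    even for huge lam. Summation stops once terms fall more than
    `cutoff` nats below the first (largest) one.
    """
    if k <= 0:
        return 0.0  # P(X >= 0) == 1
    if k > lam:
        # Upper tail: the pmf decreases monotonically for i >= k > lam.
        terms = []
        i = k
        while True:
            t = _log_pmf(i, lam)
            terms.append(t)
            if t < terms[0] - cutoff:
                break
            i += 1
        return _log_sum_exp(terms)
    # k <= lam: sum the lower tail P(X <= k - 1), whose pmf decreases
    # from i = k - 1 down to 0, then take the complement.
    terms = []
    for i in range(k - 1, -1, -1):
        t = _log_pmf(i, lam)
        terms.append(t)
        if t < terms[0] - cutoff:
            break
    return math.log1p(-math.exp(_log_sum_exp(terms)))
```

        Because the result is a natural log, a -10*log10(pvalue) style score
        follows without ever forming the tiny p-value itself:
        score = -10 * poisson_log_sf(k, lam) / math.log(10).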
2384
2385         The IO package can now tolerate some rare cases; ELANDParser in the
2386         IO package is fixed. (Thanks Bogdan)
2387
2388         * Improvement:
2389
2390         Reverse-paired peaks are rejected when building the model, so
2391         'frag_len' can no longer be negative. (Thanks Bogdan)
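        A simplified Python sketch of why rejecting reverse pairs keeps the
        fragment length positive. Averaging per-pair distances is an
        assumption made for illustration; MACS's model may derive d
        differently (for example from aggregated strand tag distributions).
        estimate_d is a hypothetical name.

```python
def estimate_d(peak_pairs):
    """Average distance d between paired strand peaks (a sketch).

    peak_pairs: (plus_peak, minus_peak) position tuples from model
    building. A pair whose minus-strand peak lies upstream of (or at)
    its plus-strand peak is a reverse pair and is rejected, so the
    returned d is always positive.
    """
    distances = [minus - plus for plus, minus in peak_pairs if minus > plus]
    if not distances:
        raise ValueError("no forward-paired peaks to build the model from")
    return sum(distances) / len(distances)
```

        Keeping the reverse pair (500, 400) in this input would add a
        distance of -100 and pull the estimate down; rejecting it leaves
        only genuine plus-then-minus pairs.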
2392
2393         * Features added:
2394
2395         The diagnosis function is completed. It can output a table file
2396         that helps users estimate their sequencing depth.
2397
2398
2399 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
2400         Version 1.2
2401
2402         * Prob.py is added!
2403
2404         GSL is completely removed from MACS. Instead, I have implemented
2405         the CDF and inverse CDF of the Poisson and binomial distributions
2406         in pure Python.
2407
2408         * Constants.py is added!
2409
2410         Organize constants used in MACS in the Constants.py file.
2411
2412         * All other files are modified!
2413
2414         The fold change calculation is modified. Fold change is now
2415         calculated only at the peak summit position instead of over the
2416         whole peak region. The values will be higher and more robust than before.
2417
2418         Features added:
2419
2420         1. MACS can save wiggle format files containing the tag number at
2421         every 10 bp along the genome. Tags are shifted according to our
2422         model before they are counted.
2423
2424         2. Model building and local lambda calculation can be skipped with
2425         certain options.
2426
2427         3. A diagnosis report can be generated with the '--diag'
2428         option. This report can help you estimate sequencing
2429         saturation. This function is still in beta.
2430
2431         4. FDR calculation speed is greatly improved.
2432
2433 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2434         Version 1.1
2435
2436         * TabIO, PeakModel.py ...
2437         Fixed a bug so that MACS tolerates cases where there are no tags
2438         on either the plus strand or the minus strand.
2439
2440         * setup.py
2441         Check the Python version. If it is lower than 2.4, refuse to
2442         install with a warning.
2443