ChangeLog

   1 2019-12-12  Tao Liu  <vladimir.liu@gmail.com>
   2         MACS version 2.2.6
   3
   4         * New Features
   5
   6         1) Speed up MACS2. Some programming tricks and code cleanup. The
   7         filter_dup function replaces separate_dups. The latter one was
   8         implemented for potentially putting back duplicate reads in
   9         certain downstream analysis. However such analysis hasn't been
  10         implemented. Optimize the speed of writing bedGraph
  11         files. Optimize BAM and BAMPE parsing with pointer casting instead
  12         of python unpack.
  13
  14         2) The comment lines in the headers of BED or SAM files will be
  15         correctly skipped. However, MACS2 won't check comment lines in the
  16         middle of the file.
  17
  18         * Bugs fixed
  19
  20         1) Cutoff-analysis in callpeak command. #341
  21
  22         2) Issues related to SAMParser and three ELAND Parsers are
  23         fixed. #347
  24
  25         * Other
  26
  27         1) cmdlinetest script in test/ folder has been updated to: 1. test
  28         cutoff-analysis with callpeak cmd; 2. output the 2 lines before
  29         and after the error or warning message during tests; 3. output
  30         only the first 10 lines if the difference between test result and
  31         standard result can be found; 4. prockreport monitors CPU time and
  32         memory usage at 1 sec intervals -- a bit more accurate.
  33
  34         2) Python3.5 support is removed. Now MACS2 requires Python>=3.6.
  35
  36 2019-10-31  Tao Liu  <vladimir.liu@gmail.com>
  37         MACS version 2.2.5 (Py3 speed up)
  38
  39         * Features added
  40
  41         1) *Github code only and Not included in MACS2 release* New
  42         testing data for performance test. A subsampled ENCODE2 CTCF
  43         ChIP-seq dataset, including 5 million ChIP reads and 5 million
  44         control reads, has been included in the test folder for testing
  45         CPU and memory usage (i.e. 5M test). Several related scripts,
  46         including `prockreport` for reporting CPU and memory usage, `pyprofile`
  47         and `pyprofile_stat` for debugging and profiling MACS2 codes, have
  48         been included.
  49
  50         2) Speed up pvalue-qvalue checkup (pqtable checkup) #335 #338.
  51         The old hashtable.pyx implementation copied from Pandas (very old
  52         version) doesn't work well in Python3+Cython. It slows down the
  53         pqtable checkup using the identical Cython codes as in
  54         v2.1.4. While running 5M test, the `__getitem__` function in the
  55         hashtable.pyx took 3.5s with 37,382,037 calls in MACS2 v2.1.4, but
  56         148.6s with the same number of calls in MACS2 v2.2.4. As a
  57         consequence, the standard python dictionary implementation has
  58         replaced hashtable.pyx for pqtable checkup. Now MACS2 runs a bit
  59         faster than py2 version, but uses a bit more memory. In general,
  60         v2.2.5 can finish 5M reads test in 20% less time than MACS2
  61         v2.1.4, but uses 15% more memory.
  62
  63         * Bug fixed
  64
  65         1) More Python3 related fixes, e.g. the return value of keys from
  66         py3 dict. #333 #337
  67
  68
  69 2019-10-01  Tao Liu  <vladimir.liu@gmail.com>
  70         MACS version 2.2.4 (Python3)
  71
  72         * Features added
  73
  74         1) First Python3 version MACS2 released.
  75
  76         2) Version number 2.2.X will be used for MACS2 in Python3, in
  77         parallel to 2.1.X.
  78
  79         3) More comprehensive test.sh script to check the consistency of
  80         results from Python2 version and Python3 version.
  81
  82         4) Simplify setup.py script since the newest version transparently
  83         supports cython. And when cython is not installed by the user,
  84         setup.py can still compile using only C codes.
  85
  86         5) Fix Signal.pyx to use np.array instead of np.mat.
  87
  88 2019-09-30  Tao Liu  <vladimir.liu@gmail.com>
  89         MACS version 2.1.4
  90
  91         * Features added
  92
  93         Github Actions is used together with Travis CI for testing and
  94         deployment.
  95
  96         * Bugs fixed
  97
  98         PR #322:
  99
 100         1) #318 Random score in bdgdiff output. It turns out the sum_v is
 101         not initialized as 0 before adding. Potential bugs are fixed in
 102         other functions in ScoreTrack and CallPeakUnit codes.
 103
 104         2) #321 Cython dependency in setup.py script is removed. The
 105         'cythonize' call is moved to the correct position.
 106
 107         3) A typo is fixed in Github Actions script.
 108
 109 2019-09-19  Tao Liu  <vladimir.liu@gmail.com>
 110         MACS version 2.1.3.3
 111
 112         * Features added
 113
 114         1) Support Docker auto-deploy. PR #309
 115
 116         2) Support Travis CI auto-testing, update unit-testing
 117         scripts, and enable subcommand testing on small datasets.
 118
 119         3) Update README documents. #297 PR #306
 120
 121         4) `cmbreps` supports more than 2 replicates. Merged from PR #304
 122         @Maarten-vd-Sande and PR #307 (our own chi-sq test code)
 123
 124         5) `--d-min` option is added in `callpeak` and `predictd`, to
 125         exclude predictions of fragment size smaller than the given
 126         value. Merged from PR #267 @shouldsee.
 127
 128         6) `--buffer-size` option is added in `predictd`, `filterdup`,
 129         `pileup` and `refinepeak` subcommands. Users can use this option
 130         to decrease memory usage while there are a large number of contigs
 131         in the data. Also, now `callpeak`, `predictd`, `filterdup`,
 132         `pileup` and `refinepeak` will suggest that users tweak
 133         `--buffer-size` while catching a MemoryError. #313 PR #314
 134
 135         * Bugs fixed
 136
 137         1) #265 Fixed a bug where the pseudocount hasn't been applied
 138         while calculating p-value score in ScoreTrack object.
 139
 140         2) Fixed bdgbroadcall so that it will report those broad peaks
 141         without strong peak inside, consistent with the behavior of
 142         `callpeak --broad`.
 143
 144         3) Rename COPYING to LICENSE.
 145
 146 2018-10-17  Tao Liu  <vladimir.liu@gmail.com>
 147         MACS version 2.1.2
 148
 149         * New features
 150
 151         1) Added missing BEDPE support. And enable the support for BAMPE
 152         and BEDPE formats in 'pileup', 'filterdup' and 'randsample'
 153         subcommands. When format is BAMPE or BEDPE, the 'pileup' command
 154         will pile up the whole fragment defined by mapping locations of
 155         the left end and right end of each read pair. Thank @purcaro
 156
 157         2) Added options to callpeak command for tweaking max-gap and
 158         min-len during peak calling. Thank @jsh58!
 159
 160         3) The callpeak option "--to-large" is replaced with
 161         "--scale-to large".
 162
 163         4) The randsample option "-t" has been replaced with "-i".
 164
 165         * Bug fixes
 166
 167         1) Fixed memory issue related to #122 and #146
 168
 169         2) Fixed a bug caused by a typo. Related to #249, Thank @shengqh
 170
 171         3) Fixed a bug while setting commandline qvalue cutoff.
 172
 173         4) Better describe the 5th column of narrowPeak. Thank @alexbarrera
 174
 175         5) Fixed the calculation of average fragment length for paired-end
 176         data. Thank @jsh58
 177
 178         6) Fixed bugs caused by khash while computing p/q-value and log
 179         likelihood ratios. Thank @jsh58
 180
 181         7) More spelling tweaks in source code. Thank @mr-c
 182
 183 2016-03-09  Tao Liu  <vladimir.liu@gmail.com>
 184         MACS version 2.1.1 20160309
 185
 186         * Retire the tag:rc.
 187
 188         * Fixed spelling. Merged pull request #120. Thank @mr-c!
 189
 190         * Change filtering criteria for reading BAM/SAM files
 191
 192         Related to callpeak and filterdup commands. Now the
 193         reads/alignments flagged with bit 1024 ('PCR/Optical duplicate') will
 194         still be read although MACS2 may mark them as duplicates
 195         later. Related to old issue #33. Sorry I forgot to address it for
 196         years!
 197
 198 2016-02-26  Tao Liu  <vladimir.liu@gmail.com>
 199         MACS version 2.1.1 20160226 (tag:rc Zhengyue)
 200
 201         * Bug fixes
 202
 203         1) Now "-Ofast" has been replaced by "-O3 -ffast-math", because
 204         the former option is not supported by older GCC. Related to issues
 205         #91, #109.
 206
 207         2) Issue #108 is fixed. If no peak can be found in a chromosome,
 208         the PeakIO won't throw an error.
 209
 210         * New features
 211
 212         1) callpeak
 213
 214         a) A more flexible format, BEDPE, is supported. Now users can
 215         define the left and right position of the ChIPed fragment, and
 216         MACS2 will skip model building and directly pileup the
 217         fragments. Related to issue #112.
 218
 219         b) The 'tempdir' can be specified, to save cached pileup
 220         tracks. Originally, the temporary files were stored in
 221         /tmp. Thank @daler! Related to issues #97 and #105.
 222
 223         2) bdgopt
 224
 225         New operations are added, to calculate the maximum or minimum value between
 226         values in BEDGRAPH and given value.
 227
 228         3) bdgcmp
 229
 230         New method is added, to calculate the maximum value between values
 231         defined in two BEDGRAPH files.
 232
 233 2015-12-22  Tao Liu  <vladimir.liu@gmail.com>
 234         MACS version 2.1.0 20151222 (tag:rc Dongzhi)
 235
 236         * Bug fixes
 237
 238         1) Fix a bug while dealing with some chromosomes only containing
 239         one read (pair). The size of dup_plus/dup_minus arrays after
 240         filtering dups should be increased by 1.
 241
 242         2) Fix a bug related to the broad peak calling function in
 243         previous versions. The gaps were miscalculated, so segmented weak
 244         broad calls may be reported, and sometimes you would see peaks
 245         with values lower than the cutoff in the output files.
 246
 247         3) "Potentially" Fixed issue #105 on temporary cache files, need
 248         further followup.
 249
 250
 251 2015-07-31  Tao Liu  <vladimir.liu@gmail.com>
 252         MACS version 2.1.0 20150731 (tag:rc)
 253
 254         * Bug fixes
 255
 256         1) Fixed issue #76: information about broad/narrow cutoff will be
 257         correctly displayed.
 258
 259         2) Fixed issue #79: bdgopt extparam option is fixed.
 260
 261         3) Fixed issue #87: reference to cProb has been fixed as 'Prob'
 262         for filterdup command.
 263
 264         4) Fixed issue #78, #88 and similar issue reported in MACS google
 265         group: MACS2 now can correctly deal with multiple alignment files
 266         for -t or -c. The 'finalize' function will be correctly
 267         called. Multiple files option is enabled for filterdup,
 268         randsample, predictd, pileup and refinepeak commands.
 269
 270         5) A related issue to #88, when BAMPE mode is used, PE pairs will
 271         be sorted by leftmost then rightmost ends.
 272
 273         6) Fixed issue #86: A wrong use of 'ndarray' to create Numpy
 274         array. This caused 'callpeak --nolambda' to hang forever while
 275         calculating pvalues and qvalues.
 276
 277 2015-04-20  Tao Liu  <vladimir.liu@gmail.com>
 278         MACS version 2.1.0 20150420 (tag:rc)
 279
 280         * New commands
 281
 282         1) bdgopt: some convenient functions to modify bedGraph files.
 283
 284         2) cmbreps: Combine scores from two replicates. Including three
 285         methods: 1. take the maximum; 2. take the average; 3. use Fisher's
 286         method to combine two p-value scores. After that, user can use
 287         bdgpeakcall to call peaks on combined scores.
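
        The Fisher's method option of cmbreps can be illustrated with a
        minimal sketch. This is not the MACS2 implementation: the function
        name `fisher_combine` is hypothetical, and the sketch assumes raw
        p-values in (0, 1] as input.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Hypothetical helper for illustration only. Under the null hypothesis,
    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom, where k is the number of p-values.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

        With a single p-value the function returns that p-value unchanged;
        with two replicates it reduces to p1*p2*(1 - ln(p1*p2)).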
 288
 289         * New features
 290
 291         1) callpeak and bdgpeakcall now can try to analyze the
 292         relationship between p-values and number/length of peaks then
 293         generate a summary to help users decide an appropriate cutoff.
 294
 295         2) callpeak now can accept fold-enrichment cutoff as a filter for
 296         final peak calls.
 297
 298         * Performance
 299
 300         Now MACS2 runs about 3X as fast as previous version. Trade
 301         clean python codes for speed... Now while processing 50M ChIP vs
 302         50M control, it will take only 10 minutes.
 303
 304         * Bug fixes
 305
 306         1) Sampling function in BAMPE mode.
 307
 308         2) Callpeak while there are >= 2 input files for -t or -c.
 309
 310         3) While reading BAM/SAM, those secondary or supplementary
 311         alignments will be correctly skipped.
 312
 313         4) Fixed issue #33: Explanation is added to callpeak --keep-dup
 314         option that MACS2 will discard those SAM/BAM alignments with bit
 315         1024 no matter how --keep-dup is set.
 316
 317         5) Fixed issue #49: setuptools is used instead of distutils.
 318
 319         6) Fixed issue #51: fix the problem when using --trackline
 320         argument when control file is absent.
 321
 322         7) Fixed issue #53: Use SAM/BAM CIGAR to find the 5' end of
 323         read mapped to minus strand. The previous implementation would find
 324         an incorrect 5' end if there is an indel in the alignment.
 325
 326         8) Fixed issue #56: An incorrect sorting method used for BAMPE
 327         mode which will cause incorrect filtering of duplicated reads. Now
 328         fixed.
 329
 330         9) Issue #63: Merged from jayhesselberth@github, extsize now can
 331         be 1.
 332
 333         10) Issue #71: Merged from aertslab@github, close file descriptors
 334         after creating them with mkstemp().
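
        Items 3), 4) and 7) above concern SAM/BAM flag filtering and
        CIGAR-based 5' end detection. The sketch below shows the general
        idea only. It is not the MACS2 parser: the function name, the 0-based
        half-open coordinate convention and the `skip_duplicates` switch
        are assumptions made for illustration. The flag bit values and the
        set of reference-consuming CIGAR operations come from the SAM
        specification.

```python
import re

# SAM flag bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4          # 4
FLAG_REVERSE = 0x10          # 16
FLAG_SECONDARY = 0x100       # 256
FLAG_DUPLICATE = 0x400       # 1024
FLAG_SUPPLEMENTARY = 0x800   # 2048

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def five_prime_end(flag, pos, cigar, skip_duplicates=False):
    """Return (position, strand) of a read's 5' end, or None if skipped.

    Hypothetical helper. `pos` is the 0-based leftmost reference
    coordinate of the alignment; strand is 0 for plus and 1 for minus.
    """
    skip_mask = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    if skip_duplicates:
        skip_mask |= FLAG_DUPLICATE
    if flag & skip_mask:
        return None
    if flag & FLAG_REVERSE:
        # The 5' end of a minus-strand read is the rightmost aligned base,
        # so the reference span must come from the CIGAR string, not from
        # the read length (the two differ when there are indels).
        ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                      if op in REF_CONSUMING)
        return pos + ref_len, 1
    return pos, 0
```

        For a minus-strand read at position 100, the CIGAR strings
        "20M5D30M" and "20M5I25M" both describe a 50 bp read, but they
        span 55 and 45 reference bases respectively, which is why the read
        length alone gives the wrong 5' end.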
 335
 336 2014-06-16  Tao Liu  <vladimir.liu@gmail.com>
 337         MACS version 2.1.0 20140616 (tag:rc)
 338
 339         * callpeak module
 340
 341         "--ratio" is added to manually assign the scaling factor of ChIP
 342         vs control, e.g. from NCIS. Thank Colin D and Dietmar Rieder for
 343         implementing the patch file!
 344
 345         "--shift" is added to move cutting ends (5' end of reads) around,
 346         in order to process DNase-Seq data, e.g., use "--shift -100
 347         --extsize 200" to get 200bps fragments around 5' ends. For general
 348         ChIP-Seq data analysis, this option should always be set to
 349         0. Thank Xi Chen and Anshul Kundaje for the discussions in user
 350         group!
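
        The effect of combining "--shift" and "--extsize" can be sketched
        as follows. This is an illustration under stated assumptions, not
        MACS2 code: `fragment_interval` is a hypothetical helper, and it
        assumes that a positive shift moves the cutting end in the 5'->3'
        direction of the read before the extension is applied in that same
        direction.

```python
def fragment_interval(five_prime, strand, shift, extsize):
    """Return (start, end) of the fragment built from one cutting end.

    Hypothetical helper. `strand` is '+' or '-'. The cutting end is first
    moved by `shift` along the read's 5'->3' direction, then extended by
    `extsize` in that direction.
    """
    if strand == "+":
        start = five_prime + shift
        return start, start + extsize
    end = five_prime - shift
    return end - extsize, end
```

        With shift=-100 and extsize=200, a cutting end at position 1000
        gives the interval (900, 1100) on either strand, i.e. a 200 bp
        fragment centered on the 5' end.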
 351
 352         ** Do not output negative fragment size from cross-correlation
 353         analysis. Thank Alvin Qin for the feedback!
 354
 355         ** --half-ext and --control-shift are removed. For complex read
 356         shifting and extending, combine '--shift' and '--extsize'
 357         options. For comparing two conditions, use 'bdgdiff' module
 358         instead.
 359
 360         ** a bug is fixed to output the last pileup value in bdg file
 361         correctly.
 362
 363         * filterdup
 364
 365         A 'dry-run' option is added to only output numbers, including the
 366         number of allowed duplicates, the total number of reads before and
 367         after filtering duplicates and the estimated duplication
 368         rate. Thank John Urban for the suggestion!
 369
 370
 371 2013-12-16  Tao Liu  <vladimir.liu@gmail.com>
 372         MACS version 2.0.10 20131216 (tag:alpha)
 373
 374         bug fixes and tweaks
 375
 376         * We changed license from Artistic License to 3-clause BSD license.
 377
 378         Yes. Simpler the better.
 379
 380         * Process paired-end data with "-f BAMPE" without control
 381
 382         * GappedPeak output for --broad option has been fixed again to be
 383         consistent with official UCSC format. We add 1bp pseudo-block to
 384         left and/or right of broad region when necessary, so that you can
 385         visualize the regions without strong enrichment inside
 386         successfully. In downstream analysis except for visualization,
 387         you may need to remove all 1bp blocks from gappedPeak file.
 388
 389         * diffpeak subcommand is temporarily disabled until we
 390         re-implement it.
 391
 392 2013-10-28  Tao Liu  <vladimir.liu@gmail.com>
 393         MACS version 2.0.10 20131028 (tag:alpha)
 394
 395         * callpeak --call-summits improvement
 396
 397         The smoothing window length has been fixed as fragment length
 398         instead of short read length. The larger smoothing window will
 399         grant better smoothing results and better sub-peak summits
 400         detection.
 401
 402         * --outdir and --ofile options for almost all commands
 403
 404         Thank Björn Grüning for initially implementing these options!
 405         Now, MACS2 will save results into a specified
 406         directory by '--outdir' option, and/or save result into a
 407         specified file by '--ofile' option. Note, in case '--ofile' is
 408         available for a subcommand, '-o' now has been adjusted to be the
 409         same as '--ofile' instead of '--o-prefix'.
 410
 411         Here is the list of changes. For more detail, use 'macs2 xxx -h'
 412         for each subcommand:
 413
 414         ** callpeak: --outdir
 415         ** diffpeak: Not implemented
 416         ** bdgpeakcall: --outdir and --ofile
 417         ** bdgbroadcall: --outdir and --ofile
 418         ** bdgcmp: --outdir and --ofile. While --ofile is used, the number
 419         and the order of arguments for --ofile must be the same as for -m.
 420         ** bdgdiff: --outdir and --ofile
 421         ** filterdup: --outdir
 422         ** pileup: --outdir
 423         ** randsample: --outdir
 424         ** refinepeak: --outdir and --ofile
 425
 426
 427 2013-09-15  Tao Liu  <vladimir.liu@gmail.com>
 428         MACS version 2.0.10 20130915 (tag:alpha)
 429
 430         * callpeak Added a new option --buffer-size
 431
 432         This option is to tweak a previously hidden parameter that
 433         controls the steps to increase array size for storing alignment
 434         information. While in some rare cases, the number of
 435         chromosomes/contigs/scaffolds is huge, the original default
 436         setting will cause a huge memory waste. In these cases, we
 437         recommend decreasing --buffer-size (e.g., 1000) to save memory,
 438         although the decrease will slow down reading of alignment files.
 439
 440         * an optimization to speed up pvalue-qvalue statistics
 441
 442         Previously, it took an hour to prepare p-q-table for 65M vs 65M
 443         human TF library, and now it will take 10 minutes. It was due to a
 444         single line of code to get a value from a numpy array ...
 445
 446         * fixed logLR bugs.
 447
 448 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
 449         MACS version 2.0.10 20130731 (tag:alpha)
 450
 451         * callpeak --call-summits
 452
 453         Fix bugs causing callpeak --call-summits option to generate extra
 454         peaks and inconsistent peak boundaries compared to the
 455         default option. Thank Ben Levinson!
 456
 457         * bdgcmp output
 458
 459         Fix bugs causing bdgcmp output logLR all in positive values. Now
 460         'depletion' can be correctly represented as negative values.
 461
 462         * bdgdiff
 463
 464         Fix the behavior of bdgdiff module. Now it can take four
 465         bedGraph files, then use logLR as cutoff to call differential
 466         regions. Check command line of bdgdiff for detail.
 467
 468 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
 469         MACS version 2.0.10 20130713 (tag:alpha)
 470
 471         * fix bugs while output broadPeak and gappedPeak.
 472
 473         Note. Those weak broad regions without any strong enrichment
 474         regions inside won't be saved in gappedPeak file.
 475
 476         * bdgcmp -T and -C are merged into -S and description is updated.
 477
 478         Now, you can use it to override SPMR values in your input for
 479         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
 480         statistics will cause weird results (in most cases, lower
 481         significance), and won't be consistent with MACS2 callpeak
 482         behavior. So if you have SPMR bedGraphs, input the smaller/larger
 483         sample size in MILLION according to 'callpeak --to-large' option.
 484
 485 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
 486         MACS version 2.0.10 20130710 (tag:alpha)
 487
 488         * fix BED style output format of callpeak module:
 489
 490         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
 491         the output. Old BED format file won't be saved.
 492
 493         2) with --broad: broadPeak (BED6+3) for broad region and
 494         gappedPeak (BED12+3) for chained enriched regions will be the
 495         output. Old BED format, narrowPeak format, summit file won't be
 496         saved.
 497
 498         * bdgcmp now can accept list of methods to calculate scores. So
 499         you can run it once to generate multiple types of scores. Thank
 500         John Urban for this suggestion!
 501
 502         * C codes are re-generated through Cython 0.19.1.
 503
 504 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
 505         MACS version 2.0.10 20130520 (tag:alpha)
 506
 507         * broad peak calling modules are modified in order to report all
 508         relaxed regions even if there is no strong enrichment inside.
 509
 510 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
 511         MACS version 2.0.10 20130501 (tag:alpha)
 512
 513         * Memory usage is decreased to about 1/4-1/5 of previous usage
 514         Now, the internal data structure and algorithm are both
 515         re-organized, so that intermediate data wouldn't be saved in
 516         memory. Instead they will be calculated on the fly. New MACS2 will
 517         take longer (1.5 to 2 times), however it will use less memory,
 518         so it can be more usable on small-memory servers.
 519
 520         * --seed option is added to callpeak and randsample commands
 521         Thank Mathieu Gineste for this suggestion!
 522
 523 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
 524         MACS version 2.0.10 20130306 (tag:alpha)
 525
 526         * diffpeak module New module to detect differential binding sites
 527         with more statistics.
 528
 529         * Introduced --refine-peaks
 530         Calculates reads balancing to refine peak summits
 531
 532         * Output file names prefix
 533         Correct encodePeak to narrowPeak, broadPeak to bed12.
 534
 535 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
 536         MACS version 2.0.10 (tag:alpha not released)
 537
 538         * Introduced BAMPEParser
 539         Reads PE data directly, requires bedtools for now
 540
 541         * Introduced --call-summits
 542         Uses signal processing methods to call overlapping peaks
 543
 544         * Added --no-trackline
 545         By default, files have descriptive tracklines now
 546
 547         * new refinepeak command (experimental)
 548         This new function will use a similar method in SPP (wtd), to
 549         analyze raw tag distribution in peak region, then redefine the
 550         peak summit where plus and minus tags are evenly distributed
 551         around.
 552
 553         * Changes to output *
 554         cPeakDetect.pyx has full support for new print/write methods and
 555         --call-peaks, BAMPEParser, and use of paired-end data
 556
 557         * Parser optimization
 558
 559         cParser.pyx is rewritten to use io.BufferedReader to speed
 560         up. Speed is doubled.
 561
 562         Code is reorganized -- most of functions are inherited from
 563         GenericParser class.
 564
 565         * Use cross-correlation to calculate fragment size
 566
 567         First, all pairs will be used in prediction for fragment
 568         size. Previously, no more than 1000 pairs were used. Second,
 569         cross-correlation is used to find the best phase difference
 570         between + and - tag pileups.
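
        The idea of finding the best phase difference between the two
        strands can be sketched with an unnormalized cross-correlation.
        This is a simplified illustration, not the MACS2 PeakModel code:
        `best_lag` is a hypothetical name, the pileups are plain Python
        lists of per-base counts, and only non-negative lags up to
        `max_lag` are scanned (assumptions).

```python
def best_lag(plus, minus, max_lag):
    """Return the lag in [0, max_lag] maximizing sum(plus[i] * minus[i + lag]).

    Hypothetical helper. `plus` and `minus` are per-base pileup values of
    plus-strand and minus-strand tags over the same region.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # zip() stops at the shorter sequence, so positions shifted past
        # the end of `minus` are simply ignored.
        score = sum(p * m for p, m in zip(plus, minus[lag:]))
        if score > best_score:
            best, best_score = lag, score
    return best
```

        When the minus-strand pileup is a copy of the plus-strand pileup
        moved 4 bp downstream, the returned lag is 4.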
 571
 572         * Speed up p-value and q-value calculation
 573
 574         This part is ten times faster now. I am using a dictionary to
 575         cache p-value results from Poisson CDF function. A bit more memory
 576         will be used to increase speed. I hope this dictionary would not
 577         explode since the possible pairs of ChIP signal and control lambda
 578         are hugely redundant. Also, I rewrote part of q-value
 579         calculation.
 580
 581         * Speed up peak detection
 582
 583         This part is about a hundred times faster now. Optimizations
 584         include using Numpy functions as much as possible, and making loop
 585         body as small as possible.
 586
 587         * Post-processing on differential calls
 588
 589         After macs2diff finds differential binding sites between two
 590         conditions, it will try to annotate the peak calls from one of two
 591         conditions, describe the changes ...
 592
 593         * Fragment size prediction in macs2diff
 594
 595         Now by default, macs2diff will try to use the average fragment
 596         size from both condition 1 and condition 2 for tag extension and
 597         peak calling. Previously, by default, it will use different sizes
 598         unless --nomodel is specified.
 599
 600         Technically, I separate model building processes out. So macs2diff
 601         will build fragment sizes for condition 1 and 2 in parallel (2
 602         processes maximum), then perform 4-way comparisons in parallel (4
 603         processes maximum).
 604
 605         * Diff score
 606
 607         Combine two p/qscore tracks together. At regions where condition 1
 608         is higher than condition 2, score would be positive, otherwise,
 609         negative.
 610
 611         * SAMParser and BAMParser
 612
 613         Bug fixed for paired-end sequencing data.
 614
 615         * BedGraph.pyx
 616
 617         Fixed a bug while calling peaks from BedGraph file. It previously
 618         mistakenly output same peaks multiple times at the end of
 619         chromosome.
 620
 621 2011-11-02  Tao Liu  <taoliu@jimmy.harvard.edu>
 622         MACS version 2.0.9 (tag:alpha)
 623
 624         * Auto fixation on predicted d is turned off by default!
 625
 626         Previous --off-auto is now default. MACS will not automatically
 627         fix d less than 2 times of tag size according to
 628         --shiftsize. While tag size is getting longer nowadays, it would
 629         be easier to have d less than 2 times of tag size, however d may
 630         still be meaningful and useful. Please judge it using your own
 631         wisdom.
 632
 633         * Scaling issue
 634
 635         Now, the default scaling while treatment and input are unbalanced
 636         has been adjusted. By default, larger sample will be scaled down
 637         linearly to match the smaller sample. In this way, background
 638         noise will be reduced more than real signals, so we expect to have
 639         more specific results than the other way around (i.e. --to-large
 640         is set).
 641
 642         Also, an alternative option to randomly sample larger data
 643         (--down-sample) is provided to replace default linear
 644         scaling. However, this option will make results irreproducible,
 645         so be careful.
 646
 647         * randsample script
 648
 649         A new script 'randsample'  is added, which can randomly sample
 650         certain percentage or number of tags.
 651
 652         * Peak summit
 653
 654         Now, MACS will decide peak summits according to pileup height
 655         instead of qvalue scores. In this way, the summit may be more
 656         accurate.
 657
 658         * Diff score
 659
 660         MACS calculates qvalue scores as differential scores. When comparing
 661         two conditions (say A and B), the maximum qscore for comparing
 662         A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
 663         will be computed. If maxqscore_a2b is bigger, the diff score is
 664         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
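
        The rule above can be written as a small function. The name
        `diff_score` is hypothetical; treating a tie as the "otherwise"
        case is a direct reading of the text, not a confirmed detail of the
        implementation.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Return a signed differential score for conditions A and B.

    Hypothetical helper. The score is positive when the A-vs-B qscore is
    the larger one, and negative otherwise (including ties).
    """
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```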
 665
 666 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
 667         MACS version 2.0.8 (tag:alpha)
 668
 669         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
 670
 671         New script bdgbroadcall and the extra option '--broad' for macs2
 672         script, can be used to call broad regions with a loose cutoff to
 673         link nearby significant regions. The output is represented as
 674         BED12 format.
 675
 676         * MACS2/IO/cScoreTrack.pyx
 677
 678         Fix q-value calculation to generate forcefully monotonic values.
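
        Forcing q-values to be monotonic is a standard step of the
        Benjamini-Hochberg adjustment. The sketch below shows that step on
        a plain list of p-values. It is a generic illustration and an
        assumption about what "forcefully monotonic" refers to, not the
        cScoreTrack.pyx code; `bh_qvalues` is a hypothetical name.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values, in the order of the input.

    Hypothetical helper. The raw value p * m / rank is not guaranteed to
    increase with p, so a running minimum taken from the largest p-value
    downward enforces monotonic q-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

        For the p-values 0.01, 0.011 and 0.5, the raw values are 0.03,
        0.0165 and 0.5; the running minimum lowers the first one to 0.0165
        so that a smaller p-value never gets a larger q-value.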
 679
 680         * bin/eland*2bed, bin/sam2bed and bin/filterdup
 681
 682         They are combined to one more powerful script called
 683         "filterdup". The script filterdup can filter duplicated reads
 684         according to sequencing depth and genome size. The script can also
 685         convert any format supported by MACS to BED format.
 686
 687 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 688         MACS version 2.0.7 (tag:alpha)
 689
 690         * bin/macsdiff renamed to bin/bdgdiff
 691
 692         Now this script will work as a low-level finetuning tool as bdgcmp
 693         and bdgpeakcall.
 694
 695         * bin/macs2diff
 696
 697         A new script to take treatment and control files from two
 698         conditions, calculate fragment size, use local poisson to get
 699         pvalues and BH process to get qvalues, then combine 4-ways result
 700         to call differential sites.
 701
 702         This script can use up to 4 cpus to speed up 4-ways calculation.
 703         (I am trying multiprocessing in python.)
 704
 705         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
 706         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
 707         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
 708
 709         All above files are modified for the new macs2diff script.
 710
 711         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
 712
 713         Now q-value 0.01 is the default cutoff. If -p is specified,
 714         p-value cutoff will be used instead.
 715
 716 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
 717         MACS version 2.0.6 (tag:alpha)
 718
 719         * bin/macsdiff
 720
 721         A script to call differential regions. A naive way is introduced
 722         to find the regions where:
 723
 724         1. signal from condition 1 is larger than input 1 and condition 2 --
 725         unique region in condition 1;
 726         2. signal from condition 2 is larger than input 2 and condition 1
 727         -- unique region in condition 2;
 728         3. signal from condition 1 is larger than input 1, signal from
 729         condition 2 is larger than input 2, however either signal from
 730         condition 1 or 2 is not larger than the other.
 731
 732         Here 'larger' means the pvalue or qvalue from a Poisson test is
 733         under certain cutoff.
 734
 735         (I will make another script to wrap up multiple scripts for
 736         differential calling)
 737
 738 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
 739         MACS version 2.0.5 (tag:alpha)
 740
 741         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
 742         MACS2/IO/cPeakIO.pyx
 743
 744         Use hash to store peak information. Add back the feature to deal
 745         with data without control.
 746
 747         Fix bug which incorrectly allows small peaks at the end of
 748         chromosomes.
 749
 750         * bin/bdgpeakcall, bin/bdgcmp
 751
 752         Fix bugs. bdgpeakcall can output encodePeak format.
 753
 754 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
 755         MACS version 2.0.4 (tag:alpha)
 756
 757         * cPeakDetect.py
 758
 759         Fix a bug, correctly assign lambda_bg while --to-small is
 760         set. Thanks Junya Seo!
 761
 762         Add rank and num of bp columns to pvalue-qvalue table.
 763
 764         * cScoreTrack.py
 765
 766         Fix bugs to correctly deal with peakless chromosomes. Thanks
 767         Vaibhav Jain!
 768
 769         Use AFDR for independent tests instead.
 770
 771         * encodePeak
 772
 773         Now MACS can output peak coordinates together with pvalue, qvalue,
 774         summit positions in a single encodePeak format (designed for
 775         ENCODE project) file. This file can be loaded to UCSC
 776         browser. Definitions of some specific columns are: 5th:
 777         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
 778         -log10qvalue, 10th: relative summit position to peak start.
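
             A minimal sketch of how one line with these columns could be
             assembled. Only columns 5 and 7-10 are described above; the
             layout of columns 1-4 (chromosome, start, end, name) and the
             '.' placeholder in column 6 are assumptions, as are the
             function name and the number formatting.

```python
def encode_peak_line(chrom, start, end, name, summit,
                     fold_change, mlog10_pvalue, mlog10_qvalue):
    """Build one tab-separated, 10-column peak line (illustrative only)."""
    fields = [
        chrom, start, end, name,      # columns 1-4 (assumed layout)
        int(mlog10_pvalue * 10),      # 5th: int(-log10pvalue*10)
        ".",                          # 6th: strand placeholder (assumption)
        f"{fold_change:.5f}",         # 7th: fold-change
        f"{mlog10_pvalue:.5f}",       # 8th: -log10pvalue
        f"{mlog10_qvalue:.5f}",       # 9th: -log10qvalue
        summit - start,               # 10th: summit relative to peak start
    ]
    return "\t".join(str(f) for f in fields)


line = encode_peak_line("chr1", 100, 300, "peak_1", 180, 8.5, 12.5, 9.0)
```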
 779
 780
 781 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 782         MACS version 2.0.3 (tag:alpha)
 783
 784         * Rich output with qvalue, fold enrichment, and pileup height
 785
 786         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
 787         procedure:
 788
 789         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
 790
 791         Now we have an xls output file similar to the previous one. The
 792         differences from the previous file are:
 793
 794         1. Summit now is absolute summit, instead of relative summit
 795            position;
 796         2. 'Pileup' is previous 'tag' column. It's the extended fragment
 797            pileup at the peak summit;
 798         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
 799            5.00 means 1e-5, simple and less confusing.
 800         4. FDR column becomes '-log10(qvalue)' column.
 801         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
 802            the values at the peak summit.
 803
 804         * Extra output files
 805
 806         NAME_pqtable.txt contains pvalue and qvalue relationships.
 807
 808         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
 809         and -log10qvalue scores in BedGraph format. Nearby regions with
 810         the same value are not merged.
 811
 812         * Separation of FeatIO.py
 813
 814         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
 815         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
 816         implemented to store pileup, local lambda, pvalue, and qvalue
 817         all together in cScoreTrack.pyx.
 818
 819         * Experimental option --half-ext
 820
 821         As suggested by the NPS algorithm, I added an experimental option
 822         --half-ext to let MACS extend each ChIP fragment around its
 823         middle point by only 1/2 d.
 824
 825 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
 826         MACS version 2.0.2 (tag:alpha)
 827
 828         * macs2
 829
 830         Add an error check for the case where the treatment file and
 831         control file share no common chromosome names.
 832
 833         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
 834
 835         Reduce memory usage by removing deepcopy() calls.
 836
 837         * Modify README documents and others.
 838
 839 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
 840         MACS Version 2.0.1 (tag:alpha)
 841
 842         * cPileup.pyx, cPeakDetect.pyx and peak calling process
 843
 844         Jie suggested a brilliantly simple method to pile up fragments
 845         into a bedGraph track. It works much faster than the previous
 846         function, i.e., faster than MACS1.3 or MACS1.4. So I can include
 847         large local lambda calculation in MACSv2 now. Now I generate three
 848         bedGraphs for d-size local bias, slocal-size and llocal-size local
 849         bias, and calculate the maximum local bias as local lambda
 850         bedGraph track.
 851
 852         Minor: add_loc in bedGraphTrackI now can correctly merge the
 853         region with its preceding region if their values are the same.
 854
 855         * macs2
 856
 857         Add an option to shift control tags before extension. By default,
 858         control tags will be extended to both sides regardless of strand
 859         information.
 860
 861 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
 862         MACS Version 2.0.0 (tag:alpha)
 863
 864         * Use bedGraph type to store data internally and externally.
 865
 866         We can have theoretically one-basepair resolution profiles. 10
 867         times smaller in filesize and even smaller after converting to
 868         bigWig for visualization.
 869
 870         * Peak calling process modified. Better peak boundary detection.
 871
 872         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
 873         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
 874         one will be averaged to d size) Then calculate the maximum value
 875         of these two tracks and a global background, to have a
 876         local-lambda bedGraph.
 877
 878         Use -10log10poisson_pvalue as scores to generate a score track
 879         before peak calling.
 880
 881         A general peak calling based on a score cutoff, min length of peak
 882         and max gap between nearby peaks.
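
             A minimal sketch of such a general peak caller on a single
             chromosome, not the cPeakDetect implementation. It assumes the
             score track is a sorted list of non-overlapping
             (start, end, score) intervals and that a score equal to the
             cutoff passes; the function name is hypothetical.

```python
def call_peaks(intervals, cutoff, min_length, max_gap):
    """Merge above-cutoff intervals into peaks.

    intervals: sorted, non-overlapping (start, end, score) tuples.
    Returns a list of (start, end) peaks of at least min_length bp,
    where neighbours closer than or equal to max_gap bp are joined.
    """
    peaks = []
    cur_start = cur_end = None
    for start, end, score in intervals:
        if score < cutoff:
            continue
        if cur_start is not None and start - cur_end <= max_gap:
            # Close enough to the open peak: extend it.
            cur_end = end
        else:
            # Close the open peak (if long enough) and start a new one.
            if cur_start is not None and cur_end - cur_start >= min_length:
                peaks.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None and cur_end - cur_start >= min_length:
        peaks.append((cur_start, cur_end))
    return peaks


track = [(0, 100, 1), (100, 150, 60), (150, 170, 10),
         (170, 260, 70), (260, 1000, 2), (1000, 1010, 80)]
peaks = call_peaks(track, cutoff=50, min_length=100, max_gap=30)
```

             Here the two strong intervals are joined across the 20 bp gap,
             while the isolated 10 bp interval is dropped as too short.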
 883
 884         * Option changes.
 885
 886         Wiggle file output is removed. Now we only support bedGraph
 887         output. The generation of bedGraph is highly recommended since it
 888         will not cost extra time. In other words, bedGraph generation is
 889         internally run even if you don't want to save bedGraphs on disk, due
 890         to the peak calling algorithm in MACS v2.
 891
 892         * cProb.pyx
 893
 894         We now can calculate poisson pvalue in log space so that the score
 895         (-10*log10pvalue) will not have an upper limit of 3100 due to
 896         precision of float number.
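
             A minimal sketch of the idea in pure Python, not the cProb.pyx
             code: the upper tail P(X >= k) is summed term by term using
             logarithms, so the result never underflows to 0. It assumes
             k >= 1 and lam > 0, and that the tail converges within
             max_terms terms (true when k is above lam, the usual case for
             an enriched region). The function name is hypothetical.

```python
import math


def log10_poisson_sf(k, lam, max_terms=10000):
    """log10 of P(X >= k) for X ~ Poisson(lam), summed in log space."""
    # Natural log of the first term, P(X = k).
    log_term = -lam + k * math.log(lam) - math.lgamma(k + 1)
    log_sum = log_term
    i = k
    for _ in range(max_terms):
        i += 1
        # pmf(i) = pmf(i - 1) * lam / i, expressed in log space.
        log_term += math.log(lam) - math.log(i)
        # log(exp(a) + exp(b)) without leaving log space.
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        new_sum = hi + math.log1p(math.exp(lo - hi))
        if new_sum == log_sum:
            break  # further terms no longer change the sum
        log_sum = new_sum
    return log_sum / math.log(10)


# A pvalue far too small for a float still gives a finite score.
score = -10 * log10_poisson_sf(5000, 1.0)
```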
 897
 898         * Cython is adopted to speed up Python code.
 899
 900 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
 901         Small fixes
 902
 903         * Replaced with the newest WigTrackI class and fixed the wignorm script.
 904
 905 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 906         Version 1.4.0rc2 (Valentine)
 907
 908         * --single-wig option is renamed to --single-profile
 909
 910         * BedGraph output with --bdg or -B option.
 911
 912         The BedGraph output provides 1bp resolution fragment pileup
 913         profile. File size is smaller than wig file. This option can be
 914         combined with --single-profile option to produce a bedgraph file
 915         for the whole genome. This option also makes --space and
 916         --call-subpeaks invalid.
 917
 918         * Fix the description of --shiftsize to correctly state that the
 919         value is 1/2 d (fragment size).
 920
 921         * Fix a bug in the call to __filter_w_control_tags when control is
 922         not available.
 923
 924         * Fix a bug on --to-small option. Now it works as expected.
 925
 926         * Fix a bug while counting the tags in candidate peak region, an
 927         extra tag may be included. (Thanks to Jake Biesinger!)
 928
 929         * Fix the bug for the peaks extended outside of chromosome
 930         start. If the minus strand tag goes outside of chromosome start
 931         after extension of d, it will be thrown out.
 932
 933         * Post-process script for a combined wig file:
 934
 935         The "wignorm" command can be called after a full run of MACS14 as
 936         a postprocess. wignorm can calculate the local background from the
 937         control wig file from MACS14, then use either foldchange,
 938         -10*log10(pvalue) from Poisson test, or difference after asinh
 939         transformation as the score to build a single wig track to
 940         represent the binding strength. This script will take a
 941         significantly long time to process.
 942
 943         * --wigextend has been obsoleted.
 944
 945 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
 946         Version 1.4.0rc1 (Starry Sky)
 947
 948         * Duplicate reads option
 949
 950         --keep-dup behavior is changed. Now user can specify how many
 951         reads he/she wants to keep at the same genomic location. 'auto' to
 952         let MACS decide the number based on binomial distribution, 'all'
 953         to let MACS keep all reads.
 954
 955         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
 956
 957         By default, MACS will now scale the smaller dataset to the bigger
 958         dataset. For instance, if IP has 10 million reads, and Input has 5
 959         million, MACS will double the lambda value calculated from Input
 960         reads while calling BOTH the positive peaks and negative
 961         peaks. This will address the issue caused by unbalanced numbers of
 962         reads from IP and Input. If --to-small is turned on, MACS will
 963         scale the larger dataset to the smaller one. So from now on, if d
 964         is fixed, then the peaks from a MACS call for A vs B should be
 965         identical to the negative peaks from a B vs A.
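
             A minimal sketch of the scaling rule described above, not the
             MACS code itself. The function name and the returned
             (treatment, control) pair of factors are assumptions made for
             illustration.

```python
def lambda_scaling(n_treat, n_ctrl, to_small=False):
    """Return (treat_scale, ctrl_scale) for two read counts.

    By default the smaller dataset is scaled up to the larger one;
    with to_small=True the larger dataset is scaled down instead.
    """
    if to_small:
        scale_control = n_ctrl > n_treat    # shrink whichever is larger
    else:
        scale_control = n_ctrl <= n_treat   # grow whichever is smaller
    if scale_control:
        return 1.0, n_treat / n_ctrl
    return n_ctrl / n_treat, 1.0


# IP has 10 million reads, Input has 5 million: Input lambda is doubled.
default_scales = lambda_scaling(10_000_000, 5_000_000)
small_scales = lambda_scaling(10_000_000, 5_000_000, to_small=True)
```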
 966
 967 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
 968         Version 1.4.0beta (summer wishes)
 969
 970         * New features
 971
 972         ** Model building
 973
 974         The default behavior in the model building step is slightly
 975         changed. When MACS can't find enough pairs to build model
 976         (implemented in alpha version) or the modeled fragment length is
 977         less than 2 times of tag length (implemented in beta version),
 978         MACS will use 2 times of --shiftsize value as fragment length in
 979         the later analysis. --off-auto can turn off this default behavior.
 980
 981         ** Redundant tag filtering
 982
 983         The IO module is rewritten. The redundant tag filtering process
 984         becomes simpler and works as promised. The maximum allowed number
 985         of tags at the exact same location is calculated from the
 986         sequencing depth and genome size using a binomial distribution,
 987         for both TREATMENT and CONTROL separately. ( previously only
 988         TREATMENT was considered ) The exact same location means the same
 989         coordinate and the same strand. Then MACS will only keep at most
 990         this number of tags at the exact same location in the following
 991         analysis. An option --keep-dup can let MACS skip the filtering and
 992         keep all the tags. However this may bring in a lot of sequencing
 993         bias, so you may get many false positive peaks.
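
             A minimal sketch of how such a limit could be derived, not the
             actual MACS code. It assumes every tag lands on a given
             position with probability 1/genome_size (strand is ignored
             here for simplicity) and takes the smallest count whose
             binomial CDF reaches 1 - p_threshold. The threshold value of
             1e-5 and the function name are assumptions.

```python
import math


def max_dup_tags(n_tags, genome_size, p_threshold=1e-5):
    """Smallest k with P(X <= k) >= 1 - p_threshold, X ~ Binomial(n_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    # P(X = 0) = (1 - p) ** n_tags, computed via logs for accuracy.
    pmf = math.exp(n_tags * math.log1p(-p))
    cdf = pmf
    k = 0
    while cdf < 1.0 - p_threshold and k < n_tags:
        # pmf(k + 1) = pmf(k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += pmf
    # Always allow at least one tag per location.
    return max(k, 1)


shallow = max_dup_tags(10_000_000, 2.7e9)
deep = max_dup_tags(1_000_000_000, 2.7e9)
```

             With deeper sequencing the allowed number of tags per location
             grows, which is why the limit is computed for TREATMENT and
             CONTROL separately.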
 994
 995         ** Single wiggle mode
 996
 997         First thing to mention, this is not the score track that I
 998         described before. By default, MACS generates wiggle files for
 999         fragment pileup for every chromosome separately. When you use
1000         --single-wig option, MACS will generate a single wiggle file for
1001         all the chromosomes so you will get a wig.gz for TREATMENT and
1002         another wig.gz for CONTROL if available.
1003
1004         ** Sniff -- automatic format detection
1005
1006         Now, by default or "-f AUTO", MACS will decide the input file
1007         format automatically. Technically, it will try to read at most
1008         1000 records for the first 10 non-comment lines. If it succeeds,
1009         the format is decided. I recommend not to use AUTO and specify the
1010         right format for your input files, unless you combine different
1011         formats in a single MACS run.
1012
1013         * Options changes
1014
1015         --single-wig and --keep-dup are added. Check previous section in
1016         ChangeLog for detail.
1017
1018         -f (--format) AUTO is now the default option.
1019
1020         --slocal default: 1000
1021         --llocal default: 10000
1022
1023         * Bug fixed
1024
1025         Setup script will stop the installation if python version is not
1026         python2.6 or python2.7.
1027
1028         Local lambda calculation has been changed back. MACS will check
1029         peak_region, slocal( default 1K) and llocal (default 10K) for the
1030         local bias. The previous 200bps default caused MACS to miss
1031         some peaks where the input bias is very sharp.
1032
1033         sam2bed.py script is corrected.
1034
1035         Relative pos in xls output is fixed.
1036
1037         Parser for ELAND_export is fixed to pass some of the no match
1038         lines. And elandexport2bed.py is fixed too. ( however I can't
1039         guarantee that it works on any eland_export files. )
1040
1041 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1042         Version 1.4.0alpha2 (be smarter)
1043
1044         * Options changes
1045
1046         --gsize now provides shortcuts for common genomes, including
1047         human, mouse, C. elegans and fruitfly.
1048
1049         --llocal now will be 5000 bps if there is no input file, so that
1050         local lambda doesn't overkill enriched binding sites.
1051
1052 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1053         Version 1.4alpha (be smarter)
1054
1055         * Options changes
1056
1057         --tsize option is redesigned. MACS will use the first 10 lines of
1058         the input to decide the tag size. If user specifies --tsize, it
1059         will override the auto decided tsize.
1060
1061         --lambdaset is replaced by --slocal and --llocal which mean the
1062         small local region and large local region.
1063
1064         --bw has no effect on the scan-window size now. It only affects the
1065         paired-peaks model process.
1066
1067         * Model building
1068
1069         During the model building, MACS will pick out the enriched regions
1070         which are not too high and not too low to build the paired-peak
1071         model. By default the region is from fold 10 to fold 30. If MACS
1072         fails to build the model, by default it will use the nomodel
1073         settings, like shiftsize=100bps, to shift and extend each
1074         tag. This behavior can be turned off by '--off-auto'.
1075
1076         * Output files
1077
1078         An extra file including all the summit positions is saved as
1079         *_summits.bed. An option '--call-subpeaks' will invoke
1080         PeakSplitter developed by Mali Salmon to split wide peaks into
1081         smaller subpeaks.
1082
1083         * Sniff ( will be in beta )
1084
1085         Automatically recognize the input file format, so users can combine
1086         different formats in one MACS run.
1087
1088         Not implemented features/TODO:
1089
1090         * Algorithms ( in near future? )
1091
1092         MACS will try to refine the peak boundaries by calculating the
1093         scores for every point in the candidate peak regions. The score
1094         will be the -10*log(10,pvalue) on a local poisson distribution. A
1095         cutoff specified by users (--pvalue) will be applied to find the
1096         precise sub-peaks in the original candidate peak region. Peak
1097         boundaries and peak summit positions will be saved in separate BED
1098         files.
1099
1100         * Single wiggle track ( in near future? )
1101
1102         A single wiggle track will be generated to save the scores within
1103         candidate peak regions in the 10bps resolution. The wiggle file
1104         is in fixedStep format.
1105
1106
1107 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
1108         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
1109
1110         * bin/Constants.py
1111
1112         Fixed typo. FCSTEP -> FESTEP
1113
1114         * lib/PeakDetect.py
1115
1116         The 'femax' attribute bug is fixed
1117
1118 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1119         Version 1.3.7 (Oktoberfest)
1120
1121         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
1122
1123         Enhancements by Peter Chines:
1124
1125         1. gzip files are supported.
1126         2. when --diag is on, user can set the increment and endpoint for
1127         fold enrichment analysis by setting --fe-step and --fe-max.
1128
1129         Enhancements by Davide Cittaro:
1130
1131         1. BAM and SAM formats are supported.
1132         2. small changes in the header lines of wiggle output.
1133
1134         Enhancements by Me:
1135         1. I added --fe-min option;
1136         2. Bowtie ascii output with suffix ".map" is supported.
1137
1138         Bug fixed:
1139
1140         1. --nolambda bug is fixed. ( reported by Martin in JHU )
1141         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
1142         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
1143         4. Some "fold change" have been changed to "fold enrichment".
1144
1145 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1146         Version 1.3.6.1 (default parameter change)
1147
1148         * bin/macs, lib/PeakDetect.py
1149
1150         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
1151         default. "--futurefdr" is added which can turn on the 'new' method
1152         introduced in 1.3.6. By default it's off.
1153
1154         * lib/PeakDetect.py
1155
1156         Fixed a bug. p-value is corrected a little bit.
1157
1158
1159 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
1160         Version 1.3.6 (Birthday cake)
1161
1162         * bin/macs
1163
1164         "track name" is added to the header of BED output file.
1165
1166         Now the default peak detection method is to consider 5k and 10k
1167         nearby regions in treatment data and peak location, 1k, 5k, and
1168         10k regions in control data to calculate local bias. The old
1169         method can be called through '--old' option.
1170
1171         Information about how many total/unique tags in treatment or
1172         control will be saved in final .xls output.
1173
1174         * lib/IO/__init__.py
1175
1176         ".fa" will be removed from input tag alignment so only the
1177         chromosome names are kept.
1178
1179         WigTrackI class is added for Wiggle like data structure. (not used
1180         now)
1181
1182         The parser for ELAND multi PET files has been fixed. Now the 5'
1183         tag position for a pair will be kept, whereas in the previous
1184         version, the middle points are kept.
1185
1186         * lib/IO/BinKeeper.py
1187
1188         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
1189         browser, which can quickly access certain region for values in a
1190         large wiggle like data file. (not used now)
1191
1192         * lib/OptValidator.py
1193
1194         typo fixed.
1195
1196         * lib/PeakDetect.py
1197
1198         Now the default peak detection method is to consider 5k and 10k
1199         nearby regions in treatment data and peak location, 1k, 5k, and
1200         10k regions in control data to calculate local bias. The old
1201         method can be called through '--old' option.
1202
1203         Two columns have been added to BED output file. 4th column: peak
1204         name; 5th column: peak score using -10log(10,pvalue) as score.
1205
1206         * setup.py
1207
1208         Add support to build a Mac App through 'setup.py py2app', or a
1209         Windows executable through 'setup.py py2exe'. You need to install
1210         py2app or py2exe package in order to use these functions.
1211
1212 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1213         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
1214
1215         * PeakDetect.py
1216
1217         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
1218         in control data to calculate local lambda for each peak. Peak
1219         calling results will be slightly different from the previous version,
1220         beware!
1221
1222         * OptValidator.py
1223
1224         Typo fixed, ELANDParser -> ELANDResultParser
1225
1226         * OutputWriter.py
1227
1228         Now, modeled d value will be shown on the model figure.
1229
1230 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
1231         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
1232
1233         * macs, IO/__init__.py, PeakDetect.py
1234
1235         Add support for ELAND multi format. Add support for Pair-End
1236         experiment, in this case, 5'end and 3'end ELAND multi format files
1237         are required for treatment or control data. See 00README file for
1238         detail.
1239
1240         Add wigextend option.
1241
1242         Add petdist option for Pair-End Tag experiment, which is the best
1243         distance between 5' and 3' tags.
1244
1245         * PeakDetect.py
1246
1247         Fixed a bug which caused the end position of every peak region
1248         to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
1249
1250         * OutputWriter.py
1251
1252         Fix bugs while generating wiggle files. The start position of
1253         wiggle file is set to 1 instead of 0.
1254
1255         Fix a bug that every 10M bps, signals in the first 'd' range are
1256         lower than actual. ( Thanks Mali Salmon!)
1257
1258
1259 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
1260         Version 1.3.3 (wiggle bugs fixed)
1261
1262         * OutputWriter.py
1263
1264         Fix bugs while generating wiggle files. 1. 'span=' is added to
1265         'variableStep' line; 2. previously, every 10M bps, the coordinates
1266         were wrongly shifted to the right for 'd' basepairs.
1267
1268         * macs, PeakDetect.py
1269
1270         Add an option to save wiggle files on different resolution.
1271
1272 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
1273         Version 1.3.2 (tiny bugs fixed)
1274
1275         * IO/__init__.py
1276
1277         Fix 65536 -> 65535. ( Thank Joon)
1278
1279         * Prob.py
1280
1281         Improved for binomial function with extra large number. Imported
1282         from Cistrome project.
1283
1284         * PeakDetect.py
1285
1286         If treatment channel misses reads in some chromosome included in
1287         control channel, or vice versa, MACS will not exit. (Thank Shaun
1288         Mahony)
1289
1290         Instead, MACS will fake a tag at position -1 when calling
1291         treatment peaks vs control, but will ignore the chromosome while
1292         calling negative peaks.
1293
1294 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
1295         Version 1.3.1 (tiny bugs fixed version)
1296
1297         * Prob.py
1298
1299         Hyunjin Gene Shin contributed some code to Prob.py. Now the
1300         binomial functions can tolerate large and small numbers.
1301
1302         * IO/__init__.py
1303
1304         Parsers now split lines in BED/ELAND file using any
1305         whitespaces. 'track' or 'browser' lines will be regarded as
1306         comment lines. A bug fixed when throwing StrandFormatError. The
1307         maximum redundant tag number at a single position can be no less
1308         than 65536.
1309
1310
1311 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1312         Version 1.3 (naming clarification version)
1313
1314         * Naming clarification changes according to our manuscript:
1315
1316         'frag_len' is changed to 'd'.
1317
1318         'fold_change' is changed to 'fold_enrichment'.
1319
1320         Suggest '--bw' parameter to be determined by users from the real
1321         sonication size.
1322
1323         Maximum FDR is 100% in the output file.
1324
1325         And other clarifications in 00README file and the documents on the
1326         website.
1327
1328         * IO/__init__.py
1329         If the redundant tag number at a single position is over 32767,
1330         just remember 32767, instead of raising an overflow exception.
1331
1332         * setup.py
1333         fixed a typo.
1334
1335         * PeakDetect.py
1336         Bug fixed for diagnosis report.
1337
1338
1339 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
1340         Version 1.2.2gamma
1341
1342         * Serious bug fixes:
1343
1344         Poisson distribution CDF and inverse CDF functions are
1345         corrected. They can produce right results even for huge lambda
1346         now. So that the p-value and FDR values in the final excel sheet
1347         are corrected.
1348
1349         IO package now can tolerate some rare cases; ELANDParser in IO
1350         package is fixed. (Thank Bogdan)
1351
1352         * Improvement:
1353
1354         Reverse paired peaks in model are rejected. So there will be no
1355         negative 'frag_len'. (Thank Bogdan)
1356
1357         * Features added:
1358
1359         Diagnosis function is completed. It can output a table file for
1360         users to estimate their sequencing depth.
1361
1362
1363 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
1364         Version 1.2
1365
1366         * Prob.py is added!
1367
1368         GSL is totally removed from MACS. Instead, I have implemented the
1369         CDF and inverse CDF for poisson and binomial distribution purely
1370         in python.
1371
1372         * Constants.py is added!
1373
1374         Organize constants used in MACS in the Constants.py file.
1375
1376         * All other files are modified!
1377
1378         Foldchange calculation is modified. Now the foldchange will only be
1379         calculated at the peak summit position instead of the whole peak
1380         region. The values will be higher and more robust than before.
1381
1382         Features added:
1383
1384         1. MACS can save wiggle format files containing the tag number at
1385         every 10 bp along the genome. Tags are shifted according to our
1386         model before they are calculated.
1387
1388         2. Model building and local lambda calculation can be skipped with
1389         certain options.
1390
1391         3. A diagnosis report can be generated through '--diag'
1392         option. This report can help you estimate the
1393         sequencing saturation. This function is only in beta stage.
1394
1395         4. FDR calculation speed is highly improved.
1396
1397 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1398         Version 1.1
1399
1400         * TabIO, PeakModel.py ...
1401         Bug fixed to let MACS tolerate some cases while there is no tag on
1402         either plus strand or minus strand.
1403
1404         * setup.py
1405         Check the version of python. If the version is lower than 2.4,
1406         refuse to install with warning.
1407
1408
1409 2013-07-31  Tao Liu  <vladimir.liu@gmail.com>
1410         MACS version 2.0.10 20130731 (tag:alpha)
1411
1412         * callpeak --call-summits
1413
1414         Fix bugs causing the callpeak --call-summits option to generate
1415         extra peaks and inconsistent peak boundaries compared to the
1416         default option. Thank Ben Levinson!
1417
1418         * bdgcmp output
1419
1420         Fix bugs causing bdgcmp output logLR all in positive values. Now
1421         'depletion' can be correctly represented as negative values.
1422
1423         * bdgdiff
1424
1425         Fix the behavior of bdgdiff module. Now it can take four
1426         bedGraph files, then use logLR as cutoff to call differential
1427         regions. Check command line of bdgdiff for detail.
1428
1429 2013-07-13  Tao Liu  <vladimir.liu@gmail.com>
1430         MACS version 2.0.10 20130713 (tag:alpha)
1431
1432         * fix bugs while outputting broadPeak and gappedPeak.
1433
1434         Note. Those weak broad regions without any strong enrichment
1435         regions inside won't be saved in gappedPeak file.
1436
1437         * bdgcmp -T and -C are merged into -S and description is updated.
1438
1439         Now, you can use it to override SPMR values in your input for
1440         bdgcmp. To use SPMR (from 'callpeak --SPMR -B') while calculating
1441         statistics will cause weird results ( in most cases, lower
1442         significance), and won't be consistent with MACS2 callpeak
1443         behavior. So if you have SPMR bedGraphs, input the smaller/larger
1444         sample size in MILLION according to 'callpeak --to-large' option.
1445
1446 2013-07-10  Tao Liu  <vladimir.liu@gmail.com>
1447         MACS version 2.0.10 20130710 (tag:alpha)
1448
1449         * fix BED style output format of callpeak module:
1450
1451         1) without --broad: narrowPeak (BED6+4) and BED for summit will be
1452         the output. Old BED format file won't be saved.
1453
1454         2) with --broad: broadPeak (BED6+3) for broad region and
1455         gappedPeak (BED12+3) for chained enriched regions will be the
1456         output. Old BED format, narrowPeak format, summit file won't be
1457         saved.
1458
1459         * bdgcmp now can accept list of methods to calculate scores. So
1460         you can run it once to generate multiple types of scores. Thank
1461         Jon Urban for this suggestion!
1462
1463         * C codes are re-generated through Cython 0.19.1.
1464
1465 2013-05-21  Tao Liu  <vladimir.liu@gmail.com>
1466         MACS version 2.0.10 20130520 (tag:alpha)
1467
1468         * broad peak calling modules are modified in order to report all
1469         relaxed regions even if there is no strong enrichment inside.
1470
1471 2013-05-01  Tao Liu  <vladimir.liu@gmail.com>
1472         MACS version 2.0.10 20130501 (tag:alpha)
1473
1474         * Memory usage is decreased to about 1/4-1/5 of previous usage
1475         Now, the internal data structure and algorithm are both
1476         re-organized, so that intermediate data wouldn't be saved in
1477         memory. Instead they will be calculated on the fly. New MACS2 will
1478         spend longer time (1.5 to 2 times) however it will use less memory
1479         so can be more usable on small mem servers.
1480
1481         * --seed option is added to callpeak and randsample commands
1482         Thank Mathieu Gineste for this suggestion!
1483
1484 2013-03-05  Tao Liu  <vladimir.liu@gmail.com>
1485         MACS version 2.0.10 20130306 (tag:alpha)
1486
1487         * diffpeak module New module to detect differential binding sites
1488         with more statistics.
1489
1490         * Introduced --refine-peaks
1491         Calculates reads balancing to refine peak summits
1492
1493         * Output file names prefix
1494         Correct encodePeak to narrowPeak, broadPeak to bed12.
1495
1496 2012-09-13 Benjamin Schiller <benjamin.schiller@ucsf.edu>,  Tao Liu  <taoliu@jimmy.harvard.edu>
1497         MACS version 2.0.10 (tag:alpha not released)
1498
1499         * Introduced BAMPEParser
1500         Reads PE data directly, requires bedtools for now
1501
1502         * Introduced --call-summits
1503         Uses signal processing methods to call overlapping peaks
1504
1505         * Added --no-trackline
1506         By default, files have descriptive tracklines now
1507
1508         * new refinepeak command (experimental)
1509         This new function will use a similar method in SPP (wtd), to
1510         analyze raw tag distribution in peak region, then redefine the
1511         peak summit where plus and minus tags are evenly distributed
1512         around.
1513
1514         * Changes to output *
1515         cPeakDetect.pyx has full support for new print/write methods and
1516         --call-peaks, BAMPEParser, and use of paired-end data
1517
1518         * Parser optimization
1519
1520         cParser.pyx is rewritten to use io.BufferedReader to speed
1521         up. Speed is doubled.
1522
1523         Code is reorganized -- most of functions are inherited from
1524         GenericParser class.
1525
1526         * Use cross-correlation to calculate fragment size
1527
1528         First, all pairs will be used in prediction for fragment
1529         size. Previously, no more than 1000 pairs were used. Second,
1530         cross-correlation is used to find the best phase difference
1531         between + and - tag pileups.
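
The idea can be sketched as follows. This is a minimal illustration, not
the MACS2 implementation: the function name `best_shift`, the plain-list
pileups and the unnormalized dot-product score are all assumptions made
for the example.

```python
def best_shift(plus, minus, max_shift):
    """Return the shift (in bp) of the minus-strand pileup that maximizes
    its unnormalized cross-correlation with the plus-strand pileup."""
    n = min(len(plus), len(minus))
    best, best_score = 0, float("-inf")
    for shift in range(max_shift + 1):
        # Minus-strand tags pile up downstream of plus-strand tags,
        # so compare plus[i] with minus[i + shift].
        score = sum(plus[i] * minus[i + shift] for i in range(n - shift))
        if score > best_score:
            best, best_score = shift, score
    return best
```

The returned shift plays the role of the phase difference between the two
strands; how it is turned into a fragment size is not covered here.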
1532
1533         * Speed up p-value and q-value calculation
1534
1535         This part is ten times faster now. I am using a dictionary to
1536         cache p-value results from Poisson CDF function. A bit more memory
1537         will be used to increase speed. I hope this dictionary would not
1538         explode since the possible pairs of ChIP signal and control lambda
1539         are hugely redundant. Also, I rewrote part of q-value
1540         calculation.
1541
1542         * Speed up peak detection
1543
1544         This part is about a hundred times faster now.  Optimizations
1545         include using Numpy functions as much as possible, and making loop
1546         body as small as possible.
1547
1548         * Post-processing on differential calls
1549
1550         After macs2diff finds differential binding sites between two
1551         conditions, it will try to annotate the peak calls from one of two
1552         conditions, describe the changes ...
1553
1554         * Fragment size prediction in macs2diff
1555
1556         Now by default, macs2diff will try to use the average fragment
1557         size from both condition 1 and condition 2 for tag extension and
1558         peak calling. Previously, by default, it will use different sizes
1559         unless --nomodel is specified.
1560
1561         Technically, I separate model building processes out. So macs2diff
1562         will build fragment sizes for condition 1 and 2 in parallel (2
1563         processes maximum), then perform 4-way comparisons in parallel (4
1564         processes maximum).
1565
1566         * Diff score
1567
1568         Combine two p/qscore tracks together. At regions where condition 1
1569         is higher than condition 2, score would be positive, otherwise,
1570         negative.
1571
1572         * SAMParser and BAMParser
1573
1574         Bug fixed for paired-end sequencing data.
1575
1576         * BedGraph.pyx
1577
1578         Fixed a bug while calling peaks from BedGraph file. It previously
1579         mistakenly output same peaks multiple times at the end of
1580         chromosome.
1581
1582 2011-11-2  Tao Liu  <taoliu@jimmy.harvard.edu>
1583         MACS version 2.0.9 (tag:alpha)
1584
1585         * Auto fixation on predicted d is turned off by default!
1586
1587         Previous --off-auto is now default. MACS will not automatically
1588         fix d less than 2 times of tag size according to
1589         --shiftsize. While tag size is getting longer nowadays, it would
1590         be easier to have d less than 2 times of tag size, however d may
1591         still be meaningful and useful. Please judge it using your own
1592         wisdom.
1593
1594         * Scaling issue
1595
1596         Now, the default scaling while treatment and input are unbalanced
1597         has been adjusted. By default, larger sample will be scaled down
1598         linearly to match the smaller sample. In this way, background
1599         noise will be reduced more than real signals, so we expect to have
1600         more specific results than the other way around (i.e. --to-large
1601         is set).
1602
1603         Also, an alternative option to randomly sample larger data
1604         (--down-sample) is provided to replace default linear
1605         scaling. However, this option will make results irreproducible,
1606         so be careful.
1607
1608         * randsample script
1609
1610         A new script 'randsample'  is added, which can randomly sample
1611         certain percentage or number of tags.
1612
1613         * Peak summit
1614
1615         Now, MACS will decide peak summits according to pileup height
1616         instead of qvalue scores. In this way, the summit may be more
1617         accurate.
1618
1619         * Diff score
1620
1621         MACS calculates qvalue scores as differential scores. When comparing
1622         two conditions (say A and B), the maximum qscore for comparing
1623         A to B -- maxqscore_a2b, and for comparing B to A -- maxqscore_b2a
1624         will be computed. If maxqscore_a2b is bigger, the diff score is
1625         +maxqscore_a2b, otherwise, diff score is -1*maxqscore_b2a.
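
The rule above written out as a small function. The function name
`diff_score` is hypothetical; ties fall into the "otherwise" branch, as
the wording implies.

```python
def diff_score(maxqscore_a2b, maxqscore_b2a):
    """Signed differential score: positive when A over B dominates."""
    if maxqscore_a2b > maxqscore_b2a:
        return maxqscore_a2b
    return -1 * maxqscore_b2a
```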
1626
1627 2011-09-15  Tao Liu  <taoliu@jimmy.harvard.edu>
1628         MACS version 2.0.8 (tag:alpha)
1629
1630         * bin/macs2, bin/bdgbroadcall, MACS2/IO/cScoreTrack.pyx, MACS2/IO/cBedGraph.pyx
1631
1632         New script bdgbroadcall and the extra option '--broad' for macs2
1633         script, can be used to call broad regions with a loose cutoff to
1634         link nearby significant regions. The output is represented as
1635         BED12 format.
1636
1637         * MACS2/IO/cScoreTrack.pyx
1638
1639         Fix q-value calculation to generate forcefully monotonic values.
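
One common way to force monotonic q-values is a running minimum over the
Benjamini-Hochberg adjusted values, walking from the largest p-value to
the smallest. The sketch below shows that general technique; it is an
assumption that the fix in cScoreTrack.pyx works along these lines, and
`bh_qvalues` is a hypothetical name.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, forced to be monotonic in the p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # largest p-value first
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min             # never exceeds a later-ranked q
    return qvalues
```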
1640
1641         * bin/eland*2bed, bin/sam2bed and bin/filterdup
1642
1643         They are combined to one more powerful script called
1644         "filterdup". The script filterdup can filter duplicated reads
1645         according to sequencing depth and genome size. The script can also
1646         convert any format supported by MACS to BED format.
1647
1648 2011-08-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1649         MACS version 2.0.7 (tag:alpha)
1650
1651         * bin/macsdiff renamed to bin/bdgdiff
1652
1653         Now this script will work as a low-level finetuning tool as bdgcmp
1654         and bdgpeakcall.
1655
1656         * bin/macs2diff
1657
1658         A new script to take treatment and control files from two
1659         conditions, calculate fragment size, use local poisson to get
1660         pvalues and BH process to get qvalues, then combine 4-ways result
1661         to call differential sites.
1662
1663         This script can use up to 4 CPUs to speed up 4-ways calculation. (
1664         I am trying multiprocessing in python. )
1665
1666         * MACS2/Constants.py, MACS2/IO/cBedGraph.pyx,
1667         MACS2/IO/cScoreTrack.pyx, MACS2/OptValidator.py,
1668         MACS2/PeakModel.py, MACS2/cPeakDetect.pyx
1669
1670         All above files are modified for the new macs2diff script.
1671
1672         * bin/macs2, bin/macs2diff, MACS2/OptValidator.py
1673
1674         Now q-value 0.01 is the default cutoff. If -p is specified,
1675         p-value cutoff will be used instead.
1676
1677 2011-07-25  Tao Liu  <vladimir.liu@gmail.com>
1678         MACS version 2.0.6 (tag:alpha)
1679
1680         * bin/macsdiff
1681
1682         A script to call differential regions. A naive way is introduced
1683         to find the regions where:
1684
1685         1. signal from condition 1 is larger than input 1 and condition 2 --
1686         unique region in condition 1;
1687         2. signal from condition 2 is larger than input 2 and condition 1
1688         -- unique region in condition 2;
1689         3. signal from condition 1 is larger than input 1, signal from
1690         condition 2 is larger than input 2, however either signal from
1691         condition 1 or 2 is not larger than the other.
1692
1693         Here 'larger' means the pvalue or qvalue from a Poisson test is
1694         under certain cutoff.
1695
1696         (I will make another script to wrap up multiple scripts for
1697         differential calling)
1698
1699 2011-07-07  Tao Liu  <vladimir.liu@gmail.com>
1700         MACS version 2.0.5 (tag:alpha)
1701
1702         * bin/macs2, MACS2/cPeakDetect.py, MACS2/IO/cScoreTrack.pyx,
1703         MACS2/IO/cPeakIO.pyx
1704
1705         Use hash to store peak information. Add back the feature to deal
1706         with data without control.
1707
1708         Fix bug which incorrectly allows small peaks at the end of
1709         chromosomes.
1710
1711         * bin/bdgpeakcall, bin/bdgcmp
1712
1713         Fix bugs. bdgpeakcall can output encodePeak format.
1714
1715 2011-06-22  Tao Liu  <taoliu@jimmy.harvard.edu>
1716         MACS version 2.0.4 (tag:alpha)
1717
1718         * cPeakDetect.py
1719
1720         Fix a bug, correctly assign lambda_bg while --to-small is
1721         set. Thanks Junya Seo!
1722
1723         Add rank and num of bp columns to pvalue-qvalue table.
1724
1725         * cScoreTrack.py
1726
1727         Fix bugs to correctly deal with peakless chromosomes. Thanks
1728         Vaibhav Jain!
1729
1730         Use AFDR for independent tests instead.
1731
1732         * encodePeak
1733
1734         Now MACS can output peak coordinates together with pvalue, qvalue,
1735         summit positions in a single encodePeak format (designed for
1736         ENCODE project) file. This file can be loaded to UCSC
1737         browser. Definition of some specific columns are: 5th:
1738         int(-log10pvalue*10), 7th: fold-change, 8th: -log10pvalue, 9th:
1739         -log10qvalue, 10th: relative summit position to peak start.
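
A sketch of how one such line could be assembled. Columns 5 and 7-10
follow the definitions above; the layout of columns 1-4 and 6 (chromosome,
start, end, name, strand placeholder) is an assumption based on the usual
BED-style ordering, and `encode_peak_line` is a hypothetical helper.

```python
def encode_peak_line(chrom, start, end, name, fold_change,
                     neg_log10_p, neg_log10_q, summit):
    """Format one peak as a 10-column, tab-separated encodePeak line."""
    fields = [
        chrom,                        # 1st: chromosome (assumed layout)
        str(start),                   # 2nd: peak start (assumed layout)
        str(end),                     # 3rd: peak end (assumed layout)
        name,                         # 4th: peak name (assumed layout)
        str(int(neg_log10_p * 10)),   # 5th: int(-log10pvalue*10)
        ".",                          # 6th: strand placeholder (assumed layout)
        f"{fold_change:.2f}",         # 7th: fold-change
        f"{neg_log10_p:.2f}",         # 8th: -log10pvalue
        f"{neg_log10_q:.2f}",         # 9th: -log10qvalue
        str(summit - start),          # 10th: summit relative to peak start
    ]
    return "\t".join(fields)
```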
1740
1741
1742 2011-06-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1743         MACS version 2.0.3 (tag:alpha)
1744
1745         * Rich output with qvalue, fold enrichment, and pileup height
1746
1747         Calculate q-values using a refined Benjamini–Hochberg–Yekutieli
1748         procedure:
1749
1750         http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
1751
1752         Now we have a similar xls output file as before. The differences
1753         from previous file are:
1754
1755         1. Summit now is absolute summit, instead of relative summit
1756            position;
1757         2. 'Pileup' is previous 'tag' column. It's the extended fragment
1758            pileup at the peak summit;
1759         3. We now use '-log10(pvalue)' instead of '-10log10(pvalue)', so
1760            5.00 means 1e-5, simple and less confusing.
1761         4. FDR column becomes '-log10(qvalue)' column.
1762         5. The pileup, -log10pvalue, fold_enrichment and -log10qvalue are
1763            the values at the peak summit.
1764
1765         * Extra output files
1766
1767         NAME_pqtable.txt contains pvalue and qvalue relationships.
1768
1769         NAME_treat_pvalue.bdg and NAME_treat_qvalue.bdg store -log10pvalue
1770         and -log10qvalue scores in BedGraph format. Nearby regions with
1771         the same value are not merged.
1772
1773         * Separation of FeatIO.py
1774
1775         Its content has been divided into cPeakIO.pyx, cBedGraph.pyx, and
1776         cFixWidthTrack.pyx. A modified bedGraphTrackI class was
1777         implemented to store pileup, local lambda, pvalue, and qvalue
1778         altogether in cScoreTrack.pyx.
1779
1780         * Experimental option --half-ext
1781
1782         Suggested by NPS algorithm, I added an experimental option
1783         --half-ext to let MACS only extend ChIP fragment around its
1784         middle point for only 1/2 d.
1785
1786 2011-06-12  Tao Liu  <taoliu@jimmy.harvard.edu>
1787         MACS version 2.0.2 (tag:alpha)
1788
1789         * macs2
1790
1791         Add an error check to see if there is no common chromosome names
1792         from treatment file and control file
1793
1794         * cPeakDetect.pyx, cFeatIO.pyx, cPileup.pyx
1795
1796         Reduce memory usage by removing deepcopy() calls.
1797
1798         * Modify README documents and others.
1799
1800 2011-05-19  Tao Liu  <taoliu@jimmy.harvard.edu>
1801         MACS Version 2.0.1 (tag:alpha)
1802
1803         * cPileup.pyx, cPeakDetect.pyx and peak calling process
1804
1805         Jie suggested a brilliantly simple method to pile up fragments
1806         into a bedGraph track. It works much faster than the previous
1807         function, i.e., faster than MACS1.3 or MACS1.4. So I can include
1808         large local lambda calculation in MACSv2 now. Now I generate three
1809         bedGraphs for d-size local bias, slocal-size and llocal-size local
1810         bias, and calculate the maximum local bias as local lambda
1811         bedGraph track.
1812
1813         Minor: add_loc in bedGraphTrackI now can correctly merge the
1814         region with its preceding region if their values are the same.
1815
1816         * macs2
1817
1818         Add an option to shift control tags before extension. By default,
1819         control tags will be extended to both sides regardless of strand
1820         information.
1821
1822 2011-05-17  Tao Liu  <taoliu@jimmy.harvard.edu>
1823         MACS Version 2.0.0 (tag:alpha)
1824
1825         * Use bedGraph type to store data internally and externally.
1826
1827         We can have theoretically one-basepair resolution profiles. 10
1828         times smaller in filesize and even smaller after converting to
1829         bigWig for visualization.
1830
1831         * Peak calling process modified. Better peak boundary detection.
1832
1833         Extend ChIP tag to d, and pileup to have a ChIP bedGraph. Extend
1834         Control tag to d and 1,000bp, and pileup to two bedGraphs. (1000bp
1835         one will be averaged to d size) Then calculate the maximum value
1836         of these two tracks and a global background, to have a
1837         local-lambda bedGraph.
1838
1839         Use -10log10poisson_pvalue as scores to generate a score track
1840         before peak calling.
1841
1842         A general peak calling based on a score cutoff, min length of peak
1843         and max gap between nearby peaks.
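
The three parameters can be illustrated with a short sketch over sorted
(start, end, score) intervals from one chromosome. This is an illustration
only; `call_peaks` and its argument names are hypothetical and the real
routine may handle edge cases differently.

```python
def call_peaks(intervals, cutoff, min_length, max_gap):
    """Merge above-cutoff intervals closer than max_gap; keep long regions."""
    peaks = []
    current = None                        # [start, end] of the open region
    for start, end, score in intervals:
        if score < cutoff:
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = end              # close enough: extend the region
        else:
            if current is not None and current[1] - current[0] >= min_length:
                peaks.append(tuple(current))
            current = [start, end]
    if current is not None and current[1] - current[0] >= min_length:
        peaks.append(tuple(current))
    return peaks
```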
1844
1845         * Option changes.
1846
1847         Wiggle file output is removed. Now we only support bedGraph
1848         output. The generation of bedGraph is highly recommended since it
1849         will not cost extra time. In other words, bedGraph generation is
1850         internally run even if you don't want to save bedGraphs on disk, due
1851         to the peak calling algorithm in MACS v2.
1852
1853         * cProb.pyx
1854
1855         We now can calculate poisson pvalue in log space so that the score
1856         (-10*log10pvalue) will not have an upper limit of 3100 due to
1857         precision of float number.
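
A rough sketch of a log-space computation, assuming the p-value is the
Poisson upper tail P(X >= k) with lambda > 0. It is not the cProb.pyx
code; the function name, the stopping rule and the `max_terms` guard are
assumptions. Terms are accumulated with log-sum-exp, so the result stays
finite even when the p-value itself would underflow a double.

```python
import math

def log10_poisson_upper_tail(k, lam, max_terms=100000):
    """log10 of P(X >= k) for X ~ Poisson(lam), lam > 0, summed in log space."""
    if k <= 0:
        return 0.0
    log_lam = math.log(lam)
    log_term = -lam + k * log_lam - math.lgamma(k + 1)    # ln P(X = k)
    log_sum = log_term
    i = k
    for _ in range(max_terms):
        i += 1
        log_term += log_lam - math.log(i)     # ln P(X = i) from ln P(X = i - 1)
        hi, lo = max(log_sum, log_term), min(log_sum, log_term)
        log_sum = hi + math.log1p(math.exp(lo - hi))      # log-sum-exp of two values
        if i > lam and log_term - log_sum < -40:
            break                             # remaining terms are negligible
    return log_sum / math.log(10)
```

A score of the form -10*log10(pvalue) can then be obtained by multiplying
the result by -10.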
1858
1859         * Cython is adopted to speed up Python code.
1860
1861 2011-02-28  Tao Liu  <taoliu@jimmy.harvard.edu>
1862         Small fixes
1863
1864         * Replaced with a newest WigTrackI class and fixed the wignorm script.
1865
1866 2011-02-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1867         Version 1.4.0rc2 (Valentine)
1868
1869         * --single-wig option is renamed to --single-profile
1870
1871         * BedGraph output with --bdg or -B option.
1872
1873         The BedGraph output provides 1bp resolution fragment pileup
1874         profile. File size is smaller than wig file. This option can be
1875         combined with --single-profile option to produce a bedgraph file
1876         for the whole genome. This option can also make --space,
1877         --call-subpeaks invalid.
1878
1879         * Fix the description of --shiftsize to correctly state that the
1880         value is 1/2 d (fragment size).
1881
1882         * Fix a bug in the call to __filter_w_control_tags when control is
1883         not available.
1884
1885         * Fix a bug on --to-small option. Now it works as expected.
1886
1887         * Fix a bug while counting the tags in candidate peak region, an
1888         extra tag may be included. (Thanks to Jake Biesinger!)
1889
1890         * Fix the bug for the peaks extended outside of chromosome
1891         start. If the minus strand tag goes outside of chromosome start
1892         after extension of d, it will be thrown out.
1893
1894         * Post-process script for a combined wig file:
1895
1896         The "wignorm" command can be called after a full run of MACS14 as
1897         a postprocess. wignorm can calculate the local background from the
1898         control wig file from MACS14, then use either foldchange,
1899         -10*log10(pvalue) from Poisson test, or difference after asinh
1900         transformation as the score to build a single wig track to
1901         represent the binding strength. This script will take a
1902         significantly long time to process.
1903
1904         * --wigextend has been obsoleted.
1905
1906 2010-09-21  Tao Liu  <taoliu@jimmy.harvard.edu>
1907         Version 1.4.0rc1 (Starry Sky)
1908
1909         * Duplicate reads option
1910
1911         --keep-dup behavior is changed. Now user can specify how many
1912         reads he/she wants to keep at the same genomic location. 'auto' to
1913         let MACS decide the number based on binomial distribution, 'all'
1914         to let MACS keep all reads.
1915
1916         * pvalue and FDR fixes (Thanks to Prof. Zhiping Weng)
1917
1918         By default, MACS will now scale the smaller dataset to the bigger
1919         dataset. For instance, if IP has 10 million reads, and Input has 5
1920         million, MACS will double the lambda value calculated from Input
1921         reads while calling BOTH the positive peaks and negative
1922         peaks. This will address the issue caused by unbalanced numbers of
1923         reads from IP and Input. If --to-small is turned on, MACS will
1924         scale the larger dataset to the smaller one. So from now on, if d
1925         is fixed, then the peaks from a MACS call for A vs B should be
1926         identical to the negative peaks from a B vs A.
1927
1928 2010-09-01  Tao Liu  <taoliu@jimmy.harvard.edu>
1929         Version 1.4.0beta (summer wishes)
1930
1931         * New features
1932
1933         ** Model building
1934
1935         The default behavior in the model building step is slightly
1936         changed. When MACS can't find enough pairs to build model
1937         (implemented in alpha version) or the modeled fragment length is
1938         less than 2 times of tag length (implemented in beta version),
1939         MACS will use 2 times of --shiftsize value as fragment length in
1940         the later analysis. --off-auto can turn off this default behavior.
1941
1942         ** Redundant tag filtering
1943
1944         The IO module is rewritten. The redundant tag filtering process
1945         becomes simpler and works as promised. The maximum allowed number
1946         of tags at the exact same location is calculated from the
1947         sequencing depth and genome size using a binomial distribution,
1948         for both TREATMENT and CONTROL separately. ( previously only
1949         TREATMENT is considered ) The exact same location means the same
1950         coordinate and the same strand. Then MACS will only keep at most
1951         this number of tags at the exact same location in the following
1952         analysis. An option --keep-dup can let MACS skip the filtering and
1953         keep all the tags. However this may bring in a lot of sequencing
1954         bias, so you may get many false positive peaks.
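
A sketch of how such a limit could be derived. It assumes each tag lands
on a given position with probability 1/genome_size and picks the smallest
count whose binomial CDF reaches 1 - alpha. The name `max_dup_tags` and
the default `alpha=1e-5` are assumptions for illustration, not values
taken from this ChangeLog.

```python
import math

def max_dup_tags(genome_size, num_tags, alpha=1e-5):
    """Smallest k with P(X <= k) >= 1 - alpha, X ~ Binomial(num_tags, 1/genome_size)."""
    p = 1.0 / genome_size
    prob = math.exp(num_tags * math.log1p(-p))    # P(X = 0) = (1 - p)^n
    cdf = prob
    k = 0
    while cdf < 1.0 - alpha and k < num_tags:
        # P(X = k + 1) from P(X = k)
        prob *= (num_tags - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += prob
    return max(1, k)                              # always keep at least one tag
```

Deeper sequencing on the same genome raises the limit, which matches the
description that the number depends on sequencing depth and genome size.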
1955
1956         ** Single wiggle mode
1957
1958         First thing to mention, this is not the score track that I
1959         described before. By default, MACS generates wiggle files for
1960         fragment pileup for every chromosome separately. When you use
1961         --single-wig option, MACS will generate a single wiggle file for
1962         all the chromosomes so you will get a wig.gz for TREATMENT and
1963         another wig.gz for CONTROL if available.
1964
1965         ** Sniff -- automatic format detection
1966
1967         Now, by default or "-f AUTO", MACS will decide the input file
1968         format automatically. Technically, it will try to read at most
1969         1000 records for the first 10 non-comment lines. If it succeeds,
1970         the format is decided. I recommend not to use AUTO and specify the
1971         right format for your input files, unless you combine different
1972         formats in a single MACS run.
1973
1974         * Options changes
1975
1976         --single-wig and --keep-dup are added. Check previous section in
1977         ChangeLog for detail.
1978
1979         -f (--format) AUTO is now the default option.
1980
1981         --slocal default: 1000
1982         --llocal default: 10000
1983
1984         * Bug fixed
1985
1986         Setup script will stop the installation if python version is not
1987         python2.6 or python2.7.
1988
1989         Local lambda calculation has been changed back. MACS will check
1990         peak_region, slocal( default 1K) and llocal (default 10K) for the
1991         local bias. The previous 200bps default caused MACS to miss
1992         some peaks where the input bias is very sharp.
1993
1994         sam2bed.py script is corrected.
1995
1996         Relative pos in xls output is fixed.
1997
1998         Parser for ELAND_export is fixed to pass some of the no match
1999         lines. And elandexport2bed.py is fixed too. ( however I can't
2000         guarantee that it works on any eland_export files. )
2001
2002 2010-06-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2003         Version 1.4.0alpha2 (be smarter)
2004
2005         * Options changes
2006
2007         --gsize now provides shortcuts for common genomes, including
2008         human, mouse, C. elegans and fruitfly.
2009
2010         --llocal now will be 5000 bps if there is no input file, so that
2011         local lambda doesn't overkill enriched binding sites.
2012
2013 2010-06-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2014         Version 1.4alpha (be smarter)
2015
2016         * Options changes
2017
2018         --tsize option is redesigned. MACS will use the first 10 lines of
2019         the input to decide the tag size. If user specifies --tsize, it
2020         will override the auto decided tsize.
2021
2022         --lambdaset is replaced by --slocal and --llocal which mean the
2023         small local region and large local region.
2024
2025         --bw has no effect on the scan-window size now. It only affects the
2026         paired-peaks model process.
2027
2028         * Model building
2029
2030         During the model building, MACS will pick out the enriched regions
2031         which are not too high and not too low to build the paired-peak
2032         model. Default the region is from fold 10 to fold 30. If MACS
2033         fails to build the model, by default it will use the nomodel
2034         settings, like shiftsize=100bps, to shift and extend each
2035         tag. This behavior can be turned off by '--off-auto'.
2036
2037         * Output files
2038
2039         All the summit positions are saved in an extra
2040         *_summits.bed file. An option '--call-subpeaks' will invoke
2041         PeakSplitter developed by Mali Salmon to split wide peaks into
2042         smaller subpeaks.
2043
2044         * Sniff ( will be in beta )
2045
2046         Automatically recognize the input file format, so users can combine
2047         different formats in one MACS run.
2048
2049         Not implemented features/TODO:
2050
2051         * Algorithms ( in near future? )
2052
2053         MACS will try to refine the peak boundaries by calculating the
2054         scores for every point in the candidate peak regions. The score
2055         will be the -10*log(10,pvalue) on a local poisson distribution. A
2056         cutoff specified by users (--pvalue) will be applied to find the
2057         precise sub-peaks in the original candidate peak region. Peak
2058         boundaries and peak summits positions will be saved in separate BED
2059         files.
2060
2061         * Single wiggle track ( in near future? )
2062
2063         A single wiggle track will be generated to save the scores within
2064         candidate peak regions in the 10bps resolution. The wiggle file
2065         is in fixedStep format.
2066
2067
2068 2009-10-16  Tao Liu  <taoliu@jimmy.harvard.edu>
2069         Version 1.3.7.1 (Oktoberfest, bug fixed #1)
2070
2071         * bin/Constants.py
2072
2073         Fixed typo. FCSTEP -> FESTEP
2074
2075         * lib/PeakDetect.py
2076
2077         The 'femax' attribute bug is fixed
2078
2079 2009-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2080         Version 1.3.7 (Oktoberfest)
2081
2082         * bin/macs, lib/PeakDetect.py, lib/IO/__init__.py, lib/OptValidator.py
2083
2084         Enhancements by Peter Chines:
2085
2086         1. gzip files are supported.
2087         2. when --diag is on, user can set the increment and endpoint for
2088         fold enrichment analysis by setting --fe-step and --fe-max.
2089
2090         Enhancements by Davide Cittaro:
2091
2092         1. BAM and SAM formats are supported.
2093         2. small changes in the header lines of wiggle output.
2094
2095         Enhancements by Me:
2096         1. I added --fe-min option;
2097         2. Bowtie ascii output with suffix ".map" is supported.
2098
2099         Bug fixed:
2100
2101         1. --nolambda bug is fixed. ( reported by Martin in JHU )
2102         2. --diag bug is fixed. ( reported by Bogdan Tanasa )
2103         3. Function to remove suffix '.fa' is fixed. ( reported by Jeff Johnston )
2104         4. Some "fold change" have been changed to "fold enrichment".
2105
2106 2009-06-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2107         Version 1.3.6.1 (default parameter change)
2108
2109         * bin/macs, lib/PeakDetect.py
2110
2111         "--oldfdr" is removed. The 'oldfdr' behaviour becomes
2112         default. "--futurefdr" is added which can turn on the 'new' method
2113         introduced in 1.3.6. By default it's off.
2114
2115         * lib/PeakDetect.py
2116
2117         Fixed a bug. p-value is corrected a little bit.
2118
2119
2120 2009-05-11  Tao Liu  <taoliu@jimmy.harvard.edu>
2121         Version 1.3.6 (Birthday cake)
2122
2123         * bin/macs
2124
2125         "track name" is added to the header of BED output file.
2126
2127         Now the default peak detection method is to consider 5k and 10k
2128         nearby regions in treatment data and peak location, 1k, 5k, and
2129         10k regions in control data to calculate local bias. The old
2130         method can be called through '--old' option.
2131
2132         Information about how many total/unique tags in treatment or
2133         control will be saved in final .xls output.
2134
2135         * lib/IO/__init__.py
2136
2137         ".fa" will be removed from input tag alignment so only the
2138         chromosome names are kept.
2139
2140         WigTrackI class is added for Wiggle like data structure. (not used
2141         now)
2142
2143         The parser for ELAND multi PET files has been fixed. Now the 5'
2144         tag position for a pair will be kept, whereas in the previous
2145         version, the middle points are kept.
2146
2147         * lib/IO/BinKeeper.py
2148
2149         BinKeeperI class is inspired by Jim Kent's library for UCSC genome
2150         browser, which can quickly access certain region for values in a
2151         large wiggle like data file. (not used now)
2152
2153         * lib/OptValidator.py
2154
2155         typo fixed.
2156
2157         * lib/PeakDetect.py
2158
2159         Now the default peak detection method is to consider 5k and 10k
2160         nearby regions in treatment data and peak location, 1k, 5k, and
2161         10k regions in control data to calculate local bias. The old
2162         method can be called through '--old' option.
2163
2164         Two columns have been added to BED output file. 4th column: peak
2165         name; 5th column: peak score using -10log(10,pvalue) as score.
2166
2167         * setup.py
2168
2169         Add support to build a Mac App through 'setup.py py2app', or a
2170         Windows executable through 'setup.py py2exe'. You need to install
2171         py2app or py2exe package in order to use these functions.
2172
2173 2009-02-12  Tao Liu  <taoliu@jimmy.harvard.edu>
2174         Version 1.3.5 (local lambda fixed, typo fixed, model figure improved)
2175
2176         * PeakDetect.py
2177
2178         Now, besides 1k, 5k, 10k, MACS will also consider peak size region
2179         in control data to calculate local lambda for each peak. Peak
2180         calling results will be slightly different with previous version,
2181         beware!
2182
2183         * OptValidator.py
2184
2185         Typo fixed, ELANDParser -> ELANDResultParser
2186
2187         * OutputWriter.py
2188
2189         Now, modeled d value will be shown on the model figure.
2190
2191 2009-01-06  Tao Liu  <taoliu@jimmy.harvard.edu>
2192         Version 1.3.4 (Happy New Year Version, bug fixed, ELAND multi/PET support)
2193
2194         * macs, IO/__init__.py, PeakDetect.py
2195
2196         Add support for ELAND multi format. Add support for Pair-End
2197         experiment, in this case, 5'end and 3'end ELAND multi format files
2198         are required for treatment or control data. See 00README file for
2199         detail.
2200
2201         Add wigextend option.
2202
2203         Add petdist option for Pair-End Tag experiment, which is the best
2204         distance between 5' and 3' tags.
2205
2206         * PeakDetect.py
2207
2208         Fixed a bug which caused the end position of every peak region
2209         to be incorrectly increased by 1 bp. ( Thanks Mali Salmon!)
2210
2211         * OutputWriter.py
2212
2213         Fix bugs while generating wiggle files. The start position of
2214         wiggle file is set to 1 instead of 0.
2215
2216         Fix a bug that every 10M bps, signals in the first 'd' range are
2217         lower than actual. ( Thanks Mali Salmon!)
2218
2219
2220 2008-12-03  Tao Liu  <taoliu@jimmy.harvard.edu>
2221         Version 1.3.3 (wiggle bugs fixed)
2222
2223         * OutputWriter.py
2224
2225         Fix bugs while generating wiggle files. 1. 'span=' is added to
2226         'variableStep' line; 2. previously, every 10M bps, the coordinates
2227         were wrongly shifted to the right for 'd' basepairs.
2228
2229         * macs, PeakDetect.py
2230
2231         Add an option to save wiggle files on different resolution.
2232
2233 2008-10-02  Tao Liu  <taoliu@jimmy.harvard.edu>
2234         Version 1.3.2 (tiny bugs fixed)
2235
2236         * IO/__init__.py
2237
2238         Fix 65536 -> 65535. ( Thank Joon)
2239
2240         * Prob.py
2241
2242         Improved for binomial function with extra large number. Imported
2243         from Cistrome project.
2244
2245         * PeakDetect.py
2246
2247         If treatment channel misses reads in some chromosome included in
2248         control channel, or vice versa, MACS will not exit. (Thank Shaun
2249         Mahony)
2250
2251         Instead, MACS will fake a tag at position -1 when calling
2252         treatment peaks vs control, but will ignore the chromosome while
2253         calling negative peaks.
2254
2255 2008-09-04  Tao Liu  <taoliu@jimmy.harvard.edu>
2256         Version 1.3.1 (tiny bugs fixed version)
2257
2258         * Prob.py
2259
2260         Hyunjin Gene Shin contributed some code to Prob.py. Now the
2261         binomial functions can tolerate large and small numbers.
2262
2263         * IO/__init__.py
2264
2265         Parsers now split lines in BED/ELAND files on any
2266         whitespace. 'track' or 'browser' lines will be regarded as
2267         comment lines. Fixed a bug when throwing StrandFormatError. The
2268         maximum redundant tag number at a single position can be no less
2269         than 65536.
2270
2271
2272 2008-07-15  Tao Liu  <taoliu@jimmy.harvard.edu>
2273         Version 1.3 (naming clarification version)
2274
2275         * Naming clarification changes according to our manuscript:
2276
2277         'frag_len' is changed to 'd'.
2278
2279         'fold_change' is changed to 'fold_enrichment'.
2280
2281         Users are advised to set the '--bw' parameter from the real
2282         sonication size.
2283
2284         Maximum FDR is 100% in the output file.
2285
2286         And other clarifications in 00README file and the documents on the
2287         website.
2288
2289         * IO/__init__.py
2290         If the redundant tag number at a single position is over 32767,
2291         just remember 32767, instead of raising an overflow exception.
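
             A minimal sketch of this kind of saturating count, assuming
             per-position counts are kept in a dict. The names 'MAX_COUNT'
             and 'add_tag' are hypothetical; this ChangeLog does not show how
             IO/__init__.py actually stores the counts.

```python
MAX_COUNT = 32767  # largest value of a signed 16-bit integer


def add_tag(counts, position):
    """Count one more tag at position, never exceeding MAX_COUNT."""
    current = counts.get(position, 0)
    # Saturate at the cap instead of raising an overflow exception.
    counts[position] = min(current + 1, MAX_COUNT)


counts = {100: 32766}
add_tag(counts, 100)  # reaches the cap
add_tag(counts, 100)  # stays at the cap
add_tag(counts, 200)  # a new position starts at 1
```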
2292
2293         * setup.py
2294         Fixed a typo.
2295
2296         * PeakDetect.py
2297         Bug fixed for diagnosis report.
2298
2299
2300 2008-07-10  Tao Liu  <taoliu@jimmy.harvard.edu>
2301         Version 1.2.2gamma
2302
2303         * Serious bug fixes:
2304
2305         Poisson distribution CDF and inverse CDF functions are
2306         corrected. They now produce correct results even for huge
2307         lambda, so the p-values and FDR values in the final excel sheet
2308         are corrected.
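
             To illustrate why huge lambda is difficult: exp(-lambda)
             underflows to zero in double precision once lambda is large, so
             a naive sum of Poisson terms returns 0. One common remedy is to
             sum the terms in log space. The sketch below shows that general
             technique only; it is not the code used in MACS, and the function
             name is hypothetical. It assumes lam > 0.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam), with lam > 0."""
    if k < 0:
        return 0.0
    log_lam = math.log(lam)
    # log of the i-th term: -lam + i*log(lam) - log(i!)
    log_terms = [-lam + i * log_lam - math.lgamma(i + 1)
                 for i in range(k + 1)]
    # Factor out the largest term so the exponentials stay representable.
    largest = max(log_terms)
    total = sum(math.exp(t - largest) for t in log_terms)
    return min(1.0, math.exp(largest) * total)


naive_first_term = math.exp(-1000.0)  # underflows to 0.0
cdf_at_mean = poisson_cdf(1000, 1000.0)
```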
2309
2310         IO package now can tolerate some rare cases; ELANDParser in IO
2311         package is fixed. (Thanks Bogdan)
2312
2313         * Improvement:
2314
2315         Reverse paired peaks in model are rejected. So there will be no
2316         negative 'frag_len'. (Thanks Bogdan)
2317
2318         * Features added:
2319
2320         The diagnosis function is completed. It can output a table file for
2321         users to estimate their sequencing depth.
2322
2323
2324 2008-06-30  Tao Liu  <taoliu@jimmy.harvard.edu>
2325         Version 1.2
2326
2327         * Prob.py is added!
2328
2329         GSL is totally removed from MACS. Instead, I have implemented the
2330         CDF and inverse CDF for Poisson and binomial distributions purely
2331         in Python.
2332
2333         * Constants.py is added!
2334
2335         Organize constants used in MACS in the Constants.py file.
2336
2337         * All other files are modified!
2338
2339         Foldchange calculation is modified. Now the foldchange is
2340         calculated only at the peak summit position instead of the whole peak
2341         region. The values will be higher and more robust than before.
2342
2343         Features added:
2344
2345         1. MACS can save wiggle format files containing the tag number at
2346         every 10 bp along the genome. Tags are shifted according to our
2347         model before they are calculated.
2348
2349         2. Model building and local lambda calculation can be skipped with
2350         certain options.
2351
2352         3. A diagnosis report can be generated through '--diag'
2353         option. This report can help you estimate the
2354         sequencing saturation. This function is only in beta stage.
2355
2356         4. FDR calculation speed is greatly improved.
2357
2358 2008-05-28  Tao Liu  <taoliu@jimmy.harvard.edu>
2359         Version 1.1
2360
2361         * TabIO, PeakModel.py ...
2362         Fixed a bug so that MACS tolerates cases where there is no tag on
2363         either the plus strand or the minus strand.
2364
2365         * setup.py
2366         Check the version of Python. If the version is lower than 2.4,
2367         refuse to install and print a warning.
2368