samples/fdiff.f

   1 #! env urforth
   2 \ ANEW --DFC--   DECIMAL                        \  Wil Baden  2002-04-17
   3 \ adapted to UrForth by Ketmar Dark
   4 use-lib: tty
   5
   6 \  *******************************************************************
   7 \  *                                                                 *
   8 \  *  Wil Baden  2002-04-18                                          *
   9 \  *                                                                 *
  10 \  *          DFC-2002  Differential File Comparison                 *
  11 \  *                                                                 *
  12 \  *  DFC Show differences.                                          *
  13 \  *                                                                 *
  14 \  *  COLLATE Show matches and differences.                          *
  15 \  *                                                                 *
  16 \  *******************************************************************
  17
  18 \  A necessary tool for every programmer is a utility to compare
  19 \  files - particularly source files - to find where and how
  20 \  they are different.  A prudent programmer will make a copy of
  21 \  a file before modifying it. In the course of making a series
  22 \  of small modifications to fix a misbehaving application the
  23 \  programmer can easily lose track of just what has been done.
  24 \  Then the file comparison utility can be used to show the
  25 \  changes.
  26
  27 \  This utility can be used to show the differences between
  28 \  released versions as well.
  29
  30 \  To do the job right is not a trivial task. The obvious
  31 \  algorithm will sooner than later fail miserably.
  32
  33 \  The obvious algorithm is to compare lines until a difference
  34 \  is found, then search forward in both files to find where
  35 \  they are the same again.
  36
  37 \  The trick is not to look for differences but to look for the
  38 \  longest common subsequence - the longest set of lines which
  39 \  are the same in both files and in the same order with what's
  40 \  different interspersed. What's left are the differences.
  41
  42 \  How to do this is the subject of
  43
  44 \    Hunt, J. W. and M. D. McIlroy [1976],
  45 \
  46 \    "An algorithm for differential file comparison,"
  47 \
  48 \    Computing Science Technical Report 41, AT&T Bell
  49 \    Laboratories, Murray Hill, N.J.
  50
  51 \  It is based on
  52
  53 \    Hunt, J.W and T.G. Szymanski, [1977]
  54 \
  55 \    "A fast algorithm for computing longest common subsequences",
  56 \
  57 \    Comm. ACM , vol. 20 no. 5, pp. 350-353.
  58
  59 \  In 1976 I implemented this using my own code in Fortran II
  60 \  for a 8K 16-bit word IBM 1130. It has followed me ever
  61 \  since, becoming re-incarnated on each new platform in
  62 \  whatever the language of the moment was.
  63
  64 \  I even did this in C for Unix because my output format was
  65 \  more useful to me than that of the Unix tool `diff`.
  66
  67 \  Some years ago I did it for MacForth.  In the present
  68 \  incarnation it is Standard Forth.
  69
  70 \  Differential file comparison is the foundation of version
  71 \  control systems - SCCS, RCS, SCVS.
  72
  73 \  The algorithm is essentially brute force.  Read and save one
  74 \  file, then read records from the other file trying to find
  75 \  with each record a longer common subsequence than you already
  76 \  have.
  77
  78 \  Potentially this could require M * N line comparisons,  where
  79 \  M and N are the number of lines in each file. In real life
  80 \  that never happens. The time and memory constraints are still
  81 \  too extravagant. So a really slick trick is used. Instead of
  82 \  comparing whole lines, an integer hash value is computed for
  83 \  each line, and the associated hash values are compared.
  84 \  Making believe that every unique line has a unique hash
  85 \  value, we compute a longest common subsequence. Not until we
  86 \  print do we check whether equal hash values represent
  87 \  identical lines.
  88
  89 \  In 25 years of use this has hardly ever happened.  In the
  90 \  very few times it has, the effect has been negligible. (You
  91 \  can tell that it has happened when an insertion appears just
  92 \  before a deletion.)  I haven't seen it since 1988.
  93
  94 \  Of course you can force it to happen by using a poor hashing
  95 \  function. However the hashing function doesn't have to be
  96 \  sophisticated. The one used here has worked fine with 32-bit
  97 \  or 16-bit arithmetic.
  98
  99 \  Where I used to work, the Pascal incarnation was used 30 to
 100 \  200 times a day for ten years, using 16-bit arithmetic.  It
 101 \  was used even after the company went to Unix.
 102
 103 \         How to Use
 104
 105 \     old-file-id TO OLDFILE
 106 \     new-file-id TO NEWFILE
 107 \     DFC
 108
 109 \  or
 110
 111 \     old-file-id TO OLDFILE
 112 \     new-file-id TO NEWFILE
 113 \     COLLATE
 114
 115 \  NEWFILE and OLDFILE may be assigned in either order.
 116
 117 \  You should adapt the file-opening to your environment.
 118
 119 \  Here is an example from John Peters that works on two versions of
 120 \  WinView.
 121
 122 \     S" C:\WIN32FOR\WINVIEW.F" R/O OPEN-FILE DROP TO OLDFILE
 123 \     S" C:\WIN32FORCG\WINVIEW.F" R/O OPEN-FILE DROP TO NEWFILE
 124 \     DFC
 125
 126 \  The following compares an old source for `DFC` with a revision.
 127
 128 \     S" DFC.4TH" R/O OPEN-FILE DROP TO OLDFILE
 129 \     S" DFCNEW.4TH" R/O OPEN-FILE DROP TO NEWFILE
 130 \     DFC
 131
 132 \  The output was:
 133
 134 \      1 --- ( DFC - Differential File Comparison.  Wil Baden 1976-1996 )
 135 \    +++   1 ( DFC - Differential File Comparison Using HERE  Wil Baden )
 136 \      2   2
 137
 138 \     50  37
 139 \     51 --- 6000 CONSTANT   lcs-space ( The larger the better. )
 140 \     52 --- CREATE          LCS   lcs-space CELLS ALLOT
 141 \     53 ---
 142 \    +++  38 0 VALUE lcs-space       0 VALUE LCS
 143 \     54  39 0 VALUE oldlines        0 VALUE newlines
 144
 145 \  [Some lines omitted]
 146
 147 \    394 379                 ( Differential file comparison. )
 148 \    +++ 380     ALIGN HERE TO LCS
 149 \    +++ 381     UNUSED 1 CELLS - 1+ ALIGNED 1 CELLS / TO LCS-Space
 150 \    395 382     Read-Newerfile   Sort-Hash-Values   Mark-Hash-Classes
 151
 152 \    397 384     Build-Candidate-Table   Show-Differences
 153 \    +++ 385     oldlines newlines - 2 - LCS @ - . ." deletions, "
 154 \    +++ 386         newlines 1- LCS @ - . ." insertions, "
 155 \    +++ 387             LCS @ . ." unchanged " CR
 156 \    398 388     OLDFILE REWIND   NEWFILE REWIND
 157
 158 \  This shows that in the old file, DFC.4TH,
 159
 160 \   *   Line 1 has been replaced.
 161
 162 \   *   Lines 51 through 53 have been replaced by a single line.
 163
 164 \   *   A few new lines have been inserted after lines 394 and 397.
 165
 166 \  The numbers in the first column are the line numbers in the
 167 \  first file.
 168
 169 \  The numbers in the second column are the line numbers in the
 170 \  second file.
 171
 172 \  The code has been checked for 16-bit and 32-bit cell size.
 173
 174 \  DFC - Differential File Comparison.
 175 \     Make a line by line comparison of two files, showing where and
 176 \     how they are different.
 177
 178 \  Usage:
 179
 180 \     old-file-id TO OLDFILE
 181 \     new-file-id TO NEWFILE
 182 \     DFC
 183
 184 \  *******************************************************************
 185 \  *     Common Functions                                            *
 186 \  *******************************************************************
 187
 188 \  Comment out definitions that you already have.
 189 \ : BOUNDS  ( addr len -- addr+len addr )  over + SWAP ;
 190 : Is-White  ( char -- flag )  bl <= ;
 191 \ : NOT  ( x -- flag )  0= ;
 192 \ : OFF  ( addr -- )  FALSE SWAP ! ;
 193 \ : ON   ( addr -- )  TRUE SWAP ! ;
 194 : REWIND  ( fid -- )  0 0 ROT REPOSITION-FILE THROW ;
 195
 196
 197 \  *******************************************************************
 198 \  *      Application Values and Variables                           *
 199 \  *******************************************************************
 200
 201 \  OLDFILE                      ( -- file-id )
 202 \     Value for file-ID of "old" file.  To be set by user.
 203
 204 \  NEWFILE                      ( -- file-id )
 205 \     Value for file-ID of "newer" file.  To be set by user.
 206
 207 \  DFC-Maxline                  ( -- n )
 208 \     Value of maximum size of line for DFC comparisons.  Initially 255.
 209
 210 \  DFC-Right-Margin             ( -- n )
 211 \     Value of the right-hand margin for automatically wrapping
 212 \     output lines.  Set this to a convenient size for you.
 213
 214 \  DFC-Collate                  ( -- addr )
 215 \     Variable should be set ON to collate instead of showing
 216 \     differences.
 217
 218 \  SWAPFILES                    ( -- )
 219 \    Exchange  OLDFILE  and  NEWFILE  .
 220
 221 0 VALUE OLDFILE
 222 0 VALUE NEWFILE
 223
 224 120 VALUE DFC-Right-Margin
 225
 226 0 VAR DFC-Collate   \ DFC-Collate 0!
 227
 228 : SWAPFILES                       ( -- )
 229     OLDFILE NEWFILE to OLDFILE to NEWFILE ;
 230
 231 \  *******************************************************************
 232 \  *     Implementation                                              *
 233 \  *******************************************************************
 234
 235 VOCABULARY Differential-File-Comparison
 236
 237 stdout-fd tty:fd-tty? value istty?
 238
 239 : attr-norm    ( -- )  istty? if " \x1b[0m" forth:(stdtty-xtype) endif ;
 240 : attr-del     ( -- )  istty? if " \x1b[33m" forth:(stdtty-xtype) endif ;
 241 : attr-ins     ( -- )  istty? if " \x1b[32m" forth:(stdtty-xtype) endif ;
 242 : attr-report  ( -- )  istty? if " \x1b[37;1m" forth:(stdtty-xtype) endif ;
 243
 244 : INTERFACE                       ( -- )
 245    GET-ORDER
 246    >R  over SET-CURRENT  R>
 247    SET-ORDER ;
 248
 249 ALSO Differential-File-Comparison
 250
 251 DEFINITIONS
 252
 253 255 CONSTANT  DFC-Maxline
 254 DFC-Maxline 2 + aligned CELL+ CONSTANT  Textbuffer-Size
 255
 256 50000 CONSTANT  DFC-Space  \  Make as big as possible.
 257
 258 CREATE  &OLDTEXT          Textbuffer-Size ALLOT
 259 CREATE  &NEWTEXT          Textbuffer-Size ALLOT
 260 CREATE  &MATCHINGTEXT     Textbuffer-Size ALLOT
 261 CREATE  &Cleaned-Oldtext  Textbuffer-Size ALLOT
 262 CREATE  &Cleaned-Newtext  Textbuffer-Size ALLOT
 263
 264 \  LCS
 265 \     Cell for each record + 3 * matching-candidates.
 266 \     Thus 6000 cells takes care of files up to at least 1200
 267 \     lines.
 268 \     In `Find-Longest-Common-Subsequence`, pointer to
 269 \     candidate. In `Show-Differences`, number of matched
 270 \     lines.
 271
 272 \  Cell-Place                   ( c_addr len addr -- )
 273 \     Cell version of `PLACE`.
 274
 275 \  Cell-Count                   ( addr -- c_addr len )
 276 \     Cell version of `COUNT`.
 277
 278 CREATE  LCS  DFC-Space CELLS ALLOT
 279
 280 : Cell-Place                      ( c_addr len addr -- )
 281     2dup 2>R  CELL+  SWAP chars MOVE 2R> !
 282     ;
 283
 284 : Cell-Count                      ( addr -- c_addr len )
 285     dup CELL+ SWAP @ -TRAILING ;
 286
 287     0 VAR SKIPPING
 288
 289     : Clean-Line                    ( c_addr len -- c_addr len' )
 290         \  Remove fairy characters.
 291         SKIPPING 0!
 292         >R 0 over R>                ( c_addr len' c_addr len )
 293         chars BOUNDS ?DO            ( c_addr len')
 294             I C@ Is-White IF
 295                 SKIPPING 1!
 296             ELSE
 297                 SKIPPING @ IF
 298                     2dup chars + BL SWAP C!
 299                     1+
 300                     SKIPPING 0!
 301                 THEN
 302                 2dup chars +  I C@ SWAP C!
 303                 1+
 304             THEN
 305         LOOP ;
 306
 307     131 CONSTANT Hash-Factor
 308
 309 \  >HASH                            ( c_addr len -- hash-value )
 310 \    Compute hash value for a string.
 311
 312 : >HASH                             ( c_addr len -- hash-value )
 313     SKIPPING 0!
 314     0 ROT ROT chars BOUNDS ?DO      ( hash-value)
 315         I C@ Is-White IF
 316             SKIPPING 1!
 317         ELSE
 318             SKIPPING @ IF
 319                 Hash-Factor *  BL +
 320                 SKIPPING 0!
 321             THEN
 322             Hash-Factor *  I C@ +
 323         THEN
 324     LOOP ;
 325
 326 : Read-Text                       ( buffer fileid -- flag )
 327     >R  dup CELL+  DFC-Maxline  R> READ-LINE THROW
 328     SWAP ROT ! ;
 329
 330 \  NEWLINES                     ( -- n )
 331 \     1 + lines in newer file.
 332
 333 \  OLDLINES                     ( -- n )
 334 \     1 + lines in old file + 1 + lines in newer file.
 335
 336 \  CAND                         ( -- addr )
 337 \     Next candidate.
 338
 339 \  X
 340 \     Generally, working variable. In `Show-Differences`,
 341 \     old line number.
 342
 343 \  Y
 344 \     Generally, working variable. In `Show-Differences`,
 345 \     new line number.
 346
 347 \  SLOT                         ( n -- addr )
 348 \     Address of _n_th item in working memory from the bottom.
 349 \     Has the record number of a line.
 350
 351 \  SLOT-H                       ( n -- addr )
 352 \     Address of _n_th item in working memory from the top.
 353 \     Used for the hash value of a line. The memory for this is
 354 \     separate from the line numbers so it can be recovered
 355 \     after being sorted. The area will then be used for
 356 \     candidate identification.
 357
 358 0 VALUE NEWLINES     0 VALUE OLDLINES
 359 0 VAR X              0 VAR Y
 360 0 VAR CAND
 361
 362 : SLOT    CELLS LCS + ;
 363 : Slot-H  DFC-Space SWAP - CELLS LCS + ;
 364
 365 \  *******************************************************************
 366 \  *     Read-Newerfile                                              *
 367 \  *******************************************************************
 368
 369 \  Read-Newerfile               ( -- )
 370 \     Read in the newer file, which is generally longer. Work
 371 \     from both ends toward the middle.  From the beginning of
 372 \     `LCS` put in the line numbers: 1, 2, 3, ....  From the end
 373 \     of `LCS` put in corresponding hash values: ... h3, h2, h1.
 374 \     Cell `LCS[0]` is not used now.
 375 \     Output: NEWLINES ; Uses: &NEWTEXT
 376
 377 : Read-Newerfile                  ( -- )
 378     0                               ( n)
 379     BEGIN
 380         1+
 381         &NEWTEXT NEWFILE Read-Text
 382     WHILE
 383         dup 2* DFC-Space > ?abort" Sorry, not enough space. "
 384         dup  dup SLOT !
 385         &NEWTEXT Cell-Count >HASH over Slot-H !
 386     REPEAT
 387     to NEWLINES ;
 388
 389 \  *******************************************************************
 390 \  *     Sort-Hash-Values                                            *
 391 \  *******************************************************************
 392
 393 \  Sort-Hash-Values             ( -- )
 394 \     Order the hash values, carrying the line numbers as the
 395 \     minor key. The result has the first n-1 line numbers in
 396 \     the cells 1..n-1 sorted by the hash value of the text
 397 \     of the corresponding lines.
 398
 399     : Insert-Hash-Value             ( Gap j -- Gap )
 400         \  Inner insertion loop for custom Shell sort.
 401         \  Uses: X Y
 402         dup Slot-H @  X !    dup SLOT @  Y !
 403         over -                      ( Gap j)
 404         BEGIN   dup Slot-H @  X @ < NOT
 405         WHILE
 406             dup Slot-H @  X @ >
 407             ?dup 0= IF   dup SLOT @  Y @ >  THEN
 408         WHILE
 409             2dup + >R
 410                 dup Slot-H @    R@ Slot-H !
 411                 dup SLOT @
 412             R> SLOT !
 413             over -
 414             dup 1 <
 415         UNTIL   ( THEN THEN)   over +  ( Gap j+Gap)
 416         X @  over Slot-H !    Y @  over SLOT !
 417         DROP ;
 418
 419 \  Sort-Hash-Values                ( -- )
 420 \    Shell sort for unusual data structure.
 421 \    Input: NEWLINES
 422
 423 : Sort-Hash-Values                ( -- )
 424     NEWLINES   1                    ( lines gap)
 425     BEGIN   2dup 1+ > WHILE  2* 1+  REPEAT
 426     BEGIN   2/   dup
 427     WHILE
 428         2dup 1+ ?DO   I Insert-Hash-Value   LOOP
 429     REPEAT  2DROP ;
 430
 431 \  *******************************************************************
 432 \  *     Mark-Hash-Classes                                           *
 433 \  *******************************************************************
 434
 435 \  Mark-Hash-Classes            ( -- )
 436 \     Mark the hash value equivalence classes by negating
 437 \     the last line number associated with a hash value.
 438 \     Input: newlines
 439
 440 : Mark-Hash-Classes               ( -- )
 441     NEWLINES 1-  1  ?DO
 442         I Slot-H @  I 1+ Slot-H @ = NOT
 443             IF   I SLOT dup @ NEGATE SWAP !   THEN
 444     LOOP
 445     NEWLINES 1- SLOT dup @ NEGATE SWAP !
 446     ;
 447
 448 \  *******************************************************************
 449 \  *     Read-Oldfile                                                *
 450 \  *******************************************************************
 451
 452 \  Read-Oldfile                 ( -- )
 453 \     Read oldfile and match newfile hashed lines.
 454 \     Reserve two cells following the line numbers of the newer
 455 \     file. Now read in each line of the old file. Take the hash
 456 \     value of the line, and find the first line in the newer
 457 \     file having the same hash value. Store the number of the
 458 \     cell containing line number in the next successive cell.
 459 \     If the line in the old file does not appear anywhere in
 460 \     newer file, store 0.
 461 \     Input: NEWLINES ; Output: OLDLINES
 462
 463 : Search-for-Hash           ( match low high hash -- match )
 464     >R                          ( match low high)( R: hash)
 465         BEGIN   over 1+ over <
 466         WHILE
 467             2dup + 2/           ( match low high mid)
 468             dup Slot-H @  R@ < IF
 469                 ROT DROP SWAP   ( match low high)
 470             ELSE                ( match low high mid)
 471                 NIP             ( match low high)
 472                 dup Slot-H @  R@ =
 473                     IF   ROT DROP TUCK  THEN
 474             THEN
 475         REPEAT   2DROP          ( match)
 476     R> DROP                     ( R: )
 477     ;
 478
 479 : Read-Oldfile                    ( -- )
 480     NEWLINES 1+                     ( biased-line-number)
 481     BEGIN
 482         1+
 483         &OLDTEXT OLDFILE Read-Text
 484     WHILE
 485         dup NEWLINES + DFC-Space >
 486             ?abort" Sorry, out of space for newer file. "
 487         0 0 NEWLINES                ( . match low high)
 488         &OLDTEXT Cell-Count >HASH Search-for-Hash
 489                                     ( biased-line-number match)
 490         over SLOT !                 ( biased-line-number)
 491     REPEAT
 492     to OLDLINES ;
 493
 494 \  *******************************************************************
 495 \  *     Find-Longest-Common-Subsequence                             *
 496 \  *******************************************************************
 497
 498 \  We are done with the sub-array of hash values, and the
 499 \  memory can be used for something else.
 500
 501 \  Find-Longest-Common-Subsequence ( -- )
 502 \     Find the longest common subsequence.  Following the
 503 \     sub-array used for the old file, build a doubly-linked
 504 \     list representing the potential longest common subsequences
 505 \     in reverse order. In doing this, replace the value in
 506 \     the cells associated with the old file with the cell number
 507 \     of the appropriate doubly-linked list. The two cells that
 508 \     were reserved are used as the bounds of the subsequences.
 509
 510 \  CANDIDATE                       ( x y z -- candidate-pointer)
 511 \    Make a new candidate for LCS.
 512 \    In/Out: cand
 513
 514 : CANDIDATE                       ( x y z -- candidate-pointer)
 515     CAND @ DFC-Space 2 - >
 516         ?abort" Sorry, candidate space exhausted. "
 517     CAND @ >R                       ( R: candidate-pointer)
 518         >R >R                       ( x)
 519             CAND @ SLOT !           (  )
 520             1 CAND +!
 521             R> ( y) CAND @ SLOT !
 522             1 CAND +!
 523         R> ( z) CAND @ SLOT !
 524         1 CAND +!
 525     R>                              ( candidate-pointer)( R: )
 526     ;
 527
 528 \  Search-for-Match            ( Value low high -- 0 | Value wherefound )
 529 \    Binary search for LCS candidates.
 530
 531 : Search-for-Match            ( Value low high -- 0 | Value wherefound )
 532     ROT >R                          ( low high)( R: Value)
 533         BEGIN   2dup > NOT
 534         WHILE
 535             2dup + 2/               ( low high mid)
 536             dup SLOT @ 1+ SLOT @  R@ < NOT IF
 537                 1-    NIP           ( low high)
 538             ELSE                    ( low high mid)
 539                 dup 1+ SLOT @ 1+ SLOT @  R@ < NOT IF
 540                   NIP NIP  R> SWAP
 541                 EXIT THEN
 542                 1+  ROT DROP SWAP   ( low high)
 543             THEN
 544         REPEAT                      2DROP
 545     R> DROP                         ( R: )
 546     0 ;
 547
 548 \  New-Candidate                   ( value wherefound i -- flag)
 549 \    Make and link a new LCS candidate.
 550 \    In/Out: X Y LCS
 551
 552 : New-Candidate                   ( value wherefound i -- flag)
 553     ROT ROT                         ( i value wherefound)
 554     dup >R
 555         2dup 1+ SLOT @ 1+ SLOT @  < IF
 556             Y @  X @ SLOT !
 557             dup 1+  X !
 558             SLOT @  CANDIDATE Y !   ( )
 559         ELSE   2DROP DROP   THEN
 560     R>  LCS @ =                     ( flag)
 561     dup IF                          ( Move fence. )
 562         LCS @ 1+ SLOT @  LCS @ 2 + SLOT !
 563         1 LCS +!
 564     THEN ;
 565
 566 \  Find-Longest-Common-Subsequence ( -- )
 567 \    Nuf ced.
 568 \    Input: oldlines newlines ; Uses: cand LCS X Y
 569
 570 : Find-Longest-Common-Subsequence ( -- )
 571     OLDLINES  CAND !
 572     NEWLINES  LCS !
 573     NEWLINES 1+  0 0 CANDIDATE  LCS @ SLOT !
 574     OLDLINES  NEWLINES 0 CANDIDATE  LCS @ 1+ SLOT !
 575     OLDLINES  NEWLINES 2 +
 576     ?DO                             ( )
 577         I SLOT @                    ( newer-line-number)
 578         dup IF
 579             NEWLINES  dup X !  SLOT @  Y !
 580             BEGIN
 581                 dup SLOT @ ABS      ( . value)
 582                 X @  LCS @  Search-for-Match
 583                                     ( . 0 | . value wherefound)
 584                 dup IF   I  New-Candidate   THEN
 585                                     ( newer-line-number flag)
 586                 0=
 587             WHILE                   ( newer-line-number)
 588                 dup SLOT @ 0>
 589             WHILE
 590                 1+
 591             REPEAT ( THEN)
 592             Y @  X @ SLOT !
 593         THEN  DROP                  ( )
 594     LOOP ;
 595
 596 \  *******************************************************************
 597 \  *     Build-Candidate-Table                                       *
 598 \  *******************************************************************
 599
 600 \  Build-Candidate-Table        ( -- )
 601 \     Untangle the linked reverse list of the longest common
 602 \     subsequence to become a simple linear list in forward
 603 \     order in the sub-array used for the old file.
 604 \     Unravel LCS.
 605 \     Input: LCS oldlines newlines
 606
 607 : Build-Candidate-Table           ( -- )
 608     LCS @ SLOT @                    ( c)
 609     OLDLINES  NEWLINES 2 +
 610         ?DO    0  I SLOT !    LOOP
 611     NEWLINES  OLDLINES  SLOT !
 612     BEGIN   dup
 613     WHILE
 614         dup 1+ SLOT @               ( c j)
 615         over SLOT @ SLOT !          ( c)
 616         2 + SLOT @
 617     REPEAT  DROP ;
 618
 619 \  *******************************************************************
 620 \  *     Show-Differences                                            *
 621 \  *******************************************************************
 622
 623 \  Show-Differences             ( -- )
 624 \     The values are 0 if the line does not appear in the newer
 625 \     file, or the line number of a candidate match in the
 626 \     newer file.  Skipped numbers are lines that are new in
 627 \     the newer file.
 628
 629 \     Display the lines that were deleted from the old file,
 630 \     inserted in the newer file, or unchanged.
 631
 632 \     State: 0= delete, 0< add, 0> copy.
 633
 634 : Write-Text                      ( buffer --  )
 635   Cell-Count                      ( addr len)
 636     BEGIN dup DFC-Right-Margin > WHILE
 637         over DFC-Right-Margin       ( . . addr2 len2)
 638
 639         (*
 640         BEGIN dup WHILE
 641             2dup chars + C@ Is-White NOT
 642         WHILE   1-   REPEAT
 643         ELSE DROP DFC-Right-Margin THEN
 644         *)
 645         begin dup ifnot DROP DFC-Right-Margin break endif
 646         2dup chars + C@ Is-White not-while 1- repeat
 647
 648         TUCK TYPE ( str len len2) 1+ /STRING ( str len)
 649         CR 10 SPACES
 650     REPEAT
 651     TYPE attr-norm CR ;
 652
 653 \  DELETED                         ( previous-state -- state )
 654 \    What to do when the line is in the old file only.
 655 \    Input: X Y oldtext
 656 \    In/Out: matchingtext
 657
 658 : DELETED                         ( previous-state -- state )
 659     &MATCHINGTEXT @ 0< NOT IF
 660         DFC-Collate @ 0= IF
 661             X @ 1- 3 U.R SPACE
 662             Y @ 3 U.R    SPACE
 663         THEN
 664         &MATCHINGTEXT Write-Text
 665         -1 &MATCHINGTEXT !
 666     THEN
 667     attr-del
 668     DFC-Collate @ 0= IF
 669         X @ 3 U.R  SPACE
 670     THEN
 671     ." --- "
 672     &OLDTEXT Write-Text
 673     DROP 0 ( delete ) ;
 674
 675 \  ADDED                           ( previous-state -- state )
 676 \    What to do when the line is in the newer file only.
 677 \    Input: X Y newtext
 678 \    In/Out: matchingtext
 679
 680 : ADDED                           ( previous-state -- state )
 681     &MATCHINGTEXT @ 0< NOT IF
 682         DFC-Collate @ 0= IF
 683             X @ 1- 3 U.R  SPACE
 684             Y @ 1- 3 U.R  SPACE
 685         THEN
 686         &MATCHINGTEXT Write-Text
 687         -1 &MATCHINGTEXT !
 688     THEN
 689     attr-ins ." +++ "
 690     DFC-Collate @ 0= IF
 691         Y @ 3 U.R  SPACE
 692     THEN
 693     &NEWTEXT Write-Text
 694     DROP -1 ( add ) ;
 695
 696 \  MATCHED                         ( previous-state -- state )
 697 \    What to do when the line is in both files.
 698 \    Input: X Y oldtext newtext
 699 \    In/Out: LCS : number of matched lines.
 700 \    Output: matchingtext
 701
 702 : MATCHED                         ( previous-state -- state )
 703     1 LCS +!
 704     dup 1- 0< DFC-Collate @ OR ( adding or deleting ) IF
 705         DFC-Collate @ 0= IF
 706             X @ 3 U.R  SPACE
 707             Y @ 3 U.R  SPACE
 708         THEN
 709         &NEWTEXT Write-Text
 710         DROP 1                      ( copy )
 711     ELSE                     ( copying, = number of lines just copied. )
 712         1+
 713         dfc-collate @ 0= if
 714           3 over = IF  CR  THEN
 715         then
 716         &NEWTEXT Cell-Count &MATCHINGTEXT Cell-Place
 717     THEN
 718     ;
 719
 720 : Handle-Deleted                ( state -- same )
 721     BEGIN
 722         1 X +!    X @ NEWLINES + 1+  OLDLINES  < IF
 723             &OLDTEXT OLDFILE Read-Text 0=
 724                 ?abort" Oops, error with old file. "
 725         THEN
 726         X @ NEWLINES + 1+  SLOT @
 727             ( i.e. newer-line-number) 0=
 728     WHILE   DELETED   REPEAT
 729     ;
 730
 731 : Handle-Added                  ( state -- same )
 732     BEGIN
 733         1 Y +!    Y @  NEWLINES < IF
 734             &NEWTEXT NEWFILE Read-Text 0=
 735                 ?abort" Oops, error with newer file. "
 736         THEN
 737         X @ NEWLINES + 1+  SLOT @ Y @ >
 738     WHILE   ADDED   REPEAT
 739     ;
 740
 741 : Clean-Compare                 ( s1 . s2 . -- 0|-1|1 )
 742     &Cleaned-Newtext Cell-Place ( s1 .)
 743     &Cleaned-Oldtext Cell-Place ( )
 744     &Cleaned-Oldtext Cell-Count Clean-Line ( s1' .)
 745     &Cleaned-Newtext Cell-Count Clean-Line ( s2' .)
 746     COMPARE                     ( 0|-1|1)
 747     ;
 748
 749 : Handle-Matched                ( state -- same )
 750     \  Check that matched records are really the same.
 751     &OLDTEXT Cell-Count  &NEWTEXT Cell-Count
 752         Clean-Compare 0= IF
 753                 MATCHED
 754         ELSE    ADDED   DELETED   THEN
 755     ;
 756
 757 \  Show-Differences                ( -- )
 758 \    Let's see them.
 759 \    Input: oldlines newlines ; Uses: X Y LCS matchingtext
 760
 761 : Show-Differences                ( -- )
 762     attr-norm
 763     OLDFILE REWIND   NEWFILE REWIND
 764     0 X !    0 Y !    0 LCS !
 765     -1 &MATCHINGTEXT !
 766     1                               ( copying )
 767     BEGIN                           ( state)
 768         Handle-Deleted    Handle-Added
 769         Y @ NEWLINES <
 770     WHILE   Handle-Matched
 771     REPEAT  DROP  attr-norm ;
 772
 773 \  *******************************************************************
 774 \  *     Differential File Comparison and Collation                  *
 775 \  *******************************************************************
 776
 777 INTERFACE
 778
 779 : DFC                             ( -- )
 780     NEWFILE FILE-SIZE DROP  OR 0= ?abort" Size of NEWFILE is 0. "
 781     OLDFILE FILE-SIZE DROP  OR 0= ?abort" Size of OLDFILE is 0. "
 782
 783     Read-Newerfile   Sort-Hash-Values   Mark-Hash-Classes
 784     Read-Oldfile     Find-Longest-Common-Subsequence
 785     Build-Candidate-Table
 786
 787     Show-Differences
 788
 789     OLDFILE REWIND   NEWFILE REWIND
 790     attr-report
 791     OLDLINES NEWLINES - 2 - LCS @ - . ." deletions, "
 792     NEWLINES 1- LCS @ - . ." insertions, "
 793     LCS @ . ." unchanged." attr-norm cr
 794
 795   DFC-Collate 0! ;
 796
 797 PREVIOUS DEFINITIONS
 798
 799 \  COLLATE
 800 \     DFC in new window with DFC-Collate on.
 801
 802 : COLLATE   DFC-Collate 1!   DFC ;
 803
 804 \  *******************************************************************
 805 \  *     NEWER and OLDER                                             *
 806 \  *******************************************************************
 807
 808 \  When `CLIPBOARD` is defined, text can be copied to the clipboard
 809 \  and then written to files `Newer` and `Older` with file-IDs
 810 \  `NEWFILE` and `OLDFILE`. They can then be compared with `DFC`.
 811
 812 [DEFINED] CLIPBOARD [IF]
 813
 814     : INOUT                         ( str len -- fid )
 815             \ 2dup DELETE-FILE DROP
 816             \ R/W CREATE-FILE ?abort" Can't CREATE-FILE "
 817             R/W OPEN-FILE ?abort" Can't OPEN-FILE "
 818             ;
 819
 820     : NEWER CR \ 5 Wipe-Chars
 821           \ S" newer" DELETE-FILE DROP
 822             S" newer" R/W CREATE-FILE FILE-CHECK to NEWFILE
 823             CLIPBOARD NEWFILE WRITE-FILE
 824                 \ ?abort" Can't WRITE-FILE "
 825                 FILE-CHECK
 826             NEWFILE REWIND ;
 827
 828     : OLDER CR \ 5 Wipe-Chars
 829           \ S" older" DELETE-FILE DROP
 830             S" older" R/W CREATE-FILE FILE-CHECK to OLDFILE
 831             CLIPBOARD OLDFILE WRITE-FILE
 832                 \ ?abort" Can't WRITE-FILE "
 833                 FILE-CHECK
 834             OLDFILE REWIND ;
 835
 836 [THEN]
 837
 838 \\   //   \\   //   \\   //   \\   //   \\   //   \\   //   \\   //   \\
 839 (*
 840 " dfc2002.txt" r/o open-file throw to oldfile
 841 " dfc2002.f" r/o open-file throw to newfile
 842 dfc
 843 oldfile close-file throw
 844 newfile close-file throw
 845
 846 .stack
 847 bye
 848 *)
 849 : main ( -- )
 850   cli-arg-next 1+ argc >= if endcr ." files?\n" 1 bye endif
 851   ." first file : " cli-arg-next argv-str type cr
 852   cli-arg-next argv-str r/o open-file if ." ERROR: cannot open file!\n" 1 bye endif to oldfile
 853   cli-arg-skip
 854   ." second file: " cli-arg-next argv-str type cr
 855   cli-arg-next argv-str r/o open-file if ." ERROR: cannot open file!\n" 1 bye endif to newfile
 856   dfc
 857   oldfile close-file throw
 858   newfile close-file throw
 859   bye
 860 ;
 861
 862 main