locals: cosmetix
[urforth.git] / samples / fdiff.f
blob9b7c17bfdb400a46125c5a9e248fb25fcce4380a
1 #! env urforth
2 \ ANEW --DFC-- DECIMAL \ Wil Baden 2002-04-17
3 \ adapted to UrForth by Ketmar Dark
4 use-lib: tty
6 \ *******************************************************************
7 \ * *
8 \ * Wil Baden 2002-04-18 *
9 \ * *
10 \ * DFC-2002 Differential File Comparison *
11 \ * *
12 \ * DFC Show differences. *
13 \ * *
14 \ * COLLATE Show matches and differences. *
15 \ * *
16 \ *******************************************************************
18 \ A necessary tool for every programmer is a utility to compare
19 \ files - particularly source files - to find where and how
20 \ they are different. A prudent programmer will make a copy of
21 \ a file before modifying it. In the course of making a series
22 \ of small modifications to fix a misbehaving application the
23 \ programmer can easily lose track of just what has been done.
24 \ Then the file comparison utility can be used to show the
25 \ changes.
27 \ This utility can be used to show the differences between
28 \ released versions as well.
30 \ To do the job right is not a trivial task. The obvious
31 \ algorithm will sooner than later fail miserably.
33 \ The obvious algorithm is to compare lines until a difference
34 \ is found, then search forward in both files to find where
35 \ they are the same again.
37 \ The trick is not to look for differences but to look for the
38 \ longest common subsequence - the longest set of lines which
39 \ are the same in both files and in the same order with what's
40 \ different interspersed. What's left are the differences.
42 \ How to do this is the subject of
44 \ Hunt, J. W. and M. D. McIlroy [1976],
46 \ "An algorithm for differential file comparison,"
48 \ Computing Science Technical Report 41, AT&T Bell
49 \ Laboratories, Murray Hill, N.J.
51 \ It is based on
53 \ Hunt, J.W and T.G. Szymanski, [1977]
55 \ "A fast algorithm for computing longest common subsequences",
57 \ Comm. ACM , vol. 20 no. 5, pp. 350-353.
59 \ In 1976 I implemented this using my own code in Fortran II
60 \ for a 8K 16-bit word IBM 1130. It has followed me ever
61 \ since, becoming re-incarnated on each new platform in
62 \ whatever the language of the moment was.
64 \ I even did this in C for Unix because my output format was
65 \ more useful to me than that of the Unix tool `diff`.
67 \ Some years ago I did it for MacForth. In the present
68 \ incarnation it is Standard Forth.
70 \ Differential file comparison is the foundation of version
71 \ control systems - SCCS, RCS, SCVS.
73 \ The algorithm is essentially brute force. Read and save one
74 \ file, then read records from the other file trying to find
75 \ with each record a longer common subsequence than you already
76 \ have.
78 \ Potentially this could require M * N line comparisons, where
79 \ M and N are the number of lines in each file. In real life
80 \ that never happens. The time and memory constraints are still
81 \ too extravagant. So a really slick trick is used. Instead of
82 \ comparing whole lines, an integer hash value is computed for
83 \ each line, and the associated hash values are compared.
84 \ Making believe that every unique line has a unique hash
85 \ value, we compute a longest common subsequence. Not until we
86 \ print do we check whether equal hash values represent
87 \ identical lines.
89 \ In 25 years of use this has hardly ever happened. In the
90 \ very few times it has, the effect has been negligible. (You
91 \ can tell that it has happened when an insertion appears just
92 \ before a deletion.) I haven't seen it since 1988.
94 \ Of course you can force it to happen by using a poor hashing
95 \ function. However the hashing function doesn't have to be
96 \ sophisticated. The one used here has worked fine with 32-bit
97 \ or 16-bit arithmetic.
99 \ Where I used to work, the Pascal incarnation was used 30 to
100 \ 200 times a day for ten years, using 16-bit arithmetic. It
101 \ was used even after the company went to Unix.
103 \ How to Use
105 \ old-file-id TO OLDFILE
106 \ new-file-id TO NEWFILE
107 \ DFC
109 \ or
111 \ old-file-id TO OLDFILE
112 \ new-file-id TO NEWFILE
113 \ COLLATE
115 \ NEWFILE and OLDFILE may be assigned in either order.
117 \ You should adapt the file-opening to your environment.
119 \ Here is an example from John Peters that works on two versions of
120 \ WinView.
122 \ S" C:\WIN32FOR\WINVIEW.F" R/O OPEN-FILE DROP TO OLDFILE
123 \ S" C:\WIN32FORCG\WINVIEW.F" R/O OPEN-FILE DROP TO NEWFILE
124 \ DFC
126 \ The following compares an old source for `DFC` with a revision.
128 \ S" DFC.4TH" R/O OPEN-FILE DROP TO OLDFILE
129 \ S" DFCNEW.4TH" R/O OPEN-FILE DROP TO NEWFILE
130 \ DFC
132 \ The output was:
134 \ 1 --- ( DFC - Differential File Comparison. Wil Baden 1976-1996 )
135 \ +++ 1 ( DFC - Differential File Comparison Using HERE Wil Baden )
136 \ 2 2
138 \ 50 37
139 \ 51 --- 6000 CONSTANT lcs-space ( The larger the better. )
140 \ 52 --- CREATE LCS lcs-space CELLS ALLOT
141 \ 53 ---
142 \ +++ 38 0 VALUE lcs-space 0 VALUE LCS
143 \ 54 39 0 VALUE oldlines 0 VALUE newlines
145 \ [Some lines omitted]
147 \ 394 379 ( Differential file comparison. )
148 \ +++ 380 ALIGN HERE TO LCS
149 \ +++ 381 UNUSED 1 CELLS - 1+ ALIGNED 1 CELLS / TO LCS-Space
150 \ 395 382 Read-Newerfile Sort-Hash-Values Mark-Hash-Classes
152 \ 397 384 Build-Candidate-Table Show-Differences
153 \ +++ 385 oldlines newlines - 2 - LCS @ - . ." deletions, "
154 \ +++ 386 newlines 1- LCS @ - . ." insertions, "
155 \ +++ 387 LCS @ . ." unchanged " CR
156 \ 398 388 OLDFILE REWIND NEWFILE REWIND
158 \ This shows that in the old file, DFC.4TH,
160 \ * Line 1 has been replaced.
162 \ * Lines 51 through 53 have been replaced by a single line.
164 \ * A few new lines have been inserted after lines 394 and 397.
166 \ The numbers in the first column are the line numbers in the
167 \ first file.
169 \ The numbers in the second column are the line numbers in the
170 \ second file.
172 \ The code has been checked for 16-bit and 32-bit cell size.
174 \ DFC - Differential File Comparison.
175 \ Make a line by line comparison of two files, showing where and
176 \ how they are different.
178 \ Usage:
180 \ old-file-id TO OLDFILE
181 \ new-file-id TO NEWFILE
182 \ DFC
184 \ *******************************************************************
185 \ * Common Functions *
186 \ *******************************************************************
188 \ Comment out definitions that you already have.
189 \ : BOUNDS ( addr len -- addr+len addr ) over + SWAP ;
190 : Is-White ( char -- flag ) bl <= ;
191 \ : NOT ( x -- flag ) 0= ;
192 \ : OFF ( addr -- ) FALSE SWAP ! ;
193 \ : ON ( addr -- ) TRUE SWAP ! ;
194 : REWIND ( fid -- ) 0 0 ROT REPOSITION-FILE THROW ;
197 \ *******************************************************************
198 \ * Application Values and Variables *
199 \ *******************************************************************
201 \ OLDFILE ( -- file-id )
202 \ Value for file-ID of "old" file. To be set by user.
204 \ NEWFILE ( -- file-id )
205 \ Value for file-ID of "newer" file. To be set by user.
207 \ DFC-Maxline ( -- n )
208 \ Value of maximum size of line for DFC comparisons. Initially 255.
210 \ DFC-Right-Margin ( -- n )
211 \ Value of the right-hand margin for automatically wrapping
212 \ output lines. Set this to a convenient size for you.
214 \ DFC-Collate ( -- addr )
215 \ Variable should be set ON to collate instead of showing
216 \ differences.
218 \ SWAPFILES ( -- )
219 \ Exchange OLDFILE and NEWFILE .
221 0 VALUE OLDFILE
222 0 VALUE NEWFILE
224 120 VALUE DFC-Right-Margin
226 0 VAR DFC-Collate \ DFC-Collate 0!
228 : SWAPFILES ( -- )
229 OLDFILE NEWFILE to OLDFILE to NEWFILE ;
231 \ *******************************************************************
232 \ * Implementation *
233 \ *******************************************************************
235 VOCABULARY Differential-File-Comparison
237 stdout-fd tty:fd-tty? value istty?
239 : attr-norm ( -- ) istty? if " \x1b[0m" forth:(stdtty-xtype) endif ;
240 : attr-del ( -- ) istty? if " \x1b[33m" forth:(stdtty-xtype) endif ;
241 : attr-ins ( -- ) istty? if " \x1b[32m" forth:(stdtty-xtype) endif ;
242 : attr-report ( -- ) istty? if " \x1b[37;1m" forth:(stdtty-xtype) endif ;
244 : INTERFACE ( -- )
245 GET-ORDER
246 >R over SET-CURRENT R>
247 SET-ORDER ;
249 ALSO Differential-File-Comparison
251 DEFINITIONS
253 255 CONSTANT DFC-Maxline
254 DFC-Maxline 2 + aligned CELL+ CONSTANT Textbuffer-Size
256 50000 CONSTANT DFC-Space \ Make as big as possible.
258 CREATE &OLDTEXT Textbuffer-Size ALLOT
259 CREATE &NEWTEXT Textbuffer-Size ALLOT
260 CREATE &MATCHINGTEXT Textbuffer-Size ALLOT
261 CREATE &Cleaned-Oldtext Textbuffer-Size ALLOT
262 CREATE &Cleaned-Newtext Textbuffer-Size ALLOT
264 \ LCS
265 \ Cell for each record + 3 * matching-candidates.
266 \ Thus 6000 cells takes care of files up to at least 1200
267 \ lines.
268 \ In `Find-Longest-Common-Subsequence`, pointer to
269 \ candidate. In `Show-Differences`, number of matched
270 \ lines.
272 \ Cell-Place ( c_addr len addr -- )
273 \ Cell version of `PLACE`.
275 \ Cell-Count ( addr -- c_addr len )
276 \ Cell version of `COUNT`.
278 CREATE LCS DFC-Space CELLS ALLOT
280 : Cell-Place ( c_addr len addr -- )
281 2dup 2>R CELL+ SWAP chars MOVE 2R> !
284 : Cell-Count ( addr -- c_addr len )
285 dup CELL+ SWAP @ -TRAILING ;
287 0 VAR SKIPPING
289 : Clean-Line ( c_addr len -- c_addr len' )
290 \ Remove fairy characters.
291 SKIPPING 0!
292 >R 0 over R> ( c_addr len' c_addr len )
293 chars BOUNDS ?DO ( c_addr len')
294 I C@ Is-White IF
295 SKIPPING 1!
296 ELSE
297 SKIPPING @ IF
298 2dup chars + BL SWAP C!
300 SKIPPING 0!
301 THEN
302 2dup chars + I C@ SWAP C!
304 THEN
305 LOOP ;
307 131 CONSTANT Hash-Factor
309 \ >HASH ( c_addr len -- hash-value )
310 \ Compute hash value for a string.
312 : >HASH ( c_addr len -- hash-value )
313 SKIPPING 0!
314 0 ROT ROT chars BOUNDS ?DO ( hash-value)
315 I C@ Is-White IF
316 SKIPPING 1!
317 ELSE
318 SKIPPING @ IF
319 Hash-Factor * BL +
320 SKIPPING 0!
321 THEN
322 Hash-Factor * I C@ +
323 THEN
324 LOOP ;
326 : Read-Text ( buffer fileid -- flag )
327 >R dup CELL+ DFC-Maxline R> READ-LINE THROW
328 SWAP ROT ! ;
330 \ NEWLINES ( -- n )
331 \ 1 + lines in newer file.
333 \ OLDLINES ( -- n )
334 \ 1 + lines in old file + 1 + lines in newer file.
336 \ CAND ( -- addr )
337 \ Next candidate.
340 \ Generally, working variable. In `Show-Differences`,
341 \ old line number.
344 \ Generally, working variable. In `Show-Differences`,
345 \ new line number.
347 \ SLOT ( n -- addr )
348 \ Address of _n_th item in working memory from the bottom.
349 \ Has the record number of a line.
351 \ SLOT-H ( n -- addr )
352 \ Address of _n_th item in working memory from the top.
353 \ Used for the hash value of a line. The memory for this is
354 \ separate from the line numbers so it can be recovered
355 \ after being sorted. The area will then be used for
356 \ candidate identification.
358 0 VALUE NEWLINES 0 VALUE OLDLINES
359 0 VAR X 0 VAR Y
360 0 VAR CAND
362 : SLOT CELLS LCS + ;
363 : Slot-H DFC-Space SWAP - CELLS LCS + ;
365 \ *******************************************************************
366 \ * Read-Newerfile *
367 \ *******************************************************************
369 \ Read-Newerfile ( -- )
370 \ Read in the newer file, which is generally longer. Work
371 \ from both ends toward the middle. From the beginning of
372 \ `LCS` put in the line numbers: 1, 2, 3, .... From the end
373 \ of `LCS` put in corresponding hash values: ... h3, h2, h1.
374 \ Cell `LCS[0]` is not used now.
375 \ Output: NEWLINES ; Uses: &NEWTEXT
377 : Read-Newerfile ( -- )
378 0 ( n)
379 BEGIN
381 &NEWTEXT NEWFILE Read-Text
382 WHILE
383 dup 2* DFC-Space > ?abort" Sorry, not enough space. "
384 dup dup SLOT !
385 &NEWTEXT Cell-Count >HASH over Slot-H !
386 REPEAT
387 to NEWLINES ;
389 \ *******************************************************************
390 \ * Sort-Hash-Values *
391 \ *******************************************************************
393 \ Sort-Hash-Values ( -- )
394 \ Order the hash values, carrying the line numbers as the
395 \ minor key. The result has the first n-1 line numbers in
396 \ the cells 1..n-1 sorted by the hash value of the text
397 \ of the corresponding lines.
399 : Insert-Hash-Value ( Gap j -- Gap )
400 \ Inner insertion loop for custom Shell sort.
401 \ Uses: X Y
402 dup Slot-H @ X ! dup SLOT @ Y !
403 over - ( Gap j)
404 BEGIN dup Slot-H @ X @ < NOT
405 WHILE
406 dup Slot-H @ X @ >
407 ?dup 0= IF dup SLOT @ Y @ > THEN
408 WHILE
409 2dup + >R
410 dup Slot-H @ R@ Slot-H !
411 dup SLOT @
412 R> SLOT !
413 over -
414 dup 1 <
415 UNTIL ( THEN THEN) over + ( Gap j+Gap)
416 X @ over Slot-H ! Y @ over SLOT !
417 DROP ;
419 \ Sort-Hash-Values ( -- )
420 \ Shell sort for unusual data structure.
421 \ Input: NEWLINES
423 : Sort-Hash-Values ( -- )
424 NEWLINES 1 ( lines gap)
425 BEGIN 2dup 1+ > WHILE 2* 1+ REPEAT
426 BEGIN 2/ dup
427 WHILE
428 2dup 1+ ?DO I Insert-Hash-Value LOOP
429 REPEAT 2DROP ;
431 \ *******************************************************************
432 \ * Mark-Hash-Classes *
433 \ *******************************************************************
435 \ Mark-Hash-Classes ( -- )
436 \ Mark the hash value equivalence classes by negating
437 \ the last line number associated with a hash value.
438 \ Input: newlines
440 : Mark-Hash-Classes ( -- )
441 NEWLINES 1- 1 ?DO
442 I Slot-H @ I 1+ Slot-H @ = NOT
443 IF I SLOT dup @ NEGATE SWAP ! THEN
444 LOOP
445 NEWLINES 1- SLOT dup @ NEGATE SWAP !
448 \ *******************************************************************
449 \ * Read-Oldfile *
450 \ *******************************************************************
452 \ Read-Oldfile ( -- )
453 \ Read oldfile and match newfile hashed lines.
454 \ Reserve two cells following the line numbers of the newer
455 \ file. Now read in each line of the old file. Take the hash
456 \ value of the line, and find the first line in the newer
457 \ file having the same hash value. Store the number of the
458 \ cell containing line number in the next successive cell.
459 \ If the line in the old file does not appear anywhere in
460 \ newer file, store 0.
461 \ Input: NEWLINES ; Output: OLDLINES
463 : Search-for-Hash ( match low high hash -- match )
464 >R ( match low high)( R: hash)
465 BEGIN over 1+ over <
466 WHILE
467 2dup + 2/ ( match low high mid)
468 dup Slot-H @ R@ < IF
469 ROT DROP SWAP ( match low high)
470 ELSE ( match low high mid)
471 NIP ( match low high)
472 dup Slot-H @ R@ =
473 IF ROT DROP TUCK THEN
474 THEN
475 REPEAT 2DROP ( match)
476 R> DROP ( R: )
479 : Read-Oldfile ( -- )
480 NEWLINES 1+ ( biased-line-number)
481 BEGIN
483 &OLDTEXT OLDFILE Read-Text
484 WHILE
485 dup NEWLINES + DFC-Space >
486 ?abort" Sorry, out of space for newer file. "
487 0 0 NEWLINES ( . match low high)
488 &OLDTEXT Cell-Count >HASH Search-for-Hash
489 ( biased-line-number match)
490 over SLOT ! ( biased-line-number)
491 REPEAT
492 to OLDLINES ;
494 \ *******************************************************************
495 \ * Find-Longest-Common-Subsequence *
496 \ *******************************************************************
498 \ We are done with the sub-array of hash values, and the
499 \ memory can be used for something else.
501 \ Find-Longest-Common-Subsequence ( -- )
502 \ Find the longest common subsequence. Following the
503 \ sub-array used for the old file, build a doubly-linked
504 \ list representing the potential longest common subsequences
505 \ in reverse order. In doing this, replace the value in
506 \ the cells associated with the old file with the cell number
507 \ of the appropriate doubly-linked list. The two cells that
508 \ were reserved are used as the bounds of the subsequences.
510 \ CANDIDATE ( x y z -- candidate-pointer)
511 \ Make a new candidate for LCS.
512 \ In/Out: cand
514 : CANDIDATE ( x y z -- candidate-pointer)
515 CAND @ DFC-Space 2 - >
516 ?abort" Sorry, candidate space exhausted. "
517 CAND @ >R ( R: candidate-pointer)
518 >R >R ( x)
519 CAND @ SLOT ! ( )
520 1 CAND +!
521 R> ( y) CAND @ SLOT !
522 1 CAND +!
523 R> ( z) CAND @ SLOT !
524 1 CAND +!
525 R> ( candidate-pointer)( R: )
528 \ Search-for-Match ( Value low high -- 0 | Value wherefound )
529 \ Binary search for LCS candidates.
531 : Search-for-Match ( Value low high -- 0 | Value wherefound )
532 ROT >R ( low high)( R: Value)
533 BEGIN 2dup > NOT
534 WHILE
535 2dup + 2/ ( low high mid)
536 dup SLOT @ 1+ SLOT @ R@ < NOT IF
537 1- NIP ( low high)
538 ELSE ( low high mid)
539 dup 1+ SLOT @ 1+ SLOT @ R@ < NOT IF
540 NIP NIP R> SWAP
541 EXIT THEN
542 1+ ROT DROP SWAP ( low high)
543 THEN
544 REPEAT 2DROP
545 R> DROP ( R: )
548 \ New-Candidate ( value wherefound i -- flag)
549 \ Make and link a new LCS candidate.
550 \ In/Out: X Y LCS
552 : New-Candidate ( value wherefound i -- flag)
553 ROT ROT ( i value wherefound)
554 dup >R
555 2dup 1+ SLOT @ 1+ SLOT @ < IF
556 Y @ X @ SLOT !
557 dup 1+ X !
558 SLOT @ CANDIDATE Y ! ( )
559 ELSE 2DROP DROP THEN
560 R> LCS @ = ( flag)
561 dup IF ( Move fence. )
562 LCS @ 1+ SLOT @ LCS @ 2 + SLOT !
563 1 LCS +!
564 THEN ;
566 \ Find-Longest-Common-Subsequence ( -- )
567 \ Nuf ced.
568 \ Input: oldlines newlines ; Uses: cand LCS X Y
570 : Find-Longest-Common-Subsequence ( -- )
571 OLDLINES CAND !
572 NEWLINES LCS !
573 NEWLINES 1+ 0 0 CANDIDATE LCS @ SLOT !
574 OLDLINES NEWLINES 0 CANDIDATE LCS @ 1+ SLOT !
575 OLDLINES NEWLINES 2 +
576 ?DO ( )
577 I SLOT @ ( newer-line-number)
578 dup IF
579 NEWLINES dup X ! SLOT @ Y !
580 BEGIN
581 dup SLOT @ ABS ( . value)
582 X @ LCS @ Search-for-Match
583 ( . 0 | . value wherefound)
584 dup IF I New-Candidate THEN
585 ( newer-line-number flag)
587 WHILE ( newer-line-number)
588 dup SLOT @ 0>
589 WHILE
591 REPEAT ( THEN)
592 Y @ X @ SLOT !
593 THEN DROP ( )
594 LOOP ;
596 \ *******************************************************************
597 \ * Build-Candidate-Table *
598 \ *******************************************************************
600 \ Build-Candidate-Table ( -- )
601 \ Untangle the linked reverse list of the longest common
602 \ subsequence to become a simple linear list in forward
603 \ order in the sub-array used for the old file.
604 \ Unravel LCS.
605 \ Input: LCS oldlines newlines
607 : Build-Candidate-Table ( -- )
608 LCS @ SLOT @ ( c)
609 OLDLINES NEWLINES 2 +
610 ?DO 0 I SLOT ! LOOP
611 NEWLINES OLDLINES SLOT !
612 BEGIN dup
613 WHILE
614 dup 1+ SLOT @ ( c j)
615 over SLOT @ SLOT ! ( c)
616 2 + SLOT @
617 REPEAT DROP ;
619 \ *******************************************************************
620 \ * Show-Differences *
621 \ *******************************************************************
623 \ Show-Differences ( -- )
624 \ The values are 0 if the line does not appear in the newer
625 \ file, or the line number of a candidate match in the
626 \ newer file. Skipped numbers are lines that are new in
627 \ the newer file.
629 \ Display the lines that were deleted from the old file,
630 \ inserted in the newer file, or unchanged.
632 \ State: 0= delete, 0< add, 0> copy.
634 : Write-Text ( buffer -- )
635 Cell-Count ( addr len)
636 BEGIN dup DFC-Right-Margin > WHILE
637 over DFC-Right-Margin ( . . addr2 len2)
640 BEGIN dup WHILE
641 2dup chars + C@ Is-White NOT
642 WHILE 1- REPEAT
643 ELSE DROP DFC-Right-Margin THEN
645 begin dup ifnot DROP DFC-Right-Margin break endif
646 2dup chars + C@ Is-White not-while 1- repeat
648 TUCK TYPE ( str len len2) 1+ /STRING ( str len)
649 CR 10 SPACES
650 REPEAT
651 TYPE attr-norm CR ;
653 \ DELETED ( previous-state -- state )
654 \ What to do when the line is in the old file only.
655 \ Input: X Y oldtext
656 \ In/Out: matchingtext
658 : DELETED ( previous-state -- state )
659 &MATCHINGTEXT @ 0< NOT IF
660 DFC-Collate @ 0= IF
661 X @ 1- 3 U.R SPACE
662 Y @ 3 U.R SPACE
663 THEN
664 &MATCHINGTEXT Write-Text
665 -1 &MATCHINGTEXT !
666 THEN
667 attr-del
668 DFC-Collate @ 0= IF
669 X @ 3 U.R SPACE
670 THEN
671 ." --- "
672 &OLDTEXT Write-Text
673 DROP 0 ( delete ) ;
675 \ ADDED ( previous-state -- state )
676 \ What to do when the line is in the newer file only.
677 \ Input: X Y newtext
678 \ In/Out: matchingtext
680 : ADDED ( previous-state -- state )
681 &MATCHINGTEXT @ 0< NOT IF
682 DFC-Collate @ 0= IF
683 X @ 1- 3 U.R SPACE
684 Y @ 1- 3 U.R SPACE
685 THEN
686 &MATCHINGTEXT Write-Text
687 -1 &MATCHINGTEXT !
688 THEN
689 attr-ins ." +++ "
690 DFC-Collate @ 0= IF
691 Y @ 3 U.R SPACE
692 THEN
693 &NEWTEXT Write-Text
694 DROP -1 ( add ) ;
696 \ MATCHED ( previous-state -- state )
697 \ What to do when the line is in both files.
698 \ Input: X Y oldtext newtext
699 \ In/Out: LCS : number of matched lines.
700 \ Output: matchingtext
702 : MATCHED ( previous-state -- state )
703 1 LCS +!
704 dup 1- 0< DFC-Collate @ OR ( adding or deleting ) IF
705 DFC-Collate @ 0= IF
706 X @ 3 U.R SPACE
707 Y @ 3 U.R SPACE
708 THEN
709 &NEWTEXT Write-Text
710 DROP 1 ( copy )
711 ELSE ( copying, = number of lines just copied. )
713 dfc-collate @ 0= if
714 3 over = IF CR THEN
715 then
716 &NEWTEXT Cell-Count &MATCHINGTEXT Cell-Place
717 THEN
720 : Handle-Deleted ( state -- same )
721 BEGIN
722 1 X +! X @ NEWLINES + 1+ OLDLINES < IF
723 &OLDTEXT OLDFILE Read-Text 0=
724 ?abort" Oops, error with old file. "
725 THEN
726 X @ NEWLINES + 1+ SLOT @
727 ( i.e. newer-line-number) 0=
728 WHILE DELETED REPEAT
731 : Handle-Added ( state -- same )
732 BEGIN
733 1 Y +! Y @ NEWLINES < IF
734 &NEWTEXT NEWFILE Read-Text 0=
735 ?abort" Oops, error with newer file. "
736 THEN
737 X @ NEWLINES + 1+ SLOT @ Y @ >
738 WHILE ADDED REPEAT
741 : Clean-Compare ( s1 . s2 . -- 0|-1|1 )
742 &Cleaned-Newtext Cell-Place ( s1 .)
743 &Cleaned-Oldtext Cell-Place ( )
744 &Cleaned-Oldtext Cell-Count Clean-Line ( s1' .)
745 &Cleaned-Newtext Cell-Count Clean-Line ( s2' .)
746 COMPARE ( 0|-1|1)
749 : Handle-Matched ( state -- same )
750 \ Check that matched records are really the same.
751 &OLDTEXT Cell-Count &NEWTEXT Cell-Count
752 Clean-Compare 0= IF
753 MATCHED
754 ELSE ADDED DELETED THEN
757 \ Show-Differences ( -- )
758 \ Let's see them.
759 \ Input: oldlines newlines ; Uses: X Y LCS matchingtext
761 : Show-Differences ( -- )
762 attr-norm
763 OLDFILE REWIND NEWFILE REWIND
764 0 X ! 0 Y ! 0 LCS !
765 -1 &MATCHINGTEXT !
766 1 ( copying )
767 BEGIN ( state)
768 Handle-Deleted Handle-Added
769 Y @ NEWLINES <
770 WHILE Handle-Matched
771 REPEAT DROP attr-norm ;
773 \ *******************************************************************
774 \ * Differential File Comparison and Collation *
775 \ *******************************************************************
777 INTERFACE
779 : DFC ( -- )
780 NEWFILE FILE-SIZE DROP OR 0= ?abort" Size of NEWFILE is 0. "
781 OLDFILE FILE-SIZE DROP OR 0= ?abort" Size of OLDFILE is 0. "
783 Read-Newerfile Sort-Hash-Values Mark-Hash-Classes
784 Read-Oldfile Find-Longest-Common-Subsequence
785 Build-Candidate-Table
787 Show-Differences
789 OLDFILE REWIND NEWFILE REWIND
790 attr-report
791 OLDLINES NEWLINES - 2 - LCS @ - . ." deletions, "
792 NEWLINES 1- LCS @ - . ." insertions, "
793 LCS @ . ." unchanged." attr-norm cr
795 DFC-Collate 0! ;
797 PREVIOUS DEFINITIONS
799 \ COLLATE
800 \ DFC in new window with DFC-Collate on.
802 : COLLATE DFC-Collate 1! DFC ;
804 \ *******************************************************************
805 \ * NEWER and OLDER *
806 \ *******************************************************************
808 \ When `CLIPBOARD` is defined, text can be copied to the clipboard
809 \ and then written to files `Newer` and `Older` with file-IDs
810 \ `NEWFILE` and `OLDFILE`. They can then be compared with `DFC`.
812 [DEFINED] CLIPBOARD [IF]
814 : INOUT ( str len -- fid )
815 \ 2dup DELETE-FILE DROP
816 \ R/W CREATE-FILE ?abort" Can't CREATE-FILE "
817 R/W OPEN-FILE ?abort" Can't OPEN-FILE "
820 : NEWER CR \ 5 Wipe-Chars
821 \ S" newer" DELETE-FILE DROP
822 S" newer" R/W CREATE-FILE FILE-CHECK to NEWFILE
823 CLIPBOARD NEWFILE WRITE-FILE
824 \ ?abort" Can't WRITE-FILE "
825 FILE-CHECK
826 NEWFILE REWIND ;
828 : OLDER CR \ 5 Wipe-Chars
829 \ S" older" DELETE-FILE DROP
830 S" older" R/W CREATE-FILE FILE-CHECK to OLDFILE
831 CLIPBOARD OLDFILE WRITE-FILE
832 \ ?abort" Can't WRITE-FILE "
833 FILE-CHECK
834 OLDFILE REWIND ;
836 [THEN]
838 \\ // \\ // \\ // \\ // \\ // \\ // \\ // \\
840 " dfc2002.txt" r/o open-file throw to oldfile
841 " dfc2002.f" r/o open-file throw to newfile
843 oldfile close-file throw
844 newfile close-file throw
846 .stack
849 : main ( -- )
850 cli-arg-next 1+ argc >= if endcr ." files?\n" 1 bye endif
851 ." first file : " cli-arg-next argv-str type cr
852 cli-arg-next argv-str r/o open-file if ." ERROR: cannot open file!\n" 1 bye endif to oldfile
853 cli-arg-skip
854 ." second file: " cli-arg-next argv-str type cr
855 cli-arg-next argv-str r/o open-file if ." ERROR: cannot open file!\n" 1 bye endif to newfile
857 oldfile close-file throw
858 newfile close-file throw
862 main