Add copyright claims through to 2020
[wiggle/upstream.git] / diff.c
blob65b06cbb2aebacd3135d7867ca9d9f72b27fee85
1 /*
2 * wiggle - apply rejected patches
4 * Copyright (C) 2003 Neil Brown <neilb@cse.unsw.edu.au>
5 * Copyright (C) 2011-2013 Neil Brown <neilb@suse.de>
6 * Copyright (C) 2014-2020 Neil Brown <neil@brown.name>
9 * This program is free software; you can redistribute it and/or modify
10 * it under the terms of the GNU General Public License as published by
11 * the Free Software Foundation; either version 2 of the License, or
12 * (at your option) any later version.
14 * This program is distributed in the hope that it will be useful,
15 * but WITHOUT ANY WARRANTY; without even the implied warranty of
16 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
17 * GNU General Public License for more details.
19 * You should have received a copy of the GNU General Public License
20 * along with this program.
22 * Author: Neil Brown
23 * Email: <neil@brown.name>
27 * Calculate longest common subsequence between two sequences
29 * Each sequence contains strings with
30 * hash start length
31 * We produce a list of tripples: a b len
32 * where A and B point to elements in the two sequences, and len is the number
33 * of common elements there. The list is terminated by an entry with len==0.
35 * This is roughly based on
36 * "An O(ND) Difference Algorithm and its Variations", Eugene Myers,
37 * Algorithmica Vol. 1 No. 2, 1986, pp. 251-266;
38 * http://xmailserver.org/diff2.pdf
40 * However we don't run the basic algorithm both forward and backward until
41 * we find an overlap as Myers suggests. Rather we always run forwards, but
42 * we record the location of the (possibly empty) snake that crosses the
43 * midline. When we finish, this recorded location for the best path shows
44 * us where to divide and find further midpoints.
46 * In brief, the algorithm is as follows.
48 * Imagine a Cartesian Matrix where x co-ordinates correspond to symbols in
49 * the first sequence (A, length a) and y co-ordinates correspond to symbols
50 * in the second sequence (B, length b). At the origin we have the first
51 * sequence.
52 * Movement in the x direction represents deleting the symbol as that point,
53 * so from x=i-1 to x=i deletes symbol i from A.
55 * Movement in the y direction represents adding the corresponding symbol
56 * from B. So to move from the origin 'a' spaces along X and then 'b' spaces
57 * up Y will remove all of the first sequence and then add all of the second
58 * sequence. Similarly moving firstly up the Y axis, then along the X
59 * direction will add the new sequence, then remove the old sequence. Thus
60 * the point a,b represents the second sequence and a part from 0,0 to a,b
61 * represent an sequence of edits to change A into B.
63 * There are clearly many paths from 0,0 to a,b going through different
64 * points in the matrix in different orders. At some points in the matrix
65 * the next symbol to be added from B is the same as the next symbol to be
66 * removed from A. At these points we can take a diagonal step to a new
67 * point in the matrix without actually changing any symbol. A sequence of
68 * these diagonal steps is called a 'snake'. The goal then is to find a path
69 * of x-steps (removals), y-steps (additions) and diagonals (steps and
70 * snakes) where the number of (non-diagonal) steps is minimal.
72 * i.e. we aim for as many long snakes as possible.
73 * If the total number of 'steps' is called the 'cost', we aim to minimise
74 * the cost.
76 * As storing the whole matrix in memory would be prohibitive with large
77 * sequences we limit ourselves to linear storage proportional to a+b and
78 * repeat the search at most log2(a+b) times building up the path as we go.
79 * Specifically we perform a search on the full matrix and record where each
80 * path crosses the half-way point. i.e. where x+y = (a+b)/2 (== mid). This
81 * tells us the mid point of the best path. We then perform two searches,
82 * one on each of the two halves and find the 1/4 and 3/4 way points. This
83 * continues recursively until we have all points.
85 * The storage is an array v of 'struct v'. This is indexed by the
86 * diagonal-number k = x-y. Thus k can be negative and the array is
87 * allocated to allow for that. During the search there is an implicit value
88 * 'c' which is the cost (length in steps) of all the paths currently under
89 * consideration.
90 * v[k] stores details of the longest reaching path of cost c that finishes
91 * on diagonal k. "longest reaching" means "finishes closest to a,b".
92 * Details are:
93 * The location of the end point. 'x' is stored. y = x - k.
94 * The diagonal of the midpoint crossing. md is stored. x = (mid + md)/2
95 * y = (mid - md)/2
96 * = x - md
97 * (note: md is a diagonal so md = x-y. mid is an anti-diagonal: mid = x+y)
98 * The number of 'snakes' in the path (l). This is used to allocate the
99 * array which will record the snakes and to terminate recursion.
101 * A path with an even cost (c is even) must end on an even diagonal (k is
102 * even) and when c is odd, k must be odd. So the v[] array is treated as
103 * two sub arrays, the even part and the odd part. One represents paths of
104 * cost 'c', the other paths of cost c-1.
106 * Initially only v[0] is meaningful and there are no snakes. We firstly
107 * extend all paths under consideration with the longest possible snake on
108 * that diagonal.
110 * Then we increment 'c' and calculate for each suitable 'k' whether the best
111 * path to diagonal k of cost c comes from taking an x-step from the c-1 path
112 * on diagonal k-1, or from taking a y-step from the c-1 path on diagonal
113 * k+1. Obviously we need to avoid stepping out of the matrix. Finally we
114 * check if the 'v' array can be extended or reduced at the boundaries. If
115 * we hit a border we must reduce. If the best we could possibly do on that
116 * diagonal is less than the worst result from the current leading path, then
117 * we also reduce. Otherwise we extend the range of 'k's we consider.
119 * We continue until we find a path has reached a,b. This must be a minimal
120 * cost path (cost==c). At this point re-check the end of the snake at the
121 * midpoint and report that.
123 * This all happens recursively for smaller and smaller subranges stopping
124 * when we examine a submatrix and find that it contains no snakes. As we
125 * are usually dealing with sub-matrixes we are not walking from 0,0 to a,b
126 * from alo,blo to ahi,bhi - low point to high point. So the initial k is
127 * alo-blo, not 0.
131 #include "wiggle.h"
132 #include <stdlib.h>
133 #include <sys/time.h>
135 struct v {
136 int x; /* x location of furthest reaching path of current cost */
137 int md; /* diagonal location of midline crossing */
138 int l; /* number of continuous common sequences found so far */
141 static int find_common(struct file *a, struct file *b,
142 int *alop, int *ahip,
143 int *blop, int *bhip,
144 struct v *v, int shortcut)
146 /* Examine matrix from alo to ahi and blo to bhi.
147 * i.e. including alo and blo, but less than ahi and bhi.
148 * Finding longest subsequence and
149 * return new {a,b}{lo,hi} either side of midline.
150 * i.e. mid = ( (ahi-alo) + (bhi-blo) ) / 2
151 * alo+blo <= mid <= ahi+bhi
152 * and alo,blo to ahi,bhi is a common (possibly empty)
153 * subseq - a snake.
155 * v is scratch space which is indexable from
156 * alo-bhi to ahi-blo inclusive.
157 * i.e. even though there is no symbol at ahi or bhi, we do
158 * consider paths that reach there as they simply cannot
159 * go further in that direction.
161 * Return the number of snakes found.
164 struct timeval start, stop;
165 int klo, khi;
166 int alo = *alop;
167 int ahi = *ahip;
168 int blo = *blop;
169 int bhi = *bhip;
171 int mid = (ahi+bhi+alo+blo)/2;
173 /* 'worst' is the worst-case extra cost that we need
174 * to pay before reaching our destination. It assumes
175 * no more snakes in the furthest-reaching path so far.
176 * We use this to know when we can trim the extreme
177 * diagonals - when their best case does not improve on
178 * the current worst case.
180 int worst = (ahi-alo)+(bhi-blo);
182 int loopcount = -1;
183 shortcut = !!shortcut;
184 if (shortcut) {
185 char *lc = getenv("WIGGLE_LOOPCOUNT");
186 if (lc)
187 loopcount = atoi(lc);
188 if (loopcount < 5) {
189 loopcount = -1;
190 gettimeofday(&start, NULL);
194 klo = khi = alo-blo;
195 v[klo].x = alo;
196 v[klo].l = 0;
198 while (1) {
199 int x, y;
200 int cost;
201 int k;
203 if (loopcount > 0)
204 loopcount -= 1;
205 if (shortcut == 1 &&
206 khi - klo > 5000 &&
207 (loopcount == 0 ||
208 (loopcount < 0 &&
209 gettimeofday(&stop, NULL) == 0 &&
210 (stop.tv_sec - start.tv_sec) * 1000000 +
211 (stop.tv_usec - start.tv_usec) > 20000)))
212 /* 20ms is a long time. Time to take a shortcut
213 * Next snake wins
215 shortcut = 2;
216 /* Find the longest snake extending on each current
217 * diagonal, and record if it crosses the midline.
218 * If we reach the end, return.
220 for (k = klo ; k <= khi ; k += 2) {
221 int snake = 0;
223 x = v[k].x;
224 y = x-k;
225 if (y > bhi)
226 abort();
228 /* Follow any snake that is here */
229 while (x < ahi && y < bhi &&
230 match(&a->list[x], &b->list[y])
232 x++;
233 y++;
234 snake = 1;
237 /* Refine the worst-case remaining cost */
238 cost = (ahi-x)+(bhi-y);
239 if (cost < worst) {
240 worst = cost;
241 if (snake && shortcut == 2) {
242 *alop = v[k].x;
243 *blop = v[k].x - k;
244 *ahip = x;
245 *bhip = y;
246 return 1;
249 /* Check for midline crossing */
250 if (x+y >= mid &&
251 v[k].x + v[k].x-k <= mid)
252 v[k].md = k;
254 v[k].x = x;
255 v[k].l += snake;
257 if (cost == 0) {
258 /* OK! We have arrived.
259 * We crossed the midpoint on diagonal v[k].md
261 if (x != ahi)
262 abort();
264 /* The snake could start earlier than the
265 * midline. We cannot just search backwards
266 * as that might find the wrong path - the
267 * greediness of the diff algorithm is
268 * asymmetric.
269 * We could record the start of the snake in
270 * 'v', but we will find the actual snake when
271 * we recurse so there is no need.
273 x = (v[k].md+mid)/2;
274 y = x-v[k].md;
276 *alop = x;
277 *blop = y;
279 /* Find the end of the snake using the same
280 * greedy approach as when we first found the
281 * snake
283 while (x < ahi && y < bhi &&
284 match(&a->list[x], &b->list[y])
286 x++;
287 y++;
289 *ahip = x;
290 *bhip = y;
292 return v[k].l;
296 /* No success with previous cost, so increment cost (c) by 1
297 * and for each other diagonal, set from the end point of the
298 * diagonal on one side of it or the other.
300 for (k = klo+1; k <= khi-1 ; k += 2) {
301 if (v[k-1].x+1 > ahi) {
302 /* cannot step to the right from previous
303 * diagonal as there is no room.
304 * So step up from next diagonal.
306 v[k] = v[k+1];
307 } else if (v[k+1].x - k > bhi || v[k-1].x+1 >= v[k+1].x) {
308 /* Cannot step up from next diagonal as either
309 * there is no room, or doing so wouldn't get us
310 * as close to the endpoint.
311 * So step to the right.
313 v[k] = v[k-1];
314 v[k].x++;
315 } else {
316 /* There is room in both directions, but
317 * stepping up from the next diagonal gets us
318 * closer
320 v[k] = v[k+1];
324 /* Now we need to either extend or contract klo and khi
325 * so they both change parity (odd vs even).
326 * If we extend we need to step up (for klo) or to the
327 * right (khi) from the adjacent diagonal. This is
328 * not possible if we have hit the edge of the matrix, and
329 * not sensible if the new point has a best case remaining
330 * cost that is worse than our current worst case remaining
331 * cost.
332 * The best-case remaining cost is the absolute difference
333 * between the remaining number of additions and the remaining
334 * number of deletions - and assumes lots of snakes.
336 /* new location if we step up from klo to klo-1*/
337 x = v[klo].x; y = x - (klo-1);
338 cost = abs((ahi-x)-(bhi-y));
339 klo--;
340 if (y <= bhi && cost <= worst) {
341 /* Looks acceptable - step up. */
342 v[klo] = v[klo+1];
343 } else do {
344 klo += 2;
345 x = v[klo].x; y = x - (klo-1);
346 cost = abs((ahi-x)-(bhi-y));
347 } while (cost > worst);
349 /* new location if we step to the right from khi to khi+1 */
350 x = v[khi].x+1; y = x - (khi+1);
351 cost = abs((ahi-x)-(bhi-y));
352 khi++;
353 if (x <= ahi && cost <= worst) {
354 /* Looks acceptable - step to the right */
355 v[khi] = v[khi-1];
356 v[khi].x++;
357 } else do {
358 khi -= 2;
359 x = v[khi].x+1; y = x - (khi+1);
360 cost = abs((ahi-x)-(bhi-y));
361 } while (cost > worst);
365 struct cslb {
366 int size; /* How much is alloced */
367 int len; /* How much is used */
368 struct csl *csl;
371 static void csl_add(struct cslb *buf, int a, int b, int len)
373 struct csl *csl;
374 if (len && buf->len) {
375 csl = buf->csl + buf->len - 1;
376 if (csl->a + csl->len == a &&
377 csl->b + csl->len == b) {
378 csl->len += len;
379 return;
382 if (buf->size <= buf->len) {
383 if (buf->size < 64)
384 buf->size = 64;
385 else
386 buf->size += buf->size;
387 buf->csl = realloc(buf->csl, sizeof(buf->csl[0]) * buf->size);
389 csl = buf->csl + buf->len;
390 csl->len = len;
391 csl->a = a;
392 csl->b = b;
393 buf->len += 1;
396 static void lcsl(struct file *a, int alo, int ahi,
397 struct file *b, int blo, int bhi,
398 struct cslb *cslb,
399 struct v *v, int shortcut)
401 /* lcsl == longest common sub-list.
402 * This calls itself recursively as it finds the midpoint
403 * of the best path.
404 * On first call, 'csl' is NULL and will need to be allocated and
405 * is returned.
406 * On subsequence calls when 'csl' is not NULL, we add all the
407 * snakes we find to csl, and return a pointer to the next
408 * location where future snakes can be stored.
410 int alo1 = alo;
411 int ahi1 = ahi;
412 int blo1 = blo;
413 int bhi1 = bhi;
415 if (ahi <= alo || bhi <= blo)
416 return;
418 if (!find_common(a, b,
419 &alo1, &ahi1,
420 &blo1, &bhi1,
421 v, shortcut))
422 return;
424 /* There are more snakes to find - keep looking. */
426 /* With depth-first recursion, this adds all the snakes
427 * before 'alo1' to 'csl'
429 lcsl(a, alo, alo1,
430 b, blo, blo1,
431 cslb, v, 0);
433 if (ahi1 > alo1)
434 /* need to add this common seq, possibly attach
435 * to last
437 csl_add(cslb, alo1, blo1, ahi1 - alo1);
439 /* Now recurse to add all the snakes after ahi1 to csl */
440 lcsl(a, ahi1, ahi,
441 b, bhi1, bhi,
442 cslb, v, shortcut);
445 /* If two common sequences are separated by only an add or remove,
446 * and the first sequence ends the same as the middle text,
447 * extend the second and contract the first in the hope that the
448 * first might become empty. This ameliorates against the greediness
449 * of the 'diff' algorithm.
450 * i.e. if we have:
451 * [ foo X ] X [ bar ]
452 * [ foo X ] [ bar ]
453 * Then change it to:
454 * [ foo ] X [ X bar ]
455 * [ foo ] [ X bar ]
456 * We treat the final zero-length 'csl' as a common sequence which
457 * can be extended so we must make sure to add a new zero-length csl
458 * to the end.
459 * If this doesn't make the first sequence disappear, and (one of the)
460 * X(s) was a newline, then move back so the newline is at the end
461 * of the first sequence. This encourages common sequences
462 * to be whole-line units where possible.
464 static void fixup(struct file *a, struct file *b, struct csl *list)
466 struct csl *list1, *orig;
467 int lasteol = -1;
468 int found_end = 0;
470 if (!list)
471 return;
473 /* 'list' and 'list1' are adjacent pointers into the csl.
474 * If a match gets deleted, they might not be physically
475 * adjacent any more. Once we get to the end of the list
476 * this will cease to matter - the list will be a bit
477 * shorter is all.
479 orig = list;
480 list1 = list+1;
481 while (list->len) {
482 if (list1->len == 0)
483 found_end = 1;
485 /* If a single token is either inserted or deleted
486 * immediately after a matching token...
488 if ((list->a+list->len == list1->a &&
489 list->b+list->len != list1->b &&
490 /* text at b inserted */
491 match(&b->list[list->b+list->len-1],
492 &b->list[list1->b-1])
495 (list->b+list->len == list1->b &&
496 list->a+list->len != list1->a &&
497 /* text at a deleted */
498 match(&a->list[list->a+list->len-1],
499 &a->list[list1->a-1])
502 /* If the last common token is a simple end-of-line
503 * record where it is. For a word-wise diff, this is
504 * any EOL. For a line-wise diff this is a blank line.
505 * If we are looking at a deletion it must be deleting
506 * the eol, so record that deleted eol.
508 if (ends_line(a->list[list->a+list->len-1])
509 && a->list[list->a+list->len-1].len == 1
510 && lasteol == -1
512 lasteol = list1->a-1;
514 /* Expand the second match, shrink the first */
515 list1->a--;
516 list1->b--;
517 list1->len++;
518 list->len--;
520 /* If the first match has become empty, make it
521 * disappear.. (and forget about the eol).
523 if (list->len == 0) {
524 lasteol = -1;
525 if (found_end) {
526 /* Deleting just before the last
527 * entry */
528 *list = *list1;
529 list1->a += list1->len;
530 list1->b += list1->len;
531 list1->len = 0;
532 } else if (list > orig)
533 /* Deleting in the middle */
534 list--;
535 else {
536 /* deleting the first entry */
537 *list = *list1++;
540 } else {
541 /* Nothing interesting here, though if we
542 * shuffled back past an eol, shuffle
543 * forward to line up with that eol.
544 * This causes an eol to bind more strongly
545 * with the preceding line than the following.
547 if (lasteol >= 0) {
548 while (list1->a <= lasteol
549 && (list1->len > 1 ||
550 (found_end && list1->len > 0))) {
551 list1->a++;
552 list1->b++;
553 list1->len--;
554 list->len++;
556 lasteol = -1;
558 *++list = *list1;
559 if (found_end) {
560 list1->a += list1->len;
561 list1->b += list1->len;
562 list1->len = 0;
563 } else
564 list1++;
566 if (list->len && list1 == list)
567 abort();
571 static int elcmp(const void *v1, const void *v2)
573 const struct elmnt *e1 = v1;
574 const struct elmnt *e2 = v2;
576 if (e1->hash != e2->hash) {
577 if (e1->hash < e2->hash)
578 return -1;
579 return 1;
581 if (e1->start[0] == 0 && e2->start[0] == 0)
582 return 0;
583 if (e1->len != e2->len)
584 return e1->len - e2->len;
585 return strncmp(e1->start, e2->start, e1->len);
588 #define BPL (sizeof(unsigned long) * 8)
589 static struct file filter_unique(struct file f, struct file ref)
591 /* Use a bloom-filter to record all hashes in 'ref' and
592 * then if there are consequtive entries in 'f' that are
593 * not in 'ref', reduce each such run to 1 entry
595 struct file n;
596 int fi, cnt;
597 struct file sorted;
599 sorted.list = wiggle_xmalloc(sizeof(sorted.list[0]) * ref.elcnt);
600 sorted.elcnt = ref.elcnt;
601 memcpy(sorted.list, ref.list, sizeof(sorted.list[0]) * sorted.elcnt);
602 qsort(sorted.list, sorted.elcnt, sizeof(sorted.list[0]),
603 elcmp);
605 n.list = wiggle_xmalloc(sizeof(n.list[0]) * f.elcnt);
606 n.elcnt = 0;
607 cnt = 0;
608 for (fi = 0; fi < f.elcnt; fi++) {
609 int lo = 0, hi = sorted.elcnt;
610 while (lo + 1 < hi) {
611 int mid = (lo + hi) / 2;
612 if (elcmp(&f.list[fi], &sorted.list[mid]) < 0)
613 hi = mid;
614 else
615 lo = mid;
617 if (match(&f.list[fi], &sorted.list[lo]))
618 cnt = 0;
619 else
620 cnt += 1;
621 if (cnt <= 1)
622 n.list[n.elcnt++] = f.list[fi];
624 free(sorted.list);
625 return n;
628 static void remap(struct csl *csl, int which, struct file from, struct file to)
630 /* The a,b pointer in csl points to 'from' we need to remap to 'to'.
631 * 'to' has everything that 'from' has, plus more.
632 * Each list[].start is unique
634 int ti = 0;
635 while (csl->len) {
636 int fi = which ? csl->b : csl->a;
637 while (to.list[ti].start != from.list[fi].start) {
638 ti += 1;
639 if (ti > to.elcnt)
640 abort();
642 if (which)
643 csl->b = ti;
644 else
645 csl->a = ti;
646 csl += 1;
648 if (which)
649 csl->b = to.elcnt;
650 else
651 csl->a = to.elcnt;
653 /* Main entry point - find the common-sub-list of files 'a' and 'b'.
654 * The final element in the list will have 'len' == 0 and will point
655 * beyond the end of the files.
657 struct csl *wiggle_diff(struct file a, struct file b, int shortest)
659 struct v *v;
660 struct cslb cslb = {};
661 struct file af, bf;
663 /* Remove runs of 2 or more elements in one file that don't
664 * exist in the other file. This often makes the number of
665 * elements more manageable.
667 af = filter_unique(a, b);
668 bf = filter_unique(b, a);
670 v = wiggle_xmalloc(sizeof(struct v)*(af.elcnt+bf.elcnt+2));
671 v += bf.elcnt+1;
673 lcsl(&af, 0, af.elcnt,
674 &bf, 0, bf.elcnt,
675 &cslb, v, !shortest);
676 csl_add(&cslb, af.elcnt, bf.elcnt, 0);
677 free(v-(bf.elcnt+1));
678 remap(cslb.csl, 0, af, a);
679 remap(cslb.csl, 1, bf, b);
680 free(af.list);
681 free(bf.list);
682 fixup(&a, &b, cslb.csl);
683 return cslb.csl;
686 /* Alternate entry point - find the common-sub-list in two
687 * subranges of files.
689 struct csl *wiggle_diff_partial(struct file a, struct file b,
690 int alo, int ahi, int blo, int bhi)
692 struct v *v;
693 struct cslb cslb = {};
694 v = wiggle_xmalloc(sizeof(struct v)*(ahi-alo+bhi-blo+2));
695 v += bhi-alo+1;
697 lcsl(&a, alo, ahi,
698 &b, blo, bhi,
699 &cslb, v, 0);
700 csl_add(&cslb, ahi, bhi, 0);
701 free(v-(bhi-alo+1));
702 fixup(&a, &b, cslb.csl);
703 return cslb.csl;
706 struct csl *wiggle_csl_join(struct csl *c1, struct csl *c2)
708 int cnt1, cnt2;
709 int offset = 0;
710 if (c1 == NULL)
711 return c2;
712 if (c2 == NULL)
713 return c1;
715 for (cnt1 = 0; c1[cnt1].len; cnt1++)
717 for (cnt2 = 0; c2[cnt2].len; cnt2++)
719 if (cnt1 && cnt2 &&
720 c1[cnt1-1].a + c1[cnt1-1].len == c2[0].a &&
721 c1[cnt1-1].b + c1[cnt1-1].len == c2[0].b) {
722 /* Merge these two */
723 c1[cnt1-1].len += c2[0].len;
724 offset = 1;
725 cnt2--;
727 c1 = realloc(c1, (cnt1+cnt2+1)*sizeof(*c1));
728 while (cnt2 >= 0) {
729 c1[cnt1+cnt2] = c2[cnt2 + offset];
730 cnt2--;
732 free(c2);
733 return c1;
736 /* When rediffing a patch, we *must* make sure the hunk headers
737 * line up. So don't do a full diff, but rather find the hunk
738 * headers and diff the bits between them.
740 struct csl *wiggle_diff_patch(struct file a, struct file b, int shortest)
742 int ap, bp;
743 struct csl *csl = NULL;
744 if (a.elcnt == 0 || b.elcnt == 0 ||
745 a.list[0].start[0] != '\0' ||
746 b.list[0].start[0] != '\0')
747 /* this is not a patch */
748 return wiggle_diff(a, b, shortest);
750 ap = 0; bp = 0;
751 while (ap < a.elcnt && bp < b.elcnt) {
752 int alo = ap;
753 int blo = bp;
754 struct csl *cs;
757 ap++;
758 while (ap < a.elcnt &&
759 a.list[ap].start[0] != '\0');
761 bp++;
762 while (bp < b.elcnt &&
763 b.list[bp].start[0] != '\0');
764 cs = wiggle_diff_partial(a, b, alo, ap, blo, bp);
765 csl = wiggle_csl_join(csl, cs);
767 return csl;
770 #ifdef MAIN
772 main(int argc, char *argv[])
774 struct file a, b;
775 struct csl *csl;
776 struct elmnt *lst = wiggle_xmalloc(argc*sizeof(*lst));
777 int arg;
778 struct v *v;
779 int ln;
781 arg = 1;
782 a.elcnt = 0;
783 a.list = lst;
784 while (argv[arg] && strcmp(argv[arg], "--")) {
785 lst->hash = 0;
786 lst->start = argv[arg];
787 lst->len = strlen(argv[arg]);
788 a.elcnt++;
789 lst++;
790 arg++;
792 if (!argv[arg]) {
793 printf("AARGH\n");
794 exit(1);
796 arg++;
797 b.elcnt = 0;
798 b.list = lst;
799 while (argv[arg] && strcmp(argv[arg], "--")) {
800 lst->hash = 0;
801 lst->start = argv[arg];
802 lst->len = strlen(argv[arg]);
803 b.elcnt++;
804 lst++;
805 arg++;
808 csl = wiggle_diff(a, b, 1);
809 fixup(&a, &b, csl);
810 while (csl && csl->len) {
811 int i;
812 printf("%d,%d for %d:\n", csl->a, csl->b, csl->len);
813 for (i = 0; i < csl->len; i++) {
814 printf(" %.*s (%.*s)\n",
815 a.list[csl->a+i].len, a.list[csl->a+i].start,
816 b.list[csl->b+i].len, b.list[csl->b+i].start);
818 csl++;
821 exit(0);
824 #endif