README

   1 Index.
   2
   3         0. Advice.
   4         0.a. AUTHORS
   5         0.b. LICENSE
   6         1. What is mx3r?
   7         2. What mx3r is not?
   8         3. Compiling and installing mx3r
   9         4. Theory behind mx3r
  10         5. Why a 32 bits hash?
  11 ___________________________________
  12
  13         0. Advice.
  14
  15 This program, called mx3r, is under heavy development, in a very very early
  16 development stage. At the moment I wrote this lines, the program cannot solve
  17 a simple merge.
  18
  19
  20         0.a. AUTHORS
  21
  22 - David Martínez Martí <deavidsedice@gmail.com>
  23
  24
  25         0.b. LICENSE
  26
  27     This program is free software; you can redistribute it and/or modify
  28     it under the terms of the GNU General Public License as published by
  29     the Free Software Foundation; either version 2 of the License, or
  30     (at your option) any later version.
  31
  32     This program is distributed in the hope that it will be useful,
  33     but WITHOUT ANY WARRANTY; without even the implied warranty of
  34     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  35     GNU General Public License for more details.
  36
  37     You should have received a copy of the GNU General Public License along
  38     with this program; if not, write to the Free Software Foundation, Inc.,
  39     51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
  40
  41
  42 You can read the full license at:
  43 http://www.gnu.org/licenses/old-licenses/gpl-2.0.txt
  44
  45
  46         1. What is mx3r?
  47
  48 mx3r is a program intended for merging files when a typical diff/patch/merge
  49 fails. It will use a sort of algorithms that will search patterns inside
  50 those files and see how a file has changed, to reproduce it into the other
  51 file. It will try to use a search by line pattern, by word-pattern, and last,
  52 it will try to know what type of file it is to know how the changes should be
  53 copied and pasted onto the other file.
  54
  55 Mx3r is intended for mixing those project files that are very very different,
  56 because both programmers have coded by their own, without knowing about the
  57 other one. Then you want all the features added by the developer A, and all
  58 the features added by the developer B.
  59
  60 A standard merge, or diff-patch process would end in a lot of rejects that
  61 you'll try to apply them manually. You can waste a week or two in this
  62 process.
  63
  64 In the future, when mx3r becomes usable, you will be able to use mx3r to
  65 resolve that process. It will recieve the BASE, REMOTE and LOCAL files from a
  66 SCM like GIT or SVN, for example. And then, it will try to determine what
  67 happened from file A to B, and from A to C, for, finally, replay the A->B
  68 changes into C, and the A->C changes into B. The process ends when B and C
  69 are almost the same file.
  70
  71
  72         2. What mx3r is not?
  73
  74 It is not a traditional merge nor an alternative to diff & patch.
  75 You should try always a standard three-way merge, and this program has
  76 nothing to do with the work of diff & patch. It does not read diffs and
  77 cannot create those files.
  78
  79
  80         3. Compiling and installing mx3r
  81
  82 There aren't any makefile yet. Then you can't compile & install by the
  83 typical way (make && sudo make install).
  84 I've written a shell script named "run" that compiles it, and executes it
  85 without arguments. Execute ./run, and you'll see a executable called "mx3r"
  86 on the same folder.
  87
  88 If you wish to install it in your system, copy it into /usr/bin.
  89
  90
  91         4. Theory behind mx3r
  92
  93 A Diff program tries to know what happened to a file, seeking it from top to
  94 bottom. Some diff programs are only for small changes, they cannot see what
  95 happened to a block of text. But it is possible to know if a block of text
  96 was changed, copied, or moved.
  97
  98 The way that mx3r does that is by, first, preprocessing each line data and
  99 deleting some spaces, repeated characters, etc. Then, mx3r creates a 32 bits
 100 hash of that line (that can be done with a CRC32, or cutting the first 32
 101 bits of any other hash algorithm). Then, with a index of lines, we compare
 102 those numbers in the file A and B, and we try to find large blocks of
 103 unmodified data. Every pass, mx3r reduces to a half the minimum size of block
 104 detection. After five or six passes, mx3r know all the blocks that remain
 105 equal (but in other position or order) in the other file.
 106
 107 After that, we could do other pass to try to know where the undetected lines
 108 were only a modified line, or a totally new line. That can be done by
 109 searching patterns on word sequences.
 110
 111 The goal of all of this is, to know how a file it's meant to be changed, *not
 112 how different is*.
 113
 114 Mx3r is unfinished yet. There are a lot of things that must be done before
 115 mx3r comes functional.
 116
 117 But, the idea is:
 118
 119 STAGE A - SCAN
 120
 121         1.- Split the file in big registers. (Lines, paragraphs, sections, XML tags)
 122         2.- Preprocess the content of those registers for reducing differences
 123 between very similar registers.
 124         3.- Create a 32 bits Hash of each preprocessed register.
 125                         (See 5. Why a 32 bits hash? )
 126         4.- Search the 32bit vector for blocks of registers that have the same hash
 127 as in the other file.
 128         5.- Try to know what happened to the registers that aren't inside a block.
 129 To do this, join these registers, and split them by something that produces
 130 smaller registers and try again.
 131
 132 STAGE B - DETERMINE ACTIONS
 133
 134         1.- Select the mixing direction: B changes into C or C changes into B.
 135         (For this example, B into C)
 136         2.- Determine where are the changes, in wich byte start and their range. If
 137 we're adding, editing, or deleting something.
 138         3.- Determine the type of the file, determine the best parser plugin for it.
 139         4.- Send this information to the parser. The parser will be a external
 140 plugin or helper program that recieves the input and knows the syntax for one
 141 language. The parser should return that information changed, placing the
 142 limits correctly for the syntax of the file.
 143
 144 STAGE C - MIX IT
 145
 146         1.- Read B modifications, and try to apply them to the file C. You know
 147 exactly in wich byte should add these modifications, or where you'll find the
 148 code for deleting or edit it. You should update de block index as well.
 149         2.- Optionally, send all relevant data to a external mx3r plugin instead of
 150 trying it yourself. If the parser has a related mx3r plugin, it should be
 151 better to give this work to the specific plugin.
 152
 153 This method is slower than a standard merge, of course. But it has knowledge
 154 about a lot of things that diff & patch can't see.
 155
 156 That's why I don't call this program a merge: Because the goal of mx3r and a
 157 merge are a bit different. Diff-Patch and Merge are programs intended to
 158 replicate some actions into other files. Mx3r tries to know what should be
 159 done to a file, to do all the modifications done in the other files. While
 160 diff & patch are good tools to extract a history of changes and reply it, mx3r
 161 is not. I cannot guarantee that a mix done in the future version 1.x, can be
 162 replicated byte-to-byte in another previous or future version of mx3r.
 163
 164 Mx3r acts as a human (comparing patterns, and doing some work sintactically),
 165 and can make some mistakes.
 166
 167
 168         5. Why a 32 bits hash?
 169
 170 Well, you can think that, a 32 bits hash is very insecure to define your
 171 code. That you can get a CRC32 collision, and get a line detected as it was
 172 part of a thing that does not belong to.
 173
 174 But that's not true. The probability of a collision in a 32 bits hash is
 175 aprox. one each 2^30. One each billion of tests. And the probability about
 176 getting a group of 6 hashes wrong, will be (2^30)^6. That is: 0.0% of
 177 probability.
 178
 179 And if you get a single line wrong, don't worry. Mx3r doesn't think that line
 180 is equal to the other one. The program only knows that his position is next
 181 to the other lines.
 182
 183 A 32 bits hash is not a matter of security. It is a matter of simplicity and
 184 speed. Compare two 32 bit variable is a lot cheaper and easier than the whole
 185 string. In the future it can be replaced with a 64 bits hash (from a MD5 or
 186 SHA1, for example), when most of us will be running a 64 bit Operating System.
 187
 188