Enable large file support
[chromium-blink-merge.git] / courgette / description.html
blob8fe4538bc5139f0eb858d1b1816ef6bf5eba8ae0
1 <h1>Courgette Internals</h1>
3 <h2>Patch Generation</h2>
5 <p><img src="generation.png" alt="Patch Generation" title="" /></p>
7 <ul>
8 <li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off the patch
9 generation by calling ensemble_create.cc:GenerateEnsemblePatch</p></li>
10 <li><p>The files are read in by in courgette:SourceStream objects</p></li>
11 <li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which
12 uses MakeGenerator to create
13 patch_generator_x86_32.h:PatchGeneratorX86_32 classes.</p></li>
14 <li><p>PatchGeneratorX86_32's Transform method transforms the input file
15 using Courgette's core techniques that make the bsdiff delta
16 smaller. The steps it takes are the following:</p>
18 <ul>
19 <li><p><em>disassemble</em> the old and new binaries into AssemblyProgram
20 objects,</p></li>
21 <li><p><em>adjust</em> the new AssemblyProgram object, and</p></li>
22 <li><p><em>encode</em> the AssemblyProgram object back into raw bytes.</p></li>
23 </ul></li>
24 </ul>
26 <h3>Disassemble</h3>
28 <ul>
29 <li><p>The input is a pointer to a buffer containing the raw bytes of the
30 input file.</p></li>
31 <li><p>Disassembly converts certain machine instructions that reference
32 addresses to Courgette instructions. It is not actually
33 disassembly, but this is the term the code-base uses. Specifically,
34 it detects instructions that use absolute addresses given by the
35 binary file's relocation table, and relative addresses used in
36 relative branches.</p></li>
37 <li><p>Done by disassemble:ParseDetectedExecutable, which selects the
38 appropriate Disassembler subclass by looking at the binary file's
39 headers.</p>
41 <ul>
42 <li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler</p></li>
43 <li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler</p></li>
44 <li><p>disassembler_elf_32_arm.h defines the ELF 32-bit arm disassembler</p></li>
45 </ul></li>
46 <li><p>The Disassembler replaces the relocation table with a Courgette
47 instruction that can regenerate the relocation table.</p></li>
48 <li><p>The Disassembler builds a list of addresses referenced by the
49 machine code, numbering each one.</p></li>
50 <li><p>The Disassembler replaces and address used in machine instructions
51 with its index number.</p></li>
52 <li><p>The output is an assembly_program.h:AssemblyProgram class, which
53 contains a list of instructions, machine or Courgette, and a mapping
54 of indices to actual addresses.</p></li>
55 </ul>
57 <h3>Adjust</h3>
59 <ul>
60 <li><p>This step takes the AssemblyProgram for the old file and reassigns
61 the indices that map to actual addresses. It is performed by
62 adjustment_method.cc:Adjust().</p></li>
63 <li><p>The goal is the match the indices from the old program to the new
64 program as closely as possible.</p></li>
65 <li><p>When matched correctly, machine instructions that jump to the
66 function in both the new and old binary will look the same to
67 bsdiff, even the function is located in a different part of the
68 binary.</p></li>
69 </ul>
71 <h3>Encode</h3>
73 <ul>
74 <li><p>This step takes an AssemblyProgram object and encodes both the
75 instructions and the mapping of indices to addresses as byte
76 vectors. This format can be written to a file directly, and is also
77 more appropriate for bsdiffing. It is done by
78 AssemblyProgram.Encode().</p></li>
79 <li><p>encoded_program.h:EncodedProgram defines the binary format and a
80 WriteTo method that writes to a file.</p></li>
81 </ul>
83 <h3>bsdiff</h3>
85 <ul>
86 <li>simple_delta.c:GenerateSimpleDelta</li>
87 </ul>
89 <h2>Patch Application</h2>
91 <p><img src="application.png" alt="Patch Application" title="" /></p>
93 <ul>
94 <li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off the patch generation
95 by calling ensemble_apply.cc:ApplyEnsemblePatch</p></li>
96 <li><p>ensemble_create.cc:ApplyEnsemblePatch, reads and verifies the
97 patch's header, then calls the overloaded version of
98 ensemble_create.cc:ApplyEnsemblePatch.</p></li>
99 <li><p>The patch is read into an ensemble<em>apply.cc:EnsemblePatchApplication
100 object, which generates a set of patcher</em>x86<em>32.h:PatcherX86</em>32
101 objects for the sections in the patch.</p></li>
102 <li><p>The original file is disassembled and encoded via a call
103 EnsemblePatchApplication.TransformUp, which in turn call
104 patcher<em>x86</em>32.h:PatcherX86_32.Transform.</p></li>
105 <li><p>The transformed file is then bspatched via
106 EnsemblePatchApplication.SubpatchTransformedElements, which calls
107 EnsemblePatchApplication.SubpatchStreamSets, which calls
108 simple_delta.cc:ApplySimpleDelta, Courgette's built-in
109 implementation of bspatch.</p></li>
110 <li><p>Finally, EnsemblePatchApplication.TransformDown assembles, i.e.,
111 reverses the encoding and disassembly, on the patched binary data.
112 This is done by calling PatcherX86<em>32.Reform, which in turn calls
113 the global function encoded</em>program.cc:Assemble, which calls
114 EncodedProgram.AssembleTo.</p></li>
115 </ul>
117 <h2>Glossary</h2>
119 <p><strong>Adjust</strong>: Reassign address indices in the new program to match more
120 closely those from the old.</p>
122 <p><strong>Assembly program</strong>: The output of <em>disassembly</em>. Contains a list of
123 <em>Courgette instructions</em> and an index of branch target addresses.</p>
125 <p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file
126 by evaluating the <em>Courgette instructions</em> and leaving the machine
127 instructions in place.</p>
129 <p><strong>Courgette instruction</strong>: Replaces machine instructions in the
130 program. Courgette instructions replace branches with an index to
131 the target addresses and replace part of the relocation table.</p>
133 <p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assembly
134 program</em>.</p>
136 <p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>encoded program</em> by
137 serializing its data structures into byte vectors more appropriate
138 for storage in a file.</p>
140 <p><strong>Encoded Program</strong>: The output of encoding.</p>
142 <p><strong>Ensemble</strong>: A Courgette-style patch containing sections for the list
143 of branch addresses, the encoded program. It supports patching
144 multiple object files at once.</p>
146 <p><strong>Opcode</strong>: The number corresponding to either a machine or <em>Courgette
147 instruction</em>.</p>