1 [ At the time of writing (1.7-dev), FSFS format 5 was current. Since then
2 format 5 has been retracted; it was never released. 1.7 shipped with f4 and
3 1.8 will most likely ship with f6. Format 5 was current during 1.7-dev and
4 packed revprops into an authoritative SQLite database. ]
9 The FSFS external data format can be changed to allow for
10 significantly reduced I/O overhead without changing the
11 fundamental ideas behind FSFS.
13 The whole idea is to rearrange packed revision info in a
14 new FSFS format "6". A two-way conversion between format
15 "5" and "6" should be possible as well as supporting both
16 formats for different repositories on the same server.
22 FSFS format "5" packs revisions by putting revision 0 at
23 the beginning of the file and concatenating the others in
24 ascending order. This intuitive design is counter-productive
25 because we always trace data HEAD -> r0 and scanning a file
26 backwards is expensive.
41 A counter-intuitive but more efficient reversed order (de-
42 scending revision order in a packed file) results in always
43 reading forward through a file.
59 Don't Interleave Revision and Content Data
60 ------------------------------------------
62 Format "5" keeps each revision as a single block. When
63 re-constructing a node content from the diffs, we jump
64 through a number of distant revision blocks. For each node.
85 Instead, delta-info should be removed from the revision
86 blocks and put at the end of the file. This speeds up
87 header-only operations like "log" without impairing node
88 content lookup. And it lays the foundation for further
114 The only downside is that putting headers first is somewhat
115 more computationally expensive since offsets needs to be
116 calculated in advance. This is made even more difficult by
117 using a variable length encoding for those offsets.
120 Group the Representations by Delta Tree
121 ---------------------------------------
123 All diffs that are based on the same node within the packed
124 revision file are put in one place. Re-constructing a node
125 would then involve reading the revision headers (relatively
126 close together) and the content deltas after that (again
152 We can optimize that even further by exploiting the skip-
153 delta information: the "walks" through the delta tree for a
154 given node will merge bit by bit into a main "trunk". The
155 nodes on these paths can be re-ordered such that very few
156 seek() operations are required on average to reconstruct
157 the node content in this file.
159 Example of 8 changes in revs r0 .. r7 (for simplicity),
160 looking at the delta-info only ("->" means "stored as diff
172 Default ordering r0,r1,r2,r3,r4,r5,r6,r7 requires
173 1/1/2/2/2/2/3/3 seeks. For 2^N changes (N integer > 0),
174 we need (N+1)/2 seeks on average.
176 A path-optimized ordering r0,r4,r6,r7,r5,r2,r3,r1
177 requires 1/1/1/1/2/2/2/2 seeks. It's (N+3)/4 on average.
178 That optimized ordering can easily be constructed by
180 * select the highest rev not covered, yet
181 * add its diff path to the existing result
182 (starting with the first revision not
183 already is in the result)
184 * repeat until all revs are covered
186 Note that revision paths start with the low end of
187 the revision range because contents gets reconstructed
191 Move the Manifest Info into the Packed File
192 -------------------------------------------
194 Once we mastered the offset pre-calculation issue (see
195 above), we may as well put the manifest info at the be-
196 ginning of the file. This will benefit "log" in particular
197 as only one consecutive chunk of data needs to be read