notes/fsfs-improvements.txt

   1 [ At the time of writing (1.7-dev), FSFS format 5 was current.  Since then
   2 format 5 has been retracted; it was never released.  1.7 shipped with f4 and
   3 1.8 will most likely ship with f6.  Format 5 was current during 1.7-dev and
   4 packed revprops into an authoritative SQLite database. ]
   5
   6 Introduction
   7 ------------
   8
   9 The FSFS external data format can be changed to allow for
  10 significantly reduced I/O overhead without changing the
  11 fundamental ideas behind FSFS.
  12
  13 The whole idea is to rearrange packed revision info in a
  14 new FSFS format "6".  A two-way conversion between format
  15 "5" and "6" should be possible as well as supporting both
  16 formats for different repositories on the same server.
  17
  18
  19 Revision Order
  20 --------------
  21
  22 FSFS format "5" packs revisions by putting revision 0 at
  23 the beginning of the file and concatenating the others in
  24 ascending order.  This intuitive design is counter-productive
  25 because we always trace data HEAD -> r0 and scanning a file
  26 backwards is expensive.
  27
  28 +-------+
  29 | rev 0 |
  30 +-------+
  31 | rev 1 |
  32 +-------+
  33
  34 :  ...  :
  35
  36 +-------+
  37 | rev N |
  38 +-------+
  39 <EOF>
  40
  41 A counter-intuitive but more efficient reversed order (de-
  42 scending revision order in a packed file) results in always
  43 reading forward through a file.
  44
  45 +-------+
  46 | rev N |
  47 +-------+
  48
  49 :  ...  :
  50
  51 +-------+
  52 | rev 1 |
  53 +-------+
  54 | rev 0 |
  55 +-------+
  56 <EOF>
  57
  58
  59 Don't Interleave Revision and Content Data
  60 ------------------------------------------
  61
  62 Format "5" keeps each revision as a single block.  When
  63 re-constructing a node content from the diffs,  we jump
  64 through a number of distant revision blocks.  For each node.
  65
  66 +--------------+
  67 | rev N Reps   |
  68 +--------------+
  69 | rev N Header |
  70 +--------------+
  71
  72 :     ...      :
  73
  74 +--------------+
  75 | rev 1 Reps   |
  76 +--------------+
  77 | rev 1 Header |
  78 +--------------+
  79 | rev 0 Reps   |
  80 +--------------+
  81 | rev 0 Header |
  82 +--------------+
  83 <EOF>
  84
  85 Instead,  delta-info should be removed from the revision
  86 blocks and put at the end of the file.  This speeds up
  87 header-only operations like "log" without impairing node
  88 content lookup.  And it lays the foundation for further
  89 improvements.
  90
  91 +--------------+
  92 | rev N Header |
  93 +--------------+
  94
  95 :     ...      :
  96
  97 +--------------+
  98 | rev 1 Header |
  99 +--------------+
 100 | rev 0 Header |
 101 +--------------+
 102 | rev N Reps   |
 103 +--------------+
 104
 105 :     ...      :
 106
 107 +--------------+
 108 | rev 1 Reps   |
 109 +--------------+
 110 | rev 0 Reps   |
 111 +--------------+
 112 <EOF>
 113
 114 The only downside is that putting headers first is somewhat
 115 more computationally expensive since offsets needs to be
 116 calculated in advance.  This is made even more difficult by
 117 using a variable length encoding for those offsets.
 118
 119
 120 Group the Representations by Delta Tree
 121 ---------------------------------------
 122
 123 All diffs that are based on the same node within the packed
 124 revision file are put in one place.  Re-constructing a node
 125 would then involve reading the revision headers (relatively
 126 close together) and the content deltas after that (again
 127 with high locality).
 128
 129 +------------------+
 130 | rev N Header     |
 131 +------------------+
 132
 133 :       ...        :
 134
 135 +------------------+
 136 | rev 1 Header     |
 137 +------------------+
 138 | rev 0 Header     |
 139 +------------------+
 140 | node tree A Reps |
 141 +------------------+
 142 | node tree B Reps |
 143 +------------------+
 144
 145 :       ...        :
 146
 147 +------------------+
 148 | node tree Z Reps |
 149 +------------------+
 150 <EOF>
 151
 152 We can optimize that even further by exploiting the skip-
 153 delta information: the "walks" through the delta tree for a
 154 given node will merge bit by bit into a main "trunk". The
 155 nodes on these paths can be re-ordered such that very few
 156 seek() operations are required on average to reconstruct
 157 the node content in this file.
 158
 159 Example of 8 changes in revs r0 .. r7 (for simplicity),
 160 looking at the delta-info only ("->" means "stored as diff
 161 against"):
 162
 163         r0
 164         r1->r0
 165         r2->r0
 166         r3->r2
 167         r4->r0
 168         r5->r4
 169         r6->r4
 170         r7->r6
 171
 172 Default ordering r0,r1,r2,r3,r4,r5,r6,r7 requires
 173 1/1/2/2/2/2/3/3 seeks.  For 2^N changes (N integer > 0),
 174 we need (N+1)/2 seeks on average.
 175
 176 A path-optimized ordering r0,r4,r6,r7,r5,r2,r3,r1
 177 requires 1/1/1/1/2/2/2/2 seeks.  It's (N+3)/4 on average.
 178 That optimized ordering can easily be constructed by
 179
 180         * select the highest rev not covered, yet
 181         * add its diff path to the existing result
 182           (starting with the first revision not
 183           already is in the result)
 184         * repeat until all revs are covered
 185
 186 Note that revision paths start with the low end of
 187 the revision range because contents gets reconstructed
 188 from r0 upwards.
 189
 190
 191 Move the Manifest Info into the Packed File
 192 -------------------------------------------
 193
 194 Once we mastered the offset pre-calculation issue (see
 195 above),  we may as well put the manifest info at the be-
 196 ginning of the file.  This will benefit "log" in particular
 197 as only one consecutive chunk of data needs to be read
 198 from only one file.
 199
 200 +------------------+
 201 | rev 0 Offset     |
 202 +------------------+
 203 | rev 1 Offset     |
 204 +------------------+
 205
 206 :       ...        :
 207
 208 +------------------+
 209 | rev N Offset     |
 210 +------------------+
 211 | rev N Header     |
 212 +------------------+
 213
 214 :       ...        :
 215
 216 +------------------+
 217 | rev 1 Header     |
 218 +------------------+
 219 | rev 0 Header     |
 220 +------------------+
 221 | node tree A Reps |
 222 +------------------+
 223 | node tree B Reps |
 224 +------------------+
 225
 226 :       ...        :
 227
 228 +------------------+
 229 | node tree Z Reps |
 230 +------------------+
 231 <EOF>
 232
 233