This file describes the format produced by 'svnadmin dump' and
consumed by 'svnadmin load'.

The format has undergone revisions over time. They are presented in
reverse chronological order here. You may wish to start with the
VERSION 1 description in order to get a baseline understanding first.

===== SVN DUMPFILE VERSION 3 FORMAT =====

(generated by SVN versions 1.1.0-present, if requested by the user)

This format is equivalent to the VERSION 2 format except for the
following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 3\n").

2.) There are three new optional headers for node changes:

[Text-delta: true|false]
[Prop-delta: true|false]
[Text-delta-base-md5: blob]

The default value for the boolean headers is "false". If the value is
set to "true", then the text and property contents will be treated
as deltas against the previous contents of the node (as determined
by copy history for adds with history, or by the value in the
previous revision for changes--just as with commits).

Property deltas have the same format as regular property lists except
that (1) properties with the same value as in the previous contents of
the node are not printed, and (2) deleted properties will be written
just as a regular property is printed, but with the "K " changed to a
"D " and with no value part.

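The property delta rules above can be sketched in Python as follows.
This is only an illustration, not Subversion's loader: the function
name apply_prop_delta is hypothetical, and the "K <length>" /
"V <length>" entry layout is assumed from the hashdump format rather
than spelled out in this file.

```python
def apply_prop_delta(base: dict, delta: bytes) -> dict:
    """Apply a property delta block to a {name: value} dict of bytes.

    Assumed layout (not stated in this file): each entry is
    "K <len>\n<name>\nV <len>\n<value>\n", a deletion is
    "D <len>\n<name>\n", and the block ends with "PROPS-END\n".
    """
    result = dict(base)              # unchanged properties carry over
    pos = 0
    while True:
        eol = delta.index(b"\n", pos)
        header = delta[pos:eol]
        pos = eol + 1
        if header == b"PROPS-END":
            return result
        tag, length = header.split(b" ")
        name = delta[pos:pos + int(length)]
        pos += int(length) + 1       # skip the name and its newline
        if tag == b"D":
            result.pop(name, None)   # deleted property: no value part
        elif tag == b"K":
            eol = delta.index(b"\n", pos)
            vtag, vlen = delta[pos:eol].split(b" ")
            if vtag != b"V":
                raise ValueError("expected a V line after a K entry")
            pos = eol + 1
            result[name] = delta[pos:pos + int(vlen)]
            pos += int(vlen) + 1     # skip the value and its newline
        else:
            raise ValueError("unexpected entry tag: %r" % tag)
```

Lengths are byte counts, so the sketch slices by length instead of
splitting on newlines; a value may itself contain newlines.
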
Text deltas are written out as a series of svndiff0 windows. If
Text-delta-base-md5 is provided, it is the checksum of the base to
which the text delta is applied; note that older versions (pre-1.5) of
'svnadmin load' may ignore the checksum.

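A loader that wants to honor the checksum could do something like the
Python sketch below before applying a delta. The helper names are
hypothetical. It assumes the "blob" is a hexadecimal MD5 digest and
that an svndiff0 stream begins with the bytes "SVN" followed by a zero
version byte; neither detail is stated in this file.

```python
import hashlib

SVNDIFF0_MAGIC = b"SVN\x00"  # assumption: "SVN" plus version byte 0


def check_delta_base(base_text: bytes, headers: dict) -> bool:
    """Return True if base_text matches Text-delta-base-md5, or if the
    optional header is absent (there is then nothing to verify)."""
    expected = headers.get("Text-delta-base-md5")
    if expected is None:
        return True
    return hashlib.md5(base_text).hexdigest() == expected.lower()


def looks_like_svndiff0(text_content: bytes) -> bool:
    """Cheap sanity check on the text content of a Text-delta node."""
    return text_content.startswith(SVNDIFF0_MAGIC)
```

Decoding the svndiff0 windows themselves is out of scope for this
sketch.
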
===== SVN DUMPFILE VERSION 2 FORMAT =====

(generated by SVN versions 0.18.0-present, by default)

This format is equivalent to the VERSION 1 format in every respect,
except for the following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 2\n").

2.) In addition to "Revision Records", another sort of record is supported:
    the "UUID" record, which should be of the form:

UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e

This should be used to indicate the UUID of the originating repository.

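Reading the version line and the optional UUID record might look like
the following Python sketch. The function name read_preamble is
hypothetical, and the sketch assumes a seekable binary stream in which
blank lines may separate the version line from the UUID record.

```python
import io
import uuid


def read_preamble(stream):
    """Return (format_version, repository_uuid or None)."""
    key, _, value = stream.readline().partition(b": ")
    if key != b"SVN-fs-dump-format-version":
        raise ValueError("not an svn dumpfile")
    version = int(value.decode("ascii"))
    while True:
        pos = stream.tell()
        line = stream.readline()
        if line == b"\n":
            continue                      # skip blank separator lines
        repo_uuid = None
        if line.startswith(b"UUID: "):
            text = line[len(b"UUID: "):].strip().decode("ascii")
            repo_uuid = uuid.UUID(text)
        else:
            stream.seek(pos)              # not a UUID record; rewind
        return version, repo_uuid
```

Version 1 streams have no UUID record, so the sketch rewinds and
leaves the first revision record for the caller.
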
===== SVN DUMPFILE VERSION 1 FORMAT =====

(generated by SVN versions prior to 0.18.0)

The binary format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.

Prop-content-length: P

   ...P bytes of property data. Properties are stored in the same
   human-readable hashdump format used by working copy property files,
   except that they end with "PROPS-END\n" for better readability.

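As a rough illustration of the property block just described, the
Python sketch below parses one into a dictionary. The function name
parse_props is hypothetical, and the "K <length>" / "V <length>" entry
layout is assumed from the hashdump format; this file does not spell
it out.

```python
def parse_props(data: bytes) -> dict:
    """Parse a property block into {name: value} (both bytes).

    Assumed entry layout: "K <len>\n<name>\nV <len>\n<value>\n",
    repeated, then a closing "PROPS-END\n" line.
    """
    props = {}
    pos = 0

    def read_line():
        nonlocal pos
        eol = data.index(b"\n", pos)
        line = data[pos:eol]
        pos = eol + 1
        return line

    def read_counted(expected_tag, line):
        nonlocal pos
        tag, length = line.split(b" ")
        if tag != expected_tag:
            raise ValueError("expected %r, got %r" % (expected_tag, tag))
        chunk = data[pos:pos + int(length)]
        pos += int(length) + 1       # skip the chunk and its newline
        return chunk

    while True:
        line = read_line()
        if line == b"PROPS-END":
            return props
        name = read_counted(b"K", line)
        props[name] = read_counted(b"V", read_line())
```

Lengths count bytes, which is why values are sliced by length rather
than read up to the next newline.
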
Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path ]
[Text-copy-source-md5: blob] (2)
[Text-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]
Content-length: Y (3)

   ... Y bytes of content data, divided into P bytes of "property"
   data and T bytes of "text" data. The properties come first; their
   total length (including formatting) is Prop-content-length, and is
   included in Content-length. The "PROPS-END\n" line always
   terminates the property section if there are props. The remainder
   of the Y bytes (expected to be equivalent to Text-content-length)
   represents the contents of the node.

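The division of the content bytes can be sketched in Python as below.
The function name split_node_content is hypothetical, and headers is
assumed to be a dict of header names to string values that the caller
has already parsed.

```python
def split_node_content(headers: dict, content: bytes):
    """Split a node's content bytes into (property_bytes, text_bytes).

    Missing length headers are treated as 0, since both are optional.
    """
    prop_len = int(headers.get("Prop-content-length", 0))
    text_len = int(headers.get("Text-content-length", 0))
    if prop_len + text_len != len(content):
        raise ValueError("content is not P + T bytes long")
    props = content[:prop_len]       # the properties come first
    if prop_len and not props.endswith(b"PROPS-END\n"):
        raise ValueError("property section lacks PROPS-END")
    return props, content[prop_len:]
```

For example, with Prop-content-length 10 and Text-content-length 6,
the first 10 bytes are the property section and the last 6 are the
file text.
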
   (1) If the node represents a deletion, this field is optional.

   (2) This is a checksum of the source of the copy. A loader process
       can use this checksum to determine that the copyfrom path/rev
       already present in a filesystem is really the *correct* one to
       use.

   (3) The Content-length header is technically unnecessary, since the
       information it holds (and more) can be found in the
       Prop-content-length and Text-content-length fields. Though
       Subversion itself does not make use of the header when reading
       a dumpfile, we include it for compatibility with generic RFC822
       parsers.

   (4) There are actually 2 types of version 1 dump streams. The
       regular ones have been generated since r2634 (svn 0.14.0).
       Older ones also claim to be version 1, but lack the
       Prop-content-length and Text-content-length fields in the block
       header. In those days there was *always* a properties block.

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

Revision-number: 1422
Prop-content-length: 80

Added two files, changed a third.

Prop-content-length: 35

Node-path: bar/baz/bop

Prop-content-length: 76
Text-content-length: 54

Here is the text of the newly added 'bop' file.

Text-content-length: 102

Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.

-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-

(This file started as a proposal, preserved here for posterity.)

A proposal for an svn filesystem dump/restore format.

Two problems we want to solve
=============================

1. When we change our node-id schema, we need to migrate all of our
   data (by dumping and restoring).

2. Serves as a backup format. Could be read by other software tools.

A. Written as two new public functions in svn_fs.h. To be invoked
   by new 'svnadmin' subcommands.

B. Format uses only timeless fs concepts.

   The dump format needs to reference concepts that we *know* are
   general enough to never change. These concepts must exist
   independently of any internal node-id schema, or any DB storage
   backend. In other words, we're talking about the basic ideas in
   our original "design spec" from May 2000.

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

- A filesystem is an array of trees.
  Each tree is called a "revision" and has unversioned properties attached.

- A revision has a tree of "nodes" hanging off of it.
  Actually, the nodes in the filesystem form a DAG. A revision
  always points to an initial node that represents the 'root' of some tree.

- The majority of a tree's nodes are hard-links (references) to
  nodes that were created in earlier trees.

- versioned properties
- predecessor history: "which node am I a variant of?"
- copy history: "which node am I a copy of?"

The history values can be non-existent (meaning the node is
completely new), or can have a value of {revision, path}.

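One possible way to model the node attributes listed above is the
small Python sketch below. All names here (Node, History, is_new) are
hypothetical illustrations, not part of svn_fs.h or of the dump
format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A history value is either absent (None) or a {revision, path} pair.
History = Tuple[int, str]


@dataclass
class Node:
    path: str
    props: Dict[str, bytes] = field(default_factory=dict)  # versioned
    predecessor: Optional[History] = None   # "a variant of" which node
    copied_from: Optional[History] = None   # "a copy of" which node

    def is_new(self) -> bool:
        """A node with no history at all is completely new."""
        return self.predecessor is None and self.copied_from is None
```

A node copied from /trunk/foo.c at revision 10 would then carry
copied_from=(10, "/trunk/foo.c").
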
------------------------------------------------------------------------
Refinement of proposal #2: (after discussion with gstein)
=========================

Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.

The content section has two implicit parts: a property hash, and the
fulltext. The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.
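
How record boundaries fall out of the Content-length header can be
sketched in Python as follows. The function name read_record is
hypothetical. The sketch assumes a binary stream in which a blank line
ends each header block and blank lines may separate records, and it
treats a missing Content-length as zero bytes of content.

```python
import io


def read_record(stream):
    """Read one record; return (headers, content), or None at EOF."""
    line = stream.readline()
    while line == b"\n":                 # skip blank separator lines
        line = stream.readline()
    if not line:
        return None
    headers = {}
    while line not in (b"\n", b""):      # a blank line ends the headers
        text = line.decode("utf-8").rstrip("\n")
        key, _, value = text.partition(": ")
        headers[key] = value
        line = stream.readline()
    length = int(headers.get("Content-length", 0))
    content = stream.read(length)        # exactly Content-length bytes
    if len(content) != length:
        raise ValueError("record is truncated")
    return headers, content
```

Because the content is read by byte count, it may contain blank lines
or header-like text without confusing the reader.
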