<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
@import "branding/css/tigris.css";
@import "branding/css/inst.css";
/* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
href="branding/css/print.css" />
<script type="text/javascript" src="branding/scripts/tigris.js"></script>
<title>Subversion Design</title>
</head>

<body>
<div class="app">

<div class="h1">
<h1 style="text-align: center">Subversion Design</h1>
</div>

<p class="warningmark"><em>NOTE: This document is out of date. The last
substantial update was in October 2002 (r3377). However, people often come
here for the section on the <a href="#server.fs.struct.bubble-up">directory
bubble-up method</a>, which is still accurate.</em></p>
<div class="h1">
<h2>Table of Contents</h2>
<ol id="toc">
<li><a href="#goals">Goals &mdash; The goals of the Subversion project</a>
<ol>
<li><a href="#goals.rename-remove-resurrect">Rename/removal/resurrection support</a></li>
<li><a href="#goals.textbinary">Text vs binary issues</a></li>
<li><a href="#goals.i18n">I18N/Multilingual support</a></li>
<li><a href="#goals.branching-and-tagging">Branching and tagging</a></li>
<li><a href="#goals.misc">Miscellaneous new behaviors</a>
<ol>
<li><a href="#goals.misc.logmsgs">Log messages</a></li>
<li><a href="#goals.misc.diffplugins">Client side diff plug-ins</a></li>
<li><a href="#goals.misc.merging">Better merging</a></li>
<li><a href="#goals.misc.conflicts">Conflicts resolution</a></li>
</ol>
</li> <!-- goals.misc -->
</ol>
</li> <!-- goals -->
<li><a href="#model">Model &mdash; The versioning model used by Subversion</a>
<ol>
<li><a href="#model.wc-and-repos">Working Directories and Repositories</a></li>
<li><a href="#model.txns-and-revnums">Transactions and Revision Numbers</a></li>
<li><a href="#model.how-wc">How Working Directories Track the Repository</a></li>
<li><a href="#model.lock-merge">Locking vs. Merging - Two Paradigms of Co-operative
Developments</a></li>
<li><a href="#model.props">Properties</a></li>
<li><a href="#model.merging-and-ancestry">Merging and Ancestry</a></li>
</ol>
</li> <!-- model -->
<li><a href="#archi">Architecture &mdash; How Subversion's components work together</a>
<ol>
<li><a href="#archi.client">Client Layer</a></li>
<li><a href="#archi.network">Network Layer</a></li>
<li><a href="#archi.fs">Filesystem Layer</a></li>
</ol>
</li> <!-- archi -->
<li><a href="#deltas">Deltas &mdash; How to describe changes</a>
<ol>
<li><a href="#deltas.text">Text Deltas</a></li>
<li><a href="#deltas.prop">Property Deltas</a></li>
<li><a href="#deltas.tree">Tree Deltas</a></li>
<li><a href="#deltas.postfix-text">Postfix Text Deltas</a></li>
<li><a href="#deltas.serializing-via-editor">Serializing Deltas via the "Editor" Interface</a></li>
</ol>
</li> <!-- deltas -->
<li><a href="#client">Client &mdash; How the client works</a>
<ol>
<li><a href="#client.wc">Working copies and the working copy library</a>
<ol>
<li><a href="#client.wc.layout">The layout of working copies</a></li>
<li><a href="#client.wc.library">The working copy management library</a></li>
</ol>
</li> <!-- client.wc -->
<li><a href="#client.libsvn_ra">The repository access library</a></li>
<li><a href="#client.libsvn_client">The client operation library</a></li>
</ol>
</li> <!-- client -->
<li><a href="#protocol">Protocol &mdash; How the client and server communicate</a>
<ol>
<li><a href="#protocol.webdav">The HTTP/WebDAV/DeltaV based protocol</a></li>
<li><a href="#protocol.svn">The custom protocol</a></li>
</ol>
</li> <!-- protocol -->
<li><a href="#server">Server &mdash; How the server works</a>
<ol>
<li><a href="#server.fs">Filesystem</a>
<ol>
<li><a href="#server.fs.overview">Filesystem Overview</a></li>
<li><a href="#server.fs.api">API</a></li>
<li><a href="#server.fs.struct">Repository Structure</a>
<ol>
<li><a href="#server.fs.struct.schema">Schema</a></li>
<li><a href="#server.fs.struct.bubble-up">Bubble-Up Method</a></li>
<li><a href="#server.fs.struct.diffy-storage">Diffy Storage</a></li>
</ol>
</li> <!-- server.fs.struct -->
<li><a href="#server.fs.implementation">Implementation</a></li>
</ol>
</li> <!-- server.fs -->
<li><a href="#server.libsvn_repos">Repository Library</a></li>
</ol>
</li> <!-- server -->
<li><a href="#license">License &mdash; Copyright</a></li>
</ol>
</div>
<!--
================================================================
Copyright (c) 1999-2004 CollabNet. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. The end-user documentation included with the redistribution, if
any, must include the following acknowledgment: "This product includes
software developed by CollabNet (http://www.Collab.Net/)."
Alternately, this acknowledgment may appear in the software itself, if
and wherever such third-party acknowledgments normally appear.

4. The hosted project names must not be used to endorse or promote
products derived from this software without prior written
permission. For written permission, please contact info@collab.net.

5. Products derived from this software may not use the "Tigris" name
nor may "Tigris" appear in their names without prior written
permission of CollabNet.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL COLLABNET OR ITS CONTRIBUTORS BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

====================================================================

This software consists of voluntary contributions made by many
individuals on behalf of CollabNet.
-->
<div class="h2" id="goals" title="#goals">
<h2>Goals &mdash; The goals of the Subversion project</h2>

<p>The goal of the Subversion project is to write a version control
system that takes over CVS's current and future user base. (If you're not
familiar with CVS or its shortcomings, then skip to <a href="#model">Model
&mdash; The versioning model used by Subversion</a>.) The first release
has all the major features of CVS, plus certain new features that CVS
users often wish they had. In general, Subversion works like CVS, except
where there's a compelling reason to be different.</p>
<p>So what does Subversion have that CVS doesn't?</p>

<ul>
<li><p>It versions directories, file metadata, renames, copies
and removals/resurrections. In other words, Subversion records the
changes users make to directory trees, not just changes to file
contents.</p></li>

<li><p>Tagging and branching are constant-time and
constant-space.</p></li>

<li><p>It is natively client-server, hence much more
maintainable than CVS. (In CVS, the client-server protocol was added
as an afterthought. This means that most new features have to be
implemented twice, or at least more than once: code for the local
case, and code for the client-server case.)</p></li>

<li><p>The repository is organized efficiently and
comprehensibly. (Without going into too much detail, let's just say
that CVS's repository structure is showing its
age.)</p></li>

<li><p>Commits are atomic. Each commit results in a single
revision number, which refers to the state of the entire tree. Files
no longer have their own revision numbers.</p></li>

<li><p>The locking scheme is only as strict as absolutely
necessary. Reads are never locked, and writes lock only the files
being written, for only as long as needed.</p></li>

<li><p>It has internationalization support.</p></li>

<li><p>It handles binary files gracefully (experience has shown
that CVS's binary file handling is prone to user
error).</p></li>

<li><p>It takes advantage of the Net's experience with CVS by
choosing better default behaviors for certain
situations.</p></li>
</ul>

<p>Some of these advantages are clear and require no further discussion.
Others are not so obvious, and are explained in greater detail
below.</p>
<div class="h3" id="goals.rename-remove-resurrect" title="#goals.rename-remove-resurrect">
<h3>Rename/removal/resurrection support</h3>

<p>Full rename support means you can trace through ancestry by name
<em>or</em> by entity. For example, if you say "Give me
revision 12 of foo.c", do you mean revision 12 of the file whose name is
<em>now</em> foo.c (but perhaps it was named bar.c back at
revision 12), or the file whose name was foo.c in revision 12 (perhaps
that file no longer exists, or has a different name now)? In Subversion,
both interpretations are available to the user.</p>
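<p>The two interpretations can be illustrated with a small Python sketch. It assumes a hypothetical history table that records, for each revision, the path held by each file entity (identified by an invented node id such as <tt class="literal">n1</tt>). This is an illustration only, not how the Subversion repository actually stores history.</p>

<pre>
```python
# Hypothetical history: for each revision, node id -> path at that revision.
# Node "n1" was foo.c at r12 and was later renamed to old_foo.c;
# node "n2" was bar.c at r12 and was later renamed to foo.c.
HISTORY = {
    12: {"n1": "foo.c", "n2": "bar.c"},
    20: {"n1": "old_foo.c", "n2": "foo.c"},
}
HEAD = 20


def node_at(rev, path):
    """Return the node id that had `path` at revision `rev`, or None."""
    for node, node_path in HISTORY[rev].items():
        if node_path == path:
            return node
    return None


def by_current_name(path, rev):
    """Revision `rev` of the entity whose name is `path` now (at HEAD)."""
    node = node_at(HEAD, path)
    if node is None:
        return None
    return node, HISTORY[rev].get(node)


def by_historical_name(path, rev):
    """The entity whose name was `path` back at revision `rev`."""
    node = node_at(rev, path)
    if node is None:
        return None
    return node, path


print(by_current_name("foo.c", 12))     # ('n2', 'bar.c')
print(by_historical_name("foo.c", 12))  # ('n1', 'foo.c')
```
</pre>

<p>The file now named foo.c was called bar.c at revision 12, while the file that was named foo.c at revision 12 has since been renamed. Each lookup answers a different question about the same path and revision.</p>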
<p>(Note: we've not yet implemented this, but it wouldn't be too hard.
People are advocating switches to &lsquo;<tt class="literal">svn log</tt>&rsquo;
that cause history to be traced backwards either by entity or by
path.)</p>
</div> <!-- goals.rename-remove-resurrect (h3) -->

<div class="h3" id="goals.textbinary" title="#goals.textbinary">
<h3>Text vs binary issues</h3>

<p>Historically, binary files have been problematic in CVS for two
unrelated reasons: keyword expansion and line-end conversion.</p>

<ul>
<li><p><strong class="firstterm">Keyword expansion</strong> is when CVS
expands "$Revision$" into "$Revision: 1.1 $", for example. There
are a number of keywords in CVS: "$Author: sussman $",
"$Date: 2001/06/04 22:00:52 $", and so on.</p></li>
<li><p><strong class="firstterm">Line-end conversion</strong> is when CVS
gives plain text files the appropriate line-ending conventions for the
working copy's platform. For example, Unix working copies use LF, but
Windows working copies use CRLF. (Like CVS, the Subversion
repository stores text files in Unix LF format.)</p></li>
</ul>
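<p>As a rough illustration of the two transformations, the Python sketch below expands a few CVS-style keywords and converts the repository's LF line endings to CRLF for a Windows working copy. The function names and the exact keyword syntax it accepts are assumptions made for this example. They are not Subversion's implementation.</p>

<pre>
```python
import re

# Matches an unexpanded keyword ("$Revision$") or an already expanded one
# ("$Revision: 1.1 $"), so expansion can be re-applied after each update.
KEYWORD_RE = re.compile(r"\$(Revision|Author|Date)(?::[^$\n]*)?\$")


def expand_keywords(text, values):
    """Rewrite each keyword as '$Name: value $' using the `values` mapping."""
    def replace(match):
        name = match.group(1)
        return "$%s: %s $" % (name, values[name])
    return KEYWORD_RE.sub(replace, text)


def to_working_copy_eol(text, eol):
    """Convert LF-only repository text to the working copy's line ending."""
    return text.replace("\n", eol)


repo_text = "/* $Revision$ */\nint x;\n"
wc_text = to_working_copy_eol(
    expand_keywords(repo_text, {"Revision": "1.1"}), "\r\n")
print(repr(wc_text))  # '/* $Revision: 1.1 $ */\r\nint x;\r\n'
```
</pre>

<p>Both functions rewrite the file's bytes. Applied to an image or other opaque format, they would corrupt it, which is why such translation has to be reserved for files known to be text.</p>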
<p>Both keyword substitution and line-end conversion are sensible only
for plain text files. CVS recognizes only two file types anyway:
plain text and binary. And CVS assumes files are plain text unless you
tell it otherwise.</p>

<p>Subversion recognizes the same two types. The question is, how does
it determine a file's type? Experience with CVS suggests that assuming
text unless told otherwise is a losing strategy &ndash; people frequently
forget to mark images and other opaque formats as binary, then later they
wonder why CVS mangled their data. So Subversion will not mangle data:
when moving over the network, or when being stored in the repository, it
treats all files as binary. In the working copy, a tweakable meta-data
property indicates whether to treat the file as text or binary, and this
determines whether contextual merging is allowed during updates.</p>

<p>Users can turn line-end conversion on or off per file by tweaking
meta-data. Files do <em>not</em> undergo keyword
substitution by default, on the theory that if someone wants substitution
and isn't getting it, they'll look in the manual; but if they are getting
it and didn't want it, they might just be confused and not know what to
do. Users can turn substitution on or off per file.</p>

<p>Both of these transformations happen on the client side; the
repository does not even know about them.</p>
</div> <!-- goals.textbinary (h3) -->

<div class="h3" id="goals.i18n" title="#goals.i18n">
<h3>I18N/Multilingual support</h3>

<p>Subversion is internationalized &ndash; commands, user messages, and
errors can be customized to the appropriate human language at build time
(or at run time, if that's not much harder).</p>

<p>File names and contents may be multilingual; Subversion does not
assume an ASCII-only universe. For purposes of keyword expansion and
line-end conversion, Subversion also understands the UTF-* encodings (but
not necessarily all of them by the first release).</p>
</div> <!-- goals.i18n (h3) -->

<div class="h3" id="goals.branching-and-tagging" title="#goals.branching-and-tagging">
<h3>Branching and tagging</h3>

<p>Subversion supports branching and tagging with one efficient
operation: &lsquo;clone&rsquo;. To clone a tree is to copy it, to create
another tree exactly like it (except that the new tree knows its ancestry
relationship to the old one).</p>

<p>At the moment of creation, a clone requires only a small, constant
amount of space in the repository &ndash; most of its storage is shared
with the original tree. If you never commit anything on the clone, then
it's just like a CVS tag. If you start committing on it, then it's a
branch. Voila! Because Subversion has real rename and directory support,
this also provides the equivalent of CVS's "vendor branching"
feature.</p>
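<p>The constant-space property comes from structural sharing. The Python sketch below assumes a hypothetical in-memory tree of immutable directory nodes. Cloning adds one directory entry that points at the existing subtree, and a later commit on the clone copies only the directories along the changed path. It illustrates the idea only and is not the repository's actual schema.</p>

<pre>
```python
class Dir:
    """Immutable directory node: maps entry names to Dir nodes or file bytes."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def with_entry(self, name, child):
        """Return a new Dir that differs from this one in a single entry."""
        new_entries = dict(self.entries)
        new_entries[name] = child
        return Dir(new_entries)


def clone(root, src, dst):
    """Clone `src` as `dst` by reference; the cloned subtree is not copied."""
    return root.with_entry(dst, root.entries[src])


def commit_file(root, top, filename, data):
    """Change one file under `top`, copying only the directories on its path."""
    new_top = root.entries[top].with_entry(filename, data)
    return root.with_entry(top, new_top)


root = Dir({"trunk": Dir({"main.c": b"v1", "util.c": b"u1"})})
root = clone(root, "trunk", "branch")
tagged = root.entries["branch"] is root.entries["trunk"]  # still shared: a tag

root = commit_file(root, "branch", "main.c", b"v2")       # now a branch
print(tagged)                                   # True
print(root.entries["trunk"].entries["main.c"])  # b'v1'
print(root.entries["branch"].entries["util.c"]
      is root.entries["trunk"].entries["util.c"])  # True
```
</pre>

<p>The clone costs one new directory entry however large the trunk is. The commit on the branch copies just two directory nodes (the branch and the root), and every untouched file stays shared with the original.</p>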
</div> <!-- goals.branching-and-tagging (h3) -->

<div class="h3" id="goals.misc" title="#goals.misc">
<h3>Miscellaneous new behaviors</h3>

<div class="h4" id="goals.misc.logmsgs" title="#goals.misc.logmsgs">
<h4>Log messages</h4>

<p>Subversion has a flexible log message policy (a small matter, but
one dear to our hearts).</p>

<p>Log messages should be a matter of project policy, not version
control software policy. If a user commits with no log message, then
Subversion defaults to an empty message. (CVS tries to require log
messages, but fails: we've all seen empty log messages in CVS, where
the user committed with deliberately empty quotes. Let's stop the
madness now.)</p>
</div> <!-- goals.misc.logmsgs (h4) -->

<div class="h4" id="goals.misc.diffplugins" title="#goals.misc.diffplugins">
<h4>Client side diff plug-ins</h4>

<p>Subversion supports client-side plug-in diff programs.</p>

<p>There is no need for Subversion to have every possible diff
mechanism built in. It can invoke a user-specified client-side diff
program on the two revisions of the file(s) locally.</p>

<p>(Note: This feature does not exist yet, but is planned for
post-1.0.)</p>
</div> <!-- goals.misc.diffplugins (h4) -->

<div class="h4" id="goals.misc.merging" title="#goals.misc.merging">
<h4>Better merging</h4>

<p>Subversion remembers what has already been merged in and what
hasn't, thereby avoiding the problem, familiar to CVS users, of
spurious conflicts on repeated merges.</p>

<p>(Note: This feature (<a href="/merge-tracking/">Merge
Tracking</a>) does not exist yet, but is planned for inclusion
in Subversion 1.5.)</p>

<p>For details, see <a href="#model.merging-and-ancestry">Merging and Ancestry</a>.</p>
</div> <!-- goals.misc.merging (h4) -->

<div class="h4" id="goals.misc.conflicts" title="#goals.misc.conflicts">
<h4>Conflicts resolution</h4>

<p>For text files, Subversion resolves conflicts similarly to CVS, by
folding repository changes into the working files with conflict
markers. But for <em>both</em> text and binary files,
Subversion also always puts the old and new pristine repository
revisions into temporary files, and the pristine working copy revision
into another temporary file.</p>

<p>Thus, for any conflict, the user has four files readily at
hand:</p>

<ol>
<li><p>the original working copy file with local
mods</p></li>
<li><p>the older repository file</p></li>
<li><p>the newest repository file</p></li>
<li><p>the merged file, with conflict
markers</p></li>
</ol>

<p>and in a binary file conflict, the user has all but the
last.</p>
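<p>The difference between text and binary conflicts can be summarized in a short Python sketch. The file-name suffixes it uses (<tt class="filename">.mine</tt>, <tt class="filename">.r4</tt>, and so on) are an illustrative naming scheme assumed for this example. This section does not specify how the temporary files are named.</p>

<pre>
```python
def conflict_files(name, old_rev, new_rev, is_text):
    """Map each file available after a conflicting update to its path.

    The suffixes are an illustrative naming scheme, not a specification.
    """
    files = {
        "working copy with local mods": name + ".mine",
        "older repository revision": "%s.r%d" % (name, old_rev),
        "newest repository revision": "%s.r%d" % (name, new_rev),
    }
    if is_text:
        # Only a text file gets a merged copy containing conflict markers.
        files["merged, with conflict markers"] = name
    return files


text_conflict = conflict_files("search.c", 4, 6, is_text=True)
binary_conflict = conflict_files("logo.png", 4, 6, is_text=False)
print(len(text_conflict), len(binary_conflict))  # 4 3
```
</pre>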
<p>When the conflict has been resolved and the working copy is
committed, Subversion automatically removes the temporary pristine
files.</p>

<p>A more general solution would allow plug-in merge resolution tools
on the client side, but this is not scheduled for the first release.
Note that users can use their own merge tools anyway, since all the
original files are available.</p>
</div> <!-- goals.misc.conflicts (h4) -->
</div> <!-- goals.misc (h3) -->
</div> <!-- goals (h2) -->

<div class="h2" id="model" title="#model">
<h2>Model &mdash; The versioning model used by Subversion</h2>

<p>This chapter explains the user's view of Subversion &mdash; what
&ldquo;objects&rdquo; you interact with, how they behave, and how they
relate to each other.</p>

<div class="h3" id="model.wc-and-repos" title="#model.wc-and-repos">
<h3>Working Directories and Repositories</h3>

<p>Suppose you are using Subversion to manage a software project. There
are two things you will interact with: your working directory, and the
repository.</p>

<p>Your <strong class="firstterm">working directory</strong> is an ordinary
directory tree, on your local system, containing your project's sources.
You can edit these files and compile your program from them in the usual
way. Your working directory is your own private work area: Subversion
never changes the files in your working directory, or publishes the
changes you make there, until you explicitly tell it to do so.</p>

<p>After you've made some changes to the files in your working
directory, and verified that they work properly, Subversion provides
commands to publish your changes to the other people working with you on
your project. If they publish their own changes, Subversion provides
commands to incorporate those changes into your working directory.</p>

<p>A working directory contains some extra files, created and maintained
by Subversion, to help it carry out these commands. In particular, these
files help Subversion recognize which files contain unpublished changes,
and which files are out-of-date with respect to others' work.</p>

<p>While your working directory is for your use alone, the
<strong class="firstterm">repository</strong> is the common public record you share
with everyone else working on the project. To publish your changes, you
use Subversion to put them in the repository. (What this means, exactly,
we explain below.) Once your changes are in the repository, others can
tell Subversion to incorporate your changes into their working
directories. In a collaborative environment like this, each user will
typically have their own working directory (or perhaps more than one),
and all the working directories will be backed by a single repository,
shared amongst all the users.</p>

<p>A Subversion repository holds a single directory tree, and records
the history of changes to that tree. The repository retains enough
information to recreate any prior state of the tree, compute the
differences between any two prior trees, and report the relations between
files in the tree &mdash; which files are derived from which other
files.</p>

<p>A Subversion repository can hold the source code for several
projects; usually, each project is a subdirectory in the tree. In this
arrangement, a working directory will usually correspond to a particular
subtree of the repository.</p>

<p>For example, suppose you have a repository laid out like this:</p>

<pre>
/trunk/paint/Makefile
             canvas.c
             brush.c
       write/Makefile
             document.c
             search.c
</pre>

<p>In other words, the repository's root directory has a single
subdirectory named <tt class="filename">trunk</tt>, which itself contains two
subdirectories: <tt class="filename">paint</tt> and
<tt class="filename">write</tt>.</p>

<p>To get a working directory, you must <strong class="firstterm">check out</strong>
some subtree of the repository. If you check out
<tt class="filename">/trunk/write</tt>, you will get a working directory like
this:</p>

<pre>
write/Makefile
      document.c
      search.c
      .svn/
</pre>

<p>This working directory is a copy of the repository's
<tt class="filename">/trunk/write</tt> directory, with one additional entry
&mdash; <tt class="filename">.svn</tt> &mdash; which holds the extra
information needed by Subversion, as mentioned above.</p>
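<p>Checking out a subtree amounts to selecting the repository paths under it and re-rooting them at the working directory. The Python sketch below does this for the example layout, modelling the repository as a flat list of paths. The function name and the flat-list representation are assumptions made for illustration only.</p>

<pre>
```python
REPOSITORY = [
    "/trunk/paint/Makefile",
    "/trunk/paint/canvas.c",
    "/trunk/paint/brush.c",
    "/trunk/write/Makefile",
    "/trunk/write/document.c",
    "/trunk/write/search.c",
]


def check_out(paths, subtree):
    """List the working-directory entries produced by checking out `subtree`."""
    prefix = subtree.rstrip("/") + "/"
    wc_name = prefix.rstrip("/").rsplit("/", 1)[-1]   # e.g. "write"
    files = sorted(wc_name + "/" + p[len(prefix):]
                   for p in paths if p.startswith(prefix))
    # Every working directory also gets Subversion's administrative area.
    return files + [wc_name + "/.svn/"]


print(check_out(REPOSITORY, "/trunk/write"))
# ['write/Makefile', 'write/document.c', 'write/search.c', 'write/.svn/']
```
</pre>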
<p>Suppose you make changes to <tt class="filename">search.c</tt>. Since the
<tt class="filename">.svn</tt> directory remembers the file's modification
date and original contents, Subversion can tell that you've changed the
file. However, Subversion does not make your changes public until you
explicitly tell it to.</p>
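<p>A minimal Python sketch of this check follows. It assumes the working copy has recorded the file's modification time and a copy of its original contents; the function and parameter names are hypothetical. The cheap timestamp comparison comes first, and the contents are compared only when the timestamp has moved.</p>

<pre>
```python
import os
import tempfile


def is_locally_modified(path, recorded_mtime, pristine):
    """Report whether `path` differs from the contents recorded at checkout."""
    if os.path.getmtime(path) == recorded_mtime:
        return False  # timestamp unchanged: assume the file was not edited
    with open(path, "rb") as f:
        return f.read() != pristine  # timestamp moved: compare contents


with tempfile.TemporaryDirectory() as wc:
    path = os.path.join(wc, "search.c")
    pristine = b"int search(void);\n"
    with open(path, "wb") as f:
        f.write(pristine)
    recorded = os.path.getmtime(path)  # what .svn would remember

    unchanged = is_locally_modified(path, recorded, pristine)

    with open(path, "wb") as f:
        f.write(b"int search(const char *key);\n")
    os.utime(path, (recorded + 10, recorded + 10))  # force a new timestamp
    changed = is_locally_modified(path, recorded, pristine)

print(unchanged, changed)  # False True
```
</pre>

<p>A file whose timestamp changed but whose contents did not (for example, one that was merely touched) still reports <tt class="literal">False</tt>, because the content comparison has the final word.</p>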
<p>To publish your changes, you can use Subversion's
&lsquo;<tt class="literal">commit</tt>&rsquo; command:</p>

<pre>
$ pwd
/home/jimb/write
$ ls -a
.svn/  Makefile  document.c  search.c
$ svn commit search.c
</pre>

<p>Now your changes to <tt class="filename">search.c</tt> have been committed
to the repository; if another user checks out a working copy of
<tt class="filename">/trunk/write</tt>, they will see your text.</p>

<p>Suppose you have a collaborator, Felix, who checked out a working
directory of <tt class="filename">/trunk/write</tt> at the same time you did.
When you commit your change to <tt class="filename">search.c</tt>, Felix's
working copy is left unchanged; Subversion only modifies working
directories at the user's request.</p>

<p>To bring his working directory up to date, Felix can use the
Subversion &lsquo;<tt class="literal">update</tt>&rsquo; command. This will
incorporate your changes into his working directory, as well as any
others that have been committed since he checked it out.</p>

<pre>
$ pwd
/home/felix/write
$ ls -a
.svn/  Makefile  document.c  search.c
$ svn update
U search.c
</pre>

<p>The output from the &lsquo;<tt class="literal">svn update</tt>&rsquo;
command indicates that Subversion updated the contents of
<tt class="filename">search.c</tt>. Note that Felix didn't need to specify
which files to update; Subversion uses the information in the
<tt class="filename">.svn</tt> directory, and further information in the
repository, to decide which files need to be brought up to date.</p>

<p>We explain below what happens when both you and Felix make changes to
the same file.</p>
</div> <!-- model.wc-and-repos (h3) -->

<div class="h3" id="model.txns-and-revnums" title="#model.txns-and-revnums">
<h3>Transactions and Revision Numbers</h3>

<p>A Subversion &lsquo;<tt class="literal">commit</tt>&rsquo; operation can
publish changes to any number of files and directories as a single atomic
transaction. In your working directory, you can change files' contents,
create, delete, rename and copy files and directories, and then commit
the completed set of changes as a unit.</p>

<p>In the repository, each commit is treated as an atomic transaction:
either all the commit's changes take place, or none of them take place.
Subversion tries to retain this atomicity in the face of program crashes,
system crashes, network problems, and other users' actions. We may call
a commit a <strong class="firstterm">transaction</strong> when we want to emphasize
its indivisible nature.</p>

<p>Each time the repository accepts a transaction, this creates a new
state of the tree, called a <strong class="firstterm">revision</strong>. Each
revision is assigned a unique natural number, one greater than the number
of the previous revision. The initial revision of a freshly created
repository is numbered zero, and consists of an empty root
directory.</p>

<p>Since each transaction creates a new revision, with its own number,
we can also use these numbers to refer to transactions; transaction
<em class="replaceable">n</em> is the transaction which created revision
<em class="replaceable">n</em>. There is no transaction numbered
zero.</p>
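<p>The all-or-nothing behavior and the numbering scheme can be sketched together in Python. This assumes a hypothetical in-memory repository in which each revision is a dictionary from paths to contents. A real repository must also survive crashes, which this sketch makes no attempt to handle.</p>

<pre>
```python
class Repository:
    """Revision 0 is an empty tree; each accepted commit adds one revision."""

    def __init__(self):
        self.revisions = [{}]  # revisions[n] maps path -> file contents

    def youngest(self):
        return len(self.revisions) - 1

    def commit(self, changes):
        """Apply every change (None means delete) or none of them.

        Returns the new revision number; transaction n creates revision n.
        """
        txn = dict(self.revisions[-1])  # private copy of the latest tree
        for path, contents in changes.items():
            if contents is None:
                if path not in txn:
                    raise KeyError("cannot delete missing path " + path)
                del txn[path]
            else:
                txn[path] = contents
        # Nothing becomes visible until the whole transaction is built.
        self.revisions.append(txn)
        return self.youngest()


repo = Repository()
r1 = repo.commit({"/trunk/write/search.c": "v1", "/trunk/write/Makefile": "all:"})
try:
    repo.commit({"/trunk/write/search.c": "v2", "/trunk/missing.c": None})
except KeyError:
    pass  # the half-built transaction is simply discarded
print(r1, repo.youngest())                         # 1 1
print(repo.revisions[1]["/trunk/write/search.c"])  # v1
```
</pre>

<p>The failed second commit had already changed <tt class="filename">search.c</tt> in its private copy, but because that copy is never appended, no revision 2 appears and revision 1 is untouched.</p>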
<p>Unlike those of many other systems, Subversion's revision numbers
apply to an entire tree, not individual files. Each revision number
selects an entire tree.</p>

<p>It's important to note that working directories do not always
correspond to any single revision in the repository; they may contain
files from several different revisions. For example, suppose you check
out a working directory from a repository whose most recent revision is
4:</p>

<pre>
write/Makefile:4
      document.c:4
      search.c:4
</pre>

<p>At the moment, this working directory corresponds exactly to revision
4 in the repository. However, suppose you make a change to
<tt class="filename">search.c</tt>, and commit that change. Assuming no other
commits have taken place, your commit will create revision 5 of the
repository, and your working directory will look like this:</p>

<pre>
write/Makefile:4
      document.c:4
      search.c:5
</pre>

<p>Suppose that, at this point, Felix commits a change to
<tt class="filename">document.c</tt>, creating revision 6. If you use
&lsquo;<tt class="literal">svn update</tt>&rsquo; to bring your working
directory up to date, then it will look like this:</p>

<pre>
write/Makefile:6
      document.c:6
      search.c:6
</pre>

<p>Felix's changes to <tt class="filename">document.c</tt> will appear in
your working copy of that file, and your change will still be present in
<tt class="filename">search.c</tt>. In this example, the text of
<tt class="filename">Makefile</tt> is identical in revisions 4, 5, and 6, but
Subversion will mark your working copy with revision 6 to indicate that
it is still current. So, after you do a clean update at the root of your
working directory, your working directory will generally correspond
exactly to some revision in the repository.</p>
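<p>The sequence above can be replayed with a small Python sketch in which a working copy is modelled, hypothetically, as a dictionary from file name to base revision. Only revision numbers are tracked here, not file contents.</p>

<pre>
```python
def commit_one(wc, filename, new_rev):
    """A commit bumps only the committed file's base revision."""
    updated = dict(wc)
    updated[filename] = new_rev
    return updated


def update_all(wc, head):
    """A clean update at the root marks every file with the head revision."""
    return {name: head for name in wc}


checked_out = {"Makefile": 4, "document.c": 4, "search.c": 4}
after_commit = commit_one(checked_out, "search.c", 5)
# Felix then commits document.c, creating revision 6, and you update:
after_update = update_all(after_commit, 6)

print(after_commit)  # {'Makefile': 4, 'document.c': 4, 'search.c': 5}
print(after_update)  # {'Makefile': 6, 'document.c': 6, 'search.c': 6}
```
</pre>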
</div> <!-- model.txns-and-revnums (h3) -->

<div class="h3" id="model.how-wc" title="#model.how-wc">
<h3>How Working Directories Track the Repository</h3>

<p>For each file in a working directory, Subversion records two
essential pieces of information:</p>

<ul>
<li><p>what revision of what repository file your working copy
is based on (this is called the file's <strong class="firstterm">base
revision</strong>), and</p></li>
<li><p>a timestamp recording when the local copy was last
updated.</p></li>
</ul>

<p>Given this information, by talking to the repository, Subversion can
tell which of the following four states a file is in:</p>

<ul>
<li><p><strong>Unchanged, and current</strong>.
The file is unchanged in the working directory, and no changes to that
file have been committed to the repository since its base
revision.</p></li>
<li><p><strong>Locally changed, and
current</strong>. The file has been changed in the working
directory, and no changes to that file have been committed to the
repository since its base revision. There are local changes that have
not been committed to the repository.</p></li>
<li><p><strong>Unchanged, and
out-of-date</strong>. The file has not been changed in
the working directory, but it has been changed in the repository. The
file should eventually be updated, to make it current with the
public revision.</p></li>
<li><p><strong>Locally changed, and
out-of-date</strong>. The file has been changed both in the
working directory and in the repository. The file should be updated;
Subversion will attempt to merge the public changes with the local
changes. If it can't complete the merge in a plausible
way automatically, Subversion leaves it to the user to resolve the
conflict.</p></li>
</ul>
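<p>The four states are the combinations of two yes/no questions. In the Python sketch below, the first answer comes from the working copy. For the second, the sketch assumes (hypothetically) that the repository can report the set of revisions in which the file changed.</p>

<pre>
```python
def is_out_of_date(base_rev, change_revs, head):
    """True if the file changed in any revision after base_rev, up to head."""
    return any(rev in change_revs for rev in range(base_rev + 1, head + 1))


def file_state(locally_changed, out_of_date):
    """Name the state from the two yes/no answers."""
    local = "Locally changed" if locally_changed else "Unchanged"
    remote = "out-of-date" if out_of_date else "current"
    return local + ", and " + remote


# search.c: the repository changed it in revisions 1 and 6.
print(file_state(True, is_out_of_date(4, {1, 6}, 6)))   # Locally changed, and out-of-date
print(file_state(False, is_out_of_date(6, {1, 6}, 6)))  # Unchanged, and current
```
</pre>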
</div> <!-- model.how-wc (h3) -->

<div class="h3" id="model.lock-merge" title="#model.lock-merge">
<h3>Locking vs. Merging - Two Paradigms of Co-operative
Developments</h3>

<p>By default, Subversion prefers the &ldquo;merging&rdquo; method of
handling simultaneous editing by multiple users. This means that
Subversion does not prevent two users from making changes to the same
file at the same time. For example, if both you and Felix have checked
out working directories of <tt class="filename">/trunk/write</tt>, Subversion
will allow both of you to change <tt class="filename">write/search.c</tt> in
your working directories. Then, the following sequence of events will
occur:</p>

<ul>
<li><p>Suppose Felix tries to commit his changes to
<tt class="filename">search.c</tt> first. His commit will succeed, and
his text will appear in the latest revision in the
repository.</p></li>
<li><p>When you attempt to commit your changes to
<tt class="filename">search.c</tt>, Subversion will reject your commit,
and tell you that you must update <tt class="filename">search.c</tt> before
you can commit it.</p></li>
<li><p>When you update <tt class="filename">search.c</tt>, Subversion
will try to merge Felix's changes from the repository with your local
changes. By default, Subversion merges as if it were applying a
patch: if your local changes do not overlap textually with Felix's,
then all is well; otherwise, Subversion leaves it to you to resolve
the overlapping changes. In either case, Subversion carefully
preserves a copy of the original pre-merge text.</p></li>
<li><p>Once you have verified that Felix's changes and your
changes have been merged correctly, you can commit the new revision
of <tt class="filename">search.c</tt>, which now contains everyone's
changes.</p></li>
</ul>
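<p>The &ldquo;merge as if applying a patch&rdquo; rule can be sketched as a line-by-line three-way merge in Python. To keep it short, this sketch assumes edits only replace lines in place, with no insertions or deletions. A real merge first aligns the three texts using a diff algorithm. Conflicting lines are returned as (mine, theirs) pairs rather than written out with conflict markers, and the input lists are never modified, so the pre-merge texts survive.</p>

<pre>
```python
def merge3(base, mine, theirs):
    """Three-way merge of equal-length lists of lines.

    Assumes edits only replace lines in place. Returns (merged, conflicted);
    a conflicting line appears as a (mine, theirs) tuple to be resolved.
    """
    if not len(base) == len(mine) == len(theirs):
        raise ValueError("sketch handles only in-place line replacements")
    merged, conflicted = [], False
    for b, m, t in zip(base, mine, theirs):
        if m == t or t == b:
            merged.append(m)       # identical, or only my side changed
        elif m == b:
            merged.append(t)       # only their side changed
        else:
            merged.append((m, t))  # both changed the same line differently
            conflicted = True
    return merged, conflicted


base = ["int a;", "int b;", "int c;"]
mine = ["int a = 1;", "int b;", "int c;"]
felix = ["int a;", "int b;", "int c = 3;"]
print(merge3(base, mine, felix))
# (['int a = 1;', 'int b;', 'int c = 3;'], False)
```
</pre>

<p>Because your edit and Felix's touch different lines, both survive without a conflict. Had Felix also changed the first line, that line would come back as a pair for you to resolve.</p>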
712 <p>Some version control systems provide &ldquo;locks&rdquo;, which
713 prevent others from changing a file once one person has begun working on
714 it. In our experience, merging is preferable to locks, because:</p>
716 <ul>
717 <li><p>changes usually do not conflict, so Subversion's behavior
718 does the right thing by default, while locking can interfere with
719 legitimate work;</p></li>
720 <li><p>locking can prevent conflicts within a file, but not
721 conflicts between files (say, between a C header file and another
722 file that includes it), so it doesn't really solve the problem; and
723 finally,</p></li>
724 <li><p>people often forget that they are holding locks,
725 resulting in unnecessary delays and friction.</p></li>
726 </ul>
728 <p>Of course, some kinds of files with rigid formats, like images or
729 executables, are simply not mergeable. To support this, Subversion
730 allows users to customize its merging behavior on a per-file basis.
731 Firstly, you can direct Subversion to refuse to merge changes to certain
732 files, and simply present you with the two original texts to choose from.
733 Secondly, in Subversion 1.2 and later, support for the
734 &ldquo;locking&rdquo; method of working is also available, and individual
735 files can be designated as requiring locking.</p>
737 <p>(In the future, you may be able to direct Subversion to merge using a
738 tool which respects the semantics of specific complex file
739 formats.)</p>
740 </div> <!-- model.lock-merge (h3) -->
742 <div class="h3" id="model.props" title="#model.props">
743 <h3>Properties</h3>
746 <p>Files generally have interesting attributes beyond their contents:
MIME types, executable permissions, EOL styles, and so on. Subversion
748 attempts to preserve these attributes, or at least record them, when
749 doing so would be meaningful. However, different operating systems
750 support very different sets of file attributes: Windows NT supports
751 access control lists, while Linux provides only the simpler traditional
752 Unix permission bits.</p>
754 <p>In order to interoperate well with clients on many different
755 operating systems, Subversion supports <strong class="firstterm">property
756 lists</strong>, a simple, general-purpose mechanism which clients
757 can use to store arbitrary out-of-band information about files.</p>
759 <p>A property list is a set of name / value pairs. A property name is
760 an arbitrary text string, expressed as a Unicode UTF-8 string,
761 canonically decomposed and ordered. A property value is an arbitrary
762 string of bytes. Property values may be of any size, but Subversion may
763 not handle very large property values efficiently. No two properties in
a given property list may have the same name. Although the word &lsquo;list&rsquo;
usually denotes an ordered sequence, there is no fixed order to the
properties in a property list; the term &lsquo;property list&rsquo; is
historical.</p>
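<p>As an illustration only, the property-list rules can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Subversion's C API: the class <tt class="literal">PropList</tt> and its methods are hypothetical, and &ldquo;canonically decomposed and ordered&rdquo; is read as Unicode normalization form NFD. Because names are normalized before use, two spellings of the same name collide; values must be byte strings; and names are exposed as a set, since a property list has no fixed order.</p>

```python
import unicodedata


class PropList:
    """Hypothetical model of a property list: unique names, byte-string values, no order."""

    def __init__(self):
        self._props = {}  # NFD-normalized name -> bytes

    @staticmethod
    def _key(name: str) -> str:
        # NFD: canonical decomposition, then canonical ordering of combining marks.
        return unicodedata.normalize("NFD", name)

    def put(self, name: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("property values are arbitrary byte strings")
        self._props[self._key(name)] = value  # overwrites: no two properties share a name

    def get(self, name: str):
        return self._props.get(self._key(name))

    def delete(self, name: str) -> None:
        self._props.pop(self._key(name), None)

    def names(self):
        return set(self._props)  # a set, not a list: there is no fixed order


props = PropList()
props.put("caf\u00e9", b"v1")   # precomposed e-acute
props.put("cafe\u0301", b"v2")  # "e" plus a combining acute accent: the same name after NFD
```

<p>After the two calls above the list holds a single property, whose value is the second one written.</p>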
769 <p>Each revision number, file, directory, and directory entry in the
Subversion repository has its own property list. Subversion puts these
771 property lists to several uses:</p>
773 <ul>
774 <li><p>Clients can use properties to store file attributes, as
775 described above.</p></li>
776 <li><p>The Subversion server uses properties to hold attributes
of its own, and allows clients to read and modify them. For example,
778 someday a hypothetical &lsquo;<tt class="literal">svn-acl</tt>&rsquo;
779 property might hold an access control list which the Subversion server
780 uses to regulate access to repository files.</p></li>
781 <li><p>Users can invent properties of their own, to store
782 arbitrary information for use by scripts, build environments, and so
on. Names of user properties should be URIs, to avoid conflicts
784 between organizations.</p></li>
785 </ul>
787 <p>Property lists are versioned, just like file contents. You can
788 change properties in your working directory, but those changes are not
789 visible in the repository until you commit your local changes. If you do
790 commit a change to a property value, other users will see your change
791 when they update their working directories.</p>
792 </div> <!-- model.props (h3) -->
794 <div class="h3" id="model.merging-and-ancestry" title="#model.merging-and-ancestry">
795 <h3>Merging and Ancestry</h3>
798 <p>[WARNING: this section was written in May 2000, at the very
799 beginning of the Subversion project. This functionality probably will
800 not exist in Subversion 1.0, but it's planned for post-1.0. The problem
801 should be reasonably solvable by recording merge data in
802 'properties'.]</p>
804 <p>Subversion defines merges the same way CVS does: to merge means to
805 take a set of previously committed changes and apply them, as a patch, to
806 a working copy. This change can then be committed, like any other
807 change. (In Subversion's case, the patch may include changes to
808 directory trees, not just file contents.)</p>
810 <p>As defined thus far, merging is equivalent to hand-editing the
811 working copy into the same state as would result from the patch
812 application. In fact, in CVS there <em>is</em> no difference
813 &ndash; it is equivalent to just editing the files, and there is no
814 record of which ancestors these particular changes came from.
815 Unfortunately, this leads to conflicts when users unintentionally merge
816 the same changes again. (Experienced CVS users avoid this problem by
817 using branch- and merge-point tags, but that involves a lot of unwieldy
818 bookkeeping.)</p>
820 <p>In Subversion, merges are remembered by recording <strong class="firstterm">ancestry
821 sets</strong>. A revision's ancestry set is the set of all changes
822 "accounted for" in that revision. By maintaining ancestry sets, and
823 consulting them when doing merges, Subversion can detect when it would
824 apply the same patch twice, and spare users much bookkeeping. Ancestry
825 sets are stored as properties.</p>
827 <p>In the examples below, bear in mind that revision numbers usually
828 refer to changes, rather than the full contents of that revision. For
829 example, "the change A:4" means "the delta that resulted in A:4", not
830 "the full contents of A:4".</p>
<p>The simplest ancestry sets are associated with linear histories. For
833 example, here's the history of a file A:</p>
<pre>

 _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |
| A:1 |-----&gt;| A:2 |-----&gt;| A:3 |-----&gt;| A:4 |-----&gt;| A:5 |
|_____|      |_____|      |_____|      |_____|      |_____|

</pre>
<p>The ancestry set of A:5 is:</p>
846 <pre>
848 { A:1, A:2, A:3, A:4, A:5 }
850 </pre>
852 <p>That is, it includes the change that brought A from nothing to A:1,
853 the change from A:1 to A:2, and so on to A:5. From now on, ranges like
854 this will be represented with a more compact notation:</p>
856 <pre>
858 { A:1-5 }
860 </pre>
862 <p>Now assume there's a branch B based, or "rooted", at A:2. (This
863 postulates an entirely different revision history, of course, and the
864 global revision numbers in the diagrams will change to reflect it.)
865 Here's what the project looks like with the branch:</p>
<pre>

 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |-----&gt;| A:9 |
|_____|      |_____|      |_____|      |_____|      |_____|      |_____|
               \
                \
                 \  _____        _____        _____
                  \|     |      |     |      |     |
                   | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |
                   |_____|      |_____|      |_____|

</pre>
882 <p>If we produce A:9 by merging the B branch back into the
883 trunk</p>
<pre>

 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |---.-&gt;| A:9 |
|_____|      |_____|      |_____|      |_____|      |_____|  /   |_____|
               \                                            /
                \                                          /
                 \  _____        _____        _____       /
                  \|     |      |     |      |     |     /
                   | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |--&gt;-'
                   |_____|      |_____|      |_____|

</pre>
<p>then what will A:9's ancestry set be?</p>
<pre>

{ A:1, A:2, A:4, A:6, A:8, A:9, B:3, B:5, B:7 }

</pre>
908 <p>or more compactly:</p>
910 <pre>
912 { A:1-9, B:3-7 }
914 </pre>
916 <p>(It's all right that each file's ranges seem to include non-changes;
917 this is just a notational convenience, and you can think of the
918 non-changes as either not being included, or being included but being
null deltas as far as that file is concerned.)</p>
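<p>The compact notation can be stated precisely. The following Python sketch assumes that, for each line of development, we know the ordered list of global revisions at which that line changed; the <tt class="literal">history</tt> argument is a hypothetical input, and <tt class="literal">compact</tt> is an illustrative name. A run of consecutive entries in a line's history collapses to <tt class="literal">first-last</tt>; a change on the same line that is missing from the set breaks the run. Revisions belonging to other lines never appear in a line's history, which is why they can sit inside a range as the non-changes described above.</p>

```python
def compact(ancestry, history):
    """Render an ancestry set in the compact notation.

    ancestry: set of (line, revision) pairs, e.g. {("A", 1), ("B", 3)}.
    history:  hypothetical map from line name to the ordered global
              revisions at which that line changed, e.g. {"B": [3, 5, 7]}.
    """
    parts = []
    for line in sorted(history):
        run = []  # current run of consecutive included changes on this line
        for rev in history[line] + [None]:  # None is a sentinel that flushes the last run
            if rev is not None and (line, rev) in ancestry:
                run.append(rev)
                continue
            if run:
                if len(run) == 1:
                    parts.append(f"{line}:{run[0]}")
                else:
                    parts.append(f"{line}:{run[0]}-{run[-1]}")
                run = []
    return "{ " + ", ".join(parts) + " }"


# The merge example above: A:9 accounts for every change on both lines.
history = {"A": [1, 2, 4, 6, 8, 9], "B": [3, 5, 7]}
everything = {("A", r) for r in history["A"]} | {("B", r) for r in history["B"]}
print(compact(everything, history))               # { A:1-9, B:3-7 }
print(compact(everything - {("B", 5)}, history))  # { A:1-9, B:3, B:7 }
```

<p>Leaving out B:5 splits the B range, because B:5 is a real change on that line rather than a non-change.</p>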
921 <p>All changes along the B line are accounted for (changes B:3-7), and
922 so are all changes along the A line, including both the merge and any
923 non-merge-related edits made before the commit.</p>
925 <p>Although this merge happened to include all the branch changes, that
926 needn't be the case. For example, the next time we merge the B
927 line</p>
<pre>

 _____     _____     _____     _____     _____      _____      _____
|     |   |     |   |     |   |     |   |     |    |     |    |     |
| A:1 |--&gt;| A:2 |--&gt;| A:4 |--&gt;| A:6 |--&gt;| A:8 |-.-&gt;| A:9 |-.-&gt;|A:11 |
|_____|   |_____|   |_____|   |_____|   |_____| |  |_____| |  |_____|
             \                                 /           |
              \                               /            |
               \  _____     _____     _____  /  _____      |
                \|     |   |     |   |     |/  |     |    /
                 | B:3 |--&gt;| B:5 |--&gt;| B:7 |--&gt;|B:10 |-&gt;-'
                 |_____|   |_____|   |_____|   |_____|

</pre>
944 <p>Subversion will know that A's ancestry set already contains B:3-7, so
945 only the difference between B:7 and B:10 will be applied. A's new
946 ancestry will be</p>
948 <pre>
950 { A:1-11, B:3-10 }
952 </pre>
954 <p>But why limit ourselves to contiguous ranges? An ancestry set is
955 truly a set &ndash; it can be any subset of the changes available:</p>
<pre>

 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |--.--&gt;|A:10 |
|_____|      |_____|      |_____|      |_____|      |_____| /    |_____|
                |                                          /
                |                ______________________.__/
                |               /                      |
                |              /                       |
                 \          __/_                      _|__
                  \        {    }                    {    }
                   \  _____        _____        _____        _____
                    \|     |      |     |      |     |      |     |
                     | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |-----&gt;| B:9 |-----&gt;
                     |_____|      |_____|      |_____|      |_____|

</pre>
<p>In this diagram, the change from B:3 to B:5 and the change from B:7 to B:9 are
977 merged into a working copy whose ancestry set (so far) is
978 {&nbsp;A:1-8&nbsp;} plus any local changes. After committing, A:10's
979 ancestry set is</p>
981 <pre>
983 { A:1-10, B:5, B:9 }
985 </pre>
987 <p>Clearly, saying "Let's merge branch B into A" is a little ambiguous.
988 It usually means "Merge all the changes accounted for in B's tip into A",
989 but it <em>might</em> mean "Merge the single change that
990 resulted in B's tip into A".</p>
992 <p>Any merge, when viewed in detail, is an application of a particular
993 set of changes &ndash; not necessarily adjacent ones &ndash; to a working
994 copy. The user-level interface may allow some of these changes to be
995 specified implicitly. For example, many merges involve a single,
996 contiguous range of changes, with one or both ends of the range easily
deducible from context (for example, branch root to branch tip). These
998 inference rules are not specified here, but it should be clear in most
999 contexts how they work.</p>
1001 <p>Because each node knows its ancestors, Subversion never merges the
1002 same change twice (unless you force it to). For example, if after the
1003 above merge, you tell Subversion to merge all B changes into A,
1004 Subversion will notice that two of them have already been merged, and so
1005 merge only the other two changes, resulting in a final ancestry set
1006 of:</p>
1008 <pre>
1010 { A:1-10, B:3-9 }
1012 </pre>
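<p>Seen this way, the bookkeeping reduces to set arithmetic. The Python sketch below assumes an ancestry set is a set of <tt class="literal">(line, revision)</tt> pairs, where <tt class="literal">("B", 5)</tt> stands for &ldquo;the change that resulted in B:5&rdquo;; the function names are hypothetical and are not part of any Subversion interface. It computes only which changes still need to be applied and the working copy's ancestry afterwards; committing the result would then add the commit's own change as well.</p>

```python
def changes(line: str, *revisions: int) -> set:
    """Build a set of changes on one line, e.g. changes("B", 3, 5)."""
    return {(line, r) for r in revisions}


def plan_merge(ancestry: set, requested: set):
    """Return (changes_to_apply, ancestry_after_merge) for a working copy."""
    to_apply = requested - ancestry  # never apply a change that is already accounted for
    return to_apply, ancestry | to_apply


# The last example: A:10 accounts for A:1-10, B:5 and B:9; now merge "all of B".
a10 = changes("A", 1, 2, 4, 6, 8, 10) | changes("B", 5, 9)
to_apply, after = plan_merge(a10, changes("B", 3, 5, 7, 9))
print(sorted(to_apply))  # [('B', 3), ('B', 7)]
```

<p>The same function covers the earlier contiguous case: starting from A:9's set and requesting B:3 through B:10, only B:10 remains to be applied.</p>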
1014 <!--
1015 Heh, what about this:
1017 B:3 adds line 3, with the text "foo".
1018 B:5 deletes line 3.
1019 B:7 adds line 3, with the text "foo".
1020 B:9 deletes line 3.
1022 The user first merges B:5 and B:9 into A. If A had that line, it goes away
1023 now, nothing more.
1025 Next, user merges B:3 and B:7 into A. The second merge must conflict.
1027 I'm not sure we need to care about this, I just thought I'd note how even
merges that seem like they ought to be easily composable can still suck. :-)
-->
1031 <p>This description of merging and ancestry applies to both intra- and
1032 inter-repository merges. However, inter-repository merging will probably
1033 not be implemented until a future release of Subversion.</p>
1034 </div> <!-- model.merging-and-ancestry (h3) -->
1035 </div> <!-- model (h2) -->
1037 <div class="h2" id="archi" title="#archi">
1038 <h2>Architecture &mdash; How Subversion's components work together</h2>
1042 <p>Subversion is conceptually divided into a number of separable
1043 layers.</p>
1045 <p>Assuming that the programmatic interface of each layer is
1046 well-defined, it is easy to customize the different parts of the system.
1047 Contributors can write new client apps, new network protocols, new server
1048 processes, new server features, and new storage back-ends.</p>
1050 <p>The following diagram illustrates the "layered" architecture, and
1051 where each particular interface lies.</p>
<pre>
                    +--------------------+
                    | commandline or GUI |
                    |    client app      |
         +----------+--------------------+----------+ &lt;=== Client interface
         |              Client Library              |
         |                                          |
         |        +----+                            |
         |        |    |                            |
 +-------+--------+    +--------------+--+----------+ &lt;=== Network interface
 | Working Copy   |    |    Remote    |  |  Local   |
 | Management lib |    | Repos Access |  |  Repos   |
 +----------------+    +--------------+  |  Access  |
                       |     neon     |  |          |
                       +--------------+  |          |
                          ^              |          |
                         /               |          |
                    DAV /                |          |
                       /                 |          |
                      v                  |          |
              +---------+                |          |
              |         |                |          |
              | Apache  |                |          |
              |         |                |          |
              +---------+                |          |
              | mod_DAV |                |          |
            +-------------+              |          |
            | mod_DAV_SVN |              |          |
 +----------+-------------+--------------+----------+ &lt;=== Filesystem interface
 |                                                  |
 |               Subversion Filesystem              |
 |                                                  |
 +--------------------------------------------------+

</pre>
1090 <div class="h3" id="archi.client" title="#archi.client">
1091 <h3>Client Layer</h3>
1094 <p>The Subversion client, which may be either
1095 command-line or GUI, draws on three libraries.</p>
1097 <p>The working copy library, <tt class="filename">libsvn_wc</tt>, provides
1098 an API for managing the client's working copy of a project. This
1099 includes operations like renaming or removal of files, patching files,
1100 extracting local diffs, and routines for maintaining administrative
1101 files in the <tt class="filename">.svn/</tt> directory.</p>
<p>The repository access library, <tt class="filename">libsvn_ra</tt>,
1104 provides an API for exchanging information with a Subversion
1105 repository. This includes the ability to read files, write new
1106 revisions of files, and ask the repository to compare a working copy
1107 against its latest revision. Note that there are two implementations
1108 of this interface: one designed to talk to a repository over a network,
1109 and one designed to work with a repository on local disk. Any number
1110 of interface implementations can exist.</p>
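<p>The &ldquo;one interface, several implementations&rdquo; arrangement can be sketched with a Python abstract base class. Everything here is hypothetical and far smaller than the real <tt class="filename">libsvn_ra</tt> interface: the names <tt class="literal">RepositoryAccess</tt>, <tt class="literal">LocalAccess</tt>, <tt class="literal">latest_revision</tt>, <tt class="literal">get_file</tt> and <tt class="literal">cat_latest</tt> are invented for illustration. Only a local, in-memory implementation is shown; a network implementation would subclass the same base, and client code written against the base would not change.</p>

```python
from abc import ABC, abstractmethod


class RepositoryAccess(ABC):
    """Hypothetical repository-access interface; client code depends only on this."""

    @abstractmethod
    def latest_revision(self) -> int: ...

    @abstractmethod
    def get_file(self, path: str, revision: int) -> bytes: ...


class LocalAccess(RepositoryAccess):
    """Illustrative stand-in for a repository on local disk, kept in memory."""

    def __init__(self, revisions):
        self._revisions = revisions  # list of {path: contents}; index = revision number

    def latest_revision(self) -> int:
        return len(self._revisions) - 1

    def get_file(self, path: str, revision: int) -> bytes:
        return self._revisions[revision][path]


def cat_latest(ra: RepositoryAccess, path: str) -> bytes:
    # Works with any implementation of the interface, local or remote.
    return ra.get_file(path, ra.latest_revision())


ra = LocalAccess([{}, {"README": b"hello"}])
print(cat_latest(ra, "README"))  # b'hello'
```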
<p>The client library, <tt class="filename">libsvn_client</tt>, provides
1113 general client functions such as <tt class="literal">update()</tt> and
1114 <tt class="literal">commit()</tt>, which may involve one or both of the other
1115 two client libraries. <tt class="filename">libsvn_client</tt> should, in
1116 theory, provide an API that allows anyone to write a Subversion client
1117 application.</p>
1119 <p>For details, see <a href="#client">Client &mdash; How the client works</a>.</p>
1120 </div> <!-- archi.client (h3) -->
1122 <div class="h3" id="archi.network" title="#archi.network">
1123 <h3>Network Layer</h3>
<p>The network layer's job is to move the repository API requests
1127 over a wire.</p>
1129 <p>On the client side, a network library
1130 (<tt class="filename">libneon</tt>) translates these requests into a set of
1131 HTTP WebDAV/DeltaV requests. The information is sent over TCP/IP to an
1132 Apache server. Apache is used for the following reasons:</p>
1134 <ul>
1135 <li><p>it is time-tested and extremely
1136 stable;</p></li>
1137 <li><p>it has built-in load-balancing;</p></li>
1138 <li><p>it has built-in proxy and firewall
1139 support;</p></li>
1140 <li><p>it has authentication and encryption
1141 features;</p></li>
1142 <li><p>it allows client-side caching;</p></li>
<li><p>it has an extensible module system.</p></li>
1144 </ul>
1146 <p>Our rationale is that any attempt to write a dedicated "Subversion
1147 server" (with a "Subversion protocol") would inevitably end up evolving
1148 towards Apache's already-existing feature set. (However, Subversion's
1149 layered architecture certainly doesn't <em>prevent</em>
1150 anyone from writing a totally new network access
1151 implementation.)</p>
1153 <p>An Apache module (<tt class="filename">mod_dav_svn</tt>) translates the
1154 DAV requests into API calls against a particular repository.</p>
1156 <p>For details, see <a href="#protocol">Protocol &mdash; How the client and server communicate</a>.</p>
1157 </div> <!-- archi.network (h3) -->
1159 <div class="h3" id="archi.fs" title="#archi.fs">
1160 <h3>Filesystem Layer</h3>
1163 <p>When the requests reach a particular repository, they are
1164 interpreted by the <strong class="firstterm">Subversion Filesystem
1165 library</strong>, <tt class="filename">libsvn_fs</tt>. The Subversion
1166 Filesystem is a custom Unix-like filesystem, with a twist: writes are
1167 revisioned and atomic, and no data is ever deleted! This filesystem is
1168 currently implemented on top of a normal filesystem, using Berkeley DB
1169 files.</p>
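<p>To make &ldquo;revisioned and atomic, and no data is ever deleted&rdquo; concrete, here is a toy in-memory model in Python. It is a sketch of the idea only: the class <tt class="literal">ToyFS</tt> and its methods are hypothetical and say nothing about how <tt class="filename">libsvn_fs</tt> or Berkeley DB actually lay out data. A commit builds the next tree on a private copy and publishes it in a single step, so a failed commit leaves the repository unchanged, and removing a path affects only the new revision.</p>

```python
class ToyFS:
    """Hypothetical append-only store: each commit creates a new immutable revision."""

    def __init__(self):
        self._revisions = [{}]  # revision 0 is the empty tree

    @property
    def youngest(self) -> int:
        return len(self._revisions) - 1

    def read(self, path, revision=None):
        rev = self.youngest if revision is None else revision
        return self._revisions[rev].get(path)

    def commit(self, changes):
        """Apply {path: bytes or None} as one new revision; None removes the path."""
        tree = dict(self._revisions[-1])  # work on a copy; published trees never change
        for path, contents in changes.items():
            if contents is None:
                tree.pop(path, None)  # absent from the new revision, still in older ones
            elif isinstance(contents, bytes):
                tree[path] = contents
            else:
                # Failing here leaves the repository untouched: the copy is discarded.
                raise TypeError(f"contents of {path!r} must be bytes")
        self._revisions.append(tree)  # the single step that publishes the revision
        return self.youngest


fs = ToyFS()
r1 = fs.commit({"README": b"hello"})
r2 = fs.commit({"README": None})  # remove README in revision 2
print(fs.read("README"), fs.read("README", r1))  # None b'hello'
```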
1171 <p>For a more detailed explanation: see <a href="#server">Server &mdash; How the server works</a>.</p>
1172 </div> <!-- archi.fs (h3) -->
1173 </div> <!-- archi (h2) -->
1175 <div class="h2" id="deltas" title="#deltas">
1176 <h2>Deltas &mdash; How to describe changes</h2>
1180 <p>Subversion uses three kinds of deltas:</p>
1182 <ul>
<li><p>A <strong class="firstterm">tree
delta</strong> describes the difference between two
1186 arbitrary directory trees, the way a traditional patch describes the
1187 difference between two files. For example, the delta between
1188 directories A and B could be applied to A, to produce B.</p>
1190 <p>Tree deltas can also carry ancestry information, indicating how
1191 the files in one tree are related to files in the other tree. And
1192 deltas can describe changes to file meta-information, like permission
1193 bits, creation dates, and so on. The repository and working copy use
1194 deltas to communicate changes.</p></li>
<li><p>A <strong class="firstterm">text
delta</strong> describes changes to a string of
1198 bytes, such as the contents of a file. It is analogous to
1199 traditional patch format, except that it works equally well on binary
1200 and text files, and is not invertible (because context and deleted
1201 data are not recorded).</p></li>
<li><p>A <strong class="firstterm">property
delta</strong> describes changes to a list of named
1205 properties (see <a href="#model.props">Properties</a>).</p></li>
1206 </ul>
1208 <p>The term <strong class="firstterm">delta</strong> without qualification generally
1209 means a tree delta, unless some other meaning is clear from
1210 context.</p>
1212 <p>In the examples below, deltas will be described in XML, which happens
1213 to be Subversion's (now mostly defunct) import/export patch format.
1214 However, note that deltas are an abstract data structure, of which the
1215 XML format is merely one representation. Later, we will describe other
1216 representations: for example, there is a serialized representation
1217 (useful for streaming protocols, among other things), and a db-style
1218 representation, used for repository storage. The various representations
1219 of a given delta are (in theory, anyway) perfectly isomorphic to one
1220 another, since they describe the same underlying structure.</p>
1223 <div class="h3" id="deltas.text" title="#deltas.text">
1224 <h3>Text Deltas</h3>
1227 <p>A text delta describes the difference between two strings of bytes,
1228 the <strong class="firstterm">source</strong> string and the
1229 <strong class="firstterm">target</strong> string. Given a source string and a target
1230 string, we can compute a text delta; given a source string and a delta,
1231 we can reconstruct the target string. However, note that deltas are not
1232 invertible: you cannot always reconstruct the source string given the
1233 target string and delta.</p>
1235 <p>The standard Unix &ldquo;diff&rdquo; format is one possible
1236 representation for text deltas; however, diffs are not ideal for internal
1237 use by a revision control system, for several reasons:</p>
1239 <ul>
1240 <li><p>Diffs are line-oriented, which makes them human-readable,
1241 but sometimes makes them perform poorly on binary
1242 files.</p></li>
1243 <li><p>Diffs represent a series of replacements, exchanging
selected ranges of the old text with new text; again, this is easy for
humans to read, but it is more expensive to compute and less compact
1246 than some alternatives.</p></li>
1247 </ul>
1249 <p>Instead, Subversion uses the VDelta binary-diffing algorithm, as
1250 described in <em class="citetitle">Hunt, J. J., Vo, K.-P., and Tichy, W. F. An
1251 empirical study of delta algorithms. Lecture Notes in Computer Science
1252 1167 (July 1996), 49-66.</em> Currently, the output of this
1253 algorithm is stored in a custom data format called
<strong class="firstterm">svndiff</strong>, invented by Greg Hudson, a
Subversion developer.</p>
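<p>In svndiff, as we understand the format, the target is described by instructions that copy bytes from the source, copy bytes from the part of the target already produced, or insert new data carried in the delta itself. The Python sketch below applies deltas written in that style; the tuple encoding and the name <tt class="literal">apply_delta</tt> are assumptions made for readability, this is not the svndiff byte format, and computing a good delta (the VDelta part) is not shown. The last line illustrates why deltas are not invertible: source bytes that are never copied leave no trace, so different sources can yield the same target under the same delta.</p>

```python
def apply_delta(source: bytes, ops) -> bytes:
    """Rebuild the target from a source and a list of hypothetical instructions.

      ("source", offset, length)  copy bytes from the source string
      ("target", offset, length)  copy bytes already produced in the target,
                                  one at a time, so overlapping copies repeat data
      ("new", data)               append literal bytes carried in the delta
    """
    target = bytearray()
    for op in ops:
        kind = op[0]
        if kind == "source":
            _, offset, length = op
            target += source[offset:offset + length]
        elif kind == "target":
            _, offset, length = op
            for i in range(length):
                target.append(target[offset + i])
        elif kind == "new":
            target += op[1]
        else:
            raise ValueError(f"unknown instruction {kind!r}")
    return bytes(target)


delta = [("source", 0, 6), ("new", b"ab"), ("target", 6, 4)]
print(apply_delta(b"hello world", delta))  # b'hello ababab'
# Same delta, different sources, same target: the source cannot be recovered.
print(apply_delta(b"hello world", [("source", 0, 5)]) == apply_delta(b"hello there", [("source", 0, 5)]))  # True
```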
1257 <p>The concrete form of a text delta is a well-formed XML element,
1258 having the following form:</p>
1260 <pre>
1261 &lt;text-delta&gt;<em class="replaceable">data</em>&lt;/text-delta&gt;
1262 </pre>
1264 <p>Here, <em class="replaceable">data</em> is the raw svndiff data,
1265 encoded in the MIME Base64 format.</p>
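<p>Producing and reading this element needs nothing more than Base64. Below is a minimal Python sketch using the standard <tt class="literal">base64</tt> module; the function names are hypothetical, and the input bytes in the example are arbitrary placeholders rather than real svndiff output. Note that <tt class="literal">b64encode</tt> emits one unbroken line; <tt class="literal">base64.encodebytes</tt> would add MIME's 76-character line breaks, and <tt class="literal">b64decode</tt> discards such line breaks when decoding.</p>

```python
import base64

OPEN, CLOSE = "<text-delta>", "</text-delta>"


def text_delta_element(svndiff: bytes) -> str:
    # Base64 uses only letters, digits, "+", "/" and "=", so no XML escaping is needed.
    return OPEN + base64.b64encode(svndiff).decode("ascii") + CLOSE


def text_delta_bytes(element: str) -> bytes:
    if not (element.startswith(OPEN) and element.endswith(CLOSE)):
        raise ValueError("not a <text-delta> element")
    return base64.b64decode(element[len(OPEN):-len(CLOSE)])


print(text_delta_element(b"abc"))  # <text-delta>YWJj</text-delta>
```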
1266 </div> <!-- deltas.text (h3) -->
1268 <div class="h3" id="deltas.prop" title="#deltas.prop">
1269 <h3>Property Deltas</h3>
1272 <p>A property delta describes changes to a property list, of the sort
associated with files, directories, directory entries, and revision
1274 numbers (see <a href="#model.props">Properties</a>). A property delta can record
1275 creating, deleting, and changing the text of any number of
1276 properties.</p>
1278 <p>A property delta is an unordered set of name/change pairs. No two
1279 pairs within a given property delta have the same name. A pair's name
1280 indicates the property affected, and the change indicates what happens to
1281 its value. There are two kinds of changes:</p>
1283 <dl>
1284 <dt>set <em class="replaceable">value</em></dt>
1285 <dd><p>Change the value of the named property to the byte
1286 string <em class="replaceable">value</em>. If there is no property
1287 with the given name, one is added to the property
1288 list.</p></dd>
1290 <dt>delete</dt>
1291 <dd><p>Remove the named property from the property
1292 list.</p></dd>
1294 </dl>
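<p>Applying a property delta to a property list is a small operation on a mapping. In this Python sketch both are plain dictionaries, and a <tt class="literal">delete</tt> change is encoded as <tt class="literal">None</tt>; that encoding and the function name are assumptions for illustration. Using a dictionary for the delta also enforces the rule that no two pairs share a name. The sketch silently ignores a <tt class="literal">delete</tt> of a property that does not exist, a case the definitions above leave open.</p>

```python
def apply_prop_delta(props: dict, delta: dict) -> dict:
    """Return a new property list: props is {name: bytes}, delta is {name: bytes or None}."""
    result = dict(props)  # leave the caller's property list untouched
    for name, change in delta.items():
        if change is None:
            result.pop(name, None)  # delete (a missing name is ignored here)
        else:
            result[name] = change  # set: adds the property if it was absent
    return result


before = {"svn:mime-type": b"text/plain", "owner": b"felix"}
after = apply_prop_delta(before, {"owner": None, "svn:mime-type": b"text/html"})
print(after)  # {'svn:mime-type': b'text/html'}
```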
1296 <p>At the moment, the <tt class="literal">set</tt> command can either create
1297 or change a property value. However, this simplification means that the
1298 server cannot distinguish between a client which believes it is creating
1299 a value afresh, and a client which believes it is changing the value of
1300 an existing property. It may simplify conflict detection to divide
1301 <tt class="literal">set</tt> into two separate <tt class="literal">add</tt> and
1302 <tt class="literal">change</tt> operations.</p>
1304 <p>In the future, we may add a <tt class="literal">text-delta</tt> change,
1305 which specifies a change to an existing property's value as a text delta.
1306 This would give us a compact way to describe small changes to large
1307 property values.</p>
1309 <p>The concrete form of a property delta is a well-formed XML element,
1310 having the following form:</p>
1312 <pre>
1313 &lt;property-delta&gt;<em class="replaceable">change</em>&hellip;&lt;/property-delta&gt;
1314 </pre>
1316 <p>Each <em class="replaceable">change</em> in a property delta has one of
1317 the following forms:</p>
1319 <pre>
1320 &lt;set name='<em class="replaceable">name</em>'&gt;<em class="replaceable">value</em>&lt;/set&gt;
1321 &lt;delete name='<em class="replaceable">name</em>'/&gt;
1322 </pre>
1324 <p>The <em class="replaceable">name</em> attribute of a
1325 <tt class="literal">set</tt> or <tt class="literal">delete</tt> element gives the
1326 name of the property to change. The <em class="replaceable">value</em> of
1327 a <tt class="literal">set</tt> element gives the new value of the
1328 property.</p>
1330 <p>If either the property name or the property value contains the
1331 characters &lsquo;<tt class="literal">&amp;</tt>&rsquo;,
1332 &lsquo;<tt class="literal">&lt;</tt>&rsquo;, or
1333 &lsquo;<tt class="literal">'</tt>&rsquo;, they should be replaced with the
sequences &lsquo;<tt class="literal">&amp;#38;</tt>&rsquo;,
&lsquo;<tt class="literal">&amp;#60;</tt>&rsquo;, and
&lsquo;<tt class="literal">&amp;#39;</tt>&rsquo;, respectively.</p>
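<p>Serializing a property delta in this form, including the three substitutions, might look like the following Python sketch. It assumes names and values are already text; real property values are arbitrary bytes, which this XML form would need some further encoding to carry. The helper names and the use of <tt class="literal">None</tt> for <tt class="literal">delete</tt> are hypothetical. The ampersand must be replaced first, otherwise the ampersands introduced by the other two substitutions would be escaped a second time.</p>

```python
def escape(text: str) -> str:
    # Replace "&" first so the "&" in "&#60;" and "&#39;" is not escaped again.
    return text.replace("&", "&#38;").replace("<", "&#60;").replace("'", "&#39;")


def property_delta_xml(delta: dict) -> str:
    """delta maps name -> new value (set) or None (delete)."""
    parts = []
    for name in sorted(delta):  # deltas are unordered; sorting only makes output stable
        value = delta[name]
        if value is None:
            parts.append(f"<delete name='{escape(name)}'/>")
        else:
            parts.append(f"<set name='{escape(name)}'>{escape(value)}</set>")
    return "<property-delta>" + "".join(parts) + "</property-delta>"


print(property_delta_xml({"old": None, "note": "Felix's <b> & co"}))
# <property-delta><set name='note'>Felix&#39;s &#60;b> &#38; co</set><delete name='old'/></property-delta>
```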
1337 </div> <!-- deltas.prop (h3) -->
1339 <div class="h3" id="deltas.tree" title="#deltas.tree">
1340 <h3>Tree Deltas</h3>
1343 <p>A tree delta describes changes between two directory trees, the
1344 <strong class="firstterm">source tree</strong> and the <strong class="firstterm">target
1345 tree</strong>. Tree deltas can describe copies, renames, and
1346 deletions of files and directories, changes to file contents, and changes
1347 to property lists. A tree delta can also carry information about how the
1348 files in the target tree are derived from the files in the source tree,
1349 if this information is available.</p>
1351 <p>The format for tree deltas described here is easy to compute from a
1352 Subversion working directory, and easy to apply to a Subversion
1353 repository. Furthermore, the size of a tree delta in this format is
1354 independent of the commands used to produce the target tree &mdash; it
1355 depends only on the degree of difference between the source and target
1356 trees.</p>
<p>A tree delta is interpreted in the context of four
1359 parameters:</p>
1361 <ul>
1362 <li><p><em class="replaceable">source-root</em>, the name of the
1363 directory to which this complete tree delta applies,</p></li>
1364 <li><p><em class="replaceable">revision</em>, indicating a
1365 particular revision of &hellip;</p></li>
1366 <li><p><em class="replaceable">source-dir</em>, which is a
1367 directory in the source tree that we are currently modifying to yield
1368 &hellip;</p></li>
<li><p>&hellip; <em class="replaceable">target-dir</em> &mdash; the
1370 directory we're constructing.</p></li>
1371 </ul>
1373 <p>When we start interpreting a tree delta,
1374 <em class="replaceable">source-root</em>,
1375 <em class="replaceable">source-dir</em>, and
1376 <em class="replaceable">target-dir</em> are all equal. As we walk the tree
1377 delta, <em class="replaceable">target-dir</em> walks the tree we are
1378 constructing, and <em class="replaceable">source-dir</em> walks the
1379 corresponding portion of the source tree, which we use as the original.
1380 <em class="replaceable">Source-root</em> remains constant as we walk the
1381 delta; we may use it to choose new source trees.</p>
1383 <p>A tree delta is a list of changes of the form</p>
1385 <pre>
1386 &lt;tree-delta&gt;<em class="replaceable">change</em>&hellip;&lt;/tree-delta&gt;
1387 </pre>
1389 <p>which describe how to edit the contents of
1390 <em class="replaceable">source-dir</em> to yield
1391 <em class="replaceable">target-dir</em>. There are three kinds of
1392 changes:</p>
1394 <dl>
1396 <dt>&lt;delete
1397 name='<em class="replaceable">name</em>'/&gt;</dt>
1398 <dd><p><em class="replaceable">Source-dir</em> has an entry
1399 named <em class="replaceable">name</em>, which is not present
1400 in <em class="replaceable">target-dir</em>.</p></dd>
1403 <dt>&lt;add
1404 name='<em class="replaceable">name</em>'&gt;<em class="replaceable">content</em>&lt;/add&gt;</dt>
<dd><p><em class="replaceable">Target-dir</em> has an entry
1406 named <em class="replaceable">name</em>, which is not present
1407 in <em class="replaceable">source-dir</em>;
1408 <em class="replaceable">content</em> describes the file or directory
1409 to which the new directory entry refers.</p></dd>
1412 <dt>&lt;open
1413 name='<em class="replaceable">name</em>'&gt;<em class="replaceable">content</em>&lt;/open&gt;</dt>
1414 <dd><p>Both <em class="replaceable">source-dir</em> and
1415 <em class="replaceable">target-dir</em> have an entry
1416 named <em class="replaceable">name</em>, which has changed;
1417 <em class="replaceable">content</em> describes the new file
1418 or directory.</p></dd>
1420 </dl>
1422 <p>Any entries in <em class="replaceable">source-dir</em> whose names
1423 aren't mentioned are assumed to appear unchanged in
1424 <em class="replaceable">target-dir</em>. Thus, an empty
1425 <tt class="literal">tree-delta</tt> element indicates that
1426 <em class="replaceable">target-dir</em> is identical to
1427 <em class="replaceable">source-dir</em>.</p>
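<p>The interpretation rules for <tt class="literal">delete</tt>, <tt class="literal">add</tt> and <tt class="literal">open</tt> can be sketched against an in-memory tree. This Python sketch assumes directories are dictionaries and files are byte strings, and it deliberately omits ancestry, property deltas and text deltas: an <tt class="literal">add</tt> or <tt class="literal">open</tt> simply carries the new content, and an <tt class="literal">open</tt> of a directory carries a nested list of changes. The tuple encoding and the function name are hypothetical. Entries the delta does not mention are copied over unchanged, so an empty delta yields a directory identical to the source.</p>

```python
def apply_tree_delta(source_dir: dict, changes: list) -> dict:
    """Build target-dir from source-dir.

    changes is a list of hypothetical tuples:
      ("delete", name)          name is in source-dir but not in target-dir
      ("add", name, content)    name is new in target-dir
      ("open", name, content)   name is in both and has changed; for a
                                directory, content is a nested list of changes
    """
    target_dir = dict(source_dir)  # unmentioned entries carry over unchanged
    for change in changes:
        kind, name = change[0], change[1]
        if kind == "delete":
            del target_dir[name]
        elif kind == "add":
            if name in source_dir:
                raise ValueError(f"add: {name!r} already exists in source-dir")
            target_dir[name] = change[2]
        elif kind == "open":
            old, content = source_dir[name], change[2]
            if isinstance(old, dict) and isinstance(content, list):
                # Recurse: source-dir/name and target-dir/name become the new pair.
                target_dir[name] = apply_tree_delta(old, content)
            else:
                target_dir[name] = content
        else:
            raise ValueError(f"unknown change {kind!r}")
    return target_dir


src = {"README": b"v1", "old.txt": b"z", "lib": {"a.c": b"x", "b.c": b"y"}}
delta = [("delete", "old.txt"), ("add", "NEWS", b"n"), ("open", "lib", [("open", "a.c", b"x2")])]
print(apply_tree_delta(src, delta))
# {'README': b'v1', 'lib': {'a.c': b'x2', 'b.c': b'y'}, 'NEWS': b'n'}
```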
1429 <p>In the change descriptions above, each
1430 <em class="replaceable">content</em> takes one of the following
1431 forms:</p>
1433 <dl>
1435 <dt>&lt;file
1436 <em class="replaceable">ancestor</em>&gt;<em class="replaceable">prop-delta</em>
1437 <em class="replaceable">text-delta</em>&lt;/file&gt;</dt>
1439 <dd><p>The given <em class="replaceable">target-dir</em> entry
1440 refers to a file, <em class="replaceable">f</em>.
1441 <em class="replaceable">Ancestor</em> indicates which file in the
1442 source tree <em class="replaceable">f</em> is derived from, if any.
1443 </p>
1445 <p><em class="replaceable">Prop-delta</em> is a property delta
1446 describing how <em class="replaceable">f</em>'s properties differ
1447 from that ancestor; it may be omitted, indicating that the
1448 properties are unchanged.</p>
1450 <p><em class="replaceable">Text-delta</em> is a text delta
1451 describing how to construct <em class="replaceable">f</em> from that
1452 ancestor; it may also be omitted, indicating that
1453 <em class="replaceable">f</em>'s text is identical to its
1454 ancestor's.</p></dd>
1458 <dt>&lt;file <em class="replaceable">ancestor</em>/&gt;</dt>
1460 <dd><p>An abbreviation for <tt class="literal">&lt;file
1461 <em class="replaceable">ancestor</em>&gt;&lt;/file&gt;</tt>
&mdash; a file element with no property or text delta, thus
describing a file identical to its ancestor.</p></dd>
1467 <dt>&lt;directory
1468 <em class="replaceable">ancestor</em>&gt;<em class="replaceable">prop-delta</em>
1469 <em class="replaceable">tree-delta</em>&lt;/directory&gt;</dt>
1471 <dd><p>The given <em class="replaceable">target-dir</em> entry
1472 refers to a subdirectory, <em class="replaceable">sub</em>.
1473 <em class="replaceable">Ancestor</em> indicates which directory in
1474 the source tree <em class="replaceable">sub</em> is derived from, if
1475 any.</p>
1477 <p><em class="replaceable">Prop-delta</em> is a property delta
describing how <em class="replaceable">sub</em>'s properties differ
from that ancestor; it may be omitted, indicating that the
1480 properties are unchanged.</p>
1482 <p><em class="replaceable">Tree-delta</em>
1483 describes how to construct <em class="replaceable">sub</em> from
1484 that ancestor; it may be omitted, indicating that the directory is
1485 identical to its ancestor. <em class="replaceable">Tree-delta</em>
1486 should be interpreted with a new
1487 <em class="replaceable">target-dir</em> of
1488 <tt class="filename"><em class="replaceable">target-dir</em>/<em class="replaceable">name</em></tt>.</p>
1490 <p>Since <em class="replaceable">tree-delta</em> is itself a
1491 complete tree delta structure, tree deltas are themselves trees,
1492 whose structure is a subgraph of the target tree.</p></dd>
1496 <dt>&lt;directory
1497 <em class="replaceable">ancestor</em>/&gt;</dt>
1499 <dd><p>An abbreviation for <tt class="literal">&lt;directory
1500 <em class="replaceable">ancestor</em>&gt;&lt;/directory&gt;</tt>
1501 &mdash; a directory element with no property or tree delta, thus
1502 describing a directory identical to its ancestor.</p></dd>
1504 </dl>
<p>The <em class="replaceable">content</em> of an <tt class="literal">add</tt> or
1507 <tt class="literal">open</tt> tag may also contain a property delta, describing
1508 changes to the properties of that <em>directory
1509 entry</em>.</p>
1511 <p>In the <tt class="literal">file</tt> and <tt class="literal">directory</tt>
1512 elements described above, each <em class="replaceable">ancestor</em> has
1513 one of the following forms:</p>
<dl>
<dt>ancestor='<em class="replaceable">path</em>'</dt>
<dd><p>The ancestor of the new or changed file or directory is
<tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>,
in <em class="replaceable">revision</em>. When this appears as an
attribute of a <tt class="literal">file</tt> element, the element's text
delta should be applied to
<tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>.
When this appears as an attribute of a <tt class="literal">directory</tt>
element,
<tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>
should be the new <em class="replaceable">source-dir</em> for
interpreting that element's tree delta.</p></dd>
<dt>new='true'</dt>
<dd><p>This indicates that the file or directory has no
ancestor in the source tree. When followed by a
<em class="replaceable">text-delta</em>, that delta should be applied
to the empty file to yield the new text; when followed by a
<em class="replaceable">tree-delta</em>, that delta should be
evaluated as if <em class="replaceable">source-dir</em> were an
imaginary empty directory.</p></dd>
<dt><em class="replaceable">nothing</em></dt>
<dd><p>If neither an <tt class="literal">ancestor</tt> nor a
<tt class="literal">new</tt> attribute is given, this is an abbreviation
for
<tt class="literal">ancestor='<em class="replaceable">source-dir</em>/<em class="replaceable">name</em>'</tt>,
with the same revision number. This makes the common case &mdash;
files or directories modified in place &mdash; more
compact.</p></dd>
</dl>
<p>If the <em class="replaceable">ancestor</em> spec is not
<tt class="literal">new='true'</tt>, it may also contain the text
<tt class="literal">revision='<em class="replaceable">rev</em>'</tt>, giving
a new value for <em class="replaceable">revision</em>: the revision in which
the ancestor should be found.</p>
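<p>The three ancestor forms and the optional <tt class="literal">revision</tt> override can be read as a small resolution step. The C sketch below illustrates that reading and is not Subversion code. The types and the <tt class="literal">resolve_ancestor</tt> function are hypothetical. Paths are plain strings already relative to <em class="replaceable">source-root</em>, and a revision of -1 stands for "no <tt class="literal">revision</tt> attribute given".</p>

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical view of the attributes on one file or directory element.
   ancestor_path is NULL when no ancestor='...' attribute is present;
   revision is -1 when no revision='...' attribute is present. */
struct ancestor_spec {
  const char *ancestor_path;
  bool is_new;                 /* new='true' */
  long revision;
};

struct resolved_ancestor {
  bool has_ancestor;           /* false for new='true' */
  char path[256];
  long revision;
};

/* Resolve SPEC for the entry NAME, given the current SOURCE_DIR (no
   trailing slash) and the revision CURRENT_REV in effect for the
   enclosing delta. */
static struct resolved_ancestor
resolve_ancestor(const struct ancestor_spec *spec, const char *source_dir,
                 const char *name, long current_rev)
{
  struct resolved_ancestor r = { false, "", -1 };

  if (spec->is_new)
    return r;                  /* no ancestor: start from an empty file/dir */

  r.has_ancestor = true;
  if (spec->ancestor_path != NULL)
    snprintf(r.path, sizeof r.path, "%s", spec->ancestor_path);
  else                         /* the abbreviated form: source-dir/name */
    snprintf(r.path, sizeof r.path, "%s/%s", source_dir, name);

  /* revision='rev' overrides; otherwise keep the same revision number. */
  r.revision = spec->revision >= 0 ? spec->revision : current_rev;
  return r;
}
```

<p>Consider an <tt class="literal">open</tt> of <tt class="filename">file1</tt> inside <tt class="filename">/dir1</tt> with no attributes. It resolves to <tt class="filename">/dir1/file1</tt> at the current revision, which is the in-place modification case that the abbreviation exists for.</p>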
<p>If a filename or path appearing as a <em class="replaceable">name</em>
or <em class="replaceable">path</em> in the description above contains the
characters &lsquo;<tt class="literal">&amp;</tt>&rsquo;,
&lsquo;<tt class="literal">&lt;</tt>&rsquo;, or
&lsquo;<tt class="literal">'</tt>&rsquo;, they should be replaced with the
sequences &lsquo;<tt class="literal">&amp;#38;</tt>&rsquo;,
&lsquo;<tt class="literal">&amp;#60;</tt>&rsquo;, and
&lsquo;<tt class="literal">&amp;#39;</tt>&rsquo;, respectively.</p>
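<p>The escaping rule is mechanical enough to show directly. The C sketch below applies it. The function name <tt class="literal">escape_tree_delta_name</tt> is hypothetical and not part of Subversion's API. As the rule specifies, &lsquo;<tt class="literal">&gt;</tt>&rsquo; is left unchanged, which is legal inside a quoted attribute value.</p>

```c
#include <stddef.h>

/* Escape NAME for a single-quoted tree-delta attribute: '&', '<' and '\''
   become numeric character references; every other byte is copied.
   Writes at most OUT_SIZE bytes (including the NUL) to OUT and, like
   snprintf, returns the length the complete result needs, so a return
   value >= OUT_SIZE signals truncation. */
static size_t
escape_tree_delta_name(const char *name, char *out, size_t out_size)
{
  size_t needed = 0;

  for (const char *p = name; *p != '\0'; p++)
    {
      char single[2] = { *p, '\0' };
      const char *rep;

      switch (*p)
        {
        case '&':  rep = "&#38;"; break;
        case '<':  rep = "&#60;"; break;
        case '\'': rep = "&#39;"; break;
        default:   rep = single;  break;
        }

      /* Count every output byte, but only store what fits. */
      for (const char *r = rep; *r != '\0'; r++, needed++)
        if (needed + 1 < out_size)
          out[needed] = *r;
    }

  if (out_size > 0)
    out[needed < out_size ? needed : out_size - 1] = '\0';
  return needed;
}
```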
<p>Suppose we have the following source tree:</p>
<pre>
/dir1/file1
      file2
      dir2/file3
           file4
      dir3/file5
           file6
</pre>
<p>If we edit the contents of <tt class="filename">/dir1/file1</tt>, we can
describe the effect on the tree with the following tree delta, to be
applied to the root:</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='file1'&gt;
          &lt;file&gt;<em class="replaceable">text-delta</em>&lt;/file&gt;
        &lt;/open&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
</pre>
<p>The outer <tt class="literal">tree-delta</tt> element describes the changes
made to the root directory. Within the root directory, there are changes
in <tt class="filename">dir1</tt>, described by the nested
<tt class="literal">tree-delta</tt>. Within <tt class="filename">/dir1</tt>, there
are changes in <tt class="filename">file1</tt>, described by the
<em class="replaceable">text-delta</em>.</p>
<p>If we had edited both <tt class="filename">/dir1/file1</tt> and
<tt class="filename">/dir1/file2</tt>, then there would simply be two
<tt class="literal">open</tt> elements in the inner
<tt class="literal">tree-delta</tt>.</p>
<p>As another example, starting from the same source tree, suppose we
rename <tt class="filename">/dir1/file1</tt> to
<tt class="filename">/dir1/file8</tt>:</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;delete name='file1'/&gt;
        &lt;add name='file8'&gt;
          &lt;file ancestor='/dir1/file1'/&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
</pre>
<p>As above, the inner <tt class="literal">tree-delta</tt> describes how
<tt class="filename">/dir1</tt> has changed: the entry for
<tt class="filename">/dir1/file1</tt> has disappeared, but there is a new
entry, <tt class="filename">/dir1/file8</tt>, which is derived from and
textually identical to <tt class="filename">/dir1/file1</tt> in the source
directory. This is just an indirect way of describing the rename.</p>
<p>Why is it necessary to be so indirect? Consider the delta
representing the result of:</p>
<ol>
<li><p>renaming <tt class="filename">/dir1/file1</tt> to
<tt class="filename">/dir1/tmp</tt>,</p></li>
<li><p>renaming <tt class="filename">/dir1/file2</tt> to
<tt class="filename">/dir1/file1</tt>, and</p></li>
<li><p>renaming <tt class="filename">/dir1/tmp</tt> to
<tt class="filename">/dir1/file2</tt></p></li>
</ol>
<p>(in other words, exchanging <tt class="filename">file1</tt> and
<tt class="filename">file2</tt>):</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='file1'&gt;
          &lt;file ancestor='/dir1/file2'/&gt;
        &lt;/open&gt;
        &lt;open name='file2'&gt;
          &lt;file ancestor='/dir1/file1'/&gt;
        &lt;/open&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
</pre>
<p>Because every ancestor names a file in the source tree, not in the
partially modified target, the indirectness allows the tree delta to
capture an arbitrary rearrangement without resorting to temporary
filenames.</p>
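<p>The following minimal C sketch applies a flattened, single-directory delta. It shows why looking up ancestors in the source tree removes the need for temporary names. The sketch makes several simplifications that are not part of the format. Ancestors are entry names in the same source directory, and file contents are short strings rather than text deltas. The names (<tt class="literal">struct op</tt>, <tt class="literal">apply_delta</tt>, and so on) are hypothetical.</p>

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical single-directory model: entry name -> file contents. */
struct entry { char name[32]; char contents[32]; };
struct dir { struct entry e[MAX_ENTRIES]; int n; };

/* One step of a flattened tree delta. For OP_OPEN and OP_ADD, ANCESTOR
   names an entry of the SOURCE directory (NULL means "same name"). */
enum op_kind { OP_DELETE, OP_ADD, OP_OPEN };
struct op { enum op_kind kind; const char *name; const char *ancestor; };

static int
find_index(const struct dir *d, const char *name)
{
  for (int i = 0; i < d->n; i++)
    if (strcmp(d->e[i].name, name) == 0)
      return i;
  return -1;
}

/* Build TARGET from SOURCE and OPS; returns 0 on success, -1 on a bad op.
   Ancestors are always looked up in SOURCE, never in the partially built
   TARGET, which is why a swap needs no temporary name. */
static int
apply_delta(const struct dir *source, const struct op *ops, int nops,
            struct dir *target)
{
  *target = *source;

  for (int k = 0; k < nops; k++)
    {
      const struct op *op = &ops[k];
      int t = find_index(target, op->name);

      if (op->kind == OP_DELETE)
        {
          if (t < 0)
            return -1;
          target->e[t] = target->e[--target->n];  /* entry order is irrelevant */
          continue;
        }

      int s = find_index(source, op->ancestor ? op->ancestor : op->name);
      if (s < 0)
        return -1;
      if (op->kind == OP_ADD)
        {
          if (t >= 0 || target->n == MAX_ENTRIES)
            return -1;
          t = target->n++;
          snprintf(target->e[t].name, sizeof target->e[t].name, "%s", op->name);
        }
      else if (t < 0)
        return -1;              /* cannot open an entry that is not there */

      /* Take the ancestor's contents (a real delta would then apply a
         text delta to them). */
      memcpy(target->e[t].contents, source->e[s].contents,
             sizeof target->e[t].contents);
    }
  return 0;
}
```

<p>Suppose <tt class="literal">apply_delta</tt> looked ancestors up in <tt class="literal">target</tt> instead. The second <tt class="literal">open</tt> of the swap would then copy the already-overwritten <tt class="filename">file1</tt>, and both entries would end up with <tt class="filename">file2</tt>'s old contents.</p>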
<p>Another example, starting from the same source tree:</p>
<ol>
<li><p>rename <tt class="filename">/dir1/dir2</tt> to
<tt class="filename">/dir1/dir4</tt>,</p></li>
<li><p>rename <tt class="filename">/dir1/dir3</tt> to
<tt class="filename">/dir1/dir2</tt>, and</p></li>
<li><p>move <tt class="filename">file3</tt> from
<tt class="filename">/dir1/dir4</tt> to
<tt class="filename">/dir1/dir2</tt>.</p></li>
</ol>
<p>Note that <tt class="filename">file3</tt>'s path has remained the same,
even though the directories around it have changed. Here is the tree
delta:</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='dir2'&gt;
          &lt;directory ancestor='/dir1/dir3'&gt;
            &lt;tree-delta&gt;
              &lt;add name='file3'&gt;
                &lt;file ancestor='/dir1/dir2/file3'/&gt;
              &lt;/add&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/open&gt;
        &lt;delete name='dir3'/&gt;
        &lt;add name='dir4'&gt;
          &lt;directory ancestor='/dir1/dir2'&gt;
            &lt;tree-delta&gt;
              &lt;delete name='file3'/&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
</pre>
<p>In other words:</p>
<ul>
<li><p><tt class="filename">/dir1</tt> has changed;</p></li>
<li><p>the new directory <tt class="filename">/dir1/dir2</tt> is
derived from the old <tt class="filename">/dir1/dir3</tt>, and contains a
new entry <tt class="filename">file3</tt>, derived from the old
<tt class="filename">/dir1/dir2/file3</tt>;</p></li>
<li><p>there is no longer any <tt class="filename">/dir1/dir3</tt>;
and</p></li>
<li><p>the new directory <tt class="filename">/dir1/dir4</tt> is
derived from the old <tt class="filename">/dir1/dir2</tt>, except that its
entry for <tt class="filename">file3</tt> is now gone.</p></li>
</ul>
<p>Some more possible maneuvers, left as exercises for the
reader:</p>
<ul>
<li><p>Delete <tt class="filename">dir2</tt>, and then create a file
named <tt class="filename">dir2</tt>.</p></li>
<li><p>Rename <tt class="filename">/dir1/dir2</tt> to
<tt class="filename">/dir1/dir4</tt>; move <tt class="filename">file2</tt>
into <tt class="filename">/dir1/dir4</tt>; and move
<tt class="filename">file3</tt> into
<tt class="filename">/dir1/dir3</tt>.</p></li>
<li><p>Move <tt class="filename">dir2</tt> into
<tt class="filename">dir3</tt>, and move <tt class="filename">dir3</tt> into
<tt class="filename">/</tt>.</p></li>
</ul>
</div> <!-- deltas.tree (h3) -->
<div class="h3" id="deltas.postfix-text" title="#deltas.postfix-text">
<h3>Postfix Text Deltas</h3>
<p>It is sometimes useful to represent a set of changes to a tree
without providing text deltas in the middle of the stream. Text deltas
are often large and expensive to compute, and tree deltas can be useful
without them. For example, one can detect whether two changes might
conflict (say, because both change the same file) without knowing
exactly how the conflicting files changed.</p>
<p>For this reason, our XML representation of a tree delta allows the
text deltas to come <em>after</em> the closing &lt;/tree-delta&gt;
tag. This gives the client early notice of conflicts: during an
<tt class="literal">svn commit</tt> command, the client sends a
tree-delta to the server, which can check for skeletal conflicts and
reject the commit before the client takes the time to transmit the
(possibly large) textual changes. This potentially saves quite a bit of
network traffic.</p>
<p>In terms of XML, postfix text deltas are split into two parts. The
first part appears "in-line" as an empty <tt class="literal">text-delta-ref</tt>
element carrying a reference ID. The second part, a
<tt class="literal">text-delta</tt> element with the same ID, appears after the
tree delta is complete. Here's an example:</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name="foo.c"&gt;
    &lt;file&gt;
      &lt;text-delta-ref id="123"/&gt;
    &lt;/file&gt;
  &lt;/open&gt;
  &lt;add name="bar.c"&gt;
    &lt;file&gt;
      &lt;text-delta-ref id="456"/&gt;
    &lt;/file&gt;
  &lt;/add&gt;
&lt;/tree-delta&gt;
&lt;text-delta id="123"&gt;<em>data</em>&lt;/text-delta&gt;
&lt;text-delta id="456"&gt;<em>data</em>&lt;/text-delta&gt;
</pre>
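<p>A receiver must connect each trailing <tt class="literal">text-delta</tt> back to the file that referenced it. The C sketch below shows one minimal way to keep that bookkeeping. The table layout and function names are hypothetical, and parsing of the XML itself is left out.</p>

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_REFS 32

/* Hypothetical bookkeeping for postfix text deltas: while reading the
   tree delta, remember which file each text-delta-ref id belongs to. */
struct pending_ref { char id[16]; char path[64]; bool resolved; };
struct ref_table { struct pending_ref refs[MAX_REFS]; int n; };

/* Called for each text-delta-ref inside the tree delta. */
static bool
note_ref(struct ref_table *t, const char *id, const char *path)
{
  if (t->n == MAX_REFS)
    return false;
  for (int i = 0; i < t->n; i++)
    if (strcmp(t->refs[i].id, id) == 0)
      return false;                       /* ids must be unique */
  struct pending_ref *r = &t->refs[t->n++];
  snprintf(r->id, sizeof r->id, "%s", id);
  snprintf(r->path, sizeof r->path, "%s", path);
  r->resolved = false;
  return true;
}

/* Called for each trailing text-delta; returns the path its data applies
   to, or NULL if the id is unknown or has already received its data. */
static const char *
resolve_ref(struct ref_table *t, const char *id)
{
  for (int i = 0; i < t->n; i++)
    if (!t->refs[i].resolved && strcmp(t->refs[i].id, id) == 0)
      {
        t->refs[i].resolved = true;
        return t->refs[i].path;
      }
  return NULL;
}

/* After the stream ends, every reference must have received its data. */
static bool
all_refs_resolved(const struct ref_table *t)
{
  for (int i = 0; i < t->n; i++)
    if (!t->refs[i].resolved)
      return false;
  return true;
}
```

<p>For the example above, the receiver would proceed in three stages:</p>
<ol>
<li><p>While reading the tree delta, call <tt class="literal">note_ref</tt> for ids 123 and 456. Conflict checks can already run at this point.</p></li>
<li><p>Call <tt class="literal">resolve_ref</tt> for each trailing <tt class="literal">text-delta</tt>.</p></li>
<li><p>Call <tt class="literal">all_refs_resolved</tt> to detect a stream that ended early.</p></li>
</ol>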
</div> <!-- deltas.postfix-text (h3) -->
<div class="h3" id="deltas.serializing-via-editor" title="#deltas.serializing-via-editor">
<h3>Serializing Deltas via the "Editor" Interface</h3>
<p>The static XML forms above are useful as an import/export format, and
as a visualization aid, but we also need a way to express a delta as a
<em>series of operations</em>, to implement directory tree
diffing and patching. Subversion defines a standard set of such
operations in the vtable <tt class="literal">svn_delta_edit_fns_t</tt>, a set
of function prototypes which anyone may implement (see
<tt class="filename">svn_delta.h</tt>).</p>
<p>Each function in an instance of <tt class="literal">svn_delta_editor_t</tt>
(colloquially known as an <strong class="firstterm">editor</strong>) implements some
distinct subtask of editing a directory tree. In fact, if you compare
the editor function prototypes to the XML elements described previously,
you'll notice a fairly strict correspondence: there's one function for
opening a directory, another function for opening a file, one for
adding a directory, another for adding a file, a function for deleting,
and so on.</p>
<p>Although the editor interface was designed around the general idea of
making changes to a directory tree, a specific implementation's behavior
depends on its role. For example, the versioning filesystem library
offers an editor that creates new revisions, while the working copy
library offers an editor that updates working copies. And the network
layer offers an editor that turns editing calls into wire protocol, which
is then converted back into editing calls on the other side! All of
these different tasks can share a single interface, because they are all
fundamentally about the same thing: expressing and applying differences
between directory trees.</p>
<p>Like the XML forms, a series of editor calls must follow certain
nesting conventions; these conventions are implicit in the interface, in
that some of the functions take arguments that can only be obtained from
previous calls to other editor functions.</p>
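<p>A deliberately simplified editor vtable can illustrate the nesting convention. The C sketch below does not use the real <tt class="filename">svn_delta.h</tt> signatures. Each directory call returns an opaque baton, and later calls must pass that baton back. The hypothetical checking editor flags any call that uses a directory baton other than the innermost open one. The real interface has additional rules, for example for file batons, which this sketch omits.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* A deliberately simplified editor vtable (not the real svn_delta.h
   signatures): opening a directory returns an opaque baton, and later
   calls on that directory must pass the baton back. */
struct editor {
  void *(*open_root)(void *edit_baton);
  void *(*open_dir)(const char *name, void *parent_baton);
  bool (*close_dir)(void *dir_baton);
  void *edit_baton;
};

#define MAX_BATONS 64
#define MAX_DEPTH 32

struct checker;

/* Each baton records which checker issued it and a never-reused serial
   number, so a stale baton cannot be mistaken for a newer one. */
struct dir_baton { struct checker *owner; int serial; };

struct checker {
  struct dir_baton pool[MAX_BATONS];
  int allocated;               /* batons handed out so far */
  int open[MAX_DEPTH];         /* serials of open directories, innermost last */
  int depth;
  bool violated;               /* set by any out-of-order call */
};

static void *
push_dir(struct checker *c)
{
  if (c->allocated == MAX_BATONS || c->depth == MAX_DEPTH)
    {
      c->violated = true;
      return NULL;
    }
  struct dir_baton *b = &c->pool[c->allocated];
  b->owner = c;
  b->serial = c->allocated++;
  c->open[c->depth++] = b->serial;
  return b;
}

/* Only the most recently opened, still-open directory may be used. */
static bool
is_innermost(const struct dir_baton *b)
{
  const struct checker *c = b->owner;
  return c->depth > 0 && c->open[c->depth - 1] == b->serial;
}

static void *
check_open_root(void *edit_baton)
{
  struct checker *c = edit_baton;
  if (c->allocated != 0)       /* open_root must be the first call */
    c->violated = true;
  return push_dir(c);
}

static void *
check_open_dir(const char *name, void *parent_baton)
{
  struct dir_baton *parent = parent_baton;
  (void)name;                  /* a real editor would act on the name */
  if (parent == NULL)
    return NULL;
  if (!is_innermost(parent))
    parent->owner->violated = true;
  return push_dir(parent->owner);
}

static bool
check_close_dir(void *dir_baton)
{
  struct dir_baton *b = dir_baton;
  if (b == NULL)
    return false;
  if (!is_innermost(b))
    {
      b->owner->violated = true;
      return false;
    }
  b->owner->depth--;
  return true;
}

static struct editor
make_checking_editor(struct checker *c)
{
  *c = (struct checker){ .allocated = 0 };
  struct editor e = { check_open_root, check_open_dir, check_close_dir, c };
  return e;
}
```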
<p>Editors are best understood by watching one work on a real
directory tree. For example:</p>
<!-- kff todo: fooo working here. -->
<p>Suppose that the user has made a number of local changes to her
working copy and wants to commit them to the repository. Let's represent
her changes with the tree-delta from the previous example. Notice
that she has also made textual modifications to
<tt class="filename">file3</tt>; hence the in-line
<tt class="literal">&lt;text-delta&gt;</tt>:</p>
<pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='dir2'&gt;
          &lt;directory ancestor='/dir1/dir3'&gt;
            &lt;tree-delta&gt;
              &lt;add name='file3'&gt;
                &lt;file ancestor='/dir1/dir2/file3'&gt;
                  &lt;text-delta&gt;<em>data</em>&lt;/text-delta&gt;
                &lt;/file&gt;
              &lt;/add&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/open&gt;
        &lt;delete name='dir3'/&gt;
        &lt;add name='dir4'&gt;
          &lt;directory ancestor='/dir1/dir2'&gt;
            &lt;tree-delta&gt;
              &lt;delete name='file3'/&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
</pre>
<p>So how does the client send this information to the server?</p>
<p>In a nutshell: the tree-delta is <em>streamed</em> over
the network as a series of individual commands given in depth-first
order.</p>
<p>Let's be more specific. The server presents the client with an
object of type <tt class="literal">struct svn_delta_edit_fns_t</tt>,
colloquially known as an <strong class="firstterm">editor</strong>. An editor is
really just a table of functions; each function makes a change to a
filesystem. Agent A (who has a private filesystem) presents an editor to
agent B. Agent B then calls the editor's functions to change A's
filesystem. B is said to be <strong class="firstterm">driving</strong> the
editor.</p>
<p>As Karl Fogel likes to describe the process, if one thinks of the
tree-delta as a lion, the editor is a "hoop" that the lion jumps through
&ndash; each portion of the lion being decomposed through time.</p>
<p>B cannot call the functions willy-nilly; there are some
logical restrictions. In particular, as B drives the editor, it receives
opaque data structures which represent directories and files. It must
use and pass these structures, known as <strong class="firstterm">batons</strong>, to
make further function calls.</p>
<p>As an example, let's watch how the client would transmit the above
tree-delta to the repository. (The description below is slightly
simplified. For exact interface details, see
<tt class="filename">subversion/include/svn_delta.h</tt>.)</p>
<p>[Note: in the examples below, and throughout Subversion's code base,
you'll see references to 'baton' objects. This is simply a project
convention, a name given to structures that define contexts for
functions. Many APIs call these structures 'userdata'. In Subversion,
we like the term 'baton', because it reminds us of one function
&ldquo;handing off&rdquo; context to another function.]</p>
<ol>
<li><p>The repository hands an "editor" to the
client.</p></li>
<li><p>The client begins by calling <tt class="literal">root_baton =
editor-&gt;open_root();</tt> The client now has an opaque
object, <strong class="firstterm">root_baton</strong>, which represents the root
of the repository's filesystem.</p></li>
<li><p><tt class="literal">dir1_baton = editor-&gt;open_dir("dir1",
root_baton);</tt> Notice that <em>root_baton</em>
gives the client free license to make any changes it wants in the
repository's root directory &ndash; until, of course, it calls
<tt class="literal">editor-&gt;close_dir(root_baton)</tt>. The first
change is to open <tt class="filename">dir1</tt>. In
return, the client now has a new opaque data structure that can be
used to change <tt class="filename">dir1</tt>.</p></li>
<li><p><tt class="literal">dir2_baton = editor-&gt;open_dir("dir2",
"/dir1/dir3", dir1_baton);</tt> The
<em>dir1_baton</em> is now used to open
<tt class="filename">dir2</tt>, giving it the ancestor
<tt class="filename">/dir1/dir3</tt>.</p></li>
<li><p><tt class="literal">file_baton = editor-&gt;add_file("file3",
"/dir1/dir2/file3", dir2_baton);</tt> Edits are now made to
<tt class="filename">dir2</tt> (using <em>dir2_baton</em>).
In particular, a new file whose ancestor is
<tt class="filename">/dir1/dir2/file3</tt> is added to this directory.</p></li>
<li><p>Now the text-delta associated with
<em>file_baton</em> needs to be transmitted:
<tt class="literal">window_handler =
editor-&gt;apply_textdelta(file_baton);</tt> Text-deltas
themselves, for network efficiency, are streamed in "chunks". So
instead of receiving a baton object, we now have a routine that is
able to receive any number of small "windows" of text-delta data. We
won't go into the details of the <tt class="literal">svn_txdelta_*</tt>
functions here; suffice it to say that these routines are
used for sending svndiff data to the
<em>window_handler</em> routine.</p></li>
<li><p><tt class="literal">editor-&gt;close_file(file_baton);</tt> The
client is done sending the file's text-delta, so it releases the file
baton.</p></li>
<li><p><tt class="literal">editor-&gt;close_dir(dir2_baton);</tt> The
client is done making changes to <tt class="filename">dir2</tt>, so it
releases its baton as well.</p></li>
<li><p>The client isn't yet finished with
<tt class="filename">dir1</tt>, however; it makes two more edits:
<tt class="literal">editor-&gt;delete_item("dir3", dir1_baton);</tt>
<tt class="literal">dir4_baton = editor-&gt;add_dir("dir4", "/dir1/dir2",
dir1_baton);</tt> <em>(The function's name is
<tt class="literal">delete_item</tt> rather than
<tt class="literal">delete</tt> to avoid gratuitous incompatibility with
C++, where <tt class="literal">delete</tt> is a reserved
keyword.)</em></p></li>
<li><p>Within the directory <tt class="filename">dir4</tt> (whose
ancestry is <tt class="filename">/dir1/dir2</tt>), the client removes a
file: <tt class="literal">editor-&gt;delete_item("file3",
dir4_baton);</tt></p></li>
<li><p>The client is now finished with both
<tt class="filename">dir4</tt> and its
parent <tt class="filename">dir1</tt>:
<tt class="literal">editor-&gt;close_dir(dir4_baton);</tt>
<tt class="literal">editor-&gt;close_dir(dir1_baton);</tt></p></li>
<li><p>The entire tree-delta is complete. The repository knows
this when the root directory is closed:
<tt class="literal">editor-&gt;close_dir(root_baton);</tt></p></li>
</ol>
<p>Of course, at any point above, the repository may reject an edit. If
this is the case, the client aborts the transmission and the repository
hasn't changed a bit. (Thank goodness for transactions!)</p>
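<p>The walk-through can be replayed against a toy editor. The C sketch below has three parts:</p>
<ul>
<li><p>a hypothetical, simplified vtable that uses the function names from the steps (the real <tt class="filename">svn_delta.h</tt> functions take more arguments);</p></li>
<li><p>a printing editor whose batons only record nesting depth;</p></li>
<li><p>a driver that issues the same calls in the same order.</p></li>
</ul>
<p>As a further simplification, the window handler receives the file baton directly, and a NULL window marks the end of the text delta.</p>

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified vtable named after the walk-through's calls. */
typedef void (*window_handler_t)(void *file_baton, const char *window);

struct editor {
  void *(*open_root)(void);
  void *(*open_dir)(const char *name, const char *ancestor, void *parent);
  void *(*add_dir)(const char *name, const char *ancestor, void *parent);
  void *(*add_file)(const char *name, const char *ancestor, void *parent);
  void (*delete_item)(const char *name, void *parent);
  window_handler_t (*apply_textdelta)(void *file_baton);
  void (*close_file)(void *file_baton);
  void (*close_dir)(void *dir_baton);
};

/* Printing editor: a baton only records how far to indent its children,
   and every call appends one indented line to log_buf. */
struct baton { int depth; };
static char log_buf[1024];
static struct baton batons[16];
static int nbatons;

static void
emit(int depth, const char *fmt, ...)
{
  char line[128];
  size_t len = strlen(log_buf);
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(line, sizeof line, fmt, ap);
  va_end(ap);
  snprintf(log_buf + len, sizeof log_buf - len, "%*s%s\n", 2 * depth, "", line);
}

static void *
new_baton(int depth)
{
  if (nbatons == 16)
    abort();                            /* ample for this example */
  batons[nbatons].depth = depth;
  return &batons[nbatons++];
}

static int depth_of(void *baton) { return ((struct baton *)baton)->depth; }

static void *
child(const char *verb, const char *name, const char *ancestor, void *parent)
{
  int d = depth_of(parent);
  if (ancestor != NULL)
    emit(d, "%s %s (ancestor %s)", verb, name, ancestor);
  else
    emit(d, "%s %s", verb, name);
  return new_baton(d + 1);
}

static void *p_open_root(void) { emit(0, "open_root"); return new_baton(1); }
static void *p_open_dir(const char *n, const char *a, void *p) { return child("open_dir", n, a, p); }
static void *p_add_dir(const char *n, const char *a, void *p) { return child("add_dir", n, a, p); }
static void *p_add_file(const char *n, const char *a, void *p) { return child("add_file", n, a, p); }
static void p_delete_item(const char *n, void *p) { emit(depth_of(p), "delete_item %s", n); }
static void p_close_file(void *f) { emit(depth_of(f) - 1, "close_file"); }
static void p_close_dir(void *d) { emit(depth_of(d) - 1, "close_dir"); }
static void p_window(void *file, const char *window)
{ emit(depth_of(file), "window: %s", window ? window : "NULL (done)"); }
static window_handler_t p_apply_textdelta(void *file)
{ emit(depth_of(file), "apply_textdelta"); return p_window; }

static const struct editor printing_editor = {
  p_open_root, p_open_dir, p_add_dir, p_add_file, p_delete_item,
  p_apply_textdelta, p_close_file, p_close_dir
};

/* The driver issues the calls from steps 2 to 12, in order. */
static void
drive(const struct editor *ed)
{
  void *root = ed->open_root();
  void *dir1 = ed->open_dir("dir1", NULL, root);
  void *dir2 = ed->open_dir("dir2", "/dir1/dir3", dir1);
  void *file = ed->add_file("file3", "/dir1/dir2/file3", dir2);
  window_handler_t handler = ed->apply_textdelta(file);
  handler(file, "svndiff window 1");    /* placeholder window data */
  handler(file, NULL);                  /* NULL: no more windows */
  ed->close_file(file);
  ed->close_dir(dir2);
  ed->delete_item("dir3", dir1);
  void *dir4 = ed->add_dir("dir4", "/dir1/dir2", dir1);
  ed->delete_item("file3", dir4);
  ed->close_dir(dir4);
  ed->close_dir(dir1);
  ed->close_dir(root);
}
```

<p>The <tt class="literal">drive</tt> function only talks to the vtable. Substituting a different implementation, such as one that writes XML or one that modifies a filesystem, changes what happens without any change to the driver.</p>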
<p>Note, however, that this "editor interface" works in the other
direction as well. When the repository wishes to update a client's
working copy, it is the <em>client's</em> responsibility to
give a custom editor-object to the server, and the
<em>server</em> is the editor-driver.</p>
<p>Here are the main advantages of this interface:</p>
<ul>
<li><p><em>Consistency</em>. Tree-deltas move
across the network, in both directions, using the same
interface.</p></li>
<li><p><em>Flexibility</em>. Custom
editor-implementations can be written to do anything one might want;
the editor-driver has no idea what is happening on the other side of
the interface. For example, an editor might</p>
<ul>
<li><p>output XML that matches the tree-delta DTD
above;</p></li>
<li><p>output human-readable descriptions of the edits
taking place; or</p></li>
<li><p>modify a filesystem.</p></li>
</ul>
</li>
</ul>
<p>Whatever the case, it's easy to swap editors around and make
client and server do new and interesting things.</p>
</div> <!-- deltas.serializing-via-editor (h3) -->
</div> <!-- deltas (h2) -->
<div class="h2" id="client" title="#client">
<h2>Client &mdash; How the client works</h2>
<p>The Subversion client is built on three libraries. One operates
strictly on the working copy and does not talk to the repository.
Another talks to the repository but never changes the working copy. The
third library uses the first two to provide operations such as
<tt class="literal">commit</tt> and <tt class="literal">update</tt> &ndash;
operations that need to both talk to the repository and change the
working copy.</p>
<p>The initial client is a Unix-style command-line tool (like standard
CVS), but it should be easy to write a GUI client as well, based on the
same libraries. The libraries capture the core Subversion functionality,
segregating it from user interface concerns.</p>
<p>This chapter describes the libraries and the physical layout of
working copies.</p>
<div class="h3" id="client.wc" title="#client.wc">
<h3>Working copies and the working copy library</h3>
<p>Working copies are client-side directory trees containing both
versioned data and Subversion administrative files. The functions in the
working copy management library are the only functions in Subversion
which operate on these trees.</p>
<div class="h4" id="client.wc.layout" title="#client.wc.layout">
<h4>The layout of working copies</h4>
<p>This section gives an overview of how
working copies are arranged physically, but is not a full specification
of working copy layout.</p>
<p>As with CVS, Subversion working copies are simply directory trees
with special administrative subdirectories, in this case named
<tt class="filename">.svn</tt> instead of <tt class="filename">CVS</tt>:</p>
<pre>
                            myproj
                             / | \
               _____________/  |  \______________
              /                |                 \
            .svn              src                doc
        ___/ | \___           /|\             ___/ \___
       |     |     |         / | \           |         |
     base   ...   ...       /  |  \     myproj.texi  .svn
                           /   |   \              ___/ | \___
                      ____/    |    \____        |     |     |
                     |         |         |     base   ...   ...
                   .svn      foo.c     bar.c    |
                ___/ | \___                     |
               |     |     |                    |
             base   ...   ...              myproj.texi
          ___/ \___
         |         |
       foo.c     bar.c
</pre>
<p>Each <tt class="filename">dir/.svn/</tt> directory records the files in
<tt class="filename">dir</tt>, their revision numbers and property lists,
pristine revisions of all the files (for client-side delta generation),
the repository from which <tt class="filename">dir</tt> came, and any local
changes (such as uncommitted adds, deletes, and renames) that affect
<tt class="filename">dir</tt>.</p>
<p>Although it would often be possible to deduce certain information
(such as the original repository) by examining parent directories, this
is avoided in favor of making each directory as self-contained a unit as
possible.</p>
<p>For example, immediately after a checkout the administrative
information for the entire working tree <em>could</em> be
stored in one top-level file. But subdirectories instead keep track of
their own revision information. This would be necessary anyway once
the user starts committing new revisions for particular files, and it
also makes it easier for the user to prune a big, complete tree into a
small subtree and still have a valid working copy.</p>
<p>The <tt class="filename">.svn</tt> subdirectory contains:</p>
<ul>
<li><p>A <tt class="filename">format</tt> file, which indicates
which version of the working copy administrative format this is (so that
future clients can easily remain backward compatible).</p></li>
<li><p>A <tt class="filename">text-base</tt> directory,
containing the pristine repository revisions of the files in the
corresponding working directory.</p></li>
<li><p>An <tt class="filename">entries</tt> file, which holds
revision numbers and other information for this directory and its
files, and records the presence of subdirectories. It also contains the
repository URLs that each file and directory came from. It may
help to think of this file as the functional equivalent of the
<tt class="filename">CVS/Entries</tt> file.</p></li>
<li><p>A <tt class="filename">props</tt> directory, containing
property names and values for each file in the working
directory.</p></li>
<li><p>A <tt class="filename">prop-base</tt> directory,
containing pristine property names and values for each file in
the working directory.</p></li>
<li><p>A <tt class="filename">dir-props</tt> file, recording
properties for this directory.</p></li>
<li><p>A <tt class="filename">dir-prop-base</tt> file, recording
pristine properties for this directory.</p></li>
<li><p>A <tt class="filename">lock</tt> file, whose presence
implies that some client is currently operating on the
administrative area.</p></li>
<li><p>A <tt class="filename">tmp</tt> directory, for holding
scratch-work and helping make working copy operations more
crash-proof.</p></li>
<li><p>A <tt class="filename">log</tt> file. If present, it
lists actions that need to be taken to complete a
working-copy operation that is still "in
progress".</p></li>
</ul>
<p>You can read much more about these files in the file
<tt class="filename">subversion/libsvn_wc/README</tt>.</p>
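<p>The <tt class="filename">tmp</tt> directory and <tt class="filename">log</tt> file together support a common crash-safety pattern. New files are prepared under <tt class="filename">tmp</tt>, the steps needed to install them are recorded in the log, and every logged step is made safe to repeat. The C sketch below shows that pattern on a toy in-memory filesystem. It illustrates the general technique only and does not describe the actual log format or commands of <tt class="literal">libsvn_wc</tt>. All names in it are hypothetical.</p>

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 8

/* Toy in-memory filesystem: a fixed set of slots holding path/contents. */
struct file { char path[48]; char data[32]; bool exists; };
struct fs { struct file f[MAX_FILES]; };

/* One logged action: "move FROM to TO", overwriting TO. */
struct log_entry { const char *from; const char *to; };

static struct file *
find_file(struct fs *fs, const char *path)
{
  for (int i = 0; i < MAX_FILES; i++)
    if (fs->f[i].exists && strcmp(fs->f[i].path, path) == 0)
      return &fs->f[i];
  return NULL;
}

static bool
put_file(struct fs *fs, const char *path, const char *data)
{
  struct file *f = find_file(fs, path);
  for (int i = 0; f == NULL && i < MAX_FILES; i++)
    if (!fs->f[i].exists)
      f = &fs->f[i];
  if (f == NULL)
    return false;
  snprintf(f->path, sizeof f->path, "%s", path);
  snprintf(f->data, sizeof f->data, "%s", data);
  f->exists = true;
  return true;
}

/* Treated as atomic here (a real implementation would rely on an atomic
   rename). Idempotent: if FROM is gone but TO exists, an earlier,
   interrupted run already performed this step. */
static bool
run_move(struct fs *fs, const struct log_entry *e)
{
  struct file *src = find_file(fs, e->from);
  if (src == NULL)
    return find_file(fs, e->to) != NULL;
  char data[sizeof src->data];
  memcpy(data, src->data, sizeof data);
  src->exists = false;
  return put_file(fs, e->to, data);
}

/* Replaying the whole log is always safe, so after a crash the next
   client can simply start again from the first entry. */
static bool
replay_log(struct fs *fs, const struct log_entry *actions, int n)
{
  for (int i = 0; i < n; i++)
    if (!run_move(fs, &actions[i]))
      return false;
  return true;
}
```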
</div> <!-- client.wc.layout (h4) -->
<div class="h4" id="client.wc.library" title="#client.wc.library">
<h4>The working copy management library</h4>
<ul>
<li><p><strong>Requires:</strong></p>
<ul>
<li><p>a working copy</p></li>
</ul>
</li>
<li><p><strong>Provides:</strong></p>
<ul>
<li><p>ability to manipulate the working copy's versioned
data</p></li>
<li><p>ability to manipulate the working copy's
administrative files</p></li>
</ul>
</li>
</ul>
<p>This library performs "offline" operations on the working copy, and
lives in <tt class="filename">subversion/libsvn_wc/</tt>.</p>
<p>The API for <em class="replaceable">libsvn_wc</em> is always
evolving; please read the header file for a detailed description:
<tt class="filename">subversion/include/svn_wc.h</tt>.</p>
</div> <!-- client.wc.library (h4) -->
</div> <!-- client.wc (h3) -->
<div class="h3" id="client.libsvn_ra" title="#client.libsvn_ra">
<h3>The repository access library</h3>
<ul>
<li><p><strong>Requires:</strong></p>
<ul>
<li><p>network access to a Subversion
server</p></li>
</ul>
</li>
<li><p><strong>Provides:</strong></p>
<ul>
<li><p>the ability to interact with a
repository</p></li>
</ul>
</li>
</ul>
<p>This library performs operations involving communication with the
repository.</p>
<p>The interface defined in
<tt class="filename">subversion/include/svn_ra.h</tt> provides a uniform
interface to both local and remote repository access.</p>
<p>Specifically, <em class="replaceable">libsvn_ra_dav</em> will provide
this interface and speak to repositories using DAV requests. At some
future point, another library, <em class="replaceable">libsvn_ra_local</em>,
will provide the same interface but will link directly to the
filesystem library for accessing local disk repositories.</p>
</div> <!-- client.libsvn_ra (h3) -->
<div class="h3" id="client.libsvn_client" title="#client.libsvn_client">
<h3>The client operation library</h3>
<ul>
<li><p><strong>Requires:</strong></p>
<ul>
<li><p>the working copy management library</p></li>
<li><p>a repository access library</p></li>
</ul>
</li>
<li><p><strong>Provides:</strong></p>
<ul>
<li><p>all client-side Subversion commands</p></li>
</ul>
</li>
</ul>
<p>These functions correspond to user-level client commands. In theory,
any client interface (command-line, GUI, Emacs, Python, etc.) should be
able to link to <em class="replaceable">libsvn_client</em> and act as a
full-featured Subversion client.</p>
<p>Again, the detailed API can be found in
<tt class="filename">subversion/include/svn_client.h</tt>.</p>
</div> <!-- client.libsvn_client (h3) -->
</div> <!-- client (h2) -->
<div class="h2" id="protocol" title="#protocol">
<h2>Protocol &mdash; How the client and server communicate</h2>
<p>The wire protocol is the connection between the servers and the
client-side <em>Repository Access (RA) API</em>, provided by
<tt class="literal">libsvn_ra</tt>. Note that <tt class="literal">libsvn_ra</tt> is
in fact only a plugin manager, which delegates the actual task of
communicating with a server to one of a selection of back-end modules (the
<tt class="literal">libsvn_ra_*</tt> libraries). Therefore, there is not just
one Subversion protocol; at present, there are two:</p>
<ul>
<li><p>The HTTP/WebDAV/DeltaV based protocol, implemented by the
<tt class="literal">mod_dav_svn</tt> Apache 2 server module, and by two
independent RA modules, <tt class="literal">libsvn_ra_dav</tt> and
<tt class="literal">libsvn_ra_serf</tt>.</p></li>
<li><p>The custom-designed protocol built directly upon TCP,
implemented by the <tt class="literal">svnserve</tt> server and the
<tt class="literal">libsvn_ra_svn</tt> RA module.</p></li>
</ul>
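<p>Because <tt class="literal">libsvn_ra</tt> is a plugin manager, its core job is to choose a back-end module for a repository URL. The C sketch below shows one way such a dispatch could look. The table and function are hypothetical, not <tt class="literal">libsvn_ra</tt>'s real API. The scheme mapping follows common Subversion usage: <tt class="literal">file://</tt> for local access, and <tt class="literal">svn://</tt> and <tt class="literal">svn+ssh://</tt> for <tt class="literal">svnserve</tt>. For simplicity the sketch always picks <tt class="literal">libsvn_ra_dav</tt> for HTTP, even though <tt class="literal">libsvn_ra_serf</tt> speaks the same protocol.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scheme -> RA module table (illustrative only). */
struct ra_module { const char *scheme; const char *module; };

static const struct ra_module ra_modules[] = {
  { "http",    "libsvn_ra_dav" },   /* WebDAV/DeltaV, served by mod_dav_svn */
  { "https",   "libsvn_ra_dav" },
  { "svn",     "libsvn_ra_svn" },   /* custom protocol, served by svnserve */
  { "svn+ssh", "libsvn_ra_svn" },
  { "file",    "libsvn_ra_local" }, /* direct access to a local repository */
};

/* Return the module for URL's scheme, or NULL if none matches.
   Note: URL schemes are case-insensitive; this sketch compares exactly. */
static const char *
ra_module_for_url(const char *url)
{
  const char *sep = strstr(url, "://");
  if (sep == NULL)
    return NULL;
  size_t len = (size_t)(sep - url);
  for (size_t i = 0; i < sizeof ra_modules / sizeof ra_modules[0]; i++)
    if (strlen(ra_modules[i].scheme) == len
        && strncmp(url, ra_modules[i].scheme, len) == 0)
      return ra_modules[i].module;
  return NULL;
}
```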
2272 <div class="h3" id="protocol.webdav" title="#protocol.webdav">
2273 <h3>The HTTP/WebDAV/DeltaV based protocol</h3>
2276 <p>The Subversion client library <tt class="literal">libsvn_ra_dav</tt> uses
2277 the <em>Neon</em> library to generate WebDAV DeltaV requests
2278 and sends them to a "Subversion-aware" Apache server.</p>
2280 <p>This Apache server is running <tt class="literal">mod_dav</tt> and
2281 <tt class="literal">mod_dav_svn</tt>, which translates the requests into
2282 Subversion filesystem calls.</p>
2284 <p>For more info, see <a href="#archi.network">Network Layer</a>.</p>

<p>For a detailed description of how Greg Stein
(<em class="email">gstein@lyra.org</em>) maps the WebDAV DeltaV specification
onto Subversion, see his paper: <a href="http://svn.collab.net/repos/svn/trunk/www/webdav-usage.html">http://svn.collab.net/repos/svn/trunk/www/webdav-usage.html</a>
</p>

<p>For more information on WebDAV and the DeltaV extensions, see
<a href="http://www.webdav.org">http://www.webdav.org</a> and
<a href="http://www.webdav.org/deltav">http://www.webdav.org/deltav</a>.
</p>

<p>For more information on <em>Neon</em>, see
<a href="http://www.webdav.org/neon">http://www.webdav.org/neon</a>.</p>
</div> <!-- protocol.webdav (h3) -->

<div class="h3" id="protocol.svn" title="#protocol.svn">
<h3>The custom protocol</h3>

<p>The client library <tt class="literal">libsvn_ra_svn</tt> and the standalone
server program <tt class="literal">svnserve</tt> implement a custom protocol
over TCP. This protocol is documented at <a href="http://svn.collab.net/repos/svn/trunk/subversion/libsvn_ra_svn/protocol">http://svn.collab.net/repos/svn/trunk/subversion/libsvn_ra_svn/protocol</a>.</p>
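
<p>As we read the item grammar in that file, every message is built from
four kinds of items, each followed by a space: words, numbers,
length-prefixed strings (for example <tt class="literal">3:abc</tt>), and
parenthesized lists. A minimal encoder under that reading might look like the
sketch below; the buffer type and the <tt class="literal">enc_*</tt> function
names are hypothetical and not part of <tt class="literal">libsvn_ra_svn</tt>,
so check the protocol file itself before relying on the details.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size output buffer. */
typedef struct {
  char data[512];
  size_t len;
} outbuf_t;

static void put_raw(outbuf_t *b, const char *s, size_t n) {
  assert(b->len + n < sizeof b->data);
  memcpy(b->data + b->len, s, n);
  b->len += n;
  b->data[b->len] = '\0';
}

/* word: letters, digits and '-', followed by a space */
void enc_word(outbuf_t *b, const char *w) {
  put_raw(b, w, strlen(w));
  put_raw(b, " ", 1);
}

/* number: decimal digits, followed by a space */
void enc_number(outbuf_t *b, unsigned long n) {
  char tmp[32];
  int k = snprintf(tmp, sizeof tmp, "%lu ", n);
  put_raw(b, tmp, (size_t)k);
}

/* string: byte count, ':', the raw bytes, then a space. The length
   prefix lets the bytes contain spaces or parentheses safely. */
void enc_string(outbuf_t *b, const char *bytes, size_t n) {
  char tmp[32];
  int k = snprintf(tmp, sizeof tmp, "%zu:", n);
  put_raw(b, tmp, (size_t)k);
  put_raw(b, bytes, n);
  put_raw(b, " ", 1);
}

void enc_list_start(outbuf_t *b) { put_raw(b, "( ", 2); }
void enc_list_end(outbuf_t *b)   { put_raw(b, ") ", 2); }
```

<p>Usage: encoding a list holding the number 2, the string
<tt class="literal">abc</tt> and the word <tt class="literal">edit-pipeline</tt>
produces <tt class="literal">( 2 3:abc edit-pipeline ) </tt>. Because strings
carry their length, a reader never has to scan for a terminator inside them.</p>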
</div> <!-- protocol.svn (h3) -->
</div> <!-- protocol (h2) -->

<div class="h2" id="server" title="#server">
<h2>Server &mdash; How the server works</h2>

<p>The term &ldquo;server&rdquo; is ambiguous, because it has at least
two different meanings: it can refer to a powerful computer that offers
services to users on a network, or to a process that listens for and
answers network requests.</p>

<p>In Subversion, however, the <strong class="firstterm">server</strong> is just a
set of libraries that implements <strong class="firstterm">repositories</strong> and
makes them available to other programs. No networking is
required.</p>

<p>There are two main libraries: the <strong class="firstterm">Subversion
Filesystem</strong> library, and the <strong class="firstterm">Subversion
Repository</strong> library.</p>

<div class="h3" id="server.fs" title="#server.fs">
<h3>Filesystem</h3>

<div class="h4" id="server.fs.overview" title="#server.fs.overview">
<h4>Filesystem Overview</h4>

<ul>
<li><p><strong>Requires:</strong></p>
<ul>
<li><p>some writable disk space</p></li>
<li><p>(for now) the Berkeley DB library</p></li>
</ul>
</li>
<li><p><strong>Provides:</strong></p>
<ul>
<li><p>a repository for storing files</p></li>
<li><p>concurrent client transactions</p></li>
<li><p>enforcement of user &amp; group permissions
[someday, not yet]</p></li>
</ul>
</li>
</ul>
<p>This library implements a hierarchical filesystem which supports
atomic changes to directory trees and records a complete history of
the changes. In addition to recording changes to file and directory
contents, the Subversion Filesystem records changes to file metadata
(see the discussion of <strong class="firstterm">properties</strong> in <a href="#model">Model &mdash; The versioning model used by Subversion</a>).</p>
</div> <!-- server.fs.overview (h4) -->

<div class="h4" id="server.fs.api" title="#server.fs.api">
<h4>API</h4>

<p>Two main files describe the Subversion filesystem, but it helps to
start with this document.</p>

<p>First, read the section below (<a href="#server.fs.struct">Repository Structure</a>)
for a general overview of how the filesystem works.</p>

<p>Once you've done this, read the first of the two files, Jim Blandy's
own structural overview, which explains how nodes and revisions are
organized (among other things) in the filesystem implementation:
<tt class="filename">subversion/libsvn_fs/structure</tt>.</p>

<p>Finally, read the second, the well-documented API in
<tt class="filename">subversion/include/svn_fs.h</tt>.</p>
</div> <!-- server.fs.api (h4) -->

<div class="h4" id="server.fs.struct" title="#server.fs.struct">
<h4>Repository Structure</h4>

<div class="h5" id="server.fs.struct.schema">
<h5>Schema</h5>

<p>To begin, please be sure that you're already casually familiar with
Subversion's ideas of files, directories, and revision histories. If
not, see <a href="#model">Model &mdash; The versioning model used by Subversion</a>. We can now offer precise,
technical descriptions of the terms introduced there.</p>

<!-- This is taken from jimb's very first Subversion spec! -->

<pre>
A <strong class="firstterm">text string</strong> is a string of Unicode characters which is
canonically decomposed and ordered, according to the rules described in the
Unicode standard.

A <strong class="firstterm">string of bytes</strong> is what you'd expect.

A <strong class="firstterm">property list</strong> is an unordered list of properties. A
<strong class="firstterm">property</strong> is a pair
<tt class="literal">(<em class="replaceable">name</em>,
<em class="replaceable">value</em>)</tt>, where
<em class="replaceable">name</em> is a text string, and
<em class="replaceable">value</em> is a string of bytes. No two properties in a
property list have the same name.

A <strong class="firstterm">file</strong> is a property list and a string of bytes.

A <strong class="firstterm">node</strong> is either a file or a directory. (We define a
directory below.) Nodes are distinguished unions: you can always tell
whether a node is a file or a directory.

A <strong class="firstterm">node table</strong> is an array mapping some set of positive
integers, called <strong class="firstterm">node numbers</strong>, onto
<strong class="firstterm">nodes</strong>. If a node table maps some number
<em class="replaceable">i</em> to some node <em class="replaceable">n</em>, then
<em class="replaceable">i</em> is a <strong class="firstterm">valid node number</strong> in
that table, and <strong class="firstterm">node</strong> <em class="replaceable">i</em> is
<em class="replaceable">n</em>. Otherwise, <em class="replaceable">i</em> is an
<strong class="firstterm">invalid node number</strong> in that table.

A <strong class="firstterm">directory entry</strong> is a triple
<tt class="literal">(<em class="replaceable">name</em>, <em class="replaceable">props</em>,
<em class="replaceable">node</em>)</tt>, where
<em class="replaceable">name</em> is a text string,
<em class="replaceable">props</em> is a property list, and
<em class="replaceable">node</em> is a node number.

A <strong class="firstterm">directory</strong> is an unordered list of directory entries,
and a property list.

A <strong class="firstterm">revision</strong> is a node number and a property list.

A <strong class="firstterm">history</strong> is an array of revisions, indexed by a
contiguous range of non-negative integers containing 0.

A <strong class="firstterm">repository</strong> consists of a node table and a history.

</pre>
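
<p>These definitions translate fairly directly into C types. The sketch below
is one possible rendering, not Subversion's actual schema: every type and
function name is hypothetical, text strings are assumed to be
already-normalized UTF-8, and node numbers are assumed to run contiguously
from 1 so that the node table can be a plain array.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
  const char *name;              /* text string (assumed normalized UTF-8) */
  const unsigned char *value;    /* string of bytes */
  size_t value_len;
} property_t;

typedef struct {                 /* unordered; no two names are equal */
  const property_t *props;
  size_t count;
} prop_list_t;

typedef enum { NODE_FILE, NODE_DIR } node_kind_t;

typedef struct {
  const char *name;
  prop_list_t props;
  int node;                      /* node number in the node table */
} dir_entry_t;

typedef struct {
  node_kind_t kind;              /* the "distinguished union" tag */
  prop_list_t props;
  union {
    struct { const unsigned char *bytes; size_t len; } file;
    struct { const dir_entry_t *entries; size_t count; } dir;
  } u;
} node_t;

typedef struct {
  int root;                      /* node number of the root directory */
  prop_list_t props;
} revision_t;

typedef struct {
  const node_t *table;           /* table[0] unused; valid numbers 1..n_nodes */
  int n_nodes;
  const revision_t *history;     /* history[0..youngest] */
  int youngest;
} repository_t;

int is_valid_node(const repository_t *r, int n) {
  assert(r != NULL);
  return n >= 1 && n <= r->n_nodes;
}

/* Look up a property by name; returns NULL if it is absent. */
const property_t *prop_get(const prop_list_t *pl, const char *name) {
  assert(pl != NULL && name != NULL);
  for (size_t i = 0; i < pl->count; i++)
    if (strcmp(pl->props[i].name, name) == 0)
      return &pl->props[i];
  return NULL;
}
```

<p>The tagged union mirrors the requirement that you can always tell a file
from a directory, and keeping node numbers (rather than pointers) in
directory entries is what allows several directories to share one node.</p>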

<!-- Some definitions: we say that a node @var{n} is a @dfn{direct
child} of a directory @var{d} iff @var{d} contains a directory entry
whose node number is @var{n}. A node @var{n} is a @dfn{child} of a
directory @var{d} iff @var{n} is a direct child of @var{d}, or if there
exists some directory @var{e} which is a direct child of @var{d}, and
@var{n} is a child of @var{e}. Given this definition of ``direct
child'' and ``child,'' the obvious definitions of ``direct parent'' and
``parent'' hold.

In these restrictions, let @var{r} be any repository. When we refer,
implicitly or explicitly, to a node table without further
clarification, we mean @var{r}'s node table. Thus, if we refer to ``a
valid node number'' without specifying the node table in which it is
valid, we mean ``a valid node number in @var{r}'s node table''.
Similarly for @var{r}'s history. -->

<p>Now that we've explained the form of the data, we make some
restrictions on that form.</p>

<p><strong>Every revision has a root
directory.</strong> Every revision's node number is a valid node
number, and the node it refers to is always a directory. We call
this the revision's <strong class="firstterm">root directory</strong>.</p>

<p><strong>Revision 0 always contains an empty root
directory.</strong> This baseline makes it easy to check out
whole projects from the repository.</p>

<p><strong>Directories contain only valid
links.</strong> Every directory entry's
<em class="replaceable">node</em> is a valid node number.</p>

<p><strong>Directory entries can be identified by
name.</strong> For any directory <em class="replaceable">d</em>,
every directory entry in <em class="replaceable">d</em> has a distinct
name.</p>

<p><strong>There are no cycles of
directories.</strong> No node is its own child.</p>

<p><strong>Directories can have more than one
parent.</strong> The Unix file system does not allow more than
one hard link to a directory, but Subversion does allow the analogous
situation. Thus, the directories in a Subversion repository form a
directed acyclic graph (<strong class="firstterm">DAG</strong>), not a tree.
However, it would be distracting and unhelpful to replace the
familiar term &ldquo;directory tree&rdquo; with the unfamiliar term
&ldquo;directory DAG&rdquo;, so we still call it a &ldquo;directory
tree&rdquo; here.</p>

<p><strong>There are no dead nodes.</strong> Every
node is a child of some revision's root directory.</p>

<!-- </jimb> -->
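
<p>These restrictions can be checked mechanically. The sketch below is a
hypothetical checker, not part of Subversion: it assumes a fixed-size,
array-based repository with node numbers 1..<em>n</em> and leaves out
properties. One depth-first walk from every revision root detects bad links,
duplicate names and cycles; any node the walks never reach is dead.</p>

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES   64
#define MAX_ENTRIES 8
#define MAX_REVS    16

typedef enum { FILE_NODE, DIR_NODE } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct { kind_t kind; int n_entries; entry_t entries[MAX_ENTRIES]; } node_t;

typedef struct {
  int n_nodes;                    /* valid node numbers are 1..n_nodes */
  node_t nodes[MAX_NODES + 1];
  int n_revs;                     /* revisions 0..n_revs-1 */
  int roots[MAX_REVS];            /* each revision's root node number */
} repo_t;

enum { REPO_OK = 0, BAD_ROOT, BAD_LINK, DUP_NAME, CYCLE, DEAD_NODE };

/* state[n]: 0 = not yet seen, 1 = on the current path, 2 = fully checked */
static int walk(const repo_t *r, int n, unsigned char *state) {
  if (state[n] == 1) return CYCLE;      /* n would be its own child */
  if (state[n] == 2) return REPO_OK;    /* shared subtree, already checked */
  state[n] = 1;
  const node_t *d = &r->nodes[n];
  for (int i = 0; d->kind == DIR_NODE && i < d->n_entries; i++) {
    int child = d->entries[i].node;
    if (child < 1 || child > r->n_nodes) return BAD_LINK;
    for (int j = 0; j < i; j++)
      if (strcmp(d->entries[i].name, d->entries[j].name) == 0) return DUP_NAME;
    int rc = walk(r, child, state);
    if (rc != REPO_OK) return rc;
  }
  state[n] = 2;
  return REPO_OK;
}

int check_repo(const repo_t *r) {
  unsigned char state[MAX_NODES + 1] = {0};
  assert(r->n_nodes <= MAX_NODES && r->n_revs <= MAX_REVS);
  if (r->n_revs < 1) return BAD_ROOT;   /* revision 0 must exist */
  for (int v = 0; v < r->n_revs; v++) {
    int root = r->roots[v];
    if (root < 1 || root > r->n_nodes || r->nodes[root].kind != DIR_NODE)
      return BAD_ROOT;
    if (v == 0 && r->nodes[root].n_entries != 0)
      return BAD_ROOT;                  /* revision 0's root must be empty */
    int rc = walk(r, root, state);
    if (rc != REPO_OK) return rc;
  }
  for (int n = 1; n <= r->n_nodes; n++)
    if (state[n] != 2) return DEAD_NODE;  /* unreachable from every root */
  return REPO_OK;
}
```

<p>Because the state array is shared across all revision roots, a subtree
reachable from several parents (the DAG case above) is checked only once.</p>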
</div> <!-- server.fs.struct.schema (h5) -->

<div class="h5" id="server.fs.struct.bubble-up">
<h5>Bubble-Up Method</h5>

<p>This section provides a conversational explanation of how the
repository actually stores and versions file trees. It's not
critical knowledge for a programmer using the Subversion Filesystem
API, but most people probably still want to know what's going on
&ldquo;under the hood&rdquo; of the repository.</p>

<p>Suppose we have a new project, at revision 1, looking like this
(shown with CVS-style checkout output):</p>

<pre>
prompt$ svn checkout myproj
U myproj/
U myproj/B
U myproj/A
U myproj/A/fish
U myproj/A/fish/tuna
prompt$
</pre>

<p>Only <tt class="filename">tuna</tt> is a regular file; everything else
in myproj is a directory.</p>

<p>Let's see what this looks like as an abstract data structure in
the repository, and how that structure works in various operations
(such as update, commit, and branch).</p>

<p>In the diagrams that follow, lines represent parent-to-child
connections in a directory hierarchy. Boxes are &ldquo;nodes&rdquo;. A node is
either a file or a directory; a letter in the upper left corner
(<tt class="literal">F</tt> or <tt class="literal">D</tt>) indicates which kind.
A file node has a byte-string for its content, whereas directory nodes
have a list of dir_entries, each pointing to another node.</p>

<p>Parent-child links go both ways (i.e., a child knows who all its
parents are), but a node's name is stored only in its parent, because
a node with multiple parents may have different names in different
parents.</p>

<p>At the top of the repository is an array of revision numbers,
stretching off to infinity. Since the project is at revision 1, only
index 1 points to anything (apart from revision 0's empty root, which the
diagrams omit); it points to the root node of revision 1 of the
project:</p>

<pre>
        ( myproj's revision array )
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |      /* Two dir_entries, `A' and `B'. */
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |   /* One dir_entry, `fish'. */
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |   /* One dir_entry, `tuna'. */
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |   /* (Contents of tuna not shown.) */
                   |________|
</pre>

<p>What happens when we modify <tt class="filename">tuna</tt> and commit?
First, we make a new <tt class="filename">tuna</tt> node, containing the
latest text. The new node is not connected to anything yet; it's
just hanging out there in space:</p>

<pre>
 ________
|F       |
|        |
|        |
|________|
</pre>

<p>Next, we create a <em>new</em> revision of its parent
directory:</p>

<pre>
 ________
|D       |
|        |
| tuna   |
|___\____|
     \
      \
    ___\____
   |F       |
   |        |
   |        |
   |________|
</pre>

<p>We continue up the line, creating a new revision of the next
parent directory:</p>

<pre>
 ________
|D       |
|        |
| fish   |
|___\____|
     \
      \
    ___\____
   |D       |
   |        |
   | tuna   |
   |___\____|
        \
         \
       ___\____
      |F       |
      |        |
      |        |
      |________|
</pre>

<p>Now it gets trickier: we need to create a new revision of the
root directory. This new root directory needs an entry to point to
the &ldquo;new&rdquo; directory A, but directory B hasn't changed at
all. Therefore, our new root directory also has an entry that still
points to the <em>old</em> directory B node!</p>

<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____           _________
      |D        |         |D        |
      |         |         |         |
      |   A     |         |   A     |
      |    \    |         |    \    |
      |   B \   |         |   B \   |
      |__/___\__|         |__/___\__|
        /     \             /     \
       |       \           /       \
       |  ______\_________/         \
    ___|_/_   ___\____            ___\____
   |D      | |D       |          |D       |
   |       | |        |          |        |
   |       | | fish   |          | fish   |
   |_______| |___\____|          |___\____|
                  \                   \
                   \                   \
                 ___\____            ___\____
                |D       |          |D       |
                |        |          |        |
                | tuna   |          | tuna   |
                |___\____|          |___\____|
                     \                   \
                      \                   \
                    ___\____            ___\____
                   |F       |          |F       |
                   |        |          |        |
                   |        |          |        |
                   |________|          |________|
</pre>

<p>Finally, after all our new nodes are written, we finish the
&ldquo;bubble up&rdquo; process by linking this new tree to the next
available revision in the history array. In this case, the new tree
becomes revision 2 in the repository.</p>

<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \__________
          |                   \
       ___|_____           ____\____
      |D        |         |D        |
      |         |         |         |
      |   A     |         |   A     |
      |    \    |         |    \    |
      |   B \   |         |   B \   |
      |__/___\__|         |__/___\__|
        /     \             /     \
       |       \           /       \
       |  ______\_________/         \
    ___|_/_   ___\____            ___\____
   |D      | |D       |          |D       |
   |       | |        |          |        |
   |       | | fish   |          | fish   |
   |_______| |___\____|          |___\____|
                  \                   \
                   \                   \
                 ___\____            ___\____
                |D       |          |D       |
                |        |          |        |
                | tuna   |          | tuna   |
                |___\____|          |___\____|
                     \                   \
                      \                   \
                    ___\____            ___\____
                   |F       |          |F       |
                   |        |          |        |
                   |        |          |        |
                   |________|          |________|
</pre>
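
<p>The whole walk-through above fits in a short recursive routine. The sketch
below is a hypothetical, in-memory illustration of the bubble-up idea, not
<tt class="literal">libsvn_fs</tt> code: nodes live in a fixed array and are
never modified once written, <tt class="literal">revs[i]</tt> holds the root
node number of revision <em>i</em>, and the names
<tt class="literal">bubble</tt> and <tt class="literal">commit_file</tt> are
invented for this example.</p>

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES   128
#define MAX_ENTRIES 8
#define MAX_REVS    16

typedef enum { NODE_FILE, NODE_DIR } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct {
  kind_t kind;
  const char *text;               /* file contents (files only) */
  int n_entries;
  entry_t entries[MAX_ENTRIES];   /* dir_entries (directories only) */
} node_t;

typedef struct {
  node_t nodes[MAX_NODES];        /* node table; a node never changes once written */
  int n_nodes;
  int revs[MAX_REVS];             /* history: revs[i] = root node of revision i */
  int youngest;
} repo_t;

static int new_node(repo_t *r, node_t n) {
  assert(r->n_nodes < MAX_NODES);
  r->nodes[r->n_nodes] = n;
  return r->n_nodes++;
}

static int find_entry(const node_t *dir, const char *name) {
  for (int i = 0; i < dir->n_entries; i++)
    if (strcmp(dir->entries[i].name, name) == 0) return i;
  return -1;
}

/* Return a new version of `dir` in which the file at path[0..depth-1]
   has contents `text`. The new file node is written first, then a new
   copy of each directory on the way back up; every other entry in those
   copies still points at the old, shared nodes. */
static int bubble(repo_t *r, int dir, const char **path, int depth,
                  const char *text) {
  if (depth == 0)
    return new_node(r, (node_t){ .kind = NODE_FILE, .text = text });
  node_t copy = r->nodes[dir];
  int i = find_entry(&copy, path[0]);
  assert(i >= 0);
  copy.entries[i].node = bubble(r, copy.entries[i].node, path + 1, depth - 1, text);
  return new_node(r, copy);
}

/* Commit: build the new tree, then make it visible with one final link. */
int commit_file(repo_t *r, const char **path, int depth, const char *text) {
  assert(r->youngest + 1 < MAX_REVS);
  int root = bubble(r, r->revs[r->youngest], path, depth, text);
  r->revs[++r->youngest] = root;
  return r->youngest;
}
```

<p>Usage: with revision 1 shaped like the first diagram, committing new text
to <tt class="literal">A/fish/tuna</tt> writes exactly four nodes (the file,
two directories and the root), and the new root's B entry still holds the old
B node number. Until the last line of <tt class="literal">commit_file</tt>
runs, nothing in the history refers to the new nodes.</p>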

<p>Generalizing on this example, you can now see that each
&ldquo;revision&rdquo; in the repository history represents a root
node of a unique tree (and an atomic commit to the whole filesystem).
There are many trees in the repository, and many of them share
nodes.</p>

<p>Many nice behaviors come from this model:</p>

<ol>
<li><p><strong>Easy reads.</strong> If a
filesystem reader wants to locate revision
<em class="replaceable">X</em> of file <tt class="filename">foo.c</tt>,
it need only traverse the repository's history, locate revision
<em class="replaceable">X</em>'s root node, then walk down the tree
to <tt class="filename">foo.c</tt>.</p></li>

<li><p><strong>Writers don't interfere with
readers.</strong> Writers can continue to create new nodes,
bubbling their way up to the top, and concurrent readers cannot
see the work in progress. The new tree only becomes visible to
readers after the writer makes its final &ldquo;link&rdquo; to
the repository's history.</p></li>

<li><p><strong>File structure is
versioned.</strong> Unlike CVS, Subversion saves the very structure
of each tree from revision to revision. File and
directory renames, additions, and deletions are part of the
repository's history.</p></li>
</ol>
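
<p>The first two points can be seen in a hypothetical read routine: it starts
at <tt class="literal">revs[rev]</tt> and follows one dir_entry per path
component, so it only ever touches nodes reachable from an already-linked
root. The array layout and the name <tt class="literal">lookup</tt> are
assumptions made for this sketch, not part of the filesystem API.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

typedef enum { NODE_FILE, NODE_DIR } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct {
  kind_t kind;
  const char *text;
  int n_entries;
  entry_t entries[MAX_ENTRIES];
} node_t;

/* Find `path` (components separated by '/') in revision `rev`: start at
   that revision's root and follow one dir_entry per component.
   Returns a node number, or -1 if the path does not exist. */
int lookup(const node_t *nodes, const int *revs, int rev, const char *path) {
  assert(rev >= 0);
  int cur = revs[rev];
  while (*path != '\0') {
    const char *slash = strchr(path, '/');
    size_t len = slash ? (size_t)(slash - path) : strlen(path);
    const node_t *dir = &nodes[cur];
    int next = -1;
    if (dir->kind == NODE_DIR)
      for (int i = 0; i < dir->n_entries; i++)
        if (strlen(dir->entries[i].name) == len &&
            strncmp(dir->entries[i].name, path, len) == 0)
          next = dir->entries[i].node;
    if (next < 0)
      return -1;                  /* missing entry, or a file in the middle */
    cur = next;
    path += slash ? len + 1 : len;
  }
  return cur;
}
```

<p>Usage: with node numbers laid out like the revision 1 and revision 2
diagrams, <tt class="literal">lookup(nodes, revs, 1, "A/fish/tuna")</tt> and
<tt class="literal">lookup(nodes, revs, 2, "A/fish/tuna")</tt> return two
different file nodes, while both revisions resolve
<tt class="literal">"B"</tt> to the same node. A writer appending new nodes
to the array cannot affect either answer until it updates
<tt class="literal">revs</tt>.</p>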

<p>Let's demonstrate the last point by renaming
<tt class="filename">tuna</tt> to <tt class="filename">book</tt>.</p>

<p>We start by creating a new revision of the parent directory
&ldquo;fish&rdquo;, except that this new directory has a different
dir_entry: one which points to the <em>same</em> old file node, but
under a different name:</p>

<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \__________
          |                   \
       ___|_____           ____\____
      |D        |         |D        |
      |         |         |         |
      |   A     |         |   A     |
      |    \    |         |    \    |
      |   B \   |         |   B \   |
      |__/___\__|         |__/___\__|
        /     \             /     \
       |       \           /       \
       |  ______\_________/         \
    ___|_/_   ___\____            ___\____
   |D      | |D       |          |D       |
   |       | |        |          |        |
   |       | | fish   |          | fish   |
   |_______| |___\____|          |___\____|
                  \                   \
                   \                   \
                 ___\____            ___\____      ________
                |D       |          |D       |    |D       |
                |        |          |        |    |        |
                | tuna   |          | tuna   |    | book   |
                |___\____|          |___\____|    |_/______|
                     \                   \         /
                      \                   \       /
                    ___\____            ___\____ /
                   |F       |          |F       |
                   |        |          |        |
                   |        |          |        |
                   |________|          |________|
</pre>

<p>From here, we finish the bubble-up process. We make new
parent directories up to the top, culminating in a new root directory
with two dir_entries (one points to the old &ldquo;B&rdquo; directory
node we've had all along, the other to the new revision of
&ldquo;A&rdquo;), and finally link the new tree to the history as
revision 3:</p>

<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \        \_______________
          |         \_________              \
       ___|_____           ___\_____     ____\____
      |D        |         |D        |   |D        |
      |         |         |         |   |         |
      |   A     |         |   A     |   |   A     |
      |    \    |         |    \    |   |    \    |
      |   B \   |         |   B \   |   |   B \   |
      |__/___\__|         |__/___\__|   |__/___\__|
        /     \             /     \       /     \
       |  _____\___________/_______\_____/       \
       | / _____\_________/         \             \
    ___|/_/   ___\____            ___\____      ___\____
   |D      | |D       |          |D       |    |D       |
   |       | |        |          |        |    |        |
   |       | | fish   |          | fish   |    | fish   |
   |_______| |___\____|          |___\____|    |___\____|
                  \                   \             \
                   \                   \             \
                 ___\____            ___\____      ___\____
                |D       |          |D       |    |D       |
                |        |          |        |    |        |
                | tuna   |          | tuna   |    | book   |
                |___\____|          |___\____|    |_/______|
                     \                   \         /
                      \                   \       /
                    ___\____            ___\____ /
                   |F       |          |F       |
                   |        |          |        |
                   |        |          |        |
                   |________|          |________|
</pre>
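
<p>A rename is therefore just another bubble-up whose bottom step edits a
dir_entry name instead of writing a new file node. The sketch below reuses
the hypothetical array layout from the earlier sketches; the names
<tt class="literal">rename_in</tt> and <tt class="literal">commit_rename</tt>
are invented for illustration.</p>

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES   64
#define MAX_ENTRIES 8
#define MAX_REVS    16

typedef enum { NODE_FILE, NODE_DIR } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct { kind_t kind; int n_entries; entry_t entries[MAX_ENTRIES]; } node_t;
typedef struct {
  node_t nodes[MAX_NODES];
  int n_nodes;
  int revs[MAX_REVS];             /* revs[i] = root node of revision i */
  int youngest;
} repo_t;

static int add_node(repo_t *r, node_t n) {
  assert(r->n_nodes < MAX_NODES);
  r->nodes[r->n_nodes] = n;
  return r->n_nodes++;
}

static int entry_index(const node_t *d, const char *name) {
  for (int i = 0; i < d->n_entries; i++)
    if (strcmp(d->entries[i].name, name) == 0) return i;
  return -1;
}

/* Copy each directory along `path` (depth components leading to the
   directory that holds the entry) and rename entry `from` to `to` in the
   last copy. The renamed entry keeps its node number, so the file node
   itself is shared, not copied. */
static int rename_in(repo_t *r, int dir, const char **path, int depth,
                     const char *from, const char *to) {
  node_t copy = r->nodes[dir];
  if (depth == 0) {
    int i = entry_index(&copy, from);
    assert(i >= 0 && entry_index(&copy, to) < 0);
    copy.entries[i].name = to;
  } else {
    int i = entry_index(&copy, path[0]);
    assert(i >= 0);
    copy.entries[i].node =
        rename_in(r, copy.entries[i].node, path + 1, depth - 1, from, to);
  }
  return add_node(r, copy);
}

int commit_rename(repo_t *r, const char **path, int depth,
                  const char *from, const char *to) {
  assert(r->youngest + 1 < MAX_REVS);
  int root = rename_in(r, r->revs[r->youngest], path, depth, from, to);
  r->revs[++r->youngest] = root;  /* link the new tree into the history */
  return r->youngest;
}
```

<p>Usage: renaming <tt class="literal">tuna</tt> to
<tt class="literal">book</tt> inside <tt class="literal">A/fish</tt> writes
three directory nodes and no file node; the old revision still lists the
entry as <tt class="literal">tuna</tt>, which is how the rename becomes part
of the history.</p>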

<p>For our last example, we'll demonstrate the way
&ldquo;tags&rdquo; and &ldquo;branches&rdquo; are implemented in the
repository.</p>

<p>In a nutshell, they're one and the same thing. Because nodes are
so easily shared, we simply create a <em>new</em>
directory entry that points to an existing directory node. It's an
extremely cheap way of copying a tree; we call this new entry a
<strong class="firstterm">clone</strong>, or more colloquially, a &ldquo;cheap
copy&rdquo;.</p>

<p>Let's go back to our original tree, assuming that we're at
revision 6 to begin with:</p>

<pre>
       ______________________________________________________
      |___6_______7________8________9________10_________11_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
</pre>

<p>Let's &ldquo;tag&rdquo; directory A. To make the clone, we create a
new revision of our root directory with an extra dir_entry
<strong>T</strong> that points to A's node:</p>

<pre>
       ______________________________________________________
      |___6_______7________8________9________10_________11_____...
          |        \___________
          |                    \
       ___|_____           _____\_____
      |D        |         |D          |
      |         |         |           |
      |   A     |         |   A   T   |
      |    \    |         |   |   |   |
      |   B \   |         | B |   |   |
      |__/___\__|         |_|_|___|___|
        /  ___\_____________/ |   |
       |  /    \     _________/   |
       | /      \   / ____________/
    ___|/__   ___\_/_/
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
</pre>

<p>Now we're all set. In the future, the contents of directories A
and B may change quite a lot. However, assuming we never make any
changes to directory T, it will <em>always</em> point to the
pristine revision of directory A that existed when T was created.
Thus, T is a tag.</p>
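
<p>In code, the clone is a single new root directory whose extra entry reuses
an existing node number, so its cost does not depend on how big directory A
is. This is a hypothetical sketch over the same kind of in-memory node array
used above; <tt class="literal">make_clone</tt> is an invented name, and the
sketch only clones entries of the root directory.</p>

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES   64
#define MAX_ENTRIES 8
#define MAX_REVS    16

typedef enum { NODE_FILE, NODE_DIR } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct { kind_t kind; int n_entries; entry_t entries[MAX_ENTRIES]; } node_t;
typedef struct {
  node_t nodes[MAX_NODES];
  int n_nodes;
  int revs[MAX_REVS];             /* revs[i] = root node of revision i */
  int youngest;
} repo_t;

static int add_node(repo_t *r, node_t n) {
  assert(r->n_nodes < MAX_NODES);
  r->nodes[r->n_nodes] = n;
  return r->n_nodes++;
}

static int entry_index(const node_t *d, const char *name) {
  for (int i = 0; i < d->n_entries; i++)
    if (strcmp(d->entries[i].name, name) == 0) return i;
  return -1;
}

/* Cheap copy: write a new revision of the root directory with one extra
   dir_entry `tag` pointing at the node that entry `src` points to.
   Exactly one node is written, however large the copied tree is. */
int make_clone(repo_t *r, const char *src, const char *tag) {
  assert(r->youngest + 1 < MAX_REVS);
  node_t root = r->nodes[r->revs[r->youngest]];
  int i = entry_index(&root, src);
  assert(i >= 0 && entry_index(&root, tag) < 0 && root.n_entries < MAX_ENTRIES);
  int target = root.entries[i].node;
  root.entries[root.n_entries] = (entry_t){ tag, target };
  root.n_entries++;
  r->revs[++r->youngest] = add_node(r, root);
  return r->youngest;
}
```

<p>Usage: after <tt class="literal">make_clone(&amp;repo, "A", "T")</tt> the
new root's A and T entries hold the same node number. A later commit under
<tt class="literal">T/</tt> would bubble up a new T node and leave A's node
untouched, which is exactly the point where the tag turns into a branch.</p>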

<p>(In theory, we can use some kind of authorization system to
prevent anyone from writing to directory T. In practice, a well-laid-out
repository should encourage &ldquo;tag directories&rdquo; to live
in one place, so that it's clear to all users that they're not meant
to change.)</p>

<p>However, if we <em>do</em> decide to allow commits in
directory T, and now our repository tree increments to revision 8,
then T becomes a branch. Specifically, it's a branch of directory A
which shares history with A up to a certain point, and then
&ldquo;breaks off&rdquo; from the main line at revision 8.</p>
</div> <!-- server.fs.struct.bubble-up (h5) -->

<div class="h5" id="server.fs.struct.diffy-storage">
<h5>Diffy Storage</h5>

<p>You may have been thinking, &ldquo;Gee, this bubble up method
seems nice, but it sure wastes a lot of space. Every commit to the
repository creates an entire line of new directory
nodes!&rdquo;</p>

<p>Like many other revision control systems, Subversion stores
changes as differences. It doesn't make complete copies of nodes;
instead, it stores the <em>latest</em> revision as a full
text, and previous revisions as a succession of reverse diffs. (The
word &ldquo;diff&rdquo; is used loosely here: for files, it means vdeltas;
for directories, it means a format that expresses changes to
directories.)</p>
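
<p>vdelta itself is not reproduced here. The hypothetical prefix/suffix delta
below is simply the smallest format that shows the storage scheme: the
youngest text is kept whole, every commit turns the previous full text into a
reverse delta, and older revisions are rebuilt by applying those deltas one
after another. All type and function names are invented for this sketch.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REVS 16

/* Hypothetical delta: keep `prefix` leading and `suffix` trailing bytes
   of the source text and put `mid` in between. */
typedef struct {
  size_t prefix, suffix;
  char *mid;
} delta_t;

static char *dup_str(const char *s) {
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  assert(p != NULL);
  memcpy(p, s, n);
  return p;
}

/* Compute a delta that turns `from` into `to`. */
static delta_t make_delta(const char *from, const char *to) {
  size_t fl = strlen(from), tl = strlen(to), p = 0, s = 0;
  while (p < fl && p < tl && from[p] == to[p]) p++;
  while (s < fl - p && s < tl - p && from[fl - 1 - s] == to[tl - 1 - s]) s++;
  size_t ml = tl - p - s;
  delta_t d = { p, s, malloc(ml + 1) };
  assert(d.mid != NULL);
  memcpy(d.mid, to + p, ml);
  d.mid[ml] = '\0';
  return d;
}

/* Apply a delta to `from`; returns a newly allocated string. */
static char *apply_delta(const char *from, const delta_t *d) {
  size_t fl = strlen(from), ml = strlen(d->mid);
  char *out = malloc(d->prefix + ml + d->suffix + 1);
  assert(out != NULL);
  memcpy(out, from, d->prefix);
  memcpy(out + d->prefix, d->mid, ml);
  memcpy(out + d->prefix + ml, from + fl - d->suffix, d->suffix);
  out[d->prefix + ml + d->suffix] = '\0';
  return out;
}

typedef struct {
  char *fulltext;            /* youngest revision, stored in full */
  delta_t back[MAX_REVS];    /* back[i] turns revision i+1 into revision i */
  int youngest;
} file_history_t;

void fh_init(file_history_t *h, const char *text) {
  h->fulltext = dup_str(text);
  h->youngest = 0;
}

/* A commit replaces the old full text with a reverse delta to it. */
void fh_commit(file_history_t *h, const char *text) {
  assert(h->youngest + 1 < MAX_REVS);
  h->back[h->youngest] = make_delta(text, h->fulltext);
  free(h->fulltext);
  h->fulltext = dup_str(text);
  h->youngest++;
}

/* Rebuild revision `rev` by applying reverse deltas from the youngest. */
char *fh_read(const file_history_t *h, int rev) {
  assert(rev >= 0 && rev <= h->youngest);
  char *cur = dup_str(h->fulltext);
  for (int i = h->youngest - 1; i >= rev; i--) {
    char *older = apply_delta(cur, &h->back[i]);
    free(cur);
    cur = older;
  }
  return cur;
}
```

<p>Storing reverse rather than forward deltas means the most frequently
requested text, the youngest one, needs no reconstruction at all; only reads
of old revisions pay for applying deltas.</p>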
</div> <!-- server.fs.struct.diffy-storage (h5) -->
</div> <!-- server.fs.struct (h4) -->

<div class="h4" id="server.fs.implementation" title="#server.fs.implementation">
<h4>Implementation</h4>

<p>For the initial release of Subversion:</p>

<ul>
<li><p>The filesystem will be implemented as a library on
Unix.</p></li>

<li><p>The filesystem's data will probably be stored in a
collection of .db files, using the Berkeley Database library
(for more information, see
<a href="http://www.sleepycat.com">http://www.sleepycat.com</a>).
In the future, of course, contributors are free to
modify the Subversion filesystem to operate with a more powerful
SQL database.</p></li>
</ul>
</div> <!-- server.fs.implementation (h4) -->
</div> <!-- server.fs (h3) -->

<div class="h3" id="server.libsvn_repos" title="#server.libsvn_repos">
<h3>Repository Library</h3>

<!-- Jimb, Karl: Maybe we should turn this into a discussion about how the
filesystem will use non-historical properties for internal ACLs, and how
people can add "external" ACL systems via historical properties...? -->

<p>A Subversion <strong class="firstterm">repository</strong> is a directory that
contains a number of components:</p>

<ul>
<li><p>a versioned filesystem (typically a collection of .db
files)</p></li>
<li><p>some hook scripts (for executing before or after
commits)</p></li>
<li><p>a locking area (used by Berkeley DB or other
processes)</p></li>
<li><p>a configuration area (for changing global
behaviors)</p></li>
</ul>

<p>The Subversion filesystem is just that: a filesystem. But it's also
useful to provide an API that acts at the level of the repository. The
repository library (<tt class="filename">libsvn_repos</tt>) does this.</p>

<p>In particular, it wraps a few <tt class="filename">libsvn_fs</tt>
routines, such as those for beginning and ending commits, so that
hook scripts can run. A pre-commit hook script might check for a valid
log message, and a post-commit hook script might send an email to a
mailing list.</p>
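
<p>The wrapping can be sketched as a commit routine that consults a
pre-commit check before touching the filesystem and a post-commit action
afterwards. Real hooks are external scripts in the repository's hook area;
modeling them as C function pointers is purely illustrative, and
<tt class="literal">txn_t</tt>, <tt class="literal">repos_commit</tt> and
<tt class="literal">fs_commit</tt> are hypothetical names, not the
<tt class="literal">libsvn_repos</tt> API.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transaction record. */
typedef struct {
  const char *log_msg;
  int committed_rev;                 /* -1 until the commit succeeds */
} txn_t;

typedef int  (*pre_commit_fn)(const txn_t *txn);   /* nonzero = reject */
typedef void (*post_commit_fn)(int new_rev, const txn_t *txn);

/* Stand-in for the filesystem-level commit (the libsvn_fs layer). */
static int fs_commit(txn_t *txn, int *youngest) {
  txn->committed_rev = ++*youngest;
  return txn->committed_rev;
}

/* Repository-level commit: the pre-commit hook may veto the commit
   before the filesystem is touched; the post-commit hook runs only
   once the new revision exists. Returns the new revision or -1. */
int repos_commit(txn_t *txn, int *youngest,
                 pre_commit_fn pre, post_commit_fn post) {
  assert(txn != NULL && youngest != NULL);
  if (pre != NULL && pre(txn) != 0)
    return -1;
  int rev = fs_commit(txn, youngest);
  if (post != NULL)
    post(rev, txn);
  return rev;
}

/* Example pre-commit hook: insist on a non-empty log message. */
int require_log_message(const txn_t *txn) {
  return txn->log_msg == NULL || txn->log_msg[0] == '\0';
}

/* Example post-commit hook: record the revision that a real hook
   might announce by e-mail. */
static int last_announced = -1;
void announce(int new_rev, const txn_t *txn) {
  (void)txn;
  last_announced = new_rev;
}
```

<p>Usage: a transaction with an empty log message is rejected and the
youngest revision does not move; one with a message becomes the next
revision and triggers the post-commit action.</p>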

<p>Additionally, the repository library provides convenience routines
for examining and manipulating the filesystem: for example, routines to
generate a tree-delta by comparing two revisions, to construct new
transactions, to query log messages, and to export and import
filesystem data.</p>
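
<p>Node sharing makes the first of these cheap: if two directory entries hold
the same node number, the subtrees below them are identical and need not be
read at all. The sketch below shows one way a comparison could exploit that;
it is a hypothetical illustration over the in-memory layout used earlier, not
the tree-delta code in <tt class="literal">libsvn_repos</tt>, and the
one-letter change codes are an assumption of this example.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8

typedef enum { NODE_FILE, NODE_DIR } kind_t;
typedef struct { const char *name; int node; } entry_t;
typedef struct { kind_t kind; int n_entries; entry_t entries[MAX_ENTRIES]; } node_t;

/* Called once per change: "A" added, "D" deleted, "M" modified. */
typedef void (*report_fn)(const char *change, const char *path, void *baton);

static int child_of(const node_t *dir, const char *name) {
  for (int i = 0; i < dir->n_entries; i++)
    if (strcmp(dir->entries[i].name, name) == 0) return dir->entries[i].node;
  return -1;
}

/* Compare directory nodes `a` (old) and `b` (new). Equal node numbers
   mean identical subtrees, so shared subtrees are skipped unread. */
void diff_dirs(const node_t *nodes, int a, int b, const char *prefix,
               report_fn report, void *baton) {
  assert(nodes != NULL);
  if (a == b) return;
  const node_t *da = &nodes[a], *db = &nodes[b];
  char path[256];
  for (int i = 0; i < db->n_entries; i++) {
    const entry_t *e = &db->entries[i];
    int old = child_of(da, e->name);
    snprintf(path, sizeof path, "%s%s", prefix, e->name);
    if (old < 0) {
      report("A", path, baton);
    } else if (old != e->node) {
      if (nodes[old].kind == NODE_DIR && nodes[e->node].kind == NODE_DIR) {
        char sub[260];
        snprintf(sub, sizeof sub, "%s/", path);
        diff_dirs(nodes, old, e->node, sub, report, baton);
      } else {
        report("M", path, baton);
      }
    }
  }
  for (int i = 0; i < da->n_entries; i++) {
    if (child_of(db, da->entries[i].name) < 0) {
      snprintf(path, sizeof path, "%s%s", prefix, da->entries[i].name);
      report("D", path, baton);
    }
  }
}

/* Example reporter: appends "X path\n" to a caller-supplied buffer,
   assumed large enough for this illustration. */
void append_report(const char *change, const char *path, void *baton) {
  char *out = baton;
  sprintf(out + strlen(out), "%s %s\n", change, path);
}
```

<p>Usage: comparing the revision 2 and revision 3 roots from the rename
example reports <tt class="literal">A A/fish/book</tt> and
<tt class="literal">D A/fish/tuna</tt>, and never descends into B, whose node
number is the same in both revisions.</p>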
</div> <!-- server.libsvn_repos (h3) -->
</div> <!-- server (h2) -->

<div class="h2" id="license" title="#license">
<h2>License &mdash; Copyright</h2>

<p>Copyright &copy; 2000-2008 Collab.Net. All rights reserved.</p>

<p>This software is licensed as described in the file
<tt class="filename">COPYING</tt>, which you should have received as part of
this distribution. The terms are also available at
<a href="http://subversion.tigris.org/license-1.html">http://subversion.tigris.org/license-1.html</a>. If newer
versions of this license are posted there, you may use a newer version
instead, at your option.</p>

</div> <!-- license (h2) -->

</div>
</body>
</html>