1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml">
4 <head>
5 <title>Subversion Design</title>
6 </head>
8 <body>
10 <div class="h1">
11 <h1 style="text-align: center">Subversion Design</h1>
12 </div>
14 <p class="warningmark"><em>NOTE: This document is out of date. The last
15 substantial update was in October 2002 (r3377). However, people often come
16 here for the section on the <a href="#server.fs.struct.bubble-up">directory
17 bubble-up method</a>, which is still accurate.</em></p>
19 <div class="h1">
20 <h2>Table of Contents</h2>
21 <ol id="toc">
22 <li><a href="#goals">Goals &mdash; The goals of the Subversion project</a>
23 <ol>
24 <li><a href="#goals.rename-remove-resurrect">Rename/removal/resurrection support</a></li>
25 <li><a href="#goals.textbinary">Text vs binary issues</a></li>
26 <li><a href="#goals.i18n">I18N/Multilingual support</a></li>
27 <li><a href="#goals.branching-and-tagging">Branching and tagging</a></li>
28 <li><a href="#goals.misc">Miscellaneous new behaviors</a>
29 <ol>
30 <li><a href="#goals.misc.logmsgs">Log messages</a></li>
31 <li><a href="#goals.misc.diffplugins">Client side diff plug-ins</a></li>
32 <li><a href="#goals.misc.merging">Better merging</a></li>
33 <li><a href="#goals.misc.conflicts">Conflict resolution</a></li>
34 </ol>
35 </li> <!-- goals.misc -->
36 </ol>
37 </li> <!-- goals -->
38 <li><a href="#model">Model &mdash; The versioning model used by Subversion</a>
39 <ol>
40 <li><a href="#model.wc-and-repos">Working Directories and Repositories</a></li>
41 <li><a href="#model.txns-and-revnums">Transactions and Revision Numbers</a></li>
42 <li><a href="#model.how-wc">How Working Directories Track the Repository</a></li>
43 <li><a href="#model.lock-merge">Locking vs. Merging - Two Paradigms of Co-operative
44 Developments</a></li>
45 <li><a href="#model.props">Properties</a></li>
46 <li><a href="#model.merging-and-ancestry">Merging and Ancestry</a></li>
47 </ol>
48 </li> <!-- model -->
49 <li><a href="#archi">Architecture &mdash; How Subversion's components work together</a>
50 <ol>
51 <li><a href="#archi.client">Client Layer</a></li>
52 <li><a href="#archi.network">Network Layer</a></li>
53 <li><a href="#archi.fs">Filesystem Layer</a></li>
54 </ol>
55 </li> <!-- archi -->
56 <li><a href="#deltas">Deltas &mdash; How to describe changes</a>
57 <ol>
58 <li><a href="#deltas.text">Text Deltas</a></li>
59 <li><a href="#deltas.prop">Property Deltas</a></li>
60 <li><a href="#deltas.tree">Tree Deltas</a></li>
61 <li><a href="#deltas.postfix-text">Postfix Text Deltas</a></li>
62 <li><a href="#deltas.serializing-via-editor">Serializing Deltas via the "Editor" Interface</a></li>
63 </ol>
64 </li> <!-- deltas -->
65 <li><a href="#client">Client &mdash; How the client works</a>
66 <ol>
67 <li><a href="#client.wc">Working copies and the working copy library</a>
68 <ol>
69 <li><a href="#client.wc.layout">The layout of working copies</a></li>
70 <li><a href="#client.wc.library">The working copy management library</a></li>
71 </ol>
72 </li> <!-- client.wc -->
73 <li><a href="#client.libsvn_ra">The repository access library</a></li>
74 <li><a href="#client.libsvn_client">The client operation library</a></li>
75 </ol>
76 </li> <!-- client -->
77 <li><a href="#protocol">Protocol &mdash; How the client and server communicate</a>
78 <ol>
79 <li><a href="#protocol.webdav">The HTTP/WebDAV/DeltaV based protocol</a></li>
80 <li><a href="#protocol.svn">The custom protocol</a></li>
81 </ol>
82 </li> <!-- protocol -->
83 <li><a href="#server">Server &mdash; How the server works</a>
84 <ol>
85 <li><a href="#server.fs">Filesystem</a>
86 <ol>
87 <li><a href="#server.fs.overview">Filesystem Overview</a></li>
88 <li><a href="#server.fs.api">API</a></li>
89 <li><a href="#server.fs.struct">Repository Structure</a>
90 <ol>
91 <li><a href="#server.fs.struct.schema">Schema</a></li>
92 <li><a href="#server.fs.struct.bubble-up">Bubble-Up Method</a></li>
93 <li><a href="#server.fs.struct.diffy-storage">Diffy Storage</a></li>
94 </ol>
95 </li> <!-- server.fs.struct -->
96 <li><a href="#server.fs.implementation">Implementation</a></li>
97 </ol>
98 </li> <!-- server.fs -->
99 <li><a href="#server.libsvn_repos">Repository Library</a></li>
100 </ol>
101 </li> <!-- server -->
102 <li><a href="#license">License &mdash; Copyright</a></li>
103 </ol>
104 </div>
106 <!--
107 ================================================================
108 Licensed to the Apache Software Foundation (ASF) under one
109 or more contributor license agreements. See the NOTICE file
110 distributed with this work for additional information
111 regarding copyright ownership. The ASF licenses this file
112 to you under the Apache License, Version 2.0 (the
113 "License"); you may not use this file except in compliance
114 with the License. You may obtain a copy of the License at
116 http://www.apache.org/licenses/LICENSE-2.0
118 Unless required by applicable law or agreed to in writing,
119 software distributed under the License is distributed on an
120 "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
121 KIND, either express or implied. See the License for the
122 specific language governing permissions and limitations
123 under the License.
124 ====================================================================
126 This software consists of voluntary contributions made by many
127 individuals on behalf of CollabNet.
-->
136 <div class="h2" id="goals" title="#goals">
137 <h2>Goals &mdash; The goals of the Subversion project</h2>
141 <p>The goal of the Subversion project is to write a version control
142 system that takes over CVS's current and future user base.
144 (If you're not familiar with CVS or its shortcomings, then
145 skip to <a href="#model">Model &mdash; The versioning model used by Subversion</a>.)
146 The first release
147 has all the major features of CVS, plus certain new features that CVS
148 users often wish they had. In general, Subversion works like CVS, except
149 where there's a compelling reason to be different.</p>
151 <p>So what does Subversion have that CVS doesn't?</p>
153 <ul>
154 <li><p>It versions directories, file-metadata, renames, copies
155 and removals/resurrections. In other words, Subversion records the
156 changes users make to directory trees, not just changes to file
157 contents.</p></li>
159 <li><p>Tagging and branching are constant-time and
160 constant-space.</p></li>
162 <li><p>It is natively client-server, hence much more
163 maintainable than CVS. (In CVS, the client-server protocol was added
164 as an afterthought. This means that most new features have to be
165 implemented twice, or at least more than once: code for the local
166 case, and code for the client-server case.)</p></li>
168 <li><p>The repository is organized efficiently and
169 comprehensibly. (Without going into too much detail, let's just say
170 that CVS's repository structure is showing its
171 age.)</p></li>
173 <li><p>Commits are atomic. Each commit results in a single
174 revision number, which refers to the state of the entire tree. Files
175 no longer have their own revision numbers.</p></li>
177 <li><p>The locking scheme is only as strict as absolutely
178 necessary. Reads are never locked, and writes lock only the files
179 being written, for only as long as needed.</p></li>
181 <li><p>It has internationalization support.</p></li>
183 <li><p>It handles binary files gracefully (experience has shown
184 that CVS's binary file handling is prone to user
185 error).</p></li>
187 <li><p>It takes advantage of the Net's experience with CVS by
188 choosing better default behaviors for certain
189 situations.</p></li>
190 </ul>
192 <p>Some of these advantages are clear and require no further discussion.
193 Others are not so obvious, and are explained in greater detail
194 below.</p>
197 <div class="h3" id="goals.rename-remove-resurrect" title="#goals.rename-remove-resurrect">
198 <h3>Rename/removal/resurrection support</h3>
201 <p>Full rename support means you can trace through ancestry by name
202 <em>or</em> by entity. For example, if you say "Give me
203 revision 12 of foo.c", do you mean revision 12 of the file whose name is
204 <em>now</em> foo.c (but perhaps it was named bar.c back at
205 revision 12), or the file whose name was foo.c in revision 12 (perhaps
206 that file no longer exists, or has a different name now)? In Subversion,
207 both interpretations are available to the user.</p>
209 <p>(Note: we've not yet implemented this, but it wouldn't be too hard.
210 People are advocating switches to 'svn log' that cause history to be
211 traced backwards either by entity or by path.)</p>
212 </div> <!-- goals.rename-remove-resurrect (h3) -->
214 <div class="h3" id="goals.textbinary" title="#goals.textbinary">
215 <h3>Text vs binary issues</h3>
218 <p>Historically, binary files have been problematic in CVS for two
219 unrelated reasons: keyword expansion, and line-end conversion.</p>
221 <ul>
222 <li><p><strong class="firstterm">Keyword expansion</strong> is when CVS
223 expands "$Revision$" into "$Revision: 1.1 $", for example. There
224 are a number of keywords in CVS: "$Author: sussman $", "$Date:
225 2001/06/04 22:00:52 $", and so on.</p></li>
226 <li><p><strong class="firstterm">Line-end conversion</strong> is when CVS
227 gives plaintext files the appropriate line-ending conventions for the
228 working copy's platform. For example, Unix working copies use LF, but
229 Windows working copies use CRLF. (Like CVS, the Subversion
230 repository stores text files in Unix LF format.)</p></li>
231 </ul>
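<p>The two transformations can be illustrated with a minimal Python sketch.
This is not code from CVS or Subversion; the function names, the
<tt class="literal">KEYWORD_VALUES</tt> table and the regular expression are
assumptions made for illustration only.</p>

```python
import re

# Hypothetical keyword values; a real client would derive these from file metadata.
KEYWORD_VALUES = {"Revision": "1.1", "Author": "sussman"}


def expand_keywords(text, values):
    """Rewrite $Keyword$ (or an already expanded $Keyword: old $) as $Keyword: value $."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # unknown keyword: leave it untouched
        return "$%s: %s $" % (name, values[name])

    return re.sub(r"\$(\w+)(?::[^$\n]*)?\$", substitute, text)


def to_working_copy_eol(text, eol):
    """Convert LF-terminated text to the line ending used by the platform."""
    return text.replace("\n", eol)


repo_text = "/* $Revision$ */\nint main(void) { return 0; }\n"
expanded = expand_keywords(repo_text, KEYWORD_VALUES)
windows_text = to_working_copy_eol(expanded, "\r\n")
```

<p>Both functions rewrite bytes that happen to look like keywords or line
endings, which is why applying them to an image or another opaque format
corrupts it.</p>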
233 <p>Both keyword substitution and line-end conversion are sensible only
234 for plain text files. CVS only recognizes two file types anyway:
235 plaintext and binary. And CVS assumes files are plain text unless you
236 tell it otherwise.</p>
238 <p>Subversion recognizes the same two types. The question is, how does
239 it determine a file's type? Experience with CVS suggests that assuming
240 text unless told otherwise is a losing strategy &ndash; people frequently
241 forget to mark images and other opaque formats as binary, then later they
242 wonder why CVS mangled their data. So Subversion will not mangle data:
243 when moving over the network, or when being stored in the repository, it
244 treats all files as binary. In the working copy, a tweakable meta-data
245 property indicates whether to treat the file as text or binary for
246 purposes of whether or not to allow contextual merging during
247 updates.</p>
249 <p>Users can turn line-end conversion on or off per file by tweaking
250 meta-data. Files do <em>not</em> undergo keyword
251 substitution by default, on the theory that if someone wants substitution
252 and isn't getting it, they'll look in the manual; but if they are getting
253 it and didn't want it, they might just be confused and not know what to
254 do. Users can turn substitution on or off per file.</p>
256 <p>Both of these changes are done on the client side; the repository
257 does not even know about them.</p>
258 </div> <!-- goals.textbinary (h3) -->
260 <div class="h3" id="goals.i18n" title="#goals.i18n">
261 <h3>I18N/Multilingual support</h3>
264 <p>Subversion is internationalized &ndash; commands, user messages, and
265 errors can be customized to the appropriate human language at build-time
266 (or run time, if that's not much harder).</p>
268 <p>File names and contents may be multilingual; Subversion does not
269 assume an ASCII-only universe. For purposes of keyword expansion and
270 line-end conversion, Subversion also understands the UTF-* encodings (but
271 not necessarily all of them by the first release).</p>
272 </div> <!-- goals.i18n (h3) -->
274 <div class="h3" id="goals.branching-and-tagging" title="#goals.branching-and-tagging">
275 <h3>Branching and tagging</h3>
278 <p>Subversion supports branching and tagging with one efficient
279 operation: `clone'. To clone a tree is to copy it, to create another
280 tree exactly like it (except that the new tree knows its ancestry
281 relationship to the old one).</p>
283 <p>At the moment of creation, a clone requires only a small, constant
284 amount of space in the repository &ndash; most of its storage is shared
285 with the original tree. If you never commit anything on the clone, then
286 it's just like a CVS tag. If you start committing on it, then it's a
287 branch. Voila! This also implies CVS's "vendor branching" feature,
288 since Subversion has real rename and directory support.</p>
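<p>To illustrate why a clone can be cheap, here is a minimal Python sketch.
It is not Subversion's storage code: the <tt class="literal">Dir</tt> class
and the <tt class="literal">clone</tt> function are hypothetical, and the
sketch assumes that directory nodes are never modified in place, so a clone
can point at the original's entries instead of copying them.</p>

```python
class Dir:
    """A directory node. Nodes are treated as immutable once created."""

    def __init__(self, entries, copied_from=None):
        self.entries = entries          # name -> Dir, or bytes for file contents
        self.copied_from = copied_from  # ancestry: the path this node was cloned from


def clone(root, source, target):
    """Return a new root in which `target` is a clone of `source`.

    The clone shares the original's entries, so the work done does not
    depend on the size of the tree being cloned; only the parent
    directory's entry table is copied.
    """
    original = root.entries[source]
    copy = Dir(original.entries, copied_from=source)
    new_root_entries = dict(root.entries)
    new_root_entries[target] = copy
    return Dir(new_root_entries)


trunk = Dir({"Makefile": b"all:\n", "search.c": b"int x;\n"})
root = Dir({"trunk": trunk})
new_root = clone(root, "trunk", "tag-1.0")
```

<p>In this sketch, committing on the clone would create new nodes only along
the changed path, which is what turns a tag into a branch.</p>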
289 </div> <!-- goals.branching-and-tagging (h3) -->
291 <div class="h3" id="goals.misc" title="#goals.misc">
292 <h3>Miscellaneous new behaviors</h3>
295 <div class="h4" id="goals.misc.logmsgs" title="#goals.misc.logmsgs">
296 <h4>Log messages</h4>
299 <p>Subversion has a flexible log message policy (a small matter, but
300 one dear to our hearts).</p>
302 <p>Log messages should be a matter of project policy, not version
303 control software policy. If a user commits with no log message, then
304 Subversion defaults to an empty message. (CVS tries to require log
305 messages, but fails: we've all seen empty log messages in CVS, where
306 the user committed with deliberately empty quotes. Let's stop the
307 madness now.)</p>
308 </div> <!-- goals.misc.logmsgs (h4) -->
310 <div class="h4" id="goals.misc.diffplugins" title="#goals.misc.diffplugins">
311 <h4>Client side diff plug-ins</h4>
314 <p>Subversion supports client-side plug-in diff programs.</p>
316 <p>There is no need for Subversion to have every possible diff
317 mechanism built in. It can invoke a user-specified client-side diff
318 program on the two revisions of the file(s) locally.</p>
320 <p>(Note: This feature does not exist yet, but is planned for
321 post-1.0.)</p>
322 </div> <!-- goals.misc.diffplugins (h4) -->
324 <div class="h4" id="goals.misc.merging" title="#goals.misc.merging">
325 <h4>Better merging</h4>
328 <p>Subversion remembers what has already been merged in and what
329 hasn't, thereby avoiding the problem, familiar to CVS users, of
330 spurious conflicts on repeated merges.</p>
332 <p>(Note: Parts of this feature (<a href="/merge-tracking/">Merge
333 Tracking</a>) are implemented in Subversion&nbsp;1.5; see
334 the <a href="svn_1.5_releasenotes.html#merge-tracking"
335 >release notes</a>.)</p>
337 <p>For details, see <a href="#model.merging-and-ancestry">Merging and Ancestry</a>.</p>
338 </div> <!-- goals.misc.merging (h4) -->
340 <div class="h4" id="goals.misc.conflicts" title="#goals.misc.conflicts">
341 <h4>Conflict resolution</h4>
344 <p>For text files, Subversion resolves conflicts similarly to CVS, by
345 folding repository changes into the working files with conflict
346 markers. But, for <em>both</em> text and binary files,
347 Subversion also always puts the old and new pristine repository
348 revisions into temporary files, and the pristine working copy revision
349 in another temporary file.</p>
351 <p>Thus, for any conflict, the user has four files readily at
352 hand:</p>
354 <ol>
355 <li><p>the original working copy file with local
356 mods</p></li>
357 <li><p>the older repository file</p></li>
358 <li><p>the newest repository file</p></li>
359 <li><p>the merged file, with conflict
360 markers</p></li>
361 </ol>
363 <p>and in a binary file conflict, the user has all but the
364 last.</p>
366 <p>When the conflict has been resolved and the working copy is
367 committed, Subversion automatically removes the temporary pristine
368 files.</p>
370 <p>(A more general solution would allow plug-in merge resolution tools
371 on the client side, but this is not scheduled for the first release.)
372 Note that users can use their own merge tools anyway, since all the
373 original files are available.</p>
374 </div> <!-- goals.misc.conflicts (h4) -->
375 </div> <!-- goals.misc (h3) -->
376 </div> <!-- goals (h2) -->
378 <div class="h2" id="model" title="#model">
379 <h2>Model &mdash; The versioning model used by Subversion</h2>
383 <p>This chapter explains the user's view of Subversion &mdash; what
384 &ldquo;objects&rdquo; you interact with, how they behave, and how they
385 relate to each other.</p>
388 <div class="h3" id="model.wc-and-repos" title="#model.wc-and-repos">
389 <h3>Working Directories and Repositories</h3>
392 <p>Suppose you are using Subversion to manage a software project. There
393 are two things you will interact with: your working directory, and the
394 repository.</p>
396 <p>Your <strong class="firstterm">working directory</strong> is an ordinary
397 directory tree, on your local system, containing your project's sources.
398 You can edit these files and compile your program from them in the usual
399 way. Your working directory is your own private work area: Subversion
400 never changes the files in your working directory, or publishes the
401 changes you make there, until you explicitly tell it to do so.</p>
403 <p>After you've made some changes to the files in your working
404 directory, and verified that they work properly, Subversion provides
405 commands to publish your changes to the other people working with you on
406 your project. If they publish their own changes, Subversion provides
407 commands to incorporate those changes into your working directory.</p>
409 <p>A working directory contains some extra files, created and maintained
410 by Subversion, to help it carry out these commands. In particular, these
411 files help Subversion recognize which files contain unpublished changes,
412 and which files are out-of-date with respect to others' work.</p>
414 <p>While your working directory is for your use alone, the
415 <strong class="firstterm">repository</strong> is the common public record you share
416 with everyone else working on the project. To publish your changes, you
417 use Subversion to put them in the repository. (What this means, exactly,
418 we explain below.) Once your changes are in the repository, others can
419 tell Subversion to incorporate your changes into their working
420 directories. In a collaborative environment like this, each user will
421 typically have their own working directory (or perhaps more than one),
422 and all the working directories will be backed by a single repository,
423 shared amongst all the users.</p>
425 <p>A Subversion repository holds a single directory tree, and records
426 the history of changes to that tree. The repository retains enough
427 information to recreate any prior state of the tree, compute the
428 differences between any two prior trees, and report the relations between
429 files in the tree &mdash; which files are derived from which other
430 files.</p>
432 <p>A Subversion repository can hold the source code for several
433 projects; usually, each project is a subdirectory in the tree. In this
434 arrangement, a working directory will usually correspond to a particular
435 subtree of the repository.</p>
437 <p>For example, suppose you have a repository laid out like this:</p>
439 <pre>
440 /trunk/paint/Makefile
441              canvas.c
442              brush.c
443        write/Makefile
444              document.c
445              search.c
446 </pre>
448 <p>In other words, the repository's root directory has a single
449 subdirectory named <tt class="filename">trunk</tt>, which itself contains two
450 subdirectories: <tt class="filename">paint</tt> and
451 <tt class="filename">write</tt>.</p>
453 <p>To get a working directory, you must <strong class="firstterm">check out</strong>
454 some subtree of the repository. If you check out
455 <tt class="filename">/trunk/write</tt>, you will get a working directory like
456 this:</p>
458 <pre>
459 write/Makefile
460       document.c
461       search.c
462       .svn/
463 </pre>
465 <p>This working directory is a copy of the repository's
466 <tt class="filename">/trunk/write</tt> directory, with one additional entry
467 &mdash; <tt class="filename">.svn</tt> &mdash; which holds the extra
468 information needed by Subversion, as mentioned above.</p>
470 <p>Suppose you make changes to <tt class="filename">search.c</tt>. Since the
471 <tt class="filename">.svn</tt> directory remembers the file's modification
472 date and original contents, Subversion can tell that you've changed the
473 file. However, Subversion does not make your changes public until you
474 explicitly tell it to.</p>
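<p>One way such a check could work is sketched below in Python. This is an
assumption about the general technique, not the working copy library's
actual code: <tt class="literal">is_locally_modified</tt>, the recorded
timestamp and the pristine bytes are hypothetical stand-ins for what the
<tt class="filename">.svn</tt> directory stores.</p>

```python
import os
import tempfile


def is_locally_modified(path, recorded_mtime, pristine_bytes):
    """Cheap timestamp check first; compare contents only if it differs."""
    if os.path.getmtime(path) == recorded_mtime:
        return False
    with open(path, "rb") as handle:
        return handle.read() != pristine_bytes


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "search.c")
    pristine = b"int search(void);\n"
    with open(path, "wb") as handle:
        handle.write(pristine)
    recorded_mtime = os.path.getmtime(path)
    unchanged = is_locally_modified(path, recorded_mtime, pristine)

    # Simulate a later local edit: new contents and a newer timestamp.
    with open(path, "ab") as handle:
        handle.write(b"/* local edit */\n")
    os.utime(path, (recorded_mtime + 10, recorded_mtime + 10))
    changed = is_locally_modified(path, recorded_mtime, pristine)
```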
476 <p>To publish your changes, you can use Subversion's
477 &lsquo;<tt class="literal">commit</tt>&rsquo; command:</p>
479 <pre>
480 $ pwd
481 /home/jimb/write
482 $ ls -a
483 .svn/ Makefile document.c search.c
484 $ svn commit search.c
486 </pre>
488 <p>Now your changes to <tt class="filename">search.c</tt> have been committed
489 to the repository; if another user checks out a working copy of
490 <tt class="filename">/trunk/write</tt>, they will see your text.</p>
492 <p>Suppose you have a collaborator, Felix, who checked out a working
493 directory of <tt class="filename">/trunk/write</tt> at the same time you did.
494 When you commit your change to <tt class="filename">search.c</tt>, Felix's
495 working copy is left unchanged; Subversion only modifies working
496 directories at the user's request.</p>
498 <p>To bring his working directory up to date, Felix can use the
499 Subversion &lsquo;<tt class="literal">update</tt>&rsquo; command. This will
500 incorporate your changes into his working directory, as well as any
501 others that have been committed since he checked it out.</p>
503 <pre>
504 $ pwd
505 /home/felix/write
506 $ ls -a
507 .svn/ Makefile document.c search.c
508 $ svn update
509 U search.c
511 </pre>
513 <p>The output from the &lsquo;<tt class="literal">svn update</tt>&rsquo;
514 command indicates that Subversion updated the contents of
515 <tt class="filename">search.c</tt>. Note that Felix didn't need to specify
516 which files to update; Subversion uses the information in the
517 <tt class="filename">.svn</tt> directory, and further information in the
518 repository, to decide which files need to be brought up to date.</p>
520 <p>We explain below what happens when both you and Felix make changes to
521 the same file.</p>
522 </div> <!-- model.wc-and-repos (h3) -->
524 <div class="h3" id="model.txns-and-revnums" title="#model.txns-and-revnums">
525 <h3>Transactions and Revision Numbers</h3>
528 <p>A Subversion &lsquo;<tt class="literal">commit</tt>&rsquo; operation can
529 publish changes to any number of files and directories as a single atomic
530 transaction. In your working directory, you can change files' contents,
531 create, delete, rename and copy files and directories, and then commit
532 the completed set of changes as a unit.</p>
534 <p>In the repository, each commit is treated as an atomic transaction:
535 either all the commit's changes take place, or none of them take place.
536 Subversion tries to retain this atomicity in the face of program crashes,
537 system crashes, network problems, and other users' actions. We may call
538 a commit a <strong class="firstterm">transaction</strong> when we want to emphasize
539 its indivisible nature.</p>
541 <p>Each time the repository accepts a transaction, this creates a new
542 state of the tree, called a <strong class="firstterm">revision</strong>. Each
543 revision is assigned a unique natural number, one greater than the number
544 of the previous revision. The initial revision of a freshly created
545 repository is numbered zero, and consists of an empty root
546 directory.</p>
548 <p>Since each transaction creates a new revision, with its own number,
549 we can also use these numbers to refer to transactions; transaction
550 <em class="replaceable">n</em> is the transaction which created revision
551 <em class="replaceable">n</em>. There is no transaction numbered
552 zero.</p>
554 <p>Unlike those of many other systems, Subversion's revision numbers
555 apply to an entire tree, not individual files. Each revision number
556 selects an entire tree.</p>
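<p>The numbering rules above can be sketched in a few lines of Python. This
is a toy model under stated assumptions: the
<tt class="literal">Repository</tt> class is hypothetical, and it stores a
full copy of the tree for every revision, which is not how Subversion stores
data. It only shows that revision 0 is an empty tree and that a transaction
either produces the next revision or changes nothing.</p>

```python
class Repository:
    """Toy repository: one whole tree (path -> bytes) per revision."""

    def __init__(self):
        self.revisions = [{}]  # revision 0: an empty root directory

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit(self, changes):
        """Apply every change or none. `changes` maps path -> bytes, or None to delete."""
        new_tree = dict(self.revisions[-1])
        for path, content in changes.items():
            if content is None:
                if path not in new_tree:
                    raise KeyError("cannot delete missing path: " + path)
                del new_tree[path]
            else:
                new_tree[path] = content
        # Reached only if every change applied cleanly.
        self.revisions.append(new_tree)
        return self.head


repo = Repository()
first = repo.commit({"/trunk/write/search.c": b"v1"})
try:
    repo.commit({"/trunk/write/document.c": b"d", "/missing": None})
    failed = False
except KeyError:
    failed = True
```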
558 <p>It's important to note that working directories do not always
559 correspond to any single revision in the repository; they may contain
560 files from several different revisions. For example, suppose you check
561 out a working directory from a repository whose most recent revision is
562 4:</p>
564 <pre>
565 write/Makefile:4
566       document.c:4
567       search.c:4
568 </pre>
570 <p>At the moment, this working directory corresponds exactly to revision
571 4 in the repository. However, suppose you make a change to
572 <tt class="filename">search.c</tt>, and commit that change. Assuming no other
573 commits have taken place, your commit will create revision 5 of the
574 repository, and your working directory will look like this:</p>
576 <pre>
577 write/Makefile:4
578       document.c:4
579       search.c:5
580 </pre>
582 <p>Suppose that, at this point, Felix commits a change to
583 <tt class="filename">document.c</tt>, creating revision 6. If you use
584 &lsquo;<tt class="literal">svn update</tt>&rsquo; to bring your working
585 directory up to date, then it will look like this:</p>
587 <pre>
588 write/Makefile:6
589       document.c:6
590       search.c:6
591 </pre>
593 <p>Felix's changes to <tt class="filename">document.c</tt> will appear in
594 your working copy of that file, and your change will still be present in
595 <tt class="filename">search.c</tt>. In this example, the text of
596 <tt class="filename">Makefile</tt> is identical in revisions 4, 5, and 6, but
597 Subversion will mark your working copy with revision 6 to indicate that
598 it is still current. So, after you do a clean update at the root of your
599 working directory, your working directory will generally correspond
600 exactly to some revision in the repository.</p>
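<p>The example above can be replayed with a small Python sketch. The two
helper functions are hypothetical and track only the per-file revision
numbers, not file contents.</p>

```python
def commit_files(working_revs, committed, new_revision):
    """After a commit, only the committed files move to the new revision."""
    updated = dict(working_revs)
    for name in committed:
        updated[name] = new_revision
    return updated


def update_all(working_revs, head_revision):
    """A clean update at the root marks every file with the head revision."""
    return {name: head_revision for name in working_revs}


checked_out = {"Makefile": 4, "document.c": 4, "search.c": 4}
after_commit = commit_files(checked_out, ["search.c"], 5)  # you commit search.c
after_update = update_all(after_commit, 6)                 # Felix created revision 6
```

<p>Between the commit and the update, the working directory mixes revisions
4 and 5 and so matches no single revision in the repository.</p>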
601 </div> <!-- model.txns-and-revnums (h3) -->
603 <div class="h3" id="model.how-wc" title="#model.how-wc">
604 <h3>How Working Directories Track the Repository</h3>
607 <p>For each file in a working directory, Subversion records two
608 essential pieces of information:</p>
610 <ul>
611 <li><p>what revision of what repository file your working copy
612 is based on (this is called the file's <strong class="firstterm">base
613 revision</strong>), and</p></li>
614 <li><p>a timestamp recording when the local copy was last
615 updated.</p></li>
616 </ul>
618 <p>Given this information, by talking to the repository, Subversion can
619 tell which of the following four states a file is in:</p>
621 <ul>
622 <li><p><strong>Unchanged, and current.</strong>
623 The file is unchanged in the working directory, and no changes to that
624 file have been committed to the repository since its base
625 revision.</p></li>
626 <li><p><strong>Locally changed, and
627 current</strong>. The file has been changed in the working
628 directory, and no changes to that file have been committed to the
629 repository since its base revision. There are local changes that have
630 not been committed to the repository.</p></li>
631 <li><p><strong>Unchanged, and
632 out-of-date</strong>. The file has not been changed in
633 the working directory, but it has been changed in the repository. The
634 file should eventually be updated, to make it current with the
635 public revision.</p></li>
636 <li><p><strong>Locally changed, and
637 out-of-date</strong>. The file has been changed both in the
638 working directory, and in the repository. The file should be updated;
639 Subversion will attempt to merge the public changes with the local
640 changes. If it can't complete the merge in a plausible
641 way automatically, Subversion leaves it to the user to resolve the
642 conflict.</p></li>
643 </ul>
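<p>The four states amount to a two-by-two classification, sketched here in
Python. The function and parameter names are hypothetical; in particular,
<tt class="literal">last_changed_revision</tt> stands for the newest
repository revision in which the file changed, which the client would have
to learn from the repository.</p>

```python
def file_state(locally_changed, base_revision, last_changed_revision):
    """Classify a working file into one of the four states listed above."""
    out_of_date = last_changed_revision > base_revision
    if not locally_changed and not out_of_date:
        return "unchanged, and current"
    if locally_changed and not out_of_date:
        return "locally changed, and current"
    if not locally_changed and out_of_date:
        return "unchanged, and out-of-date"
    return "locally changed, and out-of-date"
```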
644 </div> <!-- model.how-wc (h3) -->
646 <div class="h3" id="model.lock-merge" title="#model.lock-merge">
647 <h3>Locking vs. Merging - Two Paradigms of Co-operative
648 Developments</h3>
651 <p>By default, Subversion prefers the &ldquo;merging&rdquo; method of
652 handling simultaneous editing by multiple users. This means that
653 Subversion does not prevent two users from making changes to the same
654 file at the same time. For example, if both you and Felix have checked
655 out working directories of <tt class="filename">/trunk/write</tt>, Subversion
656 will allow both of you to change <tt class="filename">write/search.c</tt> in
657 your working directories. Then, the following sequence of events will
658 occur:</p>
660 <ul>
661 <li><p>Suppose Felix tries to commit his changes to
662 <tt class="filename">search.c</tt> first. His commit will succeed, and
663 his text will appear in the latest revision in the
664 repository.</p></li>
665 <li><p>When you attempt to commit your changes to
666 <tt class="filename">search.c</tt>, Subversion will reject your commit,
667 and tell you that you must update <tt class="filename">search.c</tt> before
668 you can commit it.</p></li>
669 <li><p>When you update <tt class="filename">search.c</tt>, Subversion
670 will try to merge Felix's changes from the repository with your local
671 changes. By default, Subversion merges as if it were applying a
672 patch: if your local changes do not overlap textually with Felix's,
673 then all is well; otherwise, Subversion leaves it to you to resolve
674 the overlapping changes. In either case, Subversion carefully
675 preserves a copy of the original pre-merge text.</p></li>
676 <li><p>Once you have verified that Felix's changes and your
677 changes have been merged correctly, you can commit the new revision
678 of <tt class="filename">search.c</tt>, which now contains everyone's
679 changes.</p></li>
680 </ul>
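<p>The sequence above can be sketched as a toy Python model. It is an
illustration under stated assumptions, not Subversion's implementation:
<tt class="literal">FileHistory</tt> and
<tt class="literal">OutOfDateError</tt> are hypothetical names, the model
tracks a single file, and the merge itself is done by hand rather than by a
merge algorithm.</p>

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an old revision of the file."""


class FileHistory:
    """Toy history of one file; the list index serves as its revision number."""

    def __init__(self, text):
        self.revisions = [text]

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit(self, base_revision, new_text):
        if base_revision != self.head:
            raise OutOfDateError("update before committing")
        self.revisions.append(new_text)
        return self.head


history = FileHistory("a\nb\nc\n")

# Felix commits first, so his commit succeeds.
felix_revision = history.commit(0, "A\nb\nc\n")

# Your commit is still based on revision 0, so it is rejected.
try:
    history.commit(0, "a\nb\nC\n")
    rejected = False
except OutOfDateError:
    rejected = True

# After updating, the two non-overlapping changes are combined (by hand here).
merged = "A\nb\nC\n"
your_revision = history.commit(history.head, merged)
```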
682 <p>Some version control systems provide &ldquo;locks&rdquo;, which
683 prevent others from changing a file once one person has begun working on
684 it. In our experience, merging is preferable to locks, because:</p>
686 <ul>
687 <li><p>changes usually do not conflict, so Subversion's behavior
688 does the right thing by default, while locking can interfere with
689 legitimate work;</p></li>
690 <li><p>locking can prevent conflicts within a file, but not
691 conflicts between files (say, between a C header file and another
692 file that includes it), so it doesn't really solve the problem; and
693 finally,</p></li>
694 <li><p>people often forget that they are holding locks,
695 resulting in unnecessary delays and friction.</p></li>
696 </ul>
698 <p>Of course, some kinds of files with rigid formats, like images or
699 executables, are simply not mergeable. To support this, Subversion
700 allows users to customize its merging behavior on a per-file basis.
701 Firstly, you can direct Subversion to refuse to merge changes to certain
702 files, and simply present you with the two original texts to choose from.
703 Secondly, in Subversion 1.2 and later, support for the
704 &ldquo;locking&rdquo; method of working is also available, and individual
705 files can be designated as requiring locking.</p>
707 <p>(In the future, you may be able to direct Subversion to merge using a
708 tool which respects the semantics of specific complex file
709 formats.)</p>
710 </div> <!-- model.lock-merge (h3) -->
712 <div class="h3" id="model.props" title="#model.props">
713 <h3>Properties</h3>
716 <p>Files generally have interesting attributes beyond their contents:
717 mime-types, executable permissions, EOL styles, and so on. Subversion
718 attempts to preserve these attributes, or at least record them, when
719 doing so would be meaningful. However, different operating systems
720 support very different sets of file attributes: Windows NT supports
721 access control lists, while Linux provides only the simpler traditional
722 Unix permission bits.</p>
724 <p>In order to interoperate well with clients on many different
725 operating systems, Subversion supports <strong class="firstterm">property
726 lists</strong>, a simple, general-purpose mechanism which clients
727 can use to store arbitrary out-of-band information about files.</p>
729 <p>A property list is a set of name / value pairs. A property name is
730 an arbitrary text string, expressed as a Unicode UTF-8 string,
731 canonically decomposed and ordered. A property value is an arbitrary
732 string of bytes. Property values may be of any size, but Subversion may
733 not handle very large property values efficiently. No two properties in
a given property list may have the same name. Although the word `list'
735 usually denotes an ordered sequence, there is no fixed order to the
736 properties in a property list; the term `property list' is
737 historical.</p>
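<p>As a rough illustration of these rules, a property list can be modelled
as an unordered mapping from normalized names to byte strings. This is a
minimal sketch, not Subversion's implementation: the
<tt class="literal">PropertyList</tt> class is hypothetical, and it assumes
that &ldquo;canonically decomposed and ordered&rdquo; corresponds to Unicode
normalization form NFD.</p>

```python
import unicodedata


class PropertyList:
    """Hypothetical model: unordered name -> value mapping."""

    def __init__(self):
        self._props = {}

    @staticmethod
    def _canonical(name: str) -> str:
        # Assumption: NFD (canonical decomposition plus canonical
        # ordering) is the normal form the text has in mind.
        return unicodedata.normalize("NFD", name)

    def set(self, name: str, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("property values are byte strings")
        # Setting an existing name overwrites it, so names stay unique.
        self._props[self._canonical(name)] = value

    def get(self, name: str) -> bytes:
        return self._props[self._canonical(name)]

    def names(self) -> set:
        return set(self._props)


props = PropertyList()
props.set("caf\u00e9", b"\x00\x01")   # precomposed e-acute
props.set("cafe\u0301", b"\x02")      # decomposed spelling of the same name
```

<p>Because both spellings normalize to the same name, the second call
replaces the first value instead of creating a second property.</p>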
739 <p>Each revision number, file, directory, and directory entry in the
Subversion repository has its own property list. Subversion puts these
741 property lists to several uses:</p>
743 <ul>
744 <li><p>Clients can use properties to store file attributes, as
745 described above.</p></li>
746 <li><p>The Subversion server uses properties to hold attributes
of its own, and allows clients to read and modify them. For example,
748 someday a hypothetical &lsquo;<tt class="literal">svn-acl</tt>&rsquo;
749 property might hold an access control list which the Subversion server
750 uses to regulate access to repository files.</p></li>
751 <li><p>Users can invent properties of their own, to store
752 arbitrary information for use by scripts, build environments, and so
on. Names of user properties should be URIs, to avoid conflicts
754 between organizations.</p></li>
755 </ul>
757 <p>Property lists are versioned, just like file contents. You can
758 change properties in your working directory, but those changes are not
759 visible in the repository until you commit your local changes. If you do
760 commit a change to a property value, other users will see your change
761 when they update their working directories.</p>
762 </div> <!-- model.props (h3) -->
764 <div class="h3" id="model.merging-and-ancestry" title="#model.merging-and-ancestry">
765 <h3>Merging and Ancestry</h3>
768 <p>[WARNING: this section was written in May 2000, at the very
769 beginning of the Subversion project. This functionality probably will
770 not exist in Subversion 1.0, but it's planned for post-1.0. The problem
771 should be reasonably solvable by recording merge data in
772 'properties'.]</p>
774 <p>Subversion defines merges the same way CVS does: to merge means to
775 take a set of previously committed changes and apply them, as a patch, to
776 a working copy. This change can then be committed, like any other
777 change. (In Subversion's case, the patch may include changes to
778 directory trees, not just file contents.)</p>
780 <p>As defined thus far, merging is equivalent to hand-editing the
781 working copy into the same state as would result from the patch
782 application. In fact, in CVS there <em>is</em> no difference
783 &ndash; it is equivalent to just editing the files, and there is no
784 record of which ancestors these particular changes came from.
785 Unfortunately, this leads to conflicts when users unintentionally merge
786 the same changes again. (Experienced CVS users avoid this problem by
787 using branch- and merge-point tags, but that involves a lot of unwieldy
788 bookkeeping.)</p>
790 <p>In Subversion, merges are remembered by recording <strong class="firstterm">ancestry
791 sets</strong>. A revision's ancestry set is the set of all changes
792 "accounted for" in that revision. By maintaining ancestry sets, and
793 consulting them when doing merges, Subversion can detect when it would
794 apply the same patch twice, and spare users much bookkeeping. Ancestry
795 sets are stored as properties.</p>
797 <p>In the examples below, bear in mind that revision numbers usually
798 refer to changes, rather than the full contents of that revision. For
799 example, "the change A:4" means "the delta that resulted in A:4", not
800 "the full contents of A:4".</p>
<p>The simplest ancestry sets are associated with linear histories. For
803 example, here's the history of a file A:</p>
805 <pre>
   _____        _____        _____        _____        _____
  |     |      |     |      |     |      |     |      |     |
  | A:1 |-----&gt;| A:2 |-----&gt;| A:3 |-----&gt;| A:4 |-----&gt;| A:5 |
  |_____|      |_____|      |_____|      |_____|      |_____|
812 </pre>
<p>The ancestry set of A:5 is:</p>
816 <pre>
818 { A:1, A:2, A:3, A:4, A:5 }
820 </pre>
822 <p>That is, it includes the change that brought A from nothing to A:1,
823 the change from A:1 to A:2, and so on to A:5. From now on, ranges like
824 this will be represented with a more compact notation:</p>
826 <pre>
828 { A:1-5 }
830 </pre>
832 <p>Now assume there's a branch B based, or "rooted", at A:2. (This
833 postulates an entirely different revision history, of course, and the
834 global revision numbers in the diagrams will change to reflect it.)
835 Here's what the project looks like with the branch:</p>
837 <pre>
839 _____ _____ _____ _____ _____ _____
840 | | | | | | | | | | | |
841 | A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |-----&gt;| A:9 |
842 |_____| |_____| |_____| |_____| |_____| |_____|
845 \ _____ _____ _____
846 \| | | | | |
847 | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |
848 |_____| |_____| |_____|
850 </pre>
852 <p>If we produce A:9 by merging the B branch back into the
853 trunk</p>
855 <pre>
857 _____ _____ _____ _____ _____ _____
858 | | | | | | | | | | | |
859 | A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |---.-&gt;| A:9 |
860 |_____| |_____| |_____| |_____| |_____| / |_____|
863 \ _____ _____ _____ /
864 \| | | | | | /
865 | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |---&gt;-'
866 |_____| |_____| |_____|
868 </pre>
<p>then what will A:9's ancestry set be?</p>
872 <pre>
{ A:1, A:2, A:4, A:6, A:8, A:9, B:3, B:5, B:7 }
876 </pre>
878 <p>or more compactly:</p>
880 <pre>
882 { A:1-9, B:3-7 }
884 </pre>
886 <p>(It's all right that each file's ranges seem to include non-changes;
887 this is just a notational convenience, and you can think of the
888 non-changes as either not being included, or being included but being
889 null deltas as far as that file is concerned).</p>
891 <p>All changes along the B line are accounted for (changes B:3-7), and
892 so are all changes along the A line, including both the merge and any
893 non-merge-related edits made before the commit.</p>
895 <p>Although this merge happened to include all the branch changes, that
896 needn't be the case. For example, the next time we merge the B
897 line</p>
899 <pre>
901 _____ _____ _____ _____ _____ _____ _____
902 | | | | | | | | | | | | | |
903 | A:1 |--&gt;| A:2 |--&gt;| A:4 |--&gt;| A:6 |--&gt;| A:8 |-.-&gt;| A:9 |-.-&gt;|A:11 |
904 |_____| |_____| |_____| |_____| |_____| | |_____| | |_____|
905 \ / |
906 \ / |
907 \ _____ _____ _____ / _____ |
908 \| | | | | | / | | /
909 | B:3 |--&gt;| B:5 |--&gt;| B:7 |--&gt;|B:10 |-&gt;-'
910 |_____| |_____| |_____| |_____|
912 </pre>
914 <p>Subversion will know that A's ancestry set already contains B:3-7, so
915 only the difference between B:7 and B:10 will be applied. A's new
916 ancestry will be</p>
918 <pre>
920 { A:1-11, B:3-10 }
922 </pre>
924 <p>But why limit ourselves to contiguous ranges? An ancestry set is
925 truly a set &ndash; it can be any subset of the changes available:</p>
927 <pre>
929 _____ _____ _____ _____ _____ _____
930 | | | | | | | | | | | |
931 | A:1 |-----&gt;| A:2 |-----&gt;| A:4 |-----&gt;| A:6 |-----&gt;| A:8 |--.--&gt;|A:10 |
932 |_____| |_____| |_____| |_____| |_____| / |_____|
934 | ______________________.__/
935 | / |
936 | / |
937 \ __/_ _|__
938 \ { } { }
939 \ _____ _____ _____ _____
940 \| | | | | | | |
941 | B:3 |-----&gt;| B:5 |-----&gt;| B:7 |-----&gt;| B:9 |-----&gt;
942 |_____| |_____| |_____| |_____|
944 </pre>
946 <p>In this diagram, the change from B:3-5 and the change from B:7-9 are
947 merged into a working copy whose ancestry set (so far) is
948 {&nbsp;A:1-8&nbsp;} plus any local changes. After committing, A:10's
949 ancestry set is</p>
951 <pre>
953 { A:1-10, B:5, B:9 }
955 </pre>
957 <p>Clearly, saying "Let's merge branch B into A" is a little ambiguous.
958 It usually means "Merge all the changes accounted for in B's tip into A",
959 but it <em>might</em> mean "Merge the single change that
960 resulted in B's tip into A".</p>
962 <p>Any merge, when viewed in detail, is an application of a particular
963 set of changes &ndash; not necessarily adjacent ones &ndash; to a working
964 copy. The user-level interface may allow some of these changes to be
965 specified implicitly. For example, many merges involve a single,
966 contiguous range of changes, with one or both ends of the range easily
967 deducible from context (i.e., branch root to branch tip). These
968 inference rules are not specified here, but it should be clear in most
969 contexts how they work.</p>
971 <p>Because each node knows its ancestors, Subversion never merges the
972 same change twice (unless you force it to). For example, if after the
973 above merge, you tell Subversion to merge all B changes into A,
974 Subversion will notice that two of them have already been merged, and so
975 merge only the other two changes, resulting in a final ancestry set
976 of:</p>
978 <pre>
980 { A:1-10, B:3-9 }
982 </pre>
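<p>The bookkeeping behind this example can be sketched with ordinary set
operations. The representation below, a set of (line, revision) pairs with
hypothetical helpers <tt class="literal">changes()</tt> and
<tt class="literal">merge()</tt>, is only an assumption made for
illustration; this document does not specify how ancestry sets are actually
encoded in properties.</p>

```python
def changes(line, revs):
    """Build a set of (line, revision) change identifiers."""
    return {(line, r) for r in revs}


def merge(ancestry, requested):
    """Return (changes still to apply, resulting ancestry set)."""
    to_apply = requested - ancestry   # skip changes already accounted for
    return to_apply, ancestry | requested


# Ancestry of A:10 after the selective merge of B:5 and B:9.
a10 = changes("A", [1, 2, 4, 6, 8, 10]) | changes("B", [5, 9])

# "Merge all B changes into A."
all_b = changes("B", [3, 5, 7, 9])
to_apply, new_ancestry = merge(a10, all_b)

print(sorted(to_apply))   # [('B', 3), ('B', 7)]
```

<p>Only B:3 and B:7 are applied, and the resulting set contains every B
change exactly once, which is what the compact notation above
expresses.</p>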
984 <!--
985 Heh, what about this:
987 B:3 adds line 3, with the text "foo".
988 B:5 deletes line 3.
989 B:7 adds line 3, with the text "foo".
990 B:9 deletes line 3.
992 The user first merges B:5 and B:9 into A. If A had that line, it goes away
993 now, nothing more.
995 Next, user merges B:3 and B:7 into A. The second merge must conflict.
I'm not sure we need to care about this, I just thought I'd note how even
merges that seem like they ought to be easily composable can still suck. :-)
-->
1001 <p>This description of merging and ancestry applies to both intra- and
1002 inter-repository merges. However, inter-repository merging will probably
1003 not be implemented until a future release of Subversion.</p>
1004 </div> <!-- model.merging-and-ancestry (h3) -->
1005 </div> <!-- model (h2) -->
1007 <div class="h2" id="archi" title="#archi">
1008 <h2>Architecture &mdash; How Subversion's components work together</h2>
1012 <p>Subversion is conceptually divided into a number of separable
1013 layers.</p>
1015 <p>Assuming that the programmatic interface of each layer is
1016 well-defined, it is easy to customize the different parts of the system.
1017 Contributors can write new client apps, new network protocols, new server
1018 processes, new server features, and new storage back-ends.</p>
1020 <p>The following diagram illustrates the "layered" architecture, and
1021 where each particular interface lies.</p>
1023 <pre>
1024 +--------------------+
1025 | commandline or GUI |
1026 | client app |
1027 +----------+--------------------+----------+ &lt;=== Client interface
1028 | Client Library |
1030 | +----+ |
1031 | | | |
1032 +-------+--------+ +--------------+--+----------+ &lt;=== Network interface
1033 | Working Copy | | Remote | | Local |
1034 | Management lib | | Repos Access | | Repos |
1035 +----------------+ +--------------+ | Access |
1036 | neon | | |
1037 +--------------+ | |
1038 ^ | |
1039 / | |
1040 DAV / | |
1041 / | |
1042 v | |
1043 +---------+ | |
1044 | | | |
1045 | Apache | | |
1046 | | | |
1047 +---------+ | |
1048 | mod_DAV | | |
1049 +-------------+ | |
1050 | mod_DAV_SVN | | |
1051 +----------+-------------+--------------+----------+ &lt;=== Filesystem interface
1053 | Subversion Filesystem |
1055 +--------------------------------------------------+
1057 </pre>
1060 <div class="h3" id="archi.client" title="#archi.client">
1061 <h3>Client Layer</h3>
1064 <p>The Subversion client, which may be either
1065 command-line or GUI, draws on three libraries.</p>
1067 <p>The working copy library, <tt class="filename">libsvn_wc</tt>, provides
1068 an API for managing the client's working copy of a project. This
1069 includes operations like renaming or removal of files, patching files,
1070 extracting local diffs, and routines for maintaining administrative
1071 files in the <tt class="filename">.svn/</tt> directory.</p>
<p>The repository access library, <tt class="filename">libsvn_ra</tt>,
1074 provides an API for exchanging information with a Subversion
1075 repository. This includes the ability to read files, write new
1076 revisions of files, and ask the repository to compare a working copy
1077 against its latest revision. Note that there are two implementations
1078 of this interface: one designed to talk to a repository over a network,
1079 and one designed to work with a repository on local disk. Any number
1080 of interface implementations can exist.</p>
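<p>The idea of one interface with interchangeable implementations can be
sketched as follows. Every name here
(<tt class="literal">RepositoryAccess</tt>,
<tt class="literal">LocalAccess</tt>, <tt class="literal">RemoteAccess</tt>,
<tt class="literal">read_file</tt>) is hypothetical and is not the
<tt class="filename">libsvn_ra</tt> API; the &ldquo;network&rdquo; is just a
callable standing in for a real transport.</p>

```python
from abc import ABC, abstractmethod


class RepositoryAccess(ABC):
    """Hypothetical stand-in for the repository access interface."""

    @abstractmethod
    def read_file(self, path: str, revision: int) -> bytes:
        ...


class LocalAccess(RepositoryAccess):
    """Reads from a dict standing in for a repository on local disk."""

    def __init__(self, revisions):
        self._revisions = revisions   # {revision: {path: contents}}

    def read_file(self, path, revision):
        return self._revisions[revision][path]


class RemoteAccess(RepositoryAccess):
    """Forwards each request through a callable standing in for the wire."""

    def __init__(self, transport):
        self._transport = transport

    def read_file(self, path, revision):
        return self._transport("read_file", path, revision)


def show(ra: RepositoryAccess, path: str, revision: int) -> str:
    # Client code depends only on the interface, not on the implementation.
    return ra.read_file(path, revision).decode("utf-8")


local = LocalAccess({1: {"/README": b"hello"}})
remote = RemoteAccess(lambda op, path, rev: getattr(local, op)(path, rev))
```

<p>Calling <tt class="literal">show()</tt> with either object gives the same
result, which is the property that lets further implementations be added
without touching client code.</p>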
<p>The client library, <tt class="filename">libsvn_client</tt>, provides
1083 general client functions such as <tt class="literal">update()</tt> and
1084 <tt class="literal">commit()</tt>, which may involve one or both of the other
1085 two client libraries. <tt class="filename">libsvn_client</tt> should, in
1086 theory, provide an API that allows anyone to write a Subversion client
1087 application.</p>
1089 <p>For details, see <a href="#client">Client &mdash; How the client works</a>.</p>
1090 </div> <!-- archi.client (h3) -->
1092 <div class="h3" id="archi.network" title="#archi.network">
1093 <h3>Network Layer</h3>
<p>The network layer's job is to move the repository API requests
1097 over a wire.</p>
1099 <p>On the client side, a network library
1100 (<tt class="filename">libneon</tt>) translates these requests into a set of
1101 HTTP WebDAV/DeltaV requests. The information is sent over TCP/IP to an
1102 Apache server. Apache is used for the following reasons:</p>
1104 <ul>
1105 <li><p>it is time-tested and extremely
1106 stable;</p></li>
1107 <li><p>it has built-in load-balancing;</p></li>
1108 <li><p>it has built-in proxy and firewall
1109 support;</p></li>
1110 <li><p>it has authentication and encryption
1111 features;</p></li>
1112 <li><p>it allows client-side caching;</p></li>
<li><p>it has an extensible module system.</p></li>
1114 </ul>
1116 <p>Our rationale is that any attempt to write a dedicated "Subversion
1117 server" (with a "Subversion protocol") would inevitably end up evolving
1118 towards Apache's already-existing feature set. (However, Subversion's
1119 layered architecture certainly doesn't <em>prevent</em>
1120 anyone from writing a totally new network access
1121 implementation.)</p>
1123 <p>An Apache module (<tt class="filename">mod_dav_svn</tt>) translates the
1124 DAV requests into API calls against a particular repository.</p>
1126 <p>For details, see <a href="#protocol">Protocol &mdash; How the client and server communicate</a>.</p>
1127 </div> <!-- archi.network (h3) -->
1129 <div class="h3" id="archi.fs" title="#archi.fs">
1130 <h3>Filesystem Layer</h3>
1133 <p>When the requests reach a particular repository, they are
1134 interpreted by the <strong class="firstterm">Subversion Filesystem
1135 library</strong>, <tt class="filename">libsvn_fs</tt>. The Subversion
1136 Filesystem is a custom Unix-like filesystem, with a twist: writes are
1137 revisioned and atomic, and no data is ever deleted! This filesystem is
1138 currently implemented on top of a normal filesystem, using Berkeley DB
1139 files.</p>
<p>For a more detailed explanation, see <a href="#server">Server &mdash; How the server works</a>.</p>
1142 </div> <!-- archi.fs (h3) -->
1143 </div> <!-- archi (h2) -->
1145 <div class="h2" id="deltas" title="#deltas">
1146 <h2>Deltas &mdash; How to describe changes</h2>
1150 <p>Subversion uses three kinds of deltas:</p>
1152 <ul>
1154 <li><p>A <strong><strong class="firstterm">tree
1155 delta</strong></strong> describes the difference between two
1156 arbitrary directory trees, the way a traditional patch describes the
1157 difference between two files. For example, the delta between
1158 directories A and B could be applied to A, to produce B.</p>
1160 <p>Tree deltas can also carry ancestry information, indicating how
1161 the files in one tree are related to files in the other tree. And
1162 deltas can describe changes to file meta-information, like permission
1163 bits, creation dates, and so on. The repository and working copy use
1164 deltas to communicate changes.</p></li>
1166 <li><p>A <strong><strong class="firstterm">text
1167 delta</strong></strong> describes changes to a string of
1168 bytes, such as the contents of a file. It is analogous to
1169 traditional patch format, except that it works equally well on binary
1170 and text files, and is not invertible (because context and deleted
1171 data are not recorded).</p></li>
1173 <li><p>A <strong><strong class="firstterm">property
1174 delta</strong></strong> describes changes to a list of named
1175 properties (see <a href="#model.props">Properties</a>).</p></li>
1176 </ul>
1178 <p>The term <strong class="firstterm">delta</strong> without qualification generally
1179 means a tree delta, unless some other meaning is clear from
1180 context.</p>
1182 <p>In the examples below, deltas will be described in XML, which happens
1183 to be Subversion's (now mostly defunct) import/export patch format.
1184 However, note that deltas are an abstract data structure, of which the
1185 XML format is merely one representation. Later, we will describe other
1186 representations: for example, there is a serialized representation
1187 (useful for streaming protocols, among other things), and a db-style
1188 representation, used for repository storage. The various representations
1189 of a given delta are (in theory, anyway) perfectly isomorphic to one
1190 another, since they describe the same underlying structure.</p>
1193 <div class="h3" id="deltas.text" title="#deltas.text">
1194 <h3>Text Deltas</h3>
1197 <p>A text delta describes the difference between two strings of bytes,
1198 the <strong class="firstterm">source</strong> string and the
1199 <strong class="firstterm">target</strong> string. Given a source string and a target
1200 string, we can compute a text delta; given a source string and a delta,
1201 we can reconstruct the target string. However, note that deltas are not
1202 invertible: you cannot always reconstruct the source string given the
1203 target string and delta.</p>
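<p>A toy model makes the one-way nature of text deltas concrete. The
instruction format below (<tt class="literal">copy</tt> from the source,
<tt class="literal">insert</tt> new bytes) is a simplified assumption made
for illustration; it is not the svndiff format, and
<tt class="literal">apply_delta</tt> is a hypothetical helper.</p>

```python
def apply_delta(source: bytes, delta) -> bytes:
    """Rebuild the target from the source and a list of instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":        # ("copy", offset, length): reuse source bytes
            _, offset, length = op
            out += source[offset:offset + length]
        elif op[0] == "insert":    # ("insert", data): bytes new in the target
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


delta = [("copy", 0, 6), ("insert", b"Subversion")]

print(apply_delta(b"hello world", delta))   # b'hello Subversion'
print(apply_delta(b"hello earth", delta))   # b'hello Subversion'
```

<p>Two different source strings produce the same target under the same
delta, so the target and the delta together cannot tell us which source we
started from. The bytes that were not copied are simply not recorded.</p>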
1205 <p>The standard Unix &ldquo;diff&rdquo; format is one possible
1206 representation for text deltas; however, diffs are not ideal for internal
1207 use by a revision control system, for several reasons:</p>
1209 <ul>
1210 <li><p>Diffs are line-oriented, which makes them human-readable,
1211 but sometimes makes them perform poorly on binary
1212 files.</p></li>
1213 <li><p>Diffs represent a series of replacements, exchanging
selected ranges of the old text with new text; again, this is easy for
humans to read, but it is more expensive to compute and less compact
1216 than some alternatives.</p></li>
1217 </ul>
1219 <p>Instead, Subversion uses the VDelta binary-diffing algorithm, as
1220 described in <em class="citetitle">Hunt, J. J., Vo, K.-P., and Tichy, W. F. An
1221 empirical study of delta algorithms. Lecture Notes in Computer Science
1222 1167 (July 1996), 49-66.</em> Currently, the output of this
1223 algorithm is stored in a custom data format called
<strong class="firstterm">svndiff</strong>, invented by Greg Hudson, a
1225 Subversion developer.</p>
1227 <p>The concrete form of a text delta is a well-formed XML element,
1228 having the following form:</p>
1230 <pre>
1231 &lt;text-delta&gt;<em class="replaceable">data</em>&lt;/text-delta&gt;
1232 </pre>
1234 <p>Here, <em class="replaceable">data</em> is the raw svndiff data,
1235 encoded in the MIME Base64 format.</p>
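<p>Wrapping and unwrapping the payload might look like the sketch below. The
helper names are hypothetical, the payload is placeholder bytes rather than
real svndiff data, and the unwrapping uses plain string slicing instead of
an XML parser to keep the example short.</p>

```python
import base64


def wrap_text_delta(svndiff_bytes: bytes) -> str:
    """Encode raw delta bytes as a <text-delta> element."""
    encoded = base64.b64encode(svndiff_bytes).decode("ascii")
    return f"<text-delta>{encoded}</text-delta>"


def unwrap_text_delta(element: str) -> bytes:
    """Recover the raw delta bytes from a <text-delta> element."""
    prefix, suffix = "<text-delta>", "</text-delta>"
    if not (element.startswith(prefix) and element.endswith(suffix)):
        raise ValueError("not a text-delta element")
    return base64.b64decode(element[len(prefix):-len(suffix)])


payload = b"\x00\x01binary\xff"   # placeholder bytes, not real svndiff data
element = wrap_text_delta(payload)
```

<p>Base64 output uses only letters, digits,
<tt class="literal">+</tt>, <tt class="literal">/</tt>, and
<tt class="literal">=</tt>, so the element body never needs XML
escaping.</p>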
1236 </div> <!-- deltas.text (h3) -->
1238 <div class="h3" id="deltas.prop" title="#deltas.prop">
1239 <h3>Property Deltas</h3>
1242 <p>A property delta describes changes to a property list, of the sort
associated with files, directories, directory entries, and revision
1244 numbers (see <a href="#model.props">Properties</a>). A property delta can record
1245 creating, deleting, and changing the text of any number of
1246 properties.</p>
1248 <p>A property delta is an unordered set of name/change pairs. No two
1249 pairs within a given property delta have the same name. A pair's name
1250 indicates the property affected, and the change indicates what happens to
1251 its value. There are two kinds of changes:</p>
1253 <dl>
1254 <dt>set <em class="replaceable">value</em></dt>
1255 <dd><p>Change the value of the named property to the byte
1256 string <em class="replaceable">value</em>. If there is no property
1257 with the given name, one is added to the property
1258 list.</p></dd>
1260 <dt>delete</dt>
1261 <dd><p>Remove the named property from the property
1262 list.</p></dd>
1264 </dl>
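<p>Applying such a delta to a property list is straightforward. In this
sketch a delta is a dict from property name to a
<tt class="literal">("set", value)</tt> or
<tt class="literal">("delete",)</tt> tuple; that encoding and the function
name are assumptions made for the example. Deleting a property that does not
exist is treated as a no-op here, which is a choice of the sketch and not
something this document specifies.</p>

```python
def apply_property_delta(props: dict, delta: dict) -> dict:
    """Return a new property list with every change in the delta applied."""
    result = dict(props)
    for name, change in delta.items():
        if change[0] == "set":        # ("set", value): create or overwrite
            result[name] = change[1]
        elif change[0] == "delete":   # ("delete",): remove the property
            result.pop(name, None)    # assumption: missing name is ignored
        else:
            raise ValueError(f"unknown change: {change[0]!r}")
    return result


before = {"svn:mime-type": b"text/plain", "reviewer": b"alice"}
delta = {
    "svn:mime-type": ("set", b"text/html"),   # change an existing value
    "owner": ("set", b"bob"),                 # add a new property
    "reviewer": ("delete",),                  # remove a property
}
after = apply_property_delta(before, delta)
```

<p>Because the delta is keyed by name, it cannot contain two changes to the
same property, and the order in which changes are applied does not
matter.</p>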
1266 <p>At the moment, the <tt class="literal">set</tt> command can either create
1267 or change a property value. However, this simplification means that the
1268 server cannot distinguish between a client which believes it is creating
1269 a value afresh, and a client which believes it is changing the value of
1270 an existing property. It may simplify conflict detection to divide
1271 <tt class="literal">set</tt> into two separate <tt class="literal">add</tt> and
1272 <tt class="literal">change</tt> operations.</p>
1274 <p>In the future, we may add a <tt class="literal">text-delta</tt> change,
1275 which specifies a change to an existing property's value as a text delta.
1276 This would give us a compact way to describe small changes to large
1277 property values.</p>
1279 <p>The concrete form of a property delta is a well-formed XML element,
1280 having the following form:</p>
1282 <pre>
1283 &lt;property-delta&gt;<em class="replaceable">change</em>&hellip;&lt;/property-delta&gt;
1284 </pre>
1286 <p>Each <em class="replaceable">change</em> in a property delta has one of
1287 the following forms:</p>
1289 <pre>
1290 &lt;set name='<em class="replaceable">name</em>'&gt;<em class="replaceable">value</em>&lt;/set&gt;
1291 &lt;delete name='<em class="replaceable">name</em>'/&gt;
1292 </pre>
1294 <p>The <em class="replaceable">name</em> attribute of a
1295 <tt class="literal">set</tt> or <tt class="literal">delete</tt> element gives the
1296 name of the property to change. The <em class="replaceable">value</em> of
1297 a <tt class="literal">set</tt> element gives the new value of the
1298 property.</p>
1300 <p>If either the property name or the property value contains the
1301 characters &lsquo;<tt class="literal">&amp;</tt>&rsquo;,
1302 &lsquo;<tt class="literal">&lt;</tt>&rsquo;, or
1303 &lsquo;<tt class="literal">'</tt>&rsquo;, they should be replaced with the
sequences &lsquo;<tt class="literal">&amp;#38;</tt>&rsquo;,
&lsquo;<tt class="literal">&amp;#60;</tt>&rsquo;, or
&lsquo;<tt class="literal">&amp;#39;</tt>&rsquo;, respectively.</p>
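<p>The escaping rule and the element forms can be combined into a small
serializer. This is a sketch under assumptions: the function names are
hypothetical, property values are taken to be text for simplicity, and the
character references are written with the terminating semicolon that an XML
numeric character reference requires.</p>

```python
import xml.etree.ElementTree as ET


def escape(text: str) -> str:
    # '&' must be replaced first so that the references produced for
    # '<' and "'" are not escaped a second time.
    return (text.replace("&", "&#38;")
                .replace("<", "&#60;")
                .replace("'", "&#39;"))


def property_delta_xml(delta: dict) -> str:
    """Serialize {name: ("set", value) | ("delete",)} as a property-delta."""
    parts = []
    for name, change in delta.items():
        if change[0] == "set":
            parts.append(f"<set name='{escape(name)}'>{escape(change[1])}</set>")
        else:
            parts.append(f"<delete name='{escape(name)}'/>")
    return "<property-delta>" + "".join(parts) + "</property-delta>"


delta = {"owner's note": ("set", "a < b & c"), "obsolete": ("delete",)}
xml = property_delta_xml(delta)

# A standard XML parser recovers the original name and value.
parsed = ET.fromstring(xml)
```

<p>Parsing the result with an ordinary XML parser gives back the unescaped
name and value, which is a quick way to check that the escaping is
complete.</p>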
1307 </div> <!-- deltas.prop (h3) -->
1309 <div class="h3" id="deltas.tree" title="#deltas.tree">
1310 <h3>Tree Deltas</h3>
1313 <p>A tree delta describes changes between two directory trees, the
1314 <strong class="firstterm">source tree</strong> and the <strong class="firstterm">target
1315 tree</strong>. Tree deltas can describe copies, renames, and
1316 deletions of files and directories, changes to file contents, and changes
1317 to property lists. A tree delta can also carry information about how the
1318 files in the target tree are derived from the files in the source tree,
1319 if this information is available.</p>
1321 <p>The format for tree deltas described here is easy to compute from a
1322 Subversion working directory, and easy to apply to a Subversion
1323 repository. Furthermore, the size of a tree delta in this format is
1324 independent of the commands used to produce the target tree &mdash; it
1325 depends only on the degree of difference between the source and target
1326 trees.</p>
1328 <p>A tree delta is interpreted in the context of three
1329 parameters:</p>
1331 <ul>
1332 <li><p><em class="replaceable">source-root</em>, the name of the
1333 directory to which this complete tree delta applies,</p></li>
1334 <li><p><em class="replaceable">revision</em>, indicating a
1335 particular revision of &hellip;</p></li>
1336 <li><p><em class="replaceable">source-dir</em>, which is a
1337 directory in the source tree that we are currently modifying to yield
1338 &hellip;</p></li>
1339 <li><p>&hellip; <strong class="firstterm">target-dir</strong> &mdash; the
1340 directory we're constructing.</p></li>
1341 </ul>
1343 <p>When we start interpreting a tree delta,
1344 <em class="replaceable">source-root</em>,
1345 <em class="replaceable">source-dir</em>, and
1346 <em class="replaceable">target-dir</em> are all equal. As we walk the tree
1347 delta, <em class="replaceable">target-dir</em> walks the tree we are
1348 constructing, and <em class="replaceable">source-dir</em> walks the
1349 corresponding portion of the source tree, which we use as the original.
1350 <em class="replaceable">Source-root</em> remains constant as we walk the
1351 delta; we may use it to choose new source trees.</p>
1353 <p>A tree delta is a list of changes of the form</p>
1355 <pre>
1356 &lt;tree-delta&gt;<em class="replaceable">change</em>&hellip;&lt;/tree-delta&gt;
1357 </pre>
1359 <p>which describe how to edit the contents of
1360 <em class="replaceable">source-dir</em> to yield
1361 <em class="replaceable">target-dir</em>. There are three kinds of
1362 changes:</p>
1364 <dl>
1366 <dt>&lt;delete
1367 name='<em class="replaceable">name</em>'/&gt;</dt>
1368 <dd><p><em class="replaceable">Source-dir</em> has an entry
1369 named <em class="replaceable">name</em>, which is not present
1370 in <em class="replaceable">target-dir</em>.</p></dd>
1373 <dt>&lt;add
1374 name='<em class="replaceable">name</em>'&gt;<em class="replaceable">content</em>&lt;/add&gt;</dt>
<dd><p><em class="replaceable">Target-dir</em> has an entry
1376 named <em class="replaceable">name</em>, which is not present
1377 in <em class="replaceable">source-dir</em>;
1378 <em class="replaceable">content</em> describes the file or directory
1379 to which the new directory entry refers.</p></dd>
1382 <dt>&lt;open
1383 name='<em class="replaceable">name</em>'&gt;<em class="replaceable">content</em>&lt;/open&gt;</dt>
1384 <dd><p>Both <em class="replaceable">source-dir</em> and
1385 <em class="replaceable">target-dir</em> have an entry
1386 named <em class="replaceable">name</em>, which has changed;
1387 <em class="replaceable">content</em> describes the new file
1388 or directory.</p></dd>
1390 </dl>
1392 <p>Any entries in <em class="replaceable">source-dir</em> whose names
1393 aren't mentioned are assumed to appear unchanged in
1394 <em class="replaceable">target-dir</em>. Thus, an empty
1395 <tt class="literal">tree-delta</tt> element indicates that
1396 <em class="replaceable">target-dir</em> is identical to
1397 <em class="replaceable">source-dir</em>.</p>
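<p>The three kinds of changes can be modelled on an in-memory tree. The
sketch below makes several simplifying assumptions: a directory is a dict, a
file is a byte string, a file's <em class="replaceable">content</em> carries
the complete new text instead of a text delta, and ancestry and properties
are ignored. The function name and tuple encoding are hypothetical.</p>

```python
def apply_tree_delta(source_dir: dict, changes: list) -> dict:
    """Return target-dir, built from source-dir and a list of changes."""
    target_dir = dict(source_dir)   # unmentioned entries carry over unchanged
    for change in changes:
        kind, name = change[0], change[1]
        if kind == "delete":
            del target_dir[name]
        elif kind in ("add", "open"):
            if kind == "add" and name in source_dir:
                raise ValueError(f"{name!r} already exists in source-dir")
            if kind == "open" and name not in source_dir:
                raise ValueError(f"{name!r} is missing from source-dir")
            content_kind, payload = change[2]
            if content_kind == "file":
                target_dir[name] = payload        # full new file text
            elif content_kind == "directory":
                # A nested delta is applied to the matching subdirectory,
                # or to an empty directory when the entry is new.
                base = source_dir[name] if kind == "open" else {}
                target_dir[name] = apply_tree_delta(base, payload)
            else:
                raise ValueError(f"unknown content: {content_kind!r}")
        else:
            raise ValueError(f"unknown change: {kind!r}")
    return target_dir


source = {"dir1": {"file1": b"old", "file2": b"two"}, "notes": b"n"}
changes = [
    ("open", "dir1", ("directory", [("open", "file1", ("file", b"new"))])),
    ("delete", "notes"),
    ("add", "dir2", ("directory", [("add", "file3", ("file", b"three"))])),
]
target = apply_tree_delta(source, changes)
```

<p>The source tree is left untouched, and an empty list of changes returns a
tree equal to the source, matching the rule for an empty
<tt class="literal">tree-delta</tt> element.</p>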
1399 <p>In the change descriptions above, each
1400 <em class="replaceable">content</em> takes one of the following
1401 forms:</p>
1403 <dl>
1405 <dt>&lt;file
1406 <em class="replaceable">ancestor</em>&gt;<em class="replaceable">prop-delta</em>
1407 <em class="replaceable">text-delta</em>&lt;/file&gt;</dt>
1409 <dd><p>The given <em class="replaceable">target-dir</em> entry
1410 refers to a file, <em class="replaceable">f</em>.
1411 <em class="replaceable">Ancestor</em> indicates which file in the
1412 source tree <em class="replaceable">f</em> is derived from, if any.
1413 </p>
1415 <p><em class="replaceable">Prop-delta</em> is a property delta
1416 describing how <em class="replaceable">f</em>'s properties differ
1417 from that ancestor; it may be omitted, indicating that the
1418 properties are unchanged.</p>
1420 <p><em class="replaceable">Text-delta</em> is a text delta
1421 describing how to construct <em class="replaceable">f</em> from that
1422 ancestor; it may also be omitted, indicating that
1423 <em class="replaceable">f</em>'s text is identical to its
1424 ancestor's.</p></dd>
1428 <dt>&lt;file <em class="replaceable">ancestor</em>/&gt;</dt>
1430 <dd><p>An abbreviation for <tt class="literal">&lt;file
1431 <em class="replaceable">ancestor</em>&gt;&lt;/file&gt;</tt>
&mdash; a file element with no property or text delta, thus
describing a file identical to its ancestor.</p></dd>
1437 <dt>&lt;directory
1438 <em class="replaceable">ancestor</em>&gt;<em class="replaceable">prop-delta</em>
1439 <em class="replaceable">tree-delta</em>&lt;/directory&gt;</dt>
1441 <dd><p>The given <em class="replaceable">target-dir</em> entry
1442 refers to a subdirectory, <em class="replaceable">sub</em>.
1443 <em class="replaceable">Ancestor</em> indicates which directory in
1444 the source tree <em class="replaceable">sub</em> is derived from, if
1445 any.</p>
1447 <p><em class="replaceable">Prop-delta</em> is a property delta
describing how <em class="replaceable">sub</em>'s properties differ
from that ancestor; it may be omitted, indicating that the
1450 properties are unchanged.</p>
1452 <p><em class="replaceable">Tree-delta</em>
1453 describes how to construct <em class="replaceable">sub</em> from
1454 that ancestor; it may be omitted, indicating that the directory is
1455 identical to its ancestor. <em class="replaceable">Tree-delta</em>
1456 should be interpreted with a new
1457 <em class="replaceable">target-dir</em> of
1458 <tt class="filename"><em class="replaceable">target-dir</em>/<em class="replaceable">name</em></tt>.</p>
1460 <p>Since <em class="replaceable">tree-delta</em> is itself a
1461 complete tree delta structure, tree deltas are themselves trees,
1462 whose structure is a subgraph of the target tree.</p></dd>
1466 <dt>&lt;directory
1467 <em class="replaceable">ancestor</em>/&gt;</dt>
1469 <dd><p>An abbreviation for <tt class="literal">&lt;directory
1470 <em class="replaceable">ancestor</em>&gt;&lt;/directory&gt;</tt>
1471 &mdash; a directory element with no property or tree delta, thus
1472 describing a directory identical to its ancestor.</p></dd>
1474 </dl>
<p>The <em class="replaceable">content</em> of an <tt class="literal">add</tt> or
1477 <tt class="literal">open</tt> tag may also contain a property delta, describing
1478 changes to the properties of that <em>directory
1479 entry</em>.</p>
1481 <p>In the <tt class="literal">file</tt> and <tt class="literal">directory</tt>
1482 elements described above, each <em class="replaceable">ancestor</em> has
1483 one of the following forms:</p>
1485 <dl>
1487 <dt>ancestor='<em class="replaceable">path</em>'</dt>
1489 <dd><p>The ancestor of the new or changed file or directory is
1490 <tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>,
1491 in <em class="replaceable">revision</em>. When this appears as an
1492 attribute of a <tt class="literal">file</tt> element, the element's text
1493 delta should be applied to
1494 <tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>.
1495 When this appears as an attribute of a <tt class="literal">directory</tt>
1496 element,
1497 <tt class="filename"><em class="replaceable">source-root</em>/<em class="replaceable">path</em></tt>
1498 should be the new <em class="replaceable">source-dir</em> for
1499 interpreting that element's tree delta.</p></dd>
1503 <dt>new='true'</dt>
1505 <dd><p>This indicates that the file or directory has no
1506 ancestor in the source tree. When followed by a
1507 <em class="replaceable">text-delta</em>, that delta should be applied
1508 to the empty file to yield the new text; when followed by a
1509 <em class="replaceable">tree-delta</em>, that delta should be
1510 evaluated as if <em class="replaceable">source-dir</em> were an
1511 imaginary empty directory.</p></dd>
1515 <dt><em class="replaceable">nothing</em></dt>
1517 <dd><p>If neither an <tt class="literal">ancestor</tt> nor a
<tt class="literal">new</tt> attribute is given, this is an abbreviation for
1520 <tt class="literal">ancestor='<em class="replaceable">source-dir</em>/<em class="replaceable">name</em>'</tt>,
1521 with the same revision number. This makes the common case &mdash;
1522 files or directories modified in place &mdash; more
1523 compact.</p></dd>
1525 </dl>
1527 <p>If the <em class="replaceable">ancestor</em> spec is not
1528 <tt class="literal">new='true'</tt>, it may also contain the text
1529 <tt class="literal">revision='<em class="replaceable">rev</em>'</tt>, indicating
1530 a new value for <em class="replaceable">revision</em>, in which we should
1531 find the ancestor.</p>
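<p>The three ancestor forms can be summarized as a small lookup rule. The
following is only a sketch: the function name, the attribute dictionary,
and the choice to write paths as absolute (as the examples in this section
do) are assumptions made for illustration, not part of any Subversion
API.</p>

```python
def resolve_ancestor(attrs, source_dir, name, revision):
    """Return (ancestor_path, ancestor_revision), or None for new='true'.

    attrs      -- attributes found on a file or directory element
    source_dir -- directory the enclosing tree delta is applied to
    name       -- entry name taken from the enclosing add/open element
    revision   -- the revision currently in effect
    """
    if attrs.get("new") == "true":
        return None                                   # no ancestor at all
    rev = int(attrs.get("revision", revision))        # optional override
    path = attrs.get("ancestor")
    if path is None:                                  # the abbreviated form
        path = source_dir.rstrip("/") + "/" + name
    return path, rev


print(resolve_ancestor({}, "/dir1", "file1", 5))
print(resolve_ancestor({"ancestor": "/dir1/file2", "revision": "3"},
                       "/dir1", "file1", 5))
```

<p>The abbreviated form falls out as the default branch, which is why an
in-place modification needs no attributes at all.</p>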
1533 <p>If a filename or path appearing as a <em class="replaceable">name</em>
1534 or <em class="replaceable">path</em> in the description above contains the
1535 characters &lsquo;<tt class="literal">&amp;</tt>&rsquo;,
1536 &lsquo;<tt class="literal">&lt;</tt>&rsquo;, or
1537 &lsquo;<tt class="literal">'</tt>&rsquo;, they should be replaced with the
1538 sequences &lsquo;<tt class="literal">&amp;#38;</tt>&rsquo;,
1539 &lsquo;<tt class="literal">&amp;#60;</tt>&rsquo;, or
1540 &lsquo;<tt class="literal">&amp;#39;</tt>&rsquo;, respectively.</p>
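<p>As a sketch of the escaping rule (the helper name
<tt class="literal">escape_name</tt> is made up for this example), the
ampersand has to be replaced first so that the ampersands introduced by the
other two replacements are not escaped a second time:</p>

```python
import xml.etree.ElementTree as ET


def escape_name(name):
    """Escape a name or path for use inside a single-quoted XML attribute."""
    return (name.replace("&", "&#38;")     # must come first
                .replace("<", "&#60;")
                .replace("'", "&#39;"))


original = "Tom & Jerry's <draft>.txt"
element = "<open name='%s'/>" % escape_name(original)
print(element)

# An XML parser turns the character references back into the original name.
assert ET.fromstring(element).get("name") == original
```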
1542 <p>Suppose we have the following source tree:</p>
1544 <pre>
/dir1/file1
      file2
      dir2/file3
           file4
      dir3/file5
           file6
1551 </pre>
1553 <p>If we edit the contents of <tt class="filename">/dir1/file1</tt>, we can
1554 describe the effect on the tree with the following tree delta, to be
1555 applied to the root:</p>
1557 <pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='file1'&gt;
          &lt;file&gt;<em class="replaceable">text-delta</em>&lt;/file&gt;
        &lt;/open&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
1569 </pre>
1571 <p>The outer <tt class="literal">tree-delta</tt> element describes the changes
1572 made to the root directory. Within the root directory, there are changes
1573 in <tt class="filename">dir1</tt>, described by the nested
1574 <tt class="literal">tree-delta</tt>. Within <tt class="filename">/dir1</tt>, there
1575 are changes in <tt class="filename">file1</tt>, described by the
1576 <em class="replaceable">text-delta</em>.</p>
1578 <p>If we had edited both <tt class="filename">/dir1/file1</tt> and
1579 <tt class="filename">/dir1/file2</tt>, then there would simply be two
1580 <tt class="literal">open</tt> elements in the inner
1581 <tt class="literal">tree-delta</tt>.</p>
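<p>The nesting in this example follows mechanically from the path of the
edited file. A minimal sketch, assuming a hypothetical helper
<tt class="literal">delta_for_edit</tt> and a placeholder string in place of
a real text delta:</p>

```python
import xml.etree.ElementTree as ET


def delta_for_edit(path, text_delta="text-delta"):
    """Build the tree delta for an in-place edit of the file at `path`."""
    parts = path.strip("/").split("/")
    root = ET.Element("tree-delta")          # changes to the root directory
    delta = root
    for dirname in parts[:-1]:               # one nesting level per directory
        opened = ET.SubElement(delta, "open", name=dirname)
        directory = ET.SubElement(opened, "directory")
        delta = ET.SubElement(directory, "tree-delta")
    opened = ET.SubElement(delta, "open", name=parts[-1])
    ET.SubElement(opened, "file").text = text_delta
    return root


print(ET.tostring(delta_for_edit("/dir1/file1"), encoding="unicode"))
```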
1583 <p>As another example, starting from the same source tree, suppose we
1584 rename <tt class="filename">/dir1/file1</tt> to
1585 <tt class="filename">/dir1/file8</tt>:</p>
1587 <pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;delete name='file1'/&gt;
        &lt;add name='file8'&gt;
          &lt;file ancestor='/dir1/file1'/&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
1600 </pre>
<p>As above, the inner <tt class="literal">tree-delta</tt> describes how
1603 <tt class="filename">/dir1</tt> has changed: the entry for
1604 <tt class="filename">/dir1/file1</tt> has disappeared, but there is a new
1605 entry, <tt class="filename">/dir1/file8</tt>, which is derived from and
1606 textually identical to <tt class="filename">/dir1/file1</tt> in the source
1607 directory. This is just an indirect way of describing the rename.</p>
1609 <p>Why is it necessary to be so indirect? Consider the delta
1610 representing the result of:</p>
1612 <ol>
1613 <li><p>renaming <tt class="filename">/dir1/file1</tt> to
1614 <tt class="filename">/dir1/tmp</tt>,</p></li>
1615 <li><p>renaming <tt class="filename">/dir1/file2</tt> to
1616 <tt class="filename">/dir1/file1</tt>, and</p></li>
1617 <li><p>renaming <tt class="filename">/dir1/tmp</tt> to
1618 <tt class="filename">/dir1/file2</tt></p></li>
1619 </ol>
1621 <p>(in other words, exchanging <tt class="filename">file1</tt> and
1622 <tt class="filename">file2</tt>):</p>
1624 <pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='file1'&gt;
          &lt;file ancestor='/dir1/file2'/&gt;
        &lt;/open&gt;
        &lt;open name='file2'&gt;
          &lt;file ancestor='/dir1/file1'/&gt;
        &lt;/open&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
1639 </pre>
1641 <p>The indirectness allows the tree delta to capture an arbitrary
1642 rearrangement without resorting to temporary filenames.</p>
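<p>The reason no temporary name is needed is that every ancestor is looked
up in the unchanged <em>source</em> directory, never in the partially built
result. A small sketch of that idea, using a plain dictionary as a
stand-in for a directory (the function and variable names are invented for
the example):</p>

```python
def apply_renames(source, ancestors):
    """Build a new directory from `source` (entry name -> contents).

    `ancestors` maps each changed entry name to the name of its ancestor
    in the source directory; entries not listed are carried over as-is.
    """
    result = dict(source)
    for name, ancestor in ancestors.items():
        result[name] = source[ancestor]     # always read from the source
    return result


source = {"file1": "contents of file1", "file2": "contents of file2"}
swapped = apply_renames(source, {"file1": "file2", "file2": "file1"})
print(swapped)
```

<p>Because <tt class="literal">source</tt> is never modified, the order in
which the two entries are processed does not matter.</p>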
1644 <p>Another example, starting from the same source tree:</p>
1646 <ol>
1647 <li><p>rename <tt class="filename">/dir1/dir2</tt> to
1648 <tt class="filename">/dir1/dir4</tt>,</p></li>
1649 <li><p>rename <tt class="filename">/dir1/dir3</tt> to
1650 <tt class="filename">/dir1/dir2</tt>, and</p></li>
<li><p>move <tt class="filename">file3</tt> from
<tt class="filename">/dir1/dir4</tt> to
<tt class="filename">/dir1/dir2</tt>.</p></li>
1654 </ol>
1656 <p>Note that <tt class="filename">file3</tt>'s path has remained the same,
1657 even though the directories around it have changed. Here is the tree
1658 delta:</p>
1660 <pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='dir2'&gt;
          &lt;directory ancestor='/dir1/dir3'&gt;
            &lt;tree-delta&gt;
              &lt;add name='file3'&gt;
                &lt;file ancestor='/dir1/dir2/file3'/&gt;
              &lt;/add&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/open&gt;
        &lt;delete name='dir3'/&gt;
        &lt;add name='dir4'&gt;
          &lt;directory ancestor='/dir1/dir2'&gt;
            &lt;tree-delta&gt;
              &lt;delete name='file3'/&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
1686 </pre>
1688 <p>In other words:</p>
1690 <ul>
1691 <li><p><tt class="filename">/dir1</tt> has changed;</p></li>
1692 <li><p>the new directory <tt class="filename">/dir1/dir2</tt> is
1693 derived from the old <tt class="filename">/dir1/dir3</tt>, and contains a
1694 new entry <tt class="filename">file3</tt>, derived from the old
1695 <tt class="filename">/dir1/dir2/file3</tt>;</p></li>
1696 <li><p>there is no longer any <tt class="filename">/dir1/dir3</tt>;
1697 and</p></li>
1698 <li><p>the new directory <tt class="filename">/dir1/dir4</tt> is
1699 derived from the old <tt class="filename">/dir1/dir2</tt>, except that its
1700 entry for <tt class="filename">file3</tt> is now gone.</p></li>
1702 </ul>
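<p>To check a reading like this one, it can help to apply the delta to a
toy tree. The sketch below is not Subversion code: directories are nested
dictionaries, files are strings, a delta is a list of tuples, a file's
payload is its complete new text rather than a real text delta, and
<tt class="literal">new='true'</tt> is not modeled. All names are
assumptions made for the example.</p>

```python
def lookup(root, path):
    """Return the node at an absolute path in a nested-dict tree."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node[part]
    return node


def apply_delta(source_root, source_dir, ops):
    """Return the new directory produced by applying `ops` to source_dir."""
    result = dict(lookup(source_root, source_dir))   # copy; source is untouched
    for op in ops:
        if op[0] == "delete":
            del result[op[1]]
            continue
        _action, name, (kind, ancestor, payload) = op
        if ancestor is None:                         # modified in place
            ancestor = source_dir.rstrip("/") + "/" + name
        if kind == "file":
            result[name] = lookup(source_root, ancestor) if payload is None else payload
        else:                                        # ancestor becomes source-dir
            result[name] = apply_delta(source_root, ancestor, payload)
    return result


source = {"dir1": {"file1": "1", "file2": "2",
                   "dir2": {"file3": "3", "file4": "4"},
                   "dir3": {"file5": "5", "file6": "6"}}}

delta = [("open", "dir1", ("directory", None, [
    ("open", "dir2", ("directory", "/dir1/dir3", [
        ("add", "file3", ("file", "/dir1/dir2/file3", None))])),
    ("delete", "dir3"),
    ("add", "dir4", ("directory", "/dir1/dir2", [
        ("delete", "file3")])),
]))]

new_root = apply_delta(source, "/", delta)
print(new_root)
```

<p>In the result, <tt class="filename">dir2</tt> holds the old contents of
<tt class="filename">dir3</tt> plus <tt class="filename">file3</tt>, and
<tt class="filename">dir4</tt> holds only
<tt class="filename">file4</tt>.</p>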
1704 <p>Some more possible maneuvers, left as exercises for the
1705 reader:</p>
1707 <ul>
1708 <li><p>Delete <tt class="filename">dir2</tt>, and then create a file
1709 named <tt class="filename">dir2</tt>.</p></li>
1710 <li><p>Rename <tt class="filename">/dir1/dir2</tt> to
1711 <tt class="filename">/dir1/dir4</tt>; move <tt class="filename">file2</tt>
1712 into <tt class="filename">/dir1/dir4</tt>; and move
1713 <tt class="filename">file3</tt> into
<tt class="filename">/dir1/dir3</tt>.</p></li>
1715 <li><p>Move <tt class="filename">dir2</tt> into
1716 <tt class="filename">dir3</tt>, and move <tt class="filename">dir3</tt> into
1717 <tt class="filename">/</tt>.</p></li>
1718 </ul>
1719 </div> <!-- deltas.tree (h3) -->
1721 <div class="h3" id="deltas.postfix-text" title="#deltas.postfix-text">
1722 <h3>Postfix Text Deltas</h3>
1725 <p>It is sometimes useful to represent a set of changes to a tree
1726 without providing text deltas in the middle of the stream. Text deltas
1727 are often large and expensive to compute, and tree deltas can be useful
1728 without them. For example, one can detect whether two changes might
1729 conflict &mdash; whether they change the same file, for example &mdash;
1730 without knowing exactly how the conflicting files changed.</p>
1732 <p>For this reason, our XML representation of a tree delta allows the
text deltas to come <em>after</em> the closing &lt;/tree-delta&gt;
tag. This allows the client to receive early notice of conflicts:
1735 during a <tt class="literal">svn commit</tt> command, the client sends a
1736 tree-delta to the server, which can check for skeletal conflicts and
1737 reject the commit, before the client takes the time to transmit the
1738 (possibly large) textual changes. This potentially saves quite a bit of
1739 network traffic.</p>
1741 <p>In terms of XML, postfix text deltas are split into two parts. The
1742 first part appears "in-line" and contains a reference ID. The second
1743 part appears after the tree delta is complete. Here's an example:</p>
1745 <pre>
&lt;tree-delta&gt;
  &lt;open name="foo.c"&gt;
    &lt;file&gt;
      &lt;text-delta-ref id="123"/&gt;
    &lt;/file&gt;
  &lt;/open&gt;
  &lt;add name="bar.c"&gt;
    &lt;file&gt;
      &lt;text-delta-ref id="456"/&gt;
    &lt;/file&gt;
  &lt;/add&gt;
&lt;/tree-delta&gt;
&lt;text-delta id="123"&gt;<em>data</em>&lt;/text-delta&gt;
&lt;text-delta id="456"&gt;<em>data</em>&lt;/text-delta&gt;
1760 </pre>
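<p>A receiver can therefore examine the skeleton before any text arrives,
and match each text delta to its file by ID afterwards. The sketch below
assumes a wrapper element named <tt class="literal">delta-pkg</tt> (chosen
here only so that the sample is a single well-formed document) and parses
the whole input at once for brevity, where a real receiver would read it as
a stream:</p>

```python
import xml.etree.ElementTree as ET

DOC = """
<delta-pkg>
  <tree-delta>
    <open name="foo.c"><file><text-delta-ref id="123"/></file></open>
    <add name="bar.c"><file><text-delta-ref id="456"/></file></add>
  </tree-delta>
  <text-delta id="123">data for foo.c</text-delta>
  <text-delta id="456">data for bar.c</text-delta>
</delta-pkg>
"""


def split_postfix(doc):
    """Return (skeleton, texts), both keyed by text-delta ID."""
    pkg = ET.fromstring(doc)
    skeleton = {}
    for entry in pkg.find("tree-delta"):             # the open/add elements
        ref = entry.find("file/text-delta-ref")
        skeleton[ref.get("id")] = (entry.tag, entry.get("name"))
    texts = {td.get("id"): td.text for td in pkg.findall("text-delta")}
    return skeleton, texts


skeleton, texts = split_postfix(DOC)
for ref_id, (action, name) in skeleton.items():
    print(action, name, "->", texts[ref_id])
```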
1762 </div> <!-- deltas.postfix-text (h3) -->
1764 <div class="h3" id="deltas.serializing-via-editor" title="#deltas.serializing-via-editor">
1765 <h3>Serializing Deltas via the "Editor" Interface</h3>
1768 <p>The static XML forms above are useful as an import/export format, and
1769 as a visualization aid, but we also need a way to express a delta as a
1770 <em>series of operations</em>, to implement directory tree
1771 diffing and patching. Subversion defines a standard set of such
operations in the vtable <tt class="literal">svn_delta_editor_t</tt>, a set
1773 of function prototypes which anyone may implement (see
1774 <tt class="filename">svn_delta.h</tt>).</p>
1776 <p>Each function in an instance of <tt class="literal">svn_delta_editor_t</tt>
1777 (colloquially known as an <strong class="firstterm">editor</strong>) implements some
1778 distinct subtask of editing a directory tree. In fact, if you compare
1779 the editor function prototypes to the XML elements described previously,
1780 you'll notice a fairly strict correspondence: there's one function for
opening a directory, another function for opening a file, one for
1782 adding a directory, another for adding a file, a function for deleting,
1783 and so on.</p>
1785 <p>Although the editor interface was designed around the general idea of
1786 making changes to a directory tree, a specific implementation's behavior
1787 depends on its role. For example, the versioning filesystem library
1788 offers an editor that creates new revisions, while the working copy
1789 library offers an editor that updates working copies. And the network
1790 layer offers an editor that turns editing calls into wire protocol, which
1791 is then converted back into editing calls on the other side! All of
1792 these different tasks can share a single interface, because they are all
1793 fundamentally about the same thing: expressing and applying differences
1794 between directory trees.</p>
1796 <p>Like the XML forms, a series of editor calls must follow certain
1797 nesting conventions; these conventions are implicit in the interface, in
1798 that some of the functions take arguments that can only be obtained from
1799 previous calls to other editor functions.</p>
1801 <p>Editors can best be understood by watching one work on a real
1802 directory tree. For example:</p>
1804 <!-- kff todo: fooo working here. -->
1806 <p>Suppose that the user has made a number of local changes to her
1807 working copy and wants to commit them to the repository. Let's represent
1808 her changes with the same tree-delta from a previous example. Notice
1809 that she has also made textual modifications to
1810 <tt class="filename">file3</tt>; hence the in-line
1811 <tt class="literal">&lt;text-delta&gt;</tt>:</p>
1813 <pre>
&lt;tree-delta&gt;
  &lt;open name='dir1'&gt;
    &lt;directory&gt;
      &lt;tree-delta&gt;
        &lt;open name='dir2'&gt;
          &lt;directory ancestor='/dir1/dir3'&gt;
            &lt;tree-delta&gt;
              &lt;add name='file3'&gt;
                &lt;file ancestor='/dir1/dir2/file3'&gt;
                  &lt;text-delta&gt;<em>data</em>&lt;/text-delta&gt;
                &lt;/file&gt;
              &lt;/add&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/open&gt;
        &lt;delete name='dir3'/&gt;
        &lt;add name='dir4'&gt;
          &lt;directory ancestor='/dir1/dir2'&gt;
            &lt;tree-delta&gt;
              &lt;delete name='file3'/&gt;
            &lt;/tree-delta&gt;
          &lt;/directory&gt;
        &lt;/add&gt;
      &lt;/tree-delta&gt;
    &lt;/directory&gt;
  &lt;/open&gt;
&lt;/tree-delta&gt;
1841 </pre>
1843 <p>So how does the client send this information to the server?</p>
1845 <p>In a nutshell: the tree-delta is <em>streamed</em> over
1846 the network, as a series of individual commands given in depth-first
1847 order.</p>
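<p>"Depth-first" here means that everything inside a directory is sent
before that directory is closed. As a rough illustration (the tuple
encoding of the delta and the command names are assumptions for this
sketch, loosely modeled on the calls shown later in this section), a nested
delta can be flattened into such a command stream with a short recursive
generator:</p>

```python
def stream(ops, depth=0):
    """Yield (depth, command, name) tuples in depth-first order.

    Each op is ("delete", name) or (action, name, children), where
    children is a list of ops for a directory and None for a file.
    """
    for op in ops:
        if op[0] == "delete":
            yield (depth, "delete", op[1])
            continue
        action, name, children = op
        is_dir = children is not None
        yield (depth, action + ("_dir" if is_dir else "_file"), name)
        if is_dir:
            yield from stream(children, depth + 1)   # finish the subtree first
        yield (depth, ("close_dir" if is_dir else "close_file"), name)


delta = [("open", "dir1", [
    ("open", "dir2", [("add", "file3", None)]),
    ("delete", "dir3"),
    ("add", "dir4", [("delete", "file3")]),
])]

for depth, command, name in stream(delta):
    print("  " * depth + command, name)
```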
1849 <p>Let's be more specific. The server presents the client with an
object of type <tt class="literal">svn_delta_editor_t</tt>,
colloquially known as an <strong class="firstterm">editor</strong>. An editor is
really just a table of functions; each function makes a change to a
1853 filesystem. Agent A (who has a private filesystem) presents an editor to
1854 agent B. Agent B then calls the editor's functions to change A's
1855 filesystem. B is said to be <strong class="firstterm">driving</strong> the
1856 editor.</p>
1858 <p>As Karl Fogel likes to describe the process, if one thinks of the
1859 tree-delta as a lion, the editor is a "hoop" that the lion jumps through
1860 &ndash; each portion of the lion being decomposed through time.</p>
1862 <p>B cannot call the functions in any willy-nilly order; there are some
1863 logical restrictions. In particular, as B drives the editor, it receives
1864 opaque data structures which represent directories and files. It must
1865 use and pass these structures, known as <strong class="firstterm">batons</strong>, to
1866 make further function calls.</p>
1868 <p>As an example, let's watch how the client would transmit the above
1869 tree-delta to the repository. (The description below is slightly
1870 simplified. For exact interface details, see
1871 <tt class="filename">subversion/include/svn_delta.h</tt>.)</p>
1873 <p>[Note: in the examples below, and throughout Subversion's code base,
1874 you'll see references to 'baton' objects. This is simply a project
1875 convention, a name given to structures that define contexts for
1876 functions. Many APIs call these structures 'userdata'. In Subversion,
1877 we like the term 'baton', because it reminds us of one function
1878 &ldquo;handing off&rdquo; context to another function.]</p>
1880 <ol>
1881 <li><p>The repository hands an "editor" to the
1882 client.</p></li>
1884 <li><p>The client begins by calling <tt class="literal">root_baton =
1885 editor-&gt;open_root();</tt> The client now has an opaque
1886 object, <strong class="firstterm">root_baton</strong>, which represents the root
1887 of the repository's filesystem.</p></li>
1889 <li><p><tt class="literal">dir1_baton = editor-&gt;open_dir("dir1",
1890 root_baton);</tt> Notice that <em>root_baton</em>
1891 gives the client free license to make any changes it wants in the
1892 repository's root directory &ndash; until, of course, it calls
1893 <tt class="literal">editor-&gt;close_dir(root_baton)</tt>. The first
change made was to open <tt class="filename">dir1</tt>. In
1895 return, the client now has a new opaque data structure that can be
1896 used to change <tt class="filename">dir1</tt>.</p></li>
1898 <li><p><tt class="literal">dir2_baton = editor-&gt;open_dir("dir2",
1899 "/dir1/dir3", dir1_baton);</tt> The
1900 <em>dir1_baton</em> is now used to open
1901 <tt class="filename">dir2</tt> with a directory whose ancestor is
1902 <tt class="filename">/dir1/dir3</tt>.</p></li>
1904 <li><p><tt class="literal">file_baton = editor-&gt;add_file("file3",
1905 "/dir1/dir2/file3", dir2_baton);</tt> Edits are now made to
1906 <tt class="filename">dir2</tt> (using <em>dir2_baton</em>).
1907 In particular, a new file is added to this directory whose ancestor
1908 is <tt class="filename">/dir1/dir2/file3</tt>.</p></li>
1910 <li><p>Now the text-delta associated with
1911 <em>file_baton</em> needs to be transmitted:
1912 <tt class="literal">window_handler =
1913 editor-&gt;apply_textdelta(file_baton);</tt> Text-deltas
1914 themselves, for network efficiency, are streamed in "chunks". So
1915 instead of receiving a baton object, we now have a routine that is
able to receive any number of small "windows" of text-delta data. We
1917 won't go into the details of the <tt class="literal">svn_txdelta_*</tt>
1918 functions right here; but suffice it to say that these routines are
1919 used for sending svndiff data to the
1920 <em>window_handler</em> routine.</p></li>
1922 <li><p><tt class="literal">editor-&gt;close_file(file_baton);</tt> The
1923 client is done sending the file's text-delta, so it releases the file
1924 baton.</p></li>
<li><p><tt class="literal">editor-&gt;close_dir(dir2_baton);</tt> The
1927 client is done making changes to <tt class="filename">dir2</tt>, so it
1928 releases its baton as well.</p></li>
1930 <li><p>The client isn't yet finished with
1931 <tt class="filename">dir1</tt>, however; it makes two more edits:
1932 <tt class="literal">editor-&gt;delete_item("dir3", dir1_baton);</tt>
1933 <tt class="literal">dir4_baton = editor-&gt;add_dir("dir4", "/dir1/dir2",
1934 dir1_baton);</tt> <em>(The function's name is
1935 <tt class="literal">delete_item</tt> rather than
1936 <tt class="literal">delete</tt> to avoid gratuitous incompatibility with
1937 C++, where <tt class="literal">delete</tt> is a reserved
1938 keyword.)</em></p></li>
1940 <li><p>Within the directory <tt class="filename">dir4</tt> (whose
1941 ancestry is <tt class="filename">/dir1/dir2</tt>), the client removes a
1942 file: <tt class="literal">editor-&gt;delete_item("file3",
1943 dir4_baton);</tt></p></li>
1945 <li><p>The client is now finished with both
<tt class="filename">dir4</tt> and its
1947 parent <tt class="filename">dir1</tt>:
1948 <tt class="literal">editor-&gt;close_dir(dir4_baton);</tt>
1949 <tt class="literal">editor-&gt;close_dir(dir1_baton);</tt></p></li>
1951 <li><p>The entire tree-delta is complete. The repository knows
1952 this when the root directory is closed:
1953 <tt class="literal">editor-&gt;close_dir(root_baton);</tt></p></li>
1955 </ol>
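<p>The sequence above can be mimicked with a toy editor. This is a
simplified model, not the interface in
<tt class="filename">svn_delta.h</tt>: the class names, the argument order
(the ancestor is passed as a keyword argument here), and the logging
behavior are all assumptions made for illustration. It shows the two
points that matter: batons are opaque to the driver, and a baton cannot be
used once it has been closed.</p>

```python
class Baton:
    """Opaque context handed back by the editor; the driver only passes it on."""
    def __init__(self, path):
        self.path, self.closed = path, False


class LoggingEditor:
    """Toy editor that records each edit instead of changing a filesystem."""
    def __init__(self):
        self.log = []

    def _enter(self, verb, name, parent, ancestor=None):
        if parent.closed:
            raise ValueError("baton for %s is already closed" % parent.path)
        path = parent.path.rstrip("/") + "/" + name
        self.log.append(verb + " " + path + (" <- " + ancestor if ancestor else ""))
        return Baton(path)

    def open_root(self):
        return Baton("/")

    def open_dir(self, name, parent, ancestor=None):
        return self._enter("open_dir", name, parent, ancestor)

    def add_dir(self, name, parent, ancestor=None):
        return self._enter("add_dir", name, parent, ancestor)

    def add_file(self, name, parent, ancestor=None):
        return self._enter("add_file", name, parent, ancestor)

    def delete_item(self, name, parent):
        self._enter("delete", name, parent)

    def apply_textdelta(self, file_baton):
        def window_handler(window):            # called once per chunk of delta
            self.log.append("window for %s: %r" % (file_baton.path, window))
        return window_handler

    def close_file(self, baton):
        baton.closed = True

    def close_dir(self, baton):
        baton.closed = True


def drive(editor):
    """Replay the walk-through above against any editor with these methods."""
    root = editor.open_root()
    dir1 = editor.open_dir("dir1", root)
    dir2 = editor.open_dir("dir2", dir1, ancestor="/dir1/dir3")
    file3 = editor.add_file("file3", dir2, ancestor="/dir1/dir2/file3")
    editor.apply_textdelta(file3)("data")
    editor.close_file(file3)
    editor.close_dir(dir2)
    editor.delete_item("dir3", dir1)
    dir4 = editor.add_dir("dir4", dir1, ancestor="/dir1/dir2")
    editor.delete_item("file3", dir4)
    editor.close_dir(dir4)
    editor.close_dir(dir1)
    editor.close_dir(root)


editor = LoggingEditor()
drive(editor)
print("\n".join(editor.log))
```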
1957 <p>Of course, at any point above, the repository may reject an edit. If
1958 this is the case, the client aborts the transmission and the repository
1959 hasn't changed a bit. (Thank goodness for transactions!)</p>
1961 <p>Note, however, that this "editor interface" works in the other
1962 direction as well. When the repository wishes to update a client's
working copy, it is the <em>client's</em> responsibility to
1964 give a custom editor-object to the server, and the
1965 <em>server</em> is the editor-driver.</p>
1967 <p>Here are the main advantages of this interface:</p>
1969 <ul>
1970 <li><p><em>Consistency</em>. Tree-deltas move
1971 across the network, in both directions, using the same
1972 interface.</p></li>
1973 <li><p><em>Flexibility</em>. Custom
1974 editor-implementations can be written to do anything one might want;
1975 the editor-driver has no idea what is happening on the other side of
1976 the interface. For example, an editor might
1977 </p><ul>
1978 <li><p>Output XML that matches the tree-delta DTD
1979 above;</p></li>
1980 <li><p>Output human-readable descriptions of the edits
1981 taking place;</p></li>
1982 <li><p>Modify a filesystem</p></li>
1983 </ul><p>
1984 </p></li>
1985 </ul>
1987 <p>Whatever the case, it's easy to "swap" editors around, and make
1988 client and server do new and interesting things.</p>
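<p>As a sketch of what "swapping" means in practice, the same driver below
is run against two unrelated editors: one renders the edit as tree-delta
XML, the other changes an in-memory tree directly. Both classes and the
reduced four-function interface are hypothetical and much smaller than the
real one. Each editor chooses its own baton type, and the driver never
looks inside it.</p>

```python
class XmlEditor:
    """Renders the edit as XML; its batons are the closing markup to emit."""
    def __init__(self):
        self.out = []

    def open_root(self):
        self.out.append("<tree-delta>")
        return "</tree-delta>"

    def open_dir(self, name, parent):
        self.out.append("<open name='%s'><directory><tree-delta>" % name)
        return "</tree-delta></directory></open>"

    def delete_item(self, name, parent):
        self.out.append("<delete name='%s'/>" % name)

    def close_dir(self, baton):
        self.out.append(baton)


class DictEditor:
    """Edits a nested-dict tree in place; its batons are the dicts themselves."""
    def __init__(self, fs):
        self.fs = fs

    def open_root(self):
        return self.fs

    def open_dir(self, name, parent):
        return parent[name]

    def delete_item(self, name, parent):
        del parent[name]

    def close_dir(self, baton):
        pass                                  # nothing to finalize here


def drive(editor):
    """Delete /dir1/file1 through whichever editor was handed over."""
    root = editor.open_root()
    dir1 = editor.open_dir("dir1", root)
    editor.delete_item("file1", dir1)
    editor.close_dir(dir1)
    editor.close_dir(root)


xml_editor = XmlEditor()
drive(xml_editor)
print("".join(xml_editor.out))

fs = {"dir1": {"file1": "a", "file2": "b"}}
drive(DictEditor(fs))
print(fs)
```

<p>Unlike a repository editor, this <tt class="literal">DictEditor</tt>
applies each change immediately and has no transaction to roll back.</p>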
1989 </div> <!-- deltas.serializing-via-editor (h3) -->
1990 </div> <!-- deltas (h2) -->
1992 <div class="h2" id="client" title="#client">
1993 <h2>Client &mdash; How the client works</h2>
1997 <p>The Subversion client is built on three libraries. One operates
1998 strictly on the working copy and does not talk to the repository.
1999 Another talks to the repository but never changes the working copy. The
2000 third library uses the first two to provide operations such as
2001 <tt class="literal">commit</tt> and <tt class="literal">update</tt> &ndash;
2002 operations which need to both talk to the repository and change the
2003 working copy.</p>
2005 <p>The initial client is a Unix-style command-line tool (like standard
2006 CVS), but it should be easy to write a GUI client as well, based on the
2007 same libraries. The libraries capture the core Subversion functionality,
2008 segregating it from user interface concerns.</p>
2010 <p>This chapter describes the libraries, and the physical layout of
2011 working copies.</p>
2014 <div class="h3" id="client.wc" title="#client.wc">
2015 <h3>Working copies and the working copy library</h3>
2018 <p>Working copies are client-side directory trees containing both
2019 versioned data and Subversion administrative files. The functions in the
2020 working copy management library are the only functions in Subversion
2021 which operate on these trees.</p>
2023 <div class="h4" id="client.wc.layout" title="#client.wc.layout">
2024 <h4>The layout of working copies</h4>
2027 <p>This section gives an overview of how
2028 working copies are arranged physically, but is not a full specification
2029 of working copy layout.</p>
2031 <p>As with CVS, Subversion working copies are simply directory trees
2032 with special administrative subdirectories, in this case named ".svn"
2033 instead of "CVS":</p>
2035 <pre>
                             myproj
                             / | \
               _____________/  |  \______________
              /                |                 \
           .svn               src                doc
        ___/ | \___           /|\             ___/ \___
       |     |     |         / | \           |         |
      base  ...   ...       /  |  \     myproj.texi  .svn
                           /   |   \              ___/ | \___
                      ____/    |    \____        |     |     |
                     |         |         |      base  ...   ...
                   .svn      foo.c     bar.c     |
                ___/ | \___                      |
               |     |     |                     |
             base   ...   ...               myproj.texi
          ___/ \___
       foo.c     bar.c
2055 </pre>
2057 <p>Each <tt class="filename">dir/.svn/</tt> directory records the files in
2058 <tt class="filename">dir</tt>, their revision numbers and property lists,
2059 pristine revisions of all the files (for client-side delta generation),
2060 the repository from which <tt class="filename">dir</tt> came, and any local
2061 changes (such as uncommitted adds, deletes, and renames) that affect
2062 <tt class="filename">dir</tt>.</p>
2064 <p>Although it would often be possible to deduce certain information
2065 (such as the original repository) by examining parent directories, this
2066 is avoided in favor of making each directory be as much a
2067 self-contained unit as possible.</p>
2069 <p>For example, immediately after a checkout the administrative
2070 information for the entire working tree <em>could</em> be
2071 stored in one top-level file. But subdirectories instead keep track of
2072 their own revision information. This would be necessary anyway once
2073 the user starts committing new revisions for particular files, and it
2074 also makes it easier for the user to prune a big, complete tree into a
2075 small subtree and still have a valid working copy.</p>
2077 <p>The <tt class="filename">.svn</tt> subdir contains:</p>
2079 <ul>
2080 <li><p>A <tt class="filename">format</tt> file, which indicates
which version of the working copy administrative format this is (so future
2082 clients can be backwards compatible easily).</p></li>
2084 <li><p>A <tt class="filename">text-base</tt> directory,
2085 containing the pristine repository revisions of the files in the
corresponding working directory.</p></li>
2088 <li><p>An <tt class="filename">entries</tt> file, which holds
2089 revision numbers and other information for this directory and its
2090 files, and records the presence of subdirs. It also contains the
2091 repository URLs that each file and directory came from. It may
2092 help to think of this file as the functional equivalent of the
2093 <tt class="filename">CVS/Entries</tt> file.</p></li>
2095 <li><p>A <tt class="filename">props</tt> directory, containing
2096 property names and values for each file in the working
2097 directory.</p></li>
2099 <li><p>A <tt class="filename">prop-base</tt> directory,
2100 containing pristine property names and values for each file in
2101 the working directory.</p></li>
2103 <li><p>A <tt class="filename">dir-props</tt> file, recording
2104 properties for this directory.</p></li>
2106 <li><p>A <tt class="filename">dir-prop-base</tt> file, recording
2107 pristine properties for this directory.</p></li>
2109 <li><p>A <tt class="filename">lock</tt> file, whose presence
2110 implies that some client is currently operating on the
2111 administrative area.</p></li>
2113 <li><p>A <tt class="filename">tmp</tt> directory, for holding
2114 scratch-work and helping make working copy operations more
2115 crash-proof.</p></li>
<li><p>A <tt class="filename">log</tt> file which, if present,
lists the actions that need to be taken to complete a
working-copy operation that is still "in
progress".</p></li>
2121 </ul>
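<p>Since the <tt class="filename">lock</tt> and
<tt class="filename">log</tt> files signal state by their mere presence, a
tool can inspect an administrative area with nothing more than file
existence checks. The sketch below assumes the per-directory layout
described in this list; the function name, the result keys, and the
placeholder contents written to <tt class="filename">format</tt> are
invented for the example.</p>

```python
from pathlib import Path
import tempfile


def admin_status(directory):
    """Summarize the state of directory/.svn using file-presence checks."""
    adm = Path(directory) / ".svn"
    return {
        "versioned": (adm / "format").is_file(),
        "locked": (adm / "lock").exists(),             # a client is working here
        "unfinished_operation": (adm / "log").exists(),
    }


# Build a throwaway administrative area that was interrupted mid-operation.
with tempfile.TemporaryDirectory() as wc:
    adm = Path(wc, ".svn")
    adm.mkdir()
    (adm / "format").write_text("1\n")                 # placeholder contents
    (adm / "log").write_text("")
    status = admin_status(wc)

print(status)
```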
2123 <p>You can read much more about these files in the file
2124 <tt class="filename">subversion/libsvn_wc/README</tt>.</p>
2125 </div> <!-- client.wc.layout (h4) -->
2127 <div class="h4" id="client.wc.library" title="#client.wc.library">
2128 <h4>The working copy management library</h4>
2131 <ul>
2132 <li><p><strong>Requires:</strong>
2133 </p><ul>
2134 <li><p>a working copy</p></li>
2135 </ul><p>
2136 </p></li>
2137 <li><p><strong>Provides:</strong>
2138 </p><ul>
2139 <li><p>ability to manipulate the working copy's versioned
2140 data</p></li>
2141 <li><p>ability to manipulate the working copy's
2142 administrative files</p></li>
2143 </ul><p>
2144 </p></li>
2145 </ul>
2147 <p>This library performs "offline" operations on the working copy, and
2148 lives in <tt class="filename">subversion/libsvn_wc/</tt>.</p>
2150 <p>The API for <em class="replaceable">libsvn_wc</em> is always
2151 evolving; please read the header file for a detailed description:
2152 <tt class="filename">subversion/include/svn_wc.h</tt>.</p>
2153 </div> <!-- client.wc.library (h4) -->
2154 </div> <!-- client.wc (h3) -->
2156 <div class="h3" id="client.libsvn_ra" title="#client.libsvn_ra">
2157 <h3>The repository access library</h3>
2160 <ul>
2161 <li><p><strong>Requires:</strong>
2162 </p><ul>
2163 <li><p>network access to a Subversion
2164 server</p></li>
2165 </ul><p>
2166 </p></li>
2167 <li><p><strong>Provides:</strong>
2168 </p><ul>
2169 <li><p>the ability to interact with a
2170 repository</p></li>
2171 </ul><p>
2172 </p></li>
2173 </ul>
2175 <p>This library performs operations involving communication with the
2176 repository.</p>
2178 <p>The interface defined in
2179 <tt class="filename">subversion/include/svn_ra.h</tt> provides a uniform
2180 interface to both local and remote repository access.</p>
2182 <p>Specifically, <em class="replaceable">libsvn_ra_dav</em> will provide
2183 this interface and speak to repositories using DAV requests. At some
2184 future point, another library <em class="replaceable">libsvn_ra_local</em>
2185 will provide the same interface &ndash; but will link directly to the
2186 filesystem library for accessing local disk repositories.</p>
2187 </div> <!-- client.libsvn_ra (h3) -->
2189 <div class="h3" id="client.libsvn_client" title="#client.libsvn_client">
2190 <h3>The client operation library</h3>
2193 <ul>
2194 <li><p><strong>Requires:</strong>
2195 </p><ul>
2196 <li><p>the working copy management library</p></li>
2197 <li><p>a repository access library</p></li>
2198 </ul><p>
2199 </p></li>
2200 <li><p><strong>Provides:</strong>
2201 </p><ul>
2202 <li><p>all client-side Subversion commands</p></li>
2203 </ul><p>
2204 </p></li>
2205 </ul>
2207 <p>These functions correspond to user-level client commands. In theory,
2208 any client interface (command-line, GUI, emacs, Python, etc.) should be
2209 able to link to <em class="replaceable">libsvn_client</em> and have the
2210 ability to act as a full-featured Subversion client.</p>
2212 <p>Again, the detailed API can be found in
2213 <tt class="filename">subversion/include/svn_client.h</tt>.</p>
2214 </div> <!-- client.libsvn_client (h3) -->
2215 </div> <!-- client (h2) -->
2217 <div class="h2" id="protocol" title="#protocol">
2218 <h2>Protocol &mdash; How the client and server communicate</h2>
<p>The wire protocol is the connection between the servers and the
2223 client-side <em>Repository Access (RA) API</em>, provided by
2224 <tt class="literal">libsvn_ra</tt>. Note that <tt class="literal">libsvn_ra</tt> is
2225 in fact only a plugin manager, which delegates the actual task of
2226 communicating with a server to one of a selection of back-end modules (the
2227 <tt class="literal">libsvn_ra_*</tt> libraries). Therefore, there is not just
one Subversion protocol; in fact, at present, there are two:</p>
2230 <ul>
2231 <li><p>The HTTP/WebDAV/DeltaV based protocol, implemented by the
2232 <tt class="literal">mod_dav_svn</tt> Apache 2 server module, and by two
2233 independent RA modules, <tt class="literal">libsvn_ra_dav</tt> and
2234 <tt class="literal">libsvn_ra_serf</tt>.</p></li>
2236 <li><p>The custom-designed protocol built directly upon TCP,
2237 implemented by the <tt class="literal">svnserve</tt> server, and the
2238 <tt class="literal">libsvn_ra_svn</tt> RA module.</p></li>
2239 </ul>
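<p>The plugin-manager role can be pictured as a lookup from a repository
URL to a back-end module. The table below is illustrative only: the
mapping from URL schemes to modules is an assumption for this sketch, only
one of the two HTTP modules is listed, and the real
<tt class="literal">libsvn_ra</tt> selects and loads modules through its
own C interface rather than a dictionary.</p>

```python
from urllib.parse import urlsplit

# Assumed scheme-to-module table, for illustration only.
RA_MODULES = {
    "http": "libsvn_ra_dav",
    "https": "libsvn_ra_dav",
    "svn": "libsvn_ra_svn",
    "file": "libsvn_ra_local",
}


def pick_ra_module(url):
    """Return the name of the back-end module that would handle `url`."""
    scheme = urlsplit(url).scheme
    try:
        return RA_MODULES[scheme]
    except KeyError:
        raise ValueError("no RA module for scheme %r" % scheme)


print(pick_ra_module("svn://example.org/repos/project"))
print(pick_ra_module("https://example.org/repos/project"))
```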
2242 <div class="h3" id="protocol.webdav" title="#protocol.webdav">
2243 <h3>The HTTP/WebDAV/DeltaV based protocol</h3>
2246 <p>The Subversion client library <tt class="literal">libsvn_ra_dav</tt> uses
2247 the <em>Neon</em> library to generate WebDAV DeltaV requests
2248 and sends them to a "Subversion-aware" Apache server.</p>
2250 <p>This Apache server is running <tt class="literal">mod_dav</tt> and
2251 <tt class="literal">mod_dav_svn</tt>, which translates the requests into
2252 Subversion filesystem calls.</p>
2254 <p>For more info, see <a href="#archi.network">Network Layer</a>.</p>
2256 <p>For a detailed description of exactly how Greg Stein
2257 <em class="email">gstein@lyra.org</em> is mapping the WebDAV DeltaV spec to
2258 Subversion, see his paper: <a href="http://svn.apache.org/repos/asf/subversion/trunk/notes/http-and-webdav/webdav-usage.html">http://svn.apache.org/repos/asf/subversion/trunk/notes/http-and-webdav/webdav-usage.html</a>
2259 </p>
2261 <p>For more information on WebDAV and the DeltaV extensions, see
2262 <a href="http://www.webdav.org">http://www.webdav.org</a> and
2263 <a href="http://www.webdav.org/deltav">http://www.webdav.org/deltav</a>.
2264 </p>
2266 <p>For more information on <em>Neon</em>, see
2267 <a href="http://www.webdav.org/neon">http://www.webdav.org/neon</a>.</p>
2268 </div> <!-- protocol.webdav (h3) -->
2270 <div class="h3" id="protocol.svn" title="#protocol.svn">
2271 <h3>The custom protocol</h3>
2274 <p>The client library <tt class="literal">libsvn_ra_svn</tt> and standalone
2275 server program <tt class="literal">svnserve</tt> implement a custom protocol
2276 over TCP. This protocol is documented at <a href="http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol">http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol</a>.</p>
2277 </div> <!-- protocol.svn (h3) -->
2278 </div> <!-- protocol (h2) -->
2280 <div class="h2" id="server" title="#server">
2281 <h2>Server &mdash; How the server works</h2>
2285 <p>The term &ldquo;server&rdquo; is ambiguous, because it has at least
2286 two different meanings: it can refer to a powerful computer which offers
2287 services to users on a network, or it can refer to a CPU process designed
2288 to receive network requests.</p>
2290 <p>In Subversion, however, the <strong class="firstterm">server</strong> is just a
2291 set of libraries that implements <strong class="firstterm">repositories</strong> and
2292 makes them available to other programs. No networking is
2293 required.</p>
2295 <p>There are two main libraries: the <strong class="firstterm">Subversion
2296 Filesystem</strong> library, and the <strong class="firstterm">Subversion
2297 Repository</strong> library.</p>
2300 <div class="h3" id="server.fs" title="#server.fs">
2301 <h3>Filesystem</h3>
2304 <div class="h4" id="server.fs.overview" title="#server.fs.overview">
2305 <h4>Filesystem Overview</h4>
2307 <ul>
2308 <li><p><strong>Requires:</strong>
2309 </p><ul>
2310 <li><p>some writable disk space</p></li>
2311 <li><p>(for now) Berkeley DB library</p></li>
2312 </ul><p>
2313 </p></li>
2314 <li><p><strong>Provides:</strong>
2315 </p><ul>
2316 <li><p>a repository for storing files</p></li>
2317 <li><p>concurrent client transactions</p></li>
2318 <li><p>enforcement of user &amp; group permissions
2319 [someday, not yet]</p></li>
2320 </ul><p>
2321 </p></li>
2322 </ul>
2323 <p>This library implements a hierarchical filesystem which supports
2324 atomic changes to directory trees, and records a complete history of
2325 the changes. In addition to recording changes to file and directory
2326 contents, the Subversion Filesystem records changes to file meta-data
2327 (see discussion of <strong class="firstterm">properties</strong> in <a href="#model">Model &mdash; The versioning model used by Subversion</a>).</p>
2328 </div> <!-- server.fs.overview (h4) -->
2330 <div class="h4" id="server.fs.api" title="#server.fs.api">
2331 <h4>API</h4>
<p>There are two main files that describe the Subversion
2335 filesystem.</p>
2337 <p>First, read the section below (<a href="#server.fs.struct">Repository Structure</a>)
2338 for a general overview of how the filesystem works.</p>
2340 <p>Once you've done this, read Jim Blandy's own structural overview,
2341 which explains how nodes and revisions are organized (among other
2342 things) in the filesystem implementation:
2343 <tt class="filename">subversion/libsvn_fs_base/notes/structure</tt>.
2344 (Some details in that document are specific to the BDB-based
2345 filesystem implementation. Details specific to FSFS are recorded in
2346 <tt class="filename">subversion/libsvn_fs_fs/structure</tt>.)</p>
2348 <p>Finally, read the well-documented API in
2349 <tt class="filename">subversion/include/svn_fs.h</tt>.</p>
2350 </div> <!-- server.fs.api (h4) -->
2352 <div class="h4" id="server.fs.struct" title="#server.fs.struct">
2353 <h4>Repository Structure</h4>
2356 <div class="h5" id="server.fs.struct.schema">
2357 <h5>Schema</h5>
<p>To begin, please be sure that you're already casually familiar with
2362 Subversion's ideas of files, directories, and revision histories. If
2363 not, see <a href="#model">Model &mdash; The versioning model used by Subversion</a>. We can now offer precise,
2364 technical descriptions of the terms introduced there.</p>
2366 <!-- This is taken from jimb's very first Subversion spec! -->
2368 <pre>
2369 A <strong class="firstterm">text string</strong> is a string of Unicode characters which is
2370 canonically decomposed and ordered, according to the rules described in the
2371 Unicode standard.
2373 A <strong class="firstterm">string of bytes</strong> is what you'd expect.
2375 A <strong class="firstterm">property list</strong> is an unordered list of properties. A
2376 <strong class="firstterm">property</strong> is a pair
2377 <tt class="literal">(<em class="replaceable">name</em>,
2378 <em class="replaceable">value</em>)</tt>, where
2379 <em class="replaceable">name</em> is a text string, and
2380 <em class="replaceable">value</em> is a string of bytes. No two properties in a
2381 property list have the same name.
2383 A <strong class="firstterm">file</strong> is a property list and a string of bytes.
2385 A <strong class="firstterm">node</strong> is either a file or a directory. (We define a
2386 directory below.) Nodes are distinguished unions &mdash; you can always tell
2387 whether a node is a file or a directory.
2389 A <strong class="firstterm">node table</strong> is an array mapping some set of positive
2390 integers, called <strong class="firstterm">node numbers</strong>, onto
2391 <strong class="firstterm">nodes</strong>. If a node table maps some number
2392 <em class="replaceable">i</em> to some node <em class="replaceable">n</em>, then
2393 <em class="replaceable">i</em> is a <strong class="firstterm">valid node number</strong> in
that table, and <strong class="firstterm">node</strong> <em class="replaceable">i</em> is
2395 <em class="replaceable">n</em>. Otherwise, <em class="replaceable">i</em> is an
2396 <strong class="firstterm">invalid node number</strong> in that table.
2398 A <strong class="firstterm">directory entry</strong> is a triple
2399 <tt class="literal">(<em class="replaceable">name</em>, <em class="replaceable">props</em>,
2400 <em class="replaceable">node</em>)</tt>, where
2401 <em class="replaceable">name</em> is a text string,
2402 <em class="replaceable">props</em> is a property list, and
2403 <em class="replaceable">node</em> is a node number.
2405 A <strong class="firstterm">directory</strong> is an unordered list of directory entries,
2406 and a property list.
2408 A <strong class="firstterm">revision</strong> is a node number and a property list.
2410 A <strong class="firstterm">history</strong> is an array of revisions, indexed by a
2411 contiguous range of non-negative integers containing 0.
A <strong class="firstterm">repository</strong> consists of a node table and a history.
2415 </pre>
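<p>As a rough illustration, the definitions above can be transcribed into Python data classes. This is only a sketch: the class and field names are invented for this example and are not the Subversion API, which is written in C. Dictionaries stand in for the unordered lists, which also keeps names distinct, and the Unicode normalization required of text strings is not enforced here.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# A property list: name (text string) -> value (string of bytes).
PropList = Dict[str, bytes]


@dataclass
class File:
    props: PropList
    contents: bytes


@dataclass
class Directory:
    # Directory entries: name -> (entry props, node number).
    entries: Dict[str, Tuple[PropList, int]]
    props: PropList


# A node is a distinguished union: isinstance() tells the two kinds apart.
Node = Union[File, Directory]


@dataclass
class Revision:
    root: int        # node number of this revision's root directory
    props: PropList


@dataclass
class Repository:
    node_table: Dict[int, Node] = field(default_factory=dict)
    history: List[Revision] = field(default_factory=list)

    def is_valid(self, i):
        """True if i is a valid node number in this repository's node table."""
        return i in self.node_table


def new_repository():
    """Hypothetical helper: a repository holding only an empty revision 0."""
    repo = Repository()
    repo.node_table[1] = Directory(entries={}, props={})
    repo.history.append(Revision(root=1, props={}))
    return repo
```

<p>Here <tt class="literal">new_repository()</tt> yields a node table with one valid node number and a history indexed from 0, matching the definitions of node table and history given above.</p>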
2417 <!-- Some definitions: we say that a node @var{n} is a @dfn{direct
2418 child} of a directory @var{d} iff @var{d} contains a directory entry
2419 whose node number is @var{n}. A node @var{n} is a @dfn{child} of a
2420 directory @var{d} iff @var{n} is a direct child of @var{d}, or if there
2421 exists some directory @var{e} which is a direct child of @var{d}, and
2422 @var{n} is a child of @var{e}. Given this definition of ``direct
2423 child'' and ``child,'' the obvious definitions of ``direct parent'' and
2424 ``parent'' hold.
2426 In these restrictions, let @var{r} be any repository. When we refer,
2427 implicitly or explicitly, to a node table without further
2428 clarification, we mean @var{r}'s node table. Thus, if we refer to ``a
2429 valid node number'' without specifying the node table in which it is
2430 valid, we mean ``a valid node number in @var{r}'s node table''.
2431 Similarly for @var{r}'s history. -->
2433 <p>Now that we've explained the form of the data, we make some
2434 restrictions on that form.</p>
2436 <p><strong>Every revision has a root
2437 directory.</strong> Every revision's node number is a valid node
2438 number, and the node it refers to is always a directory. We call
2439 this the revision's <strong class="firstterm">root directory</strong>.</p>
2441 <p><strong>Revision 0 always contains an empty root
2442 directory.</strong> This baseline makes it easy to check out
2443 whole projects from the repository.</p>
2445 <p><strong>Directories contain only valid
2446 links.</strong> Every directory entry's
2447 <em class="replaceable">node</em> is a valid node number.</p>
2449 <p><strong>Directory entries can be identified by
2450 name.</strong> For any directory <em class="replaceable">d</em>,
2451 every directory entry in <em class="replaceable">d</em> has a distinct
2452 name.</p>
2454 <p><strong>There are no cycles of
2455 directories.</strong> No node is its own child.</p>
2457 <p><strong>Directories can have more than one
2458 parent.</strong> The Unix file system does not allow more than
2459 one hard link to a directory, but Subversion does allow the analogous
2460 situation. Thus, the directories in a Subversion repository form a
2461 directed acyclic graph (<strong class="firstterm">DAG</strong>), not a tree.
2462 However, it would be distracting and unhelpful to replace the
2463 familiar term &ldquo;directory tree&rdquo; with the unfamiliar term
2464 &ldquo;directory DAG&rdquo;, so we still call it a &ldquo;directory
2465 tree&rdquo; here.</p>
2467 <p><strong>There are no dead nodes.</strong> Every
2468 node is a child of some revision's root directory.</p>
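<p>The restrictions above lend themselves to a mechanical check. The following Python sketch is an assumption-laden illustration, not code from Subversion: it represents a node as a tuple <tt class="literal">("file", bytes)</tt> or <tt class="literal">("dir", {name: node number})</tt>, omits properties, and treats a revision's root directory itself as live. Distinct entry names come for free from using a dictionary.</p>

```python
def check_repository(node_table, history):
    """Return a list of violated restrictions (empty if all of them hold).

    node_table: dict mapping node number to ("file", bytes) or
                ("dir", {entry name: child node number}).
    history:    list of root node numbers, indexed by revision number.
    """
    problems = []

    # Every revision has a root directory.
    for rev, root in enumerate(history):
        if root not in node_table or node_table[root][0] != "dir":
            problems.append(f"revision {rev} has no root directory")

    # Revision 0 always contains an empty root directory.
    if not history or node_table.get(history[0]) != ("dir", {}):
        problems.append("revision 0 is not an empty root directory")

    # Directories contain only valid links.
    for num, (kind, payload) in node_table.items():
        if kind == "dir":
            for name, child in payload.items():
                if child not in node_table:
                    problems.append(f"node {num} entry {name!r} is dangling")

    # There are no cycles of directories. While walking down from each
    # root, also record which nodes are reachable.
    reachable = set()

    def visit(num, ancestors):
        if num in ancestors:
            message = f"node {num} is its own child"
            if message not in problems:
                problems.append(message)
            return
        if num not in node_table:
            return
        reachable.add(num)
        kind, payload = node_table[num]
        if kind == "dir":
            for child in payload.values():
                visit(child, ancestors | {num})

    for root in history:
        visit(root, frozenset())

    # There are no dead nodes.
    for num in node_table:
        if num not in reachable:
            problems.append(f"node {num} is dead")

    return problems
```

<p>Because directories may have more than one parent, the walk can visit a shared node several times; that is acceptable for a sketch but would need memoization on a large repository.</p>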
2470 <!-- </jimb> -->
2471 </div> <!-- server.fs.struct.schema (h5) -->
2473 <div class="h5" id="server.fs.struct.bubble-up">
2474 <h5>Bubble-Up Method</h5>
2477 <p>This section provides a conversational explanation of how the
repository actually stores and versions file trees. It's not
2479 critical knowledge for a programmer using the Subversion Filesystem
2480 API, but most people probably still want to know what's going on
2481 &ldquo;under the hood&rdquo; of the repository.</p>
2483 <p>Suppose we have a new project, at revision 1, looking like this
2484 (using CVS syntax):</p>
<pre>
prompt$ svn checkout myproj
U myproj/
U myproj/B
U myproj/A
U myproj/A/fish
U myproj/A/fish/tuna
prompt$
</pre>
<p>Only <tt class="filename">tuna</tt> is a regular file;
everything else in myproj is a directory.</p>
2499 <p>Let's see what this looks like as an abstract data structure in
2500 the repository, and how that structure works in various operations
2501 (such as update, commit, and branch).</p>
2503 <p>In the diagrams that follow, lines represent parent-to-child
2504 connections in a directory hierarchy. Boxes are "nodes". A node is
2505 either a file or a directory &ndash; a letter in the upper left
2506 indicates which kind. A file node has a byte-string for its content,
2507 whereas directory nodes have a list of dir_entries, each pointing to
2508 another node.</p>
2510 <p>Parent-child links go both ways (i.e., a child knows who all its
2511 parents are), but a node's name is stored only in its parent, because
2512 a node with multiple parents may have different names in different
2513 parents.</p>
2515 <p>At the top of the repository is an array of revision numbers,
2516 stretching off to infinity. Since the project is at revision 1, only
2517 index 1 points to anything; it points to the root node of revision 1
2518 of the project:</p>
<pre>
                ( myproj's revision array )
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |      /* Two dir_entries, `A' and `B'. */
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |   /* One dir_entry, `fish'. */
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |  /* One dir_entry, `tuna'. */
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |   /* (Contents of tuna not shown.) */
                   |________|

</pre>
2558 <p>What happens when we modify <tt class="filename">tuna</tt> and commit?
2559 First, we make a new <tt class="filename">tuna</tt> node, containing the
latest text. The new node is not connected to anything yet; it's
just hanging out there in space:</p>
<pre>
                    ________
                   |F       |
                   |        |
                   |        |
                   |________|
</pre>
2571 <p>Next, we create a <em>new</em> revision of its parent
2572 directory:</p>
<pre>
                 ________
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
</pre>
2589 <p>We continue up the line, creating a new revision of the next
2590 parent directory:</p>
<pre>
              ________
             |D       |
             |        |
             | fish   |
             |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|
</pre>
<p>Now it gets trickier: we need to create a new revision of the
2615 root directory. This new root directory needs an entry to point to
2616 the &ldquo;new&rdquo; directory A, but directory B hasn't changed at
2617 all. Therefore, our new root directory also has an entry that still
2618 points to the <em>old</em> directory B node!</p>
<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |
          |
       ___|_____             ________
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____
                |D       |            |D       |
                |        |            |        |
                | tuna   |            | tuna   |
                |___\____|            |___\____|
                     \                     \
                      \                     \
                    ___\____              ___\____
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|

</pre>
2657 <p>Finally, after all our new nodes are written, we finish the
2658 &ldquo;bubble up&rdquo; process by linking this new tree to the next
2659 available revision in the history array. In this case, the new tree
2660 becomes revision 2 in the repository.</p>
<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \__________
       ___|_____             __\_____
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____
                |D       |            |D       |
                |        |            |        |
                | tuna   |            | tuna   |
                |___\____|            |___\____|
                     \                     \
                      \                     \
                    ___\____              ___\____
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|

</pre>
2699 <p>Generalizing on this example, you can now see that each
2700 &ldquo;revision&rdquo; in the repository history represents a root
node of a unique tree (and an atomic commit to the whole filesystem).
2702 There are many trees in the repository, and many of them share
2703 nodes.</p>
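<p>The commit sequence walked through above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Subversion's implementation: the <tt class="literal">Repo</tt> class and its methods are hypothetical names, properties are omitted, each commit changes exactly one path, and revision numbers start from an empty revision 0, so they do not line up with the diagrams.</p>

```python
import itertools


class Repo:
    """Toy repository: a node table plus a history of root node numbers."""

    def __init__(self):
        # node number -> ("file", bytes) or ("dir", {name: node number})
        self.nodes = {}
        self._ids = itertools.count(1)
        self.history = [self._new(("dir", {}))]   # revision 0: empty root

    def _new(self, node):
        num = next(self._ids)
        self.nodes[num] = node      # nodes are only ever added, never edited
        return num

    def _bubble_up(self, dir_num, path, contents):
        """Return the number of a NEW directory node reflecting the change."""
        # Copy the old entries: unchanged children stay shared with old trees.
        entries = dict(self.nodes[dir_num][1]) if dir_num is not None else {}
        name, rest = path[0], path[1:]
        if rest:
            entries[name] = self._bubble_up(entries.get(name), rest, contents)
        elif contents is None:
            entries[name] = self._new(("dir", {}))
        else:
            entries[name] = self._new(("file", contents))
        return self._new(("dir", entries))

    def commit(self, path, contents):
        """Write one file, or an empty directory if contents is None."""
        parts = path.strip("/").split("/")
        new_root = self._bubble_up(self.history[-1], parts, contents)
        self.history.append(new_root)   # final link: the tree becomes visible
        return len(self.history) - 1

    def lookup(self, rev, path):
        """Walk down from a revision's root and return the node number."""
        num = self.history[rev]
        for name in path.strip("/").split("/"):
            num = self.nodes[num][1][name]
        return num


repo = Repo()
repo.commit("B", None)                    # revision 1
r2 = repo.commit("A/fish/tuna", b"old")   # revision 2
r3 = repo.commit("A/fish/tuna", b"new")   # revision 3
shared_b = repo.lookup(r2, "B") == repo.lookup(r3, "B")   # B node is reused
```

<p>After the second change to <tt class="filename">tuna</tt>, both roots point at the same <tt class="filename">B</tt> node, while <tt class="filename">A</tt>, <tt class="filename">fish</tt>, and <tt class="filename">tuna</tt> each have a fresh node in the newer tree.</p>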
2705 <p>Many nice behaviors come from this model:</p>
2707 <ol>
2708 <li><p><strong>Easy reads.</strong> If a
2709 filesystem reader wants to locate revision
2710 <em class="replaceable">X</em> of file <tt class="filename">foo.c</tt>,
2711 it need only traverse the repository's history, locate revision
2712 <em class="replaceable">X</em>'s root node, then walk down the tree
2713 to <tt class="filename">foo.c</tt>.</p></li>
2715 <li><p><strong>Writers don't interfere with
2716 readers.</strong> Writers can continue to create new nodes,
2717 bubbling their way up to the top, and concurrent readers cannot
2718 see the work in progress. The new tree only becomes visible to
2719 readers after the writer makes its final &ldquo;link&rdquo; to
2720 the repository's history.</p></li>
2722 <li><p><strong>File structure is
2723 versioned.</strong> Unlike CVS, the very structure of each
2724 tree is being saved from revision to revision. File and
2725 directory renames, additions, and deletions are part of the
2726 repository's history.</p></li>
2727 </ol>
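<p>The first two behaviors can be seen in a small hand-built example. The Python below is a sketch with an invented node numbering and invented helper names (<tt class="literal">read_file</tt>, <tt class="literal">latest</tt>); it assumes nodes are tuples and that a reader always starts from the history array.</p>

```python
# Node table for revision 1 of the example project, built by hand.
nodes = {
    1: ("dir", {}),                    # revision 0 root (empty)
    2: ("dir", {"A": 4, "B": 3}),      # revision 1 root
    3: ("dir", {}),                    # B
    4: ("dir", {"fish": 5}),           # A
    5: ("dir", {"tuna": 6}),           # fish
    6: ("file", b"old tuna"),
}
history = [1, 2]


def latest():
    return len(history) - 1


def read_file(rev, path):
    num = history[rev]                 # locate revision X's root node
    for name in path.split("/"):
        kind, entries = nodes[num]
        num = entries[name]            # walk down one dir_entry
    kind, contents = nodes[num]
    return contents


# A writer bubbles up new nodes; nothing in the history points at them yet,
# so a reader that starts from the history cannot reach them.
nodes[7] = ("file", b"new tuna")
nodes[8] = ("dir", {"tuna": 7})
nodes[9] = ("dir", {"fish": 8})
nodes[10] = ("dir", {"A": 9, "B": 3})
before = read_file(latest(), "A/fish/tuna")

history.append(10)                     # the final link publishes revision 2
after = read_file(latest(), "A/fish/tuna")
```

<p>In this sketch <tt class="literal">before</tt> holds the old contents and <tt class="literal">after</tt> the new ones, and reading revision 1 keeps returning the old contents even after the link is made.</p>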
<p>Let's demonstrate versioned file structure by renaming
<tt class="filename">tuna</tt> to <tt class="filename">book</tt>.</p>
2732 <p>We start by creating a new parent &ldquo;fish&rdquo; directory,
2733 except that this parent directory has a different dir_entry, one
which points to the <em>same</em> old file node, but has a
2735 different name:</p>
<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \__________
       ___|_____             __\_____
      |D        |           |D       |
      |         |           |        |
      |   A     |           |   A    |
      |    \    |           |    \   |
      |   B \   |           |   B \  |
      |__/___\__|           |__/___\_|
        /     \               /     \
       |    ___\_____________/       \
       |   /    \                     \
    ___|__/   ___\____              ___\____
   |D      | |D       |            |D       |
   |       | |        |            |        |
   |       | | fish   |            | fish   |
   |_______| |___\____|            |___\____|
                  \                     \
                   \                     \
                 ___\____              ___\____      ________
                |D       |            |D       |    |D       |
                |        |            |        |    |        |
                | tuna   |            | tuna   |    | book   |
                |___\____|            |___\____|    |_/______|
                     \                     \         /
                      \                     \       /
                    ___\____              ___\____ /
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|
</pre>
2773 <p>From here, we finish with the bubble-up process. We make new
2774 parent directories up to the top, culminating in a new root directory
2775 with two dir_entries (one points to the old &ldquo;B&rdquo; directory
2776 node we've had all along, the other to the new revision of
2777 &ldquo;A&rdquo;), and finally link the new tree to the history as
2778 revision 3:</p>
<pre>
       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \        \_________________
          |         \__________               \
       ___|_____             __\_____        __\_____
      |D        |           |D       |      |D       |
      |         |           |        |      |        |
      |   A     |           |   A    |      |   A    |
      |    \    |           |    \   |      |    \   |
      |   B \   |           |   B \  |      |   B \  |
      |__/___\__|           |__/___\_|      |__/___\_|
        /  ___________________/_____\_________/     \
       |  / ___\_____________/       \               \
       | / /    \                     \               \
    ___|/_/   ___\____              ___\____      _____\__
   |D      | |D       |            |D       |    |D       |
   |       | |        |            |        |    |        |
   |       | | fish   |            | fish   |    | fish   |
   |_______| |___\____|            |___\____|    |___\____|
                  \                     \             \
                   \                     \             \
                 ___\____              ___\____      ___\____
                |D       |            |D       |    |D       |
                |        |            |        |    |        |
                | tuna   |            | tuna   |    | book   |
                |___\____|            |___\____|    |_/______|
                     \                     \         /
                      \                     \       /
                    ___\____              ___\____ /
                   |F       |            |F       |
                   |        |            |        |
                   |        |            |        |
                   |________|            |________|

</pre>
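<p>The rename just described can be sketched as a function that re-creates the chain of directories above the renamed entry while reusing the file node. This Python is illustrative only: <tt class="literal">rename</tt> is a hypothetical helper, node numbers are arbitrary, and the node table is assumed to hold tuples as in a simple in-memory model.</p>

```python
def rename(nodes, history, dir_path, old, new):
    """Commit a rename of dir_path/old to dir_path/new; return the revision."""
    names = dir_path.split("/")

    # Node numbers of the directories from the current root down to dir_path.
    chain = [history[-1]]
    for name in names:
        chain.append(nodes[chain[-1]][1][name])

    # New version of the innermost directory: same child node, new name.
    next_id = max(nodes) + 1
    entries = dict(nodes[chain[-1]][1])
    entries[new] = entries.pop(old)
    nodes[next_id] = ("dir", entries)
    child = next_id

    # Bubble up: re-create each ancestor with one entry repointed.
    for parent, name in zip(reversed(chain[:-1]), reversed(names)):
        next_id += 1
        entries = dict(nodes[parent][1])
        entries[name] = child
        nodes[next_id] = ("dir", entries)
        child = next_id

    history.append(child)              # link the new root into the history
    return len(history) - 1


# State after revision 2 in the walkthrough (node numbers are arbitrary).
nodes = {
    1: ("dir", {}),
    2: ("dir", {"A": 4, "B": 3}), 3: ("dir", {}),
    4: ("dir", {"fish": 5}), 5: ("dir", {"tuna": 6}), 6: ("file", b"old"),
    7: ("file", b"new"), 8: ("dir", {"tuna": 7}),
    9: ("dir", {"fish": 8}), 10: ("dir", {"A": 9, "B": 3}),
}
history = [1, 2, 10]
rev = rename(nodes, history, "A/fish", "tuna", "book")
```

<p>No file node is created: the new <tt class="filename">fish</tt> directory's <tt class="filename">book</tt> entry carries the same node number that <tt class="filename">tuna</tt> has in revision 2, and the old directory node still lists <tt class="filename">tuna</tt>.</p>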
2817 <p>For our last example, we'll demonstrate the way
2818 &ldquo;tags&rdquo; and &ldquo;branches&rdquo; are implemented in the
2819 repository.</p>
2821 <p>In a nutshell, they're one and the same thing. Because nodes are
2822 so easily shared, we simply create a <em>new</em>
2823 directory entry that points to an existing directory node. It's an
2824 extremely cheap way of copying a tree; we call this new entry a
2825 <strong class="firstterm">clone</strong>, or more colloquially, a &ldquo;cheap
2826 copy&rdquo;.</p>
2828 <p>Let's go back to our original tree, assuming that we're at
2829 revision 6 to begin with:</p>
<pre>
       ______________________________________________________
    ...___6_______7________8________9________10_________11_____...
          |
          |
       ___|_____
      |D        |
      |         |
      |   A     |
      |    \    |
      |   B \   |
      |__/___\__|
        /     \
       |       \
       |        \
    ___|___   ___\____
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|

</pre>
2868 <p>Let's &ldquo;tag&rdquo; directory A. To make the clone, we
2869 create a new dir_entry <strong>T</strong> in our
2870 root, pointing to A's node:</p>
<pre>
       ______________________________________________________
      |___6_______7________8________9________10_________11_____...
          |        \
          |         \
       ___|_____   __\______
      |D        | |D        |
      |         | |         |
      |   A     | |    A    |
      |    \    | |    |    |
      |   B \   | |  B |  T |
      |__/___\__| |_/__|__|_|
        /     \    /   |  |
       |    ___\__/   /  /
       |   /    \    /  /
    ___|__/   ___\__/_ /
   |D      | |D       |
   |       | |        |
   |       | | fish   |
   |_______| |___\____|
                  \
                   \
                 ___\____
                |D       |
                |        |
                | tuna   |
                |___\____|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |        |
                   |________|

</pre>
2909 <p>Now we're all set. In the future, the contents of directories A
2910 and B may change quite a lot. However, assuming we never make any
2911 changes to directory T, it will <em>always</em> point to
2912 a particular pristine revision of directory A at some point in time.
2913 Thus, T is a tag.</p>
2915 <p>(In theory, we can use some kind of authorization system to
2916 prevent anyone from writing to directory T. In practice, a well-laid
2917 out repository should encourage &ldquo;tag directories&rdquo; to live
2918 in one place, so that it's clear to all users that they're not meant
2919 to change.)</p>
<p>However, if we <em>do</em> decide to allow commits in
directory T, and such a commit takes our repository tree to revision 8,
then T becomes a branch. Specifically, it's a branch of directory A
which shares history with A up to a certain point, and then
&ldquo;breaks off&rdquo; from the main line at revision 8.</p>
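<p>A cheap copy amounts to one new root directory node with one extra entry. The Python below is a minimal sketch under assumptions: <tt class="literal">cheap_copy</tt> is a hypothetical helper, it only handles top-level entries, and the history is a dictionary so that the example can start at revision 6 as in the text.</p>

```python
def cheap_copy(nodes, history, src, dst):
    """Commit a top-level entry dst that points at the node of entry src."""
    kind, entries = nodes[history[max(history)]]
    new_entries = dict(entries)
    new_entries[dst] = entries[src]    # share the node; no subtree is copied
    new_root = max(nodes) + 1
    nodes[new_root] = ("dir", new_entries)
    new_rev = max(history) + 1
    history[new_rev] = new_root
    return new_rev


nodes = {
    1: ("dir", {"A": 3, "B": 2}),      # root of revision 6
    2: ("dir", {}),                    # B
    3: ("dir", {"fish": 4}),           # A
    4: ("dir", {"tuna": 5}),           # fish
    5: ("file", b"tuna contents"),
}
history = {6: 1}

tag_rev = cheap_copy(nodes, history, "A", "T")
root_entries = nodes[history[tag_rev]][1]
```

<p>The whole operation adds a single node, the new root, no matter how large the tree under <tt class="filename">A</tt> is. If a later commit changes something under T, bubbling up from that change gives T its own directory nodes from then on, which is the point at which the tag starts behaving as a branch.</p>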
2926 </div> <!-- server.fs.struct.bubble-up (h5) -->
2928 <div class="h5" id="server.fs.struct.diffy-storage">
2929 <h5>Diffy Storage</h5>
2932 <p>You may have been thinking, &ldquo;Gee, this bubble up method
2933 seems nice, but it sure wastes a lot of space. Every commit to the
2934 repository creates an entire line of new directory
2935 nodes!&rdquo;</p>
2937 <p>Like many other revision control systems, Subversion stores
2938 changes as differences. It doesn't make complete copies of nodes;
2939 instead, it stores the <em>latest</em> revision as a full
2940 text, and previous revisions as a succession of reverse diffs (the
word "diff" is used loosely here &ndash; for files, it means vdeltas;
for directories, it means a format that expresses changes to
directories).</p>
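<p>The idea of keeping the latest revision as full text and older revisions as reverse diffs can be sketched as follows. This Python example is an assumption-heavy stand-in: it uses the standard library's <tt class="literal">difflib.SequenceMatcher</tt> to build copy/insert instructions rather than the vdelta algorithm, and <tt class="literal">DeltaStore</tt>, <tt class="literal">make_delta</tt>, and <tt class="literal">apply_delta</tt> are names invented here.</p>

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Instructions that rebuild target from source (both bytes)."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse bytes from source
        elif tag in ("replace", "insert"):
            ops.append(("data", target[j1:j2]))   # literal new bytes
        # "delete" needs no instruction: those source bytes are skipped
    return ops


def apply_delta(source, ops):
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class DeltaStore:
    """Latest revision as full text, older ones as reverse deltas."""

    def __init__(self, first):
        self.fulltext = first      # always the latest revision
        self.reverse = []          # reverse[i] rebuilds revision i from i + 1

    def commit(self, new):
        self.reverse.append(make_delta(new, self.fulltext))
        self.fulltext = new

    def read(self, rev):
        text = self.fulltext
        # Walk backwards from the latest revision to the requested one.
        for ops in reversed(self.reverse[rev:]):
            text = apply_delta(text, ops)
        return text


store = DeltaStore(b"one fish\n")
store.commit(b"one fish\ntwo fish\n")
store.commit(b"red fish\ntwo fish\n")
oldest = store.read(0)
```

<p>Reading the latest revision applies no deltas at all, while older revisions cost one delta application per step back in time; that trade-off favors the common case of working with recent revisions.</p>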
2944 </div> <!-- server.fs.struct.diffy-storage (h5) -->
2945 </div> <!-- server.fs.struct (h4) -->
2947 <div class="h4" id="server.fs.implementation" title="#server.fs.implementation">
2948 <h4>Implementation</h4>
2951 <p>For the initial release of Subversion,</p>
2953 <ul>
2954 <li><p>The filesystem will be implemented as a library on
2955 Unix.</p></li>
2957 <li><p>The filesystem's data will probably be stored in a
2958 collection of .db files, using the Berkeley Database library.
(In the future, of course, contributors are free to
modify the Subversion filesystem to operate with a more powerful
SQL database.)
2963 (For more information, see
2964 <a href="http://www.sleepycat.com">http://www.sleepycat.com</a>.)</p></li>
2965 </ul>
2966 </div> <!-- server.fs.implementation (h4) -->
2967 </div> <!-- server.fs (h3) -->
2969 <div class="h3" id="server.libsvn_repos" title="#server.libsvn_repos">
2970 <h3>Repository Library</h3>
2973 <!-- Jimb, Karl: Maybe we should turn this into a discussion about how the
2974 filesystem will use non-historical properties for internal ACLs, and how
2975 people can add "external" ACL systems via historical properties...? -->
2977 <p>A Subversion <strong class="firstterm">repository</strong> is a directory that
2978 contains a number of components:</p>
2980 <ul>
2981 <li><p>a versioned filesystem (typically a collection of .db
2982 files)</p></li>
2983 <li><p>some hook scripts (for executing before or after
2984 commits)</p></li>
2985 <li><p>a locking area (used by Berkeley DB or other
2986 processes)</p></li>
2987 <li><p>a configuration area (for changing global
2988 behaviors)</p></li>
2989 </ul>
2991 <p>The Subversion filesystem is just that: a filesystem. But it's also
2992 useful to provide an API that acts at the level of the repository. The
2993 repository library (<tt class="filename">libsvn_repos</tt>) does this.</p>
2995 <p>In particular, it wraps a few <tt class="filename">libsvn_fs</tt>
2996 routines, such as those for beginning and ending commits, so that
2997 hook-scripts can run. A pre-commit-hook script might check for a valid
2998 log message, and a post-commit-hook script might send an email to a
2999 mailing list.</p>
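<p>The wrapping described above can be pictured with a short Python sketch. It is not the <tt class="filename">libsvn_repos</tt> API: the function name, the transaction dictionary, and the use of Python callables in place of hook scripts are all assumptions made for illustration.</p>

```python
def commit_with_hooks(fs_commit, txn, pre_commit_hooks=(), post_commit_hooks=()):
    """Run hooks around a filesystem-level commit function.

    fs_commit:         callable taking txn and returning the new revision.
    pre_commit_hooks:  callables taking txn; a returned string vetoes the commit.
    post_commit_hooks: callables taking (txn, revision); results are ignored.
    """
    for hook in pre_commit_hooks:
        error = hook(txn)
        if error:
            # Abort before touching the filesystem.
            raise ValueError(error)

    revision = fs_commit(txn)

    for hook in post_commit_hooks:
        hook(txn, revision)
    return revision


# Hypothetical hooks mirroring the examples in the text.
def require_log_message(txn):
    return None if txn.get("log") else "empty log message"


notifications = []


def notify_mailing_list(txn, revision):
    notifications.append((revision, txn["log"]))


def fake_fs_commit(txn):
    return 7   # stand-in for the real filesystem commit


new_rev = commit_with_hooks(
    fake_fs_commit, {"log": "Fix tuna."},
    pre_commit_hooks=[require_log_message],
    post_commit_hooks=[notify_mailing_list],
)
```

<p>The design point is the ordering: a pre-commit check can reject a transaction before the filesystem changes, whereas a post-commit action only observes a commit that has already succeeded.</p>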
3001 <p>Additionally, the repository library provides convenience routines
for examining and manipulating the filesystem: for example, a routine to
3003 generate a tree-delta by comparing two revisions, routines for
3004 constructing new transactions, routines for querying log messages, and
3005 routines for exporting and importing filesystem data.</p>
3006 </div> <!-- server.libsvn_repos (h3) -->
3007 </div> <!-- server (h2) -->
3009 <div class="h2" id="license" title="#license">
3010 <h2>License &mdash; Copyright</h2>
3014 <p>Copyright &copy; 2000-2008 Collab.Net. All rights reserved.</p>
3016 <p>This software is licensed as described in the file
3017 <tt class="filename">COPYING</tt>, which you should have received as part of
3018 this distribution. The terms are also available at
3019 <a href="http://subversion.tigris.org/license-1.html">http://subversion.tigris.org/license-1.html</a>. If newer
3020 versions of this license are posted there, you may use a newer version
3021 instead, at your option.</p>
3023 </div> <!-- license (h2) -->
3025 </body>
3026 </html>