2 The Subversion Project: Building a Better CVS
3 ==============================================
5 Ben Collins-Sussman <sussman@collab.net>
8 Published in Linux Journal, January 2002
13 This article discusses the history, goals, features and design of
14 Subversion (http://subversion.tigris.org), an open-source project that
15 aims to produce a compelling replacement for CVS.
21 If you work on any kind of open-source project, you've probably worked
22 with CVS. You probably remember the first time you learned to do an
23 anonymous checkout of a source tree over the net -- or your first
24 commit, or learning how to look at CVS diffs. And then the fateful
25 day came: you asked your friend how to rename a file.
27 "You can't", was the reply.
29 What? What do you mean?
31 "Well, you can delete the file from the repository and then re-add it
34 Yes, but then nobody would know it had been renamed...
36 "Let's call the CVS administrator. She can hand-edit the repository's
37 RCS files for us and possibly make things work."
41 "And by the way, don't try to delete a directory either."
43 You rolled your eyes and groaned. How could such simple tasks be
50 No doubt about it, CVS has evolved into the standard Software
51 Configuration Management (SCM) system of the open source community.
52 And rightly so! CVS itself is Free software, and its wonderful "non
53 locking" development model -- whereby dozens of far-flung programmers
54 collaborate -- fits the open-source world very well. In fact, one
55 might argue that without CVS, it's doubtful whether sites like
56 Freshmeat or SourceForge would ever have flourished as they do now.
57 CVS and its semi-chaotic development model have become an essential
58 part of open source culture.
60 So what's wrong with CVS?
62 Because it uses the RCS storage-system under the hood, CVS can only
63 track file contents, not tree structures. As a result, the user has
64 no way to copy, move, or rename items without losing history. Tree
65 rearrangements are always ugly server-side tweaks.
67 The RCS back-end cannot store binary files efficiently, and branching
68 and tagging operations can grow to be very slow. CVS also uses the
69 network inefficiently; many users are annoyed by long waits, because
70 file differences are sent in only one direction (from server to client,
71 but not from client to server), and binary files are always
72 transmitted in their entirety.
74 From a developer's standpoint, the CVS codebase is the result of
75 layers upon layers of historical "hacks". (Remember that CVS began
76 life as a collection of shell-scripts to drive RCS.) This makes the
77 code difficult to understand, maintain, or extend. For example: CVS's
78 networking ability was essentially "stapled on". It was never
79 designed to be a native client-server system.
81 Rectifying CVS's problems is a huge task -- and we've listed just
82 a few of the many common complaints here.
88 In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company
89 for commercially supporting and improving CVS. Cyclic made the first
90 public release of a network-enabled CVS (contributed by Cygnus
91 Software). In 1999, Karl Fogel published a book about CVS and the
92 open-source development model it enables (cvsbook.red-bean.com). Karl
93 and Jim had long talked about writing a replacement for CVS; Jim had
94 even drafted a new, theoretical repository design. Finally, in
95 February of 2000, Brian Behlendorf of CollabNet (www.collab.net)
96 offered Karl a full-time job to write a CVS replacement. Karl
97 gathered a team together and work began in May.
99 The team settled on a few simple goals: it was decided that Subversion
100 would be designed as a functional replacement for CVS. It would do
101 everything that CVS does -- preserving the same development model
102 while fixing the flaws in CVS's (lack-of) design. Existing CVS users
103 would be the target audience: any CVS user should be able to start
104 using Subversion with little effort. Any other SCM "bonus features"
105 were deemed to be of secondary importance (at least before a 1.0
108 At the time of writing, the original team has been coding for a little
109 over a year, and we have a number of excellent volunteer contributors.
110 (Subversion, like CVS, is an open-source project!)
113 Subversion's Features
114 ----------------------
116 Here's a quick run-down of some of the reasons you should be excited
119 * Real copies and renames. The Subversion repository doesn't use
120 RCS files at all; instead, it implements a 'virtual' versioned
121 filesystem that tracks tree-structures over time (described
122 below). Files *and* directories are versioned. At last, there
123 are real client-side `mv' and `cp' commands that behave just as
126 * Atomic commits. A commit either goes into the repository
127 completely, or not at all.
129 * Advanced network layer. The Subversion network server is Apache,
130 and client and server speak WebDAV(2) to one another. (See the
131 'design' section below.)
133 * Faster network access. A binary diffing algorithm is used to
134 store and transmit deltas in *both* directions, regardless of
135 whether a file is of text or binary type.
137 * Filesystem "properties". Each file or directory has an invisible
138 hashtable attached. You can invent and store any arbitrary
139 key/value pairs you wish: owner, perms, icons, app-creator,
140 mime-type, personal notes, etc. This is a general-purpose feature
141 for users. Properties are versioned, just like file contents.
142 And some properties are auto-detected, like the mime-type of a
143 file (no more remembering to use the '-kb' switch!)
145 * Extensible and hackable. Subversion has no historical baggage; it
146 was designed and then implemented as a collection of shared C
147 libraries with well-defined APIs. This makes Subversion extremely
148 maintainable and usable by other applications and languages.
150 * Easy migration. The Subversion command-line client is very
151 similar to CVS; the development model is the same, so CVS users
152 should have little trouble making the switch. Development of a
153 'cvs2svn' repository converter is in progress.
155 * It's Free. Subversion is released under an Apache/BSD-style
162 Subversion has a modular design; it's implemented as a collection of C
163 libraries. Each layer has a well-defined purpose and interface. In
164 general, code flow begins at the top of the diagram and flows
165 "downward" -- each layer provides an interface to the layer above it.
167 <<insert diagram here: svn.tiff>>
170 Let's take a short tour of these layers, starting at the bottom.
173 --> The Subversion filesystem.
175 The Subversion Filesystem is *not* a kernel-level filesystem that one
176 would install in an operating system (like the Linux ext2 fs).
177 Instead, it refers to the design of Subversion's repository. The
178 repository is built on top of a database -- currently Berkeley DB --
179 and thus is a collection of .db files. However, a library accesses
180 these files and exports a C API that simulates a filesystem --
181 specifically, a "versioned" filesystem.
183 This means that writing a program to access the repository is like
184 writing against other filesystem APIs: you can open files and
185 directories for reading and writing as usual. The main difference is
186 that this particular filesystem never loses data when written to; old
187 versions of files and directories are always saved as historical
190 Whereas CVS's backend (RCS) stores revision numbers on a per-file
191 basis, Subversion numbers entire trees. Each atomic 'commit' to the
192 repository creates a completely new filesystem tree, and is
193 individually labeled with a single, global revision number. Files and
194 directories which have changed are rewritten (and older versions are
195 backed up and stored as differences against the latest version), while
196 unchanged entries are pointed to via a shared-storage mechanism. This
197 is how the repository is able to version tree structures, not just
200 Finally, it should be mentioned that using a database like Berkeley DB
201 immediately provides other nice features that Subversion needs: data
202 integrity, atomic writes, recoverability, and hot backups. (See
203 www.sleepycat.com for more information.)
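
The tree-numbering scheme described in this section can be sketched in
a few lines of Python. This is only a toy model under loud assumptions:
it is not Subversion's C API or its Berkeley DB layout, the `Repository`
class and its `commit` and `read` methods are hypothetical names
invented for illustration, and the tree is flattened to a simple
path-to-contents map. It tries to show three ideas from the text: one
global revision number per commit, all-or-nothing commits, and
unchanged entries shared between revisions rather than copied.

```python
class Repository:
    """Toy versioned tree store (illustrative only, not the Subversion API)."""

    def __init__(self):
        # Revision 0 is the empty tree. Each revision maps path -> bytes.
        self._revisions = [{}]

    @property
    def youngest(self):
        """The most recent global revision number."""
        return len(self._revisions) - 1

    def commit(self, changes):
        """Apply all changes or none; return the new global revision number.

        `changes` maps a path to its new bytes, or to None to delete it.
        """
        # Copying the dict copies references only, so unchanged files are
        # shared with the previous revision instead of being duplicated.
        new_tree = dict(self._revisions[-1])
        for path, content in changes.items():
            if content is None:
                if path not in new_tree:
                    # new_tree is simply discarded: nothing was committed.
                    raise KeyError(path)
                del new_tree[path]
            else:
                new_tree[path] = bytes(content)
        # The new tree becomes visible in a single step.
        self._revisions.append(new_tree)
        return self.youngest

    def read(self, path, revision=None):
        """Return the contents of `path` as of `revision` (default: youngest)."""
        rev = self.youngest if revision is None else revision
        return self._revisions[rev][path]
```

Committing two files and then changing only one of them yields
revisions 1 and 2; the untouched file in revision 2 is the very same
object as in revision 1, and the old contents of the changed file stay
readable by asking for revision 1.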
206 --> The network layer.
208 Subversion has the mark of Apache all over it. At its very core, the
209 client uses the Apache Portable Runtime (APR) library. (In fact, this
210 means that the Subversion client should compile and run anywhere Apache
211 httpd does -- right now, this list includes all flavors of Unix,
212 Win32, BeOS, OS/2, Mac OS X, and possibly Netware.)
214 However, Subversion depends on more than just APR -- the Subversion
215 "server" is Apache httpd itself.
217 Why was Apache chosen? Ultimately, the decision was about not
218 reinventing the wheel. Apache is a time-tested, open-source server
219 process that is ready for serious use, yet is still extensible. It can
220 sustain a high network load. It runs on many platforms and can
221 operate through firewalls. It's able to use a number of different
222 authentication protocols. It can do network pipelining and caching.
223 By using Apache as a server, Subversion gets all these features for
224 free. Why start from scratch?
226 Subversion uses WebDAV as its network protocol. DAV (Distributed
227 Authoring and Versioning) is a whole discussion in itself (see
228 www.webdav.org) -- but in short, it's an extension to HTTP that allows
229 reads/writes and "versioning" of files over the web. The Subversion
230 project is hoping to ride a slowly rising tide of support for this
231 protocol: all of the latest file-browsers for Win32, MacOS, and GNOME
232 speak this protocol already. Interoperability will (hopefully) become
233 more and more of a bonus over time.
235 For users who simply wish to access Subversion repositories on local
236 disk, the client can do this too; no network is required. The
237 "Repository Access" layer (RA) is an abstract API implemented by both
238 the DAV and local-access RA libraries. This is a specific benefit of
239 writing a "librarized" version control system; it's a big win over
240 CVS, which has two very different, difficult-to-maintain codepaths for
241 local vs. network repository-access. Feel like writing a new network
242 protocol for Subversion? Just write a new library that implements the
246 --> The client libraries.
248 On the client side, the Subversion "working copy" library maintains
249 administrative information within special SVN/ subdirectories, similar
250 in purpose to the CVS/ administrative directories found in CVS working
253 A glance inside the typical SVN/ directory turns up a bit more than
254 usual, however. The `entries' file contains XML which describes the
255 current state of the working copy directory (and which basically
256 serves the purposes of CVS's Entries, Root, and Repository files
257 combined). But other items present (and not found in CVS/) include
258 storage locations for the versioned "properties" (the metadata
259 mentioned in 'Subversion Features' above) and private caches of
260 pristine versions of each file. This latter feature provides the
261 ability to report local modifications -- and do reversions --
262 *without* network access. Authentication data is also stored within
263 SVN/, rather than in a single .cvspass-like file.
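
To make the pristine-cache idea concrete, here is a minimal Python
sketch. It assumes a much simpler layout than the real working-copy
library uses: the `pristine` subdirectory name and the helper functions
`install_file`, `is_modified`, `revert`, and `demo` are all
hypothetical. The point is only that, with an untouched copy of each
file kept inside SVN/, both "has this file changed?" and "undo my
changes" can be answered from local disk.

```python
import tempfile
from pathlib import Path

ADMIN_DIR = "SVN"          # administrative subdirectory named in the article
PRISTINE_DIR = "pristine"  # hypothetical name for the pristine-copy cache


def _pristine_path(wc_dir, name):
    return Path(wc_dir) / ADMIN_DIR / PRISTINE_DIR / name


def install_file(wc_dir, name, content):
    """Write a file the way a checkout might: working file plus pristine copy."""
    pristine = _pristine_path(wc_dir, name)
    pristine.parent.mkdir(parents=True, exist_ok=True)
    pristine.write_bytes(content)
    (Path(wc_dir) / name).write_bytes(content)


def is_modified(wc_dir, name):
    """Compare the working file with its pristine copy; no network needed."""
    working = (Path(wc_dir) / name).read_bytes()
    return working != _pristine_path(wc_dir, name).read_bytes()


def revert(wc_dir, name):
    """Restore the working file from the pristine copy."""
    (Path(wc_dir) / name).write_bytes(_pristine_path(wc_dir, name).read_bytes())


def demo():
    """Return the modified-state after checkout, after an edit, after revert."""
    with tempfile.TemporaryDirectory() as wc_dir:
        install_file(wc_dir, "hello.c", b"int main(void) { return 0; }\n")
        clean = is_modified(wc_dir, "hello.c")
        (Path(wc_dir) / "hello.c").write_bytes(b"int main(void) { return 1; }\n")
        edited = is_modified(wc_dir, "hello.c")
        revert(wc_dir, "hello.c")
        reverted = is_modified(wc_dir, "hello.c")
    return clean, edited, reverted
```

The trade-off is disk space: every versioned file is stored twice in
the working copy, in exchange for status checks and reversions that
never touch the server.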
265 The Subversion "client" library has the broadest responsibility; its
266 job is to mingle the functionality of the working-copy library with
267 that of the repository-access library, and then to provide a
268 highest-level API to any application that wishes to perform general
269 version control actions.
271 For example: the C routine `svn_client_checkout()' takes a URL as an
272 argument. It passes this URL to the repository-access library and
273 opens an authenticated session with a particular repository. It then
274 asks the repository for a certain tree, and sends this tree into the
275 working-copy library, which then writes a full working copy to disk
276 (SVN/ directories and all.)
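
The flow just described can be outlined in Python. This is a sketch
under stated assumptions, not the real `svn_client_checkout()`
signature: `RepositoryAccess`, `InMemoryAccess`, `WorkingCopy`,
`checkout`, and the `ra_registry` parameter are hypothetical names, the
repository is an in-memory list of trees, and authentication is left
out entirely. It shows the client layer picking a repository-access
implementation from the URL, asking it for a tree, and handing that
tree to the working-copy layer.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlsplit


class RepositoryAccess(ABC):
    """Hypothetical stand-in for the abstract repository-access interface."""

    @abstractmethod
    def fetch_tree(self, revision):
        """Return a {path: bytes} mapping for the requested revision."""


class InMemoryAccess(RepositoryAccess):
    """One possible implementation; a network-backed one would look the same."""

    def __init__(self, revisions):
        self._revisions = revisions

    def fetch_tree(self, revision):
        return dict(self._revisions[revision])


class WorkingCopy:
    """Stand-in for the working-copy library (keeps files in memory)."""

    def __init__(self):
        self.files = {}
        self.revision = None

    def install(self, tree, revision):
        self.files = dict(tree)
        self.revision = revision


def checkout(url, revision, ra_registry):
    """Glue the repository-access and working-copy layers together."""
    # Choose a repository-access implementation based on the URL scheme.
    scheme = urlsplit(url).scheme
    if scheme not in ra_registry:
        raise ValueError(f"no repository-access library for {scheme!r}")
    session = ra_registry[scheme](url)
    # Ask the repository for a tree, then let the working copy store it.
    tree = session.fetch_tree(revision)
    working_copy = WorkingCopy()
    working_copy.install(tree, revision)
    return working_copy
```

Because `checkout` only ever talks to the abstract interface,
supporting another access method in this sketch means registering one
more class under a new URL scheme; the client and working-copy code
stay untouched.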
278 The client library is designed to be used by any application. While
279 the Subversion source code includes a standard command-line client, it
280 should be very easy to write any number of GUI clients on top of the
281 client library. Hopefully, these GUIs should someday prove to be much
282 better than the current crop of CVS GUI applications (the majority of
283 which are no more than fragile "wrappers" around the CVS command-line
286 In addition, proper SWIG bindings (www.swig.org) should make
287 the Subversion API available to any number of languages: Java, Perl,
288 Python, Guile, and so on. In order to Subvert CVS, it helps to be
295 The release of Subversion 1.0 is currently planned for early 2002.
296 After the release of 1.0, Subversion is slated for additions such as
297 i18n support, "intelligent" merging, better "changeset" manipulation,
298 client-side plugins, and improved features for server administration.
299 (Also on the wishlist is an eclectic collection of ideas, such as
300 distributed, replicating repositories.)
302 A final thought from Subversion's FAQ:
304 "We aren't (yet) attempting to break new ground in SCM systems, nor
305 are we attempting to imitate all the best features of every SCM
306 system out there. We're trying to replace CVS."
308 If, in three years, Subversion is widely presumed to be the "standard"
309 SCM system in the open-source community, then the project will have
310 succeeded. But the future is still hazy: ultimately, Subversion
311 will have to win this position on its own technical merits.
319 Please visit the Subversion project website at
320 http://subversion.tigris.org. There are discussion lists to join, and
321 the source code is available via anonymous CVS -- and soon through