<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
@import "branding/css/tigris.css";
@import "branding/css/inst.css";
/* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
href="branding/css/print.css"/>
<script type="text/javascript" src="branding/scripts/tigris.js"></script>
<title>Subversion Testing Goals</title>
</head>

<body>
<div class="app">

<h2>Design goals for the SVN test suite</h2>

<ul>
<li>
<a href="#WHY">Why Test?</a>
</li>
<li>
<a href="#AUDIENCE">Audience</a>
</li>
<li>
<a href="#REQUIREMENTS">Requirements</a>
</li>
<li>
<a href="#EASEOFUSE">Ease of Use</a>
</li>
<li>
<a href="#LOCATION">Location</a>
</li>
<li>
<a href="#EXTERNAL">External dependencies</a>
</li>
</ul>
<h3><a name="WHY">Why Test?</a></h3>

<p>
Regression testing is an essential element of high-quality software.
Unfortunately, some developers have not had first-hand exposure to a
high-quality testing framework. Lack of familiarity with the positive
effects of testing can be blamed for statements like:
<br/>
</p>
<blockquote>
<p>"I don't need to test my code, I know it works."</p>
</blockquote>
<p>
It is safe to say that the idea that developers do not introduce
bugs has been disproved.
</p>
<h3><a name="AUDIENCE">Audience</a></h3>

<p>
The test suite will be used by both developers and end users.
</p>

<p>
<b>Developers</b> need a test suite to help with:
</p>

<p>
<b><i>Fixing Bugs:</i></b>
<br/>
Each time a bug is fixed, a test case should be added to the test
suite. Creating a test case that reproduces a bug is a seemingly
obvious requirement. If a bug cannot be reproduced, there is no way to
be sure a given change will actually fix the problem. Once a test case
has been created, it can be used to validate the correctness of a
given patch. Adding a new test case for each bug also ensures that
the same bug will not be introduced again in the future.
</p>

<p>
<b><i>Impact Analysis:</i></b>
<br/>
A developer fixing a bug or adding a new feature needs to know if a
given change breaks other parts of the code. It may seem obvious, but
keeping a developer from introducing new bugs is one of the primary
benefits of using a regression test system.
</p>

<p>
<b><i>Regression Analysis:</i></b>
<br/>
When a test regression occurs, a developer will need to manually
determine what has caused the failure. The test system is not able to
determine why a test case failed. The test system should simply report
exactly which test results changed and when the last results were
generated.
</p>
<p>
<b>Users</b> need a test suite to help with:
</p>

<p>
<b><i>Building:</i></b>
<br/>
Building software can be a scary process. Users who have never built
software may be unwilling to try. Others may have tried to build a
piece of software in the past, only to be thwarted by a difficult
build process. Even if the build completed without an error, how can a
user be confident that the generated executable actually works? The
only workable solution to this problem is to provide an easily
accessible set of tests that the user can run after building.
</p>

<p>
<b><i>Porting:</i></b>
<br/>
Often, users become porters when the need to run on a previously
unsupported system arises. This porting process typically requires some
minor tweaking of include files. It is absolutely critical that
testing be available when porting since the primary developers may not
have any way to test changes submitted by someone doing a port.
</p>

<p>
<b><i>Testing:</i></b>
<br/>
Different installations of the exact same OS can contain subtle
differences that cause software to operate incorrectly. Only testing
on different systems will expose problems of this nature. A test suite
can help identify these sorts of problems before a program is actually
put to use.
</p>
<h3><a name="REQUIREMENTS">Requirements</a></h3>

<p>
Functional requirements of an acceptable test suite include:
</p>

<p>
<b><i>Unique Test Identifiers:</i></b>
<br/>
Each test case must have a globally unique test identifier; this
identifier is just a string. A globally unique string is
required so that test cases can be individually identified by
name, sorted, and even looked up on the web. It seems simple,
perhaps even blatantly obvious, but some other test packages
have failed to maintain uniqueness in test identifiers and
developers have suffered because of it. It is even desirable for
the system to actively enforce this uniqueness requirement.
</p>
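<p>
Active enforcement could be as simple as refusing to register an
identifier twice. The following is a minimal sketch of that idea, not
a description of any existing Subversion test code; the names
<code>Registry</code> and <code>DuplicateTestIdError</code> are
hypothetical.
</p>

```python
class DuplicateTestIdError(ValueError):
    """Raised when two test cases claim the same identifier."""


class Registry:
    """Maps globally unique string identifiers to test callables."""

    def __init__(self):
        self._tests = {}

    def register(self, test_id, func):
        # Refuse to silently replace an existing test case.
        if test_id in self._tests:
            raise DuplicateTestIdError("duplicate test identifier: " + test_id)
        self._tests[test_id] = func

    def ids(self):
        # Sorted so that reports list tests in a stable order.
        return sorted(self._tests)
```

<p>
Registering <code>client-1</code> a second time raises an error at
registration time, so a duplicate is caught before any test is run.
</p>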
<p>
<b><i>Exact Results:</i></b>
<br/>
A test case must have one expected result. If the result of
running the test does not exactly match the expected result,
the test must fail.
</p>

<p>
<b><i>Reproducible Results:</i></b>
<br/>
Test results should be reproducible. If a test result matches
the expected result, it should do so every time the test is
run. External factors like time stamps must not affect the
results of a test.
</p>
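<p>
One way to keep time stamps out of an exact comparison is to replace
them with a fixed placeholder before comparing. The sketch below
assumes, purely for illustration, that time stamps look like
<code>2001-01-01 10:00:00</code>; the names <code>normalize</code>
and <code>result_matches</code> are hypothetical.
</p>

```python
import re

# Assumed time stamp layout: "YYYY-MM-DD HH:MM:SS". Real output may differ.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def normalize(output):
    """Replace every time stamp with a fixed placeholder."""
    return TIMESTAMP.sub("[TIMESTAMP]", output)


def result_matches(expected, actual):
    """Exact comparison, made after time stamps are masked on both sides."""
    return normalize(expected) == normalize(actual)
```

<p>
Everything other than the masked time stamps must still match
character for character, so the exact-results requirement is kept.
</p>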
<p>
<b><i>Self-Contained Tests:</i></b>
<br/>
Each test should be self-contained. Results for one test should
not depend on side effects of previous tests. This is obviously
a good practice, since one is able to understand everything a
test is doing without having to look at other tests. The test
system should also support random access so that a single test
or set of tests can be run. If a test is not self-contained, it
cannot be run in isolation.
</p>

<p>
<b><i>Selective Execution:</i></b>
<br/>
It may not be possible to run a given set of tests on certain
systems. The suite must provide a means of selectively running
test cases based on the environment. The test system must also
provide a way to selectively run a given test case or set of
test cases on a per-invocation basis. It would be incredibly
tedious to run the entire suite to see the results for a single
test.
</p>

<p>
<b><i>No Monitoring:</i></b>
<br/>
The tests must run from start to end without operator
intervention. Test results must be generated automatically. It
is critical that an operator not need to manually compare test
results to figure out which tests failed and which ones passed.
</p>
<p>
<b><i>Automatic Logging of Results:</i></b>
<br/>
The system must store test results so that they can be compared
later. This applies to machine-readable results as well as
human-readable results. For example, assume we have a test named
<code>client-1</code> that expects a result of 1, but the test
case returns 0 instead. We should expect the system to store
two distinct pieces of information. First, that the test
failed. Second, how the test failed, meaning how the expected
result differed from the actual result.
</p>

<p>
The following example shows the kind of results we might record
in a results log file.
</p>

<pre><code>
client-1 FAILED
client-2 PASSED
client-3 PASSED
</code></pre>
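<p>
The two pieces of information can be produced together. The sketch
below is one possible shape for this, under the assumption that each
result is a tuple of identifier, expected value, and actual value;
the function name <code>render_log</code> and the wording of the
detail line are hypothetical.
</p>

```python
def render_log(results):
    """Turn (test_id, expected, actual) tuples into log text.

    Returns (summary_lines, failure_details). The summary lines use the
    "client-1 FAILED" layout; the detail lines record how each failing
    test differed from its expected result.
    """
    summary_lines = []
    failure_details = []
    for test_id, expected, actual in results:
        if actual == expected:
            summary_lines.append(test_id + " PASSED")
        else:
            summary_lines.append(test_id + " FAILED")
            failure_details.append(
                "%s: expected %r, got %r" % (test_id, expected, actual))
    return summary_lines, failure_details
```

<p>
Keeping the summary and the details as separate lists lets the
summary stay easy to compare between runs while the details remain
available for a developer who needs them.
</p>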
<p>
<b><i>Automatic Recovery:</i></b>
<br/>
The test system must be able to recover from crashes and
unexpected delays. For example, a child process might go into an
infinite loop and would need to be killed. The test shell itself
might also crash or go into an infinite loop. In these cases,
the test run must automatically recover and continue with the
tests directly after the one that crashed.
</p>

<p>
This is critical for a couple of reasons. Nasty crashes and
infinite loops most often appear on users' (not developers')
systems. Users are not well equipped to deal with these sorts of
exceptional situations. It is unrealistic to expect that users
will be able to manually recover from disaster and restart
crashed test cases. It is an accomplishment just to get them to
run the tests in the first place!
</p>

<p>
Ensuring that the test system actually runs each and every test
is critical, since a failing test near the end of the suite
might never be noticed if a crash halfway through kept all the
tests from being run. This process must be completely
automated; no operator intervention should be required.
</p>
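<p>
For tests that run as child processes, recovery can be sketched with
a per-test time limit. The example below is only an illustration: it
covers a hung or failing child, not a crash of the test shell itself,
and the names <code>run_isolated</code>, <code>run_all</code>, and
<code>EXAMPLE_TESTS</code>, the status strings, and the five second
limit are all assumptions.
</p>

```python
import subprocess
import sys


def run_isolated(test_id, argv, timeout_seconds):
    """Run one test in a child process; never let it stop the whole run."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception.
        return test_id, "TIMEOUT"
    except OSError:
        # The program could not even be started.
        return test_id, "FAILED"
    if completed.returncode != 0:
        return test_id, "FAILED"
    return test_id, "PASSED"


def run_all(tests, timeout_seconds=5.0):
    # Every test is attempted, even if an earlier one failed or hung.
    return [run_isolated(test_id, argv, timeout_seconds)
            for test_id, argv in tests]


# Hypothetical test commands; each one just runs a tiny Python snippet.
EXAMPLE_TESTS = [
    ("demo-pass", [sys.executable, "-c", "pass"]),
    ("demo-exit", [sys.executable, "-c", "raise SystemExit(3)"]),
    ("demo-hang", [sys.executable, "-c", "import time; time.sleep(60)"]),
    ("demo-last", [sys.executable, "-c", "pass"]),
]
```

<p>
Calling <code>run_all(EXAMPLE_TESTS)</code> still reaches
<code>demo-last</code> after the hung test has been killed, which is
the behavior the requirement asks for.
</p>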
<p>
<b><i>Report Results Only:</i></b>
<br/>
When a regression is found, a developer will need to manually
determine the reason for the regression. The system should tell
the developer exactly what tests have failed, when the last set
of results was generated, and what the previous results
actually were. Any additional functionality is outside the
scope of the test system.
</p>
<p>
<b><i>Platform Specific Results:</i></b>
<br/>
Each supported platform should have an associated set of test
results. The naive approach would be to maintain a single set of
results and compare the output for any platform to the known
results. The problem with this approach is that it does not
provide a way to keep track of when changes differ from one
platform to another. The following example illustrates the
problem.
</p>

<p>
Assume you have the following test results generated on a
reference platform before and after a set of changes were
committed.
</p>

<table border="1" cellspacing="2" cellpadding="2">

<tr>
<td><b>Before</b> (Reference Platform)</td>

<td><b>After</b> (Reference Platform)</td>
</tr>

<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 PASSED</code></td>
</tr>

<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 FAILED</code></td>
</tr>

</table>

<p>
It is clear that the change you made introduced a regression in
the <code>client-2</code> test. The problem shows up when you
try to compare results generated from this modified code on some
other platform. For example, assume you got the following
results:
</p>

<table border="1" cellspacing="2" cellpadding="2">

<tr>
<td><b>Before</b> (Reference Platform)</td>

<td><b>After</b> (Other Platform)</td>
</tr>

<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 FAILED</code></td>
</tr>

<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 PASSED</code></td>
</tr>

</table>

<p>
Now things are not at all clear. We know that
<code>client-1</code> is failing but we don't know if it is
related to the change we just made. We don't know if this test
failed the last time we ran the tests on this platform since we
only have results for the reference platform to compare to. We
might have fixed a bug in <code>client-2</code>, or we might
have done nothing to affect it.
</p>

<p>
If we instead keep track of test results on a platform-by-platform
basis, we can avoid much of this pain. It is easy to
imagine how this problem could get considerably worse if there
were 50 or 100 tests that behaved differently from one platform
to the next.
</p>
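<p>
Keeping results per platform means each run is compared only with the
previous run on the same platform. The sketch below assumes results
are held as dictionaries keyed by platform name and test identifier;
the function name <code>changed_results</code> and the
<code>UNKNOWN</code> status for tests with no earlier result are
hypothetical.
</p>

```python
def changed_results(baselines, platform, current):
    """Compare a run against the stored results for the same platform.

    baselines maps platform name to {test_id: status}; current is the
    {test_id: status} mapping from the run that just finished.
    Returns a sorted list of (test_id, previous_status, current_status)
    for every test whose status changed.
    """
    previous = baselines.get(platform, {})
    changes = []
    for test_id in sorted(current):
        before = previous.get(test_id, "UNKNOWN")
        after = current[test_id]
        if before != after:
            changes.append((test_id, before, after))
    return changes
```

<p>
With a stored baseline for the other platform, a
<code>client-1</code> failure that was already present before the
change would not be reported as new, while the
<code>client-2</code> regression on the reference platform would be.
</p>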
<p>
<b><i>Test Types:</i></b>
<br/>
The test suite should support two types of tests. The first
makes use of an external program like the svn client. These
kinds of tests will need to exec an external program and check
the output and exit status of the child process. Note that it
will not be possible to run this sort of test on Mac OS. The
second type of test will load Subversion shared libraries and
invoke methods in-process.
</p>

<p>
This provides the ability to do extensive testing of the various
Subversion APIs without using the svn client. This also has the
nice benefit that it will work on Mac OS, as well as Windows and
Unix.
</p>
<h3><a name="EASEOFUSE">Ease of Use</a></h3>

<p>
Developers will tend to avoid using a test suite if it is not
easy to add new tests and maintain old ones. If developers are
uninterested in using the test suite, it will quickly fall into
disrepair and become a burden instead of an aid.
</p>

<p>
Users will simply avoid running the test suite if it is not
extremely simple to use. A user should be able to build the
software and then run:
</p>

<blockquote>
<p><code>
% make check
</code></p>
</blockquote>

<p>
This should run the test suite and provide a high-level summary
of results that includes how many test results have changed
since the last run.
</p>

<p>
While this high-level report is useful to developers, they will
often need to examine results in more detail. The system should
provide a means to manually examine results, compare output,
invoke a debugger, and perform other sorts of low-level operations.
</p>

<p>
The next example shows how a developer might run a specific
subset of tests from the command line. The pattern given would
be used to do a glob-style match on the test case identifiers,
and run any that matched.
</p>

<blockquote>
<p><code>
% svntest "client-*"
</code></p>
</blockquote>
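<p>
The glob-style match itself needs very little code. The sketch below
uses Python's standard <code>fnmatch</code> module; the function name
<code>select_tests</code> is hypothetical and is not meant to
describe how an actual <code>svntest</code> command would be written.
</p>

```python
import fnmatch


def select_tests(test_ids, pattern):
    """Return the identifiers matching a shell-style glob, in sorted order."""
    # fnmatchcase() keeps matching case-sensitive on every platform.
    return sorted(t for t in test_ids if fnmatch.fnmatchcase(t, pattern))
```

<p>
Given the identifiers <code>client-1</code>, <code>client-2</code>,
and <code>server-1</code>, the pattern <code>client-*</code> selects
only the first two.
</p>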
<h3><a name="LOCATION">Location</a></h3>

<p>
The test suite should be packaged along with the source code
instead of being made available as a separate download. This
significantly simplifies the process of running tests since they
are already incorporated into the build tree.
</p>

<p>
The test suite must support building and running inside and
outside of the source directory. For example, a developer might
want to run tests on both Solaris and Linux. The developer
should be able to run the tests concurrently in two different
build directories without having the tests interfere with each
other.
</p>
<h3><a name="EXTERNAL">External program dependencies</a></h3>

<p>
As much as possible, the test suite should avoid depending on
external programs or libraries.

Of course, there is a nasty bootstrap problem with a test suite
implemented in a scripting language. A wide variety of systems
provide no support for modern scripting languages. We will avoid
this issue for now and assume that the scripting language of
choice is supported by the system.
</p>

<p>
For example, the test suite should not depend on CVS to generate
test results. Many users will not have access to CVS on the
system they want to test Subversion on.
</p>

</div>
</body>
</html>