<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<style type="text/css"> /* <![CDATA[ */
@import "branding/css/tigris.css";
@import "branding/css/inst.css";
/* ]]> */</style>
<link rel="stylesheet" type="text/css" media="print"
href="branding/css/print.css"/>
<script type="text/javascript" src="branding/scripts/tigris.js"></script>
<title>Subversion Testing Goals</title>
</head>

<body>
<div class="app">
<h2>Design goals for the SVN test suite</h2>

<ul>
<li>
<a href="#WHY">Why Test?</a>
</li>
<li>
<a href="#AUDIENCE">Audience</a>
</li>
<li>
<a href="#REQUIREMENTS">Requirements</a>
</li>
<li>
<a href="#EASEOFUSE">Ease of Use</a>
</li>
<li>
<a href="#LOCATION">Location</a>
</li>
<li>
<a href="#EXTERNAL">External dependencies</a>
</li>
</ul>
<h3><a name="WHY">Why Test?</a></h3>

<p>
Regression testing is an essential element of high quality software.
Unfortunately, some developers have not had firsthand exposure to a
high quality testing framework. Lack of familiarity with the positive
effects of testing can be blamed for statements like:
</p>

<blockquote>
<p>"I don't need to test my code, I know it works."</p>
</blockquote>

<p>
It is safe to say that the idea that developers do not introduce
bugs has been disproved.
</p>
<h3><a name="AUDIENCE">Audience</a></h3>

<p>
The test suite will be used by both developers and end users.
</p>

<p>
<b>Developers</b> need a test suite to help with:
</p>
<p>
<b><i>Fixing Bugs:</i></b>
<br/>
Each time a bug is fixed, a test case should be added to the test
suite. Creating a test case that reproduces a bug is a seemingly
obvious requirement. If a bug cannot be reproduced, there is no way to
be sure a given change will actually fix the problem. Once a test case
has been created, it can be used to validate the correctness of a
given patch. Adding a new test case for each bug also ensures that
the same bug will not be introduced again in the future.
</p>
<p>
<b><i>Impact Analysis:</i></b>
<br/>
A developer fixing a bug or adding a new feature needs to know if a
given change breaks other parts of the code. It may seem obvious, but
keeping a developer from introducing new bugs is one of the primary
benefits of using a regression test system.
</p>
<p>
<b><i>Regression Analysis:</i></b>
<br/>
When a test regression occurs, a developer will need to manually
determine what has caused the failure. The test system is not able to
determine why a test case failed. The test system should simply report
exactly which test results changed and when the last results were
generated.
</p>
<p>
<b>Users</b> need a test suite to help with:
</p>
<p>
<b><i>Building:</i></b>
<br/>
Building software can be a scary process. Users who have never built
software may be unwilling to try. Others may have tried to build a
piece of software in the past, only to be thwarted by a difficult
build process. Even if the build completed without an error, how can a
user be confident that the generated executable actually works? The
only workable solution to this problem is to provide an easily
accessible set of tests that the user can run after building.
</p>
<p>
<b><i>Porting:</i></b>
<br/>
Often, users become porters when the need to run on a previously
unsupported system arises. This porting process typically requires some
minor tweaking of include files. It is absolutely critical that
testing be available when porting, since the primary developers may not
have any way to test changes submitted by someone doing a port.
</p>
<p>
<b><i>Testing:</i></b>
<br/>
Different installations of the exact same OS can contain subtle
differences that cause software to operate incorrectly. Only testing
on different systems will expose problems of this nature. A test suite
can help identify these sorts of problems before a program is actually
put to use.
</p>
<h3><a name="REQUIREMENTS">Requirements</a></h3>

<p>
Functional requirements of an acceptable test suite include:
</p>
<p>
<b><i>Unique Test Identifiers:</i></b>
<br/>
Each test case must have a globally unique test identifier; this
identifier is just a string. A globally unique string is
required so that test cases can be individually identified by
name, sorted, and even looked up on the web. It seems simple,
perhaps even blatantly obvious, but some other test packages
have failed to maintain uniqueness in test identifiers and
developers have suffered because of it. It is even desirable for
the system to actively enforce this uniqueness requirement.
</p>
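<p>
As a minimal sketch (assuming a Python harness; the
<code>register_test</code> helper below is hypothetical, not part of
any existing Subversion tooling), the system could enforce uniqueness
at registration time:
</p>

<pre><code>
# Hypothetical sketch: reject duplicate test identifiers up front,
# rather than letting two tests silently share a name.
registry = {}

def register_test(test_id, func):
    if test_id in registry:
        raise ValueError("duplicate test identifier: " + test_id)
    registry[test_id] = func

register_test("client-1", lambda: 1)
register_test("client-1", lambda: 0)   # raises ValueError
</code></pre>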
<p>
<b><i>Exact Results:</i></b>
<br/>
A test case must have one expected result. If the result of
running the tests does not exactly match the expected result,
the test must fail.
</p>
<p>
<b><i>Reproducible Results:</i></b>
<br/>
Test results should be reproducible. If a test result matches
the expected result, it should do so every time the test is
run. External factors like time stamps must not affect the
results of a test.
</p>
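<p>
One way to meet this requirement, sketched below under the assumption
of a Python harness, is to scrub volatile data such as time stamps out
of captured output before it is compared (the <code>normalize</code>
helper and the time stamp format are hypothetical):
</p>

<pre><code>
import re

# Hypothetical sketch: replace anything that looks like a time stamp
# with a fixed token so that reruns produce identical output.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def normalize(output):
    return TIMESTAMP.sub("TIMESTAMP", output)

actual = normalize("committed at 2024-01-05 12:30:59")
assert actual == "committed at TIMESTAMP"
</code></pre>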
<p>
<b><i>Self-Contained Tests:</i></b>
<br/>
Each test should be self-contained. Results for one test should
not depend on side effects of previous tests. This is obviously
a good practice, since one is able to understand everything a
test is doing without having to look at other tests. The test
system should also support random access so that a single test
or set of tests can be run. If a test is not self-contained, it
cannot be run in isolation.
</p>
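<p>
For example, a Python harness might give each test its own scratch
directory and clean it up afterwards, so no state leaks from one test
into the next (a hedged sketch; <code>run_in_sandbox</code> is
hypothetical):
</p>

<pre><code>
import shutil, tempfile

# Hypothetical sketch: each test runs in a fresh sandbox directory
# that is removed afterwards, pass or fail.
def run_in_sandbox(test_func):
    sandbox = tempfile.mkdtemp(prefix="svn-test-")
    try:
        return test_func(sandbox)
    finally:
        shutil.rmtree(sandbox)
</code></pre>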
<p>
<b><i>Selective Execution:</i></b>
<br/>
It may not be possible to run a given set of tests on certain
systems. The suite must provide a means of selectively running
test cases based on the environment. The test system must also
provide a way to selectively run a given test case or set of
test cases on a per invocation basis. It would be incredibly
tedious to run the entire suite to see the results for a single
test.
</p>
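<p>
Since each test has a unique string identifier, per invocation
selection can be as simple as a glob match over those identifiers. A
minimal Python sketch (the <code>select</code> helper is
hypothetical):
</p>

<pre><code>
import fnmatch

# Hypothetical sketch: run only the tests whose identifiers match
# a glob pattern supplied on the command line.
def select(test_ids, pattern):
    return [t for t in test_ids if fnmatch.fnmatch(t, pattern)]

tests = ["client-1", "client-2", "server-1"]
print(select(tests, "client-*"))   # ['client-1', 'client-2']
</code></pre>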
<p>
<b><i>No Monitoring:</i></b>
<br/>
The tests must run from start to end without operator
intervention. Test results must be generated automatically. It
is critical that an operator not need to manually compare test
results to figure out which tests failed and which ones passed.
</p>
<p>
<b><i>Automatic Logging of Results:</i></b>
<br/>
The system must store test results so that they can be compared
later. This applies to machine readable results as well as human
readable results. For example, assume we have a test named
<code>client-1</code> that expects a result of 1, but 0 is
returned by the test case. We should expect the system to store
two distinct pieces of information. First, that the test
failed. Second, how the test failed, meaning how the expected
result differed from the actual result.
</p>
<p>
The following example shows the kind of results we might record
in a results log file.
</p>
<pre><code>
client-1 FAILED
client-2 PASSED
client-3 PASSED
</code></pre>
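<p>
A hedged sketch of how a Python harness might produce both records at
once, writing the verdict to a summary log and the
expected-versus-actual detail to a second log (the file names and the
<code>log_result</code> helper are hypothetical):
</p>

<pre><code>
# Hypothetical sketch: store the machine-readable verdict and the
# human-readable failure detail as two distinct pieces of information.
def log_result(test_id, expected, actual, summary, detail):
    verdict = "PASSED" if actual == expected else "FAILED"
    summary.write("%s %s\n" % (test_id, verdict))
    if verdict == "FAILED":
        detail.write("%s: expected %r, got %r\n"
                     % (test_id, expected, actual))

with open("results.log", "w") as summary, \
     open("details.log", "w") as detail:
    log_result("client-1", 1, 0, summary, detail)
</code></pre>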
<p>
<b><i>Automatic Recovery:</i></b>
<br/>
The test system must be able to recover from crashes and
unexpected delays. For example, a child process might go into an
infinite loop and would need to be killed. The test shell itself
might also crash or go into an infinite loop. In these cases,
the test run must automatically recover and continue with the
tests directly after the one that crashed.
</p>
<p>
This is critical for a couple of reasons. Nasty crashes and
infinite loops most often appear on users' (not developers')
systems. Users are not well equipped to deal with these sorts of
exceptional situations. It is unrealistic to expect that users
will be able to manually recover from disaster and restart
crashed test cases. It is an accomplishment just to get them to
run the tests in the first place!
</p>
<p>
Ensuring that the test system actually runs each and every test
is critical, since a failing test near the end of the suite
might never be noticed if a crash halfway through kept all the
tests from being run. This process must be completely
automated; no operator intervention should be required.
</p>
<p>
<b><i>Report Results Only:</i></b>
<br/>
When a regression is found, a developer will need to manually
determine the reason for the regression. The system should tell
the developer exactly what tests have failed, when the last set
of results was generated, and what the previous results
actually were. Any additional functionality is outside the
scope of the test system.
</p>
<p>
<b><i>Platform Specific Results:</i></b>
<br/>
Each supported platform should have an associated set of test
results. The naive approach would be to maintain a single set of
results and compare the output for any platform to the known
results. The problem with this approach is that it does not
provide a way to keep track of when results differ from one
platform to another. The following example attempts to clarify
this.
</p>
<p>
Assume you have the following test results generated on a
reference platform before and after a set of changes were
committed.
</p>
<table border="1" cellspacing="2" cellpadding="2">
<tr>
<td><b>Before</b> (Reference Platform)</td>
<td><b>After</b> (Reference Platform)</td>
</tr>
<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 PASSED</code></td>
</tr>
<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 FAILED</code></td>
</tr>
</table>
<p>
It is clear that the change you made introduced a regression in
the <code>client-2</code> test. The problem shows up when you
try to compare results generated from this modified code on some
other platform. For example, assume you got the following
results:
</p>
<table border="1" cellspacing="2" cellpadding="2">
<tr>
<td><b>Before</b> (Reference Platform)</td>
<td><b>After</b> (Other Platform)</td>
</tr>
<tr>
<td><code>client-1 PASSED</code></td>
<td><code>client-1 FAILED</code></td>
</tr>
<tr>
<td><code>client-2 PASSED</code></td>
<td><code>client-2 PASSED</code></td>
</tr>
</table>
<p>
Now things are not at all clear. We know that
<code>client-1</code> is failing, but we don't know if it is
related to the change we just made. We don't know if this test
failed the last time we ran the tests on this platform, since we
only have results for the reference platform to compare to. We
might have fixed a bug in <code>client-2</code>, or we might
have done nothing to affect it.
</p>
<p>
If we instead keep track of test results on a platform by
platform basis, we can avoid much of this pain. It is easy to
imagine how this problem could get considerably worse if there
were 50 or 100 tests that behaved differently from one platform
to the next.
</p>
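<p>
A minimal sketch of per-platform bookkeeping, assuming a Python
harness (the file naming scheme and the <code>regressions</code>
helper are hypothetical):
</p>

<pre><code>
import sys

# Hypothetical sketch: keep one results file per platform and compare
# a new run only against the previous run on that same platform.
def results_path(platform=None):
    return "results-%s.log" % (platform or sys.platform)

def regressions(old, new):
    return [t for t in new
            if new[t] == "FAILED" and old.get(t) == "PASSED"]

old = {"client-1": "PASSED", "client-2": "PASSED"}
new = {"client-1": "PASSED", "client-2": "FAILED"}
print(regressions(old, new))   # ['client-2']
</code></pre>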
<p>
<b><i>Test Types:</i></b>
<br/>
The test suite should support two types of tests. The first
makes use of an external program like the svn client. These
kinds of tests will need to exec an external program and check
the output and exit status of the child process. Note that it
will not be possible to run this sort of test on Mac OS. The
second type of test will load Subversion shared libraries and
invoke methods in-process.
</p>
<p>
This provides the ability to do extensive testing of the various
Subversion APIs without using the svn client. This also has the
nice benefit that it will work on Mac OS, as well as Windows and
Unix.
</p>
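<p>
The two styles might look roughly like this in a Python harness (a
hedged sketch: <code>svn_bindings</code> is a hypothetical stand-in
for an in-process binding module, not an actual Subversion library):
</p>

<pre><code>
import subprocess

# Hypothetical sketch: the first style execs an external program and
# checks its exit status; the second calls a library routine in-process.
def external_test():
    proc = subprocess.run(["svn", "--version"], capture_output=True)
    return proc.returncode == 0

def in_process_test():
    import svn_bindings              # hypothetical binding module
    return svn_bindings.version() is not None
</code></pre>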
<h3><a name="EASEOFUSE">Ease of Use</a></h3>

<p>
Developers will tend to avoid using a test suite if it is not
easy to add new tests and maintain old ones. If developers are
uninterested in using the test suite, it will quickly fall into
disrepair and become a burden instead of an aid.
</p>
<p>
Users will simply avoid running the test suite if it is not
extremely simple to use. A user should be able to build the
software and then run:
</p>

<blockquote>
<p><code>
% make check
</code></p>
</blockquote>
<p>
This should run the test suite and provide a very high level
summary of results, including how many test results have changed
since the last run.
</p>
<p>
While this high level report is useful to developers, they will
often need to examine results in more detail. The system should
provide a means to manually examine results, compare output,
invoke a debugger, and perform other sorts of low level
operations.
</p>
<p>
The next example shows how a developer might run a specific
subset of tests from the command line. The pattern given would
be used to do a glob style match on the test case identifiers,
and run any that matched.
</p>

<blockquote>
<p><code>
% svntest "client-*"
</code></p>
</blockquote>
<h3><a name="LOCATION">Location</a></h3>

<p>
The test suite should be packaged along with the source code
instead of being made available as a separate download. This
significantly simplifies the process of running tests since they
are already incorporated into the build tree.
</p>
<p>
The test suite must support building and running inside and
outside of the source directory. For example, a developer might
want to run tests on both Solaris and Linux. The developer
should be able to run the tests concurrently in two different
build directories without having the tests interfere with each
other.
</p>
<h3><a name="EXTERNAL">External program dependencies</a></h3>

<p>
As much as possible, the test suite should avoid depending on
external programs or libraries.
Of course, there is a nasty bootstrap problem with a test suite
implemented in a scripting language. A wide variety of systems
provide no support for modern scripting languages. We will avoid
this issue for now and assume that the scripting language of
choice is supported by the system.
</p>
<p>
For example, the test suite should not depend on CVS to generate
test results. Many users will not have access to CVS on the
system they want to test Subversion on.
</p>
</div>
</body>
</html>