Rearchitecting ATF: The missing specification

By Julio Merino, The NetBSD Foundation

This document is very much WORK IN PROGRESS. Anything can change at any
time without prior notice. Feel free to (and please do) raise comments
about the major ideas described herein, but DO NOT NITPICK. Be aware that
even the ideas and design decisions in this document are not set in stone;
they may completely change too.

Introduction

The Automated Testing Framework, or ATF for short, aims to provide a
software testing platform for both developers and end users:

  * Developers want a set of libraries that make the implementation of
    test cases painless.

  * Users want a set of tools that allow them to run the tests over and
    over again and generate beautiful reports with the results.

The development of ATF started as a Google Summer of Code 2007 project for
the NetBSD operating system. Unfortunately, the code basically grew out of
a prototype and a very loose specification. The result is, to put it
mildly, a real mess and a pain in the ass to maintain. Don't get me wrong:
the code has grown pretty well based on the original design ideas, but the
overall result has some problems that are really hard to fix without a
major redesign. Moreover, some of these problems have only materialized as
a result of the reasonable maturity of ATF; they were really hard to
predict in the first place.

This specification aims to provide an ideal design for ATF (err, yes, a
design for how it should have been architected in the first place). It
will be obvious that we will have to rewrite major portions of code, but I
would expect to be able to reuse many parts of it. Starting from scratch
is not an option; incremental improvement will deliver results much
earlier and give users a chance to validate the changes before we make new
mistakes.

Key features and differences

The major features of ATF will be:

  * Lightweight libraries for C, C++ and POSIX shell scripting to
    write test cases.

  * Test cases designed to be installed on the target system so that they
    can be run long after the software has been built.

The major differences between future versions of ATF and previous ones
are:

  * Test programs don't perform isolation. Before, test programs were
    overly complicated by trying to isolate the subprocesses of their
    test cases from the rest of the test cases and the system. This is
    very fragile, especially when implemented in POSIX shell. Therefore,
    isolation will now be performed from a single point, atf-run, just
    before forking the test case.

  * Test programs can only run one test case at a time. Related to the
    previous point, test programs will not run multiple test cases in a
    row any more, because they can't provide isolation. Sequencing will
    be handled by atf-run instead.

  * Simpler debugging. As test programs do not fork any more, debugging
    of failing test cases is easier, as gdb will Just Work (TM).

  * Test case metadata is stored outside the test program, in a special
    file. This is to allow efficient querying from external applications.
    If you have ever attempted to run an old POSIX shell test program
    with the -l option to list the available test cases, you know what I
    mean; such an approach does not scale at all.

  * Support for other test sources. We want to support adding results
    coming from "special" test programs to the report, such as build
    slaves or source code linters.

  * Remote reporting of test results. Previously, atf-run and atf-report
    were able to generate test reports for a single run, but it was just
    not possible to merge these results with other executions or with
    results from other machines. We will have a database, accessible
    remotely, containing results from multiple sources (different
    machines, different test cases, etc.) and providing historical
    information about them.

Use cases

The developer wants a set of libraries to be able to write test cases for
his own software painlessly and as quickly as possible. These libraries
should have a clean interface and not expose internal details of the
implementation (as the old libraries do). Furthermore, another key point
that the developer values is the ease of debugging of test cases: when a
test case fails, running it in gdb or similar tools is crucial, and the
framework should not get in the way of doing that. Unfortunately, previous
versions of ATF make debugging really hard, so this is something to
address in the future.

It may be argued that the end user should never see the tests because,
when he gets the application, he has to be able to assume that it is
defect free. Unfortunately, that is not the case. Many developers do not
have the resources to run build farms with all the possible
hardware/software configurations that their users may have, so testing is
never complete.

What's more, there is a very clear case in which the end user needs tests
and for which there is no easy replacement. Let's assume the user gets a
shiny new version of the FlashyView image viewer. FlashyView depends on
the third-party libjpeg library to load and decode image files. At the
moment of FlashyView's 1.0 release, its developers test the code against
libjpeg 89.3.4 and all is right. The user installs both FlashyView 1.0
and libjpeg 89.3.4 on his computer and all is good. However, one day his
CleverOS operating system decides to upgrade libjpeg to 89.122.36
because, you know, the two are supposed to be compatible. But the
developers have only tested FlashyView with 89.122.35 so far, and they
don't know that FlashyView 1.0 doesn't work with 89.122.36. If the user
has the tests available, he will be able to run them after the upgrade
and check that, indeed, some obscure features of FlashyView 1.0 have
stopped working with 89.122.36. This can be an invaluable help for
critical applications or as part of the bug reporting procedure.

System administrators need to set up beautiful new boxes pretty
frequently. But hardware is different on each of them, and software
developers do not have the luxury of owning those uber-expensive machines
to make sure that their software works fine on architectures of the
opposite endianness. If the administrator has the tests readily available
for all software components, he will be able to quickly assess whether
the software installation will be stable on the new system. He will
similarly be able to assess the overall quality of the system after major
and minor upgrades.

I am adding build farms as a scenario because this is something that we
really need to have but which was not addressed at all in older versions
of ATF. Virtually all software projects that want to address portability
to different systems and/or architectures will need some kind of build
automation in a set of machines (aka build slaves). ATF has to provide
ways to either allow the integration of these test results into the
overall reports or to implement the necessary logic to produce such
reports itself.

The first and main consumer of ATF (during the very first releases, at
least) will be The NetBSD Project. As such, we need to make design
decisions that benefit ATF in this context. Some of these include:

  * No dependencies on third-party software. The use of Boost or SQLite
    sounds tempting, as we shall see later on, but might result in a ban
    of ATF from the NetBSD source tree. If a third-party component may
    result in high benefits to the code, it will be considered, but care
    must be taken.

  * Don't force C++. Test case developers don't want to see C++ at all,
    so the C library must be as free as possible of C++-like artifacts.

  * Speed matters. Previous versions of ATF run "reasonably fast" on
    modern computers, but are unbearably slow on not-so-old machines.
    This is not tolerable, given that NetBSD runs on many underpowered
    platforms and those are the ones that will benefit most from
    automated testing.

Of course I hope we'll have more consumers other than NetBSD, but for that
to happen we must design a good product first and then gain consumers at a
slow pace.

Test case organization and identifiers

The smallest testing unit is a test case. A test case has a specific
purpose, like ensuring that a single method works fine (unit test) or
ensuring that a specific command-line flag works as expected (system
test).

Test cases are grouped into test programs. These test programs act as mere
frontends for the execution of the test cases they contain: there is
absolutely no state sharing between different test cases at run time, even
if they belong to the same test program.

Test programs are stored in a subtree of the file system. This subtree
defines a test suite.

In order to identify the root of a test suite, we will place a special
control directory, named _ATF, as a child of the root directory. This
directory will include a file, named test-suite, that contains the name
of the test suite.

Descending from the test suite root directory, we can find either
subdirectories or test programs. The former are used to organize test
programs logically, while the latter can be placed anywhere in the
subtree.

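For illustration, a test suite rooted at /usr/tests could look like this
(all the paths and names below are made up):

    /usr/tests/
        _ATF/
            test-suite      (contains the test suite's name, e.g. NetBSD)
        bin/
            cp-test         (a test program)
        fs/
            tmpfs/
                mount-test  (another test program, nested more deeply)
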
Based on the tree layout that defines a test suite, each test program and
test case can be identified by a path from the root of the tree to the
test program or test case, respectively. Given that we distinguish
between test programs and test cases, we will reflect this distinction in
the paths.

A test program is identified merely by the path from the test suite's root
directory to it, and the components of this path are separated by forward
slashes (just like in any Unix path).

A test case is identified by a name that is unique within the test
program. To uniquely identify the test case within the tree, we take the
path of the test program and append the test case name to it as a new
component, but this time using a colon as the delimiter.

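For example, given the hypothetical mount-test test program shown above
and a test case in it named readonly, the identifiers would be:

    fs/tmpfs/mount-test             (the test program)
    fs/tmpfs/mount-test:readonly    (the test case)
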
To-do: Test case identifier vs. execution identifier.

Test cases have a specific purpose and, as such, they will be tagged with
a type by the developers. These types can be:

  1. Unit test: ...

  2. Integration test: ...

  3. System test: ...

Orthogonal to test case types, tests also have a size that defines them:

  1. Small: A test case that runs in milliseconds.

  2. Medium: A test case that runs in the order of a few seconds (less
     than a minute).

  3. Large: Any other test case.

Obviously, classifying the test cases by size is a very subjective thing,
because faster machines will make some medium test cases feel small at
some point. To-do: consider if we really want to do this...

A test case may terminate with any of the following results:

  * Pass: All the checks in the test case were successful. No additional
    information is provided.

  * Fail: The test case explicitly failed; a textual reason must be
    provided for this failure.

  * Skipped: The test case was not executed because some conditions were
    not met; a textual reason must be provided to aid the user in
    correcting the problems that prevented the test case from running.

  * Expected failure: An error was detected in the test case, but it was
    expected. Useful to capture known bugs that will not be fixed anytime
    soon.

  * Bogus: This is not a result raised by the test case itself, but a
    condition detected by the caller. A test case is deemed bogus when it
    exits abruptly: i.e. it crashes at any point or it doesn't create the
    results file.

A test case will create a file upon completion, which will contain the
results of the execution of that specific test case. If the test case dies
half-way through due to some unexpected error, the file will not be
created. Callers of the test case will then know that something went
horribly wrong and mark the test case as bogus.

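Just to give an idea, the contents of such a results file could be as
simple as the line below; note that the actual format is deliberately not
specified at this point:

    failed: the created file does not have the expected contents
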
Previous versions of ATF used a special file descriptor to report results
to the caller. This seemed a good idea at the beginning because I expected
test cases not to create temporary directories, but it causes several
problems: the test case can close the results file descriptor, and it is,
I think, impossible to ever implement this approach on Win32 systems. As
regards the former motivation, the old code uses a temporary file
internally to store the results anyway, and lets the test program monitor
read it and funnel those results through the desired file descriptor.
That is redundant and uselessly complex: why not use files all the way
through in the first place? That's what we are going to do now.

Test programs

A test program is a collection of related test cases with a common
run-time interface. Test cases need not be of the same type; i.e. a test
program could contain both unit and system tests.

A test program has a name that must be unique within the directory it is
stored in (obviously; file systems do not support multiple files with the
same name living in the same directory).

The test program is uniquely identified by the full path from the test
suite's root directory to the test program, including the test program's
name.

On-disk representation

Test programs are, by definition, binaries or scripts stored on disk.
However, we need to attach some meta-data to these programs, which is why
ATF test programs are stored as bundles on disk.

Let's consider a test program called wheel-test for the super-interesting
wheel class. wheel-test contains the can-spin and is-round test cases that
check if, well, the wheel can spin and if the wheel is round. This test
program is stored in a wheel-test.atf-tp directory with the following
contents:

  * wheel-test.atf-tp/metadata: Contains the list of available test
    cases, their descriptions and their properties (if any).

  * wheel-test.atf-tp/executable: A binary or shell script that
    implements the test cases described in the metadata.

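To make this more concrete, the metadata file could look something like
the following sketch. Keep in mind that no syntax has been decided on
yet, so every keyword below is made up:

    checksum: <checksum of the executable file; see below>

    test-case: can-spin
    descr: Checks that the wheel can spin under normal conditions

    test-case: is-round
    descr: Checks that the wheel is round
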
Why do we store the metadata separately from the binary? We want to be
able to inspect a whole tree of test programs as fast as possible and
collect information about all the available test cases and their
properties. This information can later be used to select which test cases
to run on each run -- just imagine a GUI that presents the user with the
whole (huge) list of test cases available on his system (for all the
applications he has installed) and lets him inspect this tree at will.

Previous versions of ATF kept the metadata inside the binary and provided
a very rudimentary command-line interface in each binary to export this
data. The problem is that executing the binaries just to get this
information is a costly operation -- especially for shell-based tests --
so this approach does not scale.

Of course, keeping the metadata separate from the executable can lead to
inconsistencies between the two, which we will deal with by checksumming
the binary and storing the cryptographic checksum in the metadata. To-do:
decide which checksumming algorithm to use.

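A possible consistency check, written as a shell sketch under the
assumptions that SHA256 ends up being the chosen algorithm and that the
metadata file stores the hash in a "checksum:" line (neither is decided
yet):

    # Extract the recorded checksum from the metadata file.
    stored=$(awk '/^checksum:/ { print $2 }' wheel-test.atf-tp/metadata)

    # Recompute the checksum of the executable.  The exact tool and its
    # output format are still to be decided; cksum -a is one option.
    actual=$(cksum -a SHA256 < wheel-test.atf-tp/executable |
             awk '{ print $1 }')

    # Refuse to trust the metadata if the two do not match.
    [ "${stored}" = "${actual}" ] || echo "stale metadata" >&2
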
Open problem: how do we make it easy to generate this layout from the
build tools? In particular, how do we painlessly tie this to Automake?

Test case isolation

Test programs contain a set of test cases, but we want to run each test
case as isolated from the others as possible. If we run the test cases in
the same process, they share the same memory, so they can mess with
global state and make the results depend on the execution order.

Additionally, we want each test case to run in its own temporary
subdirectory so that it can create files and directories at will. The
run-time system must take care of cleaning everything up after execution.

Previous versions of ATF implemented this separation by making the test
program spawn a subprocess for each test case, and by making this same
test program deal with all the other nitty-gritty details of directory
isolation and cleanup. This results in tons of code duplication among the
language bindings, and it is quite hard to keep all the implementations
consistent with each other. Furthermore, implementing this isolation in
shell scripts is painfully complex and obfuscated, and it makes the test
programs incredibly slow. At last, there is one more drawback: debugging
of failing test cases is hard because the forking of subprocesses
collides with debuggers; yes, gdb can follow the child across a fork, but
doing so is not convenient at all.

An alternative approach is to make test programs not do the isolation by
themselves. Instead, we will have atf-run spawn a new, clean, isolated
subprocess for each test case and then just execute that test case in it,
roughly as outlined below. This will, most likely, be faster than the
current approach (because it will be implemented in C++) and will be much
easier to maintain.

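Conceptually, what atf-run will do for every test case resembles the
following shell sketch. This is for illustration only: the real logic will
live in C++ inside atf-run, and the paths and names below are made up (the
-r flag is described in the command-line interface section below):

    # Create a pristine work directory for this test case.
    workdir=$(mktemp -d /tmp/atf.XXXXXX)

    # Execute the single test case from within the work directory, using
    # a sanitized environment.
    ( cd "${workdir}" &&
      env -i HOME="${workdir}" TZ=UTC \
          /usr/tests/wheel/wheel-test.atf-tp/executable \
          -r "${workdir}/results" can-spin )

    # Collect the results file and clean everything up.
    process_results "${workdir}/results"   # hypothetical helper
    rm -rf "${workdir}"
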
There are two major drawbacks, though:

  * Running the test program by hand will leave tons of garbage
    uncleaned; that is fine as long as we warn tech-savvy users not to do
    that.

  * The current libraries allow the programmer to define arbitrary test
    cases anywhere in their program (not necessarily in a test program)
    and run them in an isolated way by just calling their run method. If
    we remove the isolation from the test cases themselves, this API
    should disappear, as it will not be safe any more to run a test case
    by hand from within a program. Maybe not a big deal, though,
    because... who wants to mix test cases with regular application code?

The command-line interface

All test programs must provide the same command-line interface so that end
users are not surprised by unknown and inconsistent flags and arguments.
We did a good job in previous versions of ATF in this regard, but we are
going to simplify the interface even further.

Given that test programs will not provide isolation for the test cases
they contain, we will not allow a single run of the test program to
execute more than one test case. If automation is needed to run several
tests in a sequence, the user will have to use atf-run.

With all that said, a test program will provide the following interface:

    test-program [options] [test-case-name]

Note that we can only specify a single test case. For simplicity, we are
going to make it optional, in which case the test program will only work
if it defines a single test case. I do not really like the idea, because
adding another test case to the program will break existing callers, but
these are internal binaries that must not be called directly, so there is
no real harm done if that happens. The optional form is provided only to
make debugging easier.

The available options are as follows:

  * -h: Explicitly request help. The program must never print the whole
    usage message unless asked to do so.

  * -r results-file: Path to the file where the execution results will be
    stored.

  * -s srcdir: Path to the source directory where the test program
    resides. We will not try to guess it at this point (atf-run will,
    though) unless the source directory is the current directory, because
    there is the potential of guessing incorrectly and confusing our
    users. We need to know what the source directory is to be able to
    find the metadata file and any auxiliary data files required by the
    test program.

  * -v var=value: Sets the configuration variable var to value, which
    test cases can later query.

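Putting it all together, debugging a single failing test case could be as
simple as the following session (with invented paths and names, as usual):

    cd /usr/tests/wheel/wheel-test.atf-tp
    gdb --args ./executable -s . -r /tmp/results can-spin
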
Note that several flags provided by old ATF versions are gone. Namely: -l
is removed because the metadata is stored separately, and -w is removed
because the test program will not create temporary directories any more
by itself.

Execution automation

The atf-run tool provides automation to run multiple test cases (coming
from different test programs) sequentially. Parallel execution may be
implemented in the future, but test cases must be designed in a way that
allows them to be executed alongside other test cases without conflicts.

atf-run also provides isolation for test cases. This tool spawns a
subprocess for each of the tests that have to run, and in doing so it
prepares the subprocess to have a reasonable environment and isolates it
from the rest of the test cases as much as possible. Once all this has
happened, the test program containing the test case is executed in the
subprocess and the results are collected from the results file it
generates.

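From the user's point of view, running a whole test suite could then boil
down to something like this hypothetical invocation (the exact
command-line interface of atf-run is not settled yet):

    cd /usr/tests && atf-run
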
To-do: Do we need Atffiles? Probably not, so remove them and mention why
here.

The store

The atf-store tool implements a database that contains information about
the execution of test cases. The database captures the results of each
test case as well as any other information that is helpful for debugging:
e.g. the stdout and stderr outputs.

The store is historic: we want to keep the history of a given test case.
Why? Some of these test cases come from build slaves and contain the
whole results of a fetch/compile/test run, so we want to see how things
progress over time. Disk space is cheap, but if we want to clean up, we
can cull the oldest entries.

We will have different frontends for the store: I'm thinking that
atf-report could just read off the store and print the results on screen,
but we could also have a plugin for name-your-favourite-http-server to
generate a dynamic view of the test case results -- very useful for build
farms.

Given the nature of the store, I think it'd be wise to use SQLite as its
backing store, especially if it ever is to serve dynamic web content. If
we go this route, we should provide a not-really-optimized file-based
backend for those users that do not want to have an additional dependency
(NetBSD being one of them).

The store will only be accessed by atf-store. I do not want atf-run or the
test programs to access it directly to store their results. They must
contact the atf-store binary to do so. Having a single entry point to the
store will prevent consistency issues. Now, this brings up two big
questions: where is the store located and how is it accessed?

If we are running ATF interactively, we probably do not want to use the
store at all. However, for simplicity of implementation of tools such as
atf-run, they should always contact the store and let the store decide
what to do. For interactive runs, we can omit storing results, and so
sending results to the store should result in a no-op. How does
atf-report fit into this picture, then?

The store has to be accessible locally (through a pipe, named pipe or
whatever) but also remotely. We want build slaves to be able to send
results to the store on a push basis. Open issue: how do we deal with
authentication of these remote submissions?

Build farms

Build farms, or continuous builds, are required for any software project
that wants to achieve a minimum amount of quality on one or more
platforms. ATF cannot disregard this use case.

The work of each build slave can be treated as a single test case, and
thus all of its work (source code fetching, building and testing) can be
collapsed into a single program that works as a test case. These results
can later be incorporated into test result reports effortlessly. A more
advanced approach involves splitting each stage (fetch, build, test) into
a separate test case, and then making these separate test cases depend on
each other. The writer of the build slave script has to be able to decide
which approach he prefers.

In order to support build farms, we just need to provide an easy way of
creating a test program (in POSIX shell) to act as a build slave. We then
stick a call to atf-run in cron to run this single test program and make
it deliver the results to a remote atf-store, as sketched below.

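A hypothetical deployment on a build slave could boil down to a
crontab(5) entry like the following; how atf-run learns the address of
the remote store is one of the open issues mentioned above:

    # Run the build-slave test program every night at 03:00 and push the
    # results to the central atf-store.
    0 3 * * * cd /home/builder/slave-suite && atf-run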