1 <?xml version="1.0" encoding="utf-8"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.1//EN"
3 "http://docbook.org/xml/simple/1.1/sdocbook.dtd">
8 <title>Rearchitecting ATF: The missing specification</title>
11 <firstname>Julio</firstname>
12 <surname>Merino</surname>
14 <orgname>The NetBSD Foundation</orgname>
19 <note role="warning"><para>This document is very much WORK IN PROGRESS.
20 Anything can change at any time without prior notice. Feel free to (and
please do) raise comments about the major ideas described herein, but DO NOT
NITPICK. Be aware that even the ideas and design decisions in this
document are not set in stone; they may completely change
26 <section id="overview">
28 <title>Overview</title>
30 <para>The Automated Testing Framework, or ATF for short, aims to provide a
31 software testing platform for both developers and end users:</para>
35 <listitem><para>Developers want a set of libraries that make the
36 implementation of test cases painless.</para></listitem>
38 <listitem><para>Users want a set of tools that allow them to run the tests
over and over again and generate beautiful reports with
40 the results.</para></listitem>
44 <para>The development of ATF started as a Google Summer of Code 2007
45 project for the NetBSD operating system. Unfortunately, the code basically
46 grew out of a prototype and a very loose specification. The result is, to
47 put it mildly, a real mess and a pain in the ass to maintain. Don't get me
48 wrong: the code has grown pretty well based on the original design ideas,
49 but the overall result has some problems that are really hard to fix
50 without a major redesign. Moreover, some of these problems have only
materialized now that ATF has reached a reasonable maturity; they were
really hard to predict up front.</para>
54 <para>This specification aims to provide an ideal design for ATF (err, yes,
55 a design for how it should have been architected in the first place). It
56 will be obvious that we will have to rewrite major portions of code, but I
57 would expect to be able to reuse many parts of it. Starting from scratch
58 is not an option; incremental improvement will deliver results much earlier
and reassure users before we make new mistakes.</para>
65 <title>Key features and differences</title>
67 <para>The major features of ATF will be:</para>
71 <listitem><para>Lightweight libraries for C, C++ and POSIX shell scripting
72 to implement test cases.</para></listitem>
74 <listitem><para>Test cases designed to be installed on the target system so
that they can be run long after the software has been
built.</para></listitem>
80 <para>The major differences between future versions of ATF and previous
85 <listitem><para><emphasis>Test programs don't perform isolation.</emphasis>
Before, test programs were overly complicated by trying to isolate the
subprocesses of their test cases from the rest of the test cases and the
system. This is very fragile, especially when implemented in POSIX shell.
89 Therefore, isolation will now be performed from a single point, atf-run,
90 just before forking the test case.</para></listitem>
92 <listitem><para><emphasis>Test programs can only run one test case at a
93 time.</emphasis> Related to the previous points, test programs will not
94 run multiple test cases in a row any more, because they can't provide
95 isolation. Sequencing will be provided by atf-run.</para></listitem>
<listitem><para><emphasis>Simple debugging.</emphasis> As test programs do
not fork any more, debugging failing test cases is easier: gdb will
Just Work (TM).</para></listitem>
101 <listitem><para><emphasis>Test case metadata is stored out of the test
102 program, in a special file.</emphasis> This is to allow efficient querying
103 from external applications. If you have attempted to run an old POSIX
104 shell test program with the <literal>-l</literal> option to list the
available test cases, you know what I mean; that approach does not scale at
all.</para></listitem>
108 <listitem><para><emphasis>Support for other test sources.</emphasis> We
109 want to support adding results coming from "special" test programs to the
110 report, such as build slaves or source code linters.</para></listitem>
112 <listitem><para><emphasis>Remote reporting of test results.</emphasis>
Previously, atf-run and atf-report were able to generate test reports for a
single run, but it was just not possible to merge these results with other
115 executions or with results from other machines. We will have a database,
116 accessible remotely, containing results from multiple sources (different
117 machines, different test cases, etc.) and providing historical information
118 about these results.</para></listitem>
124 <section id="scenarios">
126 <title>Scenarios</title>
128 <section id="developer-scenario">
130 <title>The developer</title>
132 <para>The developer wants a set of libraries to be able to write test cases
133 for his own software painlessly and as quickly as possible. These
134 libraries should have a clean interface and not expose internal details of
135 the implementation (as the old libraries do). Furthermore, another key
136 point that the developer values is the ease of debugging of test cases:
137 when a test case fails, running it in gdb or similar tools is crucial, and
the framework should not get in the way of doing that. Unfortunately,
previous versions of ATF make debugging really hard, so this is something
to address in the future.</para>
144 <section id="user-scenario">
146 <title>The end user</title>
148 <para>It may be argued that the end user should never see the tests
149 because, when he gets the application, he has to be able to assume that it
is defect free. Unfortunately, that is not the case. Many developers do
not have the resources to run build farms with all the possible
hardware/software configurations that their users may have, so testing is
153 never complete.</para>
<para>What is more, there is a very clear case in which the end user needs
156 tests and for which there is no easy replacement. Let's assume the user
157 gets a shiny new version of the FlashyView image viewer. FlashyView has a
158 dependency on the third-party libjpeg library to load and decode the image
159 files. At the moment of FlashyView's 1.0 release, its developers test the
160 code against libjpeg 89.3.4 and all is right. The user installs both
161 FlashyView 1.0 and libjpeg 89.3.4 on his computer and all is good.
162 However, one day his CleverOS operating system decides to upgrade libjpeg
to 89.122.36 because, you know, both are supposed to be compatible. But
the developers have only tested it with 89.122.35 so far, and they do not
know that FlashyView 1.0 does not work with 89.122.36. If the user has the
tests available, he
will be able to run them after the upgrade and check that, indeed, some
obscure features of FlashyView 1.0 have stopped working with 89.122.36.
168 This can be an invaluable help for critical applications or as part of the
169 bug reporting procedure.</para>
173 <section id="admin-scenario">
175 <title>The administrator</title>
177 <para>System administrators need to set up beautiful new boxes pretty
178 frequently. But hardware is different on each of them, and the software
developers do not have the luxury of owning those uber-expensive machines
to make sure that their software works fine on reversed-endian
architectures.
181 If the administrator has the tests readily available for all software
components, he will be able to quickly assess whether the software
installation will be stable on the new system. He will similarly be
184 able to assess the overall quality of the system after major and minor
189 <section id="farm-scenario">
191 <title>Build farms</title>
193 <para>I am adding build farms as a scenario because this is something that
194 we really need to have but which was not addressed at all in older versions
195 of ATF. Virtually all software projects that want to address portability
196 to different systems and/or architectures will need some kind of build
197 automation in a set of machines (aka build slaves). ATF has to provide
ways to either allow the integration of these test results into the overall
reports or to implement the necessary logic itself to provide a build
210 <para>The first and main consumer of ATF (during the very first releases,
211 at least) will be The NetBSD Project. As such, we need to make design
212 decisions that benefit ATF in this context. Some of these include:</para>
216 <listitem><para>No dependencies on third-party software. The use of Boost
or SQLite sounds tempting, as we shall see later on, but could get ATF
banned from the NetBSD source tree. If a third-party component would bring
substantial benefits, it will be considered, but care has to be
taken.</para></listitem>
222 <listitem><para>Don't force C++. Test case developers don't want to see
223 C++ at all. So the C library must be as clean as possible from C++-like
224 artifacts.</para></listitem>
<listitem><para>Speed matters. Previous versions of ATF run "reasonably
227 fast" on modern computers, but are unbearably slow on not-so-old machines.
228 This is not tolerable, given that NetBSD runs on many underpowered
229 platforms and those are the ones that will most benefit from automated
230 testing.</para></listitem>
<para>Of course I hope we'll have more consumers than just NetBSD, but for
235 that to happen we must design a good product and then gain consumers at a
240 <section id="organization">
242 <title>Test case organization and identifiers</title>
244 <para>The smallest testing unit is a <emphasis>test case</emphasis>. A
245 test case has a specific purpose, like ensuring that a single method works
246 fine (unit test) or ensuring that a specific command-line flag works as
247 expected (system test).</para>
249 <para>Test cases are grouped into <emphasis>test programs</emphasis>.
250 These test programs act as mere frontends for the execution of the test
251 cases they contain: there is absolutely no state sharing between different
252 test cases at run time, even if they belong to the same test
255 <para>Test programs are stored in a subtree of the file system. This
256 subtree defines a <emphasis>test suite</emphasis>.</para>
258 <section id="fslayout">
260 <title>File system layout</title>
262 <para>In order to identify the root of a test suite, we will place a
special control directory, named <literal>_ATF</literal>, as a child of the
root directory. This directory will include a file, named
265 <literal>test-suite</literal>, that contains the name of the test
268 <para>Descending from the test suite root directory, we can find either
subdirectories or test programs. The former are used to organize test
programs logically, while the latter can be placed anywhere in the
275 <section id="identifiers">
277 <title>Identifiers</title>
279 <para>Based on the tree layout that defines a test suite, each test program
280 and test case can be identified by an absolute path from the root of the
tree to the test program or test case, respectively. Given that we impose
a distinction between test programs and test cases, we will reflect that
distinction in the paths.</para>
285 <para>A test program is identified merely by the path from the test suite's
286 root directory to it, and the components of this path are separated by
287 forward slashes (just like in any Unix path).</para>
<para>A test case is identified by a name that must be unique within its
test program. To uniquely identify the test case within the tree, we take
the path of the test program and append the test case name to it as a new
component, but this time using a colon as the delimiter.</para>
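<para>The split described above can be sketched with plain POSIX parameter
expansion. The identifier <literal>fs/tmpfs/t_mount:readonly</literal> used
below is invented for illustration; only the slash/colon convention comes
from this document:</para>

```shell
# Hypothetical identifier: path components use slashes, and a colon
# separates the test program path from the test case name.
id="fs/tmpfs/t_mount:readonly"

program_path="${id%:*}"   # strip the trailing ":case" part
case_name="${id#*:}"      # strip the leading "path:" part

echo "$program_path"      # the test program within the suite
echo "$case_name"         # the test case within the program
```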
300 <title>Test cases</title>
302 <section id="tc-ids">
304 <title>Identifiers</title>
<para>To-do: Explain the difference between a test case identifier and an
execution identifier.</para>
310 <section id="tc-types">
312 <title>Types and sizes</title>
<para>Test cases have a specific purpose and, as such, developers will tag
them with a type. The possible types are:</para>
319 <listitem><para>Unit test: ...</para></listitem>
321 <listitem><para>Integration test: ...</para></listitem>
323 <listitem><para>System test: ...</para></listitem>
<para>Orthogonal to their type, test cases also have a size defining
<listitem><para>Small: A test case that runs in
milliseconds.</para></listitem>
<listitem><para>Medium: A test case that runs in the order of a few seconds
(less than 10).</para></listitem>
338 <listitem><para>Large: Any other test case.</para></listitem>
342 <para>Obviously, classifying the test cases by size is a very subjective
343 thing, because faster machines will make some medium test cases feel small
344 at some point. To-do: consider if we really want to do this...</para>
348 <section id="tc-results">
350 <title>Results</title>
<para>A test case may terminate with any of the following
357 <listitem><para>Pass: All the checks in the test case were successful. No
358 additional information provided.</para></listitem>
360 <listitem><para>Fail: The test case explicitly failed; a textual reason
361 must be provided for this failure.</para></listitem>
363 <listitem><para>Skipped: The test case was not executed because some
364 conditions were not met; a textual reason must be provided to aid the user
365 in correcting the problems that prevented the test case from
366 running.</para></listitem>
368 <listitem><para>Expected failure: An error was detected in the test case
but it was expected. Useful to capture known bugs that will not be fixed
anytime soon.</para></listitem>
<listitem><para>Bogus: This is not a result raised by the test case, but a
condition detected by the caller. A test case is deemed bogus when it
374 exits abruptly: i.e. it crashes at any point or it doesn't create the
375 results file.</para></listitem>
381 <section id="tc-reporting">
383 <title>Results reporting</title>
385 <para>A test case will create a file upon completion, which will contain
386 the results of the execution of that specific test case. If the test case
387 fails half-way through due to some unexpected error, the file will not be
388 created. Callers of the test case will then know that something went
389 horribly wrong and mark the test case as bogus.</para>
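<para>The protocol above can be sketched in a few lines of shell. The
one-word results file format and the file name are assumptions made for
this example only; the point is that a missing file means bogus:</para>

```shell
# Hypothetical sketch: the test case writes its results file only on an
# orderly exit; the caller treats a missing file as a bogus result.
results=./results.txt
rm -f "$results"

# Orderly test case: runs its checks and records the outcome.
echo "passed" > "$results"

# Caller side (what atf-run would do): read the file if present,
# otherwise mark the test case as bogus.
if [ -f "$results" ]; then
  outcome=$(cat "$results")
else
  outcome="bogus"
fi
echo "$outcome"
```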
391 <para>Previous versions of ATF used a special file descriptor to report
392 their results to the caller. This seemed a good idea at the beginning
because I expected test cases not to need temporary directories, but it
causes several problems: the test case can close the results file
descriptor, and it would be, I think, impossible to ever implement this
approach on Win32 systems. Regarding the former problem, the old code
already uses a temporary file internally to store the results and lets the
test program monitor read it and redirect those results through the
desired file descriptor. That is redundant and uselessly complex: why not
use files all the way through in the first place? That's what we are going
409 <title>Test programs</title>
411 <para>A test program is a collection of related test cases with a common
412 run-time interface. Test cases need not be of the same type; i.e. a test
413 program could contain both unit and system tests.</para>
415 <section id="tp-ids">
417 <title>Identifiers</title>
<para>A test program has a name that must be unique within the directory it
is stored in (obviously: file systems do not support multiple files with
the same name living in the same directory).</para>
423 <para>The test program is uniquely identified by the full path from the
424 test suite's root directory to the test program, including the test program
429 <section id="tp-disk">
431 <title>On-disk representation</title>
433 <para>Test programs are, by definition, binaries or scripts stored on disk.
However, we need to attach some meta-data to these programs, so ATF test
programs are stored on disk as bundles.</para>
<para>Let's consider a test program called <literal>wheel-test</literal>
for the super-interesting wheel class. wheel-test contains the
<literal>can-spin</literal> and <literal>is-round</literal> test cases that
check if, well, the wheel can spin and if the wheel is round. This test
441 program is stored in a <literal>wheel-test.atf-tp</literal> directory whose
446 <listitem><para><filename>wheel-test.atf-tp/metadata</filename>: Contains
447 the list of available test cases, their description and their properties
448 (if any).</para></listitem>
450 <listitem><para><filename>wheel-test.atf-tp/executable</filename>: A binary
451 or shell script that implements the test cases described in the
452 metadata.</para></listitem>
456 <para>Why do we store the metadata separately from the binary? We want to
457 be able to inspect a whole tree of test programs as fast as possible and
458 collect information about all the available test cases and their
459 properties. This information can later be used to query which test cases
to run on each run -- just imagine a GUI presenting the user with the whole
(huge) list of test cases available on his system (for all the
applications he has installed) and letting him inspect this tree at
465 <para>Previous versions of ATF kept the metadata inside the binary and
466 provided a very rudimentary command-line interface in each binary to export
467 this data. The problem is that executing the binaries just to get this
information is a costly operation -- especially for shell-based tests -- so
469 this approach does not scale.</para>
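<para>As an illustration, a whole suite can be inventoried by reading the
metadata files alone, without executing a single test program. The bundle
layout follows this document, but the <literal>case: name</literal> line
format inside the metadata file is an assumption invented for the
example:</para>

```shell
# Hypothetical sketch: build a tiny suite on disk, then list every test
# case by scanning only the metadata files.
root=./demo-suite
mkdir -p "$root/wheel-test.atf-tp"
printf 'case: can-spin\ncase: is-round\n' > "$root/wheel-test.atf-tp/metadata"

# No test program is executed here; this is why the scan stays fast.
all_cases=$(find "$root" -name metadata -exec grep -h '^case: ' {} + |
            sed 's/^case: //')
echo "$all_cases"
```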
471 <para>Of course, keeping the metadata separate from the executable can lead
to inconsistencies between the two, which will be dealt with by
checksumming the binary and storing the cryptographic checksum in the
metadata. To-do:
474 decide which checksumming algorithm to use.</para>
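<para>The consistency check could look like the following sketch. The
cksum(1) CRC is used here only because it is universally available; it is
not a cryptographic checksum, and the real algorithm is still to be
decided:</para>

```shell
# Hypothetical sketch: record a checksum of the executable when the
# metadata is generated, and compare it before trusting the metadata.
exe=./executable
printf 'pretend this is a binary\n' > "$exe"

stored_sum=$(cksum "$exe" | awk '{print $1}')   # kept in the metadata

# ... later, before reading the metadata:
current_sum=$(cksum "$exe" | awk '{print $1}')
if [ "$stored_sum" = "$current_sum" ]; then
  state=consistent
else
  state=stale    # the executable changed behind the metadata's back
fi
echo "$state"
```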
476 <para>Open problem: how do we make it easy to generate this layout from the
build tools? Especially, how do we painlessly tie this to Automake?</para>
481 <section id="tc-isolation">
483 <title>Test case isolation</title>
<para>Test programs contain a set of test cases, but we want to run each
test case in as much isolation from the others as possible. If we run the
test cases in the same process, they share the same memory, so they can
mess with global state and make the results depend on the execution
order.</para>
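<para>A minimal illustration of the problem, with invented names: two
"test cases" in the same shell process share a global variable, so the
first leaks state into the second, whereas a subshell keeps the mutation
contained:</para>

```shell
# Same process: the first test case mutates a global the second can see.
counter=0
test_one() { counter=$((counter + 1)); }

test_one              # runs in the current process
leaked=$counter       # the mutation is visible here

# Subprocess: the same test case run in a subshell leaves the parent's
# state untouched, which is the isolation we are after.
counter=0
( test_one )
isolated=$counter

echo "leaked=$leaked isolated=$isolated"
```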
490 <para>Additionally, we want each test case to run in its own temporary
subdirectory so that it can create files and directories at will. The
run-time system must take care of cleaning everything up after
495 <para>Previous versions of ATF implemented this separation by making the
496 test program spawn a subprocess for each test case, and by making this same
test program deal with all the other nitty-gritty details of directory
isolation and cleanup. This results in tons of code duplication among the
language bindings, and it is quite hard to keep all the implementations
consistent with each other. Furthermore, implementing this isolation in
shell scripts is painfully complex and obfuscated, which makes the
shell-based tests incredibly slow. Lastly, there is one more drawback:
debugging failing test cases is hard because the forking of subprocesses
collides with debuggers; yes, gdb supports crossing subprocess boundaries,
but not on all platforms.</para>
507 <para>An alternative approach is to make test programs
<emphasis>not</emphasis> do the isolation by themselves. Instead, we will
have atf-run spawn a new, clean, isolated subprocess for each test case
and then just execute that test case. This will, most likely, be faster
511 than the current approach (because it will be implemented in C++) and will
512 be much easier to maintain.</para>
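<para>The isolation step atf-run would perform can be sketched as follows.
The <literal>run_isolated</literal> helper and the scrubbed environment
are invented for the example; atf-run itself would do this in C++:</para>

```shell
# Hypothetical sketch: run a command in a fresh scratch directory with a
# minimal environment, then clean everything up afterwards.
run_isolated() {
  scratch=$(mktemp -d) || return 1
  ( cd "$scratch" || exit 1
    env -i HOME="$scratch" PATH=/bin:/usr/bin "$@" )
  status=$?          # preserve the test case's exit status
  rm -rf "$scratch"  # the caller, not the test case, cleans up
  return $status
}

run_isolated sh -c 'touch some-temporary-file'
echo "exit=$?"
```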
514 <para>There are two major drawbacks, though:</para>
518 <listitem><para>Running the test program by hand will leave tons of garbage
519 uncleaned; that is fine as long as we warn the tech-savvy user to
520 <emphasis>not do that</emphasis>.</para></listitem>
<listitem><para>The current libraries allow the programmer to define
arbitrary test cases anywhere in their program (not necessarily in a test
program) and run them in an isolated way by just calling their
<literal>run</literal> method. If we remove the isolation from the test
cases themselves, this
526 API should disappear, as it will not be safe any more to run a test case by
hand from within a program. Maybe not a big deal, though, because... who
wants to mix test cases with regular application code?</para></listitem>
534 <section id="tp-cli">
536 <title>The command-line interface</title>
538 <para>All test programs must provide the same command-line interface so
539 that end users are not surprised by unknown and inconsistent flags and
540 arguments. We did a good job in previous versions of ATF in this regard,
541 but we are going to simplify the interface even further.</para>
543 <para>Given that test programs will not provide isolation for the test
544 cases they contain, we will not allow a single run of the test program to
545 execute more than one test case. If automation is needed to run several
546 tests in a sequence, the user will have to use atf-run.</para>
548 <para>With all that said, a test program will provide the following
551 <para>test-program [options] [test-case-name]</para>
553 <para>Note that we can only specify a single test case. For simplicity, we
554 are going to make it optional, in which case the test program will
555 <emphasis>only work</emphasis> if it defines a single test case. I do not
556 really like the idea, because adding another test case to the program will
557 break existing callers, <emphasis>but</emphasis> these are internal
558 binaries that must not be called directly, so there is no real harm done if
that happens. This simplicity is provided only to make debugging
562 <para>The available options are as follows:</para>
566 <listitem><para>-h: Explicitly request help. The program must never print
567 the whole usage message unless asked to do so.</para></listitem>
569 <listitem><para>-r results-file: Path to the file where the execution
570 results will be stored.</para></listitem>
572 <listitem><para>-s srcdir: Path to the source directory where the test
573 program resides. We will not try to guess it at this point (atf-run will,
574 though) unless the source directory is the current directory, because there
575 is the potential of guessing incorrectly and confusing our users. We need
576 to know what the source directory is to be able to find the metadata file
577 and any auxiliary data files required by the test
578 program.</para></listitem>
580 <listitem><para>-v var=value: Sets the configuration variable var to value,
581 which test cases can later query.</para></listitem>
585 <para>Note that several flags provided by old ATF versions are gone.
586 Namely: -l is removed because the metadata is stored separately and -w is
587 removed because the test program will not create temporary directories any
588 more by itself.</para>
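<para>The option parsing for the interface above can be sketched with
getopts. The simulated command line and all variable names are invented
for this example and are not part of any real test program:</para>

```shell
# Simulated invocation of: test-program -r out.txt -v verbose=yes can-spin
set -- -r out.txt -v verbose=yes can-spin

results_file=""
srcdir="."
while getopts "hr:s:v:" opt; do
  case "$opt" in
    h) echo "usage: test-program [options] [test-case-name]" ;;
    r) results_file=$OPTARG ;;   # where to write the results file
    s) srcdir=$OPTARG ;;         # where the metadata and data files live
    v) echo "config: $OPTARG" ;; # a var=value configuration variable
  esac
done
shift $((OPTIND - 1))

# At most one positional argument: the single test case to run.
test_case=${1:-}
echo "results=$results_file srcdir=$srcdir case=$test_case"
```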
596 <title>Execution automation</title>
598 <para>The atf-run tool provides automation to run multiple test cases
(coming from different test programs) sequentially. Parallel execution may
be implemented in the future, but test cases must be designed in a way
that allows them to be executed alongside other test cases without
conflicts.</para>
603 <para>atf-run also provides isolation for test cases. This tool spawns a
604 subprocess for each of the tests that have to run, and in doing so it
prepares the subprocess to have a reasonable environment and isolates it
606 from the rest of the test cases as much as possible. Once all this has
607 happened, the test program containing the test case is executed in the
608 subprocess and the results are collected from the results file generated by
609 the test case.</para>
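<para>The sequencing itself reduces to a loop: one invocation per test
case, one results file per invocation. The stub test program below is
created inline only so the loop has something to run, and it takes
positional arguments for brevity instead of the real flags:</para>

```shell
# Hypothetical stub test program: $1 is the case name, $2 the results file.
printf 'echo passed > "$2"\n' > ./stub-test-program

# What atf-run would do: run each test case in its own invocation and
# collect one results file per run.
ran=0
for tc in can-spin is-round; do
  sh ./stub-test-program "$tc" "results.$tc"
  ran=$((ran + 1))
done
echo "ran=$ran"
```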
611 <para>To-do: Do we need Atffiles? Probably not, so remove them and mention
612 why we are doing so.</para>
618 <title>The results store</title>
620 <para>The atf-store implements a database that contains information about
621 the execution of test cases. The database captures the results of each
test case as well as any additional information that is helpful for
debugging: e.g. the stdout and stderr outputs.</para>
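<para>As a sketch only, a store record could be as simple as one line per
execution. The flat-file format below is invented for illustration; the
real backend (SQLite or otherwise) is an open question in this
document:</para>

```shell
# Hypothetical flat-file store: timestamp | test case id | result.
store=./store.txt
rm -f "$store"
record() {
  printf '%s|%s|%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$store"
}

record "wheel-test:can-spin" "passed"
record "wheel-test:is-round" "failed: wobbles"

# Historic queries then become trivial text processing.
passed_count=$(grep -c '|passed$' "$store")
echo "passed_count=$passed_count"
```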
625 <para>The store is <emphasis>historic</emphasis>: we want to keep the
626 history of a given test case. Why? Some of these test cases come from
627 build slaves and contain the whole results of a fetch/compile/test run, so
we want to see how things progress over time. Disk space is cheap, but if
we ever want to clean up, we can cull old executions.</para>
631 <para>We will have different frontends for the store: I'm thinking that
632 atf-report could just read off the store and print the results on screen,
633 but we could also have a plugin for name-your-favourite-http-server to
634 generate a dynamic view of the test case results -- very useful for build
<para>Given the nature of the store, I think it'd be wise to use SQLite as
its backing store, especially if it is ever to serve dynamic web content.
If we go
639 this route, we should provide a not-really-optimized file-based backend for
640 those users that do not want to have an additional dependency (NetBSD
643 <para>The store will <emphasis>only</emphasis> be accessed by atf-store. I
644 do not want atf-run or the test programs to access it directly to store
645 their results. They must contact the atf-store binary to do so. Having a
646 single entry point to the store will prevent consistency issues. Now, this
647 brings up two big questions: where is the store located and how is it
650 <para>If we are running ATF interactively, we probably do not want to use
651 the store at all. However, for simplicity of implementation of tools such
652 as atf-run, they should always contact the store and let the store decide
what to do. For interactive runs, we can omit storing results, so sending
results to the store becomes a no-op. How does atf-report
657 <para>The store has to be accessible locally (through a pipe, named pipe or
658 whatever) but also remotely. We want build slaves to be able to send
659 results to the store on a push basis. Open issue: how do we deal with
666 <title>Build farms</title>
668 <para>Build farms, or continuous builds, are required for any software
669 project that wants to achieve a minimum amount of quality in one or more
670 platforms. ATF cannot disregard this use case.</para>
672 <para>The work of each build slave can be treated as a single test case,
673 and thus all of its work (source code fetching, building and testing) can
674 be collapsed into a single program that works as a test case. These
675 results can later be incorporated into test result reports effortlessly. A
676 more advanced approach involves splitting each stage (fetch, build, test)
677 as a separate test case, and then making these independent test cases
depend on each other. The writer of the build slave script must be able to
choose whichever approach he prefers.</para>
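<para>The collapsed, single-test-case approach can be sketched as below.
The three stage functions are stubs standing in for real fetch, build and
test work, and the results file format is the same invented one-line form
used earlier in this document's examples:</para>

```shell
# Hypothetical build slave written as one test case: the first stage
# that fails determines the single reported result.
fetch_sources() { true; }   # stub: would fetch the source tree
build_tree()    { true; }   # stub: would compile everything
run_tests()     { true; }   # stub: would run the test suite

slave_results=./slave-results.txt
if fetch_sources; then
  if build_tree; then
    if run_tests; then
      echo "passed" > "$slave_results"
    else
      echo "failed: test stage" > "$slave_results"
    fi
  else
    echo "failed: build stage" > "$slave_results"
  fi
else
  echo "failed: fetch stage" > "$slave_results"
fi
cat "$slave_results"
```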
681 <para>In order to support build farms, we just need to provide an easy way
682 of creating a test program (in POSIX shell) to act as a build slave. We
then set up a cron job that invokes atf-run on this single test program and
makes it deliver the results to a remote atf-store.</para>
691 vim: syntax=docbk:expandtab:shiftwidth=2:softtabstop=2:tw=75