   1 <?xml version="1.0"?>
   2 <html xmlns="http://www.w3.org/1999/xhtml">
   3 <head>
   4 <meta xmlns="" http-equiv="Content-Type"
   5 content="text/html; charset=iso-8859-1" />
   6 <link xmlns="" rel="made"
   7 href="mailto:atf-devel AT NetBSD DOT org" />
   8 <link xmlns="" rel="stylesheet" type="text/css"
   9 href="standalone.css" />
  10 <title xmlns="">Rearchitecting ATF: The missing
  11 specification</title>
  12 </head>
  13 <body>
  14 <div xmlns="" class="header">
  15 <p class="title">Rearchitecting ATF: The missing specification</p>
  16 <p class="author">By Julio Merino, The NetBSD Foundation</p>
  17 </div>
  18 <div xmlns="" class="toc">
  19 <h1>Contents</h1>
  20 <ol>
  21 <li>
  22 <p>
  23 <a href="#overview">Overview</a>
  24 </p>
  25 </li>
  26 <li>
  27 <p>
  28 <a href="#keys">Key features and differences</a>
  29 </p>
  30 </li>
  31 <li>
  32 <p>
  33 <a href="#scenarios">Scenarios</a>
  34 </p>
  35 <ol>
  36 <li>
  37 <p>
  38 <a href="#developer-scenario">The developer</a>
  39 </p>
  40 </li>
  41 <li>
  42 <p>
  43 <a href="#user-scenario">The end user</a>
  44 </p>
  45 </li>
  46 <li>
  47 <p>
  48 <a href="#admin-scenario">The administrator</a>
  49 </p>
  50 </li>
  51 <li>
  52 <p>
  53 <a href="#farm-scenario">Build farms</a>
  54 </p>
  55 </li>
  56 </ol>
  57 </li>
  58 <li>
  59 <p>
  60 <a href="#users">Users</a>
  61 </p>
  62 </li>
  63 <li>
  64 <p>
  65 <a href="#organization">Test case organization and identifiers</a>
  66 </p>
  67 <ol>
  68 <li>
  69 <p>
  70 <a href="#fslayout">File system layout</a>
  71 </p>
  72 </li>
  73 <li>
  74 <p>
  75 <a href="#identifiers">Identifiers</a>
  76 </p>
  77 </li>
  78 </ol>
  79 </li>
  80 <li>
  81 <p>
  82 <a href="#tcs">Test cases</a>
  83 </p>
  84 <ol>
  85 <li>
  86 <p>
  87 <a href="#tc-ids">Identifiers</a>
  88 </p>
  89 </li>
  90 <li>
  91 <p>
  92 <a href="#tc-types">Types and sizes</a>
  93 </p>
  94 </li>
  95 <li>
  96 <p>
  97 <a href="#tc-results">Results</a>
  98 </p>
  99 </li>
 100 <li>
 101 <p>
 102 <a href="#tc-reporting">Results reporting</a>
 103 </p>
 104 </li>
 105 </ol>
 106 </li>
 107 <li>
 108 <p>
 109 <a href="#tps">Test programs</a>
 110 </p>
 111 <ol>
 112 <li>
 113 <p>
 114 <a href="#tp-ids">Identifiers</a>
 115 </p>
 116 </li>
 117 <li>
 118 <p>
 119 <a href="#tp-disk">On-disk representation</a>
 120 </p>
 121 </li>
 122 <li>
 123 <p>
 124 <a href="#tc-isolation">Test case isolation</a>
 125 </p>
 126 </li>
 127 <li>
 128 <p>
 129 <a href="#tp-cli">The command-line interface</a>
 130 </p>
 131 </li>
 132 </ol>
 133 </li>
 134 <li>
 135 <p>
 136 <a href="#run">Execution automation</a>
 137 </p>
 138 </li>
 139 <li>
 140 <p>
 141 <a href="#store">The results store</a>
 142 </p>
 143 </li>
 144 <li>
 145 <p>
 146 <a href="#farms">Build farms</a>
 147 </p>
 148 </li>
 149 </ol>
 150 </div>
 151 <div xmlns="" class="contents">
 152 <div class="note">This document is very much WORK IN PROGRESS.
 153 Anything can change at any time without prior notice. Feel free to
 154 (and please do) raise comments about the major ideas herein
 155 described but DO NOT NITPICK. Be aware that even the ideas and
design decisions in this document are not set in stone; they
may completely change too.</div>
 158 <h1>
 159 <a name="overview">Overview</a>
 160 </h1>
 161 <p xml:space="preserve">
 162 The Automated Testing Framework, or ATF for short, aims to provide
 163 a
 164 software testing platform for both developers and end users:
 165 </p>
 166 <ul>
 167 <li>
 168 <p xml:space="preserve">
 169 Developers want a set of libraries that make the
 170 implementation of test cases painless.
 171 </p>
 172 </li>
 173 <li>
 174 <p xml:space="preserve">
 175 Users want a set of tools that allow them to run the tests
 176 over and over and over and over again and generate beautiful
 177 reports with
 178 the results.
 179 </p>
 180 </li>
 181 </ul>
 182 <p xml:space="preserve">
 183 The development of ATF started as a Google Summer of Code 2007
 184 project for the NetBSD operating system.  Unfortunately, the code
 185 basically
 186 grew out of a prototype and a very loose specification.  The result
 187 is, to
 188 put it mildly, a real mess and a pain in the ass to maintain.
 189 Don't get me
 190 wrong: the code has grown pretty well based on the original design
 191 ideas,
 192 but the overall result has some problems that are really hard to
 193 fix
 194 without a major redesign.  Moreover, some of these problems have
 195 only
 196 materialized as a result of the reasonable maturity of ATF; they
 197 were
 198 really hard to predict in the first place.
 199 </p>
 200 <p xml:space="preserve">
 201 This specification aims to provide an ideal design for ATF (err,
 202 yes,
 203 a design for how it should have been architected in the first
 204 place).  It
 205 will be obvious that we will have to rewrite major portions of
 206 code, but I
 207 would expect to be able to reuse many parts of it.  Starting from
 208 scratch
 209 is not an option; incremental improvement will deliver results much
 210 earlier
and allow for user assurance before we make new mistakes.
 212 </p>
 213 <h1>
 214 <a name="keys">Key features and differences</a>
 215 </h1>
 216 <p xml:space="preserve">
 217 The major features of ATF will be:
 218 </p>
 219 <ul>
 220 <li>
 221 <p xml:space="preserve">
 222 Lightweight libraries for C, C++ and POSIX shell scripting
 223 to implement test cases.
 224 </p>
 225 </li>
 226 <li>
 227 <p xml:space="preserve">
 228 Test cases designed to be installed on the target system so
 229 that they can be run much later after building the
 230 software.
 231 </p>
 232 </li>
 233 </ul>
 234 <p xml:space="preserve">
 235 The major differences between future versions of ATF and previous
 236 ones will be:
 237 </p>
 238 <ul>
 239 <li>
 240 <p xml:space="preserve">
 241 <i>Test programs don't perform isolation.</i>
Before, test programs were overly complicated by trying to isolate the
subprocesses of their test cases from the rest of the test cases and the
system.  This is very fragile, especially when implemented in POSIX shell.
 248 Therefore, isolation will now be performed from a single point,
 249 atf-run,
 250 just before forking the test case.
 251 </p>
 252 </li>
 253 <li>
 254 <p xml:space="preserve">
 255 <i>Test programs can only run one test case at a
time.</i>  Related to the previous point, test programs will not
 257 run multiple test cases in a row any more, because they can't
 258 provide
 259 isolation.  Sequencing will be provided by atf-run.
 260 </p>
 261 </li>
 262 <li>
 263 <p xml:space="preserve">
 264 <i>Simple debugging.</i>  As test programs do
 265 not fork any more, debugging of failing test cases is easier, as
 266 gdb will
 267 Just Work (TM).
 268 </p>
 269 </li>
 270 <li>
 271 <p xml:space="preserve">
 272 <i>Test case metadata is stored out of the test
 273 program, in a special file.</i>  This is to allow efficient
 274 querying
 275 from external applications.  If you have attempted to run an old
 276 POSIX
 277 shell test program with the
 278 <tt>-l</tt> option to list the
available test cases, you know what I mean; such an approach does not
scale at all.
 282 </p>
 283 </li>
 284 <li>
 285 <p xml:space="preserve">
 286 <i>Support for other test sources.</i>  We
 287 want to support adding results coming from "special" test programs
 288 to the
 289 report, such as build slaves or source code linters.
 290 </p>
 291 </li>
 292 <li>
 293 <p xml:space="preserve">
 294 <i>Remote reporting of test results.</i>
Previously, atf-run and atf-report were able to generate test reports for a
single run, but it was just not possible to merge these results with other
executions or with results from other machines.  We will have a
 300 database,
 301 accessible remotely, containing results from multiple sources
 302 (different
 303 machines, different test cases, etc.) and providing historical
 304 information
 305 about these results.
 306 </p>
 307 </li>
 308 </ul>
 309 <h1>
 310 <a name="scenarios">Scenarios</a>
 311 </h1>
 312 <h2>
 313 <a name="developer-scenario">The developer</a>
 314 </h2>
 315 <p xml:space="preserve">
 316 The developer wants a set of libraries to be able to write test
 317 cases
 318 for his own software painlessly and as quickly as possible.  These
 319 libraries should have a clean interface and not expose internal
 320 details of
 321 the implementation (as the old libraries do).  Furthermore, another
 322 key
 323 point that the developer values is the ease of debugging of test
 324 cases:
 325 when a test case fails, running it in gdb or similar tools is
 326 crucial, and
the framework should not get in the way of doing that.  Unfortunately,
 328 previous versions of ATF make debugging really hard, so this is
 329 something
 330 to address in the future.
 331 </p>
 332 <h2>
 333 <a name="user-scenario">The end user</a>
 334 </h2>
 335 <p xml:space="preserve">
 336 It may be argued that the end user should never see the tests
 337 because, when he gets the application, he has to be able to assume
 338 that it
 339 is defect free.  Unfortunately, that is not the case.  Many
 340 developers do
 341 not have the resources to have build farms with all possible
 342 hardware/software configurations that their users may have, so
 343 testing is
 344 never complete.
 345 </p>
 346 <p xml:space="preserve">
 347 Even more, there is a very clear case in which the end user needs
 348 tests and for which there is no easy replacement.  Let's assume the
 349 user
 350 gets a shiny new version of the FlashyView image viewer.
 351 FlashyView has a
 352 dependency on the third-party libjpeg library to load and decode
 353 the image
 354 files.  At the moment of FlashyView's 1.0 release, its developers
 355 test the
 356 code against libjpeg 89.3.4 and all is right.  The user installs
 357 both
 358 FlashyView 1.0 and libjpeg 89.3.4 on his computer and all is good.
 359 However, one day his CleverOS operating system decides to upgrade
 360 libjpeg
 361 to 89.122.36 because, you know, both are compatible.  But the
 362 developers
 363 have only recently tested it with 89.122.35 and they don't know
 364 FlashyView
 365 1.0 doesn't work with 89.122.36.  If the user has the tests
 366 available, he
 367 will be able to run them after an upgrade and check that,
indeed, some
 369 obscure features of FlashyView 1.0 have stopped working with
 370 89.122.36.
 371 This can be an invaluable help for critical applications or as part
 372 of the
 373 bug reporting procedure.
 374 </p>
 375 <h2>
 376 <a name="admin-scenario">The administrator</a>
 377 </h2>
 378 <p xml:space="preserve">
 379 System administrators need to set up beautiful new boxes pretty
 380 frequently.  But hardware is different on each of them, and the
 381 software
 382 developers do not have the luxury to have those uber-expensive
 383 machines to
 384 make sure that their software works fine in reversed-endian
 385 architectures.
 386 If the administrator has the tests readily available for all
 387 software
 388 components, he will be able to quickly assess whether the software
 389 installation will be stable or not in the new system.  He will
 390 similarly be
 391 able to assess the overall quality of the system after major and
 392 minor
 393 upgrades.
 394 </p>
 395 <h2>
 396 <a name="farm-scenario">Build farms</a>
 397 </h2>
 398 <p xml:space="preserve">
 399 I am adding build farms as a scenario because this is something
 400 that
 401 we really need to have but which was not addressed at all in older
 402 versions
 403 of ATF.  Virtually all software projects that want to address
 404 portability
 405 to different systems and/or architectures will need some kind of
 406 build
 407 automation in a set of machines (aka build slaves).  ATF has to
 408 provide
 409 ways to either allow the integration of these test results into the
 410 overall
reports or to implement the necessary logic itself to provide a
 412 build
 413 farm.
 414 </p>
 415 <h1>
 416 <a name="users">Users</a>
 417 </h1>
 418 <p xml:space="preserve">
 419 The first and main consumer of ATF (during the very first releases,
 420 at least) will be The NetBSD Project.  As such, we need to make
 421 design
 422 decisions that benefit ATF in this context.  Some of these include:
 423 </p>
 424 <ul>
 425 <li>
 426 <p xml:space="preserve">
 427 No dependencies on third-party software.  The use of Boost
or SQLite sounds tempting, as we shall see later on, but might result in
ATF being banned from the NetBSD source tree.  If a third-party component
 431 may
 432 result in high benefits in the code, it will be considered, but
 433 care has to
 434 be taken.
 435 </p>
 436 </li>
 437 <li>
 438 <p xml:space="preserve">
 439 Don't force C++.  Test case developers don't want to see
 440 C++ at all.  So the C library must be as clean as possible from
 441 C++-like
 442 artifacts.
 443 </p>
 444 </li>
 445 <li>
 446 <p xml:space="preserve">
Speed matters.  Previous versions of ATF run "reasonably
 448 fast" on modern computers, but are unbearably slow on not-so-old
 449 machines.
 450 This is not tolerable, given that NetBSD runs on many underpowered
 451 platforms and those are the ones that will most benefit from
 452 automated
 453 testing.
 454 </p>
 455 </li>
 456 </ul>
 457 <p xml:space="preserve">
 458 Of course I hope we'll have more consumers other than NetBSD, but
 459 for
 460 that to happen we must design a good product and then gain
 461 consumers at a
 462 slow pace.
 463 </p>
 464 <h1>
 465 <a name="organization">Test case organization and identifiers</a>
 466 </h1>
 467 <p xml:space="preserve">
 468 The smallest testing unit is a
 469 <i>test case</i>.  A
 470 test case has a specific purpose, like ensuring that a single
 471 method works
 472 fine (unit test) or ensuring that a specific command-line flag
 473 works as
 474 expected (system test).
 475 </p>
 476 <p xml:space="preserve">
 477 Test cases are grouped into
 478 <i>test programs</i>.
 479 These test programs act as mere frontends for the execution of the
 480 test
 481 cases they contain: there is absolutely no state sharing between
 482 different
 483 test cases at run time, even if they belong to the same test
 484 program.
 485 </p>
 486 <p xml:space="preserve">
 487 Test programs are stored in a subtree of the file system.  This
 488 subtree defines a
 489 <i>test suite</i>.
 490 </p>
 491 <h2>
 492 <a name="fslayout">File system layout</a>
 493 </h2>
 494 <p xml:space="preserve">
 495 In order to identify the root of a test suite, we will place a
 496 special control directory, named
 497 <tt>_ATF</tt>, as a child of the
 498 root's directory.  This directory will include a file, named
 499 <tt>test-suite</tt>, that contains the name of the test
 500 suite.
 501 </p>
 502 <p xml:space="preserve">
 503 Descending from the test suite root directory, we can find either
 504 subdirectories or test programs.  The former are used to organize
 505 test
programs logically, while the latter can be placed anywhere in the
 507 subtree.
 508 </p>
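<p xml:space="preserve">
As a rough illustration of the layout above, the following Python sketch
locates the root of a test suite by walking upwards until it finds the
_ATF/test-suite file.  The function name is hypothetical, and the sketch
assumes that the file holds nothing but the suite name as plain text, which
this document does not specify.
</p>

```python
import os

CONTROL_DIR = "_ATF"      # control directory name, taken from the text
NAME_FILE = "test-suite"  # file that holds the test suite name, from the text


def find_suite_root(start):
    """Walk upwards from 'start' until a directory containing
    _ATF/test-suite is found.

    Returns (root_directory, suite_name), or None if no root is found.
    Assumption: the file contains only the suite name as plain text.
    """
    current = os.path.abspath(start)
    while True:
        name_file = os.path.join(current, CONTROL_DIR, NAME_FILE)
        if os.path.isfile(name_file):
            with open(name_file) as f:
                return current, f.read().strip()
        parent = os.path.dirname(current)
        if parent == current:  # reached the top of the file system
            return None
        current = parent
```

<p xml:space="preserve">
Calling find_suite_root from any directory below the root yields the same
root and suite name, which is what makes the control directory a usable
anchor for identifiers.
</p>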
 509 <h2>
 510 <a name="identifiers">Identifiers</a>
 511 </h2>
 512 <p xml:space="preserve">
 513 Based on the tree layout that defines a test suite, each test
 514 program
 515 and test case can be identified by an absolute path from the root
 516 of the
 517 tree to the test program or test case, respectively.  Given that we
 518 impose
 519 a difference between test programs and test cases, we will reflect
 520 such
 521 differences in the paths.
 522 </p>
 523 <p xml:space="preserve">
 524 A test program is identified merely by the path from the test
 525 suite's
 526 root directory to it, and the components of this path are separated
 527 by
 528 forward slashes (just like in any Unix path).
 529 </p>
 530 <p xml:space="preserve">
A test case is identified by a name that is unique within the
 532 test
 533 program.  To uniquely identify the test case within the tree, we
 534 take the
 535 path of the test program and append the test case name to it as a
 536 new
 537 component, but this time using a colon as the delimiter.
 538 </p>
 539 <h1>
 540 <a name="tcs">Test cases</a>
 541 </h1>
 542 <h2>
 543 <a name="tc-ids">Identifiers</a>
 544 </h2>
 545 <p xml:space="preserve">
 546 Test case identifier vs. execution identifier.
 547 </p>
 548 <h2>
 549 <a name="tc-types">Types and sizes</a>
 550 </h2>
 551 <p xml:space="preserve">
 552 Test cases have a specific purpose and, as such, they will be
 553 tagged
 554 by the developers.  These types can be:
 555 </p>
 556 <ol>
 557 <li>
 558 <p xml:space="preserve">
 559 Unit test: ...
 560 </p>
 561 </li>
 562 <li>
 563 <p xml:space="preserve">
 564 Integration test: ...
 565 </p>
 566 </li>
 567 <li>
 568 <p xml:space="preserve">
 569 System test: ...
 570 </p>
 571 </li>
 572 </ol>
 573 <p xml:space="preserve">
 574 Orthogonally to test case types, tests also have a size defining
 575 them:
 576 </p>
 577 <ol>
 578 <li>
 579 <p xml:space="preserve">
Small: A test case that runs in
milliseconds.
 582 </p>
 583 </li>
 584 <li>
 585 <p xml:space="preserve">
Medium: A test case that runs on the order of a few seconds
 587 (less than 10).
 588 </p>
 589 </li>
 590 <li>
 591 <p xml:space="preserve">
 592 Large: Any other test case.
 593 </p>
 594 </li>
 595 </ol>
 596 <p xml:space="preserve">
 597 Obviously, classifying the test cases by size is a very subjective
 598 thing, because faster machines will make some medium test cases
 599 feel small
 600 at some point.  To-do: consider if we really want to do this...
 601 </p>
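<p xml:space="preserve">
To make the subjectivity concrete, here is a small Python sketch that maps a
measured run time to a size.  The 10-second boundary comes from the list
above; the 1-second boundary for "small" is purely an assumption, since the
text only says "milliseconds".  Both the function name and the constants are
hypothetical.
</p>

```python
SMALL_LIMIT = 1.0    # seconds; assumption, the text only says "milliseconds"
MEDIUM_LIMIT = 10.0  # seconds; "less than 10" comes from the text


def classify_size(seconds):
    """Return the size class suggested by a measured run time."""
    if 0 > seconds:
        raise ValueError("run time cannot be negative")
    if seconds >= MEDIUM_LIMIT:
        return "large"
    if seconds >= SMALL_LIMIT:
        return "medium"
    return "small"
```

<p xml:space="preserve">
The same test case may well be classified as medium on an old machine and as
small on a new one, which is exactly the concern raised above.
</p>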
 602 <h2>
 603 <a name="tc-results">Results</a>
 604 </h2>
 605 <p xml:space="preserve">
A test case may terminate with any of the following
 607 results:
 608 </p>
 609 <ul>
 610 <li>
 611 <p xml:space="preserve">
 612 Pass: All the checks in the test case were successful.  No
 613 additional information provided.
 614 </p>
 615 </li>
 616 <li>
 617 <p xml:space="preserve">
 618 Fail: The test case explicitly failed; a textual reason
 619 must be provided for this failure.
 620 </p>
 621 </li>
 622 <li>
 623 <p xml:space="preserve">
 624 Skipped: The test case was not executed because some
 625 conditions were not met; a textual reason must be provided to aid
 626 the user
 627 in correcting the problems that prevented the test case from
 628 running.
 629 </p>
 630 </li>
 631 <li>
 632 <p xml:space="preserve">
 633 Expected failure: An error was detected in the test case
but it was expected.  Useful to capture bugs that are known but
will not be fixed anytime soon.
 637 </p>
 638 </li>
 639 <li>
 640 <p xml:space="preserve">
 641 Bogus: This is not a result raised by the test case, but is
 642 a condition detected by the caller.  A test case is deemed bogus
 643 when it
 644 exits abruptly: i.e. it crashes at any point or it doesn't create
 645 the
 646 results file.
 647 </p>
 648 </li>
 649 </ul>
 650 <h2>
 651 <a name="tc-reporting">Results reporting</a>
 652 </h2>
 653 <p xml:space="preserve">
 654 A test case will create a file upon completion, which will contain
 655 the results of the execution of that specific test case.  If the
 656 test case
 657 fails half-way through due to some unexpected error, the file will
 658 not be
 659 created.  Callers of the test case will then know that something
 660 went
 661 horribly wrong and mark the test case as bogus.
 662 </p>
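<p xml:space="preserve">
A caller could interpret the results file roughly as in the following Python
sketch.  The on-disk format is not defined in this document, so the sketch
assumes a single line of the form "result" or "result: reason"; that format,
the names used here and the decision to treat malformed files as bogus are
all assumptions.
</p>

```python
import enum
import os


class Result(enum.Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIPPED = "skipped"
    EXPECTED_FAILURE = "expected_failure"
    BOGUS = "bogus"  # only ever assigned by the caller


# Results for which the text requires a textual reason.
NEEDS_REASON = {Result.FAIL, Result.SKIPPED}


def read_result(results_file):
    """Return (Result, reason) for a finished test case.

    Hypothetical file format: one line, 'result' or 'result: reason'.
    A missing file means the test case exited abruptly, hence bogus.
    """
    if not os.path.isfile(results_file):
        return Result.BOGUS, "results file was not created"
    with open(results_file) as f:
        line = f.readline().strip()
    name, _, reason = line.partition(":")
    reason = reason.strip()
    try:
        result = Result(name.strip())
    except ValueError:
        return Result.BOGUS, "unknown result %r" % name.strip()
    if result is Result.BOGUS:
        # Assumption: a test case may not claim the caller-only result.
        return Result.BOGUS, "test case reported a caller-only result"
    if result in NEEDS_REASON and not reason:
        return Result.BOGUS, "missing reason for %s" % result.value
    return result, reason
```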
 663 <p xml:space="preserve">
 664 Previous versions of ATF used a special file descriptor to report
 665 their results to the caller.  This seemed a good idea at the
 666 beginning
because I expected test cases not to create temporary directories,
but it causes several problems: the test case can close the results file
descriptor and it is, I think, impossible to eventually implement this
approach on Win32 systems.  As regards the former problem, though,
 674 the old
 675 code uses a temporary file internally to store the results and lets
 676 the
 677 test program monitor read that and redirect those results through
 678 the
 679 desired file descriptor.  That is redundant and uselessly complex:
 680 why not
 681 use files all the way through in the first place?  That's what we
 682 are going
 683 to do.
 684 </p>
 685 <h1>
 686 <a name="tps">Test programs</a>
 687 </h1>
 688 <p xml:space="preserve">
 689 A test program is a collection of related test cases with a common
 690 run-time interface.  Test cases need not be of the same type; i.e.
 691 a test
 692 program could contain both unit and system tests.
 693 </p>
 694 <h2>
 695 <a name="tp-ids">Identifiers</a>
 696 </h2>
 697 <p xml:space="preserve">
A test program has a name that must be unique within the directory in
which it is stored (obviously; file systems do not support multiple files with
 701 the same
 702 name living in the same directory).
 703 </p>
 704 <p xml:space="preserve">
 705 The test program is uniquely identified by the full path from the
 706 test suite's root directory to the test program, including the test
 707 program
 708 name itself.
 709 </p>
 710 <h2>
 711 <a name="tp-disk">On-disk representation</a>
 712 </h2>
 713 <p xml:space="preserve">
 714 Test programs are, by definition, binaries or scripts stored on
 715 disk.
However, we need to attach some metadata to these programs, so
ATF test programs are stored as bundles on disk.
 719 </p>
 720 <p xml:space="preserve">
Let's consider a test program called
 722 <tt>wheel-test</tt> for
 723 the super-interesting wheel class.  The wheel-test contains the
 724 <tt>can-spin</tt> and
 725 <tt>is-round</tt> test cases that
 726 check if, well, the wheel can spin and if the wheel is round.  This
 727 test
 728 program is stored in a
 729 <tt>wheel-test.atf-tp</tt> directory whose
 730 contents are:
 731 </p>
 732 <ul>
 733 <li>
 734 <p xml:space="preserve">
 735 <tt class="filename">wheel-test.atf-tp/metadata</tt>: Contains
 736 the list of available test cases, their description and their
 737 properties
 738 (if any).
 739 </p>
 740 </li>
 741 <li>
 742 <p xml:space="preserve">
 743 <tt class="filename">wheel-test.atf-tp/executable</tt>: A binary
 744 or shell script that implements the test cases described in the
 745 metadata.
 746 </p>
 747 </li>
 748 </ul>
 749 <p xml:space="preserve">
 750 Why do we store the metadata separately from the binary?  We want
 751 to
 752 be able to inspect a whole tree of test programs as fast as
 753 possible and
 754 collect information about all the available test cases and their
 755 properties.  This information can later be used to query which test
 756 cases
to run on each run -- just imagine a GUI presenting the user with the whole
(huge) list of test cases available on his system (for all the
applications he has installed) and letting him inspect this tree at will.
 762 </p>
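<p xml:space="preserve">
The following Python sketch shows how a tool might collect every test case
in a tree by reading the metadata files only, without executing anything.
The metadata format is not defined in this document; the sketch assumes one
test case per line, written as "name: description", and the function name is
hypothetical.
</p>

```python
import os

BUNDLE_SUFFIX = ".atf-tp"  # bundle directory suffix, taken from the text


def list_test_cases(suite_root):
    """Return a list of (test_case_identifier, description) pairs for all
    bundles found below suite_root.

    Assumption: each line of a metadata file reads 'name: description'.
    """
    cases = []
    for dirpath, dirnames, _filenames in os.walk(suite_root):
        bundles = sorted(d for d in dirnames if d.endswith(BUNDLE_SUFFIX))
        # Do not descend into bundles; sort the rest for a stable order.
        dirnames[:] = sorted(
            d for d in dirnames if not d.endswith(BUNDLE_SUFFIX))
        rel_dir = os.path.relpath(dirpath, suite_root)
        parts = [] if rel_dir == "." else rel_dir.split(os.sep)
        for bundle in bundles:
            program = bundle[:-len(BUNDLE_SUFFIX)]
            program_id = "/".join(parts + [program])
            with open(os.path.join(dirpath, bundle, "metadata")) as f:
                for line in f:
                    if not line.strip():
                        continue
                    name, _, description = line.partition(":")
                    cases.append((program_id + ":" + name.strip(),
                                  description.strip()))
    return cases
```

<p xml:space="preserve">
Because only small text files are read, the cost of building the list should
grow with the number of bundles rather than with the cost of starting each
test program.
</p>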
 763 <p xml:space="preserve">
 764 Previous versions of ATF kept the metadata inside the binary and
 765 provided a very rudimentary command-line interface in each binary
 766 to export
 767 this data.  The problem is that executing the binaries just to get
 768 this
information is a costly operation -- especially for shell-based
 770 tests --, so
 771 this approach does not scale.
 772 </p>
 773 <p xml:space="preserve">
 774 Of course, keeping the metadata separate from the executable can
 775 lead
to inconsistencies between the two, which will be dealt with by checksumming the
binary and storing the cryptographic checksum in the metadata.
 779 To-do:
 780 decide which checksumming algorithm to use.
 781 </p>
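<p xml:space="preserve">
A consistency check along these lines could look like the Python sketch
below.  The algorithm is still undecided, so SHA-256 is used here only as a
placeholder, and how the checksum is laid out inside the metadata file is
left to the caller.  All names are hypothetical.
</p>

```python
import hashlib


def executable_checksum(path, algorithm="sha256"):
    """Return the hexadecimal digest of the file at 'path'.

    Assumption: SHA-256 is a stand-in; the specification has not chosen
    an algorithm yet.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so that large binaries are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_is_consistent(executable_path, recorded_checksum):
    """Compare the executable against the checksum recorded in the
    metadata (extracted by the caller)."""
    return executable_checksum(executable_path) == recorded_checksum
```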
 782 <p xml:space="preserve">
 783 Open problem: how do we make it easy to generate this layout from
 784 the
build tools?  In particular, how do we painlessly tie this to Automake?
 786 </p>
 787 <h2>
 788 <a name="tc-isolation">Test case isolation</a>
 789 </h2>
 790 <p xml:space="preserve">
 791 Test programs contain a set of test cases, but we want to run each
test case in as much isolation from the others as possible.  If we run the test
cases in the same process, they share the same memory, so they can mess
with global state and make the results depend on the execution order.
 797 </p>
 798 <p xml:space="preserve">
 799 Additionally, we want each test case to run in its own temporary
subdirectory so that it can create, at will, files and directories.
 801  The
 802 run-time system must take care of cleaning everything up after
 803 execution.
 804 </p>
 805 <p xml:space="preserve">
 806 Previous versions of ATF implemented this separation by making the
 807 test program spawn a subprocess for each test case, and by making
 808 this same
test program deal with all the other nitty-gritty details of directory
isolation and cleanup.  This results in tons of code duplication across
the language bindings, and it is quite hard to keep all implementations
consistent with each other.  Furthermore, implementing this isolation in
shell scripts is painfully complex and obfuscated, which makes shell
scripts incredibly slow.  Finally, there is one more drawback: debugging of
failing test cases is hard because the forking of subprocesses collides
with debuggers; yes, gdb supports subprocess boundary crossing, but not on
all platforms.
 826 </p>
 827 <p xml:space="preserve">
 828 An alternative approach is to make test programs
 829 <i>not</i> do the isolation by themselves.  Instead, we will
have atf-run spawn a new, clean, isolated subprocess for each
 831 test case
 832 and then just execute that test case.  This will, most likely, be
 833 faster
 834 than the current approach (because it will be implemented in C++)
 835 and will
 836 be much easier to maintain.
 837 </p>
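<p xml:space="preserve">
The idea of isolating from a single point can be sketched in Python as
follows.  This is not how atf-run is implemented; it is a minimal
illustration that assumes a Unix-like system and that a fresh work
directory plus a reduced environment is "isolated enough".  The function
name and the choice of environment variables are assumptions.
</p>

```python
import os
import shutil
import subprocess
import tempfile


def run_isolated(argv):
    """Run 'argv' in its own subprocess, inside a fresh temporary
    directory and with a minimal environment.

    Returns (exit_status, stdout_bytes, stderr_bytes).  The temporary
    directory is removed afterwards, whatever the outcome.
    """
    work_dir = tempfile.mkdtemp(prefix="atf-run.")
    # Assumption: these three variables make a reasonable environment.
    env = {
        "PATH": os.environ.get("PATH", os.defpath),
        "HOME": work_dir,
        "TMPDIR": work_dir,
    }
    try:
        proc = subprocess.run(argv, cwd=work_dir, env=env,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
        return proc.returncode, proc.stdout, proc.stderr
    finally:
        # The run-time system, not the test case, cleans everything up.
        shutil.rmtree(work_dir, ignore_errors=True)
```

<p xml:space="preserve">
With this split, the test program itself needs no isolation or cleanup
logic, so the same code serves the C, C++ and POSIX shell bindings alike.
</p>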
 838 <p xml:space="preserve">
 839 There are two major drawbacks, though:
 840 </p>
 841 <ul>
 842 <li>
 843 <p xml:space="preserve">
 844 Running the test program by hand will leave tons of garbage
 845 uncleaned; that is fine as long as we warn the tech-savvy user to
 846 <i>not do that</i>.
 847 </p>
 848 </li>
 849 <li>
 850 <p xml:space="preserve">
 851 The current libraries allow the programmer to define random
 852 test cases anywhere in their program (not necessarily in a test
 853 program)
and run them in an isolated way by just running their
 855 <tt>run</tt>
 856 method.  If we remove the isolation from the test cases themselves,
 857 this
 858 API should disappear, as it will not be safe any more to run a test
 859 case by
 860 hand from within a program.  Maybe not a big deal, though,
 861 because... who
wants to mix test cases with regular application code?
 863 </p>
 864 </li>
 865 </ul>
 866 <h2>
 867 <a name="tp-cli">The command-line interface</a>
 868 </h2>
 869 <p xml:space="preserve">
 870 All test programs must provide the same command-line interface so
 871 that end users are not surprised by unknown and inconsistent flags
 872 and
 873 arguments.  We did a good job in previous versions of ATF in this
 874 regard,
 875 but we are going to simplify the interface even further.
 876 </p>
 877 <p xml:space="preserve">
 878 Given that test programs will not provide isolation for the test
 879 cases they contain, we will not allow a single run of the test
 880 program to
 881 execute more than one test case.  If automation is needed to run
 882 several
 883 tests in a sequence, the user will have to use atf-run.
 884 </p>
 885 <p xml:space="preserve">
 886 With all that said, a test program will provide the following
 887 interface:
 888 </p>
 889 <p xml:space="preserve">
 890 test-program [options] [test-case-name]
 891 </p>
 892 <p xml:space="preserve">
 893 Note that we can only specify a single test case.  For simplicity,
 894 we
 895 are going to make it optional, in which case the test program will
 896 <i>only work</i> if it defines a single test case.  I do not
 897 really like the idea, because adding another test case to the
 898 program will
 899 break existing callers,
 900 <i>but</i> these are internal
 901 binaries that must not be called directly, so there is no real harm
 902 done if
 903 that happens.  The simplicity is here provided only to make
 904 debugging
 905 easier.
 906 </p>
 907 <p xml:space="preserve">
 908 The available options are as follows:
 909 </p>
 910 <ul>
 911 <li>
 912 <p xml:space="preserve">
 913 -h: Explicitly request help.  The program must never print
 914 the whole usage message unless asked to do so.
 915 </p>
 916 </li>
 917 <li>
 918 <p xml:space="preserve">
 919 -r results-file: Path to the file where the execution
 920 results will be stored.
 921 </p>
 922 </li>
 923 <li>
 924 <p xml:space="preserve">
 925 -s srcdir: Path to the source directory where the test
 926 program resides.  We will not try to guess it at this point
 927 (atf-run will,
 928 though) unless the source directory is the current directory,
 929 because there
 930 is the potential of guessing incorrectly and confusing our users.
 931 We need
 932 to know what the source directory is to be able to find the
 933 metadata file
 934 and any auxiliary data files required by the test
 935 program.
 936 </p>
 937 </li>
 938 <li>
 939 <p xml:space="preserve">
 940 -v var=value: Sets the configuration variable var to value,
 941 which test cases can later query.
 942 </p>
 943 </li>
 944 </ul>
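<p xml:space="preserve">
For illustration only, the interface above could be parsed with Python's
argparse module as sketched below.  The real test programs are written in C,
C++ or POSIX shell, so this merely models the option set; the function names
are hypothetical, and defaulting -s to the current directory is one reading
of the description of that option.
</p>

```python
import argparse


def parse_variable(text):
    """Turn 'var=value' into a (var, value) pair."""
    name, sep, value = text.partition("=")
    if not sep or not name:
        raise argparse.ArgumentTypeError(
            "expected var=value, got %r" % text)
    return name, value


def build_parser():
    # argparse supplies -h on its own.
    parser = argparse.ArgumentParser(prog="test-program")
    parser.add_argument("-r", dest="results_file", metavar="results-file",
                        help="file where the results will be stored")
    parser.add_argument("-s", dest="srcdir", metavar="srcdir", default=".",
                        help="directory where the test program resides")
    parser.add_argument("-v", dest="variables", metavar="var=value",
                        action="append", default=[], type=parse_variable,
                        help="set a configuration variable")
    parser.add_argument("test_case_name", nargs="?")
    return parser


def select_test_case(requested, defined):
    """Apply the rule that the name is optional only when the test
    program defines a single test case."""
    if requested is None:
        if len(defined) != 1:
            raise SystemExit("a test case name is required")
        return defined[0]
    if requested not in defined:
        raise SystemExit("unknown test case: %s" % requested)
    return requested
```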
 945 <p xml:space="preserve">
 946 Note that several flags provided by old ATF versions are gone.
 947 Namely: -l is removed because the metadata is stored separately and
 948 -w is
 949 removed because the test program will not create temporary
 950 directories any
 951 more by itself.
 952 </p>
 953 <h1>
 954 <a name="run">Execution automation</a>
 955 </h1>
 956 <p xml:space="preserve">
 957 The atf-run tool provides automation to run multiple test cases
 958 (coming from different test programs) sequentially.  Parallel
 959 execution may
be implemented in the future, but test cases must be designed in a way that
allows them to be executed alongside other test cases without
 963 conflicts.
 964 </p>
 965 <p xml:space="preserve">
 966 atf-run also provides isolation for test cases.  This tool spawns a
 967 subprocess for each of the tests that have to run, and in doing so
 968 it
prepares the subprocess to have a reasonable environment and
 970 isolates it
 971 from the rest of the test cases as much as possible.  Once all this
 972 has
 973 happened, the test program containing the test case is executed in
 974 the
 975 subprocess and the results are collected from the results file
 976 generated by
 977 the test case.
 978 </p>
 979 <p xml:space="preserve">
 980 To-do: Do we need Atffiles?  Probably not, so remove them and
 981 mention
 982 why we are doing so.
 983 </p>
 984 <h1>
 985 <a name="store">The results store</a>
 986 </h1>
 987 <p xml:space="preserve">
 988 The atf-store implements a database that contains information about
 989 the execution of test cases.  The database captures the results of
 990 each
 991 test case as well as any potential information that is helpful for
 992 debugging: i.e. the stdout and stderr outputs.
 993 </p>
 994 <p xml:space="preserve">
 995 The store is
 996 <i>historic</i>: we want to keep the
 997 history of a given test case.  Why?  Some of these test cases come
 998 from
 999 build slaves and contain the whole results of a fetch/compile/test
1000 run, so
1001 we want to see how things progress in history.  Disk space is
1002 cheap, but if
we want to clean up, we can cull old executions.
1004 </p>
1005 <p xml:space="preserve">
1006 We will have different frontends for the store: I'm thinking that
1007 atf-report could just read off the store and print the results on
1008 screen,
1009 but we could also have a plugin for name-your-favourite-http-server
1010 to
1011 generate a dynamic view of the test case results -- very useful for
1012 build
1013 farms.
1014 </p>
1015 <p xml:space="preserve">
Given the nature of the store, I think it'd be wise to use SQLite as
its backend, especially if it is ever to serve dynamic web content.
1019 If we go
1020 this route, we should provide a not-really-optimized file-based
1021 backend for
1022 those users that do not want to have an additional dependency
1023 (NetBSD
1024 anyone?).
1025 </p>
1026 <p xml:space="preserve">
1027 The store will
1028 <i>only</i> be accessed by atf-store.  I
1029 do not want atf-run or the test programs to access it directly to
1030 store
1031 their results.  They must contact the atf-store binary to do so.
1032 Having a
1033 single entry point to the store will prevent consistency issues.
1034 Now, this
1035 brings up two big questions: where is the store located and how is
1036 it
1037 accessed?
1038 </p>
1039 <p xml:space="preserve">
1040 If we are running ATF interactively, we probably do not want to use
1041 the store at all.  However, for simplicity of implementation of
1042 tools such
1043 as atf-run, they should always contact the store and let the store
1044 decide
1045 what to do.  For interactive runs, we can omit storing results and
1046 so
1047 sending results to the store should result in a no-op.  How does
1048 atf-report
1049 work then?
1050 </p>
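<p xml:space="preserve">
One way to let tools "always contact the store" is to hide the decision
behind a common interface, as in this Python sketch.  Everything here is
hypothetical: the class and method names, the in-memory backend that stands
in for a real database, and the use of plain numbers as timestamps.  It does
not answer the open questions above.
</p>

```python
import collections


class NullStore:
    """Backend for interactive runs: accepts results and discards them."""

    def put(self, test_case_id, result, timestamp):
        pass  # storing is a no-op

    def history(self, test_case_id):
        return []


class HistoricStore:
    """In-memory stand-in for a real backend that keeps every execution."""

    def __init__(self):
        self._runs = collections.defaultdict(list)

    def put(self, test_case_id, result, timestamp):
        self._runs[test_case_id].append((timestamp, result))

    def history(self, test_case_id):
        # Oldest execution first.
        return sorted(self._runs.get(test_case_id, []))

    def cull(self, older_than):
        """Drop executions recorded before 'older_than'."""
        for key in list(self._runs):
            kept = [run for run in self._runs[key] if run[0] >= older_than]
            if kept:
                self._runs[key] = kept
            else:
                del self._runs[key]


def open_store(interactive):
    """Callers never check the mode themselves; the store decides."""
    return NullStore() if interactive else HistoricStore()
```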
1051 <p xml:space="preserve">
1052 The store has to be accessible locally (through a pipe, named pipe
1053 or
1054 whatever) but also remotely.  We want build slaves to be able to
1055 send
1056 results to the store on a push basis.  Open issue: how do we deal
1057 with
1058 security?
1059 </p>
1060 <h1>
1061 <a name="farms">Build farms</a>
1062 </h1>
1063 <p xml:space="preserve">
1064 Build farms, or continuous builds, are required for any software
project that wants to achieve a minimum level of quality on one or
1066 more
1067 platforms.  ATF cannot disregard this use case.
1068 </p>
1069 <p xml:space="preserve">
1070 The work of each build slave can be treated as a single test case,
1071 and thus all of its work (source code fetching, building and
1072 testing) can
1073 be collapsed into a single program that works as a test case.
1074 These
1075 results can later be incorporated into test result reports
1076 effortlessly.  A
1077 more advanced approach involves splitting each stage (fetch, build,
1078 test)
into a separate test case, and then making these independent test
1080 cases
1081 depend on each other.  The writer of the build slave script has to
1082 be able
1083 to decide the approach he prefers.
1084 </p>
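<p xml:space="preserve">
The more advanced approach, with one test case per stage, might behave like
the following Python sketch.  How dependencies between test cases are
expressed is not defined in this document; the sketch simply assumes that a
stage whose predecessor failed is reported as skipped.  The function name
and the shape of the returned tuples are hypothetical.
</p>

```python
def run_stages(stages):
    """Run (name, action) pairs in order; each action returns True on
    success.

    Returns a list of (name, result, reason) tuples.  Assumption: once a
    stage fails, every later stage is reported as skipped.
    """
    results = []
    failed_stage = None
    for name, action in stages:
        if failed_stage is not None:
            results.append(
                (name, "skipped", "depends on failed stage " + failed_stage))
            continue
        if action():
            results.append((name, "pass", ""))
        else:
            results.append((name, "fail", "stage failed"))
            failed_stage = name
    return results
```

<p xml:space="preserve">
With stages named fetch, build and test, a broken build would show up as a
failed build stage and a skipped test stage, rather than as one opaque
failure for the whole build slave.
</p>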
1085 <p xml:space="preserve">
1086 In order to support build farms, we just need to provide an easy
1087 way
1088 of creating a test program (in POSIX shell) to act as a build
1089 slave.  We
1090 then stick a call to atf-run in cron calling this single test
1091 program and
1092 make it deliver the results to a remote atf-store.
1093 </p>
1094 </div>
1095 </body>
1096 </html>