benchmarks/unixbench-5.1.2/USAGE

   1 Running the Tests
   2 =================
   3
   4 All the tests are executed using the "Run" script in the top-level directory.
   5
   6 The simplest way to generate results is with the command:
   7     ./Run
   8
   9 This will run a standard "index" test (see "The BYTE Index" below), and
  10 save the report in the "results" directory, with a filename like
  11     hostname-2007-09-23-01
  12 An HTML version is also saved.
  13
  14 If you want to generate both the basic system index and the graphics index,
  15 then do:
  16     ./Run gindex
  17
  18 If your system has more than one CPU, the tests will be run twice -- once
  19 with a single copy of each test running at once, and once with N copies,
  20 where N is the number of CPUs.  Some categories of tests, however (currently
  21 the graphics tests) will only run with a single copy.
  22
  23 Since the tests are based on constant time (variable work), a "system"
  24 run usually takes about 29 minutes; the "graphics" part about 18 minutes.
  25 A "gindex" run on a dual-core machine will do 2 "system" passes (single-
  26 and dual-processing) and one "graphics" run, for a total around one and
  27 a quarter hours.
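
The arithmetic behind that estimate can be sketched as follows.  This is
only an illustration: the helper name and its parameters are made up for
this example, and the per-pass times are the rough figures quoted above,
not values that Run computes.

```python
# Rough per-pass durations quoted above, in minutes.
SYSTEM_PASS_MINUTES = 29
GRAPHICS_PASS_MINUTES = 18

def estimate_minutes(system_passes, graphics_passes):
    """Estimate total wall-clock time for a run (hypothetical helper)."""
    return (system_passes * SYSTEM_PASS_MINUTES
            + graphics_passes * GRAPHICS_PASS_MINUTES)

# "gindex" on a dual-core machine: two system passes, one graphics pass.
total = estimate_minutes(system_passes=2, graphics_passes=1)
print(total, "minutes")   # 76 minutes, i.e. about one and a quarter hours
```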
  28
  29 ============================================================================
  30
  31 Detailed Usage
  32 ==============
  33
  34 The Run script takes a number of options which you can use to customise a
  35 test, and you can specify the names of the tests to run.  The full usage
  36 is:
  37
  38     Run [ -q | -v ] [-i <n> ] [-c <n> [-c <n> ...]] [test ...]
  39
  40 The option flags are:
  41
  42   -q            Run in quiet mode.
  43   -v            Run in verbose mode.
  44   -i <count>    Run <count> iterations for each test -- slower tests
  45                 use <count> / 3, but at least 1.  Defaults to 10 (3 for
  46                 slow tests).
  47   -c <n>        Run <n> copies of each test in parallel.
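
As a rough sketch of the -i rule described above: the function below is
hypothetical (it is not taken from the Run script), and the use of integer
division is an assumption, chosen because it reproduces the documented
defaults of 10 and 3.

```python
DEFAULT_ITERATIONS = 10

def iterations_for(count=DEFAULT_ITERATIONS, slow=False):
    """Number of iterations for one test (hypothetical helper).

    Slow tests use count / 3, but never fewer than 1.  Integer
    division is an assumption; it matches the documented defaults
    (10 for normal tests, 3 for slow ones).
    """
    if slow:
        return max(1, count // 3)
    return count

print(iterations_for())                 # 10
print(iterations_for(slow=True))        # 3
print(iterations_for(2, slow=True))     # 1
```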
  48
  49 The -c option can be given multiple times; for example:
  50
  51     ./Run -c 1 -c 4
  52
  53 will run a single-streamed pass, then a 4-streamed pass.  Note that some
  54 tests (currently the graphics tests) will only run in a single-streamed pass.
  55
  56 The remaining non-flag arguments are taken to be the names of tests to run.
  57 The default is to run "index".  See "Tests" below.
  58
  59 When running the tests, I do *not* recommend switching to single-user mode
  60 ("init 1").  This seems to change the results in ways I don't understand,
  61 and it's not realistic (unless your system will actually be running in this
  62 mode, of course).  However, if using a windowing system, you may want to
  63 switch to a minimal window setup (for example, log in to a "twm" session),
  64 so that randomly-churning background processes don't randomise the results
  65 too much.  This is particularly true for the graphics tests.
  66
  67
  68 ============================================================================
  69
  70 Tests
  71 =====
  72
  73 The available tests are organised into categories; when generating index
  74 scores (see "The BYTE Index" below) the results for each category are
  75 produced separately.  The categories are:
  76
  77    system          The original Unix system tests (not all are actually
  78                    in the index)
  79    2d              2D graphics tests (not all are actually in the index)
  80    3d              3D graphics tests
  81    misc            Various non-indexed tests
  82
  83 The following individual tests are available:
  84
  85   system:
  86     dhry2reg         Dhrystone 2 using register variables
  87     whetstone-double Double-Precision Whetstone
  88     syscall          System Call Overhead
  89     pipe             Pipe Throughput
  90     context1         Pipe-based Context Switching
  91     spawn            Process Creation
  92     execl            Execl Throughput
  93     fstime-w         File Write 1024 bufsize 2000 maxblocks
  94     fstime-r         File Read 1024 bufsize 2000 maxblocks
  95     fstime           File Copy 1024 bufsize 2000 maxblocks
  96     fsbuffer-w       File Write 256 bufsize 500 maxblocks
  97     fsbuffer-r       File Read 256 bufsize 500 maxblocks
  98     fsbuffer         File Copy 256 bufsize 500 maxblocks
  99     fsdisk-w         File Write 4096 bufsize 8000 maxblocks
 100     fsdisk-r         File Read 4096 bufsize 8000 maxblocks
 101     fsdisk           File Copy 4096 bufsize 8000 maxblocks
 102     shell1           Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
 103     shell8           Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
 104     shell16          Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")
 105
 106   2d:
 107     2d-rects         2D graphics: rectangles
 108     2d-lines         2D graphics: lines
 109     2d-circle        2D graphics: circles
 110     2d-ellipse       2D graphics: ellipses
 111     2d-shapes        2D graphics: polygons
 112     2d-aashapes      2D graphics: aa polygons
 113     2d-polys         2D graphics: complex polygons
 114     2d-text          2D graphics: text
 115     2d-blit          2D graphics: images and blits
 116     2d-window        2D graphics: windows
 117
 118   3d:
 119     ubgears          3D graphics: gears
 120
 121   misc:
 122     C                C Compiler Throughput ("looper 60 $cCompiler cctest.c")
 123     arithoh          Arithoh (huh?)
 124     short            Arithmetic Test (short) (this is arith.c configured for
 125                      "short" variables; ditto for the ones below)
 126     int              Arithmetic Test (int)
 127     long             Arithmetic Test (long)
 128     float            Arithmetic Test (float)
 129     double           Arithmetic Test (double)
 130     dc               Dc: sqrt(2) to 99 decimal places (runs
 131                      "looper 30 dc < dc.dat", using your system's copy of "dc")
 132     hanoi            Recursion Test -- Tower of Hanoi
 133     grep             Grep for a string in a large file, using your system's
 134                      copy of "grep"
 135     sysexec          Exercise fork() and exec().
 136
 137 The following pseudo-test names are aliases for combinations of other
 138 tests:
 139
 140     arithmetic       Runs arithoh, short, int, long, float, double,
 141                      and whetstone-double
 142     dhry             Alias for dhry2reg
 143     dhrystone        Alias for dhry2reg
 144     whets            Alias for whetstone-double
 145     whetstone        Alias for whetstone-double
 146     load             Runs shell1, shell8, and shell16
 147     misc             Runs C, dc, and hanoi
 148     speed            Runs the arithmetic and system groups
 149     oldsystem        Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
 150                      spawn, and syscall
 151     system           Runs oldsystem plus shell1, shell8, and shell16
 152     fs               Runs fstime-w, fstime-r, fstime, fsbuffer-w,
 153                      fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
 154     shell            Runs shell1, shell8, and shell16
 155
 156     index            Runs the tests which constitute the official index:
 157                      the oldsystem group, plus dhry2reg, whetstone-double,
 158                      shell1, and shell8
 159                      See "The BYTE Index" below for more information.
 160     graphics         Runs the tests which constitute the graphics index:
 161                      2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
 162                      2d-window, and ubgears
 163     gindex           Runs the index and graphics groups, to generate both
 164                      sets of index results
 165
 166     all              Runs all tests
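
One way to picture how the pseudo-test names resolve is as a table of
aliases that is expanded recursively.  The sketch below covers only a few
of the aliases listed above; the dictionary and the expand() helper are
illustrative assumptions, not the Run script's actual data structures.

```python
# A few of the aliases listed above (not the complete table).
ALIASES = {
    "oldsystem": ["execl", "fstime", "fsbuffer", "fsdisk",
                  "pipe", "context1", "spawn", "syscall"],
    "shell": ["shell1", "shell8", "shell16"],
    "system": ["oldsystem", "shell1", "shell8", "shell16"],
    "index": ["oldsystem", "dhry2reg", "whetstone-double",
              "shell1", "shell8"],
    "dhry": ["dhry2reg"],
}

def expand(name, seen=None):
    """Expand a test or alias name into individual tests, in order,
    without duplicates (hypothetical helper, not the Run script's code)."""
    if seen is None:
        seen = []
    if name in ALIASES:
        for member in ALIASES[name]:
            expand(member, seen)
    elif name not in seen:
        seen.append(name)
    return seen

print(len(expand("index")))   # 12
print(expand("dhry"))         # ['dhry2reg']
```

Expanding "index" this way gives twelve individual tests: the eight in
oldsystem plus dhry2reg, whetstone-double, shell1, and shell8.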
 167
 168
 169 ============================================================================
 170
 171 The BYTE Index
 172 ==============
 173
 174 The purpose of this test is to provide a basic indicator of the performance
 175 of a Unix-like system; hence, multiple tests are used to test various
 176 aspects of the system's performance.  These test results are then compared
 177 to the scores from a baseline system to produce an index value, which is
 178 generally easier to handle than the raw scores.  The entire set of index
 179 values is then combined to make an overall index for the system.
 180
 181 Since 1995, the baseline system has been "George", a SPARCstation 20-61
 182 with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
 183 were set at 10.0.  (So a system which scores 520 is 52 times faster than
 184 this machine.)  Since the numbers are really only useful in a relative
 185 sense, there's no particular reason to update the base system, so for the
 186 sake of consistency it's probably best to leave it alone.  George's scores
 187 are in the file "pgms/index.base"; this file is used to calculate the
 188 index scores for any particular run.
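
The scaling described above can be sketched like this.  The formula is
inferred from the description (baseline rated at 10.0, so a score of 520
means 52 times faster); the function name and the sample numbers are made
up, and the real calculation in Run, which reads the baseline results from
"pgms/index.base", may differ in detail.

```python
BASELINE_RATING = 10.0   # the rating assigned to the baseline system

def index_value(raw_score, baseline_score):
    """Scale a raw result against the baseline result for the same test."""
    return raw_score / baseline_score * BASELINE_RATING

# Made-up numbers: a system 52 times faster than the baseline.
baseline = 1000.0
print(index_value(52 * baseline, baseline))   # 520.0
```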
 189
 190 Over the years, various changes have been made to the set of tests in the
 191 index.  Although there is a desire for a consistent baseline, various tests
 192 have been determined to be misleading, and have been removed; and a few
 193 alternatives have been added.  These changes are detailed in the README,
 194 and should be borne in mind when looking at old scores.
 195
 196 A number of tests are included in the benchmark suite which are not part of
 197 the index, for various reasons; these tests can of course be run manually.
 198 See "Tests" above.
 199
 200
 201 ============================================================================
 202
 203 Graphics Tests
 204 ==============
 205
 206 As of version 5.1, UnixBench now contains some graphics benchmarks.  These
 207 are intended to give a rough idea of the general graphics performance of
 208 a system.
 209
 210 The graphics tests are in categories "2d" and "3d", so the index scores
 211 for these tests are separate from the basic system index.  This seems
 212 like a sensible division, since the graphics performance of a system
 213 depends largely on the graphics adaptor.
 214
 215 The tests currently consist of some 2D "x11perf" tests and "ubgears".
 216
 217 * The 2D tests are a selection of the x11perf tests, using the host
 218   system's x11perf command (which must be installed and in the search
 219   path).  Only a few of the x11perf tests are used, in the interests
 220   of completing a test run in a reasonable time; if you want to do
 221   detailed diagnosis of an X server or graphics chip, then use x11perf
 222   directly.
 223
 224 * The 3D test is "ubgears", a modified version of the familiar "glxgears".
 225   This version runs for 5 seconds to "warm up", then performs a timed
 226   run and displays the average frames-per-second.
 227
 228 On multi-CPU systems, the graphics tests will only run in single-processing
 229 mode.  This is because the meaning of running two copies of a test at once
 230 is dubious; and the test windows tend to overlay each other, meaning that
 231 the window behind isn't actually doing any work.
 232
 233
 234 ============================================================================
 235
 236 Multiple CPUs
 237 =============
 238
 239 If your system has multiple CPUs, the default behaviour is to run the selected
 240 tests twice -- once with one copy of each test program running at a time,
 241 and once with N copies, where N is the number of CPUs.  (You can override
 242 this with the "-c" option; see "Detailed Usage" above.)  This is designed to
 243 allow you to assess:
 244
 245  - the performance of your system when running a single task
 246  - the performance of your system when running multiple tasks
 247  - the gain from your system's implementation of parallel processing
 248
 249 The results, however, need to be handled with care.  Here are the results
 250 of two runs on a dual-processor system, one in single-processing mode, one
 251 dual-processing:
 252
 253   Test                    Single     Dual   Gain
 254   --------------------    ------   ------   ----
 255   Dhrystone 2              562.5   1110.3    97%
 256   Double Whetstone         320.0    640.4   100%
 257   Execl Throughput         450.4    880.3    95%
 258   File Copy 1024           759.4    595.9   -22%
 259   File Copy 256            535.8    438.8   -18%
 260   File Copy 4096          1261.8   1043.4   -17%
 261   Pipe Throughput          481.0    979.3   104%
 262   Pipe-based Switching     326.8   1229.0   276%
 263   Process Creation         917.2   1714.1    87%
 264   Shell Scripts (1)       1064.9   1566.3    47%
 265   Shell Scripts (8)       1567.7   1709.9     9%
 266   System Call Overhead     944.2   1445.5    53%
 267   --------------------    ------   ------   ----
 268   Index Score:             678.2   1026.2    51%
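
For reference, the Gain column is the relative change from the single to
the dual result.  The sketch below recomputes it from the table.  It also
combines the per-test values with a geometric mean; this document does not
say how the overall index is combined, so treat that as an assumption,
although the figures in the table are consistent with it.

```python
import math

# (single, dual) index values copied from the table above.
RESULTS = {
    "Dhrystone 2":          (562.5, 1110.3),
    "Double Whetstone":     (320.0, 640.4),
    "Execl Throughput":     (450.4, 880.3),
    "File Copy 1024":       (759.4, 595.9),
    "File Copy 256":        (535.8, 438.8),
    "File Copy 4096":       (1261.8, 1043.4),
    "Pipe Throughput":      (481.0, 979.3),
    "Pipe-based Switching": (326.8, 1229.0),
    "Process Creation":     (917.2, 1714.1),
    "Shell Scripts (1)":    (1064.9, 1566.3),
    "Shell Scripts (8)":    (1567.7, 1709.9),
    "System Call Overhead": (944.2, 1445.5),
}

def gain_percent(single, dual):
    """Relative change from the single-copy to the dual-copy result."""
    return (dual / single - 1.0) * 100.0

def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

for name, (single, dual) in RESULTS.items():
    print(f"{name:22s} {gain_percent(single, dual):5.0f}%")

# Assumed combination rule for the overall index (see the note above).
single_index = geometric_mean(s for s, _ in RESULTS.values())
dual_index = geometric_mean(d for _, d in RESULTS.values())
print(f"Index: {single_index:.1f} (single)  {dual_index:.1f} (dual)")
```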
 269
 270 As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
 271 execl, pipe throughput, process creation -- show close to 100% gain when
 272 running 2 copies in parallel.
 273
 274 The Pipe-based Context Switching test measures context switching overhead
 275 by sending messages back and forth between 2 processes.  I don't know why
 276 it shows such a huge gain with 2 copies (i.e. 4 processes total) running,
 277 but it seems to be consistent on my system.  I think this may be an issue
 278 with the SMP implementation.
 279
 280 The System Call Overhead shows a lesser gain, presumably because it uses a
 281 lot of CPU time in single-threaded kernel code.  The shell scripts test with
 282 8 concurrent processes shows little gain -- because the test itself runs 8
 283 scripts in parallel, it's already using both CPUs, even when the benchmark
 284 is run in single-stream mode.  The same test with one process per copy
 285 shows a real gain.
 286
 287 The filesystem throughput tests show a loss, instead of a gain, when
 288 multi-processing.  That there's no gain is to be expected, since the tests
 289 are presumably constrained by the throughput of the I/O subsystem and the
 290 disk drive itself; the drop in performance is presumably down to the
 291 increased contention for resources, and perhaps greater disk head movement.
 292
 293 So what tests should you use, how many copies should you run, and how should
 294 you interpret the results?  Well, that's up to you, since it depends on
 295 what it is you're trying to measure.
 296
 297 Implementation
 298 --------------
 299
 300 The multi-processing mode is implemented at the level of test iterations.
 301 During each iteration of a test, N slave processes are started using fork().
 302 Each of these slaves executes the test program using fork() and exec(),
 303 reads and stores the entire output, times the run, and prints all the
 304 results to a pipe.  The Run script reads the pipes for each of the slaves
 305 in turn to get the results and times.  The scores are added, and the times
 306 averaged.
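
The scheme described above can be sketched as follows.  This is a minimal
illustration, not the Run script's code: the function name is made up, and
a Python callable stands in for the test program that each slave would
really start with fork() and exec().  It needs a Unix-like system, since it
relies on os.fork().

```python
import os
import time

def run_copies(n, workload):
    """Run n copies of workload() in parallel child processes.

    Each child times its own run and writes "score elapsed" to a pipe;
    the parent reads the pipes in turn, adds the scores and averages
    the times.  Returns (total_score, mean_elapsed_seconds).
    """
    children = []
    for _ in range(n):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:                          # child process
            status = 1
            try:
                os.close(read_fd)
                start = time.time()
                score = workload()            # stand-in for the test program
                elapsed = time.time() - start
                os.write(write_fd, f"{score} {elapsed}\n".encode())
                status = 0
            finally:
                os._exit(status)              # never fall back into parent code
        os.close(write_fd)                    # parent keeps only the read end
        children.append((pid, read_fd))

    total_score = 0.0
    total_time = 0.0
    for pid, read_fd in children:             # read each pipe in turn
        with os.fdopen(read_fd) as pipe:
            score, elapsed = map(float, pipe.read().split())
        os.waitpid(pid, 0)
        total_score += score
        total_time += elapsed
    return total_score, total_time / n

total, mean_time = run_copies(2, lambda: 100.0)
print(total)   # 200.0
```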
 307
 308 The result is that each test program has N copies running at once.  They
 309 should all finish at around the same time, since they run for constant time.
 310
 311 If a test program itself starts off K multiple processes (as with the shell8
 312 test), then the effect will be that there are N * K processes running at
 313 once.  This is probably not very useful for testing multi-CPU performance.
 314
 315
 316 ============================================================================
 317
 318 The Language Setting
 319 ====================
 320
 321 The $LANG environment variable determines how programs and library
 322 routines interpret text.  This can have a big impact on the test results.
 323
 324 If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
 325 it is set to en_US.UTF-8, for example, then text is treated as being
 326 encoded in UTF-8, which is more complex and therefore slower.  Setting
 327 it to other languages can have varying results.
 328
 329 To ensure consistency between test runs, the Run script now (as of version
 330 5.1.1) sets $LANG to "en_US.utf8".
 331
 332 This setting is configured with the variable "$language".  You
 333 should not change this if you want to share your results to allow
 334 comparisons between systems; however, you may want to change it to see
 335 how different language settings affect performance.
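
To illustrate the idea of pinning the language for every test program, here
is a small sketch.  The helper is hypothetical and is not how the Run
script is written; it only shows a child process being given an
environment in which LANG has been forced to a fixed value.

```python
import os

language = "en_US.utf8"   # the value described above for version 5.1.1

def benchmark_env(lang=language, base=None):
    """Return a copy of the environment with LANG forced to lang."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    return env

env = benchmark_env()
print(env["LANG"])   # en_US.utf8

# A child process could then be started with this environment, e.g.
#   subprocess.run(["locale"], env=env)
# to see which character mapping and collation order are actually in use.
```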
 336
 337 Each test report now includes the language settings in use.  The reported
 338 language is what is set in $LANG, and is not necessarily supported by the
 339 system; but we also report the character mapping and collation order which
 340 are actually in use (as reported by "locale").
 341
 342
 343 ============================================================================
 344
 345 Interpreting the Results
 346 ========================
 347
 348 Interpreting the results of these tests is tricky, and totally depends on
 349 what you're trying to measure.
 350
 351 For example, are you trying to measure how fast your CPU is?  Or how good
 352 your compiler is?  Because these tests are all recompiled using your host
 353 system's compiler, the performance of the compiler will inevitably impact
 354 the performance of the tests.  Is this a problem?  If you're choosing a
 355 system, you probably care about its overall speed, which may well depend
 356 on how good its compiler is; so including that in the test results may be
 357 the right answer.  But you may want to ensure that the right compiler is
 358 used to build the tests.
 359
 360 On the other hand, with the vast majority of Unix systems being x86 / PC
 361 compatibles, running Linux and the GNU C compiler, the results will tend
 362 to be more dependent on the hardware; but the versions of the compiler and
 363 OS can make a big difference.  (I measured a 50% gain between SUSE 10.1
 364 and OpenSUSE 10.2 on the same machine.)  So you may want to make sure that
 365 all your test systems are running the same version of the OS; or at least
 366 publish the OS and compiler versions with your results.  Then again, it may
 367 be compiler performance that you're interested in.
 368
 369 The C test is very dubious -- it tests the speed of compilation.  If you're
 370 running the exact same compiler on each system, OK; but otherwise, the
 371 results should probably be discarded.  A slower compilation doesn't say
 372 anything about the speed of your system, since the compiler may simply be
 373 spending more time to super-optimise the code, which would actually make it
 374 faster.
 375
 376 This will be particularly true on architectures like IA-64 (Itanium etc.)
 377 where the compiler spends huge amounts of effort scheduling instructions
 378 to run in parallel, with a resultant significant gain in execution speed.
 379
 380 Some tests are even more dubious in terms of host-dependency -- for example,
 381 the "dc" test uses the host's version of dc (a calculator program).  The
 382 version of this which is available can make a huge difference to the score,
 383 which is why it's not in the index group.  Read through the release notes
 384 for more on these kinds of issues.
 385
 386 Another age-old issue is that of the benchmarks being too trivial to be
 387 meaningful.  With compilers getting ever smarter, and performing more
 388 wide-ranging flow path analyses, the danger of parts of the benchmarks
 389 simply being optimised out of existence is always present.
 390
 391 All in all, the "index" and "gindex" tests (see above) are designed to
 392 give a reasonable measure of overall system performance; but the results
 393 of any test run should always be used with care.
 394