<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Benchmarking Guide</title>

<body bgcolor="#FFFFFF">

<h2 align="left">Benchmarking Guide</h2>
<p>Micro-benchmarks are notoriously inaccurate, in any system.
Here are some guidelines you should read carefully before trying
to construct an accurate benchmark in the Strongtalk system. This
is very important because there is one big 'gotcha' associated
with running benchmarks from a "do it" in Strongtalk: </p>
<ul>
    <li><strong>Put your benchmark in a real method</strong>. As
        mentioned in the tour, to get compiled performance
        results in Strongtalk, the primary computation (the code
        where your benchmark is spending most of its time) needs
        to be in an actual method, not in a "do it"
        from a workspace. This is because the current version of
        the VM doesn't use the optimized method until the <em>next</em>
        time that it is called after compilation, and a "do it"
        method by definition is never called more than once. (In
        a real program or normal "do it", this effect is never an
        issue; only micro-benchmarks have loops that iterate
        zillions of times with the loop itself in the "do it".)
        This is not a fundamental limitation in the technology,
        but we hadn't implemented "on-stack replacement" in the
        Smalltalk system at the time of release (we did implement
        it for Java). <p>Note that this does <em>not</em> mean
        that the code that your "do it" invokes won't be
        optimized and used the first time around; it will. But
        the big performance gains for micro-benchmarks come from
        inlining <em>all</em> the called methods directly into
        the performance-critical benchmark loop, and if that loop
        is literally in the "do it", that isn't possible.</p>
        <p>A good way to run your benchmark is to create a method
        in the Test class (which is there for this kind of thing)
        that runs for at least 100 milliseconds, and then call
        that method a number of times until it becomes optimized.
        The Test>benchmark: method will do this for you, and
        report the fastest time. To tell if your code is running
        enough, a good rule of thumb is that if your method
        doesn't get faster and then stabilize at some speed, then
        it's not being run enough. (See the sketch after this
        list.)</p>
    </li>
    <li><strong>Know how to choose a benchmark.</strong> Micro-benchmarks
        are notorious for producing misleading results in all
        systems, which is why all real benchmarks are bigger
        programs that as much as possible use the same code on
        both systems. If you insist on writing a micro-benchmark,
        keep these issues in mind:
<ul>
            <li><strong>Your code should spend its time in
                Smalltalk</strong>, not down in rarely-used
                system primitives or C-callouts. For example,
                'factorial' spends almost all of its time in the
                LargeInteger multiplication primitive, not in
                Smalltalk code.</li>
            <li><strong>Use library methods that are commonly
                used in real performance-critical code.</strong> Take
                factorial as an example: when is the last time
                your program was performance bound on
                LargeInteger multiplication?</li>
            <li><strong>Use code that is like normal Smalltalk
                code (use of core data structures, allocation,
                message sending in a normal pattern, instance
                variable access, blocks).</strong> This is the
                biggest reason most micro-benchmarks aren't
                accurate. Real code is broken up into many
                methods, with lots of message sends, instance
                variable reads, boolean operations, SmallInteger
                operations, temporary allocations, and Array
                accesses, all mixed together. These are the
                things that Strongtalk is designed to optimize.</li>
            <li><strong>Use the same code and input data on both
                systems.</strong> Running a highly
                implementation-dependent operation like
                "compile all methods" is not a good
                benchmark because the set of methods is totally
                different, and the bytecode compilers are
                implemented completely differently. (Also, the
                bytecode compiler is not a performance-critical
                routine in applications, so it has not been tuned
                at all in Strongtalk. When was the last time your
                users were twiddling their thumbs waiting for the
                bytecode compiler?)</li>
        </ul>
    </li>
</ul>
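<p>To make the advice above concrete, here is a rough sketch of
what such a benchmark method and its invocation might look like.
The method name (benchmarkMix) and its body are purely
illustrative, and the exact interface of the Test>benchmark:
method (class versus instance side, block versus selector
argument) should be checked in the Test class in the image; a
block argument is assumed below.</p>

<pre>
"An illustrative micro-benchmark method, added to class Test with
 a browser. It mixes message sends, SmallInteger arithmetic, Array
 access and block evaluation, the kind of ordinary Smalltalk code
 Strongtalk is designed to optimize. Scale the outer loop count so
 that one call takes at least about 100 milliseconds."
benchmarkMix
    | a sum |
    a := Array new: 100.
    1 to: 100 do: [:i | a at: i put: i].
    sum := 0.
    1000 timesRepeat:
        [a do: [:each | sum := sum + (each \\ 7)]].
    ^sum

"Then evaluate something like this in a workspace. A block
 argument to benchmark: is an assumption; check the Test class for
 the real interface."
Test benchmark: [Test new benchmarkMix]
</pre>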
<h3>How we did Benchmarking</h3>
<p>When we benchmarked the system ourselves, we assembled a large
suite of accepted OO benchmarks, such as Richards, DeltaBlue (a
constraint solver), the Stanford benchmarks, Slopstones and
Smopstones. These benchmarks are already in the image, if you
want to run them. Try evaluating "VMSuite runBenchmarks" and look
at the code it runs. If you want a real performance comparison,
run these on other VMs.</p>
<p>As an example, I put a couple of very small micro-benchmarks
that are run the right way in the system tour (the code is in the
Test class). You can try running them on other Smalltalks as a
comparison.</p>
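<p>For a comparison run on another Smalltalk, something along the
following lines works in most dialects; millisecondsToRun: and
timesRepeat: are standard protocol, and benchmarkMix is the
illustrative method from the sketch above rather than one of the
methods actually shipped in the Test class.</p>

<pre>
"Portable timing sketch for comparison runs on another Smalltalk.
 Runs the benchmark several times and reports the fastest run,
 which is also what the benchmark: method described above does."
| best |
best := nil.
5 timesRepeat:
    [| ms |
     ms := Time millisecondsToRun: [Test new benchmarkMix].
     best := best isNil ifTrue: [ms] ifFalse: [best min: ms]].
Transcript show: 'fastest run: ', best printString, ' ms'; cr
</pre>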
<h3>Other benchmarking problems people have been having</h3>
<ul>
    <li>Several people have complained that their benchmark that
        runs "5000 factorial" in a loop crashes. If you
        read the troubleshooting section, you will see that the
        error message you are getting indicates that you are
        running out of virtual memory, which explains the crash.
        This is happening because the full garbage collector does
        not run automatically in Strongtalk right now (the
        generation scavenger of course runs fine). Obviously it
        would be nice if it ran automatically, but if you are
        allocating vast amounts of memory (which 5000 factorial
        does), please run "VM collectGarbage"
        occasionally; a sketch follows this list. And as we have
        already pointed out, factorial is a very bad
        (unrepresentative) benchmark on any system. <p>The moral
        of the story: if you have a crash, read the
        troubleshooting section.</p>
    </li>
132 <li>"Compile all methods
" crashes. Yes, it is a
133 known problem that is one method in the image that
134 crashes the bytecode compiler when it is run this way,
135 even in interpreted mode. Use some other benchmark (this
136 isn't a good benchmark anyway, as pointed out above).
</li>
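<p>For the factorial problem above, a minimal sketch of the
workaround looks like this; the loop bounds are arbitrary, and VM
collectGarbage is the expression mentioned in that item.</p>

<pre>
"Sketch of an allocation-heavy loop (5000 factorial builds huge
 LargeIntegers) with an explicit full collection requested every
 few iterations so the system does not run out of virtual memory.
 The counts are arbitrary illustrations."
1 to: 100 do: [:i |
    5000 factorial.
    i \\ 10 = 0 ifTrue: [VM collectGarbage]]
</pre>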