Understanding Variations in Dhrystone Performance

By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen

This article has appeared in:

Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17

Microprocessor manufacturers tend to credit all the performance measured by
benchmarks to the speed of their processors; often they don't even mention the
programming language and compiler used. Their detailed documents, usually
called "performance brief" or "performance report," generally do give more
details. However, these details are often lost in the press releases and other
marketing statements. For serious performance evaluation, it is necessary to
study the code generated by the various compilers.

Dhrystone was originally published in Ada (Communications of the ACM, Oct.
1984). However, since good Ada compilers were rare at that time and, together
with UNIX, C became more and more popular, the C version of Dhrystone is the
one now mainly used in industry. There are "official" versions 2.1 for Ada,
Pascal, and C, which are as close together as the languages' semantic
differences permit.

Dhrystone contains two statements where the programming language and its
translation play a major part in the execution time measured by the benchmark:

  o String assignment (in procedure Proc_0 / main)
  o String comparison (in function Func_2)

In Ada and Pascal, strings are arrays of characters where the length of the
string is part of the type information known at compile time. In C, strings
are also arrays of characters, but there are no operators defined in the
language for assignment and comparison of strings. Instead, functions
"strcpy" and "strcmp" are used. These functions are defined for strings of
arbitrary length, and make use of the fact that strings in C have to end with
a terminating null byte. For general-purpose calls to these functions, the
implementor can assume nothing about the length and the alignment of the
strings.
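
The general-purpose case described above can be sketched in portable C. This
is only an illustration: the names "my_strcpy" and "my_strcmp" are
hypothetical (chosen so they do not clash with the library), and real library
routines are usually far more elaborate. Because nothing is known about length
or alignment, the code has to work one byte at a time and test every byte for
the terminating null.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a general-purpose strcpy: no knowledge of
   length or alignment, so it copies byte by byte. */
char *my_strcpy(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    /* Copy every byte, including the terminating null byte. */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Hypothetical stand-in for a general-purpose strcmp. */
int my_strcmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    /* Advance while the bytes match and the end has not been reached. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* The C library compares the differing bytes as unsigned char. */
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

Each loop iteration performs one byte load (or two), one test for null, and
one branch, which is the per-character cost that the optimizations discussed
later try to reduce.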

The C version of Dhrystone spends a relatively large amount of time in these
two functions. Some time ago, I made measurements on a VAX 11/785 with the
Berkeley UNIX (4.2) compilers (often-used compilers, but certainly not the
most advanced). In the C version, 23% of the time was spent in the string
functions; in the Pascal version, only 10%. On good RISC machines (where less
time is spent in the procedure calling sequence than on a VAX) and with better
optimizing compilers, the percentage is higher; MIPS has reported 34% for an
R3000. Because of this effect, Pascal and Ada Dhrystone results are usually
better than C results (except when the optimization quality of the C compiler
is considerably better than that of the other compilers).
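
To get a feeling for how strongly these percentages can move a Dhrystone
result, the following sketch applies the usual "fraction of time" arithmetic.
The function name "overall_speedup" and the speedup factor of 2 used in the
note below are assumptions made for illustration only; they are not
measurements from any of the systems mentioned.

```c
#include <assert.h>

/* fraction: share of total run time spent in the string functions (0..1)
   factor:   hypothetical speedup of the string functions alone (> 0)
   returns:  resulting speedup of the whole benchmark run */
double overall_speedup(double fraction, double factor)
{
    assert(fraction >= 0.0 && fraction <= 1.0 && factor > 0.0);
    /* Time outside the string functions is unchanged; time inside
       them shrinks by the given factor. */
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}
```

With the 34% share reported for the R3000, doubling the speed of the string
functions alone would give overall_speedup(0.34, 2.0), about 1.20; with the
23% share measured on the VAX, the same change would give about 1.13.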

Several people have noted that the string operations are over-represented in
Dhrystone, mainly because the strings occurring in Dhrystone are longer than
average strings. I admit that this is true, and have said so in my SIGPLAN
Notices paper (Aug. 1988); however, I didn't want to generate confusion by
changing the string lengths from version 1 to version 2.

Even if they are somewhat over-represented in Dhrystone, string operations are
frequent enough that it makes sense to implement them in the most efficient
way possible, not only for benchmarking purposes. This means that they can
and should be written in assembly language code. ANSI C also explicitly allows
the string functions to be implemented as macros, i.e. by inline code.

There is also a third way to speed up the "strcpy" statement in Dhrystone: For
this particular "strcpy" statement, the source of the assignment is a string
constant. Therefore, in contrast to calls to "strcpy" in the general case, the
compiler knows the length and alignment of the strings involved at compile
time and can generate code in the same efficient way as a Pascal compiler
(word instructions instead of byte instructions).
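
The difference between the general case and the string-constant case can be
illustrated at the source level. In the sketch below, the literal, the type
name "Str_30", and both function names are illustrative assumptions, not a
quotation of the benchmark source. The point is only that "sizeof" applied to
a string literal is a compile-time constant, so the copy becomes a
fixed-length block move that a compiler is free to expand into word
instructions.

```c
#include <assert.h>
#include <string.h>

/* Illustrative 30-character literal (an assumption for this sketch). */
#define SAMPLE_LITERAL "BENCHMARK PROGRAM, SOME STRING"

/* Room for 30 characters plus the terminating null byte. */
typedef char Str_30[31];

/* General case: length and alignment of src are unknown to the callee. */
void copy_general(char *dst, const char *src)
{
    strcpy(dst, src);
}

/* Constant case: the byte count is known at compile time. */
void copy_known_constant(Str_30 dst)
{
    /* sizeof on a literal includes the terminating null byte. */
    assert(sizeof SAMPLE_LITERAL <= sizeof(Str_30));
    memcpy(dst, SAMPLE_LITERAL, sizeof SAMPLE_LITERAL);
}
```

Both functions leave the same bytes in the destination; they differ only in
how much the compiler knows when it generates the code.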

This is not allowed in the case of the "strcmp" call: Here, the addresses are
formal procedure parameters, and no assumptions can be made about the length
or alignment of the strings. Any such assumptions would indicate an incorrect
implementation. They might work for Dhrystone, where the strings are in fact
word-aligned with typical compilers, but other programs would deliver
incorrect results.

So, for an apples-to-apples comparison between processors, and not between
several possible (legal or illegal) degrees of compiler optimization, one
should check that the systems are comparable with respect to the following
three points:

(1) String functions in assembly language vs. in C

    Frequently used functions such as the string functions can and should be
    written in assembly language, and all serious C language systems known
    to me do this. (I list this point for completeness only.) Note that
    processors with an instruction that checks a word for a null byte (such
    as AMD's 29000 and Intel's 80960) have an advantage here. (This
    advantage becomes relatively smaller if optimization (3) is applied.)
    Because of the length of the strings involved in Dhrystone, this
    advantage may appear disproportionately large, but it is certainly legal
    to use such instructions - after all, these situations are what they
    were designed for.

(2) String function code inline vs. as library functions

    ANSI C has created a new situation, compared with the older
    Kernighan/Ritchie C. In the original C, the definition of the string
    functions was not part of the language. Now it is, and inlining is
    explicitly allowed. I probably should have stated more clearly in my
    SIGPLAN Notices paper that the rule "No procedure inlining for
    Dhrystone" referred to the user level procedures only and not to the
    library routines.

(3) Fixed-length and alignment assumptions for the strings

    Compilers should be allowed to optimize in these cases if (and only if)
    it is safe to do so. For Dhrystone, this is the "strcpy" statement, but
    not the "strcmp" statement (unless, of course, the "strcmp" code
    explicitly checks the alignment at execution time and branches
    accordingly). A "Dhrystone switch" for the compiler that causes the
    generation of code that may not work under certain circumstances is
    certainly inappropriate for comparisons. It has been reported on Usenet
    that some C compilers provide such a compiler option; since I don't have
    access to all C compilers involved, I cannot verify this.

    If the fixed-length and word-alignment assumption can be used, a wide
    bus that permits fast multi-word load instructions certainly does help;
    however, this fact by itself should not make a really big difference.
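
Points (1) and (3) can be made concrete with a sketch of a "strcmp" that
checks the alignment at execution time and branches accordingly. Everything
here is an assumption made for illustration: the names "has_zero_byte" and
"strcmp_checked", the 4-byte word size, and the bit trick, which is a software
analogue of a "check a word for a null byte" instruction. The word-wise path
may read up to three bytes past the terminating null byte. On typical hardware
this is harmless because an aligned word never crosses a page boundary, but
portable C does not guarantee it, which is one reason such routines are
normally written in assembly language. The sketch therefore assumes buffers
whose size is a multiple of four bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if any of the four bytes of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Hypothetical strcmp that uses word compares only when it is safe
   with respect to alignment, and byte compares otherwise. */
int strcmp_checked(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* Run-time alignment check: take the word path only if both
       addresses are multiples of 4. */
    if ((((uintptr_t)a | (uintptr_t)b) & 3u) == 0) {
        for (;;) {
            uint32_t wa, wb;

            /* Assumption: buffers are padded to a multiple of 4 bytes,
               so these loads stay inside the objects. */
            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;          /* difference or end is in this word */
            a += 4;
            b += 4;
        }
    }

    /* Byte path: resolves the final word, or handles the whole string
       when the operands are not word-aligned. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

A routine of this shape gives correct results for any operands, aligned or
not. A variant that skipped the alignment check and always took the word path
would be the kind of unsafe assumption that point (3) rules out.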

A check of these points - something that is necessary for a thorough
evaluation and comparison of the Dhrystone performance claims - requires
object code listings as well as listings for the string functions (strcpy,
strcmp) that are possibly called by the program.

I don't pretend that Dhrystone is a perfect tool to measure the integer
performance of microprocessors. The more it is used and discussed, the more I
myself learn about aspects that I hadn't noticed yet when I wrote the program.
And of course, the very success of a benchmark program is a danger in that
people may tune their compilers and/or hardware to it, and with this action
make it less meaningful.

Whetstone and Linpack have their critical points also: The Whetstone rating
depends heavily on the speed of the mathematical functions (sine, sqrt, ...),
and Linpack is sensitive to data alignment for some cache configurations.

Introduction of a standard set of public domain benchmark software (something
the SPEC effort attempts) is certainly a worthwhile thing. In the meantime,
people will continue to use whatever is available and widely distributed, and
Dhrystone ratings are probably still better than MIPS ratings if these are -
as is often the case in industry - based on no reproducible derivation.
However, any serious performance evaluation requires more than just a
comparison of raw numbers; one has to make sure that the numbers have been
obtained in a comparable way.