 +---------------------------------------------------------------------------+
 |  wm-FPU-emu   an FPU emulator for 80386 and 80486SX microprocessors.      |
 |                                                                           |
 | Copyright (C) 1992,1993,1994                                              |
 |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
 |                       Australia.  E-mail   billm@vaxc.cc.monash.edu.au    |
 |                                                                           |
 |    This program is free software; you can redistribute it and/or modify   |
 |    it under the terms of the GNU General Public License version 2 as      |
 |    published by the Free Software Foundation.                             |
 |                                                                           |
 |    This program is distributed in the hope that it will be useful,        |
 |    but WITHOUT ANY WARRANTY; without even the implied warranty of         |
 |    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the          |
 |    GNU General Public License for more details.                           |
 |                                                                           |
 |    You should have received a copy of the GNU General Public License      |
 |    along with this program; if not, write to the Free Software            |
 |    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.              |
 |                                                                           |
 +---------------------------------------------------------------------------+


wm-FPU-emu is an FPU emulator for Linux. It is derived from wm-emu387,
which is my 80387 emulator for djgpp (gcc under MS-DOS); wm-emu387 was
in turn based upon emu387, which was written by DJ Delorie for djgpp.
The interface to the Linux kernel is based upon the original Linux
math emulator by Linus Torvalds.

My target FPU for wm-FPU-emu is that described in the Intel486
Programmer's Reference Manual (1992 edition). Unfortunately, numerous
facets of the functioning of the FPU are not well covered in the
Reference Manual. The information in the manual has been supplemented
with measurements on real 80486s. It is simply not possible to be sure
that all of the peculiarities of the 80486 have been discovered, so
there are always likely to be obscure differences in the detailed
behaviour of the emulator and a real 80486.

wm-FPU-emu does not implement all of the behaviour of the 80486 FPU.
See "Limitations" later in this file for a list of some differences.

Please report bugs, etc., to me at:
       billm@vaxc.cc.monash.edu.au
  or at:
       billm@jacobi.maths.monash.edu.au


--Bill Metzenthen
  Jan 1994


----------------------- Internals of wm-FPU-emu -----------------------

Numeric algorithms:
(1) Add, subtract, and multiply. Nothing remarkable in these.
(2) Divide has been tuned to get reasonable performance. The algorithm
    is not the obvious one which most people seem to use, but is designed
    to take advantage of the characteristics of the 80386. I expect that
    it has been invented many times before I discovered it, but I have not
    seen it. It is based upon one of those ideas which one carries around
    for years without ever bothering to check it out.
(3) The sqrt function has been tuned to get good performance. It is based
    upon Newton's classic method. Performance was improved by capitalizing
    upon the properties of Newton's method, and the code is once again
    structured to take account of the characteristics of the 80386.
(4) The trig, log, and exp functions are each based upon quasi-"optimal"
    polynomial approximations. My definition of "optimal" was based upon
    getting good accuracy with reasonable speed.
(5) The argument reduction code for the trig functions effectively uses
    a value of pi which is accurate to more than 128 bits. As a consequence,
    the reduced argument is accurate to more than 64 bits for arguments up
    to a few pi, and for most arguments even when they approach 2^63.
    This is far superior to an 80486, which uses a value of pi which is
    accurate to only 66 bits.
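The Newton iteration mentioned in (3) can be sketched in C as follows. This is not the emulator's code: it works on a `double` rather than on the emulator's internal 64-bit significands, and the function name `newton_sqrt`, the starting guess (1+m)/2 and the fixed count of five iterations are assumptions made for illustration. Because the relative error roughly squares at each step, even a crude starting guess converges in a handful of iterations.

```c
#include <math.h>

/* Hypothetical sketch: sqrt(x) by Newton's method on a double.
   Each step y <- (y + m/y) / 2 roughly squares the relative error. */
double newton_sqrt(double x)
{
    if (isnan(x) || x == 0.0)
        return x;                /* NaN propagates; sqrt(+-0) = +-0 */
    if (x < 0.0)
        return NAN;              /* invalid operand */
    if (isinf(x))
        return x;                /* sqrt(+inf) = +inf */

    int e;
    double m = frexp(x, &e);     /* x = m * 2^e with 0.5 <= m < 1 */
    if (e % 2 != 0) {            /* make the exponent even */
        m *= 2.0;                /* now 1 <= m < 2 */
        e -= 1;
    }

    /* (1 + m) / 2 >= sqrt(m) and is within about 6% of it for
       0.5 <= m < 2, so the iterates decrease towards sqrt(m). */
    double y = 0.5 * (1.0 + m);
    for (int i = 0; i < 5; i++)
        y = 0.5 * (y + m / y);

    return ldexp(y, e / 2);      /* sqrt(x) = sqrt(m) * 2^(e/2) */
}
```

Splitting off an even power of two first keeps the iteration on a narrow interval, which is what makes a fixed, small iteration count sufficient.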

The code of the emulator is complicated slightly by the need to
account for a limited form of re-entrancy. Normally, the emulator will
emulate each FPU instruction to completion without interruption.
However, when the emulator accesses the user memory space, swapping may
be needed. In this case the emulator may be temporarily suspended while
disk I/O takes place. During this time another process may use the
emulator, thereby changing some static variables (e.g. FPU_st0_ptr).
The code which accesses user memory is confined to five files:
    fpu_entry.c
    reg_ld_str.c
    load_store.c
    get_address.c
    errors.c
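The hazard described above can be illustrated with a small C sketch. Everything in it is hypothetical: `struct fpu_state`, `current_fpu`, `st0_ptr`, `set_st0()` and the `fetch_user` callback merely stand in for the emulator's static variables (such as FPU_st0_ptr) and for a user-memory access that may sleep. The point is only that any static pointer must be re-established after a call that may let another process run the emulator.

```c
/* Hypothetical model of per-process FPU state. */
struct fpu_state {
    double regs[8];
    int top;
};

/* Static "shortcut" variables, playing the role of FPU_st0_ptr. */
static struct fpu_state *current_fpu;
static double *st0_ptr;

static void set_st0(struct fpu_state *fpu)
{
    current_fpu = fpu;
    st0_ptr = &fpu->regs[fpu->top & 7];
}

/* Emulate "ST(0) += operand from user memory". fetch_user() may block
   (e.g. while a page is swapped in), and another process may run the
   emulator meanwhile, overwriting current_fpu and st0_ptr. */
static void emulate_fadd_mem(struct fpu_state *me,
                             double (*fetch_user)(void))
{
    set_st0(me);
    double operand = fetch_user();  /* may sleep */
    set_st0(me);                    /* statics may be stale: re-establish */
    *st0_ptr += operand;
}

/* Simulation: another process uses the emulator while we sleep. */
static struct fpu_state other;

static double fetch_user_that_sleeps(void)
{
    set_st0(&other);                /* the other process moves the statics */
    return 1.5;
}
```

Without the second `set_st0(me)`, the addition in this sketch would land in the other process's register instead of the caller's.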

----------------------- Limitations of wm-FPU-emu -----------------------

There are a number of differences between the current wm-FPU-emu
(version beta 1.5) and the 80486 FPU (apart from bugs). Some of the
more important differences are listed below:

Segment overrides don't do anything yet.

All internal computations are performed at 64-bit or higher precision,
and the results are rounded as required by the PC (precision control)
bits of the FPU control word. Under the crt0 version for Linux current
as of June 1993, the FPU PC bits specify 64-bit precision.
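The PC field occupies bits 8 and 9 of the FPU control word (00 = 24 bits, 10 = 53 bits, 11 = 64 bits; 01 is reserved). A small C helper that decodes it is sketched below; the name `pc_precision_bits` is made up for this example. The value 0x037F used in the usage note is the control word loaded by FNINIT, whose PC field selects 64 bits.

```c
#include <stdint.h>

/* Decode the precision-control (PC) field, bits 8-9 of the x87
   control word, into a significand width in bits. */
int pc_precision_bits(uint16_t control_word)
{
    switch ((control_word >> 8) & 3) {
    case 0:  return 24;   /* single precision */
    case 2:  return 53;   /* double precision */
    case 3:  return 64;   /* extended precision */
    default: return -1;   /* 01 is reserved */
    }
}
```

For example, `pc_precision_bits(0x037F)` returns 64.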

The precision flag (PE of the FPU status word) and the Roundup flag
(C1 of the status word) are now implemented. Does anyone write code
which uses these features? The Roundup flag does not have much meaning
for the transcendental functions, and its 80486 value with these
functions is likely to differ from its emulator value.
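To make the two flags concrete, the following C sketch rounds a 64-bit significand (explicit integer bit in bit 63) to the width selected by PC, using round-to-nearest-even only, and reports what would become PE (some bits were discarded) and C1 (the magnitude was rounded up). It is an illustration under those simplifying assumptions, not the emulator's rounding code: the emulator also honours the other rounding modes and keeps extra information beyond 64 bits. The names `struct rounded` and `round_to_pc` are hypothetical.

```c
#include <stdint.h>

struct rounded {
    uint64_t sig;     /* rounded significand, integer bit in bit 63 */
    int exp_adjust;   /* 1 if rounding carried out: caller bumps exponent */
    int inexact;      /* would set PE (precision) in the status word */
    int rounded_up;   /* would set C1 (Roundup) in the status word */
};

/* Round to `bits` significant bits, round-to-nearest-even. */
struct rounded round_to_pc(uint64_t sig, int bits)
{
    struct rounded r = { sig, 0, 0, 0 };
    if (bits >= 64)
        return r;

    uint64_t ulp = (uint64_t)1 << (64 - bits);
    uint64_t half = ulp >> 1;
    uint64_t rem = sig & (ulp - 1);   /* bits being discarded */

    r.sig = sig - rem;
    r.inexact = (rem != 0);
    if (rem > half || (rem == half && (r.sig & ulp) != 0)) {
        r.rounded_up = 1;
        r.sig += ulp;
        if (r.sig == 0) {             /* 1.11...1 rounded to 10.00...0 */
            r.sig = (uint64_t)1 << 63;
            r.exp_adjust = 1;
        }
    }
    return r;
}
```

For example, `round_to_pc(0x8000000000000C00, 53)` rounds up to 0x8000000000001000 and reports both PE and C1, while 0x8000000000000400 is a tie that rounds down to the even value 0x8000000000000000, setting only PE.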

In a few rare cases the Underflow flag obtained with the emulator will
be different from that obtained with an 80486. This occurs when the
following conditions apply simultaneously:
(a) the operands have a higher precision than the current setting of the
    precision control (PC) flags.
(b) the underflow exception is masked.
(c) the magnitude of the exact result (before rounding) is less than 2^-16382.
(d) the magnitude of the final result (after rounding) is exactly 2^-16382.
(e) the magnitude of the exact result would be exactly 2^-16382 if the
    operands were rounded to the current precision before the arithmetic
    operation was performed.
If all of these apply, the emulator will set the Underflow flag but a real
80486 will not.

NOTE: Certain formats of Extended Real are UNSUPPORTED. They are
unsupported by the 80486. They are the Pseudo-NaNs, Pseudo-infinities,
and Unnormals. None of these will be generated by an 80486 or by the
emulator. Do not use them. The emulator treats them differently in
detail from the way an 80486 does.
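The unsupported encodings can be recognised from the 15-bit biased exponent and the explicit integer bit (bit 63) of the 64-bit significand. The C sketch below is a hypothetical classifier written for this description (`classify_ext` and the enum names do not come from the emulator's source); it follows Intel's published encoding rules for the 80-bit format.

```c
#include <stdint.h>

/* Classes of an 80-bit extended real (hypothetical names). */
enum ext_class {
    EXT_ZERO, EXT_DENORMAL, EXT_PSEUDO_DENORMAL, EXT_NORMAL,
    EXT_UNNORMAL, EXT_INFINITY, EXT_NAN,
    EXT_PSEUDO_INFINITY, EXT_PSEUDO_NAN
};

/* sign_exp: sign in bit 15, biased exponent in bits 0-14.
   sig:      64-bit significand, explicit integer bit in bit 63. */
enum ext_class classify_ext(uint16_t sign_exp, uint64_t sig)
{
    unsigned bexp = sign_exp & 0x7FFFu;
    int int_bit = (int)(sig >> 63);
    uint64_t frac = sig & 0x7FFFFFFFFFFFFFFFull;

    if (bexp == 0) {
        if (sig == 0)
            return EXT_ZERO;
        return int_bit ? EXT_PSEUDO_DENORMAL : EXT_DENORMAL;
    }
    if (bexp == 0x7FFF) {
        if (!int_bit)   /* integer bit clear: unsupported encodings */
            return frac ? EXT_PSEUDO_NAN : EXT_PSEUDO_INFINITY;
        return frac ? EXT_NAN : EXT_INFINITY;
    }
    /* Integer bit clear with a non-zero, non-maximal exponent is an
       Unnormal (this also covers so-called pseudo-zeros). */
    return int_bit ? EXT_NORMAL : EXT_UNNORMAL;
}
```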

The emulator treats PseudoDenormals differently from an 80486. These
numbers are in fact properly normalised numbers with the exponent
offset by 1, and the emulator treats them as such. Unlike the 80486,
the emulator does not generate a Denormal Operand exception for these
numbers. The arithmetical results produced when using such a number as
an operand are the same for the emulator and a real 80486 (apart from
any slight precision difference for the transcendental functions).
Neither the emulator nor an 80486 produces one of these numbers as the
result of any arithmetic operation. An 80486 can keep one of these
numbers in an FPU register with its identity as a PseudoDenormal, but
the emulator will not; they are always converted to a valid number.
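A PseudoDenormal has a biased exponent field of 0 but its integer bit set. Because an exponent field of 0 denotes the same unbiased exponent as a field of 1 (namely -16382), such a number can be turned into an ordinary normal number of identical value just by writing 1 into the exponent field. A minimal C sketch of that conversion follows; the function names are hypothetical and it is not taken from the emulator.

```c
#include <stdint.h>

/* Unbiased exponent of an 80-bit extended real. An exponent field of
   0 (denormals and PseudoDenormals) means the same exponent as 1. */
int unbiased_exponent(uint16_t sign_exp)
{
    int e = sign_exp & 0x7FFF;
    return (e == 0 ? 1 : e) - 16383;
}

/* Rewrite a PseudoDenormal (exponent field 0, integer bit 1) as the
   equivalent normal number (exponent field 1). The sign and the
   significand are unchanged; any other value is returned as-is. */
uint16_t normalize_pseudo_denormal(uint16_t sign_exp, uint64_t sig)
{
    if ((sign_exp & 0x7FFF) == 0 && (sig >> 63) != 0)
        return (uint16_t)((sign_exp & 0x8000) | 1);
    return sign_exp;
}
```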

----------------------- Performance of wm-FPU-emu -----------------------

Speed.
-----

The speed of floating point computation with the emulator will depend
upon the instruction mix. Relative performance is best for the
instructions which require the most computation. The simple
instructions are adversely affected by the FPU instruction trap
overhead.


Timing: Some simple timing tests have been made on the emulator functions.
The times include load/store instructions. All times are in microseconds,
measured on a 33MHz 386 with 64k cache. The Turbo C tests were run under
MS-DOS; the next two columns are for emulators running with the djgpp
MS-DOS extender. The final column is for wm-FPU-emu in Linux 0.97,
using libm4.0 (hard).

function      Turbo C        djgpp 1.06        WM-emu387     wm-FPU-emu

   +          60.5           154.8              76.5          139.4
   -          61.1-65.5      157.3-160.8        76.2-79.5     142.9-144.7
   *          71.0           190.8              79.6          146.6
   /          61.2-75.0      261.4-266.9        75.3-91.6     142.2-158.1

 sin()        310.8          4692.0            319.0          398.5
 cos()        284.4          4855.2            308.0          388.7
 tan()        495.0          8807.1            394.9          504.7
 atan()       328.9          4866.4            601.1          419.5-491.9

 sqrt()       128.7          crashed           145.2          227.0
 log()        413.1-419.1    5103.4-5354.21    254.7-282.2    409.4-437.1
 exp()        479.1          6619.2            469.1          850.8


The performance under Linux is improved by the use of look-ahead code.
The following results show the improvement which is obtained under
Linux due to the look-ahead code. Also given are the times for the
original Linux emulator with the 4.1 'soft' lib.

 [ Linus' note: I changed look-ahead to be the default under Linux, as
   there was no reason not to use it after I had edited it to be
   disabled during tracing ]

            wm-FPU-emu with  original with
            look-ahead       'soft' lib
   +         106.4             190.2
   -         108.6-111.6      192.4-216.2
   *         113.4             193.1
   /         108.8-124.4      700.1-706.2

 sin()       390.5            2642.0
 cos()       381.5            2767.4
 tan()       496.5            3153.3
 atan()      367.2-435.5     2439.4-3396.8

 sqrt()      195.1            4732.5
 log()       358.0-387.5     3359.2-3390.3
 exp()       619.3            4046.4


These figures are now somewhat out of date. The emulator has become
progressively slower for most functions as more of the 80486 features
have been implemented.


----------------------- Accuracy of wm-FPU-emu -----------------------


Accuracy: The following table gives the accuracy of the sqrt(), trig,
exp and log functions. Each function was tested at about 400 points.
Ideal results would be 64 bits. The reduced accuracy of cos() and tan()
for arguments greater than pi/4 can be thought of as being due to the
limited precision of the argument x; e.g. an argument of pi/2-(1e-10),
which is accurate to 64 bits, can result in a relative accuracy in
cos() of only about 64 + log2(cos(x)) = 31 bits. Results for the
Turbo C emulator are given in the last column.


Function      Tested x range            Worst result                Turbo C
                                        (relative bits)

sqrt(x)       1 .. 2                    64.1                         63.2
atan(x)       1e-10 .. 200              62.6                         62.8
cos(x)        0 .. pi/2-(1e-10)         63.2 (x <= pi/4)             62.4
                                        35.2 (x = pi/2-(1e-10))      31.9
sin(x)        1e-10 .. pi/2             63.0                         62.8
tan(x)        1e-10 .. pi/2-(1e-10)     62.4 (x <= pi/4)             62.1
                                        35.2 (x = pi/2-(1e-10))      31.9
exp(x)        0 .. 1                    63.1                         62.9
log(x)        1+1e-6 .. 2               62.4                         62.1


As of version 1.3 of the emulator, the accuracy of the basic
arithmetic has been improved (by a small fraction of a bit). Care has
been taken to ensure full accuracy of the rounding of the basic
arithmetic functions (+, -, *, / and fsqrt), and they all now produce
results which are exact to the 64th bit (unless there are any bugs
left). To ensure this, it was necessary to effectively obtain
intermediate information to about 128 bits of precision. The emulator
now passes the "paranoia" tests (compiled with gcc 2.3.3) for 'float'
variables (24-bit precision numbers) when precision control is set to
24, 53 or 64 bits, and for 'double' variables (53-bit precision
numbers) when precision control is set to 53 bits (a properly
performing FPU cannot pass the 'paranoia' tests for 'double' variables
when precision control is set to 64 bits).
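The remark about 'double' variables with 64-bit precision control is a consequence of double rounding: a result first rounded to the 64-bit register format and then rounded again to 53 bits when stored can differ from a result rounded directly to 53 bits. The C sketch below shows the effect on integer significands. Because a 64-bit integer cannot also hold the "exact" result, it uses 60 bits instead of 64 as the intermediate width; the mechanism is the same. `round_nearest_even` and `example` are hypothetical names, not emulator code.

```c
#include <stdint.h>

/* Round a significand (integer bit in bit 63) to `bits` significant
   bits, round-to-nearest-even. Assumes the result does not carry out
   of bit 63, which holds for the value used below. */
uint64_t round_nearest_even(uint64_t sig, int bits)
{
    if (bits >= 64)
        return sig;
    uint64_t ulp = (uint64_t)1 << (64 - bits);
    uint64_t half = ulp >> 1;
    uint64_t rem = sig & (ulp - 1);
    uint64_t kept = sig - rem;
    if (rem > half || (rem == half && (kept & ulp) != 0))
        kept += ulp;
    return kept;
}

/* The low 11 bits (0x3F8) are just below half of a 53-bit ulp (0x400),
   but rounding to 60 bits first pushes them up to exactly half, and
   that tie then rounds to even (upwards here, because bit 11 is set). */
static const uint64_t example = 0x8000000000000BF8ull;
```

Rounding `example` directly to 53 bits gives 0x8000000000000800, while rounding it to 60 bits and then to 53 bits gives 0x8000000000001000, one 53-bit unit higher.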

For version 1.5, the accuracy of fprem and fprem1 has been improved.
These functions now produce exact results. The code for reducing the
argument for the trig functions (fsin, fcos, fptan and fsincos) has
been improved and now effectively uses a value for pi which is
accurate to more than 128 bits. As a consequence, the accuracy of
these functions for large arguments has been dramatically improved
(and is now very much better than that of an 80486 FPU). There is
also now no degradation of accuracy for fcos and fptan for operands
close to pi/2. Measured results are (note that the definition of
accuracy has changed slightly from that used for the above table):

Function      Tested x range          Worst result
                                     (absolute bits)

cos(x)        0 .. 9.22e+18              62.0
sin(x)        1e-16 .. 9.22e+18          62.1
tan(x)        1e-16 .. 9.22e+18          61.8

It is possible with some effort to find very large arguments which
give much degraded precision. For example, the integer
           8227740058411162616.0
is within about 1e-7 of a multiple of pi. To find the tan (for
example) of this number to 64 bits precision it would be necessary to
have a value of pi which had about 150 bits precision. The FPU
emulator computes the result to about 42.6 bits precision (the correct
result is about -9.739715e-8). On the other hand, an 80486 FPU returns
0.01059, which in relative terms is hopelessly inaccurate.

For arguments close to critical angles (which occur at multiples of
pi/2) the emulator is more accurate than an 80486 FPU. For very large
arguments, the emulator is far more accurate.

------------------------- Contributors -------------------------------

A number of people have contributed to the development of the
emulator, often by just reporting bugs, sometimes with suggested
fixes, and a few kind people have provided me with access in one way
or another to an 80486 machine. Contributors include (to those people
whom I may have forgotten, please forgive me):

Linus Torvalds
Tommy.Thorn@daimi.aau.dk
Andrew.Tridgell@anu.edu.au
Nick Holloway, alfie@dcs.warwick.ac.uk
Hermano Moura, moura@dcs.gla.ac.uk
Jon Jagger, J.Jagger@scp.ac.uk
Lennart Benschop
Brian Gallew, geek+@CMU.EDU
Thomas Staniszewski, ts3v+@andrew.cmu.edu
Martin Howell, mph@plasma.apana.org.au
M Saggaf, alsaggaf@athena.mit.edu
Peter Barker, PETER@socpsy.sci.fau.edu
tom@vlsivie.tuwien.ac.at
Dan Russel, russed@rpi.edu
Daniel Carosone, danielce@ee.mu.oz.au
cae@jpmorgan.com
Hamish Coleman, t933093@minyos.xx.rmit.oz.au
Bruce Evans, bde@kralizec.zeta.org.au
Timo Korvola, Timo.Korvola@hut.fi

...and numerous others who responded to my request for help with
a real 80486.