security/nss/lib/freebl/ecl/README.FP

   1 ***** BEGIN LICENSE BLOCK *****
   2 Version: MPL 1.1/GPL 2.0/LGPL 2.1
   3
   4 The contents of this file are subject to the Mozilla Public License Version
   5 1.1 (the "License"); you may not use this file except in compliance with
   6 the License. You may obtain a copy of the License at
   7 http://www.mozilla.org/MPL/
   8
   9 Software distributed under the License is distributed on an "AS IS" basis,
  10 WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
  11 for the specific language governing rights and limitations under the
  12 License.
  13
  14 The Original Code is the elliptic curve math library.
  15
  16 The Initial Developer of the Original Code is Sun Microsystems, Inc.
  17 Portions created by Sun Microsystems, Inc. are Copyright (C) 2003
  18 Sun Microsystems, Inc. All Rights Reserved.
  19
  20 Contributor(s):
  21      Stephen Fung <fungstep@hotmail.com> and
  22      Nils Gura <nils.gura@sun.com>, Sun Microsystems Laboratories
  23
  24 Alternatively, the contents of this file may be used under the terms of
  25 either the GNU General Public License Version 2 or later (the "GPL"), or
  26 the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
  27 in which case the provisions of the GPL or the LGPL are applicable instead
  28 of those above. If you wish to allow use of your version of this file only
  29 under the terms of either the GPL or the LGPL, and not to allow others to
  30 use your version of this file under the terms of the MPL, indicate your
  31 decision by deleting the provisions above and replace them with the notice
  32 and other provisions required by the GPL or the LGPL. If you do not delete
  33 the provisions above, a recipient may use your version of this file under
  34 the terms of any one of the MPL, the GPL or the LGPL.
  35
  36 ***** END LICENSE BLOCK *****
  37
  38 The ECL exposes routines for constructing and converting curve
  39 parameters for internal use.
  40
  41 The floating point code of the ECL provides algorithms for performing
  42 elliptic-curve point multiplications in floating point.
  43
  44 The point multiplication algorithms perform calculations almost
  45 exclusively in floating point for efficiency, but have the same
  46 (integer) interface as the ECL for compatibility and to be easily
  47 wired-in to the ECL.  Please see README file (not this README.FP file)
  48 for information on wiring-in.
  49
  50 This has been implemented for 3 curves as specified in [1]:
  51         secp160r1
  52         secp192r1
  53         secp224r1
  54
  55 RATIONALE
  56 =========
  57 Calculations are done in the  floating-point unit (FPU) since it
  58 gives better performance on the UltraSPARC III chips.  This is
  59 because the FPU allows for faster multiplication than the integer unit.
  60 The integer unit has a longer multiplication instruction latency, and
  61 does not allow full pipelining, as described in [2].
  62 Since performance is an important selling feature of Elliptic Curve
  63 Cryptography (ECC), this implementation was created.
  64
  65 DATA REPRESENTATION
  66 ===================
  67 Data is primarily represented in an array of double-precision floating
  68 point numbers.  Generally, each array element has 24 bits of precision
  69 (i.e. be x * 2^y, where x is an integer of at most 24 bits, y some positive
  70 integer), although the actual implementation details are more complicated.
  71
  72 e.g. a way to store an 80 bit number might be:
  73 double p[4] = { 632613 * 2^0, 329841 * 2^24, 9961 * 2^48, 51 * 2^64 };
  74 See section ARITHMETIC OPERATIONS for more details.
  75
  76 This implementation assumes that the floating-point unit rounding mode
  77 is round-to-even as specified in IEEE 754
  78 (as opposed to chopping, rounding up, or rounding down).
  79 When subtracting integers represented as arrays of floating point
  80 numbers, some coefficients (array elements) may become negative.
  81 This effectively gives an extra bit of precision that is important
  82 for correctness in some cases.
  83
  84 The described number presentation limits the size of integers to 1023 bits.
  85 This is due to an upper bound of 1024 for the exponent of a double precision
  86 floating point number as specified in IEEE-754.
  87 However, this is acceptable for ECC key sizes of the foreseeable future.
  88
  89 DATA STRUCTURES
  90 ===============
  91 For more information on coordinate representations, see [3].
  92
  93 ecfp_aff_pt
  94 -----------
  95 Affine EC Point Representation.  This is the basic
  96 representation (x, y) of an elliptic curve point.
  97
  98 ecfp_jac_pt
  99 -----------
 100 Jacobian EC Point.  This stores a point as (X, Y, Z), where
 101 the affine point corresponds to (X/Z^2, Y/Z^3).  This allows
 102 for fewer inversions in calculations.
 103
 104 ecfp_chud_pt
 105 ------------
 106 Chudnovsky Jacobian Point.  This representation stores a point
 107 as (X, Y, Z, Z^2, Z^3), the same as a Jacobian representation
 108 but also storing Z^2 and Z^3 for faster point additions.
 109
 110 ecfp_jm_pt
 111 ----------
 112 Modified Jacobian Point.  This representation stores a point
 113 as (X, Y, Z, a*Z^4), the same as Jacobian representation but
 114 also storing a*Z^4 for faster point doublings.  Here "a" represents
 115 the linear coefficient of x defining the curve.
 116
 117 EC_group_fp
 118 -----------
 119 Stores information on the elliptic curve group for floating
 120 point calculations.  Contains curve specific information, as
 121 well as function pointers to routines, allowing different
 122 optimizations to be easily wired in.
 123 This should be made accessible from an ECGroup for the floating
 124 point implementations of point multiplication.
 125
 126 POINT MULTIPLICATION ALGORITHMS
 127 ===============================
 128 Elliptic Curve Point multiplication can be done at a higher level orthogonal
 129 to the implementation of point additions and point doublings.  There
 130 are a variety of algorithms that can be used.
 131
 132 The following algorithms have been implemented:
 133
 134 4-bit Window (Jacobian Coordinates)
 135 Double & Add (Jacobian & Affine Coordinates)
 136 5-bit Non-Adjacent Form (Modified Jacobian & Chudnovsky Jacobian)
 137
 138 Currently, the fastest algorithm for multiplying a generic point
 139 is the 5-bit Non-Adjacent Form.
 140
 141 See comments in ecp_fp.c for more details and references.
 142
 143 SOURCE / HEADER FILES
 144 =====================
 145
 146 ecp_fp.c
 147 --------
 148 Main source file for floating point calculations.  Contains routines
 149 to convert from floating-point to integer (mp_int format), point
 150 multiplication algorithms, and several other routines.
 151
 152 ecp_fp.h
 153 --------
 154 Main header file.  Contains most constants used and function prototypes.
 155
 156 ecp_fp[160, 192, 224].c
 157 -----------------------
 158 Source files for specific curves.  Contains curve specific code such
 159 as specialized reduction based on the field defining prime.  Contains
 160 code wiring-in different algorithms and optimizations.
 161
 162 ecp_fpinc.c
 163 -----------
 164 Source file that is included by ecp_fp[160, 192, 224].c.  This generates
 165 functions with different preprocessor-defined names and loop iterations,
 166 allowing for static linking and strong compiler optimizations without
 167 code duplication.
 168
 169 TESTING
 170 =======
 171 The test suite can be found in ecl/tests/ecp_fpt.  This tests and gets
 172 timings of the different algorithms for the curves implemented.
 173
 174 ARITHMETIC OPERATIONS
 175 ---------------------
 176 The primary operations in ECC over the prime fields are modular arithmetic:
 177 i.e. n * m (mod p) and n + m (mod p).  In this implementation, multiplication,
 178 addition, and reduction are implemented as separate functions.  This
 179 enables computation of formulae with fewer reductions, e.g.
 180 (a * b) + (c * d) (mod p) rather than:
 181 ((a * b) (mod p)) + ((c * d) (mod p)) (mod p)
 182 This takes advantage of the fact that the double precision mantissa in
 183 floating point can hold numbers up to 2^53, i.e. it has some leeway to
 184 store larger intermediate numbers.  See further detail in the section on
 185 FLOATING POINT PRECISION.
 186
 187 Multiplication
 188 --------------
 189 Multiplication is implemented in a standard polynomial multiplication
 190 fashion.  The terms in opposite factors are pairwise multiplied and
 191 added together appropriately.  Note that the result requires twice
 192 as many doubles for storage, as the bit size of the product is twice
 193 that of the multiplicands.
 194 e.g. suppose we have double n[3], m[3], r[6], and want to calculate r = n * m
 195 r[0] = n[0] * m[0]
 196 r[1] = n[0] * m[1] + n[1] * m[0]
 197 r[2] = n[0] * m[2] + n[1] * m[1] + n[2] * m[0]
 198 r[3] = n[1] * m[2] + n[2] * m[1]
 199 r[4] = n[2] * m[2]
 200 r[5] = 0 (This is used later to hold spillover from r[4], see tidying in
 201 the reduction section.)
 202
 203 Addition
 204 --------
 205 Addition is done term by term.  The only caveat is to be careful with
 206 the number of terms that need to be added.  When adding results of
 207 multiplication (before reduction), twice as many terms need to be added
 208 together.  This is done in the addLong function.
 209 e.g. for double n[4], m[4], r[4]:  r = n + m
 210 r[0] = n[0] + m[0]
 211 r[1] = n[1] + m[1]
 212 r[2] = n[2] + m[2]
 213 r[3] = n[3] + m[3]
 214
 215 Modular Reduction
 216 -----------------
 217 For the curves implemented, reduction is possible by fast reduction
 218 for Generalized Mersenne Primes, as described in [4].  For the
 219 floating point implementation, a significant step of the reduction
 220 process is tidying:  that is, the propagation of carry bits from
 221 low-order to high-order coefficients to reduce the precision of each
 222 coefficient to 24 bits.
 223 This is done by adding and then subtracting
 224 ecfp_alpha, a large floating point number that induces precision roundoff.
 225 See [5] for more details on tidying using floating point arithmetic.
 226 e.g. suppose we have r = 961838 * 2^24 + 519308
 227 then if we set alpha = 3 * 2^51 * 2^24,
 228 FP(FP(r + alpha) - alpha) = 961838 * 2^24, because the precision for
 229 the intermediate results is limited.  Our values of alpha are chosen
 230 to truncate to a desired number of bits.
 231
 232 The reduction is then performed as in [4], adding multiples of prime p.
 233 e.g. suppose we are working over a polynomial of 10^2.  Take the number
 234 2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95, stored in 5 elements
 235 for coefficients of 10^0, 10^2, ..., 10^8.
 236 We wish to reduce modulo p = 10^6 - 2 * 10^4 + 1
 237 We can subtract off from the higher terms
 238 (2 * 10^8 + 11 * 10^6 + 53 * 10^4 + 23 * 10^2 + 95) - (2 * 10^2) * (10^6 - 2 * 10^4 + 1)
 239 = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95
 240 = 15 * 10^6 + 53 * 10^4 + 21 * 10^2 + 95 - (15) * (10^6 - 2 * 10^4 + 1)
 241 = 83 * 10^4 + 21 * 10^2 + 80
 242
 243 Integrated Example
 244 ------------------
 245 This example shows how multiplication, addition, tidying, and reduction
 246 work together in our modular arithmetic.  This is simplified from the
 247 actual implementation, but should convey the main concepts.
 248 Working over polynomials of 10^2 and with p as in the prior example,
 249 Let a = 16 * 10^4 + 53 * 10^2 + 33
 250 let b = 81 * 10^4 + 31 * 10^2 + 49
 251 let c = 22 * 10^4 +  0 * 10^2 + 95
 252 And suppose we want to compute a * b + c mod p.
 253 We first do a multiplication:  then a * b =
 254 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5100 * 10^4 + 3620 * 10^2 + 1617
 255 Then we add in c before doing reduction, allowing us to get a * b + c =
 256 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
 257 We then perform a tidying on the upper half of the terms:
 258 0 * 10^10 + 1296 * 10^8 + 4789 * 10^6
 259 0 * 10^10 + (1296 + 47) * 10^8 + 89 * 10^6
 260 0 * 10^10 + 1343 * 10^8 + 89 * 10^6
 261 13 * 10^10 + 43 * 10^8 + 89 * 10^6
 262 which then gives us
 263 13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
 264 we then reduce modulo p similar to the reduction example above:
 265 13 * 10^10 + 43 * 10^8 + 89 * 10^6 + 5122 * 10^4 + 3620 * 10^2 + 1712
 266         - (13 * 10^4 * p)
 267 69 * 10^8 + 89 * 10^6 + 5109 * 10^4 + 3620 * 10^2 + 1712
 268         - (69 * 10^2 * p)
 269 227 * 10^6 + 5109 * 10^4 + 3551 * 10^2 + 1712
 270         - (227 * p)
 271 5563 * 10^4 + 3551 * 10^2 + 1485
 272 finally, we do tidying to get the precision of each term down to 2 digits
 273 5563 * 10^4 + 3565 * 10^2 + 85
 274 5598 * 10^4 + 65 * 10^2 + 85
 275 55 * 10^6 + 98 * 10^4 + 65 * 10^2 + 85
 276 and perform another reduction step
 277         - (55 * p)
 278 208 * 10^4 + 65 * 10^2 + 30
 279 There may be a small number of further reductions that could be done at
 280 this point, but this is typically done only at the end when converting
 281 from floating point  to an integer unit representation.
 282
 283 FLOATING POINT PRECISION
 284 ========================
 285 This section discusses the precision of floating point numbers, which
 286 one writing new formulae or a larger bit size should be aware of.  The
 287 danger is that an intermediate result may be required to store a
 288 mantissa larger than 53 bits, which would cause error by rounding off.
 289
 290 Note that the tidying with IEEE rounding mode set to round-to-even
 291 allows negative numbers, which actually reduces the size of the double
 292 mantissa to 23 bits - since it rounds the mantissa to the nearest number
 293 modulo 2^24, i.e. roughly between -2^23 and 2^23.
 294 A multiplication increases the bit size to 2^46 * n, where n is the number
 295 of doubles to store a number.  For the 224 bit curve, n = 10.  This gives
 296 doubles of size 5 * 2^47.  Adding two of these doubles gives a result
 297 of size 5 * 2^48, which is less than 2^53, so this is safe.
 298 Similar analysis can be done for other formulae to ensure numbers remain
 299 below 2^53.
 300
 301 Extended-Precision Floating Point
 302 ---------------------------------
 303 Some platforms, notably x86 Linux, may use an extended-precision floating
 304 point representation that has a 64-bit mantissa.  [6]  Although this
 305 implementation is optimized for the IEEE standard 53-bit mantissa,
 306 it should work with the 64-bit mantissa.  A check is done at run-time
 307 in the function ec_set_fp_precision that detects if the precision is
 308 greater than 53 bits, and runs code for the 64-bit mantissa accordingly.
 309
 310 REFERENCES
 311 ==========
 312 [1] Certicom Corp., "SEC 2: Recommended Elliptic Curve Domain Parameters", Sept. 20, 2000.  www.secg.org
 313 [2] Sun Microsystems Inc.  UltraSPARC III Cu User's Manual, Version 1.0, May 2002, Table 4.4
 314 [3] H. Cohen, A. Miyaji, and T. Ono, "Efficient Elliptic Curve Exponentiation Using Mixed Coordinates".
 315 [4] Henk C.A. van Tilborg, Generalized Mersenne Prime.  http://www.win.tue.nl/~henkvt/GenMersenne.pdf
 316 [5] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2.
 317 [6] Daniel J. Bernstein, Floating-Point Arithmetic and Message Authentication, Journal of Cryptology, March 2000, Section 2 Notes.