1 $Id: README.md5,v 13.1 2006/08/14 03:16:33 chongo Exp $
3 This is an implementation of the RSA Data Security, Inc.
4 MD5 Message-Digest Algorithm
6 The digests produced from strings (-s string), files or stdin are
7 identical to the RSA's original md5 program. The command line and
8 output interface are upward compatible as well. Users of the
9 original shs program may replace it with this version has their
10 existing use and digests will be preserved.
12 See md5drvr.c for version information. See the man page md5.1
15 Landon Curt Noll (chongo was here) /\../\
16 http://www.isthe.com/chongo/index.html
23 Network Working Group R. Rivest
24 INTERNET-DRAFT MIT Laboratory for Computer Science
26 RSA Data Security, Inc.
29 The MD5 Message-Digest Algorithm
33 This draft document will be submitted to the RFC editor as a protocol
34 specification. Comments should be sent to <pem-dev@tis.com> or to the
35 authors. Distribution of this memo is unlimited.
39 We would like to thank Don Coppersmith, Burt Kaliski, Ralph Merkle,
40 David Chaum, and Noam Nisan for numerous helpful comments and
45 1. Executive Summary 1
46 2. Terminology and Notation 2
47 3. MD5 Algorithm Description 3
49 5. Summary of Differences Between MD4 and MD5 7
50 6. Security Considerations 7
53 APPENDIX - Reference Implementation 9
57 This document describes the MD5 message-digest algorithm. The
58 algorithm takes as input an input message of arbitrary length and
59 produces as output a 128-bit "fingerprint" or "message digest" of the
60 input. It is conjectured that it is computationally infeasible to
61 produce two messages having the same message digest, or to produce
62 any message having a given prespecified target message digest. The
63 MD5 algorithm is intended for digital signature applications, where a
64 large file must be "compressed" in a secure manner before being
65 encrypted with a private (secret) key under a public-key cryptosystem
68 The MD5 algorithm is designed to be quite fast on 32-bit machines. In
69 addition, the MD5 algorithm does not require any large substitution
70 tables; the algorithm can be coded quite compactly.
72 The MD5 algorithm is an extension of the MD4 message digest algorithm
73 [1,2]. MD5 is slightly slower than MD4, but is more "conservative" in
74 design. MD5 was designed because it was felt that MD4 was perhaps
75 being adopted for use more quickly than justified by the existing
76 critical review; because MD4 was designed to be exceptionally fast,
77 it is "at the edge" in terms of risking successful cryptanalytic
78 attack. MD5 backs off a bit, giving up a little in speed for a much
79 greater likelihood of ultimate security. It incorporates some
80 suggestions made by various reviewers, and contains additional
83 The MD5 algorithm is being placed in the public domain for review and
84 possible adoption as a standard.
86 A version of this document including the C source code in the
87 appendix is available by FTP from RSA.COM in the file "pub/md5.doc".
89 This document may be referred to, unofficially, as Internet draft
92 For OSI-based applications, MD5's object identifier is
94 md5 OBJECT IDENTIFIER ::=
95 {iso(1) member-body(2) US(840) rsadsi(113549) digestAlgorithm(2) 5}
97 In the X.509 type AlgorithmIdentifier [3], the parameters for MD5
98 should have type NULL.
101 2. Terminology and Notation
103 In this document a "word" is a 32-bit quantity and a "byte" is an
104 eight-bit quantity. A sequence of bits can be interpreted in a
105 natural manner as a sequence of bytes, where each consecutive group
106 of eight bits is interpreted as a byte with the high-order (most
107 significant) bit of each byte listed first. Similarly, a sequence of
108 bytes can be interpreted as a sequence of 32-bit words, where each
109 consecutive group of four bytes is interpreted as a word with the
110 low-order (least significant) byte given first.
112 Let x_i denote "x sub i". If the subscript is an expression, we
113 surround it in braces, as in x_{i+1}. Similarly, we use ^ for
114 superscripts (exponentiation), so that x^i denotes x to the i-th
117 Let the symbol "+" denote addition of words (i.e., modulo-2^32
118 addition). Let X <<< s denote the 32-bit value obtained by circularly
119 shifting (rotating) X left by s bit positions. Let not(X) denote the
120 bit-wise complement of X, and let X v Y denote the bit-wise OR of X
121 and Y. Let X xor Y denote the bit-wise XOR of X and Y, and let XY
122 denote the bit-wise AND of X and Y.
125 3. MD5 Algorithm Description
127 We begin by supposing that we have a b-bit message as input, and that
128 we wish to find its message digest. Here b is an arbitrary
129 nonnegative integer; b may be zero, it need not be a multiple of
130 eight, and it may be arbitrarily large. We imagine the bits of the
131 message written down as follows:
135 The following five steps are performed to compute the message digest
138 3.1 Step 1. Append Padding Bits
140 The message is "padded" (extended) so that its length (in bits) is
141 congruent to 448, modulo 512. That is, the message is extended so
142 that it is just 64 bits shy of being a multiple of 512 bits long.
143 Padding is always performed, even if the length of the message is
144 already congruent to 448, modulo 512 (in which case 512 bits of
147 Padding is performed as follows: a single "1" bit is appended to the
148 message, and then enough zero bits are appended so that the length in
149 bits of the padded message becomes congruent to 448, modulo 512.
151 3.2 Step 2. Append Length
153 A 64-bit representation of b (the length of the message before the
154 padding bits were added) is appended to the result of the previous
155 step. In the unlikely event that b is greater than 2^64, then only
156 the low-order 64 bits of b are used. (These bits are appended as two
157 32-bit words and appended low-order word first in accordance with the
158 previous conventions.)
160 At this point the resulting message (after padding with bits and with
161 b) has a length that is an exact multiple of 512 bits. Equivalently,
162 this message has a length that is an exact multiple of 16 (32-bit)
163 words. Let M[0 ... N-1] denote the words of the resulting message,
164 where N is a multiple of 16.
166 3.3 Step 3. Initialize MD Buffer
168 A four-word buffer (A,B,C,D) is used to compute the message digest.
169 Here each of A, B, C, D is a 32-bit register. These registers are
170 initialized to the following values in hexadecimal, low-order bytes
178 3.4 Step 4. Process Message in 16-Word Blocks
180 We first define four auxiliary functions that each take as input
181 three 32-bit words and produce as output one 32-bit word.
183 F(X,Y,Z) = XY v not(X) Z
184 G(X,Y,Z) = XZ v Y not(Z)
185 H(X,Y,Z) = X xor Y xor Z
186 I(X,Y,Z) = Y xor (X v not(Z))
188 In each bit position F acts as a conditional: if X then Y else Z.
189 (The function F could have been defined using + instead of v since XY
190 and not(X)Z will never have 1's in the same bit position.) It is
191 interesting to note that if the bits of X, Y, and Z are independent
192 and unbiased, the each bit of F(X,Y,Z) will be independent and
195 The functions G, H, and I are similar to the function F, in that they
196 act in "bitwise parallel" to produce their output from the bits of X,
197 Y, and Z, in such a manner that if the corresponding bits of X, Y,
198 and Z are independent and unbiased, then each bit of G(X,Y,Z),
199 H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that
200 the function H is the bit-wise "xor" or "parity" function of its
205 /* Process each 16-word block. */
206 For i = 0 to N/16-1 do
208 /* Copy block i into X. */
210 Set X[j] to M[i*16+j].
211 end /* of loop on j */
213 /* Save A as AA, B as BB, C as CC, and D as DD. */
220 /* Let FF(a,b,c,d,X[k],s,t) denote the operation
221 a = b + ((a + F(b,c,d) + X[k] + t) <<< s). */
222 /* Here the additive constants t are chosen as follows:
223 In step i, the additive constant is the integer part of
224 4294967296 times abs(sin(i)), where i is in radians. */
225 /* Let S11 = 7, S12 = 12, S13 = 17, and S14 = 22. */
226 /* Do the following 16 operations. */
227 FF (a, b, c, d, X[ 0], S11, 3614090360); /* Step 1 */
228 FF (d, a, b, c, X[ 1], S12, 3905402710); /* 2 */
229 FF (c, d, a, b, X[ 2], S13, 606105819); /* 3 */
230 FF (b, c, d, a, X[ 3], S14, 3250441966); /* 4 */
231 FF (a, b, c, d, X[ 4], S11, 4118548399); /* 5 */
232 FF (d, a, b, c, X[ 5], S12, 1200080426); /* 6 */
233 FF (c, d, a, b, X[ 6], S13, 2821735955); /* 7 */
234 FF (b, c, d, a, X[ 7], S14, 4249261313); /* 8 */
235 FF (a, b, c, d, X[ 8], S11, 1770035416); /* 9 */
236 FF (d, a, b, c, X[ 9], S12, 2336552879); /* 10 */
237 FF (c, d, a, b, X[10], S13, 4294925233); /* 11 */
238 FF (b, c, d, a, X[11], S14, 2304563134); /* 12 */
239 FF (a, b, c, d, X[12], S11, 1804603682); /* 13 */
240 FF (d, a, b, c, X[13], S12, 4254626195); /* 14 */
241 FF (c, d, a, b, X[14], S13, 2792965006); /* 15 */
242 FF (b, c, d, a, X[15], S14, 1236535329); /* 16 */
245 /* Let GG(a,b,c,d,X[k],s,t) denote the operation
246 a = b + ((a + G(b,c,d) + X[k] + t) <<< s). */
247 /* Let S21 = 5, S22 = 9, S23 = 14, and S24 = 20. */
249 /* Do the following 16 operations. */
250 GG (a, b, c, d, X[ 1], S21, 4129170786); /* 17 */
251 GG (d, a, b, c, X[ 6], S22, 3225465664); /* 18 */
252 GG (c, d, a, b, X[11], S23, 643717713); /* 19 */
253 GG (b, c, d, a, X[ 0], S24, 3921069994); /* 20 */
254 GG (a, b, c, d, X[ 5], S21, 3593408605); /* 21 */
255 GG (d, a, b, c, X[10], S22, 38016083); /* 22 */
256 GG (c, d, a, b, X[15], S23, 3634488961); /* 23 */
257 GG (b, c, d, a, X[ 4], S24, 3889429448); /* 24 */
258 GG (a, b, c, d, X[ 9], S21, 568446438); /* 25 */
259 GG (d, a, b, c, X[14], S22, 3275163606); /* 26 */
260 GG (c, d, a, b, X[ 3], S23, 4107603335); /* 27 */
261 GG (b, c, d, a, X[ 8], S24, 1163531501); /* 28 */
262 GG (a, b, c, d, X[13], S21, 2850285829); /* 29 */
263 GG (d, a, b, c, X[ 2], S22, 4243563512); /* 30 */
264 GG (c, d, a, b, X[ 7], S23, 1735328473); /* 31 */
265 GG (b, c, d, a, X[12], S24, 2368359562); /* 32 */
268 /* Let HH(a,b,c,d,X[k],s,t) denote the operation
269 a = b + ((a + H(b,c,d) + X[k] + t) <<< s). */
270 /* Let S31 = 4, S32 = 11, S33 = 16, and S34 = 23. */
272 /* Do the following 16 operations. */
273 HH (a, b, c, d, X[ 5], S31, 4294588738); /* 33 */
274 HH (d, a, b, c, X[ 8], S32, 2272392833); /* 34 */
275 HH (c, d, a, b, X[11], S33, 1839030562); /* 35 */
276 HH (b, c, d, a, X[14], S34, 4259657740); /* 36 */
277 HH (a, b, c, d, X[ 1], S31, 2763975236); /* 37 */
278 HH (d, a, b, c, X[ 4], S32, 1272893353); /* 38 */
279 HH (c, d, a, b, X[ 7], S33, 4139469664); /* 39 */
280 HH (b, c, d, a, X[10], S34, 3200236656); /* 40 */
281 HH (a, b, c, d, X[13], S31, 681279174); /* 41 */
282 HH (d, a, b, c, X[ 0], S32, 3936430074); /* 42 */
283 HH (c, d, a, b, X[ 3], S33, 3572445317); /* 43 */
284 HH (b, c, d, a, X[ 6], S34, 76029189); /* 44 */
285 HH (a, b, c, d, X[ 9], S31, 3654602809); /* 45 */
286 HH (d, a, b, c, X[12], S32, 3873151461); /* 46 */
287 HH (c, d, a, b, X[15], S33, 530742520); /* 47 */
288 HH (b, c, d, a, X[ 2], S34, 3299628645); /* 48 */
291 /* Let II(a,b,c,d,X[k],s,t) denote the operation
292 a = b + ((a + I(b,c,d) + X[k] + t) <<< s). */
293 /* Let S41 = 6, S42 = 10, S43 = 15, and S44 = 21. */
295 /* Do the following 16 operations. */
296 II (a, b, c, d, X[ 0], S41, 4096336452); /* 49 */
297 II (d, a, b, c, X[ 7], S42, 1126891415); /* 50 */
298 II (c, d, a, b, X[14], S43, 2878612391); /* 51 */
299 II (b, c, d, a, X[ 5], S44, 4237533241); /* 52 */
300 II (a, b, c, d, X[12], S41, 1700485571); /* 53 */
301 II (d, a, b, c, X[ 3], S42, 2399980690); /* 54 */
302 II (c, d, a, b, X[10], S43, 4293915773); /* 55 */
303 II (b, c, d, a, X[ 1], S44, 2240044497); /* 56 */
304 II (a, b, c, d, X[ 8], S41, 1873313359); /* 57 */
305 II (d, a, b, c, X[15], S42, 4264355552); /* 58 */
306 II (c, d, a, b, X[ 6], S43, 2734768916); /* 59 */
307 II (b, c, d, a, X[13], S44, 1309151649); /* 60 */
308 II (a, b, c, d, X[ 4], S41, 4149444226); /* 61 */
309 II (d, a, b, c, X[11], S42, 3174756917); /* 62 */
310 II (c, d, a, b, X[ 2], S43, 718787259); /* 63 */
311 II (b, c, d, a, X[ 9], S44, 3951481745); /* 64 */
313 /* Then perform the following additions. (That is, increment each
314 of the four registers by the value it had before this block
321 end /* of loop on i */
325 The message digest produced as output is A, B, C, D. That is, we
326 begin with the low-order byte of A, and end with the high-order byte
329 This completes the description of MD5. A reference implementation in
330 C is given in the Appendix.
335 The MD5 message-digest algorithm is simple to implement, and provides
336 a "fingerprint" or message digest of a message of arbitrary length.
337 It is conjectured that the difficulty of coming up with two messages
338 having the same message digest is on the order of 2^64 operations,
339 and that the difficulty of coming up with any message having a given
340 message digest is on the order of 2^128 operations. The MD5 algorithm
341 has been carefully scrutinized for weaknesses. It is, however, a
342 relatively new algorithm and further security analysis is of course
343 justified, as is the case with any new proposal of this sort.
346 5. Summary of Differences Between MD4 and MD5
348 The following are the differences between MD4 and MD5:
350 -- A fourth round has been added.
352 -- Each step now has a unique additive constant.
354 -- The function g in round 2 was changed from (XY v XZ v YZ)
355 to (XZ v Y not(Z)) to make g less symmetric.
357 -- Each step now adds in the result of the previous step.
358 This promotes a faster "avalanche effect".
360 -- The order in which input words are accessed in rounds 2
361 and 3 is changed, to make these patterns less like each
364 -- The shift amounts in each round have been approximately
365 optimized, to yield a faster "avalanche effect". The
366 shifts in different rounds are distinct.
369 6. Security Considerations
371 The level of security discussed in this memo is considered to be
372 sufficient for implementing very high security hybrid digital-
373 signature schemes based on MD5 and a public-key cryptosystem.
378 [1] Rivest, R.L., The MD4 Message Digest Algorithm (RFC 1186),
381 [2] Rivest, R.L., The MD4 message digest algorithm, presented at
382 CRYPTO '90 (Santa Barbara, CA, August 11-15, 1990).
384 [3] CCITT, The Directory---Authentication Framework
385 (Recommendation X.509), 1988.
391 Massachusetts Institute of Technology
392 Laboratory for Computer Science
394 545 Technology Square
395 Cambridge, MA 02139-1986
396 Phone: (617) 253-5880
397 EMail: rivest@theory.lcs.mit.edu
400 RSA Data Security, Inc.
401 10 Twin Dolphin Drive
402 Redwood City, CA 94065
403 Phone: (415) 595-8782
407 APPENDIX - Reference Implementation
409 This appendix contains the following files:
411 md5.h -- header file for implementation of MD5
413 md5.c -- the source code for MD5 routines
415 md5driver.c -- sample test routines
417 session -- sample results of running md5driver
419 It is not difficult to improve this implementation on particular
420 platforms, an exercise left to the reader. Following are some
423 1. Change MD5Update so that the context is not used at all
424 if it is empty (mdi == 0) and 64 or more bytes remain
425 (inLen >= 64). In other words, call Transform with inBuf
426 in this case. (This requires that byte ordering is
429 2. Implement a procedure MD5UpdateLong modeled after
430 MD5Update where inBuf is UINT4 * instead of unsigned char
431 *. MD5UpdateLong would call Transform directly with 16-
432 word blocks from inBuf. Call this instead of MD5Update in
433 general. This works well if you have an I/O procedure
434 that can read long words from a file.
436 3. On "little-endian" platforms where the lowest-address
437 byte in a long word is the least significant (and there
438 are no alignment restrictions), change MD5Update to call
439 Transform directly with 64-byte blocks from inBuf
440 (typecast to a UINT4 *).