   7 Network Working Group                                       S. Josefsson
   8 Request for Comments: 4648                                           SJD
   9 Obsoletes: 3548                                             October 2006
  10 Category: Standards Track
  11
  12
  13              The Base16, Base32, and Base64 Data Encodings
  14
  15 Status of This Memo
  16
  17    This document specifies an Internet standards track protocol for the
  18    Internet community, and requests discussion and suggestions for
  19    improvements.  Please refer to the current edition of the "Internet
  20    Official Protocol Standards" (STD 1) for the standardization state
  21    and status of this protocol.  Distribution of this memo is unlimited.
  22
  23 Copyright Notice
  24
  25    Copyright (C) The Internet Society (2006).
  26
  27 Abstract
  28
  29    This document describes the commonly used base 64, base 32, and base
  30    16 encoding schemes.  It also discusses the use of line-feeds in
  31    encoded data, use of padding in encoded data, use of non-alphabet
  32    characters in encoded data, use of different encoding alphabets, and
  33    canonical encodings.
  34
  58 Josefsson                   Standards Track                     [Page 1]
  60 RFC 4648                    Base-N Encodings                October 2006
  61
  62
  63 Table of Contents
  64
  65    1. Introduction ....................................................3
  66    2. Conventions Used in This Document ...............................3
  67    3. Implementation Discrepancies ....................................3
  68       3.1. Line Feeds in Encoded Data .................................3
  69       3.2. Padding of Encoded Data ....................................4
  70       3.3. Interpretation of Non-Alphabet Characters in Encoded Data ..4
  71       3.4. Choosing the Alphabet ......................................4
  72       3.5. Canonical Encoding .........................................5
  73    4. Base 64 Encoding ................................................5
  74    5. Base 64 Encoding with URL and Filename Safe Alphabet ............7
  75    6. Base 32 Encoding ................................................8
  76    7. Base 32 Encoding with Extended Hex Alphabet ....................10
  77    8. Base 16 Encoding ...............................................10
  78    9. Illustrations and Examples .....................................11
  79    10. Test Vectors ..................................................12
  80    11. ISO C99 Implementation of Base64 ..............................14
  81    12. Security Considerations .......................................14
  82    13. Changes Since RFC 3548 ........................................15
  83    14. Acknowledgements ..............................................15
  84    15. Copying Conditions ............................................15
  85    16. References ....................................................16
  86       16.1. Normative References .....................................16
  87       16.2. Informative References ...................................16
  88
 119 1.  Introduction
 120
 121    Base encoding of data is used in many situations to store or transfer
 122    data in environments that, perhaps for legacy reasons, are restricted
 123    to US-ASCII [1] data.  Base encoding can also be used in new
 124    applications that do not have legacy restrictions, simply because it
 125    makes it possible to manipulate objects with text editors.
 126
 127    In the past, different applications have had different requirements
 128    and thus sometimes implemented base encodings in slightly different
 129    ways.  Today, protocol specifications sometimes use base encodings in
 130    general, and "base64" in particular, without a precise description or
 131    reference.  Multipurpose Internet Mail Extensions (MIME) [4] is often
 132    used as a reference for base64 without considering the consequences
 133    for line-wrapping or non-alphabet characters.  The purpose of this
 134    specification is to establish common alphabet and encoding
 135    considerations.  This will hopefully reduce ambiguity in other
 136    documents, leading to better interoperability.
 137
 138 2.  Conventions Used in This Document
 139
 140    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
 141    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
 142    document are to be interpreted as described in [2].
 143
 144 3.  Implementation Discrepancies
 145
 146    Here we discuss the discrepancies between base encoding
 147    implementations in the past and, where appropriate, mandate a
 148    specific recommended behavior for the future.
 149
 150 3.1.  Line Feeds in Encoded Data
 151
 152    MIME [4] is often used as a reference for base 64 encoding.  However,
 153    MIME does not define "base 64" per se, but rather a "base 64 Content-
 154    Transfer-Encoding" for use within MIME.  As such, MIME enforces a
 155    limit on line length of base 64-encoded data to 76 characters.  MIME
 156    inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
 157    that it is "virtually identical"; however, PEM uses a line length of
 158    64 characters.  The MIME and PEM limits are both due to limits within
 159    SMTP.
 160
 161    Implementations MUST NOT add line feeds to base-encoded data unless
 162    the specification referring to this document explicitly directs base
 163    encoders to add line feeds after a specific number of characters.
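
         The rule above can be sketched as an encoder that emits no line
         feeds by default and wraps only when the caller passes a line
         length taken from a referring specification.  This is a minimal
         sketch, not a normative implementation: the function name
         encode_base64, the line_length parameter, and the choice of CRLF
         as the separator are assumptions made for illustration.

```python
import base64
from typing import Optional


def encode_base64(data: bytes, line_length: Optional[int] = None) -> str:
    """Encode to base64; insert line breaks only when explicitly asked to."""
    encoded = base64.b64encode(data).decode("ascii")
    if line_length is None:
        # Default behaviour: no line feeds are added to the encoded data.
        return encoded
    if line_length <= 0:
        raise ValueError("line_length must be positive")
    # Break after every `line_length` characters, as a referring
    # specification (for example 76 for MIME or 64 for PEM) would direct.
    lines = [encoded[i:i + line_length]
             for i in range(0, len(encoded), line_length)]
    return "\r\n".join(lines)  # CRLF separator is an assumption of this sketch
```

         A caller bound by MIME would pass line_length=76; every other
         caller leaves the argument unset and receives a single unbroken
         string.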
 164
 175 3.2.  Padding of Encoded Data
 176
 177    In some circumstances, the use of padding ("=") in base-encoded data
 178    is not required or used.  In the general case, when assumptions about
 179    the size of transported data cannot be made, padding is required to
 180    yield correct decoded data.
 181
 182    Implementations MUST include appropriate pad characters at the end of
 183    encoded data unless the specification referring to this document
 184    explicitly states otherwise.
 185
 186    The base64 and base32 alphabets use padding, as described below in
 187    sections 4 and 6, but the base16 alphabet does not need it; see
 188    section 8.
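
         When a referring specification permits padding to be omitted, the
         pad characters can be recomputed from the length of the encoded
         text alone.  The following sketch shows this for base 64; the
         helper names strip_padding and restore_padding are hypothetical
         and not taken from this document.

```python
def strip_padding(encoded: str) -> str:
    """Remove trailing "=" characters from base64 text."""
    return encoded.rstrip("=")


def restore_padding(unpadded: str) -> str:
    """Re-append the "=" characters implied by the unpadded length."""
    if len(unpadded) % 4 == 1:
        # One leftover symbol carries only 6 bits, less than a full octet,
        # so no valid base64 encoding has this length.
        raise ValueError("not a valid unpadded base64 length")
    return unpadded + "=" * (-len(unpadded) % 4)
```

         This only works because the total length is known to the decoder;
         in a stream of concatenated encodings the padding cannot be
         recovered this way.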
 189
 190 3.3.  Interpretation of Non-Alphabet Characters in Encoded Data
 191
 192    Base encodings use a specific, reduced alphabet to encode binary
 193    data.  Non-alphabet characters could exist within base-encoded data,
 194    caused by data corruption or by design.  Non-alphabet characters may
 195    be exploited as a "covert channel", where non-protocol data can be
 196    sent for nefarious purposes.  Non-alphabet characters might also be
 197    sent in order to exploit implementation errors leading to, e.g.,
 198    buffer overflow attacks.
 199
 200    Implementations MUST reject the encoded data if it contains
 201    characters outside the base alphabet when interpreting base-encoded
 202    data, unless the specification referring to this document explicitly
 203    states otherwise.  Such specifications may instead state, as MIME
 204    does, that characters outside the base encoding alphabet should
 205    simply be ignored when interpreting data ("be liberal in what you
 206    accept").  Note that this means that any adjacent carriage return/
 207    line feed (CRLF) characters constitute "non-alphabet characters" and
 208    are ignored.  Furthermore, such specifications MAY ignore the pad
 209    character, "=", treating it as non-alphabet data, if it is present
 210    before the end of the encoded data.  If more than the allowed number
 211    of pad characters is found at the end of the string (e.g., a base 64
 212    string terminated with "==="), the excess pad characters MAY also be
 213    ignored.
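
         The two behaviours described above can be contrasted with the
         Python standard library, whose base64.b64decode function accepts
         a validate flag.  This is an illustrative sketch; the wrapper
         names decode_strict and decode_lenient are assumptions, and the
         lenient variant is appropriate only where the referring
         specification explicitly allows it.

```python
import base64
import binascii


def decode_strict(encoded: str) -> bytes:
    """Reject input containing any character outside the base64 alphabet."""
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 input: {exc}") from exc


def decode_lenient(encoded: str) -> bytes:
    """Discard non-alphabet characters (such as CRLF) before decoding."""
    return base64.b64decode(encoded, validate=False)
```

         With these helpers, "Zm9v\r\nYmFy" is rejected by decode_strict
         but decoded by decode_lenient, because the embedded CRLF counts
         as non-alphabet data.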
 214
 215 3.4.  Choosing the Alphabet
 216
 217    Different applications have different requirements on the characters
 218    in the alphabet.  Here are a few requirements that determine which
 219    alphabet should be used:
 220
 231    o  Handled by humans.  The characters "0" and "O" are easily
 232       confused, as are "1", "l", and "I".  In the base32 alphabet below,
 233       where 0 (zero) and 1 (one) are not present, a decoder may
 234       interpret 0 as O, and 1 as I or L depending on case.  (However, by
 235       default it should not; see previous section.)
 236
 237    o  Encoded into structures that mandate other requirements.  For base
 238       16 and base 32, this determines the use of upper- or lowercase
 239       alphabets.  For base 64, the non-alphanumeric characters (in
 240       particular, "/") may be problematic in file names and URLs.
 241
 242    o  Used as identifiers.  Certain characters, notably "+" and "/" in
 243       the base 64 alphabet, are treated as word-breaks by legacy text
 244       search/index tools.
 245
 246    There is no universally accepted alphabet that fulfills all the
 247    requirements.  For an example of a highly specialized variant, see
 248    IMAP [8].  In this document, we document and name some currently used
 249    alphabets.
 250
 251 3.5.  Canonical Encoding
 252
 253    The padding step in base 64 and base 32 encoding can, if improperly
 254    implemented, lead to non-significant alterations of the encoded data.
 255    For example, if the input is only one octet for a base 64 encoding,
 256    then all six bits of the first symbol are used, but only the first
 257    two bits of the next symbol are used.  These pad bits MUST be set to
  258    zero by conforming encoders, as specified in the descriptions of
  259    padding below.  If this property does not hold, there is no
 260    canonical representation of base-encoded data, and multiple base-
 261    encoded strings can be decoded to the same binary data.  If this
 262    property (and others discussed in this document) holds, a canonical
 263    encoding is guaranteed.
 264
 265    In some environments, the alteration is critical and therefore
  266    decoders MAY choose to reject an encoding if the pad bits have not
  267    been set to zero.  The specification referring to this document may
  268    mandate a specific behaviour.
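
         A decoder that wants to enforce canonical input can inspect the
         unused low-order bits of the last symbol before the padding.  The
         sketch below does this for padded base 64 text; the function name
         has_zero_pad_bits is hypothetical, and the input is assumed to
         contain only alphabet and pad characters.

```python
import string

B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def has_zero_pad_bits(encoded: str) -> bool:
    """Return True if the pad bits of padded base64 text are all zero."""
    body = encoded.rstrip("=")
    pad = len(encoded) - len(body)
    if pad == 0:
        return True  # no partial quantum, hence no pad bits
    if pad > 2 or not body:
        raise ValueError("malformed base64 padding")
    last_value = B64_ALPHABET.index(body[-1])
    # Two "=": the last symbol holds 2 data bits followed by 4 pad bits.
    # One "=":  the last symbol holds 4 data bits followed by 2 pad bits.
    unused_bits = 4 if pad == 2 else 2
    return last_value & ((1 << unused_bits) - 1) == 0
```

         For instance, "Zg==" passes the check, while "Zh==" does not,
         because "h" has a non-zero bit among the four positions that the
         encoder is required to clear.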
 269
 270 4.  Base 64 Encoding
 271
 272    The following description of base 64 is derived from [3], [4], [5],
 273    and [6].  This encoding may be referred to as "base64".
 274
 275    The Base 64 encoding is designed to represent arbitrary sequences of
 276    octets in a form that allows the use of both upper- and lowercase
 277    letters but that need not be human readable.
 278
 287    A 65-character subset of US-ASCII is used, enabling 6 bits to be
 288    represented per printable character.  (The extra 65th character, "=",
 289    is used to signify a special processing function.)
 290
 291    The encoding process represents 24-bit groups of input bits as output
 292    strings of 4 encoded characters.  Proceeding from left to right, a
 293    24-bit input group is formed by concatenating 3 8-bit input groups.
 294    These 24 bits are then treated as 4 concatenated 6-bit groups, each
 295    of which is translated into a single character in the base 64
 296    alphabet.
 297
 298    Each 6-bit group is used as an index into an array of 64 printable
 299    characters.  The character referenced by the index is placed in the
 300    output string.
 301
 302                       Table 1: The Base 64 Alphabet
 303
 304      Value Encoding  Value Encoding  Value Encoding  Value Encoding
 305          0 A            17 R            34 i            51 z
 306          1 B            18 S            35 j            52 0
 307          2 C            19 T            36 k            53 1
 308          3 D            20 U            37 l            54 2
 309          4 E            21 V            38 m            55 3
 310          5 F            22 W            39 n            56 4
 311          6 G            23 X            40 o            57 5
 312          7 H            24 Y            41 p            58 6
 313          8 I            25 Z            42 q            59 7
 314          9 J            26 a            43 r            60 8
 315         10 K            27 b            44 s            61 9
 316         11 L            28 c            45 t            62 +
 317         12 M            29 d            46 u            63 /
 318         13 N            30 e            47 v
 319         14 O            31 f            48 w         (pad) =
 320         15 P            32 g            49 x
 321         16 Q            33 h            50 y
 322
 323    Special processing is performed if fewer than 24 bits are available
 324    at the end of the data being encoded.  A full encoding quantum is
 325    always completed at the end of a quantity.  When fewer than 24 input
 326    bits are available in an input group, bits with value zero are added
 327    (on the right) to form an integral number of 6-bit groups.  Padding
  328    at the end of the data is performed using the "=" character.  Since
 329    all base 64 input is an integral number of octets, only the following
 330    cases can arise:
 331
 332    (1) The final quantum of encoding input is an integral multiple of 24
 333        bits; here, the final unit of encoded output will be an integral
 334        multiple of 4 characters with no "=" padding.
 335
 343    (2) The final quantum of encoding input is exactly 8 bits; here, the
 344        final unit of encoded output will be two characters followed by
 345        two "=" padding characters.
 346
 347    (3) The final quantum of encoding input is exactly 16 bits; here, the
 348        final unit of encoded output will be three characters followed by
 349        one "=" padding character.
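
         The encoding process and the three padding cases above can be
         sketched as follows.  This is an illustrative implementation
         under the assumption that the whole input is available in memory;
         the names B64_ALPHABET and base64_encode are chosen for this
         example only.

```python
import string

# Table 1: values 0-25 "A"-"Z", 26-51 "a"-"z", 52-61 "0"-"9", 62 "+", 63 "/".
B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def base64_encode(data: bytes) -> str:
    """Encode octets as base64 text with "=" padding."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Left-align the chunk in a 24-bit group; missing octets contribute
        # zero bits on the right.
        group = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        symbols = [B64_ALPHABET[(group >> shift) & 0x3F]
                   for shift in (18, 12, 6, 0)]
        # 3 octets -> 4 symbols; 2 octets -> 3 symbols and "=";
        # 1 octet -> 2 symbols and "==".
        keep = len(chunk) + 1
        out.append("".join(symbols[:keep]) + "=" * (4 - keep))
    return "".join(out)
```

         Because absent octets are shifted in as zeros, the pad bits of
         the final symbol are zero by construction.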
 350
 351 5.  Base 64 Encoding with URL and Filename Safe Alphabet
 352
  353    The Base 64 encoding with a URL and filename safe alphabet has been
 354    used in [12].
 355
 356    An alternative alphabet has been suggested that would use "~" as the
 357    63rd character.  Since the "~" character has special meaning in some
 358    file system environments, the encoding described in this section is
 359    recommended instead.  The remaining unreserved URI character is ".",
 360    but some file system environments do not permit multiple "." in a
 361    filename, thus making the "." character unattractive as well.
 362
  363    The pad character "=" is typically percent-encoded when used in a
 364    URI [9], but if the data length is known implicitly, this can be
 365    avoided by skipping the padding; see section 3.2.
 366
 367    This encoding may be referred to as "base64url".  This encoding
 368    should not be regarded as the same as the "base64" encoding and
 369    should not be referred to as only "base64".  Unless clarified
 370    otherwise, "base64" refers to the base 64 in the previous section.
 371
  372    This encoding is technically identical to the previous one, except
  373    for the characters at values 62 and 63, as indicated in Table 2.
 374
 399          Table 2: The "URL and Filename safe" Base 64 Alphabet
 400
 401      Value Encoding  Value Encoding  Value Encoding  Value Encoding
 402          0 A            17 R            34 i            51 z
 403          1 B            18 S            35 j            52 0
 404          2 C            19 T            36 k            53 1
 405          3 D            20 U            37 l            54 2
 406          4 E            21 V            38 m            55 3
 407          5 F            22 W            39 n            56 4
 408          6 G            23 X            40 o            57 5
 409          7 H            24 Y            41 p            58 6
 410          8 I            25 Z            42 q            59 7
 411          9 J            26 a            43 r            60 8
 412         10 K            27 b            44 s            61 9
 413         11 L            28 c            45 t            62 - (minus)
 414         12 M            29 d            46 u            63 _
 415         13 N            30 e            47 v           (underline)
 416         14 O            31 f            48 w
 417         15 P            32 g            49 x
 418         16 Q            33 h            50 y         (pad) =
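
         As an illustration of Table 2, the sketch below encodes with the
         URL and filename safe alphabet using the Python standard library
         and optionally omits padding when the data length is known
         implicitly.  The wrapper names base64url_encode and
         base64url_decode and the pad parameter are assumptions of this
         example.

```python
import base64


def base64url_encode(data: bytes, pad: bool = True) -> str:
    """Encode with "-" and "_" in place of "+" and "/"."""
    encoded = base64.urlsafe_b64encode(data).decode("ascii")
    return encoded if pad else encoded.rstrip("=")


def base64url_decode(encoded: str) -> bytes:
    """Decode base64url text, re-adding any omitted "=" padding first."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded)
```

         Note that this decoder does not by itself reject non-alphabet
         characters; a strict decoder would need an additional check.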
 419
 420 6.  Base 32 Encoding
 421
 422    The following description of base 32 is derived from [11] (with
 423    corrections).  This encoding may be referred to as "base32".
 424
 425    The Base 32 encoding is designed to represent arbitrary sequences of
 426    octets in a form that needs to be case insensitive but that need not
 427    be human readable.
 428
 429    A 33-character subset of US-ASCII is used, enabling 5 bits to be
 430    represented per printable character.  (The extra 33rd character, "=",
 431    is used to signify a special processing function.)
 432
 433    The encoding process represents 40-bit groups of input bits as output
 434    strings of 8 encoded characters.  Proceeding from left to right, a
  435    40-bit input group is formed by concatenating 5 8-bit input groups.
  436    These 40 bits are then treated as 8 concatenated 5-bit groups, each
  437    of which is translated into a single character in the base 32
  438    alphabet.  When a bit stream is encoded via the base 32 encoding, the
  439    bit stream must be presumed to be ordered with the most-significant-
  440    bit first.  That is, the first bit in the stream will be the high-
  441    order bit in the first 8-bit byte, the eighth bit will be the low-
  442    order bit in the first 8-bit byte, and so on.
 443
 455    Each 5-bit group is used as an index into an array of 32 printable
 456    characters.  The character referenced by the index is placed in the
 457    output string.  These characters, identified in Table 3, below, are
 458    selected from US-ASCII digits and uppercase letters.
 459
 460                      Table 3: The Base 32 Alphabet
 461
 462      Value Encoding  Value Encoding  Value Encoding  Value Encoding
 463          0 A             9 J            18 S            27 3
 464          1 B            10 K            19 T            28 4
 465          2 C            11 L            20 U            29 5
 466          3 D            12 M            21 V            30 6
 467          4 E            13 N            22 W            31 7
 468          5 F            14 O            23 X
 469          6 G            15 P            24 Y         (pad) =
 470          7 H            16 Q            25 Z
 471          8 I            17 R            26 2
 472
 473    Special processing is performed if fewer than 40 bits are available
 474    at the end of the data being encoded.  A full encoding quantum is
 475    always completed at the end of a body.  When fewer than 40 input bits
 476    are available in an input group, bits with value zero are added (on
 477    the right) to form an integral number of 5-bit groups.  Padding at
 478    the end of the data is performed using the "=" character.  Since all
 479    base 32 input is an integral number of octets, only the following
 480    cases can arise:
 481
 482    (1) The final quantum of encoding input is an integral multiple of 40
 483        bits; here, the final unit of encoded output will be an integral
 484        multiple of 8 characters with no "=" padding.
 485
 486    (2) The final quantum of encoding input is exactly 8 bits; here, the
 487        final unit of encoded output will be two characters followed by
 488        six "=" padding characters.
 489
 490    (3) The final quantum of encoding input is exactly 16 bits; here, the
 491        final unit of encoded output will be four characters followed by
 492        four "=" padding characters.
 493
 494    (4) The final quantum of encoding input is exactly 24 bits; here, the
 495        final unit of encoded output will be five characters followed by
 496        three "=" padding characters.
 497
 498    (5) The final quantum of encoding input is exactly 32 bits; here, the
 499        final unit of encoded output will be seven characters followed by
 500        one "=" padding character.
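
         The five cases above follow directly from how many 5-bit groups
         are needed to cover 8, 16, 24, 32, or 40 input bits.  A minimal
         sketch, assuming the whole input is in memory; the names
         B32_ALPHABET, SYMBOLS_FOR_OCTETS, and base32_encode are
         illustrative only.

```python
import string

# Table 3: values 0-25 map to "A"-"Z", values 26-31 map to "2"-"7".
B32_ALPHABET = string.ascii_uppercase + "234567"

# Output symbols needed for a final quantum of 1..5 octets:
# ceil(8 / 5) = 2, ceil(16 / 5) = 4, ceil(24 / 5) = 5, ceil(32 / 5) = 7, 8.
SYMBOLS_FOR_OCTETS = {1: 2, 2: 4, 3: 5, 4: 7, 5: 8}


def base32_encode(data: bytes) -> str:
    """Encode octets as base32 text with "=" padding."""
    out = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        # Left-align the chunk in a 40-bit group, most significant bit first;
        # missing octets contribute zero bits on the right.
        group = int.from_bytes(chunk, "big") << (8 * (5 - len(chunk)))
        symbols = [B32_ALPHABET[(group >> shift) & 0x1F]
                   for shift in range(35, -1, -5)]
        keep = SYMBOLS_FOR_OCTETS[len(chunk)]
        out.append("".join(symbols[:keep]) + "=" * (8 - keep))
    return "".join(out)
```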
 501
 511 7.  Base 32 Encoding with Extended Hex Alphabet
 512
 513    The following description of base 32 is derived from [7].  This
 514    encoding may be referred to as "base32hex".  This encoding should not
 515    be regarded as the same as the "base32" encoding and should not be
 516    referred to as only "base32".  This encoding is used by, e.g.,
 517    NextSECure3 (NSEC3) [10].
 518
  519    One property of this alphabet, which the base64 and base32
 520    alphabets lack, is that encoded data maintains its sort order when
 521    the encoded data is compared bit-wise.
 522
 523    This encoding is identical to the previous one, except for the
 524    alphabet.  The new alphabet is found in Table 4.
 525
 526                  Table 4: The "Extended Hex" Base 32 Alphabet
 527
 528          Value Encoding  Value Encoding  Value Encoding  Value Encoding
 529              0 0             9 9            18 I            27 R
 530              1 1            10 A            19 J            28 S
 531              2 2            11 B            20 K            29 T
 532              3 3            12 C            21 L            30 U
 533              4 4            13 D            22 M            31 V
 534              5 5            14 E            23 N
 535              6 6            15 F            24 O         (pad) =
 536              7 7            16 G            25 P
 537              8 8            17 H            26 Q
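
         The sort-order property can be observed by sorting a few inputs
         by their encoded form.  The sketch below derives the Table 4
         alphabet from the standard library's base 32 encoder by character
         translation; the helper names are assumptions, and the comparison
         is deliberately limited to inputs of equal length.

```python
import base64

_B32_STD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # Table 3
_B32_HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"   # Table 4
_STD_TO_HEX = str.maketrans(_B32_STD, _B32_HEX)


def base32_encode(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii")


def base32hex_encode(data: bytes) -> str:
    # Same bit layout as base32; only the alphabet differs ("=" is unchanged).
    return base32_encode(data).translate(_STD_TO_HEX)


# Equal-length inputs, listed in ascending octet order.
samples = [b"\x00\x01", b"\x7f\xff", b"\x80\x00", b"\xff\x00"]

hex_sorted = sorted(samples, key=base32hex_encode)  # same order as samples
std_sorted = sorted(samples, key=base32_encode)     # order is not preserved
```

         In the base32 alphabet the digits "2" to "7" carry the highest
         values but sort before the letters in US-ASCII, which is why the
         order is lost there and kept with the extended hex alphabet.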
 538
 539 8.  Base 16 Encoding
 540
 541    The following description is original but analogous to previous
 542    descriptions.  Essentially, Base 16 encoding is the standard case-
 543    insensitive hex encoding and may be referred to as "base16" or "hex".
 544
 545    A 16-character subset of US-ASCII is used, enabling 4 bits to be
 546    represented per printable character.
 547
 548    The encoding process represents 8-bit groups (octets) of input bits
 549    as output strings of 2 encoded characters.  Proceeding from left to
 550    right, an 8-bit input is taken from the input data.  These 8 bits are
 551    then treated as 2 concatenated 4-bit groups, each of which is
 552    translated into a single character in the base 16 alphabet.
 553
 554    Each 4-bit group is used as an index into an array of 16 printable
 555    characters.  The character referenced by the index is placed in the
 556    output string.
 557
 567                          Table 5: The Base 16 Alphabet
 568
 569          Value Encoding  Value Encoding  Value Encoding  Value Encoding
 570              0 0             4 4             8 8            12 C
 571              1 1             5 5             9 9            13 D
 572              2 2             6 6            10 A            14 E
 573              3 3             7 7            11 B            15 F
 574
 575    Unlike base 32 and base 64, no special padding is necessary since a
 576    full code word is always available.
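
         Since each octet maps to exactly two characters, a base 16
         encoder needs neither grouping nor padding.  A minimal sketch;
         the names B16_ALPHABET and base16_encode are illustrative.

```python
B16_ALPHABET = "0123456789ABCDEF"  # Table 5


def base16_encode(data: bytes) -> str:
    """Encode each octet as two characters: high 4 bits, then low 4 bits."""
    return "".join(B16_ALPHABET[octet >> 4] + B16_ALPHABET[octet & 0x0F]
                   for octet in data)
```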
 577
 578 9.  Illustrations and Examples
 579
 580    To translate between binary and a base encoding, the input is stored
 581    in a structure, and the output is extracted.  The case for base 64 is
 582    displayed in the following figure, borrowed from [5].
 583
 584             +--first octet--+-second octet--+--third octet--+
 585             |7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|7 6 5 4 3 2 1 0|
 586             +-----------+---+-------+-------+---+-----------+
 587             |5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|5 4 3 2 1 0|
 588             +--1.index--+--2.index--+--3.index--+--4.index--+
 589
 590    The case for base 32 is shown in the following figure, borrowed from
 591    [7].  Each successive character in a base-32 value represents 5
 592    successive bits of the underlying octet sequence.  Thus, each group
 593    of 8 characters represents a sequence of 5 octets (40 bits).
 594
 595                         1          2          3
 596              01234567 89012345 67890123 45678901 23456789
 597             +--------+--------+--------+--------+--------+
 598             |< 1 >< 2| >< 3 ><|.4 >< 5.|>< 6 ><.|7 >< 8 >|
 599             +--------+--------+--------+--------+--------+
 600                                                     <===> 8th character
 601                                               <====> 7th character
 602                                          <===> 6th character
 603                                    <====> 5th character
 604                              <====> 4th character
 605                         <===> 3rd character
 606                   <====> 2nd character
 607              <===> 1st character
 608
 623    The following example of Base64 data is from [5], with corrections.
 624
 625       Input data:  0x14fb9c03d97e
 626       Hex:     1   4    f   b    9   c     | 0   3    d   9    7   e
 627       8-bit:   00010100 11111011 10011100  | 00000011 11011001 01111110
 628       6-bit:   000101 001111 101110 011100 | 000000 111101 100101 111110
 629       Decimal: 5      15     46     28       0      61     37     62
 630       Output:  F      P      u      c        A      9      l      +
 631
 632       Input data:  0x14fb9c03d9
 633       Hex:     1   4    f   b    9   c     | 0   3    d   9
 634       8-bit:   00010100 11111011 10011100  | 00000011 11011001
 635                                                       pad with 00
 636       6-bit:   000101 001111 101110 011100 | 000000 111101 100100
 637       Decimal: 5      15     46     28       0      61     36
 638                                                          pad with =
 639       Output:  F      P      u      c        A      9      k      =
 640
 641       Input data:  0x14fb9c03
 642       Hex:     1   4    f   b    9   c     | 0   3
 643       8-bit:   00010100 11111011 10011100  | 00000011
 644                                              pad with 0000
 645       6-bit:   000101 001111 101110 011100 | 000000 110000
 646       Decimal: 5      15     46     28       0      48
 647                                                   pad with =      =
 648       Output:  F      P      u      c        A      w      =      =
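
         The three worked examples above can be reproduced step by step.
         The sketch below splits the input into 6-bit groups, padding the
         bit string with zeros on the right, and then maps each group
         through Table 1.  The helper names six_bit_groups and trace are
         hypothetical.

```python
import string

B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")


def six_bit_groups(data: bytes) -> list:
    """Return the decimal values of the 6-bit groups covering the input."""
    bits = "".join(f"{octet:08b}" for octet in data)
    bits += "0" * (-len(bits) % 6)  # "pad with 00" / "pad with 0000"
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


def trace(hex_input: str):
    """Return (decimal values, encoded output) for a hex-encoded input."""
    groups = six_bit_groups(bytes.fromhex(hex_input))
    symbols = "".join(B64_ALPHABET[g] for g in groups)
    return groups, symbols + "=" * (-len(symbols) % 4)
```

         Calling trace("14fb9c03d9") yields the decimal row 5, 15, 46, 28,
         0, 61, 36 and the output "FPucA9k=", matching the second example.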
 649
 650 10.  Test Vectors
 651
 652    BASE64("") = ""
 653
 654    BASE64("f") = "Zg=="
 655
 656    BASE64("fo") = "Zm8="
 657
 658    BASE64("foo") = "Zm9v"
 659
 660    BASE64("foob") = "Zm9vYg=="
 661
 662    BASE64("fooba") = "Zm9vYmE="
 663
 664    BASE64("foobar") = "Zm9vYmFy"
 665
 666    BASE32("") = ""
 667
 668    BASE32("f") = "MY======"
 669
 670    BASE32("fo") = "MZXQ===="
 671
 679    BASE32("foo") = "MZXW6==="
 680
 681    BASE32("foob") = "MZXW6YQ="
 682
 683    BASE32("fooba") = "MZXW6YTB"
 684
 685    BASE32("foobar") = "MZXW6YTBOI======"
 686
 687    BASE32-HEX("") = ""
 688
 689    BASE32-HEX("f") = "CO======"
 690
 691    BASE32-HEX("fo") = "CPNG===="
 692
 693    BASE32-HEX("foo") = "CPNMU==="
 694
 695    BASE32-HEX("foob") = "CPNMUOG="
 696
 697    BASE32-HEX("fooba") = "CPNMUOJ1"
 698
 699    BASE32-HEX("foobar") = "CPNMUOJ1E8======"
 700
 701    BASE16("") = ""
 702
 703    BASE16("f") = "66"
 704
 705    BASE16("fo") = "666F"
 706
 707    BASE16("foo") = "666F6F"
 708
 709    BASE16("foob") = "666F6F62"
 710
 711    BASE16("fooba") = "666F6F6261"
 712
 713    BASE16("foobar") = "666F6F626172"
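
         The vectors above lend themselves to a table-driven check.  The
         sketch below runs them against the Python standard library; the
         names ENCODERS, VECTORS, and check_vectors are assumptions of
         this example, and the extended hex encoder is obtained by
         translating the base32 output rather than by a dedicated call.

```python
import base64

_STD_TO_HEX = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
                            "0123456789ABCDEFGHIJKLMNOPQRSTUV")

ENCODERS = {
    "BASE64": lambda d: base64.b64encode(d).decode("ascii"),
    "BASE32": lambda d: base64.b32encode(d).decode("ascii"),
    "BASE32-HEX": lambda d: base64.b32encode(d).decode("ascii")
                                  .translate(_STD_TO_HEX),
    "BASE16": lambda d: base64.b16encode(d).decode("ascii"),
}

# Expected encodings of "", "f", "fo", "foo", "foob", "fooba", "foobar".
VECTORS = {
    "BASE64": ["", "Zg==", "Zm8=", "Zm9v", "Zm9vYg==", "Zm9vYmE=",
               "Zm9vYmFy"],
    "BASE32": ["", "MY======", "MZXQ====", "MZXW6===", "MZXW6YQ=",
               "MZXW6YTB", "MZXW6YTBOI======"],
    "BASE32-HEX": ["", "CO======", "CPNG====", "CPNMU===", "CPNMUOG=",
                   "CPNMUOJ1", "CPNMUOJ1E8======"],
    "BASE16": ["", "66", "666F", "666F6F", "666F6F62", "666F6F6261",
               "666F6F626172"],
}


def check_vectors() -> list:
    """Return (encoding, input, got, expected) for every mismatch."""
    failures = []
    for name, expected in VECTORS.items():
        for length, want in enumerate(expected):
            source = b"foobar"[:length]
            got = ENCODERS[name](source)
            if got != want:
                failures.append((name, source, got, want))
    return failures
```

         An empty result from check_vectors() indicates that the encoder
         under test agrees with every vector listed in this section.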
 714
 735 11.  ISO C99 Implementation of Base64
 736
 737    An ISO C99 implementation of Base64 encoding and decoding that is
 738    believed to follow all recommendations in this RFC is available from:
 739
 740       http://josefsson.org/base-encoding/
 741
 742    This code is not normative.
 743
 744    The code could not be included in this RFC for procedural reasons
 745    (RFC 3978 section 5.4).
 746
 747 12.  Security Considerations
 748
  749    When base encoding and decoding are implemented, care should be taken
 750    not to introduce vulnerabilities to buffer overflow attacks, or other
 751    attacks on the implementation.  A decoder should not break on invalid
 752    input including, e.g., embedded NUL characters (ASCII 0).
 753
 754    If non-alphabet characters are ignored, instead of causing rejection
 755    of the entire encoding (as recommended), a covert channel that can be
 756    used to "leak" information is made possible.  The ignored characters
 757    could also be used for other nefarious purposes, such as to avoid a
 758    string equality comparison or to trigger implementation bugs.  The
 759    implications of ignoring non-alphabet characters should be understood
 760    in applications that do not follow the recommended practice.
 761    Similarly, when the base 16 and base 32 alphabets are handled case
 762    insensitively, alteration of case can be used to leak information or
 763    make string equality comparisons fail.
 764
 765    When padding is used, there are some non-significant bits that
 766    warrant security concerns, as they may be abused to leak information
 767    or used to bypass string equality comparisons or to trigger
 768    implementation problems.
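
   One way to detect non-zero pad bits is to decode the input and
   compare its re-encoding with the original string.  The sketch below
   assumes Python's standard "base64" module; the helper name
   is_canonical_b64 is hypothetical, and the approach is only one of
   several possible checks.

```python
import base64

def is_canonical_b64(text):
    """Return True only if text is exactly what an encoder would emit."""
    try:
        decoded = base64.b64decode(text, validate=True)
    except ValueError:
        # Covers malformed input such as bad padding or stray characters.
        return False
    # A canonical encoding round-trips to the identical string; an
    # encoding with non-zero pad bits does not.
    return base64.b64encode(decoded).decode("ascii") == text

# "Zm8=" is the test vector for "fo".  "Zm9=" differs only in the
# non-significant bits of the last character before the padding.
print(is_canonical_b64("Zm8="))
print(is_canonical_b64("Zm9="))
```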
 769
 770    Base encoding visually hides otherwise easily recognized information,
 771    such as passwords, but does not provide any computational
 772    confidentiality.  This has been known to cause security incidents
 773    when, e.g., a user reports details of a network protocol exchange
 774    (perhaps to illustrate some other problem) and accidentally reveals
 775    the password because she is unaware that the base encoding does not
 776    protect the password.
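
   A short sketch makes the point concrete: anyone who sees the encoded
   form can recover the original octets without any key.  The password
   value below is hypothetical, and Python's standard "base64" module
   is assumed.

```python
import base64

# Hypothetical secret, as it might appear Base64-encoded in a
# protocol trace that a user pastes into a bug report.
password = b"secret"
as_seen_in_trace = base64.b64encode(password).decode("ascii")

# No key is required: decoding alone recovers the password.
recovered = base64.b64decode(as_seen_in_trace, validate=True)

print(as_seen_in_trace)
print(recovered == password)
```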
 777
 778    Base encoding adds no entropy to the plaintext, but it does increase
 779    the amount of plaintext available and provide a signature for
 780    cryptanalysis in the form of a characteristic probability
 781    distribution.
 782
 791 13.  Changes Since RFC 3548
 792
 793    Added the "base32 extended hex alphabet", needed to preserve sort
 794    order of encoded data.
 795
 796    Referenced IMAP for the special Base64 encoding used there.
 797
 798    Fixed the example copied from RFC 2440.
 799
 800    Added security consideration about providing a signature for
 801    cryptanalysis.
 802
 803    Added test vectors.
 804
 805    Fixed typos.
 806
 807 14.  Acknowledgements
 808
 809    Several people offered comments and/or suggestions, including John E.
 810    Hadstate, Tony Hansen, Gordon Mohr, John Myers, Chris Newman, and
 811    Andrew Sieber.  Text used in this document is based on earlier RFCs
 812    describing specific uses of various base encodings.  The author
 813    acknowledges the RSA Laboratories for supporting the work that led to
 814    this document.
 815
 816    This revised version is based in part on comments and/or suggestions
 817    made by Roy Arends, Eric Blake, Brian E Carpenter, Elwyn Davies, Bill
 818    Fenner, Sam Hartman, Ted Hardie, Per Hygum, Jelte Jansen, Clement
 819    Kent, Tero Kivinen, Paul Kwiatkowski, and Ben Laurie.
 820
 821 15.  Copying Conditions
 822
 823    Copyright (c) 2000-2006 Simon Josefsson
 824
 825    Regarding the abstract and sections 1, 3, 8, 10, 12, 13, and 14 of
 826    this document, that were written by Simon Josefsson ("the author",
 827    for the remainder of this section), the author makes no guarantees
 828    and is not responsible for any damage resulting from its use.  The
 829    author grants irrevocable permission to anyone to use, modify, and
 830    distribute it in any way that does not diminish the rights of anyone
 831    else to use, modify, and distribute it, provided that redistributed
 832    derivative works do not contain misleading author or version
 833    information and do not falsely purport to be IETF RFC documents.
 834    Derivative works need not be licensed under similar terms.
 835
 847 16.  References
 848
 849 16.1.  Normative References
 850
 851    [1]   Cerf, V., "ASCII format for network interchange", RFC 20,
 852          October 1969.
 853
 854    [2]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
 855          Levels", BCP 14, RFC 2119, March 1997.
 856
 857 16.2.  Informative References
 858
 859    [3]   Linn, J., "Privacy Enhancement for Internet Electronic Mail:
 860          Part I: Message Encryption and Authentication Procedures", RFC
 861          1421, February 1993.
 862
 863    [4]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
 864          Extensions (MIME) Part One: Format of Internet Message Bodies",
 865          RFC 2045, November 1996.
 866
 867    [5]   Callas, J., Donnerhacke, L., Finney, H., and R. Thayer,
 868          "OpenPGP Message Format", RFC 2440, November 1998.
 869
 870    [6]   Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose,
 871          "DNS Security Introduction and Requirements", RFC 4033, March
 872          2005.
 873
 874    [7]   Klyne, G. and L. Masinter, "Identifying Composite Media
 875          Features", RFC 2938, September 2000.
 876
 877    [8]   Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
 878          4rev1", RFC 3501, March 2003.
 879
 880    [9]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
 881          Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
 882          January 2005.
 883
 884    [10]  Laurie, B., Sisson, G., Arends, R., and D. Blacka, "DNSSEC Hash
 885          Authenticated Denial of Existence", Work in Progress, June
 886          2006.
 887
 888    [11]  Myers, J., "SASL GSSAPI mechanisms", Work in Progress, May
 889          2000.
 890
 891    [12]  Wilcox-O'Hearn, B., "Post to P2P-hackers mailing list",
 892          http://zgp.org/pipermail/p2p-hackers/2001-September/
 893          000315.html, September 2001.
 894
 903 Author's Address
 904
 905    Simon Josefsson
 906    SJD
 907    EMail: simon@josefsson.org
 908
 959 Full Copyright Statement
 960
 961    Copyright (C) The Internet Society (2006).
 962
 963    This document is subject to the rights, licenses and restrictions
 964    contained in BCP 78, and except as set forth therein, the authors
 965    retain all their rights.
 966
 967    This document and the information contained herein are provided on an
 968    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
 969    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
 970    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
 971    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
 972    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
 973    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 974
 975 Intellectual Property
 976
 977    The IETF takes no position regarding the validity or scope of any
 978    Intellectual Property Rights or other rights that might be claimed to
 979    pertain to the implementation or use of the technology described in
 980    this document or the extent to which any license under such rights
 981    might or might not be available; nor does it represent that it has
 982    made any independent effort to identify any such rights.  Information
 983    on the procedures with respect to rights in RFC documents can be
 984    found in BCP 78 and BCP 79.
 985
 986    Copies of IPR disclosures made to the IETF Secretariat and any
 987    assurances of licenses to be made available, or the result of an
 988    attempt made to obtain a general license or permission for the use of
 989    such proprietary rights by implementers or users of this
 990    specification can be obtained from the IETF on-line IPR repository at
 991    http://www.ietf.org/ipr.
 992
 993    The IETF invites any interested party to bring to its attention any
 994    copyrights, patents or patent applications, or other proprietary
 995    rights that may cover technology that may be required to implement
 996    this standard.  Please address the information to the IETF at
 997    ietf-ipr@ietf.org.
 998
 999 Acknowledgement
1000
1001    Funding for the RFC Editor function is provided by the IETF
1002    Administrative Support Activity (IASA).