Network Working Group                                           N. Freed
Request for Comments: 2231                                      Innosoft
Updates: 2045, 2047, 2183                                       K. Moore
Obsoletes: 2184                                  University of Tennessee
Category: Standards Track                                  November 1997


           MIME Parameter Value and Encoded Word Extensions:
              Character Sets, Languages, and Continuations

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1997). All Rights Reserved.

1. Abstract

This memo defines extensions to the RFC 2045 media type and RFC 2183
disposition parameter value mechanisms to provide

(1) a means to specify parameter values in character sets
    other than US-ASCII,

(2) a means to specify the language to be used should the
    value be displayed, and

(3) a continuation mechanism for long parameter values to
    avoid problems with header line wrapping.

This memo also defines an extension to the encoded words defined in
RFC 2047 to allow the specification of the language to be used for
display as well as the character set.

2. Introduction

The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
allows for

(1) textual message bodies in character sets other than
    US-ASCII,

(2) non-textual message bodies,

(3) multi-part message bodies, and

(4) textual header information in character sets other than
    US-ASCII.

MIME is now widely deployed and is used by a variety of Internet
protocols, including, of course, Internet email. However, MIME's
success has resulted in the need for additional mechanisms that were
not provided in the original protocol specification.

In particular, existing MIME mechanisms provide for named media type
(content-type field) parameters as well as named disposition
(content-disposition field) parameters. A MIME media type may specify
any number of parameters associated with all of its subtypes, and any
specific subtype may specify additional parameters for its own use. A
MIME disposition value may specify any number of associated
parameters, the most important of which is probably the attachment
disposition's filename parameter.

These parameter names and values end up appearing in the content-type
and content-disposition header fields in Internet email. This
inherently imposes three crucial limitations:

(1) Lines in Internet email header fields are folded
    according to RFC 822 folding rules. This makes long
    parameter values problematic.

(2) MIME headers, like the RFC 822 headers they often
    appear in, are limited to 7bit US-ASCII, and the
    encoded-word mechanisms of RFC 2047 are not available
    to parameter values. This makes it impossible to have
    parameter values in character sets other than US-ASCII
    without specifying some sort of private per-parameter
    encoding.

(3) It has recently become clear that character set
    information is not sufficient to properly display some
    sorts of information -- language information is also
    needed [RFC-2130]. For example, support for
    handicapped users may require reading text strings
    aloud. The language the text is written in is needed
    for this to be done correctly. Some parameter values
    may need to be displayed, hence there is a need to
    allow for the inclusion of language information.

The last problem on this list is also an issue for the encoded words
defined by RFC 2047, as encoded words are intended primarily for
display purposes.

This document defines extensions that address all of these
limitations. All of these extensions are implemented in a fashion
that is completely compatible at a syntactic level with existing MIME
implementations. In addition, the extensions are designed to have as
little impact as possible on existing uses of MIME.

IMPORTANT NOTE: These mechanisms end up being somewhat gibbous when
they actually are used. As such, these mechanisms should not be used
lightly; they should be reserved for situations where a real need for
them exists.

2.1. Requirements notation

This document occasionally uses terms that appear in capital letters.
When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
appear capitalized, they are being used to indicate particular
requirements of this specification. A discussion of the meanings of
these terms appears in [RFC-2119].

3. Parameter Value Continuations

Long MIME media type or disposition parameter values do not interact
well with header line wrapping conventions. In particular, proper
header line wrapping depends on there being places where linear
whitespace (LWSP) is allowed, which may or may not be present in a
parameter value, and even if present may not be recognizable as such
since specific knowledge of parameter value syntax may not be
available to the agent doing the line wrapping. The result is that
long parameter values may end up getting truncated or otherwise
damaged by incorrect line wrapping implementations.

A mechanism is therefore needed to break up parameter values into
smaller units that are amenable to line wrapping. Any such mechanism
MUST be compatible with existing MIME processors. This means that

(1) the mechanism MUST NOT change the syntax of MIME media
    type and disposition lines, and

(2) the mechanism MUST NOT depend on parameter ordering
    since MIME states that parameters are not order
    sensitive. Note that while MIME does prohibit
    modification of MIME headers during transport, it is
    still possible that parameters will be reordered when
    user agent level processing is done.

The obvious solution, then, is to use multiple parameters to contain
a single parameter value and to use some kind of distinguished name
to indicate when this is being done. And this obvious solution is
exactly what is specified here: The asterisk character ("*") followed
by a decimal count is employed to indicate that multiple parameters
are being used to encapsulate a single parameter value. The count
starts at 0 and increments by 1 for each subsequent section of the
parameter value. Decimal values are used and neither leading zeroes
nor gaps in the sequence are allowed.

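The numbering rule can be illustrated from the sending side. The following is a minimal Python sketch, not part of this specification: the function name `split_into_sections` and the `chunk_size` parameter are hypothetical, and the sketch assumes the value needs no quoting or encoding.

```python
def split_into_sections(name, value, chunk_size=40):
    """Split `value` into name*0, name*1, ... sections (hypothetical helper).

    Sections are numbered from 0 in decimal, with no leading zeroes and
    no gaps, so that concatenating them in order restores `value`.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    if len(chunks) <= 1:
        # Short values need no continuation at all.
        return [(name, value)]
    return [(f"{name}*{index}", chunk) for index, chunk in enumerate(chunks)]
```

Each returned pair would still have to be serialized as `name=value` (quoted where the value syntax requires it) and separated by semicolons.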
The original parameter value is recovered by concatenating the
various sections of the parameter, in order. For example, the
content-type field

    Content-Type: message/external-body; access-type=URL;
     URL*0="ftp://";
     URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

is semantically identical to

    Content-Type: message/external-body; access-type=URL;
     URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"

Note that quotes around parameter values are part of the value
syntax; they are NOT part of the value itself. Furthermore, it is
explicitly permitted to have a mixture of quoted and unquoted
continuation fields.

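Recovering a value on the receiving side can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the parameters have already been parsed into a dictionary with surrounding quotes removed, and the helper name `join_continuations` is hypothetical. Because the sections are sorted by their index, the result does not depend on the order in which the parameters arrived.

```python
import re

# name*N with a decimal section number and no leading zeroes.
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<index>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge name*0, name*1, ... entries into one value per name.

    `params` maps raw parameter names to unquoted values (an assumption
    of this sketch). Names are lowercased because attribute matching is
    case-insensitive.
    """
    merged = {}
    sections = {}
    for raw_name, value in params.items():
        match = _SECTION.match(raw_name)
        if match is None:
            merged[raw_name.lower()] = value
            continue
        name = match.group("name").lower()
        sections.setdefault(name, {})[int(match.group("index"))] = value

    for name, parts in sections.items():
        # The count must run 0, 1, 2, ... without gaps.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        merged[name] = "".join(parts[i] for i in range(len(parts)))
    return merged
```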
4. Parameter Value Character Set and Language Information

Some parameter values may need to be qualified with character set or
language information. It is clear that a distinguished parameter
name is needed to identify when this information is present along
with a specific syntax for the information in the value itself. In
addition, a lightweight encoding mechanism is needed to accommodate
8-bit information in parameter values.

Asterisks ("*") are reused to provide the indicator that language and
character set information is present and encoding is being used. A
single quote ("'") is used to delimit the character set and language
information at the beginning of the parameter value. Percent signs
("%") are used as the encoding flag, as in the hexadecimal escaping
of URLs (the "Q" encoding of RFC 2047 uses "=" instead).

Specifically, an asterisk at the end of a parameter name acts as an
indicator that character set and language information may appear at
the beginning of the parameter value. A single quote is used to
separate the character set, language, and actual value information in
the parameter value string, and a percent sign is used to flag
octets encoded in hexadecimal. For example:

    Content-Type: application/x-stuff;
     title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A

Note that it is perfectly permissible to leave either the character
set or language field blank. Note also that the single quote
delimiters MUST be present even when one of the field values is
omitted. This is done when either character set, language, or both
are not relevant to the parameter value at hand. This MUST NOT be
done in order to indicate a default character set or language --
parameter field definitions MUST NOT assign a default character set
or language.

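The value syntax can be exercised with a short Python sketch. The function names `encode_extended_value` and `decode_extended_value` are hypothetical. The sketch relies on `urllib.parse.quote` and `urllib.parse.unquote_to_bytes` for the hexadecimal escaping, which encodes somewhat more characters than strictly necessary but never leaves a forbidden one unencoded. When the character set field is blank, the decoder returns the raw octets rather than guessing a default.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_extended_value(text, charset, language=""):
    """Build charset'language'percent-encoded-text (illustrative only)."""
    octets = text.encode(charset)
    # safe="" leaves only letters, digits and "_.-~" unencoded.
    return f"{charset}'{language}'{quote(octets, safe='')}"


def decode_extended_value(raw):
    """Split an extended value into (charset, language, text).

    Both single quote delimiters must be present; otherwise the
    unpacking below raises ValueError. With a blank charset the value
    is returned as bytes, since no default may be assumed.
    """
    charset, language, encoded = raw.split("'", 2)
    octets = unquote_to_bytes(encoded)
    return charset, language, octets.decode(charset) if charset else octets
```

For instance, decoding the `title*` value from the example in this section yields the character set `us-ascii`, the language `en-us`, and the text `This is ***fun***`.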
4.1. Combining Character Set, Language, and Parameter Continuations

Character set and language information may be combined with the
parameter continuation mechanism. For example:

    Content-Type: application/x-stuff;
     title*0*=us-ascii'en'This%20is%20even%20more%20;
     title*1*=%2A%2A%2Afun%2A%2A%2A%20;
     title*2="isn't it!"

Note that:

(1) Language and character set information only appear at
    the beginning of a given parameter value.

(2) Continuations do not provide a facility for using more
    than one character set or language in the same
    parameter value.

(3) A value presented using multiple continuations may
    contain a mixture of encoded and unencoded segments.

(4) The first segment of a continuation MUST be encoded if
    language and character set information are given.

(5) If the first segment of a continued parameter value is
    encoded the language and character set field delimiters
    MUST be present even when the fields are left blank.

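These rules can be combined into one decoding step. The Python sketch below is illustrative only. It assumes the segments have already been collected in section order as `(encoded, text)` pairs, where `encoded` records whether the parameter name ended in an asterisk. The name `decode_continued_value` is hypothetical, and falling back to US-ASCII when the character set field is blank is a simplification made by this sketch, not something the specification prescribes.

```python
from urllib.parse import unquote_to_bytes


def decode_continued_value(segments):
    """Decode ordered (encoded, text) segments into (charset, language, text)."""
    if not segments:
        raise ValueError("no segments given")
    charset = language = ""
    octets = b""
    for index, (encoded, text) in enumerate(segments):
        if index == 0 and encoded:
            # Only the first segment carries charset and language, and
            # both delimiters must be present even if the fields are blank.
            charset, language, text = text.split("'", 2)
        # Encoded segments are percent-decoded; unencoded ones are taken as-is.
        octets += unquote_to_bytes(text) if encoded else text.encode("ascii")
    # Assumption of this sketch: a blank charset is decoded as US-ASCII.
    return charset, language, octets.decode(charset or "us-ascii")
```

Called with two encoded segments modelled on the `title` example in this section, followed by an unencoded final segment `isn't it!`, the sketch returns the text `This is even more ***fun*** isn't it!` with character set `us-ascii` and language `en`.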
5. Language specification in Encoded Words

RFC 2047 provides support for non-US-ASCII character sets in RFC 822
message header comments, phrases, and any unstructured text field.
This is done by defining an encoded word construct which can appear
in any of these places. Given that these are fields intended for
display, it is sometimes necessary to associate language information
with encoded words as well as just the character set. This
specification extends the definition of an encoded word to allow the
inclusion of such information. This is simply done by suffixing the
character set specification with an asterisk followed by the language
specifier. For example:

    From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>

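A parser can separate the language suffix from the character set before handing the rest to an ordinary RFC 2047 decoder. The following Python sketch is illustrative: the name `split_encoded_word` is hypothetical, and the pattern accepts only the "B" and "Q" encodings, which is an assumption of the sketch rather than a restriction of the grammar.

```python
import re

_ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[BbQq])\?(?P<text>[^?]*)\?="
)


def split_encoded_word(word):
    """Return (charset, language, encoding, encoded_text) for one encoded word.

    `language` is None when no "*language" suffix is present.
    """
    match = _ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded word: {word!r}")
    return (
        match.group("charset"),
        match.group("language"),
        match.group("encoding").upper(),
        match.group("text"),
    )
```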
6. IMAP4 Handling of Parameter Values

IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
when generating the BODY and BODYSTRUCTURE fetch attributes.

7. Modifications to MIME ABNF

The ABNF for MIME parameter values given in RFC 2045 is:

    parameter := attribute "=" value

    attribute := token
                 ; Matching of attributes
                 ; is ALWAYS case-insensitive.

This specification changes this ABNF to:

    parameter := regular-parameter / extended-parameter

    regular-parameter := regular-parameter-name "=" value

    regular-parameter-name := attribute [section]

    attribute := 1*attribute-char

    attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
                      "*", "'", "%", or tspecials>

    section := initial-section / other-sections

    initial-section := "*0"

    other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
                           "6" / "7" / "8" / "9") *DIGIT

    extended-parameter := (extended-initial-name "="
                           extended-initial-value) /
                          (extended-other-names "="
                           extended-other-values)

    extended-initial-name := attribute [initial-section] "*"

    extended-other-names := attribute other-sections "*"

    extended-initial-value := [charset] "'" [language] "'"
                              extended-other-values

    extended-other-values := *(ext-octet / attribute-char)

    ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

    charset := <registered character set name>

    language := <registered language tag [RFC-1766]>

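The parameter name productions translate fairly directly into a regular expression. The Python sketch below is one possible rendering, not a normative one: the name `classify_parameter_name` is hypothetical, and the character class spells out the printable US-ASCII characters that remain once "*", "'", "%", and the tspecials of RFC 2045 are excluded.

```python
import re

# attribute-char: printable US-ASCII minus SPACE, "*", "'", "%" and tspecials.
_ATTRIBUTE = r"[!#$&+\-.0-9A-Z^_`a-z{|}~]+"
_NAME = re.compile(
    rf"(?P<attribute>{_ATTRIBUTE})"
    r"(?:\*(?P<section>0|[1-9][0-9]*))?"   # [section], no leading zeroes
    r"(?P<extended>\*)?"                   # trailing "*" marks an extended name
)


def classify_parameter_name(name):
    """Return (attribute, section, extended) for a raw parameter name.

    `section` is None when no continuation index is present; `extended`
    is True when the value carries charset/language or percent-encoding.
    """
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid parameter name: {name!r}")
    section = match.group("section")
    return (
        match.group("attribute").lower(),  # attribute matching is case-insensitive
        int(section) if section is not None else None,
        match.group("extended") is not None,
    )
```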
The ABNF given in RFC 2047 for encoded-words is:

    encoded-word := "=?" charset "?" encoding "?" encoded-text "?="

This specification changes this ABNF to:

    encoded-word := "=?" charset ["*" language] "?" encoding "?"
                    encoded-text "?="

8. Character sets which allow specification of language

In the future it is likely that some character sets will provide
facilities for inline language labeling. Such facilities are
inherently more flexible than those defined here as they allow for
language switching in the middle of a string.

If and when such facilities are developed they SHOULD be used in
preference to the language labeling facilities specified here. Note
that all the mechanisms defined here allow for the omission of
language labels so as to be able to accommodate this possible future
extension.

9. Security Considerations

This RFC does not discuss security issues and is not believed to
raise any security issues not already endemic in electronic mail and
present in fully conforming implementations of MIME.

10. References

[RFC-822]
    Crocker, D., "Standard for the Format of ARPA Internet
    Text Messages", STD 11, RFC 822, August 1982.

[RFC-1766]
    Alvestrand, H., "Tags for the Identification of
    Languages", RFC 1766, March 1995.

[RFC-2045]
    Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    Extensions (MIME) Part One: Format of Internet Message
    Bodies", RFC 2045, December 1996.

[RFC-2046]
    Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    Extensions (MIME) Part Two: Media Types", RFC 2046,
    December 1996.

[RFC-2047]
    Moore, K., "Multipurpose Internet Mail Extensions (MIME)
    Part Three: Representation of Non-ASCII Text in Internet
    Message Headers", RFC 2047, December 1996.

[RFC-2048]
    Freed, N., Klensin, J. and J. Postel, "Multipurpose
    Internet Mail Extensions (MIME) Part Four: MIME
    Registration Procedures", RFC 2048, December 1996.

[RFC-2049]
    Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    Extensions (MIME) Part Five: Conformance Criteria and
    Examples", RFC 2049, December 1996.

[RFC-2060]
    Crispin, M., "Internet Message Access Protocol - Version
    4rev1", RFC 2060, December 1996.

[RFC-2119]
    Bradner, S., "Key words for use in RFCs to Indicate
    Requirement Levels", RFC 2119, March 1997.

[RFC-2130]
    Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
    Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
    IAB Character Set Workshop", RFC 2130, April 1997.

[RFC-2183]
    Troost, R., Dorner, S. and K. Moore, "Communicating
    Presentation Information in Internet Messages: The
    Content-Disposition Header", RFC 2183, August 1997.

11. Authors' Addresses

Ned Freed
Innosoft International, Inc.
West Covina, CA 91790

Phone: +1 626 919 3600
EMail: ned.freed@innosoft.com

Keith Moore
Computer Science Dept.
University of Tennessee
Knoxville, TN 37996-1301

EMail: moore@cs.utk.edu

12. Full Copyright Statement

Copyright (C) The Internet Society (1997). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.