7 Network Working Group Y. Abel, Ed.
8 Request for Comments: 5335 TWNIC
9 Updates: 2045, 2822 September 2008
10 Category: Experimental
13 Internationalized Email Headers
Status of This Memo
This memo defines an Experimental Protocol for the Internet
community. It does not specify an Internet standard of any kind.
Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
Abstract
Full internationalization of electronic mail requires not only the
capabilities to transmit non-ASCII content, to encode selected
information in specific header fields, and to use non-ASCII
characters in envelope addresses; it also requires being able to
express those addresses and the information based on them in mail
header fields. This document specifies an experimental variant of
Internet mail that permits the use of Unicode encoded in UTF-8,
rather than ASCII, as the base form for Internet email header fields.
This form is permitted in transmission only if authorized by an SMTP
extension, as specified in an associated specification. This
specification updates Section 6.4 of RFC 2045 to conform with the
58 Abel Experimental [Page 1]
60 RFC 5335 I18N Email Headers September 2008
Table of Contents
1. Introduction
   1.1. Role of This Specification
   1.2. Relation to Other Standards
2. Background and History
3. Terminology
4. Changes on Message Header Fields
   4.1. UTF-8 Syntax and Normalization
   4.2. Changes on MIME Headers
   4.3. Syntax Extensions to RFC 2822
   4.4. Change on addr-spec Syntax
   4.5. Trace Field Syntax
   4.6. message/global
5. Security Considerations
6. IANA Considerations
7. Acknowledgements
8. References
   8.1. Normative References
   8.2. Informative References
1. Introduction
1.1. Role of This Specification
Full internationalization of electronic mail requires several
capabilities:
126 o The capability to transmit non-ASCII content, provided for as part
127 of the basic MIME specification [RFC2045], [RFC2046].
129 o The capability to use international characters in envelope
130 addresses, discussed in [RFC4952] and specified in [RFC5336].
o The capability to express those addresses, and information related
to them and based on them, in mail header fields, defined in this
document.
136 This document specifies an experimental variant of Internet mail that
137 permits the use of Unicode encoded in UTF-8 [RFC3629], rather than
138 ASCII, as the base form for Internet email header fields. This form
139 is permitted in transmission, if authorized by the SMTP extension
140 specified in [RFC5336] or by other transport mechanisms capable of
143 1.2. Relation to Other Standards
145 This document updates Section 6.4 of RFC 2045. It removes the
146 blanket ban on applying a content-transfer-encoding to all subtypes
147 of message/, and instead specifies that a composite subtype MAY
148 specify whether or not a content-transfer-encoding can be used for
149 that subtype, with "cannot be used" as the default.
This document also updates [RFC2822] and MIME ([RFC2045]). Because an
Experimental specification is updating Standards-Track specifications
here, people who participate in the experiment have to consider those
standards updated.
156 Allowing use of a content-transfer-encoding on subtypes of messages
157 is not limited to transmissions that are authorized by the SMTP
158 extension specified in [RFC5336]. Message/global permits use of a
159 content-transfer-encoding.
161 2. Background and History
163 Mailbox names often represent the names of human users. Many of
164 these users throughout the world have names that are not normally
165 expressed with just the ASCII repertoire of characters, and would
166 like to use more or less their real names in their mailbox names.
175 These users are also likely to use non-ASCII text in their common
176 names and subjects of email messages, both received and sent. This
177 protocol specifies UTF-8 as the encoding to represent email header
180 The traditional format of email messages [RFC2822] allows only ASCII
181 characters in the header fields of messages. This prevents users
182 from having email addresses that contain non-ASCII characters. It
183 further forces non-ASCII text in common names, comments, and in free
184 text (such as in the Subject: field) to be encoded (as required by
MIME format [RFC2047]). This specification describes a change to the
email message format that is related to the SMTP message transport
change described in the associated documents [RFC4952] and [RFC5336],
and that allows non-ASCII characters in most email header fields.
189 These changes affect SMTP clients, SMTP servers, mail user agents
190 (MUAs), list expanders, gateways to other media, and all other
191 processes that parse or handle email messages.
193 As specified in [RFC5336], an SMTP protocol extension "UTF8SMTP" is
194 used to prevent the transmission of messages with UTF-8 header fields
195 to systems that cannot handle such messages.
197 Use of this SMTP extension helps prevent the introduction of such
198 messages into message stores that might misinterpret, improperly
199 display, or mangle such messages. It should be noted that using an
200 ESMTP extension does not prevent transferring email messages with
201 UTF-8 header fields to other systems that use the email format for
202 messages and that may not be upgraded, such as unextended POP and
IMAP servers. Changes to these protocols to handle UTF-8 header
fields are addressed in [EAI-POP] and [IMAP-UTF8].
206 The objective for this protocol is to allow UTF-8 in email header
207 fields. Issues such as how to handle messages containing UTF-8
208 header fields that have to be delivered to systems that have not been
209 upgraded to support this capability are discussed in [DOWNGRADE].
3. Terminology
A plain ASCII string is also a valid UTF-8 string; see [RFC3629]. In
this document, ordinary ASCII characters are treated as UTF-8
characters when they appear in header fields that contain
<utf8-xtra-char>s.
217 Unless otherwise noted, all terms used here are defined in [RFC2821],
218 [RFC2822], [RFC4952], or [RFC5336].
220 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
221 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
222 document are to be interpreted as described in [RFC2119].
231 4. Changes on Message Header Fields
233 SMTP clients can send header fields in UTF-8 format, if the UTF8SMTP
234 extension is advertised by the SMTP server or is permitted by other
235 transport mechanisms.
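
As a rough illustration of the gating rule above, the following Python
sketch decides whether a client may send UTF-8 header fields by looking
for the UTF8SMTP keyword in the lines of an EHLO response. The function
name and the sample response are hypothetical and not part of this
specification; a real client would rely on its SMTP library and on
[RFC5336] for the full negotiation.

```python
def advertises_utf8smtp(ehlo_lines):
    """Return True if any EHLO response line carries the UTF8SMTP keyword.

    ehlo_lines: iterable of reply lines such as "250-8BITMIME".
    Keywords are compared case-insensitively, as is usual for ESMTP.
    """
    for line in ehlo_lines:
        # Drop the "250-" / "250 " reply prefix when present.
        text = line[4:] if line[:3].isdigit() else line
        parts = text.split()
        if parts and parts[0].upper() == "UTF8SMTP":
            return True
    return False


# Hypothetical EHLO response used only for illustration.
sample = ["250-mail.example.org", "250-8BITMIME", "250 UTF8SMTP"]
print(advertises_utf8smtp(sample))  # prints True
```

If the keyword is absent and no other suitable transport is available,
the message would have to be downgraded or rejected rather than sent
with UTF-8 header fields.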
237 This protocol does NOT change the [RFC2822] rules for defining header
238 field names. The bodies of header fields are allowed to contain
239 UTF-8 characters, but the header field names themselves must contain
240 only ASCII characters.
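
The split between ASCII-only field names and UTF-8 field bodies can be
sketched as follows. This is a simplified check under stated
assumptions: the input is one unfolded header field without its CRLF,
the function name is hypothetical, and the body is only tested for
being well-formed UTF-8 rather than against the full grammar.

```python
def check_header_field(raw: bytes) -> bool:
    """Check one unfolded header field given as raw octets (no CRLF).

    The field name must consist of printable ASCII octets (33-126,
    excluding ":"), as in RFC 2822. The body only has to decode as
    UTF-8 in this simplified check.
    """
    name, sep, body = raw.partition(b":")
    if not sep or not name:
        return False                      # no colon, or empty name
    if any(b < 33 or b > 126 for b in name):
        return False                      # non-ASCII or control in name
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False                      # body is not valid UTF-8
    return True


print(check_header_field("Subject: Grüße".encode("utf-8")))   # prints True
print(check_header_field("Sübject: hello".encode("utf-8")))   # prints False
```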
To permit UTF-8 characters in field values, the header definition in
[RFC2822] must be extended to support the new format. The ABNF
defined below replaces the corresponding definitions in [RFC2822].
The syntax rules not covered in this section remain as defined in
[RFC2822].
249 4.1. UTF-8 Syntax and Normalization
251 UTF-8 characters can be defined in terms of octets using the
252 following ABNF [RFC5234], taken from [RFC3629]:
254 UTF8-xtra-char = UTF8-2 / UTF8-3 / UTF8-4
256 UTF8-2 = %xC2-DF UTF8-tail
UTF8-3 = %xE0 %xA0-BF UTF8-tail /
         %xE1-EC 2( UTF8-tail ) /
         %xED %x80-9F UTF8-tail /
         %xEE-EF 2( UTF8-tail )
UTF8-4 = %xF0 %x90-BF 2( UTF8-tail ) /
         %xF1-F3 3( UTF8-tail ) /
         %xF4 %x80-8F 2( UTF8-tail )
UTF8-tail = %x80-BF
269 These are normatively defined in [RFC3629], but kept in this document
270 for reasons of convenience.
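
For readers who prefer code to ABNF, the multi-octet productions of
[RFC3629] can be transcribed into a byte-level regular expression. The
Python sketch below is one possible transcription, not a normative
definition; the names UTF8_TAIL, UTF8_XTRA_CHAR, and is_utf8_xtra_char
are chosen here for illustration.

```python
import re

# Byte-level transcription of UTF8-2 / UTF8-3 / UTF8-4 from RFC 3629.
UTF8_TAIL = rb"[\x80-\xBF]"
UTF8_XTRA_CHAR = re.compile(
    rb"(?:[\xC2-\xDF]" + UTF8_TAIL                    # UTF8-2
    + rb"|\xE0[\xA0-\xBF]" + UTF8_TAIL                # UTF8-3
    + rb"|[\xE1-\xEC]" + UTF8_TAIL + rb"{2}"
    + rb"|\xED[\x80-\x9F]" + UTF8_TAIL
    + rb"|[\xEE-\xEF]" + UTF8_TAIL + rb"{2}"
    + rb"|\xF0[\x90-\xBF]" + UTF8_TAIL + rb"{2}"      # UTF8-4
    + rb"|[\xF1-\xF3]" + UTF8_TAIL + rb"{3}"
    + rb"|\xF4[\x80-\x8F]" + UTF8_TAIL + rb"{2})"
)


def is_utf8_xtra_char(octets: bytes) -> bool:
    """True if octets encode exactly one non-ASCII UTF-8 character."""
    return UTF8_XTRA_CHAR.fullmatch(octets) is not None


print(is_utf8_xtra_char("é".encode("utf-8")))   # prints True
print(is_utf8_xtra_char(b"a"))                  # prints False (plain ASCII)
print(is_utf8_xtra_char(b"\xC0\x80"))           # prints False (overlong form)
```

The explicit ranges are what exclude overlong encodings, UTF-16
surrogates, and code points above U+10FFFF.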
272 See [RFC5198] for a discussion of normalization; the use of
273 normalization form NFC is RECOMMENDED.
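
A minimal sketch of applying the recommended normalization, assuming
the header field value is already available as a Unicode string. It
uses the standard Python unicodedata module; the helper name to_nfc is
hypothetical.

```python
import unicodedata


def to_nfc(value: str) -> str:
    """Return value in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", value)


decomposed = "Jose\u0301"        # "e" followed by COMBINING ACUTE ACCENT
composed = to_nfc(decomposed)    # single precomposed U+00E9
print(len(decomposed), len(composed))   # prints 5 4
```

Normalizing before comparison or storage keeps visually identical
strings from being treated as different values.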
287 4.2. Changes on MIME Headers
289 This specification updates Section 6.4 of [RFC2045]. [RFC2045]
290 prohibits applying a content-transfer-encoding to all subtypes of
291 message/. This specification relaxes the rule -- it allows newly
292 defined MIME types to permit content-transfer-encoding, and it allows
293 content-transfer-encoding for message/global (see Section 4.6).
295 Background: Normally, transfer of message/global will be done in
296 8-bit-clean channels, and body parts will have "identity" encodings,
297 that is, no decoding is necessary. In the case where a message
298 containing a message/global is downgraded from 8-bit to 7-bit as
299 described in [RFC1652], an encoding may be applied to the message; if
300 the message travels multiple times between a 7-bit environment and an
301 environment implementing UTF8SMTP, multiple levels of encoding may
occur. This is expected to be rarely seen in practice, and the
potential complexity of other ways of dealing with the issue is
thought to be larger than the complexity of allowing nested encodings.
307 4.3. Syntax Extensions to RFC 2822
309 The following rules are intended to extend the corresponding rules in
310 [RFC2822] in order to allow UTF-8 characters.
312 FWS = <see [RFC2822], folding white space>
CFWS = <see [RFC2822], comments and/or folding white space>
316 ctext =/ UTF8-xtra-char
318 utext =/ UTF8-xtra-char
320 comment = "(" *([FWS] utf8-ccontent) [FWS] ")"
322 word = utf8-atom / utf8-quoted-string
324 This means that all the [RFC2822] constructs that build upon these
325 will permit UTF-8 characters, including comments and quoted strings.
The syntax of <atext> is deliberately left unchanged, even though
changing it would allow UTF-8 characters in <addr-spec>, because that
change would also allow UTF-8 characters in <message-id>, which is not
permitted due to the limitation described in Section 4.5. Instead,
<utf8-atext> is added to meet this requirement.
utf8-text = %d1-9 /        ; all UTF-8 characters except
            %d11-12 /      ; US-ASCII NUL, CR, and LF
            %d14-127 /
            UTF8-xtra-char
348 utf8-quoted-pair = ("\" utf8-text) / obs-qp
350 utf8-qcontent = utf8-qtext / utf8-quoted-pair
352 utf8-quoted-string = [CFWS]
353 DQUOTE *([FWS] utf8-qcontent) [FWS] DQUOTE
356 utf8-ccontent = ctext / utf8-quoted-pair / comment
358 utf8-qtext = qtext / UTF8-xtra-char
utf8-atext = ALPHA / DIGIT /
             "!" / "#" /    ; Any character except
             "$" / "%" /    ; controls, SP, and specials.
             "&" / "'" /    ; Used for atoms.
             "*" / "+" /
             "-" / "/" /
             "=" / "?" /
             "^" / "_" /
             "`" / "{" /
             "|" / "}" /
             "~" /
             UTF8-xtra-char
373 utf8-atom = [CFWS] 1*utf8-atext [CFWS]
375 utf8-dot-atom = [CFWS] utf8-dot-atom-text [CFWS]
377 utf8-dot-atom-text = 1*utf8-atext *("." 1*utf8-atext)
379 qcontent = utf8-qcontent
381 To allow the use of UTF-8 in a Content-Description header field
382 [RFC2045], the following syntax is used:
384 description = "Content-Description:" unstructured CRLF
386 The <utext> syntax is extended above to allow UTF-8 in all
387 <unstructured> header fields.
Note, however, that this does not remove any constraint on the
character set of protocol elements; for instance, all the allowed
values for timezone in the Date: header field are still expressed in
ASCII. Likewise, none of this revised syntax changes what is allowed
in a <msg-id>, which remains pure ASCII.
405 4.4. Change on addr-spec Syntax
407 Internationalized email addresses are represented in UTF-8. Thus,
408 all header fields containing <mailbox>es are updated to permit UTF-8
409 as well as an additional, optional all-ASCII alternate address. Note
410 that Message Submission Servers ("MSAs") and Message Transfer Agents
411 (MTAs) may downgrade internationalized messages as needed. The
412 procedure for doing so is described in [DOWNGRADE].
414 mailbox = name-addr / addr-spec / utf8-addr-spec
416 angle-addr =/ [CFWS] "<" utf8-addr-spec [ alt-address ] ">"
417 [CFWS] / obs-angle-addr
419 utf8-addr-spec = utf8-local-part "@" utf8-domain
421 utf8-local-part= utf8-dot-atom / utf8-quoted-string / obs-local-part
423 utf8-domain = utf8-dot-atom / domain-literal / obs-domain
425 alt-address = FWS "<" addr-spec ">"
427 Below are a few examples of possible <mailbox> representations.
429 "DISPLAY_NAME" <ASCII@ASCII>
430 ; traditional mailbox format
432 "DISPLAY_NAME" <non-ASCII@non-ASCII>
433 ; UTF8SMTP but no ALT-ADDRESS parameter provided,
434 ; message will bounce if UTF8SMTP extension is not supported
436 <non-ASCII@non-ASCII>
437 ; without DISPLAY_NAME and quoted string
438 ; UTF8SMTP but no ALT-ADDRESS parameter provided,
439 ; message will bounce if UTF8SMTP extension is not supported
441 "DISPLAY_NAME" <non-ASCII@non-ASCII <ASCII@ASCII>>
442 ; UTF8SMTP with ALT-ADDRESS parameter provided,
443 ; ALT-ADDRESS can be used if downgrade is necessary
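
The nested angle-bracket form can be picked apart with a small amount
of code. The Python sketch below is deliberately simplified: it ignores
comments, quoted local parts, and obsolete forms, and its regular
expression is an approximation rather than a transcription of the ABNF
above. The names ANGLE_ADDR and split_mailbox, and the sample
addresses, are hypothetical.

```python
import re

# Approximation of: "<" utf8-addr-spec [ FWS "<" addr-spec ">" ] ">"
ANGLE_ADDR = re.compile(
    r"<(?P<primary>[^<>\s]+@[^<>\s]+)"        # primary address
    r"(?:\s+<(?P<alt>[^<>\s]+@[^<>\s]+)>)?"   # optional alt-address
    r">"
)


def split_mailbox(mailbox: str):
    """Return (primary, alt) from a mailbox string, or None if unparsable.

    alt is None when no alternate address is present. A non-ASCII
    alternate address is rejected, since alt-address wraps an addr-spec.
    """
    match = ANGLE_ADDR.search(mailbox)
    if match is None:
        return None
    alt = match.group("alt")
    if alt is not None and not alt.isascii():
        return None
    return match.group("primary"), alt


print(split_mailbox('"DISPLAY_NAME" <用户@例子.example <user@example.com>>'))
# prints ('用户@例子.example', 'user@example.com')
```

A downgrading agent could use the second element of the result when
the next hop does not support UTF8SMTP.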
455 4.5. Trace Field Syntax
457 "For" fields containing internationalized addresses are allowed, by
458 use of the new uFor syntax. UTF-8 information may be needed in
459 Received fields. Such information is therefore allowed to preserve
460 the integrity of those fields. The uFor syntax retains the original
461 UTF-8 email address between email address internationalization (EAI)-
462 aware MTAs. Note that, should downgrading be required, the uFor
463 parameter is dropped per the procedure specified in [DOWNGRADE].
465 The "Return-Path" header provides the email return address in the
466 mail delivery. Thus, the header is augmented to carry UTF-8
467 addresses (see the revised syntax of <angle-addr> in Section 4.4 of
468 this document). This will not break the rule of trace field
469 integrity, because the header is added at the last MTA and described
The <item-value> syntax of the "Received:" field is augmented to allow
a UTF-8 email address in the "For" field, and <angle-addr> is augmented
to include a UTF-8 email address. To allow UTF-8 email addresses where
an <addr-spec> is expected, <utf8-addr-spec> is added to <item-value>.
477 item-value =/ utf8-addr-spec
4.6. message/global
Internationalized messages must only be transmitted as authorized by
482 [RFC5336] or within a non-SMTP environment which supports these
483 messages. A message is a "message/global message", if
485 o it contains UTF-8 header values as specified in this document, or
o it contains UTF-8 values in the header fields of body parts.
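
The first of the two conditions above can be approximated in code. The
Python sketch below inspects only the top-level header block of a raw
message and does not descend into body parts, so it covers the first
bullet but not the second. The function name is hypothetical, and the
choice to raise an error on malformed UTF-8 is an assumption of this
sketch.

```python
def embedded_message_type(raw_message: bytes) -> str:
    """Suggest a media type for embedding raw_message in another message.

    Returns "message/global" when the top-level header block contains
    non-ASCII octets that form valid UTF-8, else "message/rfc822".
    Raises UnicodeDecodeError if the header block is not valid UTF-8.
    """
    # The header block ends at the first empty line.
    header_block, _, _ = raw_message.partition(b"\r\n\r\n")
    if header_block.isascii():
        return "message/rfc822"
    header_block.decode("utf-8")   # validation only; result is discarded
    return "message/global"


msg = "Subject: 你好\r\n\r\nbody".encode("utf-8")
print(embedded_message_type(msg))   # prints message/global
```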
489 The type message/global is similar to message/rfc822, except that it
490 contains a message that can contain UTF-8 characters in the headers
of the message or body parts. If this type is sent to a 7-bit-only
system, it has to be encoded in MIME [RFC2045]. (Note that a system
compliant with MIME that doesn't recognize message/global would treat
it as "application/octet-stream" as described in Section 5.2.4 of
[RFC2046].)
497 Alternatively, SMTP servers and other systems which transfer a
498 message/global body part MAY choose to down-convert it to a message/
499 rfc822 body part using the rules described in [DOWNGRADE].
Type name: message
Subtype name: global
Required parameters: none
517 Optional parameters: none
519 Encoding considerations: Any content-transfer-encoding is permitted.
520 The 8-bit or binary content-transfer-encodings are recommended
523 Security considerations: See Section 5.
525 Interoperability considerations: The media type provides
526 functionality similar to the message/rfc822 content type for email
527 messages with international email headers. When there is a need
528 to embed or return such content in another message, there is
529 generally an option to use this media type and leave the content
530 unchanged or down-convert the content to message/rfc822. Both of
531 these choices will interoperate with the installed base, but with
532 different properties. Systems unaware of international headers
533 will typically treat a message/global body part as an unknown
534 attachment, while they will understand the structure of a message/
535 rfc822. However, systems that understand message/global will
536 provide functionality superior to the result of a down-conversion
537 to message/rfc822. The most interoperable choice depends on the
540 Published specification: RFC 5335
542 Applications that use this media type: SMTP servers and email
543 clients that support multipart/report generation or parsing.
544 Email clients which forward messages with international headers as
547 Additional information:
549 Magic number(s): none
551 File extension(s): The extension ".u8msg" is suggested.
553 Macintosh file type code(s): A uniform type identifier (UTI) of
554 "public.utf8-email-message" is suggested. This conforms to
555 "public.message" and "public.composite-content", but does not
556 necessarily conform to "public.utf8-plain-text".
567 Person & email address to contact for further information: See the
568 Author's Address section of this document.
570 Intended usage: COMMON
572 Restrictions on usage: This is a structured media type which embeds
573 other MIME media types. The 8-bit or binary content-transfer-
574 encoding MUST be used unless this media type is sent over a 7-bit-
577 Author: See the Author's Address section of this document.
579 Change controller: IETF Standards Process
581 5. Security Considerations
583 If a user has a non-ASCII mailbox address and an ASCII mailbox
584 address, a digital certificate that identifies that user may have
585 both addresses in the identity. Having multiple email addresses as
586 identities in a single certificate is already supported in PKIX
587 (Public Key Infrastructure for X.509 Certificates) and OpenPGP.
589 Because UTF-8 often requires several octets to encode a single
590 character, internationalized local parts may cause mail addresses to
become longer. As specified in [RFC2822], each line of characters
MUST be no more than 998 octets, excluding the CRLF.
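
Because the limit is counted in octets rather than characters, a line
that looks short can still exceed it. The Python sketch below measures
each line after UTF-8 encoding; the function name and the sample header
are hypothetical, and the input is assumed to use CRLF line endings.

```python
MAX_LINE_OCTETS = 998   # per RFC 2822, excluding the CRLF


def overlong_lines(message: str):
    """Return 1-based numbers of lines longer than 998 octets in UTF-8."""
    bad = []
    for number, line in enumerate(message.split("\r\n"), start=1):
        if len(line.encode("utf-8")) > MAX_LINE_OCTETS:
            bad.append(number)
    return bad


# 404 characters, but 4 + 400 * 3 = 1204 octets once encoded.
long_field = "To: " + "名" * 400
print(overlong_lines(long_field + "\r\nSubject: ok"))   # prints [1]
```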
594 Because internationalized local parts may cause email addresses to be
595 longer, processes that parse, store, or handle email addresses or
596 local parts must take extra care not to overflow buffers, truncate
597 addresses, or exceed storage allotments. Also, they must take care,
598 when comparing, to use the entire lengths of the addresses.
600 In this specification, a user could provide an ASCII alternative
601 address for a non-ASCII address. However, it is possible these two
602 addresses go to different mailboxes, or even different people. This
603 configuration may be based on a user's personal choice or on
604 administration policy. We recognize that if ASCII and non-ASCII
605 email is delivered to two different destinations, based on MTA
606 capability, this may violate the principle of least astonishment, but
607 this is not a "protocol problem".
609 The security impact of UTF-8 headers on email signature systems such
610 as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
611 discussed in RFC 4952, Section 9. A subsequent document [DOWNGRADE]
612 will cover the impact of downgrading on these systems.
623 6. IANA Considerations
IANA has registered the message/global MIME type using the
registration form contained in Section 4.6.
7. Acknowledgements
This document incorporates many ideas first described in Internet-
Draft form by Paul Hoffman, although many details have changed from
634 The author especially thanks Jeff Yeh for his efforts and
635 contributions on editing previous versions.
Most of the content of this document was provided by John C. Klensin.
638 Also, some significant comments and suggestions were received from
639 Charles H. Lindsey, Kari Hurtta, Pete Resnick, Alexey Melnikov, Chris
640 Newman, Yangwoo Ko, Yoshiro Yoneya, and other members of the JET team
641 (Joint Engineering Team) and were incorporated into the document.
642 The editor sincerely thanks them for their contributions.
8. References
8.1. Normative References
648 [RFC1652] Klensin, J., Freed, N., Rose, M., Stefferud, E., and D.
649 Crocker, "SMTP Service Extension for 8bit-
650 MIMEtransport", RFC 1652, July 1994.
652 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
653 Requirement Levels", BCP 14, RFC 2119, March 1997.
655 [RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
658 [RFC2822] Resnick, P., "Internet Message Format", RFC 2822,
661 [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
662 10646", STD 63, RFC 3629, November 2003.
664 [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
665 Internationalized Email", RFC 4952, July 2007.
667 [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for
668 Network Interchange", RFC 5198, March 2008.
679 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
680 Specifications: ABNF", STD 68, RFC 5234, January 2008.
682 [RFC5336] Yao, J., Ed. and W. Mao, Ed., "SMTP Extension for
683 Internationalized Email Addresses", RFC 5336,
686 8.2. Informative References
688 [DOWNGRADE] Fujiwara, K. and Y. Yoneya, "Downgrading mechanism for
689 Email Address Internationalization", Work in Progress,
692 [EAI-POP] Newman, C. and R. Gellens, "POP3 Support for UTF-8",
693 Work in Progress, July 2008.
695 [IMAP-UTF8] Resnick, P. and C. Newman, "IMAP Support for UTF-8",
696 Work in Progress, April 2008.
698 [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
699 Extensions (MIME) Part One: Format of Internet Message
700 Bodies", RFC 2045, November 1996.
702 [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
703 Extensions (MIME) Part Two: Media Types", RFC 2046,
706 [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
707 Part Three: Message Header Extensions for Non-ASCII
708 Text", RFC 2047, November 1996.
Author's Address
4F-2, No. 9, Sec 2, Roosevelt Rd.
718 Phone: +886 2 23411313 ext 505
719 EMail: abelyang@twnic.net.tw
735 Full Copyright Statement
737 Copyright (C) The IETF Trust (2008).
739 This document is subject to the rights, licenses and restrictions
740 contained in BCP 78, and except as set forth therein, the authors
741 retain all their rights.
743 This document and the information contained herein are provided on an
744 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
745 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
746 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
747 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
748 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
749 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
751 Intellectual Property
753 The IETF takes no position regarding the validity or scope of any
754 Intellectual Property Rights or other rights that might be claimed to
755 pertain to the implementation or use of the technology described in
756 this document or the extent to which any license under such rights
757 might or might not be available; nor does it represent that it has
758 made any independent effort to identify any such rights. Information
759 on the procedures with respect to rights in RFC documents can be
760 found in BCP 78 and BCP 79.
762 Copies of IPR disclosures made to the IETF Secretariat and any
763 assurances of licenses to be made available, or the result of an
764 attempt made to obtain a general license or permission for the use of
765 such proprietary rights by implementers or users of this
766 specification can be obtained from the IETF on-line IPR repository at
767 http://www.ietf.org/ipr.
769 The IETF invites any interested party to bring to its attention any
770 copyrights, patents or patent applications, or other proprietary
771 rights that may cover technology that may be required to implement
772 this standard. Please address the information to the IETF at