2 .he 'Mail Systems and Addressing in 4.2bsd''%'
3 .fo 'Version 4.1'USENIX \- Jan 83'Last Mod 7/25/83'
8 Mail Systems and Addressing
16 1919 Addison Street, Suite 105.
17 Berkeley, California 94704.
28 Routing mail through a heterogeneous internet presents many new
30 Among the worst of these is that of address mapping.
31 Historically, this has been handled on an ad hoc basis.
33 this approach has become unmanageable as internets grow.
35 Sendmail acts as a unified
37 to which all mail can be
39 Address interpretation is controlled by a production
41 which can parse both old and new format addresses.
45 a flexible technique that can
46 handle many common situations.
47 Sendmail is not intended to perform
48 user interface functions.
50 Sendmail will replace delivermail in the Berkeley 4.2 distribution.
51 Several major hosts are now or will soon be running sendmail.
52 This change will affect any users that route mail through a sendmail
54 The changes that will be user visible are emphasized.
58 \(dgA considerable part of this work
59 was done while under the employ
61 at the University of California at Berkeley.
64 The mail system to appear in 4.2bsd
65 will contain a number of changes.
66 Most of these changes are based on the replacement of
68 with a new module called
71 implements a general internetwork mail routing facility,
72 featuring aliasing and forwarding,
73 automatic routing to network gateways,
74 and flexible configuration.
75 Of key interest to the mail system user
76 will be the changes in the network addressing structure.
79 each node has an address,
80 and resources can be identified
81 with a host-resource pair;
83 the mail system can refer to users
84 using a host-username pair.
85 Host names and numbers have to be administered by a central authority,
86 but usernames can be assigned locally to each host.
89 multiple networks with different characteristics
93 the syntax and semantics of resource identification change.
94 Certain special cases can be handled trivially
99 providing network names that appear local to hosts
101 as with the Ethernet at Xerox PARC.
102 However, the general case is extremely complex.
104 some networks require that the route the message takes
105 be explicitly specified by the sender,
106 simplifying the database update problem
107 since only adjacent hosts must be entered
108 into the system tables,
109 while others use logical addressing,
110 where the sender specifies the location of the recipient
111 but not how to get there.
112 Some networks use a left-associative syntax
113 and others use a right-associative syntax,
114 causing ambiguity in mixed addresses.
116 Internet standards seek to eliminate these problems.
117 Initially, these proposed expanding the address pairs
120 {network, host, username}
122 Network numbers must be universally agreed upon,
123 and hosts can be assigned locally
125 The user-level presentation was changed
127 comprised of a local resource identification
128 and a hierarchical domain specification
129 with a common static root.
131 separates the issue of physical versus logical addressing.
133 an address of the form
134 .q "eric@a.cc.berkeley.arpa"
135 describes the logical
136 organization of the address space
141 in the Computer Center
143 but not the physical networks used
144 (for example, this could go over different networks
148 or a store-and-forward network).
151 is intended to help bridge the gap
155 of networks that know nothing of each other
156 and the clean, tightly-coupled world
157 of unique network numbers.
158 It can accept old arbitrary address syntaxes,
159 resolving ambiguities using heuristics
160 specified by the system administrator,
161 as well as domain-based addressing.
162 It helps guide the conversion of message formats
163 between disparate networks.
166 is designed to assist a graceful transition
167 to consistent internetwork addressing schemes.
170 Section 1 defines some of the terms
171 frequently left fuzzy
172 when working in mail systems.
173 Section 2 discusses the design goals for
176 the new address formats
177 and basic features of
180 Section 4 discusses some of the special problems
182 The differences between
186 are presented in section 5.
192 use names of actual people
195 to imply a commitment
196 or even an intellectual agreement
197 on the part of these people or organizations.
199 Bell Telephone Laboratories (BTL),
200 Digital Equipment Corporation (DEC),
201 Lawrence Berkeley Laboratories (LBL),
202 Britton-Lee Incorporated (BLI),
203 and the University of California at Berkeley
204 are not committed to any of these proposals at this time.
206 represents no more than
207 the personal opinions of the author.
211 There are four basic concepts
212 that must be clearly distinguished
213 when dealing with mail systems:
214 the user (or the user's agent),
215 the user's identification,
218 These are distinguished primarily by their position independence.
219 .sh 2 "User and Identification"
221 The user is the being
222 (a person or program)
223 that is creating or receiving a message.
226 is an entity operating on behalf of the user \*-
227 such as a secretary who handles my mail,
228 or a program that automatically returns a
230 .q "I am at the UNICOM conference."
232 The identification is the tag
233 that goes along with the particular user.
234 This tag is completely independent of location.
236 my identification is the string
238 and this identification does not change
239 whether I am located at U.C. Berkeley,
241 or at a scientific institute in Austria.
243 Since the identification is frequently ambiguous
247 it is common to add other disambiguating information
248 that is not strictly part of the identification
255 .q "System Administrator"
259 The address specifies a location.
263 my address might change from
264 .q eric@Berkeley.ARPA
268 .q allman@IIASA.Austria
269 depending on my current affiliation.
272 an address is independent of the location of anyone else.
274 my address remains the same to everyone who might be sending me mail.
276 a person at MIT and a person at USC
278 .q eric@Berkeley.ARPA
279 and have it arrive at the same mailbox.
283 service would be provided to map user identifications
287 Currently this is handled by passing around
289 or by calling people on the telephone
290 to find out their address.
293 While an address specifies
301 from sender to receiver.
302 As such, the route is potentially different
303 for every pair of people in the electronic universe.
305 Normally the route is hidden from the user
308 some networks put the burden of determining the route
310 Although this simplifies the software,
311 it also greatly impairs the usability
313 The UUCP network is an example of such a network.
319 \**This section makes no distinction between
326 Compatibility with the existing mail programs,
327 including Bell version 6 mail,
334 and hopefully UUCP mail
340 Reliability, in the sense of guaranteeing
341 that every message is correctly delivered
342 or at least brought to the attention of a human
343 for correct disposal;
344 no message should ever be completely lost.
345 This goal was considered essential
346 because of the emphasis on mail in our environment.
347 It has turned out to be one of the hardest goals to satisfy,
348 especially in the face of the many anomalous message formats
349 produced by various ARPANET sites.
351 certain sites generate improperly formatted addresses,
353 causing error-message loops.
354 Some hosts use blanks in names,
355 causing problems with
356 mail programs that assume that an address
358 The semantics of some fields
359 are interpreted slightly differently
362 the obscure features of the ARPANET mail protocol
366 are difficult to support,
367 but must be supported.
369 Existing software to do actual delivery
370 should be used whenever possible.
371 This goal derives as much from political and practical considerations
375 fairly complex environments,
377 connections to a single network type
378 (such as with multiple UUCP or Ethernets).
379 This goal requires consideration of the contents of an address
380 as well as its syntax
381 in order to determine which gateway to use.
383 Configuration information should not be compiled into the code.
384 A single compiled program should be able to run as is at any site
385 (barring such basic changes as the CPU type or the operating system).
386 We have found this seemingly unimportant goal
387 to be critical in real life.
388 Besides the simple problems that occur when any program gets recompiled
389 in a different environment,
392 with anything that they will be recompiling anyway.
395 must be able to let various groups maintain their own mailing lists,
396 and let individuals specify their own forwarding,
397 without modifying the system alias file.
399 Each user should be able to specify which mailer to execute
400 to process mail being delivered for him.
401 This feature allows users who are using specialized mailers
402 that use a different format to build their environment
403 without changing the system,
404 and facilitates specialized functions
405 (such as returning an
406 .q "I am on vacation"
409 Network traffic should be minimized
410 by batching addresses to a single host where possible,
411 without assistance from the user.
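The batching goal above can be sketched as follows. This is only an illustration in Python, not the code used by sendmail; the function name batch_by_host and the "localhost" placeholder for addresses with no host part are assumptions made for the example.

```python
from collections import OrderedDict


def batch_by_host(addresses):
    """Group user@host addresses by host, keeping first-seen order.

    Hypothetical helper: one batch per host means one transfer per host.
    """
    batches = OrderedDict()
    for address in addresses:
        # Split on the last "@"; an address without one is treated as local.
        user, at, host = address.rpartition("@")
        if not at:
            user, host = address, "localhost"
        # Host names are compared without regard to case in this sketch.
        batches.setdefault(host.lower(), []).append(user)
    return batches
```

Each key of the result would correspond to a single connection carrying all recipients listed under it.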
413 These goals motivated the architecture illustrated in figure 1.
420 +---------+   +---------+   +---------+
421 | sender1 |   | sender2 |   | sender3 |
422 +---------+   +---------+   +---------+
423      |             |             |
424      +----------+  +  +----------+
425                 |  |  |
426                 v  v  v
427              +-------------+
428              |  sendmail   |
429              +-------------+
430                 |  |  |
431      +----------+  +  +----------+
432      |             |             |
433      v             v             v
434 +---------+   +---------+   +---------+
435 | mailer1 |   | mailer2 |   | mailer3 |
436 +---------+   +---------+   +---------+
441 Figure 1 \*- Sendmail System Structure.
444 The user interacts with a mail generating and sending program.
445 When the mail is created,
448 which routes the message to the correct mailer(s).
449 Since some of the senders may be network servers
450 and some of the mailers may be network clients,
452 may be used as an internet mail gateway.
454 .sh 2 "Address Formats"
456 Arguments may be flags or addresses.
457 Flags set various processing options.
458 Following flag arguments,
459 address arguments may be given.
460 Addresses follow the syntax in RFC822
464 In brief, the format is:
466 Anything in parentheses is thrown away
469 Anything in angle brackets (\c
473 This rule implements the ARPANET standard that addresses of the form
475 user name <machine-address>
477 will send to the electronic
479 rather than the human
485 backslashes quote characters.
486 Backslashes are more powerful
487 in that they will cause otherwise equivalent phrases
488 to compare differently \*- for example,
497 is different from either of them.
499 to avoid normal aliasing
500 or duplicate suppression algorithms.
502 Parentheses, angle brackets, and double quotes
503 must be properly balanced and nested.
504 The rewriting rules control remaining parsing\**.
506 \**Disclaimer: Some special processing is done
507 after rewriting local names; see below.
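Two of the rules above (parenthesized text is discarded, and text in angle brackets is preferred over everything else) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the parser in sendmail: the function names are hypothetical, and double-quote and backslash quoting are deliberately left out.

```python
def strip_comments(address):
    """Drop parenthesized comments, honoring nesting."""
    depth = 0
    kept = []
    for ch in address:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced parenthesis")
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    if depth != 0:
        raise ValueError("unbalanced parenthesis")
    return "".join(kept)


def machine_address(address):
    """Return the part of an address that names the electronic mailbox."""
    text = strip_comments(address)
    start = text.find("<")
    if start != -1:
        # Anything in angle brackets wins over the surrounding phrase.
        end = text.find(">", start)
        if end == -1:
            raise ValueError("unbalanced angle bracket")
        text = text[start + 1:end]
    # Collapse the white space left behind by removed comments.
    return " ".join(text.split())
```

Both the "phrase <address>" form and the "address (comment)" form reduce to the same machine address under this sketch.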
510 Although old style addresses are still accepted
512 the preferred address format
513 is based on ARPANET-style domain-based addresses
515 These addresses are based on a hierarchical, logical decomposition
516 of the address space.
517 The addresses are hierarchical in a sense
518 similar to the U.S. postal addresses:
519 the messages may first be routed to the correct state,
520 with no initial consideration of the city
521 or other addressing details.
522 The addresses are logical
523 in that each step in the hierarchy
524 corresponds to a set of
525 .q "naming authorities"
526 rather than a physical network.
531 eric@HostA.BigSite.ARPA
533 would first look up the domain
535 in the namespace administered by
537 A query could then be sent to
539 for interpretation of
541 Eventually the mail would arrive at
543 which would then do final delivery
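The order in which the domains of such an address would be consulted can be sketched as follows. This Python fragment only lists the successive domains, most general first; it performs no lookups, and the function name resolution_steps is an assumption made for the example.

```python
def resolution_steps(address):
    """Return (user, domains), with domains ordered from the root down."""
    user, at, domain = address.rpartition("@")
    if not at or not user or not domain:
        raise ValueError("expected an address of the form user@domain")
    labels = domain.split(".")
    steps = []
    # Walk from the rightmost (most general) label toward the host.
    for i in range(len(labels) - 1, -1, -1):
        steps.append(".".join(labels[i:]))
    return user, steps
```

For the address eric@HostA.BigSite.ARPA this yields the domains ARPA, BigSite.ARPA, and HostA.BigSite.ARPA in that order, each of which would be interpreted by the naming authority for the one before it.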
546 .sh 2 "Mail to Files and Programs"
548 Files and programs are legitimate message recipients.
549 Files provide archival storage of messages,
550 useful for project administration and history.
551 Programs are useful as recipients in a variety of situations,
553 to maintain a public repository of systems messages
554 (such as the Berkeley
558 Any address passing through the initial parsing algorithm
560 (i.e., not appearing to be a valid address for another mailer)
561 is scanned for two special cases.
562 If prefixed by a vertical bar (\c
564 the rest of the address is processed as a shell command.
565 If the user name begins with a slash mark (\c
567 the name is used as a file name,
568 instead of a login name.
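The two special cases can be sketched as a simple classification. This is a Python illustration only; the function name and the returned tags are hypothetical, and the sketch neither runs the command nor opens the file.

```python
def classify_local_recipient(name):
    """Classify a local recipient as a program, a file, or a user."""
    if name.startswith("|"):
        # The rest of the address is treated as a shell command.
        return ("program", name[1:].strip())
    if name.startswith("/"):
        # A leading slash marks a file name rather than a login name.
        return ("file", name)
    return ("user", name)
```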
569 .sh 2 "Aliasing, Forwarding, Inclusion"
572 reroutes mail three ways.
573 Aliasing applies system wide.
574 Forwarding allows each user to reroute incoming mail
575 destined for that account.
578 to read a file for a list of addresses,
580 in conjunction with aliasing.
583 Aliasing maps local addresses to address lists using a system-wide file.
584 This file is hashed to speed access.
585 Only addresses that parse as local
586 are allowed as aliases;
587 this guarantees a unique key
588 (since there are no nicknames for the local host).
592 if a recipient address specifies a local user
596 file in the recipient's home directory.
601 but rather to the list of addresses in that file.
603 this list will contain only one address,
604 and the feature will be used for network mail forwarding.
606 Forwarding also permits a user to specify a private incoming mailer.
610 "\^|\|/usr/local/newmail myname"
612 will use a different incoming mailer.
615 Inclusion is specified in RFC 733 [Crocker77] syntax:
619 An address of this form reads the file specified by
621 and sends to all users listed in that file.
625 to support direct use of this feature,
626 but rather to use this as a subset of aliasing.
628 an alias of the form:
630 project: :include:/usr/project/userlist
632 is a method of letting a project maintain a mailing list
633 without interaction with the system administration,
634 even if the alias file is protected.
636 It is not necessary to rebuild the index on the alias database
637 when a :include: list is changed.
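The interaction of aliasing and inclusion can be sketched as a recursive expansion. This Python fragment is a simplified illustration, not the algorithm in sendmail: the alias table is an ordinary dictionary rather than a hashed file, per-user forwarding is omitted, and the names expand, read_list, and read_list_file are assumptions made for the example.

```python
INCLUDE_PREFIX = ":include:"


def expand(address, aliases, read_list, seen=None):
    """Return the delivery list for one local address.

    aliases maps a local name to a list of addresses; read_list takes a
    path and returns the addresses listed in that file.
    """
    if seen is None:
        seen = set()
    if address in seen:
        return []  # suppress duplicates and guard against alias loops
    seen.add(address)
    if address.startswith(INCLUDE_PREFIX):
        targets = read_list(address[len(INCLUDE_PREFIX):])
    elif address in aliases:
        targets = aliases[address]
    else:
        return [address]  # not an alias: deliver as given
    result = []
    for target in targets:
        result.extend(expand(target, aliases, read_list, seen))
    return result


def read_list_file(path):
    """Read one address per line from a list file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

Because the list file is read at expansion time, changing it would take effect without touching the alias table, which is the property the text relies on.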
638 .sh 2 "Message Collection"
640 Once all recipient addresses are parsed and verified,
641 the message is collected.
642 The message comes in two parts:
643 a message header and a message body,
644 separated by a blank line.
645 The body is an uninterpreted
646 sequence of text lines.
648 The header is formatted as a series of lines
651 field-name: field-value
653 Field-value can be split across lines by starting the following
654 lines with a space or a tab.
655 Some header fields have special internal meaning,
656 and have appropriate special processing.
657 Other headers are simply passed through.
658 Some header fields may be added automatically,
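The header format described in this section, with continuation lines folded into the preceding field, can be sketched as follows. This is a minimal Python illustration; the function name parse_header is hypothetical, and no field is given special processing here.

```python
def parse_header(lines):
    """Parse header lines into (field-name, field-value) pairs.

    Parsing stops at the blank line that separates header from body.
    """
    fields = []
    for line in lines:
        if line == "":
            break  # a blank line ends the header
        if line[:1].isspace():
            # A leading space or tab continues the previous field.
            if not fields:
                raise ValueError("continuation line with no field")
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError("malformed header line: " + line)
            fields.append((name.strip(), value.strip()))
    return fields
```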
660 .sh 1 "THE UUCP PROBLEM"
662 Of particular interest
665 used in the UUCP environment
666 causes a number of serious problems.
668 giving out an address
670 without knowing the address of your potential correspondent.
671 This is typically handled
672 by specifying the address
679 it is often difficult to compute
682 without some knowledge
683 of the topology of the network.
684 Although it may be easy for a human being
686 under many circumstances,
687 a program does not have equally sophisticated heuristics
690 certain addresses will become painfully and unnecessarily long,
691 as when a message is routed through many hosts in the USENET.
696 are impossible to parse unambiguously \*-
699 decvax!ucbvax!lbl-h!user@LBL-CSAM
701 might have many possible resolutions,
702 depending on whether the message was first routed
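The ambiguity can be made concrete by parsing the same address under each convention. This Python sketch assumes only two readings (leftmost "!" first, or rightmost "@" first); the function names are hypothetical and real mailers may apply other heuristics.

```python
def bang_first(address):
    """Read the address as a UUCP route: the leftmost host is the next hop."""
    host, bang, rest = address.partition("!")
    return (host, rest) if bang else None


def at_first(address):
    """Read the address as user@host: the host after the last "@" is next."""
    local, at, host = address.rpartition("@")
    return (host, local) if at else None
```

Applied to decvax!ucbvax!lbl-h!user@LBL-CSAM, the first reading sends the message to decvax and the second sends it to LBL-CSAM, so the two conventions disagree on the very first hop.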
706 To solve this problem,
708 would have to be changed to use addresses
712 .q decvax!ucbvax!eric
713 might be expressed as
715 (with the hop through decvax implied).
716 This address would itself be a domain-based address;
718 an address might be of the form:
720 mark@d.cbosg.btl.UUCP
722 Hosts outside of Bell Telephone Laboratories
723 would then only need to know
724 how to get to a designated BTL relay,
726 would only be maintained inside Bell.
728 There are three major problems
729 associated with turning UUCP addresses
730 into something reasonable:
731 defining the namespace,
732 creating and propagating the necessary software,
733 and building and maintaining the database.
734 .sh 2 "Defining the Namespace"
736 Putting all UUCP hosts into a flat namespace
739 is not practical for a number of reasons.
741 with over 1600 sites already,
742 and (with the increasing availability of inexpensive microcomputers
744 several thousand more coming within a few years,
745 the database update problem
746 is simply intractable
747 if the namespace is flat.
749 there are almost certainly name conflicts today.
751 as the number of sites grows
752 the names become ever less mnemonic.
755 that there be some sort of naming authority
756 for the set of top level names
758 as unpleasant a possibility
760 It will simply not be possible
761 to have one host resolving all names.
762 It may however be possible
764 in a fashion similar to that of assigning names of newsgroups
767 it will be essential to encourage everyone
768 to become subdomains of an existing domain
769 whenever possible \*-
770 even though this will certainly bruise some egos.
774 were to be added to the UUCP network,
775 it would probably actually be addressed as
785 .sh 2 "Creating and Propagating the Software"
787 The software required to implement a consistent namespace
788 is relatively trivial.
789 Two modules are needed,
790 one to handle incoming mail
791 and one to handle outgoing mail.
794 must be prepared to handle either old or new style addresses.
796 can be passed through unchanged.
798 must be turned into new style addresses
802 is slightly trickier.
803 It must do a database lookup on the recipient addresses
804 (passed on the command line)
805 to determine what hosts to send the message to.
806 If those hosts do not accept new-style addresses,
807 it must transform all addresses in the header of the message
808 into old style using the database lookup.
810 Both of these modules
812 except for the issue of modifying the header.
813 It seems prudent to choose one format
814 for the message headers.
815 For a number of reasons,
816 Berkeley has elected to use the ARPANET protocols
819 this protocol is somewhat difficult to parse.
821 Propagation is somewhat more difficult.
822 There are a large number of hosts
824 that will want to run completely standard systems
825 (for very good reasons).
826 The strategy is not to convert the entire network \*-
827 only enough of it to alleviate the problem.
828 .sh 2 "Building and Maintaining the Database"
830 This is by far the most difficult problem.
831 A prototype for this database
833 but it is maintained by hand
834 and does not pretend to be complete.
836 This problem will be reduced considerably
837 if people choose to group their hosts
839 This would require a global update
840 only when a new top level domain
842 A message to a host in a subdomain
843 could simply be routed to a known domain gateway
844 for further processing.
848 might be routed to the
852 new hosts could be added
854 without notifying the rest of the world.
858 be notified as an efficiency measure.
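Routing to a domain gateway can be sketched as a longest-suffix match against a table of known domains. This Python fragment is an illustration under stated assumptions: the table contents and the gateway names in the usage note are invented for the example, and the function name find_gateway is hypothetical.

```python
def find_gateway(domain, gateways):
    """Return the gateway for the most specific known enclosing domain.

    gateways maps lower-case domain names to gateway host names.
    Returns None when no enclosing domain is known.
    """
    labels = domain.lower().split(".")
    # Try the full name first, then drop leading labels one at a time.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in gateways:
            return gateways[suffix]
    return None
```

With a table containing only "btl.uucp" and "uucp", a host such as d.cbosg.btl.UUCP would be routed to the BTL gateway without the table ever naming that host, which is why new hosts inside a domain need no global update.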
860 There may be more than one domain gateway.
861 A domain such as BTL,
863 might have a dozen gateways to the outside world;
865 could choose the closest gateway.
867 would be that all gateways
868 maintain a consistent view of the domain
870 .sh 2 "Logical Structure"
873 domains are organized into a tree.
874 There need not be a host actually associated
875 with each level in the tree \*-
877 there will be no host associated with the name
880 an organization might group names together for administrative reasons;
884 CAD.research.BigCorp.UUCP
886 might not actually have a host representing
890 it may frequently be convenient to have a host
896 if a single host exists that
899 then hosts outside Berkeley
900 can forward mail to that host
901 for further resolution
902 without knowing Berkeley's
905 This is not unlike the operation
906 of the telephone network.
908 This may also be useful
909 inside certain large domains.
911 at Berkeley it may be presumed
912 that most hosts know about other hosts
913 inside the Berkeley domain.
914 But if they process an address
918 for further examination.
919 Thus as new hosts are added
923 be updated immediately;
924 other hosts can be updated as convenient.
926 Ideally this name resolution process
927 would be performed by a name server
929 to avoid unnecessary copying
934 this could result in unnecessary delays.
935 .sh 1 "COMPARISON WITH DELIVERMAIL"
940 The primary differences are:
942 Configuration information is not compiled in.
943 This change simplifies many of the problems
944 of moving to other machines.
945 It also allows easy debugging of new mailers.
947 Address parsing is more flexible.
950 only supported one gateway to any network,
953 can be sensitive to host names
954 and reroute to different gateways.
958 features eliminate the requirement that the system alias file
959 be writable by any user
960 (or that an update program be written,
961 or that the system administration make all changes).
964 supports message batching across networks
965 when a message is being sent to multiple recipients.
967 A mail queue is provided in
969 Mail that cannot be delivered immediately
970 but can potentially be delivered later
971 is stored in this queue for a later retry.
972 The queue also provides a buffer against system crashes;
973 after the message has been collected
974 it may be reliably redelivered
975 even if the system crashes during the initial delivery.
978 uses the networking support provided by 4.2BSD
979 to provide a direct interface to networks such as the ARPANET
981 using SMTP (the Simple Mail Transfer Protocol)
982 over a TCP/IP connection.
992 Henderson, D. A. Jr.,
994 Standard for the Format of ARPA Network Text Messages.
1002 Standard for the Format of ARPA Internet Text Messages.
1004 Network Information Center,
1006 Menlo Park, California.
1014 ARPANET Protocol Handbook.
1016 Network Information Center,
1018 Menlo Park, California.
1025 A Dial-Up Network of UNIX Systems.
1028 UNIX Programmer's Manual, Seventh Edition,
1034 An Introduction to the Berkeley Network.
1035 University of California, Berkeley, California.
1040 Mail Reference Manual.
1041 University of California, Berkeley.
1042 In UNIX Programmer's Manual,
1052 The Design of the CSNET Name Server.
1054 University of Wisconsin,
1062 The Domain Naming Convention for Internet User Applications.
1064 Network Information Center,
1066 Menlo Park, California.
1071 A Distributed System for Internet Name Service.
1073 Network Information Center,
1075 Menlo Park, California.