7 Network Working Group Internet Engineering Task Force
8 Request for Comments: 1122 R. Braden, Editor
12 Requirements for Internet Hosts -- Communication Layers
17 This RFC is an official specification for the Internet community. It
18 incorporates by reference, amends, corrects, and supplements the
19 primary protocol standards documents relating to hosts. Distribution
20 of this document is unlimited.
24 This is one RFC of a pair that defines and discusses the requirements
25 for Internet host software. This RFC covers the communications
26 protocol layers: link layer, IP layer, and transport layer; its
27 companion RFC-1123 covers the application and support protocols.
36 1. INTRODUCTION ............................................... 5
37 1.1 The Internet Architecture .............................. 6
38 1.1.1 Internet Hosts .................................... 6
39 1.1.2 Architectural Assumptions ......................... 7
40 1.1.3 Internet Protocol Suite ........................... 8
41 1.1.4 Embedded Gateway Code ............................. 10
42 1.2 General Considerations ................................. 12
43 1.2.1 Continuing Internet Evolution ..................... 12
44 1.2.2 Robustness Principle .............................. 12
45 1.2.3 Error Logging ..................................... 13
46 1.2.4 Configuration ..................................... 14
47 1.3 Reading this Document .................................. 15
48 1.3.1 Organization ...................................... 15
49 1.3.2 Requirements ...................................... 16
50 1.3.3 Terminology ....................................... 17
51 1.4 Acknowledgments ........................................ 20
53 2. LINK LAYER .................................................. 21
54 2.1 INTRODUCTION ........................................... 21
58 Internet Engineering Task Force [Page 1]
63 RFC1122 INTRODUCTION October 1989
66 2.2 PROTOCOL WALK-THROUGH .................................. 21
67 2.3 SPECIFIC ISSUES ........................................ 21
68 2.3.1 Trailer Protocol Negotiation ...................... 21
69 2.3.2 Address Resolution Protocol -- ARP ................ 22
70 2.3.2.1 ARP Cache Validation ......................... 22
71 2.3.2.2 ARP Packet Queue ............................. 24
72 2.3.3 Ethernet and IEEE 802 Encapsulation ............... 24
73 2.4 LINK/INTERNET LAYER INTERFACE .......................... 25
74 2.5 LINK LAYER REQUIREMENTS SUMMARY ........................ 26
76 3. INTERNET LAYER PROTOCOLS .................................... 27
77 3.1 INTRODUCTION ............................................ 27
78 3.2 PROTOCOL WALK-THROUGH .................................. 29
79 3.2.1 Internet Protocol -- IP ............................ 29
80 3.2.1.1 Version Number ............................... 29
81 3.2.1.2 Checksum ..................................... 29
82 3.2.1.3 Addressing ................................... 29
83 3.2.1.4 Fragmentation and Reassembly ................. 32
84 3.2.1.5 Identification ............................... 32
85 3.2.1.6 Type-of-Service .............................. 33
86 3.2.1.7 Time-to-Live ................................. 34
87 3.2.1.8 Options ...................................... 35
88 3.2.2 Internet Control Message Protocol -- ICMP .......... 38
89 3.2.2.1 Destination Unreachable ...................... 39
90 3.2.2.2 Redirect ..................................... 40
91 3.2.2.3 Source Quench ................................ 41
92 3.2.2.4 Time Exceeded ................................ 41
93 3.2.2.5 Parameter Problem ............................ 42
94 3.2.2.6 Echo Request/Reply ........................... 42
95 3.2.2.7 Information Request/Reply .................... 43
96 3.2.2.8 Timestamp and Timestamp Reply ................ 43
97 3.2.2.9 Address Mask Request/Reply ................... 45
98 3.2.3 Internet Group Management Protocol IGMP ........... 47
99 3.3 SPECIFIC ISSUES ........................................ 47
100 3.3.1 Routing Outbound Datagrams ........................ 47
101 3.3.1.1 Local/Remote Decision ........................ 47
102 3.3.1.2 Gateway Selection ............................ 48
103 3.3.1.3 Route Cache .................................. 49
104 3.3.1.4 Dead Gateway Detection ....................... 51
105 3.3.1.5 New Gateway Selection ........................ 55
106 3.3.1.6 Initialization ............................... 56
107 3.3.2 Reassembly ........................................ 56
108 3.3.3 Fragmentation ..................................... 58
109 3.3.4 Local Multihoming ................................. 60
110 3.3.4.1 Introduction ................................. 60
111 3.3.4.2 Multihoming Requirements ..................... 61
112 3.3.4.3 Choosing a Source Address .................... 64
113 3.3.5 Source Route Forwarding ........................... 65
125 3.3.6 Broadcasts ........................................ 66
126 3.3.7 IP Multicasting ................................... 67
127 3.3.8 Error Reporting ................................... 69
128 3.4 INTERNET/TRANSPORT LAYER INTERFACE ..................... 69
129 3.5 INTERNET LAYER REQUIREMENTS SUMMARY .................... 72
131 4. TRANSPORT PROTOCOLS ......................................... 77
132 4.1 USER DATAGRAM PROTOCOL -- UDP .......................... 77
133 4.1.1 INTRODUCTION ...................................... 77
134 4.1.2 PROTOCOL WALK-THROUGH ............................. 77
135 4.1.3 SPECIFIC ISSUES ................................... 77
136 4.1.3.1 Ports ........................................ 77
137 4.1.3.2 IP Options ................................... 77
138 4.1.3.3 ICMP Messages ................................ 78
139 4.1.3.4 UDP Checksums ................................ 78
140 4.1.3.5 UDP Multihoming .............................. 79
141 4.1.3.6 Invalid Addresses ............................ 79
142 4.1.4 UDP/APPLICATION LAYER INTERFACE ................... 79
143 4.1.5 UDP REQUIREMENTS SUMMARY .......................... 80
144 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP ................... 82
145 4.2.1 INTRODUCTION ...................................... 82
146 4.2.2 PROTOCOL WALK-THROUGH ............................. 82
147 4.2.2.1 Well-Known Ports ............................. 82
148 4.2.2.2 Use of Push .................................. 82
149 4.2.2.3 Window Size .................................. 83
150 4.2.2.4 Urgent Pointer ............................... 84
151 4.2.2.5 TCP Options .................................. 85
152 4.2.2.6 Maximum Segment Size Option .................. 85
153 4.2.2.7 TCP Checksum ................................. 86
154 4.2.2.8 TCP Connection State Diagram ................. 86
155 4.2.2.9 Initial Sequence Number Selection ............ 87
156 4.2.2.10 Simultaneous Open Attempts .................. 87
157 4.2.2.11 Recovery from Old Duplicate SYN ............. 87
158 4.2.2.12 RST Segment ................................. 87
159 4.2.2.13 Closing a Connection ........................ 87
160 4.2.2.14 Data Communication .......................... 89
161 4.2.2.15 Retransmission Timeout ...................... 90
162 4.2.2.16 Managing the Window ......................... 91
163 4.2.2.17 Probing Zero Windows ........................ 92
164 4.2.2.18 Passive OPEN Calls .......................... 92
165 4.2.2.19 Time to Live ................................ 93
166 4.2.2.20 Event Processing ............................ 93
167 4.2.2.21 Acknowledging Queued Segments ............... 94
168 4.2.3 SPECIFIC ISSUES ................................... 95
169 4.2.3.1 Retransmission Timeout Calculation ........... 95
170 4.2.3.2 When to Send an ACK Segment .................. 96
171 4.2.3.3 When to Send a Window Update ................. 97
172 4.2.3.4 When to Send Data ............................ 98
184 4.2.3.5 TCP Connection Failures ...................... 100
185 4.2.3.6 TCP Keep-Alives .............................. 101
186 4.2.3.7 TCP Multihoming .............................. 103
187 4.2.3.8 IP Options ................................... 103
188 4.2.3.9 ICMP Messages ................................ 103
189 4.2.3.10 Remote Address Validation ................... 104
190 4.2.3.11 TCP Traffic Patterns ........................ 104
191 4.2.3.12 Efficiency .................................. 105
192 4.2.4 TCP/APPLICATION LAYER INTERFACE ................... 106
193 4.2.4.1 Asynchronous Reports ......................... 106
194 4.2.4.2 Type-of-Service .............................. 107
195 4.2.4.3 Flush Call ................................... 107
196 4.2.4.4 Multihoming .................................. 108
197 4.2.5 TCP REQUIREMENT SUMMARY ........................... 108
199 5. REFERENCES ................................................. 112
1. INTRODUCTION
245 This document is one of a pair that defines and discusses the
246 requirements for host system implementations of the Internet protocol
247 suite. This RFC covers the communication protocol layers: link
248 layer, IP layer, and transport layer. Its companion RFC,
249 "Requirements for Internet Hosts -- Application and Support"
250 [INTRO:1], covers the application layer protocols. This document
251 should also be read in conjunction with "Requirements for Internet
Gateways" [INTRO:2].
254 These documents are intended to provide guidance for vendors,
255 implementors, and users of Internet communication software. They
256 represent the consensus of a large body of technical experience and
257 wisdom, contributed by the members of the Internet research and
vendor communities.
260 This RFC enumerates standard protocols that a host connected to the
261 Internet must use, and it incorporates by reference the RFCs and
262 other documents describing the current specifications for these
263 protocols. It corrects errors in the referenced documents and adds
264 additional discussion and guidance for an implementor.
266 For each protocol, this document also contains an explicit set of
267 requirements, recommendations, and options. The reader must
268 understand that the list of requirements in this document is
269 incomplete by itself; the complete set of requirements for an
270 Internet host is primarily defined in the standard protocol
271 specification documents, with the corrections, amendments, and
272 supplements contained in this RFC.
274 A good-faith implementation of the protocols that was produced after
275 careful reading of the RFCs and with some interaction with the
276 Internet technical community, and that followed good communications
277 software engineering practices, should differ from the requirements
278 of this document in only minor ways. Thus, in many cases, the
279 "requirements" in this RFC are already stated or implied in the
280 standard protocol documents, so that their inclusion here is, in a
281 sense, redundant. However, they were included because some past
282 implementation has made the wrong choice, causing problems of
283 interoperability, performance, and/or robustness.
285 This document includes discussion and explanation of many of the
286 requirements and recommendations. A simple list of requirements
287 would be dangerous, because:
289 o Some required features are more important than others, and some
290 features are optional.
302 o There may be valid reasons why particular vendor products that
303 are designed for restricted contexts might choose to use
304 different specifications.
306 However, the specifications of this document must be followed to meet
307 the general goal of arbitrary host interoperation across the
308 diversity and complexity of the Internet system. Although most
309 current implementations fail to meet these requirements in various
310 ways, some minor and some major, this specification is the ideal
311 towards which we need to move.
313 These requirements are based on the current level of Internet
314 architecture. This document will be updated as required to provide
315 additional clarifications or to include additional information in
316 those areas in which specifications are still evolving.
318 This introductory section begins with a brief overview of the
319 Internet architecture as it relates to hosts, and then gives some
320 general advice to host software vendors. Finally, there is some
321 guidance on reading the rest of the document and some terminology.
323 1.1 The Internet Architecture
325 General background and discussion on the Internet architecture and
326 supporting protocol suite can be found in the DDN Protocol
327 Handbook [INTRO:3]; for background see for example [INTRO:9],
328 [INTRO:10], and [INTRO:11]. Reference [INTRO:5] describes the
329 procedure for obtaining Internet protocol documents, while
330 [INTRO:6] contains a list of the numbers assigned within Internet
protocols.
1.1.1 Internet Hosts
335 A host computer, or simply "host," is the ultimate consumer of
336 communication services. A host generally executes application
337 programs on behalf of user(s), employing network and/or
338 Internet communication services in support of this function.
339 An Internet host corresponds to the concept of an "End-System"
340 used in the OSI protocol suite [INTRO:13].
342 An Internet communication system consists of interconnected
343 packet networks supporting communication among host computers
344 using the Internet protocols. The networks are interconnected
345 using packet-switching computers called "gateways" or "IP
346 routers" by the Internet community, and "Intermediate Systems"
347 by the OSI world [INTRO:13]. The RFC "Requirements for
348 Internet Gateways" [INTRO:2] contains the official
349 specifications for Internet gateways. That RFC together with
361 the present document and its companion [INTRO:1] define the
362 rules for the current realization of the Internet architecture.
364 Internet hosts span a wide range of size, speed, and function.
365 They range in size from small microprocessors through
366 workstations to mainframes and supercomputers. In function,
367 they range from single-purpose hosts (such as terminal servers)
368 to full-service hosts that support a variety of online network
369 services, typically including remote login, file transfer, and
electronic mail.
372 A host is generally said to be multihomed if it has more than
373 one interface to the same or to different networks. See
374 Section 1.3.3 on "Terminology".
376 1.1.2 Architectural Assumptions
378 The current Internet architecture is based on a set of
379 assumptions about the communication system. The assumptions
380 most relevant to hosts are as follows:
382 (a) The Internet is a network of networks.
384 Each host is directly connected to some particular
385 network(s); its connection to the Internet is only
386 conceptual. Two hosts on the same network communicate
387 with each other using the same set of protocols that they
388 would use to communicate with hosts on distant networks.
390 (b) Gateways don't keep connection state information.
392 To improve robustness of the communication system,
393 gateways are designed to be stateless, forwarding each IP
394 datagram independently of other datagrams. As a result,
395 redundant paths can be exploited to provide robust service
396 in spite of failures of intervening gateways and networks.
398 All state information required for end-to-end flow control
399 and reliability is implemented in the hosts, in the
400 transport layer or in application programs. All
401 connection control information is thus co-located with the
402 end points of the communication, so it will be lost only
403 if an end point fails.
405 (c) Routing complexity should be in the gateways.
407 Routing is a complex and difficult problem, and ought to
408 be performed by the gateways, not the hosts. An important
420 objective is to insulate host software from changes caused
421 by the inevitable evolution of the Internet routing
architecture.
424 (d) The System must tolerate wide network variation.
426 A basic objective of the Internet design is to tolerate a
427 wide range of network characteristics -- e.g., bandwidth,
428 delay, packet loss, packet reordering, and maximum packet
429 size. Another objective is robustness against failure of
430 individual networks, gateways, and hosts, using whatever
431 bandwidth is still available. Finally, the goal is full
432 "open system interconnection": an Internet host must be
433 able to interoperate robustly and effectively with any
434 other Internet host, across diverse Internet paths.
436 Sometimes host implementors have designed for less
437 ambitious goals. For example, the LAN environment is
438 typically much more benign than the Internet as a whole;
439 LANs have low packet loss and delay and do not reorder
440 packets. Some vendors have fielded host implementations
441 that are adequate for a simple LAN environment, but work
442 badly for general interoperation. The vendor justifies
443 such a product as being economical within the restricted
444 LAN market. However, isolated LANs seldom stay isolated
445 for long; they are soon gatewayed to each other, to
446 organization-wide internets, and eventually to the global
447 Internet system. In the end, neither the customer nor the
448 vendor is served by incomplete or substandard Internet
host software.
451 The requirements spelled out in this document are designed
452 for a full-function Internet host, capable of full
453 interoperation over an arbitrary Internet path.
456 1.1.3 Internet Protocol Suite
458 To communicate using the Internet system, a host must implement
459 the layered set of protocols comprising the Internet protocol
460 suite. A host typically must implement at least one protocol
from each layer.
463 The protocol layers used in the Internet architecture are as
follows:
o Application Layer
479 The application layer is the top layer of the Internet
480 protocol suite. The Internet suite does not further
481 subdivide the application layer, although some of the
482 Internet application layer protocols do contain some
483 internal sub-layering. The application layer of the
484 Internet suite essentially combines the functions of the
485 top two layers -- Presentation and Application -- of the
OSI reference model.
488 We distinguish two categories of application layer
489 protocols: user protocols that provide service directly
490 to users, and support protocols that provide common system
491 functions. Requirements for user and support protocols
492 will be found in the companion RFC [INTRO:1].
494 The most common Internet user protocols are:
496 o Telnet (remote login)
497 o FTP (file transfer)
498 o SMTP (electronic mail delivery)
500 There are a number of other standardized user protocols
501 [INTRO:4] and many private user protocols.
503 Support protocols, used for host name mapping, booting,
504 and management, include SNMP, BOOTP, RARP, and the Domain
505 Name System (DNS) protocols.
o Transport Layer
510 The transport layer provides end-to-end communication
511 services for applications. There are two primary
512 transport layer protocols at present:
514 o Transmission Control Protocol (TCP)
515 o User Datagram Protocol (UDP)
517 TCP is a reliable connection-oriented transport service
518 that provides end-to-end reliability, resequencing, and
519 flow control. UDP is a connectionless ("datagram")
transport service.
522 Other transport protocols have been developed by the
523 research community, and the set of official Internet
524 transport protocols may be expanded in the future.
526 Transport layer protocols are discussed in Chapter 4.
o Internet Layer
540 All Internet transport protocols use the Internet Protocol
541 (IP) to carry data from source host to destination host.
542 IP is a connectionless or datagram internetwork service,
543 providing no end-to-end delivery guarantees. Thus, IP
544 datagrams may arrive at the destination host damaged,
545 duplicated, out of order, or not at all. The layers above
546 IP are responsible for reliable delivery service when it
547 is required. The IP protocol includes provision for
548 addressing, type-of-service specification, fragmentation
549 and reassembly, and security information.
551 The datagram or connectionless nature of the IP protocol
552 is a fundamental and characteristic feature of the
553 Internet architecture. Internet IP was the model for the
554 OSI Connectionless Network Protocol [INTRO:12].
556 ICMP is a control protocol that is considered to be an
557 integral part of IP, although it is architecturally
558 layered upon IP, i.e., it uses IP to carry its data end-
559 to-end just as a transport protocol like TCP or UDP does.
560 ICMP provides error reporting, congestion reporting, and
561 first-hop gateway redirection.
563 IGMP is an Internet layer protocol used for establishing
564 dynamic host groups for IP multicasting.
566 The Internet layer protocols IP, ICMP, and IGMP are
567 discussed in Chapter 3.
o Link Layer
572 To communicate on its directly-connected network, a host
573 must implement the communication protocol used to
574 interface to that network. We call this a link layer or
575 media-access layer protocol.
577 There is a wide variety of link layer protocols,
578 corresponding to the many different types of networks.
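The division of labor described above, in which IP may deliver datagrams duplicated or out of order and the layers above IP restore order when it is required, can be illustrated with a small sketch. This is not TCP's actual mechanism: the class name Resequencer, the per-datagram integer sequence numbers, and the absence of wraparound, acknowledgment, and retransmission are all simplifying assumptions made for illustration.

```python
class Resequencer:
    """Deliver payloads in sequence order and drop duplicates (illustrative only)."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number owed to the application
        self.pending = {}           # out-of-order payloads, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one arriving datagram; return payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []               # duplicate of something already seen
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A datagram that never arrives simply stalls delivery in this sketch; a real reliable transport would also detect the loss and arrange retransmission.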
582 1.1.4 Embedded Gateway Code
584 Some Internet host software includes embedded gateway
585 functionality, so that these hosts can forward packets as a
597 gateway would, while still performing the application layer
functions of a host.
600 Such dual-purpose systems must follow the Gateway Requirements
601 RFC [INTRO:2] with respect to their gateway functions, and
602 must follow the present document with respect to their host
603 functions. In all overlapping cases, the two specifications
604 should be in agreement.
606 There are varying opinions in the Internet community about
607 embedded gateway functionality. The main arguments are as
follows:
610 o Pro: in a local network environment where networking is
611 informal, or in isolated internets, it may be convenient
612 and economical to use existing host systems as gateways.
614 There is also an architectural argument for embedded
615 gateway functionality: multihoming is much more common
616 than originally foreseen, and multihoming forces a host to
617 make routing decisions as if it were a gateway. If the
618 multihomed host contains an embedded gateway, it will
619 have full routing knowledge and as a result will be able
620 to make more optimal routing decisions.
622 o Con: Gateway algorithms and protocols are still changing,
623 and they will continue to change as the Internet system
624 grows larger. Attempting to include a general gateway
625 function within the host IP layer will force host system
626 maintainers to track these (more frequent) changes. Also,
627 a larger pool of gateway implementations will make
628 coordinating the changes more difficult. Finally, the
629 complexity of a gateway IP layer is somewhat greater than
630 that of a host, making the implementation and operation
tasks more complex.
633 In addition, the style of operation of some hosts is not
634 appropriate for providing stable and robust gateway
service.
637 There is considerable merit in both of these viewpoints. One
638 conclusion can be drawn: a host administrator must have
639 conscious control over whether or not a given host acts as a
640 gateway. See Section 3.1 for the detailed requirements.
656 1.2 General Considerations
658 There are two important lessons that vendors of Internet host
659 software have learned and which a new vendor should consider
seriously.
662 1.2.1 Continuing Internet Evolution
664 The enormous growth of the Internet has revealed problems of
665 management and scaling in a large datagram-based packet
666 communication system. These problems are being addressed, and
667 as a result there will be continuing evolution of the
668 specifications described in this document. These changes will
669 be carefully planned and controlled, since there is extensive
670 participation in this planning by the vendors and by the
671 organizations responsible for operations of the networks.
673 Development, evolution, and revision are characteristic of
674 computer network protocols today, and this situation will
675 persist for some years. A vendor who develops computer
676 communication software for the Internet protocol suite (or any
677 other protocol suite!) and then fails to maintain and update
678 that software for changing specifications is going to leave a
679 trail of unhappy customers. The Internet is a large
680 communication network, and the users are in constant contact
681 through it. Experience has shown that knowledge of
682 deficiencies in vendor software propagates quickly through the
683 Internet technical community.
685 1.2.2 Robustness Principle
687 At every layer of the protocols, there is a general rule whose
688 application can lead to enormous benefits in robustness and
689 interoperability [IP:1]:
691 "Be liberal in what you accept, and
692 conservative in what you send"
694 Software should be written to deal with every conceivable
695 error, no matter how unlikely; sooner or later a packet will
696 come in with that particular combination of errors and
697 attributes, and unless the software is prepared, chaos can
698 ensue. In general, it is best to assume that the network is
699 filled with malevolent entities that will send in packets
700 designed to have the worst possible effect. This assumption
701 will lead to suitable protective design, although the most
702 serious problems in the Internet have been caused by
703 unenvisaged mechanisms triggered by low-probability events;
715 mere human malice would never have taken so devious a course!
717 Adaptability to change must be designed into all levels of
718 Internet host software. As a simple example, consider a
719 protocol specification that contains an enumeration of values
720 for a particular header field -- e.g., a type field, a port
721 number, or an error code; this enumeration must be assumed to
722 be incomplete. Thus, if a protocol specification defines four
723 possible error codes, the software must not break when a fifth
724 code shows up. An undefined code might be logged (see below),
725 but it must not cause a failure.
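The rule about incomplete enumerations can be sketched as follows. The code table, its four names, and the function describe_error_code are hypothetical and are not taken from any protocol specification; the point is only that an undefined value is counted and reported rather than treated as a fatal condition.

```python
from collections import Counter

# Hypothetical enumeration of four error codes, for illustration only.
KNOWN_ERROR_CODES = {
    0: "checksum-failure",
    1: "bad-length",
    2: "bad-option",
    3: "timeout",
}

# Counts of undefined codes seen so far, available for later diagnosis.
unknown_code_counts = Counter()


def describe_error_code(code):
    """Return a description for a code; never raise on an undefined value."""
    name = KNOWN_ERROR_CODES.get(code)
    if name is None:
        unknown_code_counts[code] += 1      # note it, but keep running
        return "unknown code %d" % code
    return name
```

The lookup uses a default rather than direct indexing, so a fifth code produces a countable event instead of an exception.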
727 The second part of the principle is almost as important:
728 software on other hosts may contain deficiencies that make it
729 unwise to exploit legal but obscure protocol features. It is
730 unwise to stray far from the obvious and simple, lest untoward
731 effects result elsewhere. A corollary of this is "watch out
732 for misbehaving hosts"; host software should be prepared, not
733 just to survive other misbehaving hosts, but also to cooperate
734 to limit the amount of disruption such hosts can cause to the
735 shared communication facility.
1.2.3 Error Logging
739 The Internet includes a great variety of host and gateway
740 systems, each implementing many protocols and protocol layers,
741 and some of these contain bugs and mis-features in their
742 Internet protocol software. As a result of complexity,
743 diversity, and distribution of function, the diagnosis of
744 Internet problems is often very difficult.
746 Problem diagnosis will be aided if host implementations include
747 a carefully designed facility for logging erroneous or
748 "strange" protocol events. It is important to include as much
749 diagnostic information as possible when an error is logged. In
750 particular, it is often useful to record the header(s) of a
751 packet that caused an error. However, care must be taken to
752 ensure that error logging does not consume prohibitive amounts
753 of resources or otherwise interfere with the operation of the
host.
756 There is a tendency for abnormal but harmless protocol events
757 to overflow error logging files; this can be avoided by using a
758 "circular" log, or by enabling logging only while diagnosing a
759 known failure. It may be useful to filter and count duplicate
760 successive messages. One strategy that seems to work well is:
761 (1) always count abnormalities and make such counts accessible
762 through the management protocol (see [INTRO:1]); and (2) allow
774 the logging of a great variety of events to be selectively
775 enabled. For example, it might be useful to be able to "log
776 everything" or to "log everything for host X".
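The logging strategy described above (always count, keep a bounded "circular" log, fold duplicate successive messages, and enable detailed logging selectively) might be sketched as follows. The class EventLog, its parameters, and the event names used with it are hypothetical; this is one possible arrangement under those assumptions, not a required design.

```python
from collections import Counter, deque


class EventLog:
    """Bounded log of abnormal protocol events with always-on counters."""

    def __init__(self, capacity=1000, enabled_hosts=None):
        self.counts = Counter()                # (1) counts are always kept
        self.entries = deque(maxlen=capacity)  # circular log: oldest entries drop off
        self.enabled_hosts = enabled_hosts     # None means "log everything"

    def record(self, event, host, header=b""):
        self.counts[event] += 1
        # (2) detailed logging is selective, e.g. "log everything for host X"
        if self.enabled_hosts is not None and host not in self.enabled_hosts:
            return
        last = self.entries[-1] if self.entries else None
        if last is not None and last["event"] == event and last["host"] == host:
            last["repeats"] += 1               # fold duplicate successive messages
            return
        self.entries.append(
            {"event": event, "host": host, "header": header, "repeats": 1}
        )
```

Because the log is bounded and duplicates are folded into a repeat count, a flood of harmless events cannot consume unbounded storage, while the counters still reflect every occurrence.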
778 Note that different managements may have differing policies
779 about the amount of error logging that they want normally
780 enabled in a host. Some will say, "if it doesn't hurt me, I
781 don't want to know about it", while others will want to take a
782 more watchful and aggressive attitude about detecting and
783 removing protocol abnormalities.
1.2.4 Configuration
787 It would be ideal if a host implementation of the Internet
788 protocol suite could be entirely self-configuring. This would
789 allow the whole suite to be implemented in ROM or cast into
790 silicon, it would simplify diskless workstations, and it would
791 be an immense boon to harried LAN administrators as well as
792 system vendors. We have not reached this ideal; in fact, we
are not even close.
795 At many points in this document, you will find a requirement
796 that a parameter be a configurable option. There are several
797 different reasons behind such requirements. In a few cases,
798 there is current uncertainty or disagreement about the best
799 value, and it may be necessary to update the recommended value
800 in the future. In other cases, the value really depends on
801 external factors -- e.g., the size of the host and the
802 distribution of its communication load, or the speeds and
803 topology of nearby networks -- and self-tuning algorithms are
804 unavailable and may be insufficient. In some cases,
805 configurability is needed because of administrative
requirements.
808 Finally, some configuration options are required to communicate
809 with obsolete or incorrect implementations of the protocols,
810 distributed without sources, that unfortunately persist in many
811 parts of the Internet. To make correct systems coexist with
812 these faulty systems, administrators often have to "mis-
813 configure" the correct systems. This problem will correct
814 itself gradually as the faulty systems are retired, but it
815 cannot be ignored by vendors.
817 When we say that a parameter must be configurable, we do not
818 intend to require that its value be explicitly read from a
819 configuration file at every boot time. We recommend that
820 implementors set up a default for each parameter, so a
821 configuration file is only necessary to override those defaults
833 that are inappropriate in a particular installation. Thus, the
834 configurability requirement is an assurance that it will be
835 POSSIBLE to override the default when necessary, even in a
836 binary-only or ROM-based product.
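The relationship between built-in defaults and optional overrides can be sketched as follows. The parameter names and default values shown are hypothetical and are not values required by this document; the sketch only shows that every parameter has a default and that an override, when supplied, takes precedence.

```python
# Hypothetical parameter names and defaults, for illustration only.
DEFAULTS = {
    "ip_forwarding": False,
    "default_ttl": 64,
    "arp_cache_timeout_s": 60,
}


def load_config(overrides=None):
    """Return the effective configuration: defaults plus any overrides."""
    config = dict(DEFAULTS)                 # start from the built-in defaults
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            # Reject unknown names so a misspelled parameter is noticed.
            raise KeyError("unknown parameter: %s" % name)
        config[name] = value
    return config
```

With no overrides, load_config() returns the defaults unchanged, so an installation that is content with them needs no configuration file at all.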
838 This document requires a particular value for such defaults in
839 some cases. The choice of default is a sensitive issue when
840 the configuration item controls the accommodation to existing
841 faulty systems. If the Internet is to converge successfully to
842 complete interoperability, the default values built into
843 implementations must implement the official protocol, not
844 "mis-configurations" to accommodate faulty implementations.
845 Although marketing considerations have led some vendors to
846 choose mis-configuration defaults, we urge vendors to choose
847 defaults that will conform to the standard.
849 Finally, we note that a vendor needs to provide adequate
850 documentation on all configuration parameters, their limits and
851 effects.
854 1.3 Reading this Document
856 1.3.1 Organization
858 Protocol layering, which is generally used as an organizing
859 principle in implementing network software, has also been used
860 to organize this document. In describing the rules, we assume
861 that an implementation does strictly mirror the layering of the
862 protocols. Thus, the following three major sections specify
863 the requirements for the link layer, the internet layer, and
864 the transport layer, respectively. A companion RFC [INTRO:1]
865 covers application level software. This layerist organization
866 was chosen for simplicity and clarity.
868 However, strict layering is an imperfect model, both for the
869 protocol suite and for recommended implementation approaches.
870 Protocols in different layers interact in complex and sometimes
871 subtle ways, and particular functions often involve multiple
872 layers. There are many design choices in an implementation,
873 many of which involve creative "breaking" of strict layering.
874 Every implementor is urged to read references [INTRO:7] and
875 [INTRO:8].
877 This document describes the conceptual service interface
878 between layers using a functional ("procedure call") notation,
879 like that used in the TCP specification [TCP:1]. A host
880 implementation must support the logical information flow
892 implied by these calls, but need not literally implement the
893 calls themselves. For example, many implementations reflect
894 the coupling between the transport layer and the IP layer by
895 giving them shared access to common data structures. These
896 data structures, rather than explicit procedure calls, are then
897 the agency for passing much of the information that is
898 required.
900 In general, each major section of this document is organized
901 into the following subsections:
903 (1) Introduction
905 (2) Protocol Walk-Through -- considers the protocol
906 specification documents section-by-section, correcting
907 errors, stating requirements that may be ambiguous or
908 ill-defined, and providing further clarification or
909 explanation.
911 (3) Specific Issues -- discusses protocol design and
912 implementation issues that were not included in the walk-
913 through.
915 (4) Interfaces -- discusses the service interface to the next
916 higher layer.
918 (5) Summary -- contains a summary of the requirements of the
919 section.
922 Under many of the individual topics in this document, there is
923 parenthetical material labeled "DISCUSSION" or
924 "IMPLEMENTATION". This material is intended to give
925 clarification and explanation of the preceding requirements
926 text. It also includes some suggestions on possible future
927 directions or developments. The implementation material
928 contains suggested approaches that an implementor may want to
929 consider.
931 The summary sections are intended to be guides and indexes to
932 the text, but are necessarily cryptic and incomplete. The
933 summaries should never be used or referenced separately from
934 the complete RFC.
936 1.3.2 Requirements
938 In this document, the words that are used to define the
939 significance of each particular requirement are capitalized.
951 These words are:
953 * "MUST"
955 This word or the adjective "REQUIRED" means that the item
956 is an absolute requirement of the specification.
958 * "SHOULD"
960 This word or the adjective "RECOMMENDED" means that there
961 may exist valid reasons in particular circumstances to
962 ignore this item, but the full implications should be
963 understood and the case carefully weighed before choosing
964 a different course.
966 * "MAY"
968 This word or the adjective "OPTIONAL" means that this item
969 is truly optional. One vendor may choose to include the
970 item because a particular marketplace requires it or
971 because it enhances the product, for example; another
972 vendor may omit the same item.
975 An implementation is not compliant if it fails to satisfy one
976 or more of the MUST requirements for the protocols it
977 implements. An implementation that satisfies all the MUST and
978 all the SHOULD requirements for its protocols is said to be
979 "unconditionally compliant"; one that satisfies all the MUST
980 requirements but not all the SHOULD requirements for its
981 protocols is said to be "conditionally compliant".
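The three compliance levels defined above can be restated as a small decision procedure. The Python sketch below is an illustration only; the function name and the representation of requirements as lists of pass/fail booleans are assumptions made for this example.

```python
def compliance_level(must_results, should_results):
    """Classify an implementation from per-requirement pass/fail booleans.

    must_results:   one boolean per MUST requirement of the implemented protocols.
    should_results: one boolean per SHOULD requirement of the same protocols.
    """
    if not all(must_results):
        # Failing any MUST requirement means the implementation is not compliant.
        return "not compliant"
    if all(should_results):
        return "unconditionally compliant"
    return "conditionally compliant"
```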
983 1.3.3 Terminology
985 This document uses the following technical terms:
987 Segment
988 A segment is the unit of end-to-end transmission in the
989 TCP protocol. A segment consists of a TCP header followed
990 by application data. A segment is transmitted by
991 encapsulation inside an IP datagram.
993 Message
994 In this description of the lower-layer protocols, a
995 message is the unit of transmission in a transport layer
996 protocol. In particular, a TCP segment is a message. A
997 message consists of a transport protocol header followed
998 by application protocol data. To be transmitted end-to-
1010 end through the Internet, a message must be encapsulated
1011 inside a datagram.
1013 IP Datagram
1014 An IP datagram is the unit of end-to-end transmission in
1015 the IP protocol. An IP datagram consists of an IP header
1016 followed by transport layer data, i.e., of an IP header
1017 followed by a message.
1019 In the description of the internet layer (Section 3), the
1020 unqualified term "datagram" should be understood to refer
1021 to an IP datagram.
1023 Packet
1024 A packet is the unit of data passed across the interface
1025 between the internet layer and the link layer. It
1026 includes an IP header and data. A packet may be a
1027 complete IP datagram or a fragment of an IP datagram.
1029 Frame
1030 A frame is the unit of transmission in a link layer
1031 protocol, and consists of a link-layer header followed by
1032 a packet.
1034 Connected Network
1035 A network to which a host is interfaced is often known as
1036 the "local network" or the "subnetwork" relative to that
1037 host. However, these terms can cause confusion, and
1038 therefore we use the term "connected network" in this
1039 document.
1041 Multihomed
1042 A host is said to be multihomed if it has multiple IP
1043 addresses. For a discussion of multihoming, see Section
1044 3.3.4 below.
1046 Physical network interface
1047 This is a physical interface to a connected network and
1048 has a (possibly unique) link-layer address. Multiple
1049 physical network interfaces on a single host may share the
1050 same link-layer address, but the address must be unique
1051 for different hosts on the same physical network.
1053 Logical [network] interface
1054 We define a logical [network] interface to be a logical
1055 path, distinguished by a unique IP address, to a connected
1056 network. See Section 3.3.4.
1069 Specific-destination address
1070 This is the effective destination address of a datagram,
1071 even if it is broadcast or multicast; see Section 3.2.1.3.
1073 Path
1074 At a given moment, all the IP datagrams from a particular
1075 source host to a particular destination host will
1076 typically traverse the same sequence of gateways. We use
1077 the term "path" for this sequence. Note that a path is
1078 uni-directional; it is not unusual to have different paths
1079 in the two directions between a given host pair.
1081 MTU
1082 The maximum transmission unit, i.e., the size of the
1083 largest packet that can be transmitted.
1086 The terms frame, packet, datagram, message, and segment are
1087 illustrated by the following schematic diagrams:
1089 A. Transmission on connected network:
1090   _______________________________________________
1091  | LL hdr | IP hdr |         (data)              |
1092  |________|________|_____________________________|
1094   <---------- Frame ----------------------------->
1095            <----------Packet -------------------->
1098 B. Before IP fragmentation or after IP reassembly:
1099            ______________________________________
1100           | IP hdr | transport| Application Data |
1101           |________|____hdr___|__________________|
1103            <-------- Datagram ------------------>
1104                     <-------- Message ----------->
1105   or, for TCP:
1106            ______________________________________
1107           | IP hdr | TCP hdr  | Application Data |
1108           |________|__________|__________________|
1110            <-------- Datagram ------------------>
1111                     <-------- Segment ----------->
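The nesting shown in the diagrams can also be expressed as simple concatenation. The Python sketch below is an illustration only: the headers are opaque placeholder byte strings rather than real protocol headers, it assumes no fragmentation, and the function name is hypothetical.

```python
def encapsulate(application_data, transport_header, ip_header, link_header):
    """Build each unit named in the diagrams from the unit inside it."""
    # Transport header + application data; called a segment when the transport is TCP.
    message = transport_header + application_data
    # IP header + message.
    datagram = ip_header + message
    # Without fragmentation the packet is the complete IP datagram.
    packet = datagram
    # Link-layer header + packet.
    frame = link_header + packet
    return {"message": message, "datagram": datagram,
            "packet": packet, "frame": frame}
```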
1128 1.4 Acknowledgments
1130 This document incorporates contributions and comments from a large
1131 group of Internet protocol experts, including representatives of
1132 university and research labs, vendors, and government agencies.
1133 It was assembled primarily by the Host Requirements Working Group
1134 of the Internet Engineering Task Force (IETF).
1136 The Editor would especially like to acknowledge the tireless
1137 dedication of the following people, who attended many long
1138 meetings and generated 3 million bytes of electronic mail over the
1139 past 18 months in pursuit of this document: Philip Almquist, Dave
1140 Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve
1141 Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore),
1142 John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG),
1143 Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge
1144 (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).
1146 In addition, the following people made major contributions to the
1147 effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia
1148 (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA),
1149 Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL),
1150 John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill
1151 Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul
1152 (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue
1153 Technology), and Mike StJohns (DCA). The following also made
1154 significant contributions to particular areas: Eric Allman
1155 (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic
1156 (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn
1157 (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann
1158 (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun
1159 Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen
1160 (Toronto).
1162 We are grateful to all, including any contributors who may have
1163 been inadvertently omitted from this list.
1187 2. LINK LAYER
1189 2.1 INTRODUCTION
1191 All Internet systems, both hosts and gateways, have the same
1192 requirements for link layer protocols. These requirements are
1193 given in Chapter 3 of "Requirements for Internet Gateways"
1194 [INTRO:2], augmented with the material in this section.
1196 2.2 PROTOCOL WALK-THROUGH
1198 None.
1200 2.3 SPECIFIC ISSUES
1202 2.3.1 Trailer Protocol Negotiation
1204 The trailer protocol [LINK:1] for link-layer encapsulation MAY
1205 be used, but only when it has been verified that both systems
1206 (host or gateway) involved in the link-layer communication
1207 implement trailers. If the system does not dynamically
1208 negotiate use of the trailer protocol on a per-destination
1209 basis, the default configuration MUST disable the protocol.
1211 DISCUSSION:
1212 The trailer protocol is a link-layer encapsulation
1213 technique that rearranges the data contents of packets
1214 sent on the physical network. In some cases, trailers
1215 improve the throughput of higher layer protocols by
1216 reducing the amount of data copying within the operating
1217 system. Higher layer protocols are unaware of trailer
1218 use, but both the sending and receiving host MUST
1219 understand the protocol if it is used.
1221 Improper use of trailers can result in very confusing
1222 symptoms. Only packets with specific size attributes are
1223 encapsulated using trailers, and typically only a small
1224 fraction of the packets being exchanged have these
1225 attributes. Thus, if a system using trailers exchanges
1226 packets with a system that does not, some packets
1227 disappear into a black hole while others are delivered
1228 successfully.
1230 IMPLEMENTATION:
1231 On an Ethernet, packets encapsulated with trailers use a
1232 distinct Ethernet type [LINK:1], and trailer negotiation
1233 is performed at the time that ARP is used to discover the
1234 link-layer address of a destination system.
1246 Specifically, the ARP exchange is completed in the usual
1247 manner using the normal IP protocol type, but a host that
1248 wants to speak trailers will send an additional "trailer
1249 ARP reply" packet, i.e., an ARP reply that specifies the
1250 trailer encapsulation protocol type but otherwise has the
1251 format of a normal ARP reply. If a host configured to use
1252 trailers receives a trailer ARP reply message from a
1253 remote machine, it can add that machine to the list of
1254 machines that understand trailers, e.g., by marking the
1255 corresponding entry in the ARP cache.
1257 Hosts wishing to receive trailer encapsulations send
1258 trailer ARP replies whenever they complete exchanges of
1259 normal ARP messages for IP. Thus, a host that received an
1260 ARP request for its IP protocol address would send a
1261 trailer ARP reply in addition to the normal IP ARP reply;
1262 a host that sent the IP ARP request would send a trailer
1263 ARP reply when it received the corresponding IP ARP reply.
1264 In this way, either the requesting or responding host in
1265 an IP ARP exchange may request that it receive trailer
1266 encapsulations.
1268 This scheme, using extra trailer ARP reply packets rather
1269 than sending an ARP request for the trailer protocol type,
1270 was designed to avoid a continuous exchange of ARP packets
1271 with a misbehaving host that, contrary to any
1272 specification or common sense, responded to an ARP reply
1273 for trailers with another ARP reply for IP. This problem
1274 is avoided by sending a trailer ARP reply in response to
1275 an IP ARP reply only when the IP ARP reply answers an
1276 outstanding request; this is true when the hardware
1277 address for the host is still unknown when the IP ARP
1278 reply is received. A trailer ARP reply may always be sent
1279 along with an IP ARP reply responding to an IP ARP
1280 request.
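The reply rules described above can be summarized as a decision function. The Python sketch below is an illustration only and is not part of this specification; the message labels and parameter names are hypothetical, and it assumes that any received ARP request is for one of the host's own IP addresses.

```python
def replies_to_send(message, wants_trailers, address_was_unknown):
    """Return the ARP replies to send in response to a received ARP message.

    message:             "ip_request", "ip_reply", or "trailer_reply".
    wants_trailers:      True if this host wishes to receive trailer encapsulations.
    address_was_unknown: True if the sender's hardware address was still unknown
                         when the message arrived, i.e. an IP ARP reply answers
                         an outstanding request.
    """
    if message == "ip_request":
        # A trailer ARP reply may always accompany the normal IP ARP reply.
        replies = ["ip_reply"]
        if wants_trailers:
            replies.append("trailer_reply")
        return replies
    if message == "ip_reply":
        # Answer only replies to outstanding requests; this bounds the exchange.
        if wants_trailers and address_was_unknown:
            return ["trailer_reply"]
        return []
    if message == "trailer_reply":
        # Never answered; the receiver only records that the sender
        # understands trailers (for example in its ARP cache entry).
        return []
    raise ValueError(f"unknown ARP message kind: {message}")
```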
1282 2.3.2 Address Resolution Protocol -- ARP
1284 2.3.2.1 ARP Cache Validation
1286 An implementation of the Address Resolution Protocol (ARP)
1287 [LINK:2] MUST provide a mechanism to flush out-of-date cache
1288 entries. If this mechanism involves a timeout, it SHOULD be
1289 possible to configure the timeout value.
1291 A mechanism to prevent ARP flooding (repeatedly sending an
1292 ARP Request for the same IP address, at a high rate) MUST be
1293 included. The recommended maximum rate is 1 per second per
1305 destination.
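One possible way to prevent ARP flooding is to remember when a request was last sent for each destination IP address. The Python sketch below is an illustration only; the class and method names are hypothetical, and the caller is assumed to supply the current time in seconds.

```python
class ArpRequestLimiter:
    """Allow at most one ARP Request per destination per interval."""

    def __init__(self, min_interval=1.0):
        # 1.0 second matches the recommended maximum rate of 1 per second.
        self.min_interval = min_interval
        self._last_sent = {}  # destination IP address -> time of last request

    def may_send(self, ip_address, now):
        """Return True, and record the send, if a request is allowed now."""
        last = self._last_sent.get(ip_address)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_sent[ip_address] = now
        return True
```

The limit is kept per destination, so a suppressed request for one address does not delay resolution of another.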
1307 DISCUSSION:
1308 The ARP specification [LINK:2] suggests but does not
1309 require a timeout mechanism to invalidate cache entries
1310 when hosts change their Ethernet addresses. The
1311 prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
1312 has significantly increased the likelihood that cache
1313 entries in hosts will become invalid, and therefore
1314 some ARP-cache invalidation mechanism is now required
1315 for hosts. Even in the absence of proxy ARP, a long-
1316 period cache timeout is useful in order to
1317 automatically correct any bad ARP data that might have
1318 been cached.
1320 IMPLEMENTATION:
1321 Four mechanisms have been used, sometimes in
1322 combination, to flush out-of-date cache entries.
1324 (1) Timeout -- Periodically time out cache entries,
1325 even if they are in use. Note that this timeout
1326 should be restarted when the cache entry is
1327 "refreshed" (by observing the source fields,
1328 regardless of target address, of an ARP broadcast
1329 from the system in question). For proxy ARP
1330 situations, the timeout needs to be on the order
1331 of a minute.
1333 (2) Unicast Poll -- Actively poll the remote host by
1334 periodically sending a point-to-point ARP Request
1335 to it, and delete the entry if no ARP Reply is
1336 received from N successive polls. Again, the
1337 timeout should be on the order of a minute, and
1338 typically N is 2.
1340 (3) Link-Layer Advice -- If the link-layer driver
1341 detects a delivery problem, flush the
1342 corresponding ARP cache entry.
1344 (4) Higher-layer Advice -- Provide a call from the
1345 Internet layer to the link layer to indicate a
1346 delivery problem. The effect of this call would
1347 be to invalidate the corresponding cache entry.
1348 This call would be analogous to the
1349 "ADVISE_DELIVPROB()" call from the transport layer
1350 to the Internet layer (see Section 3.4), and in
1351 fact the ADVISE_DELIVPROB routine might in turn
1352 call the link-layer advice routine to invalidate
1364 the ARP cache entry.
1366 Approaches (1) and (2) involve ARP cache timeouts on
1367 the order of a minute or less. In the absence of proxy
1368 ARP, a timeout this short could create noticeable
1369 overhead traffic on a very large Ethernet. Therefore,
1370 it may be necessary to configure a host to lengthen the
1371 ARP cache timeout.
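Mechanism (1), together with the invalidation used by the advice mechanisms (3) and (4), might be sketched in Python as follows. This is an illustration only and not part of this specification; the class and method names are hypothetical, and the caller is assumed to supply the current time in seconds.

```python
class ArpCache:
    """ARP cache whose entries expire unless they are refreshed."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout   # configurable, on the order of a minute
        self._entries = {}       # IP address -> (link-layer address, refresh time)

    def refresh(self, ip, link_address, now):
        """Create or refresh an entry; this restarts its timeout."""
        self._entries[ip] = (link_address, now)

    def lookup(self, ip, now):
        """Return the link-layer address, or None if absent or timed out.

        Using an entry does not restart its timeout; only refresh() does.
        """
        entry = self._entries.get(ip)
        if entry is None:
            return None
        link_address, refreshed_at = entry
        if now - refreshed_at >= self.timeout:
            del self._entries[ip]
            return None
        return link_address

    def invalidate(self, ip):
        """Flush an entry after link-layer or higher-layer advice."""
        self._entries.pop(ip, None)
```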
1373 2.3.2.2 ARP Packet Queue
1375 The link layer SHOULD save (rather than discard) at least
1376 one (the latest) packet of each set of packets destined to
1377 the same unresolved IP address, and transmit the saved
1378 packet when the address has been resolved.
1380 DISCUSSION:
1381 Failure to follow this recommendation causes the first
1382 packet of every exchange to be lost. Although higher-
1383 layer protocols can generally cope with packet loss by
1384 retransmission, packet loss does impact performance.
1385 For example, loss of a TCP open request causes the
1386 initial round-trip time estimate to be inflated. UDP-
1387 based applications such as the Domain Name System are
1388 more seriously affected.
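A minimal way to follow this recommendation is to hold one packet per unresolved IP address, replacing it whenever a newer packet arrives. The Python sketch below is an illustration only; the class and method names are hypothetical.

```python
class PendingPackets:
    """Hold the latest packet for each IP address awaiting ARP resolution."""

    def __init__(self):
        self._latest = {}  # unresolved IP address -> most recent packet

    def hold(self, ip, packet):
        """Save the packet, replacing any older one for the same address."""
        self._latest[ip] = packet

    def resolved(self, ip):
        """Return the saved packet to transmit now, or None if none was held."""
        return self._latest.pop(ip, None)
```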
1390 2.3.3 Ethernet and IEEE 802 Encapsulation
1392 The IP encapsulation for Ethernets is described in RFC-894
1393 [LINK:3], while RFC-1042 [LINK:4] describes the IP
1394 encapsulation for IEEE 802 networks. RFC-1042 elaborates and
1395 replaces the discussion in Section 3.4 of [INTRO:2].
1397 Every Internet host connected to a 10Mbps Ethernet cable:
1399 o MUST be able to send and receive packets using RFC-894
1400 encapsulation;
1402 o SHOULD be able to receive RFC-1042 packets, intermixed
1403 with RFC-894 packets; and
1405 o MAY be able to send packets using RFC-1042 encapsulation.
1408 An Internet host that implements sending both the RFC-894 and
1409 the RFC-1042 encapsulations MUST provide a configuration switch
1410 to select which is sent, and this switch MUST default to RFC-
1411 894.
1423 Note that the standard IP encapsulation in RFC-1042 does not
1424 use the protocol id value (K1=6) that IEEE reserved for IP;
1425 instead, it uses a value (K1=170) that implies an extension
1426 (the "SNAP") which can be used to hold the Ether-Type field.
1427 An Internet system MUST NOT send 802 packets using K1=6.
1429 Address translation from Internet addresses to link-layer
1430 addresses on Ethernet and IEEE 802 networks MUST be managed by
1431 the Address Resolution Protocol (ARP).
1433 The MTU for an Ethernet is 1500 and for 802.3 is 1492.
1435 DISCUSSION:
1436 The IEEE 802.3 specification provides for operation over a
1437 10Mbps Ethernet cable, in which case Ethernet and IEEE
1438 802.3 frames can be physically intermixed. A receiver can
1439 distinguish Ethernet and 802.3 frames by the value of the
1440 802.3 Length field; this two-octet field coincides in the
1441 header with the Ether-Type field of an Ethernet frame. In
1442 particular, the 802.3 Length field must be less than or
1443 equal to 1500, while all valid Ether-Type values are
1444 greater than 1500.
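The receiver's test can be sketched in Python as follows. This is an illustration only; the function name is hypothetical, and it assumes the usual header layout in which two 6-octet addresses precede the two-octet Length or Ether-Type field.

```python
import struct


def framing_of(frame: bytes) -> str:
    """Classify a received frame as "ieee-802.3" or "ethernet"."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the link-layer header")
    # Octets 12-13, in network byte order, follow the two 6-octet addresses.
    (type_or_length,) = struct.unpack_from("!H", frame, 12)
    # An 802.3 Length is at most 1500; valid Ether-Type values are greater.
    return "ieee-802.3" if type_or_length <= 1500 else "ethernet"
```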
1446 Another compatibility problem arises with link-layer
1447 broadcasts. A broadcast sent with one framing will not be
1448 seen by hosts that can receive only the other framing.
1450 The provisions of this section were designed to provide
1451 direct interoperation between 894-capable and 1042-capable
1452 systems on the same cable, to the maximum extent possible.
1453 It is intended to support the present situation where
1454 894-only systems predominate, while providing an easy
1455 transition to a possible future in which 1042-capable
1456 systems become common.
1458 Note that 894-only systems cannot interoperate directly
1459 with 1042-only systems. If the two system types are set
1460 up as two different logical networks on the same cable,
1461 they can communicate only through an IP gateway.
1462 Furthermore, it is not useful or even possible for a
1463 dual-format host to discover automatically which format to
1464 send, because of the problem of link-layer broadcasts.
1466 2.4 LINK/INTERNET LAYER INTERFACE
1468 The packet receive interface between the IP layer and the link
1469 layer MUST include a flag to indicate whether the incoming packet
1470 was addressed to a link-layer broadcast address.
1482 DISCUSSION:
1483 Although the IP layer does not generally know link layer
1484 addresses (since every different network medium typically has
1485 a different address format), the broadcast address on a
1486 broadcast-capable medium is an important special case. See
1487 Section 3.2.2, especially the DISCUSSION concerning broadcast
1488 storms.
1490 The packet send interface between the IP and link layers MUST
1491 include the 5-bit TOS field (see Section 3.2.1.6).
1493 The link layer MUST NOT report a Destination Unreachable error to
1494 IP solely because there is no ARP cache entry for a destination.
1496 2.5 LINK LAYER REQUIREMENTS SUMMARY
1498                                                   |       | | | |S| |
1499                                                   |       | | | |H| |F
1500                                                   |       | | | |O|M|o
1501                                                   |       | |S| |U|U|o
1502                                                   |       | |H| |L|S|t
1503                                                   |       |M|O| |D|T|n
1504                                                   |       |U|U|M| | |o
1505                                                   |       |S|L|A|N|N|t
1506                                                   |       |T|D|Y|O|O|t
1507 FEATURE                                           |SECTION| | | |T|T|e
1508 --------------------------------------------------|-------|-|-|-|-|-|--
1509                                                   |       | | | | | |
1510 Trailer encapsulation                             |2.3.1  | | |x| | |
1511 Send Trailers by default without negotiation      |2.3.1  | | | | |x|
1512 ARP                                               |2.3.2  | | | | | |
1513   Flush out-of-date ARP cache entries             |2.3.2.1|x| | | | |
1514   Prevent ARP floods                              |2.3.2.1|x| | | | |
1515   Cache timeout configurable                      |2.3.2.1| |x| | | |
1516   Save at least one (latest) unresolved pkt       |2.3.2.2| |x| | | |
1517 Ethernet and IEEE 802 Encapsulation               |2.3.3  | | | | | |
1518   Host able to:                                   |2.3.3  | | | | | |
1519     Send & receive RFC-894 encapsulation          |2.3.3  |x| | | | |
1520     Receive RFC-1042 encapsulation                |2.3.3  | |x| | | |
1521     Send RFC-1042 encapsulation                   |2.3.3  | | |x| | |
1522       Then config. sw. to select, RFC-894 dflt    |2.3.3  |x| | | | |
1523   Send K1=6 encapsulation                         |2.3.3  | | | | |x|
1524   Use ARP on Ethernet and IEEE 802 nets           |2.3.3  |x| | | | |
1525 Link layer report b'casts to IP layer             |2.4    |x| | | | |
1526 IP layer pass TOS to link layer                   |2.4    |x| | | | |
1527 No ARP cache entry treated as Dest. Unreach.      |2.4    | | | | |x|
1541 3. INTERNET LAYER PROTOCOLS
1543 3.1 INTRODUCTION
1545 The Robustness Principle: "Be liberal in what you accept, and
1546 conservative in what you send" is particularly important in the
1547 Internet layer, where one misbehaving host can deny Internet
1548 service to many other hosts.
1550 The protocol standards used in the Internet layer are:
1552 o RFC-791 [IP:1] defines the IP protocol and gives an
1553 introduction to the architecture of the Internet.
1555 o RFC-792 [IP:2] defines ICMP, which provides routing,
1556 diagnostic and error functionality for IP. Although ICMP
1557 messages are encapsulated within IP datagrams, ICMP
1558 processing is considered to be (and is typically implemented
1559 as) part of the IP layer. See Section 3.2.2.
1561 o RFC-950 [IP:3] defines the mandatory subnet extension to the
1562 addressing architecture.
1564 o RFC-1112 [IP:4] defines the Internet Group Management
1565 Protocol IGMP, as part of a recommended extension to hosts
1566 and to the host-gateway interface to support Internet-wide
1567 multicasting at the IP level. See Section 3.2.3.
1569 The target of an IP multicast may be an arbitrary group of
1570 Internet hosts. IP multicasting is designed as a natural
1571 extension of the link-layer multicasting facilities of some
1572 networks, and it provides a standard means for local access
1573 to such link-layer multicasting facilities.
1575 Other important references are listed in Section 5 of this
1576 document.
1578 The Internet layer of host software MUST implement both IP and
1579 ICMP. See Section 3.3.7 for the requirements on support of IGMP.
1581 The host IP layer has two basic functions: (1) choose the "next
1582 hop" gateway or host for outgoing IP datagrams and (2) reassemble
1583 incoming IP datagrams. The IP layer may also (3) implement
1584 intentional fragmentation of outgoing datagrams. Finally, the IP
1585 layer must (4) provide diagnostic and error functionality. We
1586 expect that IP layer functions may increase somewhat in the
1587 future, as further Internet control and management facilities are
1588 developed.
1600 For normal datagrams, the processing is straightforward. For
1601 incoming datagrams, the IP layer:
1603 (1) verifies that the datagram is correctly formatted;
1605 (2) verifies that it is destined to the local host;
1607 (3) processes options;
1609 (4) reassembles the datagram if necessary; and
1611 (5) passes the encapsulated message to the appropriate
1612 transport-layer protocol module.
1614 For outgoing datagrams, the IP layer:
1616 (1) sets any fields not set by the transport layer;
1618 (2) selects the correct first hop on the connected network (a
1619 process called "routing");
1621 (3) fragments the datagram if necessary and if intentional
1622 fragmentation is implemented (see Section 3.3.3); and
1624 (4) passes the packet(s) to the appropriate link-layer driver.
1627 A host is said to be multihomed if it has multiple IP addresses.
1628 Multihoming introduces considerable confusion and complexity into
1629 the protocol suite, and it is an area in which the Internet
1630 architecture falls seriously short of solving all problems. There
1631 are two distinct problem areas in multihoming:
1633 (1) Local multihoming -- the host itself is multihomed; or
1635 (2) Remote multihoming -- the local host needs to communicate
1636 with a remote multihomed host.
1638 At present, remote multihoming MUST be handled at the application
1639 layer, as discussed in the companion RFC [INTRO:1]. A host MAY
1640 support local multihoming, which is discussed in this document,
1641 and in particular in Section 3.3.4.
1643 Any host that forwards datagrams generated by another host is
1644 acting as a gateway and MUST also meet the specifications laid out
1645 in the gateway requirements RFC [INTRO:2]. An Internet host that
1646 includes embedded gateway code MUST have a configuration switch to
1647 disable the gateway function, and this switch MUST default to the
1659 non-gateway mode. In this mode, a datagram arriving through one
1660 interface will not be forwarded to another host or gateway (unless
1661 it is source-routed), regardless of whether the host is single-
1662 homed or multihomed. The host software MUST NOT automatically
1663 move into gateway mode if the host has more than one interface, as
1664 the operator of the machine may neither want to provide that
1665 service nor be competent to do so.
1667 In the following, the action specified in certain cases is to
1668 "silently discard" a received datagram. This means that the
1669 datagram will be discarded without further processing and that the
1670 host will not send any ICMP error message (see Section 3.2.2) as a
1671 result. However, for diagnosis of problems a host SHOULD provide
1672 the capability of logging the error (see Section 1.2.3), including
1673 the contents of the silently-discarded datagram, and SHOULD record
1674 the event in a statistics counter.
1676 DISCUSSION:
1677 Silent discard of erroneous datagrams is generally intended
1678 to prevent "broadcast storms".
1680 3.2 PROTOCOL WALK-THROUGH
1682 3.2.1 Internet Protocol -- IP
1684 3.2.1.1 Version Number: RFC-791 Section 3.1
1686 A datagram whose version number is not 4 MUST be silently
1687 discarded.
1689 3.2.1.2 Checksum: RFC-791 Section 3.1
1691 A host MUST verify the IP header checksum on every received
1692 datagram and silently discard every datagram that has a bad
1693 checksum.
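The version and checksum tests of Sections 3.2.1.1 and 3.2.1.2 might be sketched in Python as follows. This is an illustration only and not part of this specification; the function names are hypothetical. It relies on the RFC-791 definition of the header checksum as a 16-bit one's complement sum over the header, so that a correct header, summed with its checksum field included, yields all one bits.

```python
def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry out of the low 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_is_acceptable(header: bytes) -> bool:
    """Return False if the datagram must be silently discarded."""
    if len(header) < 20:
        return False
    version = header[0] >> 4
    header_length = (header[0] & 0x0F) * 4   # IHL counts 32-bit words
    if version != 4 or header_length < 20 or len(header) < header_length:
        return False
    # With the checksum field included, a valid header sums to all ones.
    return ones_complement_sum(header[:header_length]) == 0xFFFF
```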
1695 3.2.1.3 Addressing: RFC-791 Section 3.2
1697 There are now five classes of IP addresses: Class A through
1698 Class E. Class D addresses are used for IP multicasting
1699 [IP:4], while Class E addresses are reserved for
1700 experimental use.
1702 A multicast (Class D) address is a 28-bit logical address
1703 that stands for a group of hosts, and may be either
1704 permanent or transient. Permanent multicast addresses are
1705 allocated by the Internet Assigned Number Authority
1706 [INTRO:6], while transient addresses may be allocated
1718 dynamically to transient groups. Group membership is
1719 determined dynamically using IGMP [IP:4].
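The five address classes are distinguished by the high-order bits of the 32-bit address (0, 10, 110, 1110, and 1111 for Classes A through E). The Python sketch below is an illustration only; the function names are hypothetical, and the standard ipaddress module is used only to parse dotted-decimal notation.

```python
import ipaddress


def address_class(address: str) -> str:
    """Return "A" through "E" according to the high-order address bits."""
    value = int(ipaddress.IPv4Address(address))
    if value >> 31 == 0b0:
        return "A"
    if value >> 30 == 0b10:
        return "B"
    if value >> 29 == 0b110:
        return "C"
    if value >> 28 == 0b1110:
        return "D"   # multicast
    return "E"       # reserved


def multicast_group_id(address: str) -> int:
    """Return the 28-bit logical group address of a Class D address."""
    if address_class(address) != "D":
        raise ValueError("not a Class D (multicast) address")
    return int(ipaddress.IPv4Address(address)) & 0x0FFFFFFF
```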
1721 We now summarize the important special cases for Class A, B,
1722 and C IP addresses, using the following notation for an IP
1723 address:
1725 { <Network-number>, <Host-number> }
1726 or
1728 { <Network-number>, <Subnet-number>, <Host-number> }
1730 and the notation "-1" for a field that contains all 1 bits.
1731 This notation is not intended to imply that the 1-bits in an
1732 address mask need be contiguous.
1734 (a) { 0, 0 }
1736 This host on this network. MUST NOT be sent, except as
1737 a source address as part of an initialization procedure
1738 by which the host learns its own IP address.
1740 See also Section 3.3.6 for a non-standard use of {0,0}.
1742 (b) { 0, <Host-number> }
1744 Specified host on this network. It MUST NOT be sent,
1745 except as a source address as part of an initialization
1746 procedure by which the host learns its full IP address.
1748 (c) { -1, -1 }
1750 Limited broadcast. It MUST NOT be used as a source
1751 address.
1753 A datagram with this destination address will be
1754 received by every host on the connected physical
1755 network but will not be forwarded outside that network.
1757 (d) { <Network-number>, -1 }
1759 Directed broadcast to the specified network. It MUST
1760 NOT be used as a source address.
1762 (e) { <Network-number>, <Subnet-number>, -1 }
1764 Directed broadcast to the specified subnet. It MUST
1765 NOT be used as a source address.
1777 (f) { <Network-number>, -1, -1 }
1779 Directed broadcast to all subnets of the specified
1780 subnetted network. It MUST NOT be used as a source
1781 address.
1783 (g) { 127, <any> }
1785 Internal host loopback address. Addresses of this form
1786 MUST NOT appear outside a host.
1788 The <Network-number> is administratively assigned so that
1789 its value will be unique in the entire world.
1791 IP addresses are not permitted to have the value 0 or -1 for
1792 any of the <Host-number>, <Network-number>, or <Subnet-
1793 number> fields (except in the special cases listed above).
1794 This implies that each of these fields will be at least two
1795 bits long.
1797 For further discussion of broadcast addresses, see Section
1798 3.3.6.
1800 A host MUST support the subnet extensions to IP [IP:3]. As
1801 a result, there will be an address mask of the form:
1802 {-1, -1, 0} associated with each of the host's local IP
1803 addresses; see Sections 3.2.2.9 and 3.3.1.1.
1805 When a host sends any datagram, the IP source address MUST
1806 be one of its own IP addresses (but not a broadcast or
1807 multicast address).
1809 A host MUST silently discard an incoming datagram that is
1810 not destined for the host. An incoming datagram is destined
1811 for the host if the datagram's destination address field is:
1813 (1) (one of) the host's IP address(es); or
1815 (2) an IP broadcast address valid for the connected
1816 network; or
1818 (3) the address for a multicast group of which the host is
1819 a member on the incoming physical interface.
1821 For most purposes, a datagram addressed to a broadcast or
1822 multicast destination is processed as if it had been
1823 addressed to one of the host's IP addresses; we use the term
1824 "specific-destination address" for the equivalent local IP
1836 address of the host. The specific-destination address is
1837 defined to be the destination address in the IP header
1838 unless the header contains a broadcast or multicast address,
1839 in which case the specific-destination is an IP address
1840 assigned to the physical interface on which the datagram
1841 arrived.
1843 A host MUST silently discard an incoming datagram containing
1844 an IP source address that is invalid by the rules of this
1845 section. This validation could be done in either the IP
1846 layer or by each protocol in the transport layer.
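The acceptance test and the specific-destination address described above can be sketched together as follows. This is a minimal illustration under stated assumptions: the Interface record, its field names, and the function specific_destination are hypothetical, and addresses are compared as plain strings.

```python
from dataclasses import dataclass, field

@dataclass
class Interface:
    address: str                                   # IP address assigned to this interface
    broadcasts: set = field(default_factory=set)   # broadcast addresses valid on its network
    groups: set = field(default_factory=set)       # multicast groups joined on this interface

def specific_destination(dst, iface, host_addresses):
    """Return the specific-destination address, or None to silently discard."""
    if dst in host_addresses:
        return dst                 # case (1): one of the host's own addresses
    if dst in iface.broadcasts or dst in iface.groups:
        return iface.address       # cases (2) and (3): use the arrival interface's address
    return None                    # not destined for this host
```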
1848 DISCUSSION:
1849 A mis-addressed datagram might be caused by a link-
1850 layer broadcast of a unicast datagram or by a gateway
1851 or host that is confused or mis-configured.
1853 An architectural goal for Internet hosts was to allow
1854 IP addresses to be featureless 32-bit numbers, avoiding
1855 algorithms that required a knowledge of the IP address
1856 format. Otherwise, any future change in the format or
1857 interpretation of IP addresses will require host
1858 software changes. However, validation of broadcast and
1859 multicast addresses violates this goal; a few other
1860 violations are described elsewhere in this document.
1862 Implementers should be aware that applications
1863 depending upon the all-subnets directed broadcast
1864 address (f) may be unusable on some networks. All-
1865 subnets broadcast is not widely implemented in vendor
1866 gateways at present, and even when it is implemented, a
1867 particular network administration may disable it in the
1868 gateway configuration.
1870 3.2.1.4 Fragmentation and Reassembly: RFC-791 Section 3.2
1872 The Internet model requires that every host support
1873 reassembly. See Sections 3.3.2 and 3.3.3 for the
1874 requirements on fragmentation and reassembly.
1876 3.2.1.5 Identification: RFC-791 Section 3.2
1878 When sending an identical copy of an earlier datagram, a
1879 host MAY optionally retain the same Identification field in
1880 the copy.
1895 DISCUSSION:
1896 Some Internet protocol experts have maintained that
1897 when a host sends an identical copy of an earlier
1898 datagram, the new copy should contain the same
1899 Identification value as the original. There are two
1900 suggested advantages: (1) if the datagrams are
1901 fragmented and some of the fragments are lost, the
1902 receiver may be able to reconstruct a complete datagram
1903 from fragments of the original and the copies; (2) a
1904 congested gateway might use the IP Identification field
1905 (and Fragment Offset) to discard duplicate datagrams
1906 from the queue.
1908 However, the observed patterns of datagram loss in the
1909 Internet do not favor the probability of retransmitted
1910 fragments filling reassembly gaps, while other
1911 mechanisms (e.g., TCP repacketizing upon
1912 retransmission) tend to prevent retransmission of an
1913 identical datagram [IP:9]. Therefore, we believe that
1914 retransmitting the same Identification field is not
1915 useful. Also, a connectionless transport protocol like
1916 UDP would require the cooperation of the application
1917 programs to retain the same Identification value in
1918 identical datagrams.
1920 3.2.1.6 Type-of-Service: RFC-791 Section 3.2
1922 The "Type-of-Service" byte in the IP header is divided into
1923 two sections: the Precedence field (high-order 3 bits), and
1924 a field that is customarily called "Type-of-Service" or
1925 "TOS" (low-order 5 bits). In this document, all references
1926 to "TOS" or the "TOS field" refer to the low-order 5 bits
1927 only.
1929 The Precedence field is intended for Department of Defense
1930 applications of the Internet protocols. The use of non-zero
1931 values in this field is outside the scope of this document
1932 and the IP standard specification. Vendors should consult
1933 the Defense Communication Agency (DCA) for guidance on the
1934 IP Precedence field and its implications for other protocol
1935 layers. However, vendors should note that the use of
1936 precedence will most likely require that its value be passed
1937 between protocol layers in just the same way as the TOS
1938 field is passed.
1940 The IP layer MUST provide a means for the transport layer to
1941 set the TOS field of every datagram that is sent; the
1942 default is all zero bits. The IP layer SHOULD pass received
1954 TOS values up to the transport layer.
1956 The particular link-layer mappings of TOS contained in RFC-
1957 795 SHOULD NOT be implemented.
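The division of the Type-of-Service byte described in this section (Precedence in the high-order 3 bits, TOS in the low-order 5 bits) can be shown with a short sketch. The function name split_tos_byte is hypothetical and is used here only for illustration.

```python
def split_tos_byte(value: int):
    """Split the IP header Type-of-Service byte into (precedence, tos)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Type-of-Service is a single octet")
    precedence = value >> 5      # high-order 3 bits
    tos = value & 0x1F           # low-order 5 bits
    return precedence, tos
```

With the default of all zero bits, both fields are zero.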
1959 DISCUSSION:
1960 While the TOS field has been little used in the past,
1961 it is expected to play an increasing role in the near
1962 future. The TOS field is expected to be used to
1963 control two aspects of gateway operations: routing and
1964 queueing algorithms. See Section 2 of [INTRO:1] for
1965 the requirements on application programs to specify TOS
1966 values.
1968 The TOS field may also be mapped into link-layer
1969 service selectors. This has been applied to provide
1970 effective sharing of serial lines by different classes
1971 of TCP traffic, for example. However, the mappings
1972 suggested in RFC-795 for networks that were included in
1973 the Internet as of 1981 are now obsolete.
1975 3.2.1.7 Time-to-Live: RFC-791 Section 3.2
1977 A host MUST NOT send a datagram with a Time-to-Live (TTL)
1978 value of zero.
1980 A host MUST NOT discard a datagram just because it was
1981 received with TTL less than 2.
1983 The IP layer MUST provide a means for the transport layer to
1984 set the TTL field of every datagram that is sent. When a
1985 fixed TTL value is used, it MUST be configurable. The
1986 current suggested value will be published in the "Assigned
1987 Numbers" RFC.
1989 DISCUSSION:
1990 The TTL field has two functions: limit the lifetime of
1991 TCP segments (see RFC-793 [TCP:1], p. 28), and
1992 terminate Internet routing loops. Although TTL is a
1993 time in seconds, it also has some attributes of a hop-
1994 count, since each gateway is required to reduce the TTL
1995 field by at least one.
1997 The intent is that TTL expiration will cause a datagram
1998 to be discarded by a gateway but not by the destination
1999 host; however, hosts that act as gateways by forwarding
2000 datagrams must follow the gateway rules for TTL.
2013 A higher-layer protocol may want to set the TTL in
2014 order to implement an "expanding scope" search for some
2015 Internet resource. This is used by some diagnostic
2016 tools, and is expected to be useful for locating the
2017 "nearest" server of a given class using IP
2018 multicasting, for example. A particular transport
2019 protocol may also want to specify its own TTL bound on
2020 maximum datagram lifetime.
2022 A fixed value must be at least big enough for the
2023 Internet "diameter," i.e., the longest possible path.
2024 A reasonable value is about twice the diameter, to
2025 allow for continued Internet growth.
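The TTL rules of this section can be summarized in a brief sketch. It is illustrative only: the function names are hypothetical, the diameter is assumed to be supplied by configuration, and the gateway rule shown (decrement by exactly one, discard on expiry) is a simplification of the "at least one" rule.

```python
def check_outgoing_ttl(ttl: int) -> int:
    """A host MUST NOT send a datagram with a TTL of zero."""
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be in 1..255; zero MUST NOT be sent")
    return ttl

def suggested_fixed_ttl(diameter_hops: int) -> int:
    """About twice the Internet diameter, capped at the 8-bit field maximum."""
    return min(255, 2 * diameter_hops)

def gateway_forward_ttl(ttl: int):
    """Gateway behavior: reduce TTL by one; None means the datagram is discarded."""
    remaining = ttl - 1
    return remaining if remaining > 0 else None
```

Note that no corresponding check appears on the receive path of a destination host, since a datagram is not to be discarded there merely because its TTL is less than 2.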
2027 3.2.1.8 Options: RFC-791 Section 3.2
2029 There MUST be a means for the transport layer to specify IP
2030 options to be included in transmitted IP datagrams (see
2031 Section 3.4).
2033 All IP options (except NOP or END-OF-LIST) received in
2034 datagrams MUST be passed to the transport layer (or to ICMP
2035 processing when the datagram is an ICMP message). The IP
2036 and transport layer MUST each interpret those IP options
2037 that they understand and silently ignore the others.
2039 Later sections of this document discuss specific IP option
2040 support required by each of ICMP, TCP, and UDP.
2042 DISCUSSION:
2043 Passing all received IP options to the transport layer
2044 is a deliberate "violation of strict layering" that is
2045 designed to ease the introduction of new transport-
2046 relevant IP options in the future. Each layer must
2047 pick out any options that are relevant to its own
2048 processing and ignore the rest. For this purpose,
2049 every IP option except NOP and END-OF-LIST will include
2050 a specification of its own length.
2052 This document does not define the order in which a
2053 receiver must process multiple options in the same IP
2054 header. Hosts sending multiple options must be aware
2055 that this introduces an ambiguity in the meaning of
2056 certain options when combined with a source-route
2057 option.
2059 IMPLEMENTATION:
2060 The IP layer must not crash as the result of an option
2072 length that is outside the possible range. For
2073 example, erroneous option lengths have been observed to
2074 put some IP implementations into infinite loops.
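One way to guard against out-of-range option lengths is to validate each length octet before advancing through the option list. The following sketch assumes the options area of the IP header has already been extracted as a byte string. The function name parse_ip_options is hypothetical, and raising an exception is only one possible way to reject a malformed header.

```python
END_OF_LIST = 0   # single-octet option, no length field
NOP = 1           # single-octet option, no length field

def parse_ip_options(data: bytes):
    """Return a list of (type, value-bytes) pairs; reject impossible lengths."""
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == END_OF_LIST:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("option truncated before its length octet")
        length = data[i + 1]          # counts the type and length octets too
        if length < 2 or i + length > len(data):
            raise ValueError("option length outside the possible range")
        options.append((kind, data[i + 2 : i + length]))
        i += length                   # always advances by at least 2
    return options
```

Because a length below 2 is rejected, the loop index always moves forward, which rules out the infinite loop mentioned above.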
2076 Here are the requirements for specific IP options:
2079 (a) Security Option
2081 Some environments require the Security option in every
2082 datagram; such a requirement is outside the scope of
2083 this document and the IP standard specification. Note,
2084 however, that the security options described in RFC-791
2085 and RFC-1038 are obsolete. For DoD applications,
2086 vendors should consult [IP:8] for guidance.
2089 (b) Stream Identifier Option
2091 This option is obsolete; it SHOULD NOT be sent, and it
2092 MUST be silently ignored if received.
2095 (c) Source Route Options
2097 A host MUST support originating a source route and MUST
2098 be able to act as the final destination of a source
2099 route.
2101 If a host receives a datagram containing a completed
2102 source route (i.e., the pointer points beyond the last
2103 field), the datagram has reached its final destination;
2104 the option as received (the recorded route) MUST be
2105 passed up to the transport layer (or to ICMP message
2106 processing). This recorded route will be reversed and
2107 used to form a return source route for reply datagrams
2108 (see discussion of IP Options in Section 4). When a
2109 return source route is built, it MUST be correctly
2110 formed even if the recorded route included the source
2111 host (see case (B) in the discussion below).
2113 An IP header containing more than one Source Route
2114 option MUST NOT be sent; the effect on routing of
2115 multiple Source Route options is implementation-
2116 specific.
2118 Section 3.3.5 presents the rules for a host acting as
2119 an intermediate hop in a source route, i.e., forwarding
2131 a source-routed datagram.
2133 DISCUSSION:
2134 If a source-routed datagram is fragmented, each
2135 fragment will contain a copy of the source route.
2136 Since the processing of IP options (including a
2137 source route) must precede reassembly, the
2138 original datagram will not be reassembled until
2139 the final destination is reached.
2141 Suppose a source routed datagram is to be routed
2142 from host S to host D via gateways G1, G2, ... Gn.
2143 There was an ambiguity in the specification over
2144 whether the source route option in a datagram sent
2145 out by S should be (A) or (B):
2147 (A): {>>G2, G3, ... Gn, D} <--- CORRECT
2149 (B): {S, >>G2, G3, ... Gn, D} <---- WRONG
2151 (where >> represents the pointer). If (A) is
2152 sent, the datagram received at D will contain the
2153 option: {G1, G2, ... Gn >>}, with S and D as the
2154 IP source and destination addresses. If (B) were
2155 sent, the datagram received at D would again
2156 contain S and D as the same IP source and
2157 destination addresses, but the option would be:
2158 {S, G1, ...Gn >>}; i.e., the originating host
2159 would be the first hop in the route.
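The construction of a return source route from a recorded route, including tolerance of case (B), can be sketched as follows. This is an illustration under assumptions, not a definitive procedure: the function name build_return_route is hypothetical, addresses are shown as symbolic strings, and the reply is assumed to be sent in form (A), with the first hop as the IP destination address and the remaining hops in the option.

```python
def build_return_route(recorded, source):
    """Reverse a recorded route for use in replies.

    recorded: hops as received at D, e.g. ["G1", "G2", "G3"]
    source:   IP source address S of the received datagram
    Returns (first_hop, option_hops): the reply's initial IP destination
    and the list of addresses carried in its Source Route option.
    """
    hops = list(reversed(recorded))
    if hops and hops[-1] == source:
        hops.pop()                 # case (B): S was recorded as the first hop
    hops.append(source)            # the reply finally terminates at S
    return hops[0], hops[1:]
```

With either form of recorded route, the originating host appears exactly once, as the final destination of the reply.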
2162 (d) Record Route Option
2164 Implementation of originating and processing the Record
2165 Route option is OPTIONAL.
2168 (e) Timestamp Option
2170 Implementation of originating and processing the
2171 Timestamp option is OPTIONAL. If it is implemented,
2172 the following rules apply:
2174 o The originating host MUST record a timestamp in a
2175 Timestamp option whose Internet address fields are
2176 not pre-specified or whose first pre-specified
2177 address is the host's interface address.
2190 o The destination host MUST (if possible) add the
2191 current timestamp to a Timestamp option before
2192 passing the option to the transport layer or to
2193 ICMP for processing.
2195 o A timestamp value MUST follow the rules given in
2196 Section 3.2.2.8 for the ICMP Timestamp message.
2199 3.2.2 Internet Control Message Protocol -- ICMP
2201 ICMP messages are grouped into two classes.
2204 ICMP error messages:
2206 Destination Unreachable (see Section 3.2.2.1)
2207 Redirect (see Section 3.2.2.2)
2208 Source Quench (see Section 3.2.2.3)
2209 Time Exceeded (see Section 3.2.2.4)
2210 Parameter Problem (see Section 3.2.2.5)
2214 ICMP query messages:
2216 Echo (see Section 3.2.2.6)
2217 Information (see Section 3.2.2.7)
2218 Timestamp (see Section 3.2.2.8)
2219 Address Mask (see Section 3.2.2.9)
2222 If an ICMP message of unknown type is received, it MUST be
2223 silently discarded.
2225 Every ICMP error message includes the Internet header and at
2226 least the first 8 data octets of the datagram that triggered
2227 the error; more than 8 octets MAY be sent; this header and data
2228 MUST be unchanged from the received datagram.
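As a minimal sketch of the quoting rule above, the portion of the offending datagram carried in an ICMP error message can be taken as the Internet header (whose length is given by the IHL field) plus the first 8 data octets. The function name icmp_error_payload and the extra parameter are hypothetical.

```python
def icmp_error_payload(datagram: bytes, extra: int = 0) -> bytes:
    """Return the unchanged IP header plus 8 (or more) data octets."""
    if not datagram:
        raise ValueError("empty datagram")
    header_len = (datagram[0] & 0x0F) * 4     # IHL is in 32-bit words
    if header_len < 20 or len(datagram) < header_len:
        raise ValueError("malformed Internet header")
    # Slicing copies the octets as received; nothing is rewritten.
    return datagram[: header_len + 8 + extra]
```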
2230 In those cases where the Internet layer is required to pass an
2231 ICMP error message to the transport layer, the IP protocol
2232 number MUST be extracted from the original header and used to
2233 select the appropriate transport protocol entity to handle the
2234 error.
2236 An ICMP error message SHOULD be sent with normal (i.e., zero)
2237 TOS bits.
2249 An ICMP error message MUST NOT be sent as the result of
2250 receiving:
2252 * an ICMP error message, or
2254 * a datagram destined to an IP broadcast or IP multicast
2255 address, or
2257 * a datagram sent as a link-layer broadcast, or
2259 * a non-initial fragment, or
2261 * a datagram whose source address does not define a single
2262 host -- e.g., a zero address, a loopback address, a
2263 broadcast address, a multicast address, or a Class E
2264 address.
2266 NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT
2267 ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES.
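The restrictions listed above amount to a single predicate that is evaluated before any ICMP error message is generated. The sketch below assumes the relevant facts about the received datagram have already been determined by the IP and link layers. The Received record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Received:
    is_icmp_error: bool = False
    dst_is_broadcast_or_multicast: bool = False   # IP-layer destination
    link_layer_broadcast: bool = False            # reported by the link layer
    fragment_offset: int = 0                      # non-zero for non-initial fragments
    src_is_single_host: bool = True               # False for zero, loopback, etc.

def may_send_icmp_error(d: Received) -> bool:
    """True only if none of the listed restrictions applies."""
    return not (
        d.is_icmp_error
        or d.dst_is_broadcast_or_multicast
        or d.link_layer_broadcast
        or d.fragment_offset != 0
        or not d.src_is_single_host
    )
```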
2269 DISCUSSION:
2270 These rules will prevent the "broadcast storms" that have
2271 resulted from hosts returning ICMP error messages in
2272 response to broadcast datagrams. For example, a broadcast
2273 UDP segment to a non-existent port could trigger a flood
2274 of ICMP Destination Unreachable datagrams from all
2275 machines that do not have a client for that destination
2276 port. On a large Ethernet, the resulting collisions can
2277 render the network useless for a second or more.
2279 Every datagram that is broadcast on the connected network
2280 should have a valid IP broadcast address as its IP
2281 destination (see Section 3.3.6). However, some hosts
2282 violate this rule. To be certain to detect broadcast
2283 datagrams, therefore, hosts are required to check for a
2284 link-layer broadcast as well as an IP-layer broadcast
2285 address.
2287 IMPLEMENTATION:
2288 This requires that the link layer inform the IP layer when
2289 a link-layer broadcast datagram has been received; see
2290 Section 2.4.
2292 3.2.2.1 Destination Unreachable: RFC-792
2294 The following additional codes are hereby defined:
2296 6 = destination network unknown
2308 7 = destination host unknown
2310 8 = source host isolated
2312 9 = communication with destination network
2313 administratively prohibited
2315 10 = communication with destination host
2316 administratively prohibited
2318 11 = network unreachable for type of service
2320 12 = host unreachable for type of service
2322 A host SHOULD generate Destination Unreachable messages with
2323 code:
2325 2 (Protocol Unreachable), when the designated transport
2326 protocol is not supported; or
2328 3 (Port Unreachable), when the designated transport
2329 protocol (e.g., UDP) is unable to demultiplex the
2330 datagram but has no protocol mechanism to inform the
2331 sender.
2333 A Destination Unreachable message that is received MUST be
2334 reported to the transport layer. The transport layer SHOULD
2335 use the information appropriately; for example, see Sections
2336 4.1.3.3, 4.2.3.9, and 4.2.4 below. A transport protocol
2337 that has its own mechanism for notifying the sender that a
2338 port is unreachable (e.g., TCP, which sends RST segments)
2339 MUST nevertheless accept an ICMP Port Unreachable for the
2340 same purpose.
2342 A Destination Unreachable message that is received with code
2343 0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a
2344 routing transient and MUST therefore be interpreted as only
2345 a hint, not proof, that the specified destination is
2346 unreachable [IP:11]. For example, it MUST NOT be used as
2347 proof of a dead gateway (see Section 3.3.1).
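For illustration, the Destination Unreachable codes can be held in a table together with the set of codes that are to be treated only as hints. Codes 0 through 5 are taken from RFC-792 and codes 6 through 12 from the list above. The names DEST_UNREACHABLE_CODES, HINT_ONLY_CODES, and interpret_dest_unreachable are hypothetical.

```python
DEST_UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
    6: "destination network unknown",
    7: "destination host unknown",
    8: "source host isolated",
    9: "communication with destination network administratively prohibited",
    10: "communication with destination host administratively prohibited",
    11: "network unreachable for type of service",
    12: "host unreachable for type of service",
}

# Codes that may result from a routing transient: a hint, not proof.
HINT_ONLY_CODES = frozenset({0, 1, 5})

def interpret_dest_unreachable(code: int):
    """Return (description, hint_only) for reporting to the transport layer."""
    return DEST_UNREACHABLE_CODES.get(code, "unknown code"), code in HINT_ONLY_CODES
```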
2349 3.2.2.2 Redirect: RFC-792
2351 A host SHOULD NOT send an ICMP Redirect message; Redirects
2352 are to be sent only by gateways.
2354 A host receiving a Redirect message MUST update its routing
2355 information accordingly. Every host MUST be prepared to
2367 accept both Host and Network Redirects and to process them
2368 as described in Section 3.3.1.2 below.
2370 A Redirect message SHOULD be silently discarded if the new
2371 gateway address it specifies is not on the same connected
2372 (sub-) net through which the Redirect arrived [INTRO:2,
2373 Appendix A], or if the source of the Redirect is not the
2374 current first-hop gateway for the specified destination (see
2375 Section 3.3.1).
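The two plausibility checks on a received Redirect might be sketched as shown below. This is an assumption-laden illustration: the function name accept_redirect and its parameters are hypothetical, and the connected (sub-)net is assumed to be known in prefix notation.

```python
import ipaddress

def accept_redirect(new_gateway, redirect_source, arrival_network, current_first_hop):
    """Return False if the Redirect should be silently discarded."""
    net = ipaddress.ip_network(arrival_network)
    # The new gateway must be on the (sub-)net through which the Redirect arrived.
    on_same_net = ipaddress.ip_address(new_gateway) in net
    # The Redirect must come from the current first-hop gateway for the destination.
    from_first_hop = (
        ipaddress.ip_address(redirect_source)
        == ipaddress.ip_address(current_first_hop)
    )
    return on_same_net and from_first_hop
```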
2377 3.2.2.3 Source Quench: RFC-792
2379 A host MAY send a Source Quench message if it is
2380 approaching, or has reached, the point at which it is forced
2381 to discard incoming datagrams due to a shortage of
2382 reassembly buffers or other resources. See Section 2.2.3 of
2383 [INTRO:2] for suggestions on when to send Source Quench.
2385 If a Source Quench message is received, the IP layer MUST
2386 report it to the transport layer (or ICMP processing). In
2387 general, the transport or application layer SHOULD implement
2388 a mechanism to respond to Source Quench for any protocol
2389 that can send a sequence of datagrams to the same
2390 destination and which can reasonably be expected to maintain
2391 enough state information to make this feasible. See Section
2392 4 for the handling of Source Quench by TCP and UDP.
2394 DISCUSSION:
2395 A Source Quench may be generated by the target host or
2396 by some gateway in the path of a datagram. The host
2397 receiving a Source Quench should throttle itself back
2398 for a period of time, then gradually increase the
2399 transmission rate again. The mechanism to respond to
2400 Source Quench may be in the transport layer (for
2401 connection-oriented protocols like TCP) or in the
2402 application layer (for protocols that are built on top
2403 of UDP).
2405 A mechanism has been proposed [IP:14] to make the IP
2406 layer respond directly to Source Quench by controlling
2407 the rate at which datagrams are sent; however, this
2408 proposal is currently experimental and not
2409 recommended.
2411 3.2.2.4 Time Exceeded: RFC-792
2413 An incoming Time Exceeded message MUST be passed to the
2414 transport layer.
2426 DISCUSSION:
2427 A gateway will send a Time Exceeded Code 0 (In Transit)
2428 message when it discards a datagram due to an expired
2429 TTL field. This indicates either a gateway routing
2430 loop or too small an initial TTL value.
2432 A host may receive a Time Exceeded Code 1 (Reassembly
2433 Timeout) message from a destination host that has timed
2434 out and discarded an incomplete datagram; see Section
2435 3.3.2 below. In the future, receipt of this message
2436 might be part of some "MTU discovery" procedure, to
2437 discover the maximum datagram size that can be sent on
2438 the path without fragmentation.
2440 3.2.2.5 Parameter Problem: RFC-792
2442 A host SHOULD generate Parameter Problem messages. An
2443 incoming Parameter Problem message MUST be passed to the
2444 transport layer, and it MAY be reported to the user.
2446 DISCUSSION:
2447 The ICMP Parameter Problem message is sent to the
2448 source host for any problem not specifically covered by
2449 another ICMP message. Receipt of a Parameter Problem
2450 message generally indicates some local or remote
2451 implementation error.
2453 A new variant on the Parameter Problem message is hereby
2454 defined:
2455 Code 1 = required option is missing.
2457 DISCUSSION:
2458 This variant is currently in use in the military
2459 community for a missing security option.
2461 3.2.2.6 Echo Request/Reply: RFC-792
2463 Every host MUST implement an ICMP Echo server function that
2464 receives Echo Requests and sends corresponding Echo Replies.
2465 A host SHOULD also implement an application-layer interface
2466 for sending an Echo Request and receiving an Echo Reply, for
2467 diagnostic purposes.
2469 An ICMP Echo Request destined to an IP broadcast or IP
2470 multicast address MAY be silently discarded.
2485 DISCUSSION:
2486 This neutral provision results from a passionate debate
2487 between those who feel that ICMP Echo to a broadcast
2488 address provides a valuable diagnostic capability and
2489 those who feel that misuse of this feature can too
2490 easily create packet storms.
2492 The IP source address in an ICMP Echo Reply MUST be the same
2493 as the specific-destination address (defined in Section
2494 3.2.1.3) of the corresponding ICMP Echo Request message.
2496 Data received in an ICMP Echo Request MUST be entirely
2497 included in the resulting Echo Reply. However, if sending
2498 the Echo Reply requires intentional fragmentation that is
2499 not implemented, the datagram MUST be truncated to maximum
2500 transmission size (see Section 3.3.3) and sent.
2502 Echo Reply messages MUST be passed to the ICMP user
2503 interface, unless the corresponding Echo Request originated
2504 in the IP layer.
2506 If a Record Route and/or Time Stamp option is received in an
2507 ICMP Echo Request, this option (these options) SHOULD be
2508 updated to include the current host and included in the IP
2509 header of the Echo Reply message, without "truncation".
2510 Thus, the recorded route will be for the entire round trip.
2512 If a Source Route option is received in an ICMP Echo
2513 Request, the return route MUST be reversed and used as a
2514 Source Route option for the Echo Reply message.
2516 3.2.2.7 Information Request/Reply: RFC-792
2518 A host SHOULD NOT implement these messages.
2520 DISCUSSION:
2521 The Information Request/Reply pair was intended to
2522 support self-configuring systems such as diskless
2523 workstations, to allow them to discover their IP
2524 network numbers at boot time. However, the RARP and
2525 BOOTP protocols provide better mechanisms for a host to
2526 discover its own IP address.
2528 3.2.2.8 Timestamp and Timestamp Reply: RFC-792
2530 A host MAY implement Timestamp and Timestamp Reply. If they
2531 are implemented, the following rules MUST be followed.
2544 o The ICMP Timestamp server function returns a Timestamp
2545 Reply to every Timestamp message that is received. If
2546 this function is implemented, it SHOULD be designed for
2547 minimum variability in delay (e.g., implemented in the
2548 kernel to avoid delay in scheduling a user process).
2550 The following cases for Timestamp are to be handled
2551 according to the corresponding rules for ICMP Echo:
2553 o An ICMP Timestamp Request message to an IP broadcast or
2554 IP multicast address MAY be silently discarded.
2556 o The IP source address in an ICMP Timestamp Reply MUST
2557 be the same as the specific-destination address of the
2558 corresponding Timestamp Request message.
2560 o If a Source Route option is received in a Timestamp
2561 Request, the return route MUST be reversed and used as
2562 a Source Route option for the Timestamp Reply message.
2564 o If a Record Route and/or Timestamp option is received
2565 in a Timestamp Request, this (these) option(s) SHOULD
2566 be updated to include the current host and included in
2567 the IP header of the Timestamp Reply message.
2569 o Incoming Timestamp Reply messages MUST be passed up to
2570 the ICMP user interface.
2572 The preferred form for a timestamp value (the "standard
2573 value") is in units of milliseconds since midnight Universal
2574 Time. However, it may be difficult to provide this value
2575 with millisecond resolution. For example, many systems use
2576 clocks that update only at line frequency, 50 or 60 times
2577 per second. Therefore, some latitude is allowed in a
2578 "standard value":
2580 (a) A "standard value" MUST be updated at least 15 times
2581 per second (i.e., at most the six low-order bits of the
2582 value may be undefined).
2584 (b) The accuracy of a "standard value" MUST approximate
2585 that of operator-set CPU clocks, i.e., correct within a
2586 few minutes.
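The preferred "standard value" (milliseconds since midnight Universal Time) can be computed as in the following sketch. The function name standard_timestamp is hypothetical. Integer arithmetic is used so that no rounding error is introduced, and the resolution actually achieved depends on the system clock.

```python
from datetime import datetime, timezone

def standard_timestamp(now=None) -> int:
    """Milliseconds since midnight UT for the given (or current) time."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    delta = now - midnight
    return delta.seconds * 1000 + delta.microseconds // 1000
```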
2603 3.2.2.9 Address Mask Request/Reply: RFC-950
2605 A host MUST support the first, and MAY implement all three,
2606 of the following methods for determining the address mask(s)
2607 corresponding to its IP address(es):
2609 (1) static configuration information;
2611 (2) obtaining the address mask(s) dynamically as a side-
2612 effect of the system initialization process (see
2613 [INTRO:1]); and
2615 (3) sending ICMP Address Mask Request(s) and receiving ICMP
2616 Address Mask Reply(s).
2618 The choice of method to be used in a particular host MUST be
2619 configurable.
2621 When method (3), the use of Address Mask messages, is
2622 enabled, then:
2624 (a) When it initializes, the host MUST broadcast an Address
2625 Mask Request message on the connected network
2626 corresponding to the IP address. It MUST retransmit
2627 this message a small number of times if it does not
2628 receive an immediate Address Mask Reply.
2630 (b) Until it has received an Address Mask Reply, the host
2631 SHOULD assume a mask appropriate for the address class
2632 of the IP address, i.e., assume that the connected
2633 network is not subnetted.
2635 (c) The first Address Mask Reply message received MUST be
2636 used to set the address mask corresponding to the
2637 particular local IP address. This is true even if the
2638 first Address Mask Reply message is "unsolicited", in
2639 which case it will have been broadcast and may arrive
2640 after the host has ceased to retransmit Address Mask
2641 Requests. Once the mask has been set by an Address
2642 Mask Reply, later Address Mask Reply messages MUST be
2643 (silently) ignored.
2645 Conversely, if Address Mask messages are disabled, then no
2646 ICMP Address Mask Requests will be sent, and any ICMP
2647 Address Mask Replies received for that local IP address MUST
2648 be (silently) ignored.
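The handling of Address Mask Replies under method (3) can be pictured as a small state machine kept for each local IP address. The sketch below is illustrative only. The class name AddressMaskState and its members are hypothetical, and masks are represented as 32-bit integers.

```python
class AddressMaskState:
    """Per-address mask state for a host using Address Mask messages."""

    def __init__(self, enabled: bool, class_default_mask: int):
        self.enabled = enabled
        # Rule (b): until a Reply arrives, assume the network is not subnetted.
        self.mask = class_default_mask
        self.set_by_reply = False

    def on_reply(self, mask: int) -> bool:
        """Process an Address Mask Reply; return True if it was used."""
        if not self.enabled:
            return False           # messages disabled: silently ignore
        if self.set_by_reply:
            return False           # only the first Reply sets the mask
        self.mask = mask           # rule (c): first Reply, solicited or not
        self.set_by_reply = True
        return True
```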
2650 A host SHOULD make some reasonableness check on any address
2662 mask it installs; see IMPLEMENTATION section below.
2664 A system MUST NOT send an Address Mask Reply unless it is an
2665 authoritative agent for address masks. An authoritative
2666 agent may be a host or a gateway, but it MUST be explicitly
2667 configured as an address mask agent. Receiving an address
2668 mask via an Address Mask Reply does not give the receiver
2669 authority and MUST NOT be used as the basis for issuing
2670 Address Mask Replies.
2672 With a statically configured address mask, there SHOULD be
2673 an additional configuration flag that determines whether the
2674 host is to act as an authoritative agent for this mask,
2675 i.e., whether it will answer Address Mask Request messages
2676 using this mask.
2678 If it is configured as an agent, the host MUST broadcast an
2679 Address Mask Reply for the mask on the appropriate interface
2680 when it initializes.
2682 See "System Initialization" in [INTRO:1] for more
2683 information about the use of Address Mask Request/Reply
2684 messages.
2686 DISCUSSION:
2687 Hosts that casually send Address Mask Replies with
2688 invalid address masks have often been a serious
2689 nuisance. To prevent this, Address Mask Replies ought
2690 to be sent only by authoritative agents that have been
2691 selected by explicit administrative action.
2693 When an authoritative agent receives an Address Mask
2694 Request message, it will send a unicast Address Mask
2695 Reply to the source IP address. If the network part of
2696 this address is zero (see (a) and (b) in 3.2.1.3), the
2697 Reply will be broadcast.
2699 Getting no reply to its Address Mask Request messages,
2700 a host will assume there is no agent and use an
2701 unsubnetted mask, but the agent may be only temporarily
2702 unreachable. An agent will broadcast an unsolicited
2703 Address Mask Reply whenever it initializes, in order to
2704 update the masks of all hosts that have initialized in
2705 the meantime.
2707 IMPLEMENTATION:
2708 The following reasonableness check on an address mask
2709 is suggested: the mask is not all 1 bits, and it is
2721 either zero or else the 8 highest-order bits are on.
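The suggested reasonableness check translates directly into a few lines of code. In this sketch the mask is a 32-bit integer, and the function name mask_is_reasonable is hypothetical.

```python
def mask_is_reasonable(mask: int) -> bool:
    """Not all 1 bits, and either zero or with the 8 highest-order bits on."""
    if mask == 0xFFFFFFFF:
        return False
    return mask == 0 or (mask & 0xFF000000) == 0xFF000000
```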
2723 3.2.3 Internet Group Management Protocol IGMP
2725 IGMP [IP:4] is a protocol used between hosts and gateways on a
2726 single network to establish hosts' membership in particular
2727 multicast groups. The gateways use this information, in
2728 conjunction with a multicast routing protocol, to support IP
2729 multicasting across the Internet.
2731 At this time, implementation of IGMP is OPTIONAL; see Section
2732 3.3.7 for more information. Without IGMP, a host can still
2733 participate in multicasting local to its connected networks.
2735 3.3 SPECIFIC ISSUES
2737 3.3.1 Routing Outbound Datagrams
2739 The IP layer chooses the correct next hop for each datagram it
2740 sends. If the destination is on a connected network, the
2741 datagram is sent directly to the destination host; otherwise,
2742 it has to be routed to a gateway on a connected network.
2744 3.3.1.1 Local/Remote Decision
2746 To decide if the destination is on a connected network, the
2747 following algorithm MUST be used [see IP:3]:
2749 (a) The address mask (particular to a local IP address for
2750 a multihomed host) is a 32-bit mask that selects the
2751 network number and subnet number fields of the
2752 corresponding IP address.
2754 (b) If the IP destination address bits extracted by the
2755 address mask match the IP source address bits extracted
2756 by the same mask, then the destination is on the
2757 corresponding connected network, and the datagram is to
2758 be transmitted directly to the destination host.
2760 (c) If not, then the destination is accessible only through
2761 a gateway. Selection of a gateway is described below
2762 (3.3.1.2).
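Steps (a) through (c) of the local/remote decision reduce to one masked comparison. The following sketch uses dotted-decimal strings for readability. The function name is_on_connected_network is hypothetical, and special-case destinations (broadcast and multicast) are assumed to be handled separately.

```python
import ipaddress

def is_on_connected_network(dst: str, src: str, mask: str) -> bool:
    """True if dst and the local address src agree in all bits selected by mask."""
    d = int(ipaddress.IPv4Address(dst))
    s = int(ipaddress.IPv4Address(src))
    m = int(ipaddress.IPv4Address(mask))
    return (d & m) == (s & m)
```

When the result is False, the datagram is handed to a gateway chosen by the gateway selection procedure.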
2764 A special-case destination address is handled as follows:
2766 * For a limited broadcast or a multicast address, simply
2767 pass the datagram to the link layer for the appropriate
2768 interface.
2780 * For a (network or subnet) directed broadcast, the
2781 datagram can use the standard routing algorithms.
2783 The host IP layer MUST operate correctly in a minimal
2784 network environment, and in particular, when there are no
2785 gateways. For example, if the IP layer of a host insists on
2786 finding at least one gateway to initialize, the host will be
2787 unable to operate on a single isolated broadcast net.
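The local/remote decision in steps (a)-(c) above can be illustrated with a minimal sketch in Python. The function name is_on_connected_network and the example addresses are assumptions chosen for illustration; they are not defined by this document. The special-case destinations (limited broadcast, multicast) are not handled here.

```python
import ipaddress

def is_on_connected_network(src: str, dst: str, mask: str) -> bool:
    """Step (b): compare the bits selected by the address mask in the
    destination address with the same bits of the local source address."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    m = int(ipaddress.IPv4Address(mask))
    return (s & m) == (d & m)

# Hypothetical host 192.0.2.10 with address mask 255.255.255.0.
print(is_on_connected_network("192.0.2.10", "192.0.2.77", "255.255.255.0"))    # True
print(is_on_connected_network("192.0.2.10", "198.51.100.5", "255.255.255.0"))  # False
```

A result of False corresponds to step (c): the datagram has to be handed to a gateway.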
2789 3.3.1.2 Gateway Selection
2791 To efficiently route a series of datagrams to the same
2792 destination, the source host MUST keep a "route cache" of
2793 mappings to next-hop gateways. A host uses the following
2794 basic algorithm on this cache to route a datagram; this
2795 algorithm is designed to put the primary routing burden on
2796 the gateways [IP:11].
2798 (a) If the route cache contains no information for a
2799 particular destination, the host chooses a "default"
2800 gateway and sends the datagram to it. It also builds a
2801 corresponding Route Cache entry.
2803 (b) If that gateway is not the best next hop to the
2804 destination, the gateway will forward the datagram to
2805 the best next-hop gateway and return an ICMP Redirect
2806 message to the source host.
2808 (c) When it receives a Redirect, the host updates the
2809 next-hop gateway in the appropriate route cache entry,
2810 so later datagrams to the same destination will go
2811 directly to the best gateway.
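As an illustration only, steps (a) and (c) of the basic algorithm above might be sketched as follows in Python. The class name RouteCache, its method names, and the addresses are hypothetical; step (b) takes place inside the gateway and is represented here only by the call that delivers the Redirect.

```python
class RouteCache:
    """Toy route cache: destination host address -> next-hop gateway."""

    def __init__(self, default_gateway: str):
        self.default_gateway = default_gateway
        self.entries = {}

    def next_hop(self, dest: str) -> str:
        # Step (a): no entry yet, so choose the default gateway and
        # build a cache entry for this destination.
        if dest not in self.entries:
            self.entries[dest] = self.default_gateway
        return self.entries[dest]

    def on_redirect(self, dest: str, better_gateway: str) -> None:
        # Step (c): update (or create) the entry for this destination
        # host so later datagrams go directly to the better gateway.
        self.entries[dest] = better_gateway

cache = RouteCache(default_gateway="192.0.2.1")
first = cache.next_hop("203.0.113.9")           # sent via the default gateway
cache.on_redirect("203.0.113.9", "192.0.2.2")   # Redirect received from 192.0.2.1
second = cache.next_hop("203.0.113.9")          # now sent via 192.0.2.2
print(first, second)
```

The sketch keys the cache on destination host addresses only, which is one of the two designs a host may choose.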
2813 Since the subnet mask appropriate to the destination address
2814 is generally not known, a Network Redirect message SHOULD be
2815 treated identically to a Host Redirect message; i.e., the
2816 cache entry for the destination host (only) would be updated
2817 (or created, if an entry for that host did not exist) for
2821 This recommendation is to protect against gateways that
2822 erroneously send Network Redirects for a subnetted
2823 network, in violation of the gateway requirements
2826 When there is no route cache entry for the destination host
2827 address (and the destination is not on the connected
2839 network), the IP layer MUST pick a gateway from its list of
2840 "default" gateways. The IP layer MUST support multiple
2843 As an extra feature, a host IP layer MAY implement a table
2844 of "static routes". Each such static route MAY include a
2845 flag specifying whether it may be overridden by ICMP
2849 A host generally needs to know at least one default
2850 gateway to get started. This information can be
2851 obtained from a configuration file or else from the
2852 host startup sequence, e.g., the BOOTP protocol (see
2855 It has been suggested that a host can augment its list
2856 of default gateways by recording any new gateways it
2857 learns about. For example, it can record every gateway
2858 to which it is ever redirected. Such a feature, while
2859 possibly useful in some circumstances, may cause
2860 problems in other cases (e.g., gateways are not all
2861 equal), and it is not recommended.
2863 A static route is typically a particular preset mapping
2864 from destination host or network into a particular
2865 next-hop gateway; it might also depend on the Type-of-
2866 Service (see next section). Static routes would be set
2867 up by system administrators to override the normal
2868 automatic routing mechanism, to handle exceptional
2869 situations. However, any static routing information is
2870 a potential source of failure as configurations change
2875 Each route cache entry needs to include the following
2878 (1) Local IP address (for a multihomed host)
2880 (2) Destination IP address
2882 (3) Type(s)-of-Service
2884 (4) Next-hop gateway IP address
2886 Field (2) MAY be the full IP address of the destination
2898 host, or only the destination network number. Field (3),
2899 the TOS, SHOULD be included.
2901 See Section 3.3.4.2 for a discussion of the implications of
2902 multihoming for the lookup procedure in this cache.
2905 Including the Type-of-Service field in the route cache
2906 and considering it in the host route algorithm will
2907 provide the necessary mechanism for the future when
2908 Type-of-Service routing is commonly used in the
2909 Internet. See Section 3.2.1.6.
2911 Each route cache entry defines the endpoints of an
2912 Internet path. Although the connecting path may change
2913 dynamically in an arbitrary way, the transmission
2914 characteristics of the path tend to remain
2915 approximately constant over a time period longer than a
2916 single typical host-host transport connection.
2917 Therefore, a route cache entry is a natural place to
2918 cache data on the properties of the path. Examples of
2919 such properties might be the maximum unfragmented
2920 datagram size (see Section 3.3.3), or the average
2921 round-trip delay measured by a transport protocol.
2922 This data will generally be both gathered and used by a
2923 higher layer protocol, e.g., by TCP, or by an
2924 application using UDP. Experiments are currently in
2925 progress on caching path properties in this manner.
2927 There is no consensus on whether the route cache should
2928 be keyed on destination host addresses alone, or allow
2929 both host and network addresses. Those who favor the
2930 use of only host addresses argue that:
2932 (1) As required in Section 3.3.1.2, Redirect messages
2933 will generally result in entries keyed on
2934 destination host addresses; the simplest and most
2935 general scheme would be to use host addresses
2938 (2) The IP layer may not always know the address mask
2939 for a network address in a complex subnetted
2942 (3) The use of only host addresses allows the
2943 destination address to be used as a pure 32-bit
2944 number, which may allow the Internet architecture
2945 to be more easily extended in the future without
2957 any change to the hosts.
2959 The opposing view is that allowing a mixture of
2960 destination hosts and networks in the route cache:
2962 (1) Saves memory space.
2964 (2) Leads to a simpler data structure, easily
2965 combining the cache with the tables of default and
2966 static routes (see below).
2968 (3) Provides a more useful place to cache path
2969 properties, as discussed earlier.
2973 The cache needs to be large enough to include entries
2974 for the maximum number of destination hosts that may be
2977 A route cache entry may also include control
2978 information used to choose an entry for replacement.
2979 This might take the form of a "recently used" bit, a
2980 use count, or a last-used timestamp, for example. It
2981 is recommended that it include the time of last
2982 modification of the entry, for diagnostic purposes.
2984 An implementation may wish to reduce the overhead of
2985 scanning the route cache for every datagram to be
2986 transmitted. This may be accomplished with a hash
2987 table to speed the lookup, or by giving a connection-
2988 oriented transport protocol a "hint" or temporary
2989 handle on the appropriate cache entry, to be passed to
2990 the IP layer with each subsequent datagram.
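A minimal sketch of one possible arrangement, assuming a hash table keyed on fields (1)-(3) and least-recently-used replacement, is given below in Python. The names RouteEntry and LruRouteCache, and the choice of LRU itself, are assumptions for illustration; a use count or a "recently used" bit would serve equally well.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass, field

@dataclass
class RouteEntry:
    local_addr: str   # field (1): local IP address (multihomed host)
    dest_addr: str    # field (2): destination IP address
    tos: int          # field (3): Type-of-Service
    gateway: str      # field (4): next-hop gateway IP address
    # Time of last modification, kept for diagnostic purposes.
    modified: float = field(default_factory=time.time)

class LruRouteCache:
    """Hash-table lookup with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.table = OrderedDict()

    def lookup(self, local_addr, dest_addr, tos):
        key = (local_addr, dest_addr, tos)
        entry = self.table.get(key)
        if entry is not None:
            self.table.move_to_end(key)      # mark as most recently used
        return entry

    def store(self, entry: RouteEntry) -> None:
        key = (entry.local_addr, entry.dest_addr, entry.tos)
        entry.modified = time.time()
        self.table[key] = entry
        self.table.move_to_end(key)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)   # evict least recently used
```

The OrderedDict serves as both the hash table and the replacement order, so no scan of the cache is needed for either lookup or eviction.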
2992 Although we have described the route cache, the lists
2993 of default gateways, and a table of static routes as
2994 conceptually distinct, in practice they may be combined
2995 into a single "routing table" data structure.
2997 3.3.1.4 Dead Gateway Detection
2999 The IP layer MUST be able to detect the failure of a "next-
3000 hop" gateway that is listed in its route cache and to choose
3001 an alternate gateway (see Section 3.3.1.5).
3003 Dead gateway detection is covered in some detail in RFC-816
3004 [IP:11]. Experience to date has not produced a complete
3016 algorithm which is totally satisfactory, though it has
3017 identified several forbidden paths and promising techniques.
3019 * A particular gateway SHOULD NOT be used indefinitely in
3020 the absence of positive indications that it is
3023 * Active probes such as "pinging" (i.e., using an ICMP
3024 Echo Request/Reply exchange) are expensive and scale
3025 poorly. In particular, hosts MUST NOT actively check
3026 the status of a first-hop gateway by simply pinging the
3027 gateway continuously.
3029 * Even when it is the only effective way to verify a
3030 gateway's status, pinging MUST be used only when
3031 traffic is being sent to the gateway and when there is
3032 no other positive indication to suggest that the
3033 gateway is functioning.
3035 * To avoid pinging, the layers above and/or below the
3036 Internet layer SHOULD be able to give "advice" on the
3037 status of route cache entries when either positive
3038 (gateway OK) or negative (gateway dead) information is
3043 If an implementation does not include an adequate
3044 mechanism for detecting a dead gateway and re-routing,
3045 a gateway failure may cause datagrams to apparently
3046 vanish into a "black hole". This failure can be
3047 extremely confusing for users and difficult for network
3050 The dead-gateway detection mechanism must not cause
3051 unacceptable load on the host, on connected networks,
3052 or on first-hop gateway(s). The exact constraints on
3053 the timeliness of dead gateway detection and on
3054 acceptable load may vary somewhat depending on the
3055 nature of the host's mission, but a host generally
3056 needs to detect a failed first-hop gateway quickly
3057 enough that transport-layer connections will not break
3058 before an alternate gateway can be selected.
3060 Passing advice from other layers of the protocol stack
3061 complicates the interfaces between the layers, but it
3062 is the preferred approach to dead gateway detection.
3063 Advice can come from almost any part of the IP/TCP
3075 architecture, but it is expected to come primarily from
3076 the transport and link layers. Here are some possible
3077 sources for gateway advice:
3079 o TCP or any connection-oriented transport protocol
3080 should be able to give negative advice, e.g.,
3081 triggered by excessive retransmissions.
3083 o TCP may give positive advice when (new) data is
3084 acknowledged. Even though the route may be
3085 asymmetric, an ACK for new data proves that the
3086 acknowledged data must have been transmitted
3089 o An ICMP Redirect message from a particular gateway
3090 should be used as positive advice about that
3093 o Link-layer information that reliably detects and
3094 reports host failures (e.g., ARPANET Destination
3095 Dead messages) should be used as negative advice.
3097 o Failure to ARP or to re-validate ARP mappings may
3098 be used as negative advice for the corresponding
3101 o Packets arriving from a particular link-layer
3102 address are evidence that the system at this
3103 address is alive. However, turning this
3104 information into advice about gateways requires
3105 mapping the link-layer address into an IP address,
3106 and then checking that IP address against the
3107 gateways pointed to by the route cache. This is
3108 probably prohibitively inefficient.
3110 Note that positive advice that is given for every
3111 datagram received may cause unacceptable overhead in
3114 While advice might be passed using required arguments
3115 in all interfaces to the IP layer, some transport and
3116 application layer protocols cannot deduce the correct
3117 advice. These interfaces must therefore allow a
3118 neutral value for advice, since either always-positive
3119 or always-negative advice leads to incorrect behavior.
3121 There is another technique for dead gateway detection
3122 that has been commonly used but is not recommended.
3134 This technique depends upon the host passively
3135 receiving ("wiretapping") the Interior Gateway Protocol
3136 (IGP) datagrams that the gateways are broadcasting to
3137 each other. This approach has the drawback that a host
3138 needs to recognize all the interior gateway protocols
3139 that gateways may use (see [INTRO:2]). In addition, it
3140 only works on a broadcast network.
3142 At present, pinging (i.e., using ICMP Echo messages) is
3143 the mechanism for gateway probing when absolutely
3144 required. A successful ping guarantees that the
3145 addressed interface and its associated machine are up,
3146 but it does not guarantee that the machine is a gateway
3147 as opposed to a host. The normal inference is that if
3148 a Redirect or other evidence indicates that a machine
3149 was a gateway, successful pings will indicate that the
3150 machine is still up and hence still a gateway.
3151 However, since a host silently discards packets that a
3152 gateway would forward or redirect, this assumption
3153 could sometimes fail. To avoid this problem, a new
3154 ICMP message under development will ask "are you a
3158 The following specific algorithm has been suggested:
3160 o Associate a "reroute timer" with each gateway
3161 pointed to by the route cache. Initialize the
3162 timer to a value Tr, which must be small enough to
3163 allow detection of a dead gateway before transport
3164 connections time out.
3166 o Positive advice would reset the reroute timer to
3167 Tr. Negative advice would reduce or zero the
3170 o Whenever the IP layer used a particular gateway to
3171 route a datagram, it would check the corresponding
3172 reroute timer. If the timer had expired (reached
3173 zero), the IP layer would send a ping to the
3174 gateway, followed immediately by the datagram.
3176 o The ping (ICMP Echo) would be sent again if
3177 necessary, up to N times. If no ping reply was
3178 received in N tries, the gateway would be assumed
3179 to have failed, and a new first-hop gateway would
3180 be chosen for all cache entries pointing to the
3193 Note that the size of Tr is inversely related to the
3194 amount of advice available. Tr should be large enough
3197 * Any pinging will be at a low level (e.g., <10%) of
3198 all packets sent to a gateway from the host, AND
3200 * pinging is infrequent (e.g., every 3 minutes)
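The reroute-timer algorithm suggested above might be sketched as follows in Python. This is a simplified illustration under several assumptions: the names Advice and GatewayMonitor are hypothetical, the ping and clock callables are supplied by the caller, probing is done synchronously (whereas the text sends the datagram immediately after the first ping), and a ping reply is treated as resetting the timer to Tr.

```python
from enum import Enum

class Advice(Enum):
    POSITIVE = 1
    NEGATIVE = 2
    NEUTRAL = 3   # for callers that cannot deduce the correct advice

class GatewayMonitor:
    def __init__(self, tr, n, ping, clock):
        self.tr = tr          # reroute timer initial value Tr (seconds)
        self.n = n            # maximum number of ping attempts N
        self.ping = ping      # hypothetical callable: ping(gw) -> bool
        self.clock = clock    # hypothetical callable returning seconds
        self.expiry = {}      # gateway -> time at which its timer hits zero

    def add_gateway(self, gw):
        self.expiry[gw] = self.clock() + self.tr

    def advise(self, gw, advice):
        if advice is Advice.POSITIVE:
            self.expiry[gw] = self.clock() + self.tr   # reset timer to Tr
        elif advice is Advice.NEGATIVE:
            self.expiry[gw] = self.clock()             # zero the timer
        # NEUTRAL advice leaves the timer unchanged.

    def check_before_send(self, gw) -> bool:
        """Return False if the gateway is assumed to have failed."""
        if self.clock() < self.expiry[gw]:
            return True                    # timer still running, no probe
        for _ in range(self.n):            # timer expired: ping up to N times
            if self.ping(gw):
                self.expiry[gw] = self.clock() + self.tr
                return True
        return False
```

When check_before_send returns False, the caller would choose a new first-hop gateway for all cache entries that point to the failed one.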
3202 Since the recommended algorithm is concerned with the
3203 gateways pointed to by route cache entries, rather than
3204 the cache entries themselves, a two level data
3205 structure (perhaps coordinated with ARP or similar
3206 caches) may be desirable for implementing a route
3209 3.3.1.5 New Gateway Selection
3211 If the failed gateway is not the current default, the IP
3212 layer can immediately switch to a default gateway. If it is
3213 the current default that failed, the IP layer MUST select a
3214 different default gateway (assuming more than one default is
3215 known) for the failed route and for establishing new routes.
3218 When a gateway does fail, the other gateways on the
3219 connected network will learn of the failure through
3220 some inter-gateway routing protocol. However, this
3221 will not happen instantaneously, since gateway routing
3222 protocols typically have a settling time of 30-60
3223 seconds. If the host switches to an alternative
3224 gateway before the gateways have agreed on the failure,
3225 the new target gateway will probably forward the
3226 datagram to the failed gateway and send a Redirect back
3227 to the host pointing to the failed gateway (!). The
3228 result is likely to be a rapid oscillation in the
3229 contents of the host's route cache during the gateway
3230 settling period. It has been proposed that the dead-
3231 gateway logic should include some hysteresis mechanism
3232 to prevent such oscillations. However, experience has
3233 not shown any harm from such oscillations, since
3234 service cannot be restored to the host until the
3235 gateways' routing information does settle down.
3238 One implementation technique for choosing a new default
3239 gateway is to simply round-robin among the default
3240 gateways in the host's list. Another is to rank the
3252 gateways in priority order, and when the current
3253 default gateway is not the highest priority one, to
3254 "ping" the higher-priority gateways slowly to detect
3255 when they return to service. This pinging can be at a
3256 very low rate, e.g., 0.005 per second.
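The two techniques just described might be sketched together as follows in Python. The class name DefaultGateways and its methods are hypothetical, and combining round-robin failover with a preference-ordered list is a choice made for this illustration rather than a recommendation.

```python
class DefaultGateways:
    def __init__(self, gateways):
        self.gateways = list(gateways)   # highest preference first
        self.current = 0

    def active(self):
        return self.gateways[self.current]

    def fail_over(self):
        """Round-robin to the next default gateway in the list."""
        self.current = (self.current + 1) % len(self.gateways)
        return self.active()

    def higher_priority(self):
        """Gateways to ping slowly to detect their return to service."""
        return self.gateways[:self.current]

PROBE_RATE = 0.005                # pings per second, the example rate above
PROBE_INTERVAL = 1 / PROBE_RATE   # seconds between pings

gws = DefaultGateways(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
gws.fail_over()
print(gws.active(), gws.higher_priority(), PROBE_INTERVAL)
```

At the example rate of 0.005 per second, each higher-priority gateway would be probed about once every 200 seconds.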
3258 3.3.1.6 Initialization
3260 The following information MUST be configurable:
3262 (1) IP address(es).
3264 (2) Address mask(s).
3266 (3) A list of default gateways, with a preference level.
3268 A manual method of entering this configuration data MUST be
3269 provided. In addition, a variety of methods can be used to
3270 determine this information dynamically; see the section on
3271 "Host Initialization" in [INTRO:1].
3274 Some host implementations use "wiretapping" of gateway
3275 protocols on a broadcast network to learn what gateways
3276 exist. A standard method for default gateway discovery
3277 is under development.
3279 3.3.2 Reassembly
3281 The IP layer MUST implement reassembly of IP datagrams.
3283 We designate the largest datagram size that can be reassembled
3284 by EMTU_R ("Effective MTU to receive"); this is sometimes
3285 called the "reassembly buffer size". EMTU_R MUST be greater
3286 than or equal to 576, SHOULD be either configurable or
3287 indefinite, and SHOULD be greater than or equal to the MTU of
3288 the connected network(s).
3291 A fixed EMTU_R limit should not be built into the code
3292 because some application layer protocols require EMTU_R
3293 values larger than 576.
3296 An implementation may use a contiguous reassembly buffer
3297 for each datagram, or it may use a more complex data
3298 structure that places no definite limit on the reassembled
3299 datagram size; in the latter case, EMTU_R is said to be
3313 Logically, reassembly is performed by simply copying each
3314 fragment into the packet buffer at the proper offset.
3315 Note that fragments may overlap if successive
3316 retransmissions use different packetizing but the same
3319 The tricky part of reassembly is the bookkeeping to
3320 determine when all bytes of the datagram have been
3321 reassembled. We recommend Clark's algorithm [IP:10] that
3322 requires no additional data space for the bookkeeping.
3323 However, note that, contrary to [IP:10], the first
3324 fragment header needs to be saved for inclusion in a
3325 possible ICMP Time Exceeded (Reassembly Timeout) message.
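The copying and bookkeeping described above can be illustrated with a simplified sketch in Python. It tracks the missing byte ranges ("holes") in the spirit of [IP:10], but it is not that algorithm as published: the hole list is kept in a separate Python list for readability, so it does not share the property of needing no additional data space. The class name Reassembler is hypothetical, and offsets are given in bytes rather than in the 8-octet units of the IP header.

```python
import math

class Reassembler:
    """Reassemble one datagram's data from its fragments."""

    def __init__(self):
        self.buffer = bytearray()
        self.holes = [(0, math.inf)]   # half-open byte ranges still missing
        self.first_header = None       # saved for a possible ICMP Time Exceeded

    def add(self, offset, data, more_fragments, header=None):
        if offset == 0:
            self.first_header = header
        end = offset + len(data)
        if len(self.buffer) < end:
            self.buffer.extend(bytes(end - len(self.buffer)))
        self.buffer[offset:end] = data          # copy at the proper offset
        remaining = []
        for start, stop in self.holes:
            if offset >= stop or end <= start:
                remaining.append((start, stop))     # fragment misses this hole
                continue
            if offset > start:
                remaining.append((start, offset))   # gap before the fragment
            if end < stop and more_fragments:
                remaining.append((end, stop))       # gap after the fragment
        self.holes = remaining
        return not self.holes    # True once all bytes have been received

r = Reassembler()
r.add(8, b"89abcdef", True)
r.add(16, b"XY", False)
done = r.add(0, b"01234567", True, header=b"hdr")
print(done, bytes(r.buffer))
```

Overlapping fragments are handled naturally, since a later copy simply overwrites the same bytes and removes no more holes than it covers.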
3327 There MUST be a mechanism by which the transport layer can
3328 learn MMS_R, the maximum message size that can be received and
3329 reassembled in an IP datagram (see GET_MAXSIZES calls in
3330 Section 3.4). If EMTU_R is not indefinite, then the value of
3331 MMS_R is given by:
3333 MMS_R = EMTU_R - 20
3335 since 20 is the minimum size of an IP header.
3337 There MUST be a reassembly timeout. The reassembly timeout
3338 value SHOULD be a fixed value, not set from the remaining TTL.
3339 It is recommended that the value lie between 60 seconds and 120
3340 seconds. If this timeout expires, the partially-reassembled
3341 datagram MUST be discarded and an ICMP Time Exceeded message
3342 sent to the source host (if fragment zero has been received).
3345 The IP specification says that the reassembly timeout
3346 should be the remaining TTL from the IP header, but this
3347 does not work well because gateways generally treat TTL as
3348 a simple hop count rather than an elapsed time. If the
3349 reassembly timeout is too small, datagrams will be
3350 discarded unnecessarily, and communication may fail. The
3351 timeout needs to be at least as large as the typical
3352 maximum delay across the Internet. A realistic minimum
3353 reassembly timeout would be 60 seconds.
3355 It has been suggested that a cache might be kept of
3356 round-trip times measured by transport protocols for
3357 various destinations, and that these values might be used
3358 to dynamically determine a reasonable reassembly timeout
3370 value. Further investigation of this approach is
3373 If the reassembly timeout is set too high, buffer
3374 resources in the receiving host will be tied up too long,
3375 and the MSL (Maximum Segment Lifetime) [TCP:1] will be
3376 larger than necessary. The MSL controls the maximum rate
3377 at which fragmented datagrams can be sent using distinct
3378 values of the 16-bit Ident field; a larger MSL lowers the
3379 maximum rate. The TCP specification [TCP:1] arbitrarily
3380 assumes a value of 2 minutes for MSL. This sets an upper
3381 limit on a reasonable reassembly timeout value.
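The relationship between MSL and the sending rate can be made concrete with a short calculation, sketched here in Python. It assumes that no Ident value may be reused within one MSL for the same source, destination, and protocol; the function name max_fragmented_rate is hypothetical.

```python
IDENT_VALUES = 2 ** 16   # distinct values of the 16-bit Ident field

def max_fragmented_rate(msl_seconds: float) -> float:
    """Datagrams per second that can be sent before an Ident value
    would have to be reused within one MSL."""
    return IDENT_VALUES / msl_seconds

print(round(max_fragmented_rate(120), 1))   # MSL of 2 minutes -> 546.1
print(round(max_fragmented_rate(60), 1))    # hypothetical 60 s MSL -> 1092.3
```

With the 2-minute MSL assumed by [TCP:1], the bound is therefore roughly 546 fragmented datagrams per second, and halving the MSL doubles it.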
3383 3.3.3 Fragmentation
3385 Optionally, the IP layer MAY implement a mechanism to fragment
3386 outgoing datagrams intentionally.
3388 We designate by EMTU_S ("Effective MTU for sending") the
3389 maximum IP datagram size that may be sent, for a particular
3390 combination of IP source and destination addresses and perhaps
3393 A host MUST implement a mechanism to allow the transport layer
3394 to learn MMS_S, the maximum transport-layer message size that
3395 may be sent for a given {source, destination, TOS} triplet (see
3396 GET_MAXSIZES call in Section 3.4). If no local fragmentation
3397 is performed, the value of MMS_S will be:
3399 MMS_S = EMTU_S - <IP header size>
3401 and EMTU_S must be less than or equal to the MTU of the network
3402 interface corresponding to the source address of the datagram.
3403 Note that <IP header size> in this equation will be 20, unless
3404 the IP reserves space to insert IP options for its own purposes
3405 in addition to any options inserted by the transport layer.
3407 A host that does not implement local fragmentation MUST ensure
3408 that the transport layer (for TCP) or the application layer
3409 (for UDP) obtains MMS_S from the IP layer and does not send a
3410 datagram exceeding MMS_S in size.
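The size relations above can be sketched as follows in Python. The function names and the reserved_option_octets parameter are hypothetical; mms_r is included on the assumption that the receive-side limit is EMTU_R less the 20-octet minimum IP header.

```python
MIN_IP_HEADER = 20   # minimum size of an IP header, in octets

def mms_r(emtu_r: int) -> int:
    """Largest transport message that can be received and reassembled
    (assumed here to be EMTU_R less the minimum IP header)."""
    return emtu_r - MIN_IP_HEADER

def mms_s(emtu_s: int, reserved_option_octets: int = 0) -> int:
    """MMS_S = EMTU_S - <IP header size>, where the header is 20 octets
    plus any space the IP layer reserves for its own options."""
    return emtu_s - (MIN_IP_HEADER + reserved_option_octets)

def check_send(message_len: int, emtu_s: int) -> None:
    """Reject a message that would need local fragmentation."""
    limit = mms_s(emtu_s)
    if message_len > limit:
        raise ValueError(f"message of {message_len} octets exceeds MMS_S={limit}")

print(mms_r(576), mms_s(1500), mms_s(1500, reserved_option_octets=8))
```

For example, an EMTU_S of 1500 with no reserved options gives an MMS_S of 1480 octets, and an EMTU_R of 576 gives an MMS_R of 556.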
3412 It is generally desirable to avoid local fragmentation and to
3413 choose EMTU_S low enough to avoid fragmentation in any gateway
3414 along the path. In the absence of actual knowledge of the
3415 minimum MTU along the path, the IP layer SHOULD use
3416 EMTU_S <= 576 whenever the destination address is not on a
3417 connected network, and otherwise use the connected network's
3431 The MTU of each physical interface MUST be configurable.
3433 A host IP layer implementation MAY have a configuration flag
3434 "All-Subnets-MTU", indicating that the MTU of the connected
3435 network is to be used for destinations on different subnets
3436 within the same network, but not for other networks. Thus,
3437 this flag causes the network class mask, rather than the subnet
3438 address mask, to be used to choose an EMTU_S. For a multihomed
3439 host, an "All-Subnets-MTU" flag is needed for each network
3443 Picking the correct datagram size to use when sending data
3444 is a complex topic [IP:9].
3446 (a) In general, no host is required to accept an IP
3447 datagram larger than 576 bytes (including header and
3448 data), so a host must not send a larger datagram
3449 without explicit knowledge or prior arrangement with
3450 the destination host. Thus, MMS_S is only an upper
3451 bound on the datagram size that a transport protocol
3452 may send; even when MMS_S exceeds 556, the transport
3453 layer must limit its messages to 556 bytes in the
3454 absence of other knowledge about the destination
3457 (b) Some transport protocols (e.g., TCP) provide a way to
3458 explicitly inform the sender about the largest
3459 datagram the other end can receive and reassemble
3460 [IP:7]. There is no corresponding mechanism in the
3463 A transport protocol that assumes an EMTU_R larger
3464 than 576 (see Section 3.3.2), can send a datagram of
3465 this larger size to another host that implements the
3468 (c) Hosts should ideally limit their EMTU_S for a given
3469 destination to the minimum MTU of all the networks
3470 along the path, to avoid any fragmentation. IP
3471 fragmentation, while formally correct, can create a
3472 serious transport protocol performance problem,
3473 because loss of a single fragment means all the
3474 fragments in the segment must be retransmitted
3488 Since nearly all networks in the Internet currently
3489 support an MTU of 576 or greater, we strongly recommend
3490 the use of 576 for datagrams sent to non-local networks.
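One way a host might choose EMTU_S in the absence of path knowledge, including the optional "All-Subnets-MTU" flag, is sketched below in Python. The function names are hypothetical, the class mask helper considers only class A, B, and C addresses, and the example addresses are illustrative.

```python
import ipaddress

def class_mask(addr: int) -> int:
    """Network class mask for a class A, B, or C address."""
    first_octet = addr >> 24
    if first_octet < 128:
        return 0xFF000000   # class A
    if first_octet < 192:
        return 0xFFFF0000   # class B
    return 0xFFFFFF00       # class C

def choose_emtu_s(src, dst, mask, interface_mtu, all_subnets_mtu=False):
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    # With All-Subnets-MTU set, the class mask replaces the subnet mask.
    m = class_mask(s) if all_subnets_mtu else int(ipaddress.IPv4Address(mask))
    if (s & m) == (d & m):
        return interface_mtu          # destination treated as local
    return min(576, interface_mtu)    # non-local destination

print(choose_emtu_s("172.16.1.10", "172.16.2.20", "255.255.255.0", 1500))
print(choose_emtu_s("172.16.1.10", "172.16.2.20", "255.255.255.0", 1500,
                    all_subnets_mtu=True))
```

The first call yields 576 because the destination lies on a different subnet; the second yields 1500 because the flag extends the connected network's MTU to all subnets of the same network.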
3492 It has been suggested that a host could determine the MTU
3493 over a given path by sending a zero-offset datagram
3494 fragment and waiting for the receiver to time out the
3495 reassembly (which cannot complete!) and return an ICMP
3496 Time Exceeded message. This message would include the
3497 largest remaining fragment header in its body. More
3498 direct mechanisms are being experimented with, but have
3499 not yet been adopted (see e.g., RFC-1063).
3501 3.3.4 Local Multihoming
3503 3.3.4.1 Introduction
3505 A multihomed host has multiple IP addresses, which we may
3506 think of as "logical interfaces". These logical interfaces
3507 may be associated with one or more physical interfaces, and
3508 these physical interfaces may be connected to the same or
3511 Here are some important cases of multihoming:
3513 (a) Multiple Logical Networks
3515 The Internet architects envisioned that each physical
3516 network would have a single unique IP network (or
3517 subnet) number. However, LAN administrators have
3518 sometimes found it useful to violate this assumption,
3519 operating a LAN with multiple logical networks per
3520 physical connected network.
3522 If a host connected to such a physical network is
3523 configured to handle traffic for each of N different
3524 logical networks, then the host will have N logical
3525 interfaces. These could share a single physical
3526 interface, or might use N physical interfaces to the
3529 (b) Multiple Logical Hosts
3531 When a host has multiple IP addresses that all have the
3532 same <Network-number> part (and the same <Subnet-
3533 number> part, if any), the logical interfaces are known
3534 as "logical hosts". These logical interfaces might
3535 share a single physical interface or might use separate
3547 physical interfaces to the same physical network.
3549 (c) Simple Multihoming
3551 In this case, each logical interface is mapped into a
3552 separate physical interface and each physical interface
3553 is connected to a different physical network. The term
3554 "multihoming" was originally applied only to this case,
3555 but it is now applied more generally.
3557 A host with embedded gateway functionality will
3558 typically fall into the simple multihoming case. Note,
3559 however, that a host may be simply multihomed without
3560 containing an embedded gateway, i.e., without
3561 forwarding datagrams from one connected network to
3564 This case presents the most difficult routing problems.
3565 The choice of interface (i.e., the choice of first-hop
3566 network) may significantly affect performance or even
3567 reachability of remote parts of the Internet.
3570 Finally, we note another possibility that is NOT
3571 multihoming: one logical interface may be bound to multiple
3572 physical interfaces, in order to increase the reliability or
3573 throughput between directly connected machines by providing
3574 alternative physical paths between them. For instance, two
3575 systems might be connected by multiple point-to-point links.
3576 We call this "link-layer multiplexing". With link-layer
3577 multiplexing, the protocols above the link layer are unaware
3578 that multiple physical interfaces are present; the link-
3579 layer device driver is responsible for multiplexing and
3580 routing packets across the physical interfaces.
3582 In the Internet protocol architecture, a transport protocol
3583 instance ("entity") has no address of its own, but instead
3584 uses a single Internet Protocol (IP) address. This has
3585 implications for the IP, transport, and application layers,
3586 and for the interfaces between them. In particular, the
3587 application software may have to be aware of the multiple IP
3588 addresses of a multihomed host; in other cases, the choice
3589 can be made within the network software.
3591 3.3.4.2 Multihoming Requirements
3593 The following general rules apply to the selection of an IP
3594 source address for sending a datagram from a multihomed
3608 (1) If the datagram is sent in response to a received
3609 datagram, the source address for the response SHOULD be
3610 the specific-destination address of the request. See
3611 Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
3612 section of [INTRO:1] for more specific requirements on
3615 Otherwise, a source address must be selected.
3617 (2) An application MUST be able to explicitly specify the
3618 source address for initiating a connection or a
3621 (3) In the absence of such a specification, the networking
3622 software MUST choose a source address. Rules for this
3623 choice are described below.
3626 There are two key requirement issues related to multihoming:
3628 (A) A host MAY silently discard an incoming datagram whose
3629 destination address does not correspond to the physical
3630 interface through which it is received.
3632 (B) A host MAY restrict itself to sending (non-source-
3633 routed) IP datagrams only through the physical
3634 interface that corresponds to the IP source address of
3639 Internet host implementors have used two different
3640 conceptual models for multihoming, briefly summarized
3641 in the following discussion. This document takes no
3642 stand on which model is preferred; each seems to have a
3643 place. This ambivalence is reflected in the issues (A)
3644 and (B) being optional.
3648 The Strong ES (End System, i.e., host) model
3649 emphasizes the host/gateway (ES/IS) distinction,
3650 and would therefore substitute MUST for MAY in
3651 issues (A) and (B) above. It tends to model a
3652 multihomed host as a set of logical hosts within
3653 the same physical host.
3665 With respect to (A), proponents of the Strong ES
3666 model note that automatic Internet routing
3667 mechanisms could not route a datagram to a
3668 physical interface that did not correspond to the
3669 destination address.
3671 Under the Strong ES model, the route computation
3672 for an outgoing datagram is the mapping:
3674 route(src IP addr, dest IP addr, TOS) -> gateway
3677 Here the source address is included as a parameter
3678 in order to select a gateway that is directly
3679 reachable on the corresponding physical interface.
3680 Note that this model logically requires that in
3681 general there be at least one default gateway, and
3682 preferably multiple defaults, for each IP source
3687 This view de-emphasizes the ES/IS distinction, and
3688 would therefore substitute MUST NOT for MAY in
3689 issues (A) and (B). This model may be the more
3690 natural one for hosts that wiretap gateway routing
3691 protocols, and is necessary for hosts that have
3692 embedded gateway functionality.
3694 The Weak ES Model may cause the Redirect mechanism
3695 to fail. If a datagram is sent out a physical
3696 interface that does not correspond to the
3697 destination address, the first-hop gateway will
3698 not realize when it needs to send a Redirect. On
3699 the other hand, if the host has embedded gateway
3700 functionality, then it has routing information
3701 without listening to Redirects.
3703 In the Weak ES model, the route computation for an
3704 outgoing datagram is the mapping:
3706 route(dest IP addr, TOS) -> gateway, interface
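As an illustration only, the two mappings can be contrasted in a short Python sketch. The Route record, the table layout, and the function names route_weak and route_strong are assumptions made for this example and are not defined by this document. TOS is omitted, and the first matching table entry is used.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    """One routing entry (hypothetical layout, for illustration only)."""
    dest_net: str     # destination network in prefix notation, e.g. "0.0.0.0/0"
    gateway: str      # first-hop gateway for that destination
    interface: str    # physical interface on which the gateway is reachable
    local_addr: str   # the host's own IP address on that interface


def _candidates(routes: List[Route], dest: str) -> List[Route]:
    """Return the entries whose destination network contains dest, in table order."""
    d = ipaddress.IPv4Address(dest)
    return [r for r in routes if d in ipaddress.IPv4Network(r.dest_net)]


def route_weak(routes: List[Route], dest: str) -> Optional[Tuple[str, str]]:
    """Weak ES: route(dest IP addr) -> gateway, interface."""
    for r in _candidates(routes, dest):
        return r.gateway, r.interface
    return None


def route_strong(routes: List[Route], src: str, dest: str) -> Optional[str]:
    """Strong ES: route(src IP addr, dest IP addr) -> gateway.

    Only gateways reachable on the interface that owns src are eligible.
    """
    for r in _candidates(routes, dest):
        if r.local_addr == src:
            return r.gateway
    return None
```

With one default gateway per interface, route_weak returns the first matching entry regardless of the source address, whereas route_strong returns only a gateway on the interface that owns the source address, and returns None when no such gateway is configured. The latter case suggests why the Strong ES model wants at least one default gateway for each IP source address.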
3724 3.3.4.3 Choosing a Source Address
3727 When it sends an initial connection request (e.g., a
3728 TCP "SYN" segment) or a datagram service request (e.g.,
3729 a UDP-based query), the transport layer on a multihomed
3730 host needs to know which source address to use. If the
3731 application does not specify it, the transport layer
3732 must ask the IP layer to perform the conceptual mapping:
3735 GET_SRCADDR(remote IP addr, TOS) -> local IP address
3738 Here TOS is the Type-of-Service value (see Section
3739 3.2.1.6), and the result is the desired source address.
3740 The following rules are suggested for implementing this mapping:
3743 (a) If the remote Internet address lies on one of the
3744 (sub-) nets to which the host is directly
3745 connected, a corresponding source address may be
3746 chosen, unless the corresponding interface is known to be down.
3749 (b) The route cache may be consulted, to see if there
3750 is an active route to the specified destination
3751 network through any network interface; if so, a
3752 local IP address corresponding to that interface may be chosen.
3755 (c) The table of static routes, if any (see Section
3756 3.3.1.2) may be similarly consulted.
3758 (d) The default gateways may be consulted. If these
3759 gateways are assigned to different interfaces, the
3760 interface corresponding to the gateway with the
3761 highest preference may be chosen.
3763 In the future, there may be a defined way for a
3764 multihomed host to ask the gateways on all connected
3765 networks for advice about the best network to use for a given destination.
3769 It will be noted that this process is essentially the
3770 same as datagram routing (see Section 3.3.1), and
3771 therefore hosts may be able to combine the
3783 implementation of the two functions.
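The suggested rules (a) through (d) can be sketched as a simple fall-through search. This is a minimal illustration under stated assumptions, not a definitive implementation: the Interface record, the shapes of route_cache (destination address to interface name), static_routes (network, interface name), and default_gateways (preference, interface name) are all hypothetical, and TOS is ignored.

```python
import ipaddress
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Interface(NamedTuple):
    """A connected interface (hypothetical record for this sketch)."""
    name: str
    addr: str          # local IP address on this interface
    network: str       # directly connected (sub-)net, e.g. "10.0.0.0/24"
    up: bool = True


def get_srcaddr(
    remote: str,
    interfaces: List[Interface],
    route_cache: Optional[Dict[str, str]] = None,
    static_routes: Sequence[Tuple[str, str]] = (),
    default_gateways: Sequence[Tuple[int, str]] = (),
) -> Optional[str]:
    """Sketch of GET_SRCADDR(remote) -> local, following rules (a)-(d)."""
    dst = ipaddress.IPv4Address(remote)
    by_name = {i.name: i for i in interfaces}

    # (a) Remote address is on a directly connected (sub-)net whose
    #     interface is not known to be down.
    for i in interfaces:
        if i.up and dst in ipaddress.IPv4Network(i.network):
            return i.addr

    # (b) An active route to the destination exists in the route cache.
    ifname = (route_cache or {}).get(remote)
    if ifname in by_name:
        return by_name[ifname].addr

    # (c) The table of static routes, if any, is consulted similarly.
    for net, ifname in static_routes:
        if dst in ipaddress.IPv4Network(net) and ifname in by_name:
            return by_name[ifname].addr

    # (d) Fall back to the default gateway with the highest preference.
    usable = [(pref, name) for pref, name in default_gateways if name in by_name]
    if usable:
        _, ifname = max(usable)
        return by_name[ifname].addr

    return None
```

Because each step is a lookup against routing state, the same tables could serve the datagram routing function, which is consistent with the remark that the two functions may be combined.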
3785 3.3.5 Source Route Forwarding
3787 Subject to restrictions given below, a host MAY be able to act
3788 as an intermediate hop in a source route, forwarding a source-
3789 routed datagram to the next specified hop.
3791 However, in performing this gateway-like function, the host
3792 MUST obey all the relevant rules for a gateway forwarding
3793 source-routed datagrams [INTRO:2]. This includes the following
3794 specific provisions, which override the corresponding host
3795 provisions given earlier in this document:
3797 (A) TTL (ref. Section 3.2.1.7)
3799 The TTL field MUST be decremented and the datagram perhaps
3800 discarded as specified for a gateway in [INTRO:2].
3802 (B) ICMP Destination Unreachable (ref. Section 3.2.2.1)
3804 A host MUST be able to generate Destination Unreachable
3805 messages with the following codes:
3807 4 (Fragmentation Required but DF Set) when a source-
3808 routed datagram cannot be fragmented to fit into the target network;
3811 5 (Source Route Failed) when a source-routed datagram
3812 cannot be forwarded, e.g., because of a routing
3813 problem or because the next hop of a strict source
3814 route is not on a connected network.
3816 (C) IP Source Address (ref. Section 3.2.1.3)
3818 A source-routed datagram being forwarded MAY (and normally
3819 will) have a source address that is not one of the IP
3820 addresses of the forwarding host.
3822 (D) Record Route Option (ref. Section 3.2.1.8d)
3824 A host that is forwarding a source-routed datagram
3825 containing a Record Route option MUST update that option, if it has room.
3828 (E) Timestamp Option (ref. Section 3.2.1.8e)
3830 A host that is forwarding a source-routed datagram
3842 containing a Timestamp Option MUST add the current
3843 timestamp to that option, according to the rules for this option.
3846 To define the rules restricting host forwarding of source-
3847 routed datagrams, we use the term "local source-routing" if the
3848 next hop will be through the same physical interface through
3849 which the datagram arrived; otherwise, it is "non-local source-routing".
3852 o A host is permitted to perform local source-routing
3853 without restriction.
3855 o A host that supports non-local source-routing MUST have a
3856 configurable switch to disable forwarding, and this switch
3857 MUST default to disabled.
3859 o The host MUST satisfy all gateway requirements for
3860 configurable policy filters [INTRO:2] restricting non-local forwarding.
3863 If a host receives a datagram with an incomplete source route
3864 but does not forward it for some reason, the host SHOULD return
3865 an ICMP Destination Unreachable (code 5, Source Route Failed)
3866 message, unless the datagram was itself an ICMP error message.
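The restrictions above reduce to a small decision procedure, sketched here in Python for a host that chooses to support source route forwarding. The function name, its parameters, and the returned strings are assumptions of this example; policy_ok stands in for whatever configurable policy filters the host applies.

```python
def handle_source_routed(
    in_interface: str,
    out_interface: str,
    is_icmp_error: bool,
    nonlocal_enabled: bool = False,   # the configurable switch; MUST default to disabled
    policy_ok: bool = True,           # result of the gateway-style policy filters
) -> str:
    """Decide what to do with a datagram carrying an incomplete source route."""
    # Local source-routing: next hop is through the interface of arrival.
    is_local = in_interface == out_interface

    if is_local:
        return "forward"              # permitted without restriction
    if nonlocal_enabled and policy_ok:
        return "forward"              # non-local: switch on and filters satisfied

    # Not forwarded: SHOULD send Destination Unreachable code 5
    # (Source Route Failed), unless the datagram was an ICMP error message.
    if is_icmp_error:
        return "drop"
    return "send-unreachable-code-5"
```

For example, with the switch at its default, a datagram arriving on one interface and routed out another is answered with code 5 rather than forwarded.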
3.3.6 Broadcasts
3870 Section 3.2.1.3 defined the four standard IP broadcast address forms:
3873 Limited Broadcast: {-1, -1}
3875 Directed Broadcast: {<Network-number>,-1}
3877 Subnet Directed Broadcast:
3878 {<Network-number>,<Subnet-number>,-1}
3880 All-Subnets Directed Broadcast: {<Network-number>,-1,-1}
3882 A host MUST recognize any of these forms in the destination
3883 address of an incoming datagram.
3885 There is a class of hosts (4.2BSD Unix and its derivatives, but
3886 not 4.3BSD) that use non-standard broadcast address forms,
3901 substituting 0 for -1. All hosts SHOULD recognize and accept any
3902 of these non-standard broadcast addresses as the destination
3903 address of an incoming datagram. A host MAY optionally have a
3904 configuration option to choose the 0 or the -1 form of broadcast
3905 address, for each physical interface, but this option SHOULD default to the standard (-1) form.
3908 When a host sends a datagram to a link-layer broadcast address,
3909 the IP destination address MUST be a legal IP broadcast or IP multicast address.
3912 A host SHOULD silently discard a datagram that is received via
3913 a link-layer broadcast (see Section 2.4) but does not specify
3914 an IP multicast or broadcast destination address.
3916 Hosts SHOULD use the Limited Broadcast address to broadcast to
3917 a connected network.
3921 Using the Limited Broadcast address instead of a Directed
3922 Broadcast address may improve system robustness. Problems
3923 are often caused by machines that do not understand the
3924 plethora of broadcast addresses (see Section 3.2.1.3), or
3925 that may have different ideas about which broadcast
3926 addresses are in use. The prime example of the latter is
3927 machines that do not understand subnetting but are
3928 attached to a subnetted net. Sending a Subnet Broadcast
3929 for the connected network will confuse those machines,
3930 which will see it as a message to some other host.
3932 There has been discussion on whether a datagram addressed
3933 to the Limited Broadcast address ought to be sent from all
3934 the interfaces of a multihomed host. This specification
3935 takes no stand on the issue.
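To make the broadcast forms of this section concrete, the following Python sketch computes the set of destination addresses that a host with a given address and subnet mask would recognize as broadcasts. It is an illustration under assumptions: the function names are hypothetical, classful (Class A, B, C) network numbers are derived from the first octet, and the 0 forms are included only when accept_zero_forms is set.

```python
import ipaddress
from typing import Set


def classful_prefix(addr: ipaddress.IPv4Address) -> int:
    """Length in bits of the <Network-number> for a Class A, B, or C address."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    if first_octet < 224:
        return 24
    raise ValueError("not a Class A, B, or C address")


def broadcast_forms(local_addr: str, subnet_mask: str,
                    accept_zero_forms: bool = True) -> Set[ipaddress.IPv4Address]:
    """Return the broadcast destination addresses recognized on one interface."""
    addr = ipaddress.IPv4Address(local_addr)
    subnet = ipaddress.IPv4Network(f"{local_addr}/{subnet_mask}", strict=False)
    network = ipaddress.IPv4Network(f"{local_addr}/{classful_prefix(addr)}", strict=False)

    forms = {
        ipaddress.IPv4Address("255.255.255.255"),  # Limited Broadcast {-1, -1}
        # Directed {<net>, -1} and All-Subnets Directed {<net>, -1, -1}
        # are the same 32-bit value.
        network.broadcast_address,
        subnet.broadcast_address,                  # Subnet Directed {<net>, <subnet>, -1}
    }
    if accept_zero_forms:
        # Non-standard forms that substitute 0 for -1.
        forms |= {
            ipaddress.IPv4Address("0.0.0.0"),
            network.network_address,
            subnet.network_address,
        }
    return forms


def is_broadcast_destination(dest: str, local_addr: str, subnet_mask: str,
                             accept_zero_forms: bool = True) -> bool:
    """True if dest is one of the recognized broadcast forms for this interface."""
    return ipaddress.IPv4Address(dest) in broadcast_forms(
        local_addr, subnet_mask, accept_zero_forms)
```

For a host at 128.10.3.7 with mask 255.255.255.0, the standard forms are 255.255.255.255, 128.10.255.255, and 128.10.3.255.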
3937 3.3.7 IP Multicasting
3939 A host SHOULD support local IP multicasting on all connected
3940 networks for which a mapping from Class D IP addresses to
3941 link-layer addresses has been specified (see below). Support
3942 for local IP multicasting includes sending multicast datagrams,
3943 joining multicast groups and receiving multicast datagrams, and
3944 leaving multicast groups. This implies support for all of
3945 [IP:4] except the IGMP protocol itself, which is OPTIONAL.
3961 IGMP provides gateways that are capable of multicast
3962 routing with the information required to support IP
3963 multicasting across multiple networks. At this time,
3964 multicast-routing gateways are in the experimental stage
3965 and are not widely available. For hosts that are not
3966 connected to networks with multicast-routing gateways or
3967 that do not need to receive multicast datagrams
3968 originating on other networks, IGMP serves no purpose and
3969 is therefore optional for now. However, the rest of
3970 [IP:4] is currently recommended for the purpose of
3971 providing IP-layer access to local network multicast
3972 addressing, as a preferable alternative to local broadcast
3973 addressing. It is expected that IGMP will become
3974 recommended at some future date, when multicast-routing
3975 gateways have become more widely available.
3977 If IGMP is not implemented, a host SHOULD still join the "all-
3978 hosts" group (224.0.0.1) when the IP layer is initialized and
3979 remain a member for as long as the IP layer is active.
3982 Joining the "all-hosts" group will support strictly local
3983 uses of multicasting, e.g., a gateway discovery protocol,
3984 even if IGMP is not implemented.
3986 The mapping of IP Class D addresses to local addresses is
3987 currently specified for the following types of networks:
3989 o Ethernet/IEEE 802.3, as defined in [IP:4].
3991 o Any network that supports broadcast but not multicast
3992 addressing: all IP Class D addresses map to the local broadcast address.
3995 o Any type of point-to-point link (e.g., SLIP or HDLC
3996 links): no mapping required. All IP multicast datagrams
3997 are sent as-is, inside the local framing.
3999 Mappings for other types of networks will be specified in the future.
4002 A host SHOULD provide a way for higher-layer protocols or
4003 applications to determine which of the host's connected
4004 network(s) support IP multicast addressing.
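For the Ethernet/IEEE 802.3 case, the mapping itself is defined in [IP:4] (RFC-1112), not in this document: the low-order 23 bits of the Class D address are placed into the low-order 23 bits of the Ethernet multicast address 01-00-5E-00-00-00. A minimal Python sketch of that rule follows; the function name is an assumption of this example.

```python
import ipaddress


def ethernet_multicast_address(group: str) -> str:
    """Map a Class D IP address to an Ethernet multicast address (per RFC-1112)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D (multicast) address")

    low23 = int(addr) & 0x7FFFFF          # keep the low-order 23 bits
    mac = 0x01005E000000 | low23          # place them under the 01-00-5E prefix
    # Format the 48-bit value as six hyphen-separated octets.
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))
```

Since only 23 of the 28 group bits are carried, several IP groups share one link-layer address (for example 224.0.0.1 and 224.128.0.1), so the IP layer still has to filter received datagrams by group membership.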
4019 3.3.8 Error Reporting
4021 Wherever practical, hosts MUST return ICMP error datagrams on
4022 detection of an error, except in those cases where returning an
4023 ICMP error message is specifically prohibited.
4026 A common phenomenon in datagram networks is the "black
4027 hole disease": datagrams are sent out, but nothing comes
4028 back. Without any error datagrams, it is difficult for
4029 the user to figure out what the problem is.
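The cases in which an ICMP error message is prohibited are listed under "Send ICMP error message for:" in the summary table of Section 3.5. A sketch of the corresponding check is shown below; the ReceivedDatagram record and its field names are hypothetical, chosen only to mirror those table rows.

```python
from dataclasses import dataclass


@dataclass
class ReceivedDatagram:
    """Properties of the offending datagram (hypothetical field names)."""
    is_icmp_error: bool = False              # datagram is itself an ICMP error message
    dest_is_ip_broadcast_or_multicast: bool = False
    was_link_layer_broadcast: bool = False
    fragment_offset: int = 0                 # non-zero means a non-initial fragment
    src_is_unique_host: bool = True          # False for e.g. broadcast or zero source


def may_send_icmp_error(d: ReceivedDatagram) -> bool:
    """Return False in each case where the summary table marks MUST NOT."""
    prohibited = (
        d.is_icmp_error
        or d.dest_is_ip_broadcast_or_multicast
        or d.was_link_layer_broadcast
        or d.fragment_offset != 0
        or not d.src_is_unique_host
    )
    return not prohibited
```

In every other case the host is expected to return the error wherever practical, which is what keeps the "black hole disease" diagnosable.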
4031 3.4 INTERNET/TRANSPORT LAYER INTERFACE
4033 The interface between the IP layer and the transport layer MUST
4034 provide full access to all the mechanisms of the IP layer,
4035 including options, Type-of-Service, and Time-to-Live. The
4036 transport layer MUST either have mechanisms to set these interface
4037 parameters, or provide a path to pass them through from an
4038 application, or both.
4041 Applications are urged to make use of these mechanisms where
4042 applicable, even when the mechanisms are not currently
4043 effective in the Internet (e.g., TOS). This will allow these
4044 mechanisms to be immediately useful when they do become
4045 effective, without a large amount of retrofitting of host software.
4048 We now describe a conceptual interface between the transport layer
4049 and the IP layer, as a set of procedure calls. This is an
4050 extension of the information in Section 3.3 of RFC-791 [IP:1].
* Send Datagram
4055 SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt => result)
4058 where the parameters are defined in RFC-791. Passing an Id
4059 parameter is optional; see Section 3.2.1.5.
* Receive Datagram
4065 RECV(BufPTR, prot => result, src, dst, SpecDest, TOS, len, opt)
4078 All the parameters are defined in RFC-791, except for:
4080 SpecDest = specific-destination address of datagram
4081 (defined in Section 3.2.1.3)
4083 The result parameter dst contains the datagram's destination
4084 address. Since this may be a broadcast or multicast address,
4085 the SpecDest parameter (not shown in RFC-791) MUST be passed.
4086 The parameter opt contains all the IP options received in the
4087 datagram; these MUST also be passed to the transport layer.
4090 * Select Source Address
4092 GET_SRCADDR(remote, TOS) -> local
4094 remote = remote IP address
4095 TOS = Type-of-Service
4096 local = local IP address
4098 See Section 3.3.4.3.
4101 * Find Maximum Datagram Sizes
4103 GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S
4105 MMS_R = maximum receive transport-message size.
4106 MMS_S = maximum send transport-message size.
4107 (local, remote, TOS defined above)
4109 See Sections 3.3.2 and 3.3.3.
4112 * Advice on Delivery Success
4114 ADVISE_DELIVPROB(sense, local, remote, TOS)
4116 Here the parameter sense is a 1-bit flag indicating whether
4117 positive or negative advice is being given; see the
4118 discussion in Section 3.3.1.4. The other parameters were defined earlier.
* Send ICMP Message
4124 SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt) -> result
4137 (Parameters defined in RFC-791).
4139 Passing an Id parameter is optional; see Section 3.2.1.5.
4140 The transport layer MUST be able to send certain ICMP
4141 messages: Port Unreachable or any of the query-type
4142 messages. This function could be considered to be a special
4143 case of the SEND() call, of course; we describe it separately for clarity.
4147 * Receive ICMP Message
4149 RECV_ICMP(BufPTR ) -> result, src, dst, len, opt
4151 (Parameters defined in RFC-791).
4153 The IP layer MUST pass certain ICMP messages up to the
4154 appropriate transport-layer routine. This function could be
4155 considered to be a special case of the RECV() call, of
4156 course; we describe it separately for clarity.
4158 For an ICMP error message, the data that is passed up MUST
4159 include the original Internet header plus all the octets of
4160 the original message that are included in the ICMP message.
4161 This data will be used by the transport layer to locate the
4162 connection state information, if any.
4164 In particular, the following ICMP messages are to be passed up:
4167 o Destination Unreachable
o Source Quench
4171 o Echo Reply (to ICMP user interface, unless the Echo
4172 Request originated in the IP layer)
4174 o Timestamp Reply (to ICMP user interface)
o Time Exceeded
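The data passed up with an ICMP error message (the original Internet header plus the leading octets of the original message) is what lets a transport protocol find its connection state. The Python sketch below shows one way this could be done for TCP or UDP, whose first four octets are the source and destination ports; the function name and the returned tuple layout are assumptions of this example.

```python
import socket
import struct
from typing import Tuple


def locate_connection(icmp_data: bytes) -> Tuple[int, str, int, str, int]:
    """Extract (protocol, src addr, src port, dst addr, dst port) from the
    original header and leading octets carried in an ICMP error message."""
    if len(icmp_data) < 20:
        raise ValueError("truncated original IP header")

    version = icmp_data[0] >> 4
    header_len = (icmp_data[0] & 0x0F) * 4     # IHL is counted in 32-bit words
    if version != 4 or header_len < 20:
        raise ValueError("not a valid IPv4 header")
    if len(icmp_data) < header_len + 4:
        raise ValueError("port fields not present")

    protocol = icmp_data[9]
    src = socket.inet_ntoa(icmp_data[12:16])
    dst = socket.inet_ntoa(icmp_data[16:20])
    # For TCP (6) and UDP (17) the first four transport octets are the ports.
    src_port, dst_port = struct.unpack("!HH", icmp_data[header_len:header_len + 4])
    return protocol, src, src_port, dst, dst_port
```

Note that the addresses and ports are those of the datagram this host originally sent, so the "source" fields identify the local end of the connection.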
4180 In the future, there may be additions to this interface to
4181 pass path data (see Section 3.3.1.3) between the IP and transport layers.
4196 3.5 INTERNET LAYER REQUIREMENTS SUMMARY
4208 FEATURE |SECTION |MUST|SHOULD|MAY|SHOULD NOT|MUST NOT|Footnote
4209 -------------------------------------------------|--------|-|-|-|-|-|--
4211 Implement IP and ICMP |3.1 |x| | | | |
4212 Handle remote multihoming in application layer |3.1 |x| | | | |
4213 Support local multihoming |3.1 | | |x| | |
4214 Meet gateway specs if forward datagrams |3.1 |x| | | | |
4215 Configuration switch for embedded gateway |3.1 |x| | | | |1
4216 Config switch default to non-gateway |3.1 |x| | | | |1
4217 Auto-config based on number of interfaces |3.1 | | | | |x|1
4218 Able to log discarded datagrams |3.1 | |x| | | |
4219 Record in counter |3.1 | |x| | | |
4221 Silently discard Version != 4 |3.2.1.1 |x| | | | |
4222 Verify IP checksum, silently discard bad dgram |3.2.1.2 |x| | | | |
4223 Addressing: | | | | | | |
4224 Subnet addressing (RFC-950) |3.2.1.3 |x| | | | |
4225 Src address must be host's own IP address |3.2.1.3 |x| | | | |
4226 Silently discard datagram with bad dest addr |3.2.1.3 |x| | | | |
4227 Silently discard datagram with bad src addr |3.2.1.3 |x| | | | |
4228 Support reassembly |3.2.1.4 |x| | | | |
4229 Retain same Id field in identical datagram |3.2.1.5 | | |x| | |
4232 Allow transport layer to set TOS |3.2.1.6 |x| | | | |
4233 Pass received TOS up to transport layer |3.2.1.6 | |x| | | |
4234 Use RFC-795 link-layer mappings for TOS |3.2.1.6 | | | |x| |
4236 Send packet with TTL of 0 |3.2.1.7 | | | | |x|
4237 Discard received packets with TTL < 2 |3.2.1.7 | | | | |x|
4238 Allow transport layer to set TTL |3.2.1.7 |x| | | | |
4239 Fixed TTL is configurable |3.2.1.7 |x| | | | |
4241 IP Options: | | | | | | |
4242 Allow transport layer to send IP options |3.2.1.8 |x| | | | |
4243 Pass all IP options rcvd to higher layer |3.2.1.8 |x| | | | |
4255 IP layer silently ignore unknown options |3.2.1.8 |x| | | | |
4256 Security option |3.2.1.8a| | |x| | |
4257 Send Stream Identifier option |3.2.1.8b| | | |x| |
4258 Silently ignore Stream Identifier option |3.2.1.8b|x| | | | |
4259 Record Route option |3.2.1.8d| | |x| | |
4260 Timestamp option |3.2.1.8e| | |x| | |
4261 Source Route Option: | | | | | | |
4262 Originate & terminate Source Route options |3.2.1.8c|x| | | | |
4263 Datagram with completed SR passed up to TL |3.2.1.8c|x| | | | |
4264 Build correct (non-redundant) return route |3.2.1.8c|x| | | | |
4265 Send multiple SR options in one header |3.2.1.8c| | | | |x|
4268 Silently discard ICMP msg with unknown type |3.2.2 |x| | | | |
4269 Include more than 8 octets of orig datagram |3.2.2 | | |x| | |
4270 Included octets same as received |3.2.2 |x| | | | |
4271 Demux ICMP Error to transport protocol |3.2.2 |x| | | | |
4272 Send ICMP error message with TOS=0 |3.2.2 | |x| | | |
4273 Send ICMP error message for: | | | | | | |
4274 - ICMP error msg |3.2.2 | | | | |x|
4275 - IP b'cast or IP m'cast |3.2.2 | | | | |x|
4276 - Link-layer b'cast |3.2.2 | | | | |x|
4277 - Non-initial fragment |3.2.2 | | | | |x|
4278 - Datagram with non-unique src address |3.2.2 | | | | |x|
4279 Return ICMP error msgs (when not prohibited) |3.3.8 |x| | | | |
4281 Dest Unreachable: | | | | | | |
4282 Generate Dest Unreachable (code 2/3) |3.2.2.1 | |x| | | |
4283 Pass ICMP Dest Unreachable to higher layer |3.2.2.1 |x| | | | |
4284 Higher layer act on Dest Unreach |3.2.2.1 | |x| | | |
4285 Interpret Dest Unreach as only hint |3.2.2.1 |x| | | | |
4286 Redirect: | | | | | | |
4287 Host send Redirect |3.2.2.2 | | | |x| |
4288 Update route cache when recv Redirect |3.2.2.2 |x| | | | |
4289 Handle both Host and Net Redirects |3.2.2.2 |x| | | | |
4290 Discard illegal Redirect |3.2.2.2 | |x| | | |
4291 Source Quench: | | | | | | |
4292 Send Source Quench if buffering exceeded |3.2.2.3 | | |x| | |
4293 Pass Source Quench to higher layer |3.2.2.3 |x| | | | |
4294 Higher layer act on Source Quench |3.2.2.3 | |x| | | |
4295 Time Exceeded: pass to higher layer |3.2.2.4 |x| | | | |
4296 Parameter Problem: | | | | | | |
4297 Send Parameter Problem messages |3.2.2.5 | |x| | | |
4298 Pass Parameter Problem to higher layer |3.2.2.5 |x| | | | |
4299 Report Parameter Problem to user |3.2.2.5 | | |x| | |
4301 ICMP Echo Request or Reply: | | | | | | |
4302 Echo server |3.2.2.6 |x| | | | |
4314 Echo client |3.2.2.6 | |x| | | |
4315 Discard Echo Request to broadcast address |3.2.2.6 | | |x| | |
4316 Discard Echo Request to multicast address |3.2.2.6 | | |x| | |
4317 Use specific-dest addr as Echo Reply src |3.2.2.6 |x| | | | |
4318 Send same data in Echo Reply |3.2.2.6 |x| | | | |
4319 Pass Echo Reply to higher layer |3.2.2.6 |x| | | | |
4320 Reflect Record Route, Time Stamp options |3.2.2.6 | |x| | | |
4321 Reverse and reflect Source Route option |3.2.2.6 |x| | | | |
4323 ICMP Information Request or Reply: |3.2.2.7 | | | |x| |
4324 ICMP Timestamp and Timestamp Reply: |3.2.2.8 | | |x| | |
4325 Minimize delay variability |3.2.2.8 | |x| | | |1
4326 Silently discard b'cast Timestamp |3.2.2.8 | | |x| | |1
4327 Silently discard m'cast Timestamp |3.2.2.8 | | |x| | |1
4328 Use specific-dest addr as TS Reply src |3.2.2.8 |x| | | | |1
4329 Reflect Record Route, Time Stamp options |3.2.2.8 | |x| | | |1
4330 Reverse and reflect Source Route option |3.2.2.8 |x| | | | |1
4331 Pass Timestamp Reply to higher layer |3.2.2.8 |x| | | | |1
4332 Obey rules for "standard value" |3.2.2.8 |x| | | | |1
4334 ICMP Address Mask Request and Reply: | | | | | | |
4335 Addr Mask source configurable |3.2.2.9 |x| | | | |
4336 Support static configuration of addr mask |3.2.2.9 |x| | | | |
4337 Get addr mask dynamically during booting |3.2.2.9 | | |x| | |
4338 Get addr via ICMP Addr Mask Request/Reply |3.2.2.9 | | |x| | |
4339 Retransmit Addr Mask Req if no Reply |3.2.2.9 |x| | | | |3
4340 Assume default mask if no Reply |3.2.2.9 | |x| | | |3
4341 Update address mask from first Reply only |3.2.2.9 |x| | | | |3
4342 Reasonableness check on Addr Mask |3.2.2.9 | |x| | | |
4343 Send unauthorized Addr Mask Reply msgs |3.2.2.9 | | | | |x|
4344 Explicitly configured to be agent |3.2.2.9 |x| | | | |
4345 Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | |
4346 Broadcast Addr Mask Reply when init. |3.2.2.9 |x| | | | |3
4348 ROUTING OUTBOUND DATAGRAMS: | | | | | | |
4349 Use address mask in local/remote decision |3.3.1.1 |x| | | | |
4350 Operate with no gateways on conn network |3.3.1.1 |x| | | | |
4351 Maintain "route cache" of next-hop gateways |3.3.1.2 |x| | | | |
4352 Treat Host and Net Redirect the same |3.3.1.2 | |x| | | |
4353 If no cache entry, use default gateway |3.3.1.2 |x| | | | |
4354 Support multiple default gateways |3.3.1.2 |x| | | | |
4355 Provide table of static routes |3.3.1.2 | | |x| | |
4356 Flag: route overridable by Redirects |3.3.1.2 | | |x| | |
4357 Key route cache on host, not net address |3.3.1.3 | | |x| | |
4358 Include TOS in route cache |3.3.1.3 | |x| | | |
4360 Able to detect failure of next-hop gateway |3.3.1.4 |x| | | | |
4361 Assume route is good forever |3.3.1.4 | | | |x| |
4373 Ping gateways continuously |3.3.1.4 | | | | |x|
4374 Ping only when traffic being sent |3.3.1.4 |x| | | | |
4375 Ping only when no positive indication |3.3.1.4 |x| | | | |
4376 Higher and lower layers give advice |3.3.1.4 | |x| | | |
4377 Switch from failed default g'way to another |3.3.1.5 |x| | | | |
4378 Manual method of entering config info |3.3.1.6 |x| | | | |
4380 REASSEMBLY and FRAGMENTATION: | | | | | | |
4381 Able to reassemble incoming datagrams |3.3.2 |x| | | | |
4382 At least 576 byte datagrams |3.3.2 |x| | | | |
4383 EMTU_R configurable or indefinite |3.3.2 | |x| | | |
4384 Transport layer able to learn MMS_R |3.3.2 |x| | | | |
4385 Send ICMP Time Exceeded on reassembly timeout |3.3.2 |x| | | | |
4386 Fixed reassembly timeout value |3.3.2 | |x| | | |
4388 Pass MMS_S to higher layers |3.3.3 |x| | | | |
4389 Local fragmentation of outgoing packets |3.3.3 | | |x| | |
4390 Else don't send bigger than MMS_S |3.3.3 |x| | | | |
4391 Send max 576 to off-net destination |3.3.3 | |x| | | |
4392 All-Subnets-MTU configuration flag |3.3.3 | | |x| | |
4394 MULTIHOMING: | | | | | | |
4395 Reply with same addr as spec-dest addr |3.3.4.2 | |x| | | |
4396 Allow application to choose local IP addr |3.3.4.2 |x| | | | |
4397 Silently discard d'gram in "wrong" interface |3.3.4.2 | | |x| | |
4398 Only send d'gram through "right" interface |3.3.4.2 | | |x| | |4
4400 SOURCE-ROUTE FORWARDING: | | | | | | |
4401 Forward datagram with Source Route option |3.3.5 | | |x| | |1
4402 Obey corresponding gateway rules |3.3.5 |x| | | | |1
4403 Update TTL by gateway rules |3.3.5 |x| | | | |1
4404 Able to generate ICMP err code 4, 5 |3.3.5 |x| | | | |1
4405 IP src addr not local host |3.3.5 | | |x| | |1
4406 Update Timestamp, Record Route options |3.3.5 |x| | | | |1
4407 Configurable switch for non-local SRing |3.3.5 |x| | | | |1
4408 Defaults to OFF |3.3.5 |x| | | | |1
4409 Satisfy gwy access rules for non-local SRing |3.3.5 |x| | | | |1
4410 If not forward, send Dest Unreach (cd 5) |3.3.5 | |x| | | |2
4412 BROADCAST: | | | | | | |
4413 Broadcast addr as IP source addr |3.2.1.3 | | | | |x|
4414 Receive 0 or -1 broadcast formats OK |3.3.6 | |x| | | |
4415 Config'ble option to send 0 or -1 b'cast |3.3.6 | | |x| | |
4416 Default to -1 broadcast |3.3.6 | |x| | | |
4417 Recognize all broadcast address formats |3.3.6 |x| | | | |
4418 Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6 |x| | | | |
4419 Silently discard link-layer-only b'cast dg's |3.3.6 | |x| | | |
4420 Use Limited Broadcast addr for connected net |3.3.6 | |x| | | |
4433 MULTICAST: | | | | | | |
4434 Support local IP multicasting (RFC-1112) |3.3.7 | |x| | | |
4435 Support IGMP (RFC-1112) |3.3.7 | | |x| | |
4436 Join all-hosts group at startup |3.3.7 | |x| | | |
4437 Higher layers learn i'face m'cast capability |3.3.7 | |x| | | |
4439 INTERFACE: | | | | | | |
4440 Allow transport layer to use all IP mechanisms |3.4 |x| | | | |
4441 Pass interface ident up to transport layer |3.4 |x| | | | |
4442 Pass all IP options up to transport layer |3.4 |x| | | | |
4443 Transport layer can send certain ICMP messages |3.4 |x| | | | |
4444 Pass spec'd ICMP messages up to transp. layer |3.4 |x| | | | |
4445 Include IP hdr+8 octets or more from orig. |3.4 |x| | | | |
4446 Able to leap tall buildings at a single bound |3.5 | |x| | | |
4450 (1) Only if feature is implemented.
4452 (2) This requirement is overruled if datagram is an ICMP error message.
4454 (3) Only if feature is implemented and is configured "on".
4456 (4) Unless has embedded gateway functionality or is source routed.
4491 4. TRANSPORT PROTOCOLS
4493 4.1 USER DATAGRAM PROTOCOL -- UDP
4.1.1 INTRODUCTION
4497 The User Datagram Protocol (UDP) [UDP:1] offers only a minimal
4498 transport service -- non-guaranteed datagram delivery -- and
4499 gives applications direct access to the datagram service of the
4500 IP layer. UDP is used by applications that do not require the
4501 level of service of TCP or that wish to use communications
4502 services (e.g., multicast or broadcast delivery) not available from TCP.
4505 UDP is almost a null protocol; the only services it provides
4506 over IP are checksumming of data and multiplexing by port
4507 number. Therefore, an application program running over UDP
4508 must deal directly with end-to-end communication problems that
4509 a connection-oriented protocol would have handled -- e.g.,
4510 retransmission for reliable delivery, packetization and
4511 reassembly, flow control, congestion avoidance, etc., when
4512 these are required. The fairly complex coupling between IP and
4513 TCP will be mirrored in the coupling between UDP and many
4514 applications using UDP.
4516 4.1.2 PROTOCOL WALK-THROUGH
4518 There are no known errors in the specification of UDP.
4520 4.1.3 SPECIFIC ISSUES
4.1.3.1 Ports
4524 UDP well-known ports follow the same rules as TCP well-known
4525 ports; see Section 4.2.2.1 below.
4527 If a datagram arrives addressed to a UDP port for which
4528 there is no pending LISTEN call, UDP SHOULD send an ICMP
4529 Port Unreachable message.
4.1.3.2 IP Options
4533 UDP MUST pass any IP option that it receives from the IP
4534 layer transparently to the application layer.
4536 An application MUST be able to specify IP options to be sent
4537 in its UDP datagrams, and UDP MUST pass these options to the IP layer.
4551 At present, the only options that need be passed
4552 through UDP are Source Route, Record Route, and Time
4553 Stamp. However, new options may be defined in the
4554 future, and UDP need not and should not make any
4555 assumptions about the format or content of options it
4556 passes to or from the application; an exception to this
4557 might be an IP-layer security option.
4559 An application based on UDP will need to obtain a
4560 source route from a request datagram and supply a
4561 reversed route for sending the corresponding reply.
4563 4.1.3.3 ICMP Messages
4565 UDP MUST pass to the application layer all ICMP error
4566 messages that it receives from the IP layer. Conceptually
4567 at least, this may be accomplished with an upcall to the
4568 ERROR_REPORT routine (see Section 4.2.4.1).
4571 Note that ICMP error messages resulting from sending a
4572 UDP datagram are received asynchronously. A UDP-based
4573 application that wants to receive ICMP error messages
4574 is responsible for maintaining the state necessary to
4575 demultiplex these messages when they arrive; for
4576 example, the application may keep a pending receive
4577 operation for this purpose. The application is also
4578 responsible to avoid confusion from a delayed ICMP
4579 error message resulting from an earlier use of the same
4582 4.1.3.4 UDP Checksums
4584 A host MUST implement the facility to generate and validate
4585 UDP checksums. An application MAY optionally be able to
4586 control whether a UDP checksum will be generated, but it
4587 MUST default to checksumming on.
4589 If a UDP datagram is received with a checksum that is non-
4590 zero and invalid, UDP MUST silently discard the datagram.
4591 An application MAY optionally be able to control whether UDP
4592 datagrams without checksums should be discarded or passed to
4596 Some applications that normally run only across local
4597 area networks have chosen to turn off UDP checksums for
4609 efficiency. As a result, numerous cases of undetected
4610 errors have been reported. The advisability of ever
4611 turning off UDP checksumming is very controversial.
4614 There is a common implementation error in UDP
4615 checksums. Unlike the TCP checksum, the UDP checksum
4616 is optional; the value zero is transmitted in the
4617 checksum field of a UDP header to indicate the absence
4618 of a checksum. If the transmitter really calculates a
4619 UDP checksum of zero, it must transmit the checksum as
4620 all 1's (65535). No special action is required at the
4621 receiver, since zero and 65535 are equivalent in 1's
4622 complement arithmetic.
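The zero-checksum rule can be illustrated with a short Python sketch of the sender side, together with the receive-side policy described earlier in this section. The function names and the require_checksum flag are assumptions of this example; the pseudo-header layout (source address, destination address, zero, protocol 17, UDP length) follows the UDP specification [UDP:1].

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, padded with a zero octet if odd."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src: str, dst: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """Checksum value to transmit in the UDP header."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    # A computed value of zero is sent as all 1's, because zero on the
    # wire means "no checksum was generated".
    return checksum or 0xFFFF


def accept_received(checksum_field: int, verifies: bool,
                    require_checksum: bool = False) -> bool:
    """Receive-side policy: discard non-zero invalid checksums; datagrams
    without a checksum are passed or discarded per application option."""
    if checksum_field == 0:
        return not require_checksum
    return verifies
```

A receiver that sums the pseudo-header, header (including the transmitted checksum), and data obtains 0xFFFF for a valid datagram whether the sender computed a non-zero value or substituted 65535 for zero, which is why no special action is needed on receipt.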
4624 4.1.3.5 UDP Multihoming
4626 When a UDP datagram is received, its specific-destination
4627 address MUST be passed up to the application layer.
4629 An application program MUST be able to specify the IP source
4630 address to be used for sending a UDP datagram or to leave it
4631 unspecified (in which case the networking software will
4632 choose an appropriate source address). There SHOULD be a
4633 way to communicate the chosen source address up to the
4634 application layer (e.g., so that the application can later
4635 receive a reply datagram only from the corresponding interface).
4639 A request/response application that uses UDP should use
4640 a source address for the response that is the same as
4641 the specific destination address of the request. See
4642 the "General Issues" section of [INTRO:1].
4644 4.1.3.6 Invalid Addresses
4646 A UDP datagram received with an invalid IP source address
4647 (e.g., a broadcast or multicast address) must be discarded
4648 by UDP or by the IP layer (see Section 3.2.1.3).
4650 When a host sends a UDP datagram, the source address MUST be
4651 (one of) the IP address(es) of the host.
4653 4.1.4 UDP/APPLICATION LAYER INTERFACE
4655 The application interface to UDP MUST provide the full services
4656 of the IP/transport interface described in Section 3.4 of this
4668 document. Thus, an application using UDP needs the functions
4669 of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and
4670 RECV_ICMP() calls described in Section 3.4. For example,
4671 GET_MAXSIZES() can be used to learn the effective maximum UDP
4672 datagram size for a particular {interface,remote host,TOS} triplet.
4675 An application-layer program MUST be able to set the TTL and
4676 TOS values as well as IP options for sending a UDP datagram,
4677 and these values must be passed transparently to the IP layer.
4678 UDP MAY pass the received TOS up to the application layer.
4680 4.1.5 UDP REQUIREMENTS SUMMARY
4692 FEATURE |SECTION |MUST|SHOULD|MAY|SHOULD NOT|MUST NOT|Footnote
4693 -------------------------------------------------|--------|-|-|-|-|-|--
4698 UDP send Port Unreachable |4.1.3.1 | |x| | | |
4700 IP Options in UDP | | | | | | |
4701 - Pass rcv'd IP options to applic layer |4.1.3.2 |x| | | | |
4702 - Applic layer can specify IP options in Send |4.1.3.2 |x| | | | |
4703 - UDP passes IP options down to IP layer |4.1.3.2 |x| | | | |
4705 Pass ICMP msgs up to applic layer |4.1.3.3 |x| | | | |
4707 UDP checksums: | | | | | | |
4708 - Able to generate/check checksum |4.1.3.4 |x| | | | |
4709 - Silently discard bad checksum |4.1.3.4 |x| | | | |
4710 - Sender Option to not generate checksum |4.1.3.4 | | |x| | |
4711 - Default is to checksum |4.1.3.4 |x| | | | |
4712 - Receiver Option to require checksum |4.1.3.4 | | |x| | |
4714 UDP Multihoming | | | | | | |
4715 - Pass spec-dest addr to application |4.1.3.5 |x| | | | |
4719 Internet Engineering Task Force [Page 80]
4724 RFC1122 TRANSPORT LAYER -- UDP October 1989
4727 - Applic layer can specify Local IP addr |4.1.3.5 |x| | | | |
4728 - Applic layer specify wild Local IP addr |4.1.3.5 |x| | | | |
4729 - Applic layer notified of Local IP addr used |4.1.3.5 | |x| | | |
4731 Bad IP src addr silently discarded by UDP/IP |4.1.3.6 |x| | | | |
4732 Only send valid IP source address |4.1.3.6 |x| | | | |
4733 UDP Application Interface Services | | | | | | |
4734 Full IP interface of 3.4 for application |4.1.4 |x| | | | |
4735 - Able to spec TTL, TOS, IP opts when send dg |4.1.4 |x| | | | |
4736 - Pass received TOS up to applic layer |4.1.4 | | |x| | |
4786 4.2 TRANSMISSION CONTROL PROTOCOL -- TCP
4790 The Transmission Control Protocol TCP [TCP:1] is the primary
4791 virtual-circuit transport protocol for the Internet suite. TCP
4792 provides reliable, in-sequence delivery of a full-duplex stream
4793 of octets (8-bit bytes). TCP is used by those applications
4794 needing reliable, connection-oriented transport service, e.g.,
4795 mail (SMTP), file transfer (FTP), and virtual terminal service
4796 (Telnet); requirements for these application-layer protocols
4797 are described in [INTRO:1].
4799 4.2.2 PROTOCOL WALK-THROUGH
4801 4.2.2.1 Well-Known Ports: RFC-793 Section 2.7
4804 TCP reserves port numbers in the range 0-255 for
4805 "well-known" ports, used to access services that are
4806 standardized across the Internet. The remainder of the
4807 port space can be freely allocated to application
4808 processes. Current well-known port definitions are
4809 listed in the RFC entitled "Assigned Numbers"
4810 [INTRO:6]. A prerequisite for defining a new well-
4811 known port is an RFC documenting the proposed service
4812 in enough detail to allow new implementations.
4814 Some systems extend this notion by adding a third
4815 subdivision of the TCP port space: reserved ports,
4816 which are generally used for operating-system-specific
4817 services. For example, reserved ports might fall
4818 between 256 and some system-dependent upper limit.
4819 Some systems further choose to protect well-known and
4820 reserved ports by permitting only privileged users to
4821 open TCP connections with those port values. This is
4822 perfectly reasonable as long as the host does not
4823 assume that all hosts protect their low-numbered ports in this manner.
4826 4.2.2.2 Use of Push: RFC-793 Section 2.8
4828 When an application issues a series of SEND calls without
4829 setting the PUSH flag, the TCP MAY aggregate the data
4830 internally without sending it. Similarly, when a series of
4831 segments is received without the PSH bit, a TCP MAY queue
4832 the data internally without passing it to the receiving application.
4845 The PSH bit is not a record marker and is independent of
4846 segment boundaries. The transmitter SHOULD collapse
4847 successive PSH bits when it packetizes data, to send the
4848 largest possible segment.
4850 A TCP MAY implement PUSH flags on SEND calls. If PUSH flags
4851 are not implemented, then the sending TCP: (1) must not
4852 buffer data indefinitely, and (2) MUST set the PSH bit in
4853 the last buffered segment (i.e., when there is no more
4854 queued data to be sent).
4856 The discussion in RFC-793 on pages 48, 50, and 74
4857 erroneously implies that a received PSH flag must be passed
4858 to the application layer. Passing a received PSH flag to
4859 the application layer is now OPTIONAL.
4861 An application program is logically required to set the PUSH
4862 flag in a SEND call whenever it needs to force delivery of
4863 the data to avoid a communication deadlock. However, a TCP
4864 SHOULD send a maximum-sized segment whenever possible, to
4865 improve performance (see Section 4.2.3.4).
4868 When the PUSH flag is not implemented on SEND calls,
4869 i.e., when the application/TCP interface uses a pure
4870 streaming model, responsibility for aggregating any
4871 tiny data fragments to form reasonable sized segments
4872 is partially borne by the application layer.
4874 Generally, an interactive application protocol must set
4875 the PUSH flag at least in the last SEND call in each
4876 command or response sequence. A bulk transfer protocol
4877 like FTP should set the PUSH flag on the last segment
4878 of a file or when necessary to prevent buffer deadlock.
4880 At the receiver, the PSH bit forces buffered data to be
4881 delivered to the application (even if less than a full
4882 buffer has been received). Conversely, the lack of a
4883 PSH bit can be used to avoid unnecessary wakeup calls
4884 to the application process; this can be an important
4885 performance optimization for large timesharing hosts.
4886 Passing the PSH bit to the receiving application allows
4887 an analogous optimization within the application.
4889 4.2.2.3 Window Size: RFC-793 Section 3.1
4891 The window size MUST be treated as an unsigned number, or
4892 else large window sizes will appear like negative windows
4904 and TCP will not work. It is RECOMMENDED that
4905 implementations reserve 32-bit fields for the send and
4906 receive window sizes in the connection record and do all
4907 window computations with 32 bits.
4910 It is known that the window field in the TCP header is
4911 too small for high-speed, long-delay paths.
4912 Experimental TCP options have been defined to extend
4913 the window size; see for example [TCP:11]. In
4914 anticipation of the adoption of such an extension, TCP
4915 implementors should treat windows as 32 bits.
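The effect of misreading the window as a signed quantity can be seen in a short Python sketch. It assumes the standard RFC-793 header layout, in which the 16-bit window field occupies octets 14 and 15 of the TCP header; the function name parse_window is hypothetical.

```python
import struct

def parse_window(tcp_header: bytes) -> int:
    """Return the 16-bit window field of a TCP header as an unsigned integer."""
    # "!H" is an unsigned 16-bit big-endian integer; "!h" would be the
    # signed misreading that this section warns against.
    (window,) = struct.unpack_from("!H", tcp_header, 14)
    return window

# Build a minimal 20-octet header with the largest possible window.
header = bytearray(20)
struct.pack_into("!H", header, 14, 0xFFFF)

unsigned_window = parse_window(bytes(header))
(signed_misreading,) = struct.unpack_from("!h", header, 14)
```

Python integers are unbounded, so keeping the parsed value in an ordinary int also satisfies the recommendation to do window computations in at least 32 bits.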
4917 4.2.2.4 Urgent Pointer: RFC-793 Section 3.1
4919 The second sentence is in error: the urgent pointer points
4920 to the sequence number of the LAST octet (not LAST+1) in a
4921 sequence of urgent data. The description on page 56 (last
4922 sentence) is correct.
4924 A TCP MUST support a sequence of urgent data of any length.
4926 A TCP MUST inform the application layer asynchronously
4927 whenever it receives an Urgent pointer and there was
4928 previously no pending urgent data, or whenever the Urgent
4929 pointer advances in the data stream. There MUST be a way
4930 for the application to learn how much urgent data remains to
4931 be read from the connection, or at least to determine
4932 whether or not more urgent data remains to be read.
4935 Although the Urgent mechanism may be used for any
4936 application, it is normally used to send "interrupt"-
4937 type commands to a Telnet program (see "Using Telnet
4938 Synch Sequence" section in [INTRO:1]).
4940 The asynchronous or "out-of-band" notification will
4941 allow the application to go into "urgent mode", reading
4942 data from the TCP connection. This allows control
4943 commands to be sent to an application whose normal
4944 input buffers are full of unprocessed data.
4947 The generic ERROR-REPORT() upcall described in Section
4948 4.2.4.1 is a possible mechanism for informing the
4949 application of the arrival of urgent data.
4963 4.2.2.5 TCP Options: RFC-793 Section 3.1
4965 A TCP MUST be able to receive a TCP option in any segment.
4966 A TCP MUST ignore without error any TCP option it does not
4967 implement, assuming that the option has a length field (all
4968 TCP options defined in the future will have length fields).
4969 TCP MUST be prepared to handle an illegal option length
4970 (e.g., zero) without crashing; a suggested procedure is to
4971 reset the connection and log the reason.
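One possible way to meet these requirements is sketched below in Python. It assumes the RFC-793 option encoding, in which End of Option List (kind 0) and No-Operation (kind 1) are single octets and every other option carries a length octet that counts the kind and length octets themselves. The function names and the choice to signal an illegal length with ValueError are hypothetical; the caller would be expected to reset the connection and log the reason.

```python
def parse_tcp_options(options: bytes) -> list:
    """Split a TCP options area into (kind, data) pairs.

    Raises ValueError on an illegal length so the caller can reset and log.
    """
    END_OF_LIST, NO_OP = 0, 1   # the only two kinds without a length octet
    parsed = []
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == END_OF_LIST:
            break
        if kind == NO_OP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("option kind %d has no length octet" % kind)
        length = options[i + 1]
        # The length must be at least 2 and must not run past the options area.
        if length < 2 or i + length > len(options):
            raise ValueError("illegal length %d for option kind %d" % (length, kind))
        parsed.append((kind, options[i + 2:i + length]))
        i += length
    return parsed

KNOWN_KINDS = {2}  # this sketch implements only the MSS option (kind 2)

def implemented_options(options: bytes) -> list:
    """Return only the options this TCP implements, ignoring the rest without error."""
    return [(k, d) for k, d in parse_tcp_options(options) if k in KNOWN_KINDS]
```

Because unknown options are skipped by their length field rather than by their meaning, the parser does not need to change when new option kinds are defined.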
4973 4.2.2.6 Maximum Segment Size Option: RFC-793 Section 3.1
4975 TCP MUST implement both sending and receiving the Maximum
4976 Segment Size option [TCP:4].
4978 TCP SHOULD send an MSS (Maximum Segment Size) option in
4979 every SYN segment when its receive MSS differs from the
4980 default 536, and MAY send it always.
4982 If an MSS option is not received at connection setup, TCP
4983 MUST assume a default send MSS of 536 (576-40) [TCP:4].
4985 The maximum size of a segment that TCP really sends, the
4986 "effective send MSS," MUST be the smaller of the send MSS
4987 (which reflects the available reassembly buffer size at the
4988 remote host) and the largest size permitted by the IP layer:
4992 Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize, where:
4996 * SendMSS is the MSS value received from the remote host,
4997 or the default 536 if no MSS option is received.
4999 * MMS_S is the maximum size for a transport-layer message that TCP may send.
5002 * TCPhdrsize is the size of the TCP header; this is
5003 normally 20, but may be larger if TCP options are to be sent.
5006 * IPoptionsize is the size of any IP options that TCP
5007 will pass to the IP layer with the current message.
5010 The MSS value to be sent in an MSS option must be less than or equal to MMS_R - 20,
5026 where MMS_R is the maximum size for a transport-layer
5027 message that can be received (and reassembled). TCP obtains
5028 MMS_R and MMS_S from the IP layer; see the generic call
5029 GET_MAXSIZES in Section 3.4.
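The effective send MSS computation can be written out as a short Python sketch. The function and parameter names are hypothetical. The value 1480 used for MMS_S in the examples is an assumption: a 1500-octet link MTU less a 20-octet IP header.

```python
DEFAULT_SEND_MSS = 536  # assumed when the remote host sent no MSS option

def effective_send_mss(send_mss, mms_s, tcp_hdr_size=20, ip_option_size=0):
    """Largest amount of data TCP may place in one segment on this connection."""
    if send_mss is None:            # no MSS option received at connection setup
        send_mss = DEFAULT_SEND_MSS
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size

# Peer advertised 1460; no TCP or IP options on outgoing segments.
plain = effective_send_mss(1460, 1480)

# Same path, but 12 octets of TCP options enlarge the TCP header to 32.
with_tcp_options = effective_send_mss(1460, 1480, tcp_hdr_size=32)

# Peer sent no MSS option at all.
defaulted = effective_send_mss(None, 1480)
```

Note that the option sizes are subtracted after taking the minimum, so TCP or IP options reduce the data carried per segment even when the peer's MSS is the limiting term.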
5032 The choice of TCP segment size has a strong effect on
5033 performance. Larger segments increase throughput by
5034 amortizing header size and per-datagram processing
5035 overhead over more data bytes; however, if the packet
5036 is so large that it causes IP fragmentation, efficiency
5037 drops sharply if any fragments are lost [IP:9].
5039 Some TCP implementations send an MSS option only if the
5040 destination host is on a non-connected network.
5041 However, in general the TCP layer may not have the
5042 appropriate information to make this decision, so it is
5043 preferable to leave to the IP layer the task of
5044 determining a suitable MTU for the Internet path. We
5045 therefore recommend that TCP always send the option (if its
5046 receive MSS is not 536) and that the IP layer determine MMS_R as
5047 specified in 3.3.3 and 3.4. A proposed IP-layer
5048 mechanism to measure the MTU would then modify the IP
5049 layer without changing TCP.
5051 4.2.2.7 TCP Checksum: RFC-793 Section 3.1
5053 Unlike the UDP checksum (see Section 4.1.3.4), the TCP
5054 checksum is never optional. The sender MUST generate it and
5055 the receiver MUST check it.
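For reference, a Python sketch of generating and checking the checksum follows. It assumes the RFC-793 definition: the 16-bit one's complement of the one's-complement sum of a pseudo-header (source address, destination address, a zero octet, protocol number 6, and TCP length) followed by the TCP header and data, padded with a zero octet if the length is odd. The function names are hypothetical.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """One's-complement sum of big-endian 16-bit words, zero-padding odd lengths."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def _pseudo_header(src_ip: bytes, dst_ip: bytes, tcp_length: int) -> bytes:
    return src_ip + dst_ip + struct.pack("!BBH", 0, 6, tcp_length)

def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Sender side: checksum of a segment whose checksum field is zero."""
    data = _pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ~ones_complement_sum16(data) & 0xFFFF

def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Receiver side: with the checksum in place, the sum must be all ones."""
    data = _pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum16(data) == 0xFFFF
```

The sender would store the result in octets 16 and 17 of the TCP header; a receiver that finds tcp_checksum_ok() false discards the segment.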
5057 4.2.2.8 TCP Connection State Diagram: RFC-793 Section 3.2,
5060 There are several problems with this diagram:
5062 (a) The arrow from SYN-SENT to SYN-RCVD should be labeled
5063 with "snd SYN,ACK", to agree with the text on page 68 and with Figure 8.
5066 (b) There could be an arrow from SYN-RCVD state to LISTEN
5067 state, conditioned on receiving a RST after a passive
5068 open (see text page 70).
5081 (c) It is possible to go directly from FIN-WAIT-1 to the
5082 TIME-WAIT state (see page 75 of the spec).
5085 4.2.2.9 Initial Sequence Number Selection: RFC-793 Section
5088 A TCP MUST use the specified clock-driven selection of
5089 initial sequence numbers.
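A minimal Python sketch of such a generator follows. It assumes the scheme described in RFC-793, in which the initial sequence number is taken from a 32-bit clock whose low-order bit advances roughly every 4 microseconds. The function name and the use of a monotonic system clock as the time source are hypothetical.

```python
import time

def clock_driven_isn(now_us=None):
    """Initial sequence number from a 32-bit clock that ticks every 4 microseconds."""
    if now_us is None:
        now_us = time.monotonic_ns() // 1000   # current time in microseconds
    ticks = now_us // 4                        # one tick per 4 microseconds
    return ticks & 0xFFFFFFFF                  # wrap to 32 bits

isn = clock_driven_isn()
```

With a 4-microsecond tick the 32-bit value wraps after roughly 4.77 hours, which is far longer than a Maximum Segment Lifetime.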
5091 4.2.2.10 Simultaneous Open Attempts: RFC-793 Section 3.4, page
5094 There is an error in Figure 8: the packet on line 7 should
5095 be identical to the packet on line 5.
5097 A TCP MUST support simultaneous open attempts.
5100 It sometimes surprises implementors that if two
5101 applications attempt to simultaneously connect to each
5102 other, only one connection is generated instead of two.
5103 This was an intentional design decision; don't try to fix it.
5106 4.2.2.11 Recovery from Old Duplicate SYN: RFC-793 Section 3.4,
5109 Note that a TCP implementation MUST keep track of whether a
5110 connection has reached SYN-RCVD state as the result of a
5111 passive OPEN or an active OPEN.
5113 4.2.2.12 RST Segment: RFC-793 Section 3.4
5115 A TCP SHOULD allow a received RST segment to include data.
5118 It has been suggested that a RST segment could contain
5119 ASCII text that encoded and explained the cause of the
5120 RST. No standard has yet been established for such data.
5123 4.2.2.13 Closing a Connection: RFC-793 Section 3.5
5125 A TCP connection may terminate in two ways: (1) the normal
5126 TCP close sequence using a FIN handshake, and (2) an "abort"
5127 in which one or more RST segments are sent and the
5128 connection state is immediately discarded. If a TCP
5140 connection is closed by the remote site, the local
5141 application MUST be informed whether it closed normally or was aborted.
5144 The normal TCP close sequence delivers buffered data
5145 reliably in both directions. Since the two directions of a
5146 TCP connection are closed independently, it is possible for
5147 a connection to be "half closed," i.e., closed in only one
5148 direction, and a host is permitted to continue sending data
5149 in the open direction on a half-closed connection.
5151 A host MAY implement a "half-duplex" TCP close sequence, so
5152 that an application that has called CLOSE cannot continue to
5153 read data from the connection. If such a host issues a
5154 CLOSE call while received data is still pending in TCP, or
5155 if new data is received after CLOSE is called, its TCP
5156 SHOULD send a RST to show that data was lost.
5158 When a connection is closed actively, it MUST linger in
5159 TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
5160 However, it MAY accept a new SYN from the remote TCP to
5161 reopen the connection directly from TIME-WAIT state, if it:
5163 (1) assigns its initial sequence number for the new
5164 connection to be larger than the largest sequence
5165 number it used on the previous connection incarnation, and
5168 (2) returns to TIME-WAIT state if the SYN turns out to be an old duplicate.
5173 TCP's full-duplex data-preserving close is a feature
5174 that is not included in the analogous ISO transport protocol TP4.
5177 Some systems have not implemented half-closed
5178 connections, presumably because they do not fit into
5179 the I/O model of their particular operating system. On
5180 these systems, once an application has called CLOSE, it
5181 can no longer read input data from the connection; this
5182 is referred to as a "half-duplex" TCP close sequence.
5184 The graceful close algorithm of TCP requires that the
5185 connection state remain defined on (at least) one end
5186 of the connection, for a timeout period of 2xMSL, i.e.,
5187 4 minutes. During this period, the (remote socket,
5199 local socket) pair that defines the connection is busy
5200 and cannot be reused. To shorten the time that a given
5201 port pair is tied up, some TCPs allow a new SYN to be
5202 accepted in TIME-WAIT state.
5204 4.2.2.14 Data Communication: RFC-793 Section 3.7, page 40
5206 Since RFC-793 was written, there has been extensive work on
5207 TCP algorithms to achieve efficient data communication.
5208 Later sections of the present document describe required and
5209 recommended TCP algorithms to determine when to send data
5210 (Section 4.2.3.4), when to send an acknowledgment (Section
5211 4.2.3.2), and when to update the window (Section 4.2.3.3).
5214 One important performance issue is "Silly Window
5215 Syndrome" or "SWS" [TCP:5], a stable pattern of small
5216 incremental window movements resulting in extremely
5217 poor TCP performance. Algorithms to avoid SWS are
5218 described below for both the sending side (Section
5219 4.2.3.4) and the receiving side (Section 4.2.3.3).
5221 In brief, SWS is caused by the receiver advancing the
5222 right window edge whenever it has any new buffer space
5223 available to receive data and by the sender using any
5224 incremental window, no matter how small, to send more
5225 data [TCP:5]. The result can be a stable pattern of
5226 sending tiny data segments, even though both sender and
5227 receiver have a large total buffer space for the
5228 connection. SWS can only occur during the transmission
5229 of a large amount of data; if the connection goes
5230 quiescent, the problem will disappear. It is caused by a
5231 typical straightforward implementation of window
5232 management, but the sender and receiver algorithms
5233 given below will avoid it.
5235 Another important TCP performance issue is that some
5236 applications, especially remote login to character-at-
5237 a-time hosts, tend to send streams of one-octet data
5238 segments. To avoid deadlocks, every TCP SEND call from
5239 such applications must be "pushed", either explicitly
5240 by the application or else implicitly by TCP. The
5241 result may be a stream of TCP segments that contain one
5242 data octet each, which makes very inefficient use of
5243 the Internet and contributes to Internet congestion.
5244 The Nagle Algorithm described in Section 4.2.3.4
5245 provides a simple and effective solution to this
5246 problem. It does have the effect of clumping
5258 characters over Telnet connections; this may initially
5259 surprise users accustomed to single-character echo, but
5260 user acceptance has not been a problem.
5262 Note that the Nagle algorithm and the send SWS
5263 avoidance algorithm play complementary roles in
5264 improving performance. The Nagle algorithm discourages
5265 sending tiny segments when the data to be sent
5266 increases in small increments, while the SWS avoidance
5267 algorithm discourages small segments resulting from the
5268 right window edge advancing in small increments.
5270 A careless implementation can send two or more
5271 acknowledgment segments per data segment received. For
5272 example, suppose the receiver acknowledges every data
5273 segment immediately. When the application program
5274 subsequently consumes the data and increases the
5275 available receive buffer space again, the receiver may
5276 send a second acknowledgment segment to update the
5277 window at the sender. The extreme case occurs with
5278 single-character segments on TCP connections using the
5279 Telnet protocol for remote login service. Some
5280 implementations have been observed in which each
5281 incoming 1-character segment generates three return
5282 segments: (1) the acknowledgment, (2) a one byte
5283 increase in the window, and (3) the echoed character,
5286 4.2.2.15 Retransmission Timeout: RFC-793 Section 3.7, page 41
5288 The algorithm suggested in RFC-793 for calculating the
5289 retransmission timeout is now known to be inadequate; see
5290 Section 4.2.3.1 below.
5292 Recent work by Jacobson [TCP:7] on Internet congestion and
5293 TCP retransmission stability has produced a transmission
5294 algorithm combining "slow start" with "congestion
5295 avoidance". A TCP MUST implement this algorithm.
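The combined algorithm can be sketched in Python as follows. This is a simplified illustration and not the text of [TCP:7]: the class name, the initial threshold of 65535 octets, and the rule of halving the threshold to no less than two segments on a timeout are assumptions drawn from the commonly published description of the algorithm.

```python
class CongestionState:
    """Congestion window bookkeeping, in octets, for one connection."""

    def __init__(self, mss, initial_ssthresh=65535):
        self.mss = mss
        self.cwnd = mss                  # slow start begins with one segment
        self.ssthresh = initial_ssthresh

    def on_new_ack(self):
        """Called when an ACK acknowledges new data."""
        if self.cwnd < self.ssthresh:
            # Slow start: one MSS per ACK, which doubles cwnd each round trip.
            self.cwnd += self.mss
        else:
            # Congestion avoidance: roughly one MSS per round trip.
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_timeout(self, flight_size):
        """Called when the retransmission timer expires."""
        self.ssthresh = max(flight_size // 2, 2 * self.mss)
        self.cwnd = self.mss             # start again from slow start

    def send_limit(self, offered_window):
        """The sender may have at most this many octets outstanding."""
        return min(self.cwnd, offered_window)
```

The sender is limited by the smaller of the congestion window and the window offered by the receiver, so this state complements rather than replaces normal window management.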
5297 If a retransmitted packet is identical to the original
5298 packet (which implies not only that the data boundaries have
5299 not changed, but also that the window and acknowledgment
5300 fields of the header have not changed), then the same IP
5301 Identification field MAY be used (see Section 3.2.1.5).
5304 Some TCP implementors have chosen to "packetize" the
5305 data stream, i.e., to pick segment boundaries when
5317 segments are originally sent and to queue these
5318 segments in a "retransmission queue" until they are
5319 acknowledged. Another design (which may be simpler) is
5320 to defer packetizing until each time data is
5321 transmitted or retransmitted, so there will be no
5322 segment retransmission queue.
5324 In an implementation with a segment retransmission
5325 queue, TCP performance may be enhanced by repacketizing
5326 the segments awaiting acknowledgment when the first
5327 retransmission timeout occurs. That is, the
5328 outstanding segments that fitted would be combined into
5329 one maximum-sized segment, with a new IP Identification
5330 value. The TCP would then retain this combined segment
5331 in the retransmit queue until it was acknowledged.
5332 However, if the first two segments in the
5333 retransmission queue totalled more than one maximum-
5334 sized segment, the TCP would retransmit only the first
5335 segment using the original IP Identification field.
5337 4.2.2.16 Managing the Window: RFC-793 Section 3.7, page 41
5339 A TCP receiver SHOULD NOT shrink the window, i.e., move the
5340 right window edge to the left. However, a sending TCP MUST
5341 be robust against window shrinking, which may cause the
5342 "useable window" (see Section 4.2.3.4) to become negative.
5344 If this happens, the sender SHOULD NOT send new data, but
5345 SHOULD retransmit normally the old unacknowledged data
5346 between SND.UNA and SND.UNA+SND.WND. The sender MAY also
5347 retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT
5348 time out the connection if data beyond the right window edge
5349 is not acknowledged. If the window shrinks to zero, the TCP
5350 MUST probe it in the standard way (see next Section).
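The sender-side checks described above can be illustrated with a Python sketch. It takes the usable window to be SND.UNA + SND.WND - SND.NXT, as defined in Section 4.2.3.4, and performs sequence number arithmetic modulo 2**32. All function names are hypothetical.

```python
MOD = 2 ** 32

def seq_diff(a, b):
    """Signed distance a - b between two 32-bit sequence numbers."""
    d = (a - b) % MOD
    return d - MOD if d >= MOD // 2 else d

def usable_window(snd_una, snd_nxt, snd_wnd):
    """SND.UNA + SND.WND - SND.NXT; negative after the receiver shrinks the window."""
    return snd_wnd - seq_diff(snd_nxt, snd_una)

def may_send_new_data(snd_una, snd_nxt, snd_wnd):
    """New data is sent only while the usable window is positive."""
    return usable_window(snd_una, snd_nxt, snd_wnd) > 0

def retransmittable_range(snd_una, snd_nxt, snd_wnd):
    """(start, length) of unacknowledged data lying inside the current window."""
    outstanding = seq_diff(snd_nxt, snd_una)
    return snd_una, min(outstanding, snd_wnd)
```

For example, with SND.UNA = 1000 and SND.NXT = 5000, a window that shrinks from 8000 to 3000 leaves a usable window of -1000: no new data is sent, and only the 3000 octets starting at SND.UNA are retransmitted normally.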
5353 Many TCP implementations become confused if the window
5354 shrinks from the right after data has been sent into a
5355 larger window. Note that TCP has a heuristic to select
5356 the latest window update despite possible datagram
5357 reordering; as a result, it may ignore a window update
5358 with a smaller window than previously offered if
5359 neither the sequence number nor the acknowledgment
5360 number is increased.
5376 4.2.2.17 Probing Zero Windows: RFC-793 Section 3.7, page 42
5378 Probing of zero (offered) windows MUST be supported.
5380 A TCP MAY keep its offered receive window closed
5381 indefinitely. As long as the receiving TCP continues to
5382 send acknowledgments in response to the probe segments, the
5383 sending TCP MUST allow the connection to stay open.
5386 It is extremely important to remember that ACK
5387 (acknowledgment) segments that contain no data are not
5388 reliably transmitted by TCP. If zero window probing is
5389 not supported, a connection may hang forever when an
5390 ACK segment that re-opens the window is lost.
5392 The delay in opening a zero window generally occurs
5393 when the receiving application stops taking data from
5394 its TCP. For example, consider a printer daemon
5395 application, stopped because the printer ran out of
5398 The transmitting host SHOULD send the first zero-window
5399 probe when a zero window has existed for the retransmission
5400 timeout period (see Section 4.2.2.15), and SHOULD increase
5401 exponentially the interval between successive probes.
5404 This procedure minimizes delay if the zero-window
5405 condition is due to a lost ACK segment containing a
5406 window-opening update. Exponential backoff is
5407 recommended, possibly with some maximum interval not
5408 specified here. This procedure is similar to that of
5409 the retransmission algorithm, and it may be possible to
5410 combine the two procedures in the implementation.
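A probe schedule of this kind might be generated as in the following Python sketch. The function name is hypothetical, and the 60-second cap is an assumption, since the maximum interval is deliberately left unspecified here.

```python
import itertools

def probe_intervals(rto, max_interval=60.0):
    """Yield the waits, in seconds, before successive zero-window probes."""
    interval = rto                    # first probe after one retransmission timeout
    while True:
        yield interval
        interval = min(interval * 2, max_interval)   # exponential backoff, capped

# First six waits for a connection whose current RTO is 3 seconds.
first_six = list(itertools.islice(probe_intervals(3.0), 6))
```

The generator never terminates on its own: as long as the receiver keeps acknowledging probes, the sender keeps the connection open and continues probing at the capped interval.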
5412 4.2.2.18 Passive OPEN Calls: RFC-793 Section 3.8
5414 Every passive OPEN call either creates a new connection
5415 record in LISTEN state, or it returns an error; it MUST NOT
5416 affect any previously created connection record.
5418 A TCP that supports multiple concurrent users MUST provide
5419 an OPEN call that will functionally allow an application to
5420 LISTEN on a port while a connection block with the same
5421 local port is in SYN-SENT or SYN-RECEIVED state.
5435 Some applications (e.g., SMTP servers) may need to
5436 handle multiple connection attempts at about the same
5437 time. The probability of a connection attempt failing
5438 is reduced by giving the application some means of
5439 listening for a new connection at the same time that an
5440 earlier connection attempt is going through the three-way handshake.
5444 Acceptable implementations of concurrent opens may
5445 permit multiple passive OPEN calls, or they may allow
5446 "cloning" of LISTEN-state connections from a single passive OPEN call.
5449 4.2.2.19 Time to Live: RFC-793 Section 3.9, page 52
5451 RFC-793 specified that TCP was to request the IP layer to
5452 send TCP segments with TTL = 60. This is obsolete; the TTL
5453 value used to send TCP segments MUST be configurable. See
5454 Section 3.2.1.7 for discussion.
5456 4.2.2.20 Event Processing: RFC-793 Section 3.9
5458 While it is not strictly required, a TCP SHOULD be capable
5459 of queueing out-of-order TCP segments. Change the "may" in
5460 the last sentence of the first paragraph on page 70 to "should".
5464 Some small-host implementations have omitted segment
5465 queueing because of limited buffer space. This
5466 omission may be expected to adversely affect TCP
5467 throughput, since loss of a single segment causes all
5468 later segments to appear to be "out of sequence".
5470 In general, the processing of received segments MUST be
5471 implemented to aggregate ACK segments whenever possible.
5472 For example, if the TCP is processing a series of queued
5473 segments, it MUST process them all before sending any ACK segments.
5476 Here are some detailed error corrections and notes on the
5477 Event Processing section of RFC-793.
5479 (a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK state, not CLOSING.
5482 (b) LISTEN state, check for SYN (pp. 65, 66): With a SYN
5494 bit, if the security/compartment or the precedence is
5495 wrong for the segment, a reset is sent. The wrong form
5496 of reset is shown in the text; it should be:
5498 <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
5501 (c) SYN-SENT state, Check for SYN, p. 68: When the
5502 connection enters ESTABLISHED state, the following
5503 variables must be set: SND.WND <- SEG.WND, SND.WL1 <- SEG.SEQ, and SND.WL2 <- SEG.ACK.
5509 (d) Check security and precedence, p. 71: The first heading
5510 "ESTABLISHED STATE" should really be a list of all
5511 states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-1,
5512 FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and TIME-WAIT.
5515 (e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if
5516 the connection was initiated with a passive OPEN, then
5517 return this connection to the LISTEN state and return. Otherwise...".
5520 (f) Check ACK field, SYN-RECEIVED state, p. 72: When the
5521 connection enters ESTABLISHED state, the variables
5522 listed in (c) must be set.
5524 (g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a
5525 duplicate if SEG.ACK =< SND.UNA (the = was omitted).
5526 Similarly, the window should be updated if: SND.UNA =< SEG.ACK =< SND.NXT.
5529 (h) USER TIMEOUT, p. 77:
5531 It would be better to notify the application of the
5532 timeout rather than letting TCP force the connection
5533 closed. However, see also Section 4.2.3.5.
5536 4.2.2.21 Acknowledging Queued Segments: RFC-793 Section 3.9
5538 A TCP MAY send an ACK segment acknowledging RCV.NXT when a
5539 valid segment arrives that is in the window but not at the left window edge.
5554 RFC-793 (see page 74) was ambiguous about whether or
5555 not an ACK segment should be sent when an out-of-order
5556 segment was received, i.e., when SEG.SEQ was unequal to RCV.NXT.
5559 One reason for ACKing out-of-order segments might be to
5560 support an experimental algorithm known as "fast
5561 retransmit". With this algorithm, the sender uses the
5562 "redundant" ACK's to deduce that a segment has been
5563 lost before the retransmission timer has expired. It
5564 counts the number of times an ACK has been received
5565 with the same value of SEG.ACK and with the same right
5566 window edge. If more than a threshold number of such
5567 ACK's is received, then the segment containing the
5568 octets starting at SEG.ACK is assumed to have been lost
5569 and is retransmitted, without awaiting a timeout. The
5570 threshold is chosen to compensate for the maximum
5571 likely segment reordering in the Internet. There is
5572 not yet enough experience with the fast retransmit
5573 algorithm to determine how useful it is.
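The counting rule just described can be sketched in Python. The class name is hypothetical and the default threshold of 3 is an assumption, since no value is given here. A real implementation would also need to exclude ACKs that carry data or acknowledge new data.

```python
class DuplicateAckCounter:
    """Counts ACKs that repeat the same SEG.ACK and the same right window edge."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.last_key = None
        self.count = 0

    def on_ack(self, seg_ack, seg_wnd):
        """Return True when the segment starting at seg_ack should be retransmitted."""
        right_edge = (seg_ack + seg_wnd) % 2**32
        key = (seg_ack, right_edge)
        if key == self.last_key:
            self.count += 1
        else:
            self.last_key = key
            self.count = 1
        # Fire exactly once, when the count first exceeds the threshold.
        return self.count == self.threshold + 1
```

Requiring the right window edge to match keeps ordinary window updates, which repeat SEG.ACK but move the edge, from being counted as evidence of loss.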
5575 4.2.3 SPECIFIC ISSUES
5577 4.2.3.1 Retransmission Timeout Calculation
5579 A host TCP MUST implement Karn's algorithm and Jacobson's
5580 algorithm for computing the retransmission timeout ("RTO").
5582 o Jacobson's algorithm for computing the smoothed round-
5583 trip time ("RTT") incorporates a simple measure of the variance [TCP:7].
5586 o Karn's algorithm for selecting RTT measurements ensures
5587 that ambiguous round-trip times will not corrupt the
5588 calculation of the smoothed round-trip time [TCP:6].
5590 This implementation also MUST include "exponential backoff"
5591 for successive RTO values for the same segment.
5592 Retransmission of SYN segments SHOULD use the same algorithm
5596 There were two known problems with the RTO calculations
5597 specified in RFC-793. First, the accurate measurement
5598 of RTTs is difficult when there are retransmissions.
5599 Second, the algorithm to compute the smoothed round-
5600 trip time is inadequate [TCP:7], because it incorrectly
5612 assumed that the variance in RTT values would be small
5613 and constant. These problems were solved by Karn's and
5614 Jacobson's algorithms, respectively.
5616 The performance increase resulting from the use of
5617 these improvements varies from noticeable to dramatic.
5618 Jacobson's algorithm for incorporating the measured RTT
5619 variance is especially important on a low-speed link,
5620 where the natural variation of packet sizes causes a
5621 large variation in RTT. One vendor found link
5622 utilization on a 9.6 kb/s line went from 10% to 90% as a
5623 result of implementing Jacobson's variance algorithm in TCP.
5626 The following values SHOULD be used to initialize the
5627 estimation parameters for a new connection:
5629 (a) RTT = 0 seconds.
5631 (b) RTO = 3 seconds. (The smoothed variance is to be
5632 initialized to the value that will result in this RTO).
5634 The recommended upper and lower bounds on the RTO are known
5635 to be inadequate on large internets. The lower bound SHOULD
5636 be measured in fractions of a second (to accommodate high
5637 speed LANs) and the upper bound should be 2*MSL, i.e., 240 seconds.
5641 Experience has shown that these initialization values
5642 are reasonable, and that in any case the Karn and
5643 Jacobson algorithms make TCP behavior reasonably
5644 insensitive to the initial parameter choices.
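The following Python sketch shows how the pieces of this section might fit together. It is an illustration under stated assumptions and not the text of [TCP:6] or [TCP:7]: the gains of 1/8 and 1/4, the formula RTO = SRTT + 4 * RTTVAR, and the 0.2-second lower bound are assumptions taken from the commonly published form of Jacobson's algorithm, and the class and method names are hypothetical. The initial values and the 240-second upper bound follow the text above.

```python
class RtoEstimator:
    """Smoothed RTT and mean deviation with Karn's rule and exponential backoff."""

    def __init__(self, initial_rto=3.0, min_rto=0.2, max_rto=240.0):
        self.srtt = 0.0                     # (a) RTT = 0 seconds
        self.rttvar = initial_rto / 4.0     # (b) chosen so that the first RTO is 3 seconds
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.backoff = 1

    @property
    def rto(self):
        base = self.srtt + 4.0 * self.rttvar
        return max(self.min_rto, min(base * self.backoff, self.max_rto))

    def on_retransmit_timeout(self):
        """Exponential backoff for successive retransmissions of the same segment."""
        self.backoff *= 2

    def on_ack(self, measured_rtt, was_retransmitted):
        """Feed one round-trip measurement, in seconds, into the estimator."""
        if was_retransmitted:
            return      # Karn: the sample is ambiguous; keep the backed-off RTO
        err = measured_rtt - self.srtt
        self.srtt += err / 8.0
        self.rttvar += (abs(err) - self.rttvar) / 4.0
        self.backoff = 1
```

Ignoring samples from retransmitted segments while retaining the backed-off timer is what keeps an ambiguous measurement from pulling the estimate in the wrong direction.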
5646 4.2.3.2 When to Send an ACK Segment
5648 A host that is receiving a stream of TCP data segments can
5649 increase efficiency in both the Internet and the hosts by
5650 sending fewer than one ACK (acknowledgment) segment per data
5651 segment received; this is known as a "delayed ACK" [TCP:5].
5653 A TCP SHOULD implement a delayed ACK, but an ACK should not
5654 be excessively delayed; in particular, the delay MUST be
5655 less than 0.5 seconds, and in a stream of full-sized
5656 segments there SHOULD be an ACK for at least every second segment.
DISCUSSION:
A delayed ACK gives the application an opportunity to
5672 update the window and perhaps to send an immediate
5673 response. In particular, in the case of character-mode
5674 remote login, a delayed ACK can reduce the number of
5675 segments sent by the server by a factor of 3 (ACK,
window update, and echo character all combined in one
segment).
5679 In addition, on some large multi-user hosts, a delayed
5680 ACK can substantially reduce protocol processing
5681 overhead by reducing the total number of packets to be
5682 processed [TCP:5]. However, excessive delays on ACK's
can disturb the round-trip timing and packet "clocking"
algorithms.
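The delayed ACK rules of this section (delay below 0.5 seconds, and an ACK for at least every second full-sized segment) can be sketched as a small state machine. This is only an illustration under stated assumptions: the 0.2 second timer value and the names DelayedAck, on_segment and on_timer are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ACK_DELAY = 0.5  # seconds; the ACK delay MUST stay below this


@dataclass
class DelayedAck:
    mss: int                          # size of a full-sized segment, in octets
    delay: float = 0.2                # assumed timer value, below MAX_ACK_DELAY
    full_segments: int = 0            # full-sized segments not yet ACKed
    deadline: Optional[float] = None  # when the pending ACK must go out

    def on_segment(self, length: int, now: float) -> bool:
        """Return True if an ACK should be sent immediately."""
        if length >= self.mss:
            self.full_segments += 1
        if self.full_segments >= 2:
            # Every second full-sized segment is acknowledged at once.
            self._clear()
            return True
        if self.deadline is None:
            self.deadline = now + self.delay
        return False

    def on_timer(self, now: float) -> bool:
        """Return True if the delayed-ACK timer has expired."""
        if self.deadline is not None and now >= self.deadline:
            self._clear()
            return True
        return False

    def _clear(self) -> None:
        self.full_segments = 0
        self.deadline = None
```

Times are passed in explicitly so that the logic can be exercised without a real clock.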
5686 4.2.3.3 When to Send a Window Update
A TCP MUST include a SWS avoidance algorithm in the receiver
[TCP:5].
IMPLEMENTATION:
The receiver's SWS avoidance algorithm determines when
5693 the right window edge may be advanced; this is
5694 customarily known as "updating the window". This
5695 algorithm combines with the delayed ACK algorithm (see
5696 Section 4.2.3.2) to determine when an ACK segment
5697 containing the current window will really be sent to
5698 the receiver. We use the notation of RFC-793; see
5699 Figures 4 and 5 in that document.
5701 The solution to receiver SWS is to avoid advancing the
5702 right window edge RCV.NXT+RCV.WND in small increments,
even if data is received from the network in small
segments.
5706 Suppose the total receive buffer space is RCV.BUFF. At
5707 any given moment, RCV.USER octets of this total may be
5708 tied up with data that has been received and
5709 acknowledged but which the user process has not yet
5710 consumed. When the connection is quiescent, RCV.WND =
5711 RCV.BUFF and RCV.USER = 0.
5713 Keeping the right window edge fixed as data arrives and
5714 is acknowledged requires that the receiver offer less
5715 than its full buffer space, i.e., the receiver must
5716 specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
5717 as RCV.NXT increases. Thus, the total buffer space
5718 RCV.BUFF is generally divided into three parts:
|<------- RCV.BUFF ---------------->|
     1             2            3
----|---------|------------------|------|----
           RCV.NXT               ^
                              (Fixed)

1 - RCV.USER =  data received but not yet consumed;
2 - RCV.WND =   space advertised to sender;
3 - Reduction = space available but not yet
                advertised.
5743 The suggested SWS avoidance algorithm for the receiver
is to keep RCV.NXT+RCV.WND fixed until the reduction
satisfies:

     RCV.BUFF - RCV.USER - RCV.WND  >=  min( Fr * RCV.BUFF, Eff.snd.MSS )

5751 where Fr is a fraction whose recommended value is 1/2,
5752 and Eff.snd.MSS is the effective send MSS for the
5753 connection (see Section 4.2.2.6). When the inequality
5754 is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.
5756 Note that the general effect of this algorithm is to
5757 advance RCV.WND in increments of Eff.snd.MSS (for
5758 realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2).
5759 Note also that the receiver must use its own
5760 Eff.snd.MSS, assuming it is the same as the sender's.
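The receiver-side rule above can be written as a single function. This is a minimal sketch: the function name updated_rcv_wnd is hypothetical, and the caller is assumed to have already reduced rcv_wnd as data arrived so that RCV.NXT+RCV.WND stayed fixed.

```python
from fractions import Fraction

FR = Fraction(1, 2)  # recommended value of Fr


def updated_rcv_wnd(rcv_buff: int, rcv_user: int, rcv_wnd: int,
                    eff_snd_mss: int) -> int:
    """Return the RCV.WND to advertise after applying receiver SWS avoidance."""
    # Space that is free but not yet advertised to the sender.
    reduction = rcv_buff - rcv_user - rcv_wnd
    if reduction >= min(FR * rcv_buff, eff_snd_mss):
        # Enough space has opened up: advance the right window edge.
        return rcv_buff - rcv_user
    # Otherwise keep the right edge where it is.
    return rcv_wnd
```

With an 8192-octet buffer and an effective MSS of 1460, for example, the window is left alone while fewer than 1460 octets are unadvertised.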
5762 4.2.3.4 When to Send Data
5764 A TCP MUST include a SWS avoidance algorithm in the sender.
5766 A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
5767 coalesce short segments. However, there MUST be a way for
5768 an application to disable the Nagle algorithm on an
5769 individual connection. In all cases, sending data is also
5770 subject to the limitation imposed by the Slow Start
5771 algorithm (Section 4.2.2.15).
DISCUSSION:
The Nagle algorithm is generally as follows:
5776 If there is unacknowledged data (i.e., SND.NXT >
5777 SND.UNA), then the sending TCP buffers all user
5789 data (regardless of the PSH bit), until the
5790 outstanding data has been acknowledged or until
5791 the TCP can send a full-sized segment (Eff.snd.MSS
5792 bytes; see Section 4.2.2.6).
5794 Some applications (e.g., real-time display window
5795 updates) require that the Nagle algorithm be turned
off, so small data segments can be streamed out at the
maximum rate.
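The Nagle rule and the per-connection switch can be sketched as follows. The function nagle_may_send is a hypothetical helper that ignores the send window and sequence number wraparound. The disable_nagle helper assumes a sockets-style API, where the TCP_NODELAY option is the usual way for an application to turn the algorithm off; that API is not part of this specification.

```python
import socket


def nagle_may_send(queued: int, snd_nxt: int, snd_una: int,
                   eff_snd_mss: int) -> bool:
    """Decide whether queued user data may be sent under the Nagle algorithm."""
    if queued <= 0:
        return False
    if queued >= eff_snd_mss:
        return True             # a full-sized segment can be sent
    # A short segment goes out only when nothing is unacknowledged.
    return snd_nxt == snd_una


def disable_nagle(sock: socket.socket) -> None:
    """Turn the Nagle algorithm off for this one connection."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

An application that streams small updates, such as a real-time display, would call disable_nagle() on its socket before sending.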
IMPLEMENTATION:
The sender's SWS avoidance algorithm is more difficult
than the receiver's, because the sender does not know
5802 (directly) the receiver's total buffer space RCV.BUFF.
5803 An approach which has been found to work well is for
5804 the sender to calculate Max(SND.WND), the maximum send
5805 window it has seen so far on the connection, and to use
5806 this value as an estimate of RCV.BUFF. Unfortunately,
5807 this can only be an estimate; the receiver may at any
5808 time reduce the size of RCV.BUFF. To avoid a resulting
5809 deadlock, it is necessary to have a timeout to force
5810 transmission of data, overriding the SWS avoidance
algorithm. In practice, this timeout should seldom
occur.
5814 The "useable window" [TCP:5] is:
5816 U = SND.UNA + SND.WND - SND.NXT
5818 i.e., the offered window less the amount of data sent
5819 but not acknowledged. If D is the amount of data
5820 queued in the sending TCP but not yet sent, then the
following set of rules is recommended.
Send data:
(1) if a maximum-sized segment can be sent, i.e., if:
5827 min(D,U) >= Eff.snd.MSS;
5830 (2) or if the data is pushed and all queued data can
5831 be sent now, i.e., if:
5833 [SND.NXT = SND.UNA and] PUSHED and D <= U
(the bracketed condition is imposed by the Nagle
algorithm);
5848 (3) or if at least a fraction Fs of the maximum window
5849 can be sent, i.e., if:
[SND.NXT = SND.UNA and] min(D,U) >= Fs * Max(SND.WND);
(4) or if data is PUSHed and the override timeout
occurs.
5859 Here Fs is a fraction whose recommended value is 1/2.
5860 The override timeout should be in the range 0.1 - 1.0
5861 seconds. It may be convenient to combine this timer
with the timer used to probe zero windows (Section
4.2.2.17).
5865 Finally, note that the SWS avoidance algorithm just
5866 specified is to be used instead of the sender-side
5867 algorithm contained in [TCP:5].
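The four sender-side rules can be collected into one predicate. This is a sketch under stated assumptions: the function name sws_should_send and its parameter names are hypothetical, sequence numbers are treated as plain integers with no wraparound, and the bracketed Nagle condition is applied only when the nagle flag is set.

```python
def sws_should_send(d: int, snd_una: int, snd_wnd: int, snd_nxt: int,
                    eff_snd_mss: int, max_snd_wnd: int,
                    pushed: bool = False, override_expired: bool = False,
                    nagle: bool = True, fs: float = 0.5) -> bool:
    """Return True if the sender SWS avoidance rules allow sending now."""
    if d <= 0:
        return False                      # nothing queued
    u = snd_una + snd_wnd - snd_nxt       # the useable window U
    # The bracketed condition [SND.NXT = SND.UNA] applies only with Nagle.
    nagle_ok = (snd_nxt == snd_una) or not nagle
    if min(d, u) >= eff_snd_mss:
        return True                       # rule (1): a maximum-sized segment
    if nagle_ok and pushed and d <= u:
        return True                       # rule (2): pushed, all data fits
    if nagle_ok and min(d, u) >= fs * max_snd_wnd:
        return True                       # rule (3): fraction Fs of max window
    if pushed and override_expired:
        return True                       # rule (4): override timeout
    return False
```

Here max_snd_wnd stands for Max(SND.WND), the largest send window seen so far on the connection.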
5869 4.2.3.5 TCP Connection Failures
5871 Excessive retransmission of the same segment by TCP
5872 indicates some failure of the remote host or the Internet
5873 path. This failure may be of short or long duration. The
5874 following procedure MUST be used to handle excessive
5875 retransmissions of data segments [IP:11]:
5877 (a) There are two thresholds R1 and R2 measuring the amount
5878 of retransmission that has occurred for the same
5879 segment. R1 and R2 might be measured in time units or
5880 as a count of retransmissions.
5882 (b) When the number of transmissions of the same segment
5883 reaches or exceeds threshold R1, pass negative advice
5884 (see Section 3.3.1.4) to the IP layer, to trigger
5885 dead-gateway diagnosis.
5887 (c) When the number of transmissions of the same segment
reaches a threshold R2 greater than R1, close the
connection.
5891 (d) An application MUST be able to set the value for R2 for
5892 a particular connection. For example, an interactive
5893 application might set R2 to "infinity," giving the user
5894 control over when to disconnect.
(e) TCP SHOULD inform the application of the delivery
5908 problem (unless such information has been disabled by
5909 the application; see Section 4.2.4.1), when R1 is
5910 reached and before R2. This will allow a remote login
(User Telnet) application program to inform the user,
for example.
5914 The value of R1 SHOULD correspond to at least 3
5915 retransmissions, at the current RTO. The value of R2 SHOULD
5916 correspond to at least 100 seconds.
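The threshold procedure can be sketched as a small policy object. This is an illustration only: it assumes R1 is kept as a retransmission count and R2 as elapsed time in seconds (the text allows either unit for both), and the class name, method name and action strings are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RetransmitThresholds:
    r1_retransmissions: int = 3   # R1: at least 3 retransmissions
    r2_seconds: float = 100.0     # R2: at least 100 seconds

    def actions(self, retransmissions: int, elapsed: float) -> List[str]:
        """Return the actions to take for the segment being retransmitted."""
        if elapsed >= self.r2_seconds:
            return ["close_connection"]
        if retransmissions >= self.r1_retransmissions:
            # Between R1 and R2: trigger dead-gateway diagnosis and tell
            # the application about the delivery problem.
            return ["negative_advice_to_ip", "inform_application"]
        return []


# An interactive application might set R2 to "infinity" so that the user
# decides when to disconnect.
interactive = RetransmitThresholds(r2_seconds=math.inf)
```

A real implementation would also honour the application's request to suppress the "inform_application" report.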
5918 An attempt to open a TCP connection could fail with
5919 excessive retransmissions of the SYN segment or by receipt
5920 of a RST segment or an ICMP Port Unreachable. SYN
5921 retransmissions MUST be handled in the general way just
5922 described for data retransmissions, including notification
5923 of the application layer.
5925 However, the values of R1 and R2 may be different for SYN
5926 and data segments. In particular, R2 for a SYN segment MUST
5927 be set large enough to provide retransmission of the segment
5928 for at least 3 minutes. The application can close the
connection (i.e., give up on the open attempt) sooner, of
course.
DISCUSSION:
Some Internet paths have significant setup times, and
the number of such paths is likely to increase in the
future.
5937 4.2.3.6 TCP Keep-Alives
5939 Implementors MAY include "keep-alives" in their TCP
5940 implementations, although this practice is not universally
5941 accepted. If keep-alives are included, the application MUST
5942 be able to turn them on or off for each TCP connection, and
5943 they MUST default to off.
5945 Keep-alive packets MUST only be sent when no data or
5946 acknowledgement packets have been received for the
5947 connection within an interval. This interval MUST be
5948 configurable and MUST default to no less than two hours.
5950 It is extremely important to remember that ACK segments that
5951 contain no data are not reliably transmitted by TCP.
5952 Consequently, if a keep-alive mechanism is implemented it
5953 MUST NOT interpret failure to respond to any specific probe
5954 as a dead connection.
5966 An implementation SHOULD send a keep-alive segment with no
5967 data; however, it MAY be configurable to send a keep-alive
5968 segment containing one garbage octet, for compatibility with
5969 erroneous TCP implementations.
DISCUSSION:
A "keep-alive" mechanism periodically probes the other
5973 end of a connection when the connection is otherwise
5974 idle, even when there is no data to be sent. The TCP
5975 specification does not include a keep-alive mechanism
5976 because it could: (1) cause perfectly good connections
5977 to break during transient Internet failures; (2)
5978 consume unnecessary bandwidth ("if no one is using the
5979 connection, who cares if it is still good?"); and (3)
cost money for an Internet path that charges for
packets.
5983 Some TCP implementations, however, have included a
5984 keep-alive mechanism. To confirm that an idle
5985 connection is still active, these implementations send
5986 a probe segment designed to elicit a response from the
5987 peer TCP. Such a segment generally contains SEG.SEQ =
5988 SND.NXT-1 and may or may not contain one garbage octet
5989 of data. Note that on a quiet connection SND.NXT =
5990 RCV.NXT, so that this SEG.SEQ will be outside the
5991 window. Therefore, the probe causes the receiver to
5992 return an acknowledgment segment, confirming that the
5993 connection is still live. If the peer has dropped the
5994 connection due to a network partition or a crash, it
will respond with a RST instead of an acknowledgment
segment.
5998 Unfortunately, some misbehaved TCP implementations fail
5999 to respond to a segment with SEG.SEQ = SND.NXT-1 unless
6000 the segment contains data. Alternatively, an
6001 implementation could determine whether a peer responded
correctly to keep-alive packets with no garbage data
octet.
6005 A TCP keep-alive mechanism should only be invoked in
6006 server applications that might otherwise hang
6007 indefinitely and consume resources unnecessarily if a
client crashes or aborts a connection during a network
failure.
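On systems with a sockets-style API, a server application can request keep-alives for one connection roughly as shown below. This is a sketch, not part of the specification: SO_KEEPALIVE is the conventional per-socket switch, TCP_KEEPIDLE is a platform-specific option (present on Linux) for the idle interval and is therefore applied only where it exists, and the helper name enable_keepalive is hypothetical.

```python
import socket

TWO_HOURS = 2 * 60 * 60  # the default idle interval must be no less than this


def enable_keepalive(sock: socket.socket,
                     idle_seconds: int = TWO_HOURS) -> None:
    """Turn keep-alives on for one connection; they default to off."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The idle-interval option is not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
```

A freshly created socket reports keep-alives as disabled, which matches the requirement that they default to off.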
6025 4.2.3.7 TCP Multihoming
6027 If an application on a multihomed host does not specify the
6028 local IP address when actively opening a TCP connection,
6029 then the TCP MUST ask the IP layer to select a local IP
6030 address before sending the (first) SYN. See the function
6031 GET_SRCADDR() in Section 3.4.
6033 At all other times, a previous segment has either been sent
or received on this connection, and TCP MUST use the same
local address that was used in those previous segments.
4.2.3.8 IP Options
When received options are passed up to TCP from the IP
6041 layer, TCP MUST ignore options that it does not understand.
6043 A TCP MAY support the Time Stamp and Record Route options.
6045 An application MUST be able to specify a source route when
6046 it actively opens a TCP connection, and this MUST take
6047 precedence over a source route received in a datagram.
6049 When a TCP connection is OPENed passively and a packet
6050 arrives with a completed IP Source Route option (containing
6051 a return route), TCP MUST save the return route and use it
6052 for all segments sent on this connection. If a different
6053 source route arrives in a later segment, the later
6054 definition SHOULD override the earlier one.
6056 4.2.3.9 ICMP Messages
6058 TCP MUST act on an ICMP error message passed up from the IP
6059 layer, directing it to the connection that created the
6060 error. The necessary demultiplexing information can be
6061 found in the IP header contained within the ICMP message.
o Source Quench
TCP MUST react to a Source Quench by slowing
6066 transmission on the connection. The RECOMMENDED
6067 procedure is for a Source Quench to trigger a "slow
6068 start," as if a retransmission timeout had occurred.
6070 o Destination Unreachable -- codes 0, 1, 5
6072 Since these Unreachable messages indicate soft error
6084 conditions, TCP MUST NOT abort the connection, and it
SHOULD make the information available to the
application.
DISCUSSION:
TCP could report the soft error condition directly
6090 to the application layer with an upcall to the
6091 ERROR_REPORT routine, or it could merely note the
6092 message and report it to the application only when
6093 and if the TCP connection times out.
6095 o Destination Unreachable -- codes 2-4
These are hard error conditions, so TCP SHOULD abort
the connection.
6100 o Time Exceeded -- codes 0, 1
6102 This should be handled the same way as Destination
6103 Unreachable codes 0, 1, 5 (see above).
o Parameter Problem
This should be handled the same way as Destination
6108 Unreachable codes 0, 1, 5 (see above).
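The ICMP handling rules listed above can be summarized as a lookup from message type and code to the TCP reaction. This is a sketch: the type numbers are the ICMP values for Destination Unreachable (3), Source Quench (4), Time Exceeded (11) and Parameter Problem (12), while the enum, the function name and the IGNORE default for anything not listed are assumptions made for illustration.

```python
from enum import Enum

ICMP_DEST_UNREACHABLE = 3
ICMP_SOURCE_QUENCH = 4
ICMP_TIME_EXCEEDED = 11
ICMP_PARAMETER_PROBLEM = 12


class TcpIcmpAction(Enum):
    SLOW_START = "slow start"
    REPORT_SOFT_ERROR = "report soft error, keep connection"
    ABORT = "abort connection"
    IGNORE = "ignore"  # assumed default for messages the text does not list


def tcp_icmp_action(icmp_type: int, code: int) -> TcpIcmpAction:
    """Map an ICMP error passed up from IP to the reaction described above."""
    if icmp_type == ICMP_SOURCE_QUENCH:
        return TcpIcmpAction.SLOW_START
    if icmp_type == ICMP_DEST_UNREACHABLE:
        if code in (0, 1, 5):
            return TcpIcmpAction.REPORT_SOFT_ERROR   # soft error
        if code in (2, 3, 4):
            return TcpIcmpAction.ABORT               # hard error
    if icmp_type == ICMP_TIME_EXCEEDED and code in (0, 1):
        return TcpIcmpAction.REPORT_SOFT_ERROR
    if icmp_type == ICMP_PARAMETER_PROBLEM:
        return TcpIcmpAction.REPORT_SOFT_ERROR
    return TcpIcmpAction.IGNORE
```

The connection to which the action applies is found separately, from the IP header carried inside the ICMP message.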
6111 4.2.3.10 Remote Address Validation
6113 A TCP implementation MUST reject as an error a local OPEN
call for an invalid remote IP address (e.g., a broadcast or
multicast address).
6117 An incoming SYN with an invalid source address must be
6118 ignored either by TCP or by the IP layer (see Section
6121 A TCP implementation MUST silently discard an incoming SYN
segment that is addressed to a broadcast or multicast
address.
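A remote-address check for the OPEN call might look like the following. This is a sketch limited to IPv4: it rejects multicast addresses, the limited broadcast address, and the directed broadcast address of any network the caller lists. The function name is_valid_remote and the local_networks parameter are hypothetical, and a real host would take its networks from its interface configuration.

```python
import ipaddress
from typing import Iterable

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_valid_remote(addr: str, local_networks: Iterable[str] = ()) -> bool:
    """Return False for addresses that an OPEN call must reject."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast or ip == LIMITED_BROADCAST:
        return False
    # Directed broadcast addresses of the networks this host knows about.
    for net in local_networks:
        if ip == ipaddress.IPv4Network(net).broadcast_address:
            return False
    return True
```

The same predicate could be applied to the destination address of an incoming SYN before the segment is silently discarded.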
6125 4.2.3.11 TCP Traffic Patterns
IMPLEMENTATION:
The TCP protocol specification [TCP:1] gives the
6129 implementor much freedom in designing the algorithms
6130 that control the message flow over the connection --
6131 packetizing, managing the window, sending
6143 acknowledgments, etc. These design decisions are
6144 difficult because a TCP must adapt to a wide range of
6145 traffic patterns. Experience has shown that a TCP
implementor needs to verify the design on two extreme
traffic patterns:
6149 o Single-character Segments
6151 Even if the sender is using the Nagle Algorithm,
6152 when a TCP connection carries remote login traffic
6153 across a low-delay LAN the receiver will generally
6154 get a stream of single-character segments. If
6155 remote terminal echo mode is in effect, the
6156 receiver's system will generally echo each
6157 character as it is received.
o Bulk Transfer
When TCP is used for bulk transfer, the data
6162 stream should be made up (almost) entirely of
6163 segments of the size of the effective MSS.
6164 Although TCP uses a sequence number space with
6165 byte (octet) granularity, in bulk-transfer mode
6166 its operation should be as if TCP used a sequence
6167 space that counted only segments.
6169 Experience has furthermore shown that a single TCP can
6170 effectively and efficiently handle these two extremes.
6172 The most important tool for verifying a new TCP
6173 implementation is a packet trace program. There is a
6174 large volume of experience showing the importance of
6175 tracing a variety of traffic patterns with other TCP
6176 implementations and studying the results carefully.
4.2.3.12 Efficiency
IMPLEMENTATION:
Extensive experience has led to the following
suggestions for efficient implementation of TCP:
(a) Don't Copy Data
In bulk data transfer, the primary CPU-intensive
6188 tasks are copying data from one place to another
6189 and checksumming the data. It is vital to
6190 minimize the number of copies of TCP data. Since
6202 the ultimate speed limitation may be fetching data
6203 across the memory bus, it may be useful to combine
6204 the copy with checksumming, doing both with a
6205 single memory fetch.
6207 (b) Hand-Craft the Checksum Routine
6209 A good TCP checksumming routine is typically two
6210 to five times faster than a simple and direct
6211 implementation of the definition. Great care and
6212 clever coding are often required and advisable to
6213 make the checksumming code "blazing fast". See
6216 (c) Code for the Common Case
6218 TCP protocol processing can be complicated, but
6219 for most segments there are only a few simple
6220 decisions to be made. Per-segment processing will
6221 be greatly speeded up by coding the main line to
minimize the number of decisions in the most
common case.
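For reference, the "simple and direct implementation of the definition" mentioned in item (b) can be sketched as below. This is the baseline that a hand-crafted routine would be measured against, not an optimized version. It computes the 16-bit one's complement checksum over a byte string only; the TCP pseudo-header that a real implementation must include is omitted, and the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += int.from_bytes(data[i:i + 2], "big")
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a segment by running the same routine over the data with the checksum field included; the result is zero when the data is intact.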
6226 4.2.4 TCP/APPLICATION LAYER INTERFACE
6228 4.2.4.1 Asynchronous Reports
6230 There MUST be a mechanism for reporting soft TCP error
6231 conditions to the application. Generically, we assume this
6232 takes the form of an application-supplied ERROR_REPORT
6233 routine that may be upcalled [INTRO:7] asynchronously from
6234 the transport layer:
6236 ERROR_REPORT(local connection name, reason, subreason)
6238 The precise encoding of the reason and subreason parameters
6239 is not specified here. However, the conditions that are
6240 reported asynchronously to the application MUST include:
6242 * ICMP error message arrived (see 4.2.3.9)
6244 * Excessive retransmissions (see 4.2.3.5)
6246 * Urgent pointer advance (see 4.2.2.4).
6248 However, an application program that does not want to
6249 receive such ERROR_REPORT calls SHOULD be able to
6261 effectively disable these calls.
DISCUSSION:
These error reports generally reflect soft errors that
6265 can be ignored without harm by many applications. It
6266 has been suggested that these error report calls should
6267 default to "disabled," but this is not required.
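The ERROR_REPORT upcall can be modelled as an application-supplied callback held by the connection. This is a sketch only: the Connection class, its method names and the reason strings are hypothetical, since the text deliberately leaves the encoding of the reason and subreason parameters unspecified.

```python
from typing import Callable, Optional

# ERROR_REPORT(local connection name, reason, subreason)
ErrorReport = Callable[[str, str, str], None]


class Connection:
    def __init__(self, name: str,
                 error_report: Optional[ErrorReport] = None) -> None:
        self.name = name
        self._error_report = error_report

    def disable_error_reports(self) -> None:
        """Let the application opt out of further ERROR_REPORT upcalls."""
        self._error_report = None

    def soft_error(self, reason: str, subreason: str = "") -> None:
        """Called by the transport layer when a soft error occurs."""
        if self._error_report is not None:
            self._error_report(self.name, reason, subreason)
```

The transport layer would call soft_error() on arrival of an ICMP error, on excessive retransmissions, and when the urgent pointer advances.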
6269 4.2.4.2 Type-of-Service
6271 The application layer MUST be able to specify the Type-of-
6272 Service (TOS) for segments that are sent on a connection.
It is not required, but the application SHOULD be able to
6274 change the TOS during the connection lifetime. TCP SHOULD
6275 pass the current TOS value without change to the IP layer,
6276 when it sends segments on the connection.
6278 The TOS will be specified independently in each direction on
6279 the connection, so that the receiver application will
6280 specify the TOS used for ACK segments.
TCP MAY pass the most recently received TOS up to the
application.
DISCUSSION:
Some applications (e.g., SMTP) change the nature of
6287 their communication during the lifetime of a
connection, and therefore would like to change the TOS
specification.
6291 Note also that the OPEN call specified in RFC-793
6292 includes a parameter ("options") in which the caller
6293 can specify IP options such as source route, record
6294 route, or timestamp.
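With a sockets-style API, an application can typically set or change the TOS for a connection through the IP_TOS socket option, as sketched below. This assumes a platform that exposes IP_TOS for TCP sockets; the helper name set_tos is hypothetical, and the value 0x10 in the usage note is the "low delay" TOS bit used purely as an example.

```python
import socket


def set_tos(sock: socket.socket, tos: int) -> None:
    """Set the Type-of-Service value used for segments sent on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
```

For example, set_tos(sock, 0x10) could be called when an application switches from a bulk phase to an interactive phase, and called again with a different value later in the connection lifetime.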
4.2.4.3 Flush Call
Some TCP implementations have included a FLUSH call, which
6299 will empty the TCP send queue of any data for which the user
6300 has issued SEND calls but which is still to the right of the
6301 current send window. That is, it flushes as much queued
6302 send data as possible without losing sequence number
6303 synchronization. This is useful for implementing the "abort
6304 output" function of Telnet.
4.2.4.4 Multihoming
The user interface outlined in sections 2.7 and 3.8 of RFC-
6323 793 needs to be extended for multihoming. The OPEN call
6324 MUST have an optional parameter:
6326 OPEN( ... [local IP address,] ... )
6328 to allow the specification of the local IP address.
DISCUSSION:
Some TCP-based applications need to specify the local
IP address to be used to open a particular connection;
FTP is an example.
IMPLEMENTATION:
A passive OPEN call with a specified "local IP address"
6337 parameter will await an incoming connection request to
6338 that address. If the parameter is unspecified, a
6339 passive OPEN will await an incoming connection request
6340 to any local IP address, and then bind the local IP
address of the connection to the particular address
that is used.
6344 For an active OPEN call, a specified "local IP address"
6345 parameter will be used for opening the connection. If
6346 the parameter is unspecified, the networking software
6347 will choose an appropriate local IP address (see
Section 3.3.4.2) for the connection.
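In a sockets-style API, the optional local IP address of an active OPEN corresponds to binding the socket before connecting. The sketch below assumes IPv4 and such an API; the function name active_open is hypothetical. When local_ip is omitted, the networking software chooses the local address itself.

```python
import socket
from typing import Optional, Tuple


def active_open(remote: Tuple[str, int],
                local_ip: Optional[str] = None) -> socket.socket:
    """Actively open a TCP connection, optionally from a given local address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if local_ip is not None:
            # Port 0 lets the system pick the local port.
            sock.bind((local_ip, 0))
        sock.connect(remote)
    except OSError:
        sock.close()
        raise
    return sock
```

A passive OPEN behaves analogously: binding the listening socket to one address restricts it to connection requests for that address, while binding to the wildcard address accepts requests to any local address.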
6350 4.2.5 TCP REQUIREMENT SUMMARY
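The six columns after SECTION are, left to right: MUST, SHOULD, MAY, SHOULD NOT, MUST NOT, and Footnote.
FEATURE |SECTION |MUST|SHOULD|MAY|SHOULD NOT|MUST NOT|Footnote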
6362 -------------------------------------------------|--------|-|-|-|-|-|--
6364 Push flag | | | | | | |
6365 Aggregate or queue un-pushed data |4.2.2.2 | | |x| | |
6366 Sender collapse successive PSH flags |4.2.2.2 | |x| | | |
6367 SEND call can specify PUSH |4.2.2.2 | | |x| | |
6379 If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x|
6380 If cannot: PSH last segment |4.2.2.2 |x| | | | |
6381 Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1
6382 Send max size segment when possible |4.2.2.2 | |x| | | |
6384 Window | | | | | | |
6385 Treat as unsigned number |4.2.2.3 |x| | | | |
6386 Handle as 32-bit number |4.2.2.3 | |x| | | |
6387 Shrink window from right |4.2.2.16| | | |x| |
6388 Robust against shrinking window |4.2.2.16|x| | | | |
6389 Receiver's window closed indefinitely |4.2.2.17| | |x| | |
6390 Sender probe zero window |4.2.2.17|x| | | | |
6391 First probe after RTO |4.2.2.17| |x| | | |
6392 Exponential backoff |4.2.2.17| |x| | | |
6393 Allow window stay zero indefinitely |4.2.2.17|x| | | | |
6394 Sender timeout OK conn with zero wind |4.2.2.17| | | | |x|
6396 Urgent Data | | | | | | |
6397 Pointer points to last octet |4.2.2.4 |x| | | | |
6398 Arbitrary length urgent data sequence |4.2.2.4 |x| | | | |
6399 Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1
6400 ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1
6402 TCP Options | | | | | | |
6403 Receive TCP option in any segment |4.2.2.5 |x| | | | |
6404 Ignore unsupported options |4.2.2.5 |x| | | | |
6405 Cope with illegal option length |4.2.2.5 |x| | | | |
6406 Implement sending & receiving MSS option |4.2.2.6 |x| | | | |
6407 Send MSS option unless 536 |4.2.2.6 | |x| | | |
6408 Send MSS option always |4.2.2.6 | | |x| | |
6409 Send-MSS default is 536 |4.2.2.6 |x| | | | |
6410 Calculate effective send seg size |4.2.2.6 |x| | | | |
6412 TCP Checksums | | | | | | |
6413 Sender compute checksum |4.2.2.7 |x| | | | |
6414 Receiver check checksum |4.2.2.7 |x| | | | |
6416 Use clock-driven ISN selection |4.2.2.9 |x| | | | |
6418 Opening Connections | | | | | | |
6419 Support simultaneous open attempts |4.2.2.10|x| | | | |
6420 SYN-RCVD remembers last state |4.2.2.11|x| | | | |
6421 Passive Open call interfere with others |4.2.2.18| | | | |x|
6422 Function: simultan. LISTENs for same port |4.2.2.18|x| | | | |
6423 Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | |
6424 Otherwise, use local addr of conn. |4.2.3.7 |x| | | | |
OPEN to broadcast/multicast IP Address |4.2.3.10| | | | |x|
Silently discard seg to bcast/mcast addr |4.2.3.10|x| | | | |
6439 Closing Connections | | | | | | |
6440 RST can contain data |4.2.2.12| |x| | | |
6441 Inform application of aborted conn |4.2.2.13|x| | | | |
6442 Half-duplex close connections |4.2.2.13| | |x| | |
6443 Send RST to indicate data lost |4.2.2.13| |x| | | |
6444 In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | |
6445 Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | |
6447 Retransmissions | | | | | | |
6448 Jacobson Slow Start algorithm |4.2.2.15|x| | | | |
6449 Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | |
6450 Retransmit with same IP ident |4.2.2.15| | |x| | |
6451 Karn's algorithm |4.2.3.1 |x| | | | |
6452 Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | |
6453 Exponential backoff |4.2.3.1 |x| | | | |
6454 SYN RTO calc same as data |4.2.3.1 | |x| | | |
6455 Recommended initial values and bounds |4.2.3.1 | |x| | | |
6457 Generating ACK's: | | | | | | |
6458 Queue out-of-order segments |4.2.2.20| |x| | | |
6459 Process all Q'd before send ACK |4.2.2.20|x| | | | |
6460 Send ACK for out-of-order segment |4.2.2.21| | |x| | |
6461 Delayed ACK's |4.2.3.2 | |x| | | |
6462 Delay < 0.5 seconds |4.2.3.2 |x| | | | |
Every 2nd full-sized segment ACK'd |4.2.3.2 | |x| | | |
6464 Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | |
6466 Sending data | | | | | | |
6467 Configurable TTL |4.2.2.19|x| | | | |
6468 Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | |
6469 Nagle algorithm |4.2.3.4 | |x| | | |
6470 Application can disable Nagle algorithm |4.2.3.4 |x| | | | |
6472 Connection Failures: | | | | | | |
6473 Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | |
6474 Close connection on R2 retxs |4.2.3.5 |x| | | | |
6475 ALP can set R2 |4.2.3.5 |x| | | | |1
6476 Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1
6477 Recommended values for R1, R2 |4.2.3.5 | |x| | | |
6478 Same mechanism for SYNs |4.2.3.5 |x| | | | |
6479 R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | |
6481 Send Keep-alive Packets: |4.2.3.6 | | |x| | |
6482 - Application can request |4.2.3.6 |x| | | | |
6483 - Default is "off" |4.2.3.6 |x| | | | |
6484 - Only send if idle for interval |4.2.3.6 |x| | | | |
6485 - Interval configurable |4.2.3.6 |x| | | | |
6497 - Default at least 2 hrs. |4.2.3.6 |x| | | | |
6498 - Tolerant of lost ACK's |4.2.3.6 |x| | | | |
6500 IP Options | | | | | | |
6501 Ignore options TCP doesn't understand |4.2.3.8 |x| | | | |
6502 Time Stamp support |4.2.3.8 | | |x| | |
6503 Record Route support |4.2.3.8 | | |x| | |
6504 Source Route: | | | | | | |
6505 ALP can specify |4.2.3.8 |x| | | | |1
6506 Overrides src rt in datagram |4.2.3.8 |x| | | | |
6507 Build return route from src rt |4.2.3.8 |x| | | | |
6508 Later src route overrides |4.2.3.8 | |x| | | |
6510 Receiving ICMP Messages from IP |4.2.3.9 |x| | | | |
6511 Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | |
6512 Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x|
6513 Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | |
6514 Source Quench => slow start |4.2.3.9 | |x| | | |
6515 Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | |
6516 Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | |
6518 Address Validation | | | | | | |
6519 Reject OPEN call to invalid IP address |4.2.3.10|x| | | | |
6520 Reject SYN from invalid IP address |4.2.3.10|x| | | | |
6521 Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | |
6523 TCP/ALP Interface Services | | | | | | |
6524 Error Report mechanism |4.2.4.1 |x| | | | |
6525 ALP can disable Error Report Routine |4.2.4.1 | |x| | | |
6526 ALP can specify TOS for sending |4.2.4.2 |x| | | | |
6527 Passed unchanged to IP |4.2.4.2 | |x| | | |
6528 ALP can change TOS during connection |4.2.4.2 | |x| | | |
6529 Pass received TOS up to ALP |4.2.4.2 | | |x| | |
6530 FLUSH call |4.2.4.3 | | |x| | |
6531 Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | |
6532 -------------------------------------------------|--------|-|-|-|-|-|--
FOOTNOTES:
(1) "ALP" means Application-Layer program.
6558 INTRODUCTORY REFERENCES
6561 [INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
6562 IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
6565 [INTRO:2] "Requirements for Internet Gateways," R. Braden and J.
6566 Postel, RFC-1009, June 1987.
6568 [INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006,
6569 (three volumes), SRI International, December 1985.
6571 [INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel,
6574 This document is republished periodically with new RFC numbers; the
6575 latest version must be used.
6577 [INTRO:5] "Protocol Document Order Information," O. Jacobsen and J.
6578 Postel, RFC-980, March 1986.
6580 [INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May
6583 This document is republished periodically with new RFC numbers; the
6584 latest version must be used.
6586 [INTRO:7] "Modularity and Efficiency in Protocol Implementations," D.
6587 Clark, RFC-817, July 1982.
6589 [INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM
6590 SOSP, Orcas Island, Washington, December 1985.
6593 Secondary References:
6596 [INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf
6597 and R. Kahn, IEEE Transactions on Communication, May 1974.
6599 [INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.
6600 Cohen, Computer Networks, Vol. 5, No. 4, July 1981.
6602 [INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
6603 R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
6615 March 1985. Also in: IEEE Communications Magazine, March 1985.
6616 Also available as ISI-RS-85-153.
6618 [INTRO:12] "Final Text of DIS8473, Protocol for Providing the
6619 Connectionless Mode Network Service," ANSI, published as RFC-994,
6622 [INTRO:13] "End System to Intermediate System Routing Exchange
6623 Protocol," ANSI X3S3.3, published as RFC-995, April 1986.
6626 LINK LAYER REFERENCES
6629 [LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
6632 [LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826,
6635 [LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet
6636 Networks," C. Hornig, RFC-894, April 1984.
6638 [LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802
Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.
6641 This RFC contains a great deal of information of importance to
6642 Internet implementers planning to use IEEE 802 networks.
IP LAYER REFERENCES
[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.
6650 [IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
6653 [IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel,
6654 RFC-950, August 1985.
6656 [IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
6659 [IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
6660 of Defense, August 1983.
6662 This specification, as amended by RFC-963, is intended to describe
6674 the Internet Protocol but has some serious omissions (e.g., the
6675 mandatory subnet extension [IP:3] and the optional multicasting
6676 extension [IP:4]). It is also out of date. If there is a
6677 conflict, RFC-791, RFC-792, and RFC-950 must be taken as
authoritative, while the present document is authoritative over
both.
6681 [IP:6] "Some Problems with the Specification of the Military Standard
6682 Internet Protocol," D. Sidhu, RFC-963, November 1985.
6684 [IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
6685 RFC-879, November 1983.
6687 Discusses and clarifies the relationship between the TCP Maximum
6688 Segment Size option and the IP datagram size.
6690 [IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108,
6693 [IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
6694 SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol.
6697 This useful paper discusses the problems created by Internet
6698 fragmentation and presents alternative solutions.
6700 [IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
6703 This and the following paper should be read by every implementor.
6705 [IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.
6707 SECONDARY IP REFERENCES:
6710 [IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.
6711 Mogul, RFC-922, October 1984.
6713 [IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
6716 [IP:14] "Something a Host Could Do with Source Quench: The Source Quench
6717 Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July
6720 This RFC first described directed broadcast addresses. However,
6721 the bulk of the RFC is concerned with gateways, not hosts.
6736 [UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.
6742 [TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
6746 [TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
6747 Defense, August 1984.
6749 This specification as amended by RFC-964 is intended to describe
6750 the same protocol as RFC-793 [TCP:1]. If there is a conflict,
6751 RFC-793 takes precedence, and the present document is authoritative
6755 [TCP:3] "Some Problems with the Specification of the Military Standard
6756 Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
6760 [TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
6761 RFC-879, November 1983.
6764 [TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
6768 [TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM
6769 SIGCOMM-87, August 1987.
6772 [TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88,
6776 SECONDARY TCP REFERENCES:
6779 [TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
6780 Clark, RFC-817, July 1982.
6792 [TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.
6795 [TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C.
6796 Partridge, RFC-1071, September 1988.
6799 [TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden,
6800 RFC-1072, October 1988.
6803 Security Considerations
6805 There are many security issues in the communication layers of host
6806 software, but a full discussion is beyond the scope of this RFC.
6808 The Internet architecture generally provides little protection
6809 against spoofing of IP source addresses, so any security mechanism
6810 that is based upon verifying the IP source address of a datagram
6811 should be treated with suspicion. However, in restricted
6812 environments some source-address checking may be possible. For
6813 example, there might be a secure LAN whose gateway to the rest of the
6814 Internet discarded any incoming datagram with a source address that
6815 spoofed the LAN address. In this case, a host on the LAN could use
6816 the source address to test for local vs. remote source. This problem
6817 is complicated by source routing, and some have suggested that
6818 source-routed datagram forwarding by hosts (see Section 3.3.5) should
6819 be outlawed for security reasons.
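
The secure-LAN example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix 192.0.2.0/24 and the function names gateway_accepts and host_sees_local_source are hypothetical and do not come from this RFC, and the sketch ignores the source-routing complication mentioned above.

```python
import ipaddress

# Hypothetical prefix for the secure LAN (a documentation range, not from the RFC).
LAN_PREFIX = ipaddress.ip_network("192.0.2.0/24")


def gateway_accepts(src, arrived_from_outside):
    """Gateway-side check: discard a datagram that arrives from the
    outside yet claims a source address inside the LAN prefix."""
    source = ipaddress.ip_address(src)
    if arrived_from_outside and source in LAN_PREFIX:
        return False  # spoofed LAN source address; discard
    return True


def host_sees_local_source(src):
    """Host-side test for local vs. remote source. It is meaningful
    only if the gateway applies the filter above."""
    return ipaddress.ip_address(src) in LAN_PREFIX
```

A host on such a LAN would call host_sees_local_source() on the source address of a received datagram. The result is only as trustworthy as the gateway's filtering.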
6821 Security-related issues are mentioned in sections concerning the IP
6822 Security option (Section 3.2.1.8), the ICMP Parameter Problem message
6823 (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
6824 reserved TCP ports (Section 4.2.2.1).
6829 USC/Information Sciences Institute
6831 Marina del Rey, CA 90292-6695
6833 Phone: (213) 822 1511
6835 EMail: Braden@ISI.EDU