   1
   2
   3
   4
   5
   6
   7 Network Working Group                    Internet Engineering Task Force
   8 Request for Comments: 1122                             R. Braden, Editor
   9                                                             October 1989
  10
  11
  12         Requirements for Internet Hosts -- Communication Layers
  13
  14
  15 Status of This Memo
  16
  17    This RFC is an official specification for the Internet community.  It
  18    incorporates by reference, amends, corrects, and supplements the
  19    primary protocol standards documents relating to hosts.  Distribution
  20    of this document is unlimited.
  21
  22 Summary
  23
  24    This is one RFC of a pair that defines and discusses the requirements
  25    for Internet host software.  This RFC covers the communications
  26    protocol layers: link layer, IP layer, and transport layer; its
  27    companion RFC-1123 covers the application and support protocols.
  28
  29
  30
  31                            Table of Contents
  32
  33
  34
  35
  36    1.  INTRODUCTION ...............................................    5
  37       1.1  The Internet Architecture ..............................    6
  38          1.1.1  Internet Hosts ....................................    6
  39          1.1.2  Architectural Assumptions .........................    7
  40          1.1.3  Internet Protocol Suite ...........................    8
  41          1.1.4  Embedded Gateway Code .............................   10
  42       1.2  General Considerations .................................   12
  43          1.2.1  Continuing Internet Evolution .....................   12
  44          1.2.2  Robustness Principle ..............................   12
  45          1.2.3  Error Logging .....................................   13
  46          1.2.4  Configuration .....................................   14
  47       1.3  Reading this Document ..................................   15
  48          1.3.1  Organization ......................................   15
  49          1.3.2  Requirements ......................................   16
  50          1.3.3  Terminology .......................................   17
  51       1.4  Acknowledgments ........................................   20
  52
  53    2. LINK LAYER ..................................................   21
  54       2.1  INTRODUCTION ...........................................   21
  55
  56
  57
  58 Internet Engineering Task Force                                 [Page 1]
  59 \f
  60
  61
  62
  63 RFC1122                       INTRODUCTION                  October 1989
  64
  65
  66       2.2  PROTOCOL WALK-THROUGH ..................................   21
  67       2.3  SPECIFIC ISSUES ........................................   21
  68          2.3.1  Trailer Protocol Negotiation ......................   21
  69          2.3.2  Address Resolution Protocol -- ARP ................   22
  70             2.3.2.1  ARP Cache Validation .........................   22
  71             2.3.2.2  ARP Packet Queue .............................   24
  72          2.3.3  Ethernet and IEEE 802 Encapsulation ...............   24
  73       2.4  LINK/INTERNET LAYER INTERFACE ..........................   25
  74       2.5  LINK LAYER REQUIREMENTS SUMMARY ........................   26
  75
  76    3. INTERNET LAYER PROTOCOLS ....................................   27
  77       3.1 INTRODUCTION ............................................   27
  78       3.2  PROTOCOL WALK-THROUGH ..................................   29
  79          3.2.1 Internet Protocol -- IP ............................   29
  80             3.2.1.1  Version Number ...............................   29
  81             3.2.1.2  Checksum .....................................   29
  82             3.2.1.3  Addressing ...................................   29
  83             3.2.1.4  Fragmentation and Reassembly .................   32
  84             3.2.1.5  Identification ...............................   32
  85             3.2.1.6  Type-of-Service ..............................   33
  86             3.2.1.7  Time-to-Live .................................   34
  87             3.2.1.8  Options ......................................   35
  88          3.2.2 Internet Control Message Protocol -- ICMP ..........   38
  89             3.2.2.1  Destination Unreachable ......................   39
  90             3.2.2.2  Redirect .....................................   40
  91             3.2.2.3  Source Quench ................................   41
  92             3.2.2.4  Time Exceeded ................................   41
  93             3.2.2.5  Parameter Problem ............................   42
  94             3.2.2.6  Echo Request/Reply ...........................   42
  95             3.2.2.7  Information Request/Reply ....................   43
  96             3.2.2.8  Timestamp and Timestamp Reply ................   43
  97             3.2.2.9  Address Mask Request/Reply ...................   45
  98          3.2.3  Internet Group Management Protocol -- IGMP ........   47
  99       3.3  SPECIFIC ISSUES ........................................   47
 100          3.3.1  Routing Outbound Datagrams ........................   47
 101             3.3.1.1  Local/Remote Decision ........................   47
 102             3.3.1.2  Gateway Selection ............................   48
 103             3.3.1.3  Route Cache ..................................   49
 104             3.3.1.4  Dead Gateway Detection .......................   51
 105             3.3.1.5  New Gateway Selection ........................   55
 106             3.3.1.6  Initialization ...............................   56
 107          3.3.2  Reassembly ........................................   56
 108          3.3.3  Fragmentation .....................................   58
 109          3.3.4  Local Multihoming .................................   60
 110             3.3.4.1  Introduction .................................   60
 111             3.3.4.2  Multihoming Requirements .....................   61
 112             3.3.4.3  Choosing a Source Address ....................   64
 113          3.3.5  Source Route Forwarding ...........................   65
 125          3.3.6  Broadcasts ........................................   66
 126          3.3.7  IP Multicasting ...................................   67
 127          3.3.8  Error Reporting ...................................   69
 128       3.4  INTERNET/TRANSPORT LAYER INTERFACE .....................   69
 129       3.5  INTERNET LAYER REQUIREMENTS SUMMARY ....................   72
 130
 131    4. TRANSPORT PROTOCOLS .........................................   77
 132       4.1  USER DATAGRAM PROTOCOL -- UDP ..........................   77
 133          4.1.1  INTRODUCTION ......................................   77
 134          4.1.2  PROTOCOL WALK-THROUGH .............................   77
 135          4.1.3  SPECIFIC ISSUES ...................................   77
 136             4.1.3.1  Ports ........................................   77
 137             4.1.3.2  IP Options ...................................   77
 138             4.1.3.3  ICMP Messages ................................   78
 139             4.1.3.4  UDP Checksums ................................   78
 140             4.1.3.5  UDP Multihoming ..............................   79
 141             4.1.3.6  Invalid Addresses ............................   79
 142          4.1.4  UDP/APPLICATION LAYER INTERFACE ...................   79
 143          4.1.5  UDP REQUIREMENTS SUMMARY ..........................   80
 144       4.2  TRANSMISSION CONTROL PROTOCOL -- TCP ...................   82
 145          4.2.1  INTRODUCTION ......................................   82
 146          4.2.2  PROTOCOL WALK-THROUGH .............................   82
 147             4.2.2.1  Well-Known Ports .............................   82
 148             4.2.2.2  Use of Push ..................................   82
 149             4.2.2.3  Window Size ..................................   83
 150             4.2.2.4  Urgent Pointer ...............................   84
 151             4.2.2.5  TCP Options ..................................   85
 152             4.2.2.6  Maximum Segment Size Option ..................   85
 153             4.2.2.7  TCP Checksum .................................   86
 154             4.2.2.8  TCP Connection State Diagram .................   86
 155             4.2.2.9  Initial Sequence Number Selection ............   87
 156             4.2.2.10  Simultaneous Open Attempts ..................   87
 157             4.2.2.11  Recovery from Old Duplicate SYN .............   87
 158             4.2.2.12  RST Segment .................................   87
 159             4.2.2.13  Closing a Connection ........................   87
 160             4.2.2.14  Data Communication ..........................   89
 161             4.2.2.15  Retransmission Timeout ......................   90
 162             4.2.2.16  Managing the Window .........................   91
 163             4.2.2.17  Probing Zero Windows ........................   92
 164             4.2.2.18  Passive OPEN Calls ..........................   92
 165             4.2.2.19  Time to Live ................................   93
 166             4.2.2.20  Event Processing ............................   93
 167             4.2.2.21  Acknowledging Queued Segments ...............   94
 168          4.2.3  SPECIFIC ISSUES ...................................   95
 169             4.2.3.1  Retransmission Timeout Calculation ...........   95
 170             4.2.3.2  When to Send an ACK Segment ..................   96
 171             4.2.3.3  When to Send a Window Update .................   97
 172             4.2.3.4  When to Send Data ............................   98
 184             4.2.3.5  TCP Connection Failures ......................  100
 185             4.2.3.6  TCP Keep-Alives ..............................  101
 186             4.2.3.7  TCP Multihoming ..............................  103
 187             4.2.3.8  IP Options ...................................  103
 188             4.2.3.9  ICMP Messages ................................  103
 189             4.2.3.10  Remote Address Validation ...................  104
 190             4.2.3.11  TCP Traffic Patterns ........................  104
 191             4.2.3.12  Efficiency ..................................  105
 192          4.2.4  TCP/APPLICATION LAYER INTERFACE ...................  106
 193             4.2.4.1  Asynchronous Reports .........................  106
 194             4.2.4.2  Type-of-Service ..............................  107
 195             4.2.4.3  Flush Call ...................................  107
 196             4.2.4.4  Multihoming ..................................  108
 197          4.2.5  TCP REQUIREMENT SUMMARY ...........................  108
 198
 199    5.  REFERENCES .................................................  112
 200
 242
 243 1.  INTRODUCTION
 244
 245    This document is one of a pair that defines and discusses the
 246    requirements for host system implementations of the Internet protocol
 247    suite.  This RFC covers the communication protocol layers:  link
 248    layer, IP layer, and transport layer.  Its companion RFC,
 249    "Requirements for Internet Hosts -- Application and Support"
 250    [INTRO:1], covers the application layer protocols.  This document
 251    should also be read in conjunction with "Requirements for Internet
 252    Gateways" [INTRO:2].
 253
 254    These documents are intended to provide guidance for vendors,
 255    implementors, and users of Internet communication software.  They
 256    represent the consensus of a large body of technical experience and
 257    wisdom, contributed by the members of the Internet research and
 258    vendor communities.
 259
 260    This RFC enumerates standard protocols that a host connected to the
 261    Internet must use, and it incorporates by reference the RFCs and
 262    other documents describing the current specifications for these
 263    protocols.  It corrects errors in the referenced documents and adds
 264    additional discussion and guidance for an implementor.
 265
 266    For each protocol, this document also contains an explicit set of
 267    requirements, recommendations, and options.  The reader must
 268    understand that the list of requirements in this document is
 269    incomplete by itself; the complete set of requirements for an
 270    Internet host is primarily defined in the standard protocol
 271    specification documents, with the corrections, amendments, and
 272    supplements contained in this RFC.
 273
 274    A good-faith implementation of the protocols that was produced after
 275    careful reading of the RFCs and with some interaction with the
 276    Internet technical community, and that followed good communications
 277    software engineering practices, should differ from the requirements
 278    of this document in only minor ways.  Thus, in many cases, the
 279    "requirements" in this RFC are already stated or implied in the
 280    standard protocol documents, so that their inclusion here is, in a
 281    sense, redundant.  However, they were included because some past
 282    implementation has made the wrong choice, causing problems of
 283    interoperability, performance, and/or robustness.
 284
 285    This document includes discussion and explanation of many of the
 286    requirements and recommendations.  A simple list of requirements
 287    would be dangerous, because:
 288
 289    o    Some required features are more important than others, and some
 290         features are optional.
 291
 302    o    There may be valid reasons why particular vendor products that
 303         are designed for restricted contexts might choose to use
 304         different specifications.
 305
 306    However, the specifications of this document must be followed to meet
 307    the general goal of arbitrary host interoperation across the
 308    diversity and complexity of the Internet system.  Although most
 309    current implementations fail to meet these requirements in various
 310    ways, some minor and some major, this specification is the ideal
 311    towards which we need to move.
 312
 313    These requirements are based on the current level of Internet
 314    architecture.  This document will be updated as required to provide
 315    additional clarifications or to include additional information in
 316    those areas in which specifications are still evolving.
 317
 318    This introductory section begins with a brief overview of the
 319    Internet architecture as it relates to hosts, and then gives some
 320    general advice to host software vendors.  Finally, there is some
 321    guidance on reading the rest of the document and some terminology.
 322
 323    1.1  The Internet Architecture
 324
 325       General background and discussion on the Internet architecture and
 326       supporting protocol suite can be found in the DDN Protocol
 327       Handbook [INTRO:3]; for further background see, for example,
 328       [INTRO:9], [INTRO:10], and [INTRO:11].  Reference [INTRO:5]
 329       describes the procedure for obtaining Internet protocol
 330       documents, while [INTRO:6] contains a list of the numbers
 331       assigned within Internet protocols.
 332
 333       1.1.1  Internet Hosts
 334
 335          A host computer, or simply "host," is the ultimate consumer of
 336          communication services.  A host generally executes application
 337          programs on behalf of user(s), employing network and/or
 338          Internet communication services in support of this function.
 339          An Internet host corresponds to the concept of an "End-System"
 340          used in the OSI protocol suite [INTRO:13].
 341
 342          An Internet communication system consists of interconnected
 343          packet networks supporting communication among host computers
 344          using the Internet protocols.  The networks are interconnected
 345          using packet-switching computers called "gateways" or "IP
 346          routers" by the Internet community, and "Intermediate Systems"
 347          by the OSI world [INTRO:13].  The RFC "Requirements for
 348          Internet Gateways" [INTRO:2] contains the official
 349          specifications for Internet gateways.  That RFC together with
 361          the present document and its companion [INTRO:1] define the
 362          rules for the current realization of the Internet architecture.
 363
 364          Internet hosts span a wide range of size, speed, and function.
 365          They range in size from small microprocessors through
 366          workstations to mainframes and supercomputers.  In function,
 367          they range from single-purpose hosts (such as terminal servers)
 368          to full-service hosts that support a variety of online network
 369          services, typically including remote login, file transfer, and
 370          electronic mail.
 371
 372          A host is generally said to be multihomed if it has more than
 373          one interface to the same or to different networks.  See
 374          Section 1.3.3 on "Terminology".
 375
 376       1.1.2  Architectural Assumptions
 377
 378          The current Internet architecture is based on a set of
 379          assumptions about the communication system.  The assumptions
 380          most relevant to hosts are as follows:
 381
 382          (a)  The Internet is a network of networks.
 383
 384               Each host is directly connected to some particular
 385               network(s); its connection to the Internet is only
 386               conceptual.  Two hosts on the same network communicate
 387               with each other using the same set of protocols that they
 388               would use to communicate with hosts on distant networks.
 389
 390          (b)  Gateways don't keep connection state information.
 391
 392               To improve robustness of the communication system,
 393               gateways are designed to be stateless, forwarding each IP
 394               datagram independently of other datagrams.  As a result,
 395               redundant paths can be exploited to provide robust service
 396               in spite of failures of intervening gateways and networks.
 397
 398               All state information required for end-to-end flow control
 399               and reliability is implemented in the hosts, in the
 400               transport layer or in application programs.  All
 401               connection control information is thus co-located with the
 402               end points of the communication, so it will be lost only
 403               if an end point fails.
 404
 405          (c)  Routing complexity should be in the gateways.
 406
 407               Routing is a complex and difficult problem, and ought to
 408               be performed by the gateways, not the hosts.  An important
 420               objective is to insulate host software from changes caused
 421               by the inevitable evolution of the Internet routing
 422               architecture.
 423
 424          (d)  The system must tolerate wide network variation.
 425
 426               A basic objective of the Internet design is to tolerate a
 427               wide range of network characteristics -- e.g., bandwidth,
 428               delay, packet loss, packet reordering, and maximum packet
 429               size.  Another objective is robustness against failure of
 430               individual networks, gateways, and hosts, using whatever
 431               bandwidth is still available.  Finally, the goal is full
 432               "open system interconnection": an Internet host must be
 433               able to interoperate robustly and effectively with any
 434               other Internet host, across diverse Internet paths.
 435
 436               Sometimes host implementors have designed for less
 437               ambitious goals.  For example, the LAN environment is
 438               typically much more benign than the Internet as a whole;
 439               LANs have low packet loss and delay and do not reorder
 440               packets.  Some vendors have fielded host implementations
 441               that are adequate for a simple LAN environment, but work
 442               badly for general interoperation.  The vendor justifies
 443               such a product as being economical within the restricted
 444               LAN market.  However, isolated LANs seldom stay isolated
 445               for long; they are soon gatewayed to each other, to
 446               organization-wide internets, and eventually to the global
 447               Internet system.  In the end, neither the customer nor the
 448               vendor is served by incomplete or substandard Internet
 449               host software.
 450
 451               The requirements spelled out in this document are designed
 452               for a full-function Internet host, capable of full
 453               interoperation over an arbitrary Internet path.
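
Assumption (b) above can be illustrated with a minimal sketch of a stateless forwarding decision. Everything named here (Datagram, ROUTES, forward, the next-hop labels) is hypothetical, and the longest-prefix lookup is an assumption of the sketch rather than something this document specifies. The point is only that each datagram is handled using its own header and the routing table, with no per-connection memory.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    src: str
    dst: str
    ttl: int
    payload: bytes


# Hypothetical routing table: (destination prefix, next-hop label).
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "gw-a"),
    (ipaddress.ip_network("10.1.2.0/24"), "gw-b"),
    (ipaddress.ip_network("0.0.0.0/0"), "gw-default"),
]


def forward(dgram, routes=ROUTES):
    """Return (next_hop, datagram with TTL decremented), or None to drop.

    The decision reads only the datagram header and the routing table.
    No connection state is consulted or stored, so two copies of the
    same datagram always get the same answer.
    """
    if dgram.ttl <= 1:
        return None  # TTL would reach zero: discard
    dst = ipaddress.ip_address(dgram.dst)
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        return None  # no route to the destination
    # Assumption of this sketch: prefer the most specific matching prefix.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop, Datagram(dgram.src, dgram.dst, dgram.ttl - 1, dgram.payload)
```

Because forward() keeps nothing between calls, a second gateway holding the same table could take over in the middle of a conversation without either end point noticing, which is the robustness property the assumption is after.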
 454
 455
 456       1.1.3  Internet Protocol Suite
 457
 458          To communicate using the Internet system, a host must implement
 459          the layered set of protocols comprising the Internet protocol
 460          suite.  A host typically must implement at least one protocol
 461          from each layer.
 462
 463          The protocol layers used in the Internet architecture are as
 464          follows [INTRO:4]:
 465
 466
 467          o  Application Layer
 468
 479               The application layer is the top layer of the Internet
 480               protocol suite.  The Internet suite does not further
 481               subdivide the application layer, although some of the
 482               Internet application layer protocols do contain some
 483               internal sub-layering.  The application layer of the
 484               Internet suite essentially combines the functions of the
 485               top two layers -- Presentation and Application -- of the
 486               OSI reference model.
 487
 488               We distinguish two categories of application layer
 489               protocols:  user protocols that provide service directly
 490               to users, and support protocols that provide common system
 491               functions.  Requirements for user and support protocols
 492               will be found in the companion RFC [INTRO:1].
 493
 494               The most common Internet user protocols are:
 495
 496                 o  Telnet (remote login)
 497                 o  FTP    (file transfer)
 498                 o  SMTP   (electronic mail delivery)
 499
 500               There are a number of other standardized user protocols
 501               [INTRO:4] and many private user protocols.
 502
 503               Support protocols, used for host name mapping, booting,
 504               and management, include SNMP, BOOTP, RARP, and the Domain
 505               Name System (DNS) protocols.
 506
 507
 508          o  Transport Layer
 509
 510               The transport layer provides end-to-end communication
 511               services for applications.  There are two primary
 512               transport layer protocols at present:
 513
 514                 o Transmission Control Protocol (TCP)
 515                 o User Datagram Protocol (UDP)
 516
 517               TCP is a reliable connection-oriented transport service
 518               that provides end-to-end reliability, resequencing, and
 519               flow control.  UDP is a connectionless ("datagram")
 520               transport service.
 521
 522               Other transport protocols have been developed by the
 523               research community, and the set of official Internet
 524               transport protocols may be expanded in the future.
 525
 526               Transport layer protocols are discussed in Chapter 4.
 527
 528
 538          o  Internet Layer
 539
 540               All Internet transport protocols use the Internet Protocol
 541               (IP) to carry data from source host to destination host.
 542               IP is a connectionless or datagram internetwork service,
 543               providing no end-to-end delivery guarantees. Thus, IP
 544               datagrams may arrive at the destination host damaged,
 545               duplicated, out of order, or not at all.  The layers above
 546               IP are responsible for reliable delivery service when it
 547               is required.  The IP protocol includes provision for
 548               addressing, type-of-service specification, fragmentation
 549               and reassembly, and security information.
 550
 551               The datagram or connectionless nature of the IP protocol
 552               is a fundamental and characteristic feature of the
 553               Internet architecture.  Internet IP was the model for the
 554               OSI Connectionless Network Protocol [INTRO:12].
 555
 556               ICMP is a control protocol that is considered to be an
 557               integral part of IP, although it is architecturally
 558               layered upon IP, i.e., it uses IP to carry its data end-
 559               to-end just as a transport protocol like TCP or UDP does.
 560               ICMP provides error reporting, congestion reporting, and
 561               first-hop gateway redirection.
 562
 563               IGMP is an Internet layer protocol used for establishing
 564               dynamic host groups for IP multicasting.
 565
 566               The Internet layer protocols IP, ICMP, and IGMP are
 567               discussed in Chapter 3.
 568
 569
 570          o  Link Layer
 571
 572               To communicate on its directly-connected network, a host
 573               must implement the communication protocol used to
 574               interface to that network.  We call this a link layer or
 575               media-access layer protocol.
 576
 577               There is a wide variety of link layer protocols,
 578               corresponding to the many different types of networks.
 579               See Chapter 2.
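
The division of labor described above, where IP may deliver datagrams damaged, duplicated, out of order, or not at all and the layers above supply reliability, can be sketched as a toy receiver. This is not TCP: the per-segment sequence numbers, the CRC32 check, and the names make_segment and Receiver are assumptions made for illustration. Recovery from segments that never arrive needs sender retransmission, which the sketch omits.

```python
import zlib


def make_segment(seq, data):
    """Sender side: tag the data with a sequence number and a checksum."""
    return {"seq": seq, "data": data, "crc": zlib.crc32(data)}


class Receiver:
    """Turns an unreliable stream of segments into ordered, de-duplicated data."""

    def __init__(self):
        self.next_seq = 0    # next sequence number owed to the application
        self.pending = {}    # out-of-order segments held for later
        self.delivered = []  # data handed to the application, in order

    def receive(self, seg):
        # Damaged: the checksum does not match, so drop the segment silently.
        if zlib.crc32(seg["data"]) != seg["crc"]:
            return
        # Duplicated: already delivered, or already waiting in the buffer.
        if seg["seq"] < self.next_seq or seg["seq"] in self.pending:
            return
        # Out of order: hold the segment, then deliver any contiguous run.
        self.pending[seg["seq"]] = seg["data"]
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

Feeding the receiver segments 2, 0, 0, 1, 3 in that order leaves delivered holding the data of segments 0 through 3 in sequence, with the duplicate ignored.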
 580
 581
 582       1.1.4  Embedded Gateway Code
 583
 584          Some Internet host software includes embedded gateway
 585          functionality, so that these hosts can forward packets as a
 597          gateway would, while still performing the application layer
 598          functions of a host.
 599
 600          Such dual-purpose systems must follow the Gateway Requirements
 601          RFC [INTRO:2] with respect to their gateway functions, and
 602          must follow the present document with respect to their host
 603          functions.  In all overlapping cases, the two specifications
 604          should be in agreement.
 605
 606          There are varying opinions in the Internet community about
 607          embedded gateway functionality.  The main arguments are as
 608          follows:
 609
 610          o    Pro: in a local network environment where networking is
 611               informal, or in isolated internets, it may be convenient
 612               and economical to use existing host systems as gateways.
 613
 614               There is also an architectural argument for embedded
 615               gateway functionality: multihoming is much more common
 616               than originally foreseen, and multihoming forces a host to
 617               make routing decisions as if it were a gateway.  If the
 618               multihomed host contains an embedded gateway, it will
 619               have full routing knowledge and as a result will be able
 620               to make better routing decisions.
 621
 622          o    Con: Gateway algorithms and protocols are still changing,
 623               and they will continue to change as the Internet system
 624               grows larger.  Attempting to include a general gateway
 625               function within the host IP layer will force host system
 626               maintainers to track these (more frequent) changes.  Also,
 627               a larger pool of gateway implementations will make
 628               coordinating the changes more difficult.  Finally, the
 629               complexity of a gateway IP layer is somewhat greater than
 630               that of a host, making the implementation and operation
 631               tasks more complex.
 632
 633               In addition, the style of operation of some hosts is not
 634               appropriate for providing stable and robust gateway
 635               service.
 636
 637          There is considerable merit in both of these viewpoints.  One
 638          conclusion can be drawn: a host administrator must have
 639          conscious control over whether or not a given host acts as a
 640          gateway.  See Section 3.1 for the detailed requirements.
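
The conclusion above, that gateway behavior must be a deliberate administrative choice, might look like the following in an IP input path. HostIPConfig, forwarding_enabled, and handle_inbound are hypothetical names, and defaulting the switch to off is an assumption of this sketch; Section 3.1 holds the actual requirements.

```python
from dataclasses import dataclass


@dataclass
class HostIPConfig:
    local_addresses: frozenset
    # Assumption of this sketch: the host does not forward unless an
    # administrator explicitly turns forwarding on.
    forwarding_enabled: bool = False


def handle_inbound(cfg, dst):
    """Decide what to do with a datagram addressed to dst."""
    if dst in cfg.local_addresses:
        return "deliver"   # addressed to this host: pass up the stack
    if cfg.forwarding_enabled:
        return "forward"   # act as a gateway, by explicit configuration
    return "discard"       # not ours, and we are not a gateway
```

Keeping the decision behind a single explicit flag means a host never drifts into gateway service as a side effect of gaining a second interface.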
 641
 656    1.2  General Considerations
 657
 658       Vendors of Internet host software have learned two
 659       important lessons that every new vendor should consider
 660       seriously.
 661
 662       1.2.1  Continuing Internet Evolution
 663
 664          The enormous growth of the Internet has revealed problems of
 665          management and scaling in a large datagram-based packet
 666          communication system.  These problems are being addressed, and
 667          as a result there will be continuing evolution of the
 668          specifications described in this document.  These changes will
 669          be carefully planned and controlled, since there is extensive
 670          participation in this planning by the vendors and by the
 671          organizations responsible for operations of the networks.
 672
 673          Development, evolution, and revision are characteristic of
 674          computer network protocols today, and this situation will
 675          persist for some years.  A vendor who develops computer
 676          communication software for the Internet protocol suite (or any
 677          other protocol suite!) and then fails to maintain and update
 678          that software for changing specifications is going to leave a
 679          trail of unhappy customers.  The Internet is a large
 680          communication network, and the users are in constant contact
 681          through it.  Experience has shown that knowledge of
 682          deficiencies in vendor software propagates quickly through the
 683          Internet technical community.
 684
 685       1.2.2  Robustness Principle
 686
 687          At every layer of the protocols, there is a general rule whose
 688          application can lead to enormous benefits in robustness and
 689          interoperability [IP:1]:
 690
 691                 "Be liberal in what you accept, and
 692                  conservative in what you send"
 693
 694          Software should be written to deal with every conceivable
 695          error, no matter how unlikely; sooner or later a packet will
 696          come in with that particular combination of errors and
 697          attributes, and unless the software is prepared, chaos can
 698          ensue.  In general, it is best to assume that the network is
 699          filled with malevolent entities that will send in packets
 700          designed to have the worst possible effect.  This assumption
 701          will lead to suitable protective design, although the most
 702          serious problems in the Internet have been caused by
 703          unenvisaged mechanisms triggered by low-probability events;
 715          mere human malice would never have taken so devious a course!
 716
 717          Adaptability to change must be designed into all levels of
 718          Internet host software.  As a simple example, consider a
 719          protocol specification that contains an enumeration of values
 720          for a particular header field -- e.g., a type field, a port
 721          number, or an error code; this enumeration must be assumed to
 722          be incomplete.  Thus, if a protocol specification defines four
 723          possible error codes, the software must not break when a fifth
 724          code shows up.  An undefined code might be logged (see below),
 725          but it must not cause a failure.
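
              The "incomplete enumeration" rule can be illustrated with
              a minimal sketch.  The code table below is hypothetical
              and serves only as an example; the point is that an
              undefined value is counted and described rather than
              treated as a fatal condition.

```python
from collections import Counter

# Hypothetical table of the four codes a specification happens to define.
# The names and values are illustrative assumptions, not taken from this RFC.
KNOWN_ERROR_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
}

# Tally of undefined codes, so they can be logged or inspected later.
unknown_code_counts = Counter()


def describe_error_code(code):
    """Return a description for any code; never raise on undefined values."""
    if code in KNOWN_ERROR_CODES:
        return KNOWN_ERROR_CODES[code]
    # A "fifth" code has shown up: note it, but do not fail.
    unknown_code_counts[code] += 1
    return "unrecognized code %d" % code
```

              A caller can then treat the fifth code like any other
              event, e.g., describe_error_code(4) returns a description
              string and increments the counter for code 4.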
 726
 727          The second part of the principle is almost as important:
 728          software on other hosts may contain deficiencies that make it
 729          unwise to exploit legal but obscure protocol features.  It is
 730          unwise to stray far from the obvious and simple, lest untoward
 731          effects result elsewhere.  A corollary of this is "watch out
 732          for misbehaving hosts"; host software should be prepared, not
 733          just to survive other misbehaving hosts, but also to cooperate
 734          to limit the amount of disruption such hosts can cause to the
 735          shared communication facility.
 736
 737       1.2.3  Error Logging
 738
 739          The Internet includes a great variety of host and gateway
 740          systems, each implementing many protocols and protocol layers,
 741          and some of these contain bugs and mis-features in their
 742          Internet protocol software.  As a result of complexity,
 743          diversity, and distribution of function, the diagnosis of
 744          Internet problems is often very difficult.
 745
 746          Problem diagnosis will be aided if host implementations include
 747          a carefully designed facility for logging erroneous or
 748          "strange" protocol events.  It is important to include as much
 749          diagnostic information as possible when an error is logged.  In
 750          particular, it is often useful to record the header(s) of a
 751          packet that caused an error.  However, care must be taken to
 752          ensure that error logging does not consume prohibitive amounts
 753          of resources or otherwise interfere with the operation of the
 754          host.
 755
 756          There is a tendency for abnormal but harmless protocol events
 757          to overflow error logging files; this can be avoided by using a
 758          "circular" log, or by enabling logging only while diagnosing a
 759          known failure.  It may be useful to filter and count duplicate
 760          successive messages.  One strategy that seems to work well is:
 761          (1) always count abnormalities and make such counts accessible
 762          through the management protocol (see [INTRO:1]); and (2) allow
 774          the logging of a great variety of events to be selectively
 775          enabled.  For example, it might be useful to be able to "log
 776          everything" or to "log everything for host X".
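
              The logging strategy described above might be sketched as
              follows.  This is only one possible arrangement: the class
              name, its attributes, and the tuple layout are assumptions
              made for illustration.  Counts are always kept, detailed
              records go into a fixed-size "circular" log, duplicate
              successive messages are folded into a repeat count, and
              detail logging is enabled either for everything or for
              selected hosts.

```python
from collections import Counter, deque


class EventLog:
    """Counts every abnormal event; stores details only when enabled."""

    def __init__(self, capacity=1000):
        self.counts = Counter()             # (1) always count abnormalities
        self.ring = deque(maxlen=capacity)  # "circular" log: oldest entry drops off
        self.log_everything = False         # (2) "log everything"
        self.logged_hosts = set()           # (2) "log everything for host X"

    def record(self, host, event):
        self.counts[event] += 1
        if not (self.log_everything or host in self.logged_hosts):
            return
        if self.ring and self.ring[-1][:2] == (host, event):
            # Duplicate successive message: bump its repeat count instead
            # of storing another copy.
            self.ring[-1] = (host, event, self.ring[-1][2] + 1)
        else:
            self.ring.append((host, event, 1))
```

              In this sketch the counters would be what a management
              protocol exposes, while the ring buffer bounds the
              resources that detailed logging can consume.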
 777
 778          Note that different managements may have differing policies
 779          about the amount of error logging that they want normally
 780          enabled in a host.  Some will say, "if it doesn't hurt me, I
 781          don't want to know about it", while others will want to take a
 782          more watchful and aggressive attitude about detecting and
 783          removing protocol abnormalities.
 784
 785       1.2.4  Configuration
 786
 787          It would be ideal if a host implementation of the Internet
 788          protocol suite could be entirely self-configuring.  This would
 789          allow the whole suite to be implemented in ROM or cast into
 790          silicon, it would simplify diskless workstations, and it would
 791          be an immense boon to harried LAN administrators as well as
 792          system vendors.  We have not reached this ideal; in fact, we
 793          are not even close.
 794
 795          At many points in this document, you will find a requirement
 796          that a parameter be a configurable option.  There are several
 797          different reasons behind such requirements.  In a few cases,
 798          there is current uncertainty or disagreement about the best
 799          value, and it may be necessary to update the recommended value
 800          in the future.  In other cases, the value really depends on
 801          external factors -- e.g., the size of the host and the
 802          distribution of its communication load, or the speeds and
 803          topology of nearby networks -- and self-tuning algorithms are
 804          unavailable and may be insufficient.  In some cases,
 805          configurability is needed because of administrative
 806          requirements.
 807
 808          Finally, some configuration options are required to communicate
 809          with obsolete or incorrect implementations of the protocols,
 810          distributed without sources, that unfortunately persist in many
 811          parts of the Internet.  To make correct systems coexist with
 812          these faulty systems, administrators often have to "mis-
 813          configure" the correct systems.  This problem will correct
 814          itself gradually as the faulty systems are retired, but it
 815          cannot be ignored by vendors.
 816
 817          When we say that a parameter must be configurable, we do not
 818          intend to require that its value be explicitly read from a
 819          configuration file at every boot time.  We recommend that
 820          implementors set up a default for each parameter, so a
 821          configuration file is only necessary to override those defaults
 833          that are inappropriate in a particular installation.  Thus, the
 834          configurability requirement is an assurance that it will be
 835          POSSIBLE to override the default when necessary, even in a
 836          binary-only or ROM-based product.
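
              A minimal sketch of "defaults plus override" follows.  The
              parameter names are hypothetical; the two default values
              merely echo later parts of this document (trailers
              disabled by default, Section 2.3.1; an ARP cache timeout
              on the order of a minute, Section 2.3.2.1).

```python
# Built-in defaults (hypothetical parameter names).  A configuration
# source is consulted only for values that need to be overridden.
DEFAULTS = {
    "use_trailers": False,        # default follows the official protocol
    "arp_cache_timeout_s": 60.0,  # "on the order of a minute"
}


def load_config(overrides=None):
    """Return the effective configuration: defaults, then any overrides."""
    config = dict(DEFAULTS)
    for name, value in (overrides or {}).items():
        if name not in DEFAULTS:
            raise KeyError("unknown parameter: %s" % name)
        config[name] = value
    return config
```

              With no overrides the host runs entirely on its defaults;
              an installation that needs a longer ARP timeout supplies
              only that one value.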
 837
 838          This document requires a particular value for such defaults in
 839          some cases.  The choice of default is a sensitive issue when
 840          the configuration item controls the accommodation to existing
 841          faulty systems.  If the Internet is to converge successfully to
 842          complete interoperability, the default values built into
 843          implementations must implement the official protocol, not
 844          "mis-configurations" to accommodate faulty implementations.
 845          Although marketing considerations have led some vendors to
 846          choose mis-configuration defaults, we urge vendors to choose
 847          defaults that will conform to the standard.
 848
 849          Finally, we note that a vendor needs to provide adequate
 850          documentation on all configuration parameters, their limits and
 851          effects.
 852
 853
 854    1.3  Reading this Document
 855
 856       1.3.1  Organization
 857
 858          Protocol layering, which is generally used as an organizing
 859          principle in implementing network software, has also been used
 860          to organize this document.  In describing the rules, we assume
 861          that an implementation does strictly mirror the layering of the
 862          protocols.  Thus, the following three major sections specify
 863          the requirements for the link layer, the internet layer, and
 864          the transport layer, respectively.  A companion RFC [INTRO:1]
 865          covers application level software.  This layerist organization
 866          was chosen for simplicity and clarity.
 867
 868          However, strict layering is an imperfect model, both for the
 869          protocol suite and for recommended implementation approaches.
 870          Protocols in different layers interact in complex and sometimes
 871          subtle ways, and particular functions often involve multiple
 872          layers.  There are many design choices in an implementation,
 873          many of which involve creative "breaking" of strict layering.
 874          Every implementor is urged to read references [INTRO:7] and
 875          [INTRO:8].
 876
 877          This document describes the conceptual service interface
 878          between layers using a functional ("procedure call") notation,
 879          like that used in the TCP specification [TCP:1].  A host
 880          implementation must support the logical information flow
 892          implied by these calls, but need not literally implement the
 893          calls themselves.  For example, many implementations reflect
 894          the coupling between the transport layer and the IP layer by
 895          giving them shared access to common data structures.  These
 896          data structures, rather than explicit procedure calls, are then
 897          the agency for passing much of the information that is
 898          required.
 899
 900          In general, each major section of this document is organized
 901          into the following subsections:
 902
 903          (1)  Introduction
 904
 905          (2)  Protocol Walk-Through -- considers the protocol
 906               specification documents section-by-section, correcting
 907               errors, stating requirements that may be ambiguous or
 908               ill-defined, and providing further clarification or
 909               explanation.
 910
 911          (3)  Specific Issues -- discusses protocol design and
 912               implementation issues that were not included in the walk-
 913               through.
 914
 915          (4)  Interfaces -- discusses the service interface to the next
 916               higher layer.
 917
 918          (5)  Summary -- contains a summary of the requirements of the
 919               section.
 920
 921
 922          Under many of the individual topics in this document, there is
 923          parenthetical material labeled "DISCUSSION" or
 924          "IMPLEMENTATION". This material is intended to give
 925          clarification and explanation of the preceding requirements
 926          text.  It also includes some suggestions on possible future
 927          directions or developments.  The implementation material
 928          contains suggested approaches that an implementor may want to
 929          consider.
 930
 931          The summary sections are intended to be guides and indexes to
 932          the text, but are necessarily cryptic and incomplete.  The
 933          summaries should never be used or referenced separately from
 934          the complete RFC.
 935
 936       1.3.2  Requirements
 937
 938          In this document, the words that are used to define the
 939          significance of each particular requirement are capitalized.
 951          These words are:
 952
 953          *    "MUST"
 954
 955               This word or the adjective "REQUIRED" means that the item
 956               is an absolute requirement of the specification.
 957
 958          *    "SHOULD"
 959
 960               This word or the adjective "RECOMMENDED" means that there
 961               may exist valid reasons in particular circumstances to
 962               ignore this item, but the full implications should be
 963               understood and the case carefully weighed before choosing
 964               a different course.
 965
 966          *    "MAY"
 967
 968               This word or the adjective "OPTIONAL" means that this item
 969               is truly optional.  One vendor may choose to include the
 970               item because a particular marketplace requires it or
 971               because it enhances the product, for example; another
 972               vendor may omit the same item.
 973
 974
 975          An implementation is not compliant if it fails to satisfy one
 976          or more of the MUST requirements for the protocols it
 977          implements.  An implementation that satisfies all the MUST and
 978          all the SHOULD requirements for its protocols is said to be
 979          "unconditionally compliant"; one that satisfies all the MUST
 980          requirements but not all the SHOULD requirements for its
 981          protocols is said to be "conditionally compliant".
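
              The three compliance outcomes can be restated as a small
              decision function.  This is an illustrative sketch; the
              function name and the representation of requirements as
              lists of booleans are assumptions.

```python
def compliance_level(must_results, should_results):
    """Classify an implementation from per-requirement pass/fail results.

    must_results and should_results are iterables of booleans, one per
    MUST or SHOULD requirement of the protocols being implemented.
    """
    if not all(must_results):
        return "not compliant"
    if all(should_results):
        return "unconditionally compliant"
    return "conditionally compliant"
```

              Note that MAY items do not appear: whether an optional
              item is present has no effect on compliance.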
 982
 983       1.3.3  Terminology
 984
 985          This document uses the following technical terms:
 986
 987          Segment
 988               A segment is the unit of end-to-end transmission in the
 989               TCP protocol.  A segment consists of a TCP header followed
 990               by application data.  A segment is transmitted by
 991               encapsulation inside an IP datagram.
 992
 993          Message
 994               In this description of the lower-layer protocols, a
 995               message is the unit of transmission in a transport layer
 996               protocol.  In particular, a TCP segment is a message.  A
 997               message consists of a transport protocol header followed
 998               by application protocol data.  To be transmitted end-to-
1010               end through the Internet, a message must be encapsulated
1011               inside a datagram.
1012
1013          IP Datagram
1014               An IP datagram is the unit of end-to-end transmission in
1015               the IP protocol.  An IP datagram consists of an IP header
1016               followed by transport layer data, i.e., of an IP header
1017               followed by a message.
1018
1019               In the description of the internet layer (Section 3), the
1020               unqualified term "datagram" should be understood to refer
1021               to an IP datagram.
1022
1023          Packet
1024               A packet is the unit of data passed across the interface
1025               between the internet layer and the link layer.  It
1026               includes an IP header and data.  A packet may be a
1027               complete IP datagram or a fragment of an IP datagram.
1028
1029          Frame
1030               A frame is the unit of transmission in a link layer
1031               protocol, and consists of a link-layer header followed by
1032               a packet.
1033
1034          Connected Network
1035               A network to which a host is interfaced is often known as
1036               the "local network" or the "subnetwork" relative to that
1037               host.  However, these terms can cause confusion, and
1038               therefore we use the term "connected network" in this
1039               document.
1040
1041          Multihomed
1042               A host is said to be multihomed if it has multiple IP
1043               addresses.  For a discussion of multihoming, see Section
1044               3.3.4 below.
1045
1046          Physical network interface
1047               This is a physical interface to a connected network and
1048               has a (possibly unique) link-layer address.  Multiple
1049               physical network interfaces on a single host may share the
1050               same link-layer address, but the address must be unique
1051               for different hosts on the same physical network.
1052
1053          Logical [network] interface
1054               We define a logical [network] interface to be a logical
1055               path, distinguished by a unique IP address, to a connected
1056               network.  See Section 3.3.4.
1057
1069          Specific-destination address
1070               This is the effective destination address of a datagram,
1071               even if it is broadcast or multicast; see Section 3.2.1.3.
1072
1073          Path
1074               At a given moment, all the IP datagrams from a particular
1075               source host to a particular destination host will
1076               typically traverse the same sequence of gateways.  We use
1077               the term "path" for this sequence.  Note that a path is
1078               uni-directional; it is not unusual to have different paths
1079               in the two directions between a given host pair.
1080
1081          MTU
1082               The maximum transmission unit, i.e., the size of the
1083               largest packet that can be transmitted.
1084
1085
1086          The terms frame, packet, datagram, message, and segment are
1087          illustrated by the following schematic diagrams:
1088
1089          A. Transmission on connected network:
1090            _______________________________________________
1091           | LL hdr | IP hdr |         (data)              |
1092           |________|________|_____________________________|
1093
1094            <---------- Frame ----------------------------->
1095                     <----------Packet -------------------->
1096
1097
1098          B. Before IP fragmentation or after IP reassembly:
1099                     ______________________________________
1100                    | IP hdr | transport| Application Data |
1101                    |________|____hdr___|__________________|
1102
1103                     <--------  Datagram ------------------>
1104                              <-------- Message ----------->
1105            or, for TCP:
1106                     ______________________________________
1107                    | IP hdr |  TCP hdr | Application Data |
1108                    |________|__________|__________________|
1109
1110                     <--------  Datagram ------------------>
1111                              <-------- Segment ----------->
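
         The nesting shown in the diagrams can also be expressed as a
         short sketch.  The headers here are placeholder byte strings
         supplied by the caller (an assumption for illustration); only
         the containment relationships are meant to be accurate.

```python
def encapsulate(app_data, tcp_hdr, ip_hdr, ll_hdr):
    """Build each unit from the one it contains (unfragmented case)."""
    segment = tcp_hdr + app_data   # TCP segment, i.e., a transport "message"
    datagram = ip_hdr + segment    # IP datagram
    packet = datagram              # no fragmentation: packet == whole datagram
    frame = ll_hdr + packet        # frame sent on the connected network
    return {
        "segment": segment,
        "datagram": datagram,
        "packet": packet,
        "frame": frame,
    }
```

         When a datagram is fragmented, each fragment (with its own IP
         header) would be a separate packet carried in its own frame;
         that case is not modeled here.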
1126
1127
1128    1.4  Acknowledgments
1129
1130       This document incorporates contributions and comments from a large
1131       group of Internet protocol experts, including representatives of
1132       university and research labs, vendors, and government agencies.
1133       It was assembled primarily by the Host Requirements Working Group
1134       of the Internet Engineering Task Force (IETF).
1135
1136       The Editor would especially like to acknowledge the tireless
1137       dedication of the following people, who attended many long
1138       meetings and generated 3 million bytes of electronic mail over the
1139       past 18 months in pursuit of this document: Philip Almquist, Dave
1140       Borman (Cray Research), Noel Chiappa, Dave Crocker (DEC), Steve
1141       Deering (Stanford), Mike Karels (Berkeley), Phil Karn (Bellcore),
1142       John Lekashman (NASA), Charles Lynn (BBN), Keith McCloghrie (TWG),
1143       Paul Mockapetris (ISI), Thomas Narten (Purdue), Craig Partridge
1144       (BBN), Drew Perkins (CMU), and James Van Bokkelen (FTP Software).
1145
1146       In addition, the following people made major contributions to the
1147       effort: Bill Barns (Mitre), Steve Bellovin (AT&T), Mike Brescia
1148       (BBN), Ed Cain (DCA), Annette DeSchon (ISI), Martin Gross (DCA),
1149       Phill Gross (NRI), Charles Hedrick (Rutgers), Van Jacobson (LBL),
1150       John Klensin (MIT), Mark Lottor (SRI), Milo Medin (NASA), Bill
1151       Melohn (Sun Microsystems), Greg Minshall (Kinetics), Jeff Mogul
1152       (DEC), John Mullen (CMC), Jon Postel (ISI), John Romkey (Epilogue
1153       Technology), and Mike StJohns (DCA).  The following also made
1154       significant contributions to particular areas: Eric Allman
1155       (Berkeley), Rob Austein (MIT), Art Berggreen (ACC), Keith Bostic
1156       (Berkeley), Vint Cerf (NRI), Wayne Hathaway (NASA), Matt Korn
1157       (IBM), Erik Naggum (Naggum Software, Norway), Robert Ullmann
1158       (Prime Computer), David Waitzman (BBN), Frank Wancho (USA), Arun
1159       Welch (Ohio State), Bill Westfield (Cisco), and Rayan Zachariassen
1160       (Toronto).
1161
1162       We are grateful to all, including any contributors who may have
1163       been inadvertently omitted from this list.
1164
1184 RFC1122                        LINK LAYER                   October 1989
1185
1186
1187 2. LINK LAYER
1188
1189    2.1  INTRODUCTION
1190
1191       All Internet systems, both hosts and gateways, have the same
1192       requirements for link layer protocols.  These requirements are
1193       given in Chapter 3 of "Requirements for Internet Gateways"
1194       [INTRO:2], augmented with the material in this section.
1195
1196    2.2  PROTOCOL WALK-THROUGH
1197
1198       None.
1199
1200    2.3  SPECIFIC ISSUES
1201
1202       2.3.1  Trailer Protocol Negotiation
1203
1204          The trailer protocol [LINK:1] for link-layer encapsulation MAY
1205          be used, but only when it has been verified that both systems
1206          (host or gateway) involved in the link-layer communication
1207          implement trailers.  If the system does not dynamically
1208          negotiate use of the trailer protocol on a per-destination
1209          basis, the default configuration MUST disable the protocol.
1210
1211          DISCUSSION:
1212               The trailer protocol is a link-layer encapsulation
1213               technique that rearranges the data contents of packets
1214               sent on the physical network.  In some cases, trailers
1215               improve the throughput of higher layer protocols by
1216               reducing the amount of data copying within the operating
1217               system.  Higher layer protocols are unaware of trailer
1218               use, but both the sending and receiving host MUST
1219               understand the protocol if it is used.
1220
1221               Improper use of trailers can result in very confusing
1222               symptoms.  Only packets with specific size attributes are
1223               encapsulated using trailers, and typically only a small
1224               fraction of the packets being exchanged have these
1225               attributes.  Thus, if a system using trailers exchanges
1226               packets with a system that does not, some packets
1227               disappear into a black hole while others are delivered
1228               successfully.
1229
1230          IMPLEMENTATION:
1231               On an Ethernet, packets encapsulated with trailers use a
1232               distinct Ethernet type [LINK:1], and trailer negotiation
1233               is performed at the time that ARP is used to discover the
1234               link-layer address of a destination system.
1235
1246               Specifically, the ARP exchange is completed in the usual
1247               manner using the normal IP protocol type, but a host that
1248               wants to speak trailers will send an additional "trailer
1249               ARP reply" packet, i.e., an ARP reply that specifies the
1250               trailer encapsulation protocol type but otherwise has the
1251               format of a normal ARP reply.  If a host configured to use
1252               trailers receives a trailer ARP reply message from a
1253               remote machine, it can add that machine to the list of
1254               machines that understand trailers, e.g., by marking the
1255               corresponding entry in the ARP cache.
1256
1257               Hosts wishing to receive trailer encapsulations send
1258               trailer ARP replies whenever they complete exchanges of
1259               normal ARP messages for IP.  Thus, a host that received an
1260               ARP request for its IP protocol address would send a
1261               trailer ARP reply in addition to the normal IP ARP reply;
1262               a host that sent the IP ARP request would send a trailer
1263               ARP reply when it received the corresponding IP ARP reply.
1264               In this way, either the requesting or responding host in
1265               an IP ARP exchange may request that it receive trailer
1266               encapsulations.
1267
1268               This scheme, using extra trailer ARP reply packets rather
1269               than sending an ARP request for the trailer protocol type,
1270               was designed to avoid a continuous exchange of ARP packets
1271               with a misbehaving host that, contrary to any
1272               specification or common sense, responded to an ARP reply
1273               for trailers with another ARP reply for IP.  This problem
1274               is avoided by sending a trailer ARP reply in response to
1275               an IP ARP reply only when the IP ARP reply answers an
1276               outstanding request; this is true when the hardware
1277               address for the host is still unknown when the IP ARP
1278               reply is received.  A trailer ARP reply may always be sent
1279               along with an IP ARP reply responding to an IP ARP
1280               request.
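
               The negotiation rules above can be summarized in a
               small sketch.  The function names, the string labels for
               packet kinds, and the dictionary used as an ARP cache
               are all assumptions made for illustration; no real
               packet formats are shown.

```python
def on_ip_arp_request(wants_trailers):
    """Replies to send when an IP ARP request for our address arrives."""
    replies = ["ip-arp-reply"]
    if wants_trailers:
        # A trailer ARP reply may always accompany the normal reply.
        replies.append("trailer-arp-reply")
    return replies


def on_ip_arp_reply(wants_trailers, hw_addr_known):
    """Replies to send when an IP ARP reply arrives.

    A trailer ARP reply is sent only if the IP ARP reply answers an
    outstanding request, i.e., the sender's hardware address was still
    unknown.  This avoids an endless exchange with a misbehaving host.
    """
    if wants_trailers and not hw_addr_known:
        return ["trailer-arp-reply"]
    return []


def on_trailer_arp_reply(arp_cache, sender_ip, configured_for_trailers):
    """Mark the sender's existing cache entry as understanding trailers."""
    if configured_for_trailers and sender_ip in arp_cache:
        arp_cache[sender_ip]["trailers_ok"] = True
```

               A sender would then use trailer encapsulation only for
               destinations whose cache entry has been marked in this
               way.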
1281
1282       2.3.2  Address Resolution Protocol -- ARP
1283
1284          2.3.2.1  ARP Cache Validation
1285
1286             An implementation of the Address Resolution Protocol (ARP)
1287             [LINK:2] MUST provide a mechanism to flush out-of-date cache
1288             entries.  If this mechanism involves a timeout, it SHOULD be
1289             possible to configure the timeout value.
1290
1291             A mechanism to prevent ARP flooding (repeatedly sending an
1292             ARP Request for the same IP address, at a high rate) MUST be
1293             included.  The recommended maximum rate is 1 per second per
1305             destination.
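
             One simple way to meet the flooding requirement is to
             remember when a request was last sent for each target
             address.  The sketch below assumes the caller supplies the
             current time in seconds; the class and method names are
             hypothetical.

```python
class ArpRequestLimiter:
    """Allows at most one ARP Request per interval per destination."""

    def __init__(self, min_interval_s=1.0):
        self.min_interval_s = min_interval_s  # recommended: 1 per second
        self.last_sent = {}                   # target IP -> time of last request

    def may_send(self, target_ip, now):
        """Return True (and note the time) if a request may be sent now."""
        last = self.last_sent.get(target_ip)
        if last is not None and now - last < self.min_interval_s:
            return False
        self.last_sent[target_ip] = now
        return True
```

             A real implementation would also need to bound or age the
             table of destinations; that detail is omitted here.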
1306
1307             DISCUSSION:
1308                  The ARP specification [LINK:2] suggests but does not
1309                  require a timeout mechanism to invalidate cache entries
1310                  when hosts change their Ethernet addresses.  The
1311                  prevalence of proxy ARP (see Section 2.4 of [INTRO:2])
1312                  has significantly increased the likelihood that cache
1313                  entries in hosts will become invalid, and therefore
1314                  some ARP-cache invalidation mechanism is now required
1315                  for hosts.  Even in the absence of proxy ARP, a long-
1316                  period cache timeout is useful in order to
1317                  automatically correct any bad ARP data that might have
1318                  been cached.
1319
1320             IMPLEMENTATION:
1321                  Four mechanisms have been used, sometimes in
1322                  combination, to flush out-of-date cache entries.
1323
1324                  (1)  Timeout -- Periodically time out cache entries,
1325                       even if they are in use.  Note that this timeout
1326                       should be restarted when the cache entry is
1327                       "refreshed" (by observing the source fields,
1328                       regardless of target address, of an ARP broadcast
1329                       from the system in question).  For proxy ARP
1330                       situations, the timeout needs to be on the order
1331                       of a minute.
1332
1333                  (2)  Unicast Poll -- Actively poll the remote host by
1334                       periodically sending a point-to-point ARP Request
1335                       to it, and delete the entry if no ARP Reply is
1336                       received from N successive polls.  Again, the
1337                       timeout should be on the order of a minute, and
1338                       typically N is 2.
1339
1340                  (3)  Link-Layer Advice -- If the link-layer driver
1341                       detects a delivery problem, flush the
1342                       corresponding ARP cache entry.
1343
1344                  (4)  Higher-layer Advice -- Provide a call from the
1345                       Internet layer to the link layer to indicate a
1346                       delivery problem.  The effect of this call would
1347                       be to invalidate the corresponding cache entry.
1348                       This call would be analogous to the
1349                       "ADVISE_DELIVPROB()" call from the transport layer
1350                       to the Internet layer (see Section 3.4), and in
1351                       fact the ADVISE_DELIVPROB routine might in turn
1352                       call the link-layer advice routine to invalidate
1364                       the ARP cache entry.
1365
1366                  Approaches (1) and (2) involve ARP cache timeouts on
1367                  the order of a minute or less.  In the absence of proxy
1368                  ARP, a timeout this short could create noticeable
1369                  overhead traffic on a very large Ethernet.  Therefore,
1370                  it may be necessary to configure a host to lengthen the
1371                  ARP cache timeout.
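
                 As a non-normative illustration, mechanisms (1) and (4)
                 above might be sketched as follows.  The class name
                 ArpCache, its method names, and the injectable clock are
                 assumptions made for this sketch and are not taken from
                 the ARP specification; the 60-second default reflects
                 only the "order of a minute" guidance, and the timeout
                 is a constructor argument so that it can be lengthened.

```python
import time


class ArpCache:
    """Toy ARP cache: entries expire after `timeout` seconds unless refreshed."""

    def __init__(self, timeout=60.0, clock=time.monotonic):
        self.timeout = timeout    # seconds; configurable per host
        self.clock = clock        # injectable so the sketch can be tested
        self._entries = {}        # ip -> (link_addr, time_last_refreshed)

    def refresh(self, ip, link_addr):
        """Create or refresh an entry, e.g. from the source fields of an
        observed ARP broadcast.  This restarts the entry's timeout."""
        self._entries[ip] = (link_addr, self.clock())

    def lookup(self, ip):
        """Return the cached link-layer address, or None if absent/expired.
        A lookup does NOT restart the timer: entries time out even if
        they are in use."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        link_addr, stamp = entry
        if self.clock() - stamp >= self.timeout:
            del self._entries[ip]
            return None
        return link_addr

    def advise_delivery_problem(self, ip):
        """Link-layer or higher-layer advice: invalidate the entry."""
        self._entries.pop(ip, None)
```

                 Because only refresh() restarts the timer, a busy entry
                 still expires, which is the behaviour that mechanism (1)
                 describes.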
1372
1373          2.3.2.2  ARP Packet Queue
1374
1375             The link layer SHOULD save (rather than discard) at least
1376             one (the latest) packet of each set of packets destined to
1377             the same unresolved IP address, and transmit the saved
1378             packet when the address has been resolved.
1379
1380             DISCUSSION:
1381                  Failure to follow this recommendation causes the first
1382                  packet of every exchange to be lost.  Although higher-
1383                  layer protocols can generally cope with packet loss by
1384                  retransmission, packet loss does impact performance.
1385                  For example, loss of a TCP open request causes the
1386                  initial round-trip time estimate to be inflated.  UDP-
1387                  based applications such as the Domain Name System are
1388                  more seriously affected.
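
                 A minimal sketch of the recommendation, keeping only the
                 latest packet per unresolved address, could look like
                 this.  The names PendingPackets, hold, and resolved, and
                 the transmit callback, are hypothetical and chosen for
                 illustration only.

```python
class PendingPackets:
    """Holds the most recent packet for each IP address awaiting ARP."""

    def __init__(self):
        self._latest = {}   # unresolved IP address -> latest queued packet

    def hold(self, ip, packet):
        """Save the packet instead of discarding it.  A later packet for
        the same address replaces the earlier one (only the latest is kept)."""
        self._latest[ip] = packet

    def resolved(self, ip, link_addr, transmit):
        """Called when ARP resolves `ip`; sends the saved packet, if any."""
        packet = self._latest.pop(ip, None)
        if packet is not None:
            transmit(link_addr, packet)
```

                 An implementation is free to queue more than one packet
                 per address; this sketch shows only the minimum that the
                 text asks for.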
1389
1390       2.3.3  Ethernet and IEEE 802 Encapsulation
1391
1392          The IP encapsulation for Ethernets is described in RFC-894
1393          [LINK:3], while RFC-1042 [LINK:4] describes the IP
1394          encapsulation for IEEE 802 networks.  RFC-1042 elaborates and
1395          replaces the discussion in Section 3.4 of [INTRO:2].
1396
1397          Every Internet host connected to a 10Mbps Ethernet cable:
1398
1399          o    MUST be able to send and receive packets using RFC-894
1400               encapsulation;
1401
1402          o    SHOULD be able to receive RFC-1042 packets, intermixed
1403               with RFC-894 packets; and
1404
1405          o    MAY be able to send packets using RFC-1042 encapsulation.
1406
1407
1408          An Internet host that implements sending both the RFC-894 and
1409          the RFC-1042 encapsulations MUST provide a configuration switch
1410          to select which is sent, and this switch MUST default to RFC-
1411          894.
1412
1423          Note that the standard IP encapsulation in RFC-1042 does not
1424          use the protocol id value (K1=6) that IEEE reserved for IP;
1425          instead, it uses a value (K1=170) that implies an extension
1426          (the "SNAP") which can be used to hold the Ether-Type field.
1427          An Internet system MUST NOT send 802 packets using K1=6.
1428
1429          Address translation from Internet addresses to link-layer
1430          addresses on Ethernet and IEEE 802 networks MUST be managed by
1431          the Address Resolution Protocol (ARP).
1432
1433          The MTU is 1500 octets for an Ethernet and 1492 octets for 802.3.
1434
1435          DISCUSSION:
1436               The IEEE 802.3 specification provides for operation over a
1437               10Mbps Ethernet cable, in which case Ethernet and IEEE
1438               802.3 frames can be physically intermixed.  A receiver can
1439               distinguish Ethernet and 802.3 frames by the value of the
1440               802.3 Length field; this two-octet field coincides in the
1441               header with the Ether-Type field of an Ethernet frame.  In
1442               particular, the 802.3 Length field must be less than or
1443               equal to 1500, while all valid Ether-Type values are
1444               greater than 1500.
1445
1446               Another compatibility problem arises with link-layer
1447               broadcasts.  A broadcast sent with one framing will not be
1448               seen by hosts that can receive only the other framing.
1449
1450               The provisions of this section were designed to provide
1451               direct interoperation between 894-capable and 1042-capable
1452               systems on the same cable, to the maximum extent possible.
1453               They are intended to support the present situation where
1454               894-only systems predominate, while providing an easy
1455               transition to a possible future in which 1042-capable
1456               systems become common.
1457
1458               Note that 894-only systems cannot interoperate directly
1459               with 1042-only systems.  If the two system types are set
1460               up as two different logical networks on the same cable,
1461               they can communicate only through an IP gateway.
1462               Furthermore, it is not useful or even possible for a
1463               dual-format host to discover automatically which format to
1464               send, because of the problem of link-layer broadcasts.
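
              The Length/Ether-Type test described in the DISCUSSION
              above can be sketched as follows.  The function name
              classify_frame and its return strings are assumptions for
              illustration; the sketch assumes a frame that begins with
              the 6-octet destination and 6-octet source addresses, so
              that the two-octet field sits at offset 12.

```python
import struct


def classify_frame(frame: bytes) -> str:
    """Return "802.3" or "ethernet" based on the two-octet field that
    follows the destination and source addresses."""
    if len(frame) < 14:
        raise ValueError("frame shorter than a 14-octet header")
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    # A value <= 1500 is an 802.3 Length; anything larger is an Ether-Type.
    return "802.3" if type_or_length <= 1500 else "ethernet"
```

              For example, a frame carrying the IP Ether-Type 0x0800
              (2048) is classified as Ethernet, while a frame whose
              field holds 46 is classified as 802.3.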
1465
1466    2.4  LINK/INTERNET LAYER INTERFACE
1467
1468       The packet receive interface between the IP layer and the link
1469       layer MUST include a flag to indicate whether the incoming packet
1470       was addressed to a link-layer broadcast address.
1471
1482       DISCUSSION:
1483            Although the IP layer does not generally know link layer
1484            addresses (since every different network medium typically has
1485            a different address format), the broadcast address on a
1486            broadcast-capable medium is an important special case.  See
1487            Section 3.2.2, especially the DISCUSSION concerning broadcast
1488            storms.
1489
1490       The packet send interface between the IP and link layers MUST
1491       include the 5-bit TOS field (see Section 3.2.1.6).
1492
1493       The link layer MUST NOT report a Destination Unreachable error to
1494       IP solely because there is no ARP cache entry for a destination.
1495
1496    2.5  LINK LAYER REQUIREMENTS SUMMARY
1497
1498                                                   |       | | | |S| |
1499                                                   |       | | | |H| |F
1500                                                   |       | | | |O|M|o
1501                                                   |       | |S| |U|U|o
1502                                                   |       | |H| |L|S|t
1503                                                   |       |M|O| |D|T|n
1504                                                   |       |U|U|M| | |o
1505                                                   |       |S|L|A|N|N|t
1506                                                   |       |T|D|Y|O|O|t
1507 FEATURE                                           |SECTION| | | |T|T|e
1508 --------------------------------------------------|-------|-|-|-|-|-|--
1509                                                   |       | | | | | |
1510 Trailer encapsulation                             |2.3.1  | | |x| | |
1511 Send Trailers by default without negotiation      |2.3.1  | | | | |x|
1512 ARP                                               |2.3.2  | | | | | |
1513   Flush out-of-date ARP cache entries             |2.3.2.1|x| | | | |
1514   Prevent ARP floods                              |2.3.2.1|x| | | | |
1515   Cache timeout configurable                      |2.3.2.1| |x| | | |
1516   Save at least one (latest) unresolved pkt       |2.3.2.2| |x| | | |
1517 Ethernet and IEEE 802 Encapsulation               |2.3.3  | | | | | |
1518   Host able to:                                   |2.3.3  | | | | | |
1519     Send & receive RFC-894 encapsulation          |2.3.3  |x| | | | |
1520     Receive RFC-1042 encapsulation                |2.3.3  | |x| | | |
1521     Send RFC-1042 encapsulation                   |2.3.3  | | |x| | |
1522       Then config. sw. to select, RFC-894 dflt    |2.3.3  |x| | | | |
1523   Send K1=6 encapsulation                         |2.3.3  | | | | |x|
1524   Use ARP on Ethernet and IEEE 802 nets           |2.3.3  |x| | | | |
1525 Link layer report b'casts to IP layer             |2.4    |x| | | | |
1526 IP layer pass TOS to link layer                   |2.4    |x| | | | |
1527 No ARP cache entry treated as Dest. Unreach.      |2.4    | | | | |x|
1528
1541 3. INTERNET LAYER PROTOCOLS
1542
1543    3.1 INTRODUCTION
1544
1545       The Robustness Principle: "Be liberal in what you accept, and
1546       conservative in what you send" is particularly important in the
1547       Internet layer, where one misbehaving host can deny Internet
1548       service to many other hosts.
1549
1550       The protocol standards used in the Internet layer are:
1551
1552       o    RFC-791 [IP:1] defines the IP protocol and gives an
1553            introduction to the architecture of the Internet.
1554
1555       o    RFC-792 [IP:2] defines ICMP, which provides routing,
1556            diagnostic and error functionality for IP.  Although ICMP
1557            messages are encapsulated within IP datagrams, ICMP
1558            processing is considered to be (and is typically implemented
1559            as) part of the IP layer.  See Section 3.2.2.
1560
1561       o    RFC-950 [IP:3] defines the mandatory subnet extension to the
1562            addressing architecture.
1563
1564       o    RFC-1112 [IP:4] defines the Internet Group Management
1565            Protocol IGMP, as part of a recommended extension to hosts
1566            and to the host-gateway interface to support Internet-wide
1567            multicasting at the IP level.  See Section 3.2.3.
1568
1569            The target of an IP multicast may be an arbitrary group of
1570            Internet hosts.  IP multicasting is designed as a natural
1571            extension of the link-layer multicasting facilities of some
1572            networks, and it provides a standard means for local access
1573            to such link-layer multicasting facilities.
1574
1575       Other important references are listed in Section 5 of this
1576       document.
1577
1578       The Internet layer of host software MUST implement both IP and
1579       ICMP.  See Section 3.3.7 for the requirements on support of IGMP.
1580
1581       The host IP layer has two basic functions:  (1) choose the "next
1582       hop" gateway or host for outgoing IP datagrams and (2) reassemble
1583       incoming IP datagrams.  The IP layer may also (3) implement
1584       intentional fragmentation of outgoing datagrams.  Finally, the IP
1585       layer must (4) provide diagnostic and error functionality.  We
1586       expect that IP layer functions may increase somewhat in the
1587       future, as further Internet control and management facilities are
1588       developed.
1589
1600       For normal datagrams, the processing is straightforward.  For
1601       incoming datagrams, the IP layer:
1602
1603       (1)  verifies that the datagram is correctly formatted;
1604
1605       (2)  verifies that it is destined to the local host;
1606
1607       (3)  processes options;
1608
1609       (4)  reassembles the datagram if necessary; and
1610
1611       (5)  passes the encapsulated message to the appropriate
1612            transport-layer protocol module.
1613
1614       For outgoing datagrams, the IP layer:
1615
1616       (1)  sets any fields not set by the transport layer;
1617
1618       (2)  selects the correct first hop on the connected network (a
1619            process called "routing");
1620
1621       (3)  fragments the datagram if necessary and if intentional
1622            fragmentation is implemented (see Section 3.3.3); and
1623
1624       (4)  passes the packet(s) to the appropriate link-layer driver.
1625
1626
1627       A host is said to be multihomed if it has multiple IP addresses.
1628       Multihoming introduces considerable confusion and complexity into
1629       the protocol suite, and it is an area in which the Internet
1630       architecture falls seriously short of solving all problems.  There
1631       are two distinct problem areas in multihoming:
1632
1633       (1)  Local multihoming --  the host itself is multihomed; or
1634
1635       (2)  Remote multihoming -- the local host needs to communicate
1636            with a remote multihomed host.
1637
1638       At present, remote multihoming MUST be handled at the application
1639       layer, as discussed in the companion RFC [INTRO:1].  A host MAY
1640       support local multihoming, which is discussed in this document,
1641       and in particular in Section 3.3.4.
1642
1643       Any host that forwards datagrams generated by another host is
1644       acting as a gateway and MUST also meet the specifications laid out
1645       in the gateway requirements RFC [INTRO:2].  An Internet host that
1646       includes embedded gateway code MUST have a configuration switch to
1647       disable the gateway function, and this switch MUST default to the
1659       non-gateway mode.  In this mode, a datagram arriving through one
1660       interface will not be forwarded to another host or gateway (unless
1661       it is source-routed), regardless of whether the host is single-
1662       homed or multihomed.  The host software MUST NOT automatically
1663       move into gateway mode if the host has more than one interface, as
1664       the operator of the machine may neither want to provide that
1665       service nor be competent to do so.
1666
1667       In the following, the action specified in certain cases is to
1668       "silently discard" a received datagram.  This means that the
1669       datagram will be discarded without further processing and that the
1670       host will not send any ICMP error message (see Section 3.2.2) as a
1671       result.  However, for diagnosis of problems a host SHOULD provide
1672       the capability of logging the error (see Section 1.2.3), including
1673       the contents of the silently-discarded datagram, and SHOULD record
1674       the event in a statistics counter.
1675
1676       DISCUSSION:
1677            Silent discard of erroneous datagrams is generally intended
1678            to prevent "broadcast storms".
1679
1680    3.2  PROTOCOL WALK-THROUGH
1681
1682       3.2.1 Internet Protocol -- IP
1683
1684          3.2.1.1  Version Number: RFC-791 Section 3.1
1685
1686             A datagram whose version number is not 4 MUST be silently
1687             discarded.
1688
1689          3.2.1.2  Checksum: RFC-791 Section 3.1
1690
1691             A host MUST verify the IP header checksum on every received
1692             datagram and silently discard every datagram that has a bad
1693             checksum.
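
             The version and checksum rules of Sections 3.2.1.1 and
             3.2.1.2 might be sketched as below.  The function names
             are hypothetical.  The checksum is the 16-bit one's
             complement sum defined in RFC-791; a header whose stored
             checksum is correct sums to zero after complementing.

```python
import struct


def ip_header_checksum(header: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words."""
    if len(header) % 2:
        raise ValueError("header length must be even")
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_acceptable(datagram: bytes) -> bool:
    """False means the datagram should be silently discarded."""
    if len(datagram) < 20:
        return False
    version = datagram[0] >> 4
    header_len = (datagram[0] & 0x0F) * 4   # IHL is in 32-bit words
    if version != 4:
        return False
    if header_len < 20 or len(datagram) < header_len:
        return False
    # Summing over the header including its checksum field yields 0.
    return ip_header_checksum(datagram[:header_len]) == 0
```

             The sketch returns a boolean only; logging and the
             statistics counter recommended in Section 3.1 are left to
             the caller.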
1694
1695          3.2.1.3  Addressing: RFC-791 Section 3.2
1696
1697             There are now five classes of IP addresses: Class A through
1698             Class E.  Class D addresses are used for IP multicasting
1699             [IP:4], while Class E addresses are reserved for
1700             experimental use.
1701
1702             A multicast (Class D) address is a 28-bit logical address
1703             that stands for a group of hosts, and may be either
1704             permanent or transient.  Permanent multicast addresses are
1705             allocated by the Internet Assigned Number Authority
1706             [INTRO:6], while transient addresses may be allocated
1718             dynamically to transient groups.  Group membership is
1719             determined dynamically using IGMP [IP:4].
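
             For illustration, the class of an address can be read from
             its high-order bits, as in the sketch below.  The function
             name address_class is an assumption; the bit patterns are
             those of RFC-791 and [IP:4] (0 for Class A, 10 for B, 110
             for C, 1110 for D, 1111 for E).

```python
import ipaddress


def address_class(addr: str) -> str:
    """Return "A".."E" from the high-order bits of an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:     # 0xxxxxxx
        return "A"
    if first_octet < 192:     # 10xxxxxx
        return "B"
    if first_octet < 224:     # 110xxxxx
        return "C"
    if first_octet < 240:     # 1110xxxx: the remaining 28 bits name the group
        return "D"
    return "E"                # 1111xxxx
```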
1720
1721             We now summarize the important special cases for Class A, B,
1722             and C IP addresses, using the following notation for an IP
1723             address:
1724
1725                 { <Network-number>, <Host-number> }
1726
1727             or
1728                 { <Network-number>, <Subnet-number>, <Host-number> }
1729
1730             and the notation "-1" for a field that contains all 1 bits.
1731             This notation is not intended to imply that the 1-bits in an
1732             address mask need be contiguous.
1733
1734             (a)  { 0, 0 }
1735
1736                  This host on this network.  MUST NOT be sent, except as
1737                  a source address as part of an initialization procedure
1738                  by which the host learns its own IP address.
1739
1740                  See also Section 3.3.6 for a non-standard use of {0,0}.
1741
1742             (b)  { 0, <Host-number> }
1743
1744                  Specified host on this network.  It MUST NOT be sent,
1745                  except as a source address as part of an initialization
1746                  procedure by which the host learns its full IP address.
1747
1748             (c)  { -1, -1 }
1749
1750                  Limited broadcast.  It MUST NOT be used as a source
1751                  address.
1752
1753                  A datagram with this destination address will be
1754                  received by every host on the connected physical
1755                  network but will not be forwarded outside that network.
1756
1757             (d)  { <Network-number>, -1 }
1758
1759                  Directed broadcast to the specified network.  It MUST
1760                  NOT be used as a source address.
1761
1762             (e)  { <Network-number>, <Subnet-number>, -1 }
1763
1764                  Directed broadcast to the specified subnet.  It MUST
1765                  NOT be used as a source address.
1766
1777             (f)  { <Network-number>, -1, -1 }
1778
1779                  Directed broadcast to all subnets of the specified
1780                  subnetted network.  It MUST NOT be used as a source
1781                  address.
1782
1783             (g)  { 127, <any> }
1784
1785                  Internal host loopback address.  Addresses of this form
1786                  MUST NOT appear outside a host.
1787
1788             The <Network-number> is administratively assigned so that
1789             its value will be unique in the entire world.
1790
1791             IP addresses are not permitted to have the value 0 or -1 for
1792             any of the <Host-number>, <Network-number>, or <Subnet-
1793             number> fields (except in the special cases listed above).
1794             This implies that each of these fields will be at least two
1795             bits long.
1796
1797             For further discussion of broadcast addresses, see Section
1798             3.3.6.
1799
1800             A host MUST support the subnet extensions to IP [IP:3].  As
1801             a result, there will be an address mask of the form:
1802             {-1, -1, 0} associated with each of the host's local IP
1803             addresses; see Sections 3.2.2.9 and 3.3.1.1.
1804
1805             When a host sends any datagram, the IP source address MUST
1806             be one of its own IP addresses (but not a broadcast or
1807             multicast address).
1808
1809             A host MUST silently discard an incoming datagram that is
1810             not destined for the host.  An incoming datagram is destined
1811             for the host if the datagram's destination address field is:
1812
1813             (1)  (one of) the host's IP address(es); or
1814
1815             (2)  an IP broadcast address valid for the connected
1816                  network; or
1817
1818             (3)  the address for a multicast group of which the host is
1819                  a member on the incoming physical interface.
1820
1821             For most purposes, a datagram addressed to a broadcast or
1822             multicast destination is processed as if it had been
1823             addressed to one of the host's IP addresses; we use the term
1824             "specific-destination address" for the equivalent local IP
1836             address of the host.  The specific-destination address is
1837             defined to be the destination address in the IP header
1838             unless the header contains a broadcast or multicast address,
1839             in which case the specific-destination is an IP address
1840             assigned to the physical interface on which the datagram
1841             arrived.
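
             The destination test and the definition of the specific-
             destination address can be combined in one routine, as in
             this sketch.  The function name and its parameters are
             hypothetical; in particular, the sets of valid broadcast
             addresses and joined groups are assumed to be supplied by
             the caller for the interface on which the datagram arrived.

```python
import ipaddress


def specific_destination(dst, host_addrs, arrival_addr, broadcasts, groups):
    """Return the specific-destination address for an incoming datagram,
    or None if the datagram is not destined for this host (and should be
    silently discarded)."""
    to_ip = ipaddress.IPv4Address
    dst = to_ip(dst)
    if dst in {to_ip(a) for a in host_addrs}:
        return dst                      # rule (1): unicast to this host
    is_broadcast = dst in {to_ip(a) for a in broadcasts}   # rule (2)
    is_group = dst in {to_ip(a) for a in groups}           # rule (3)
    if is_broadcast or is_group:
        # Broadcast/multicast: use an address of the arrival interface.
        return to_ip(arrival_addr)
    return None
```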
1842
1843             A host MUST silently discard an incoming datagram containing
1844             an IP source address that is invalid by the rules of this
1845             section.  This validation could be done either in the IP
1846             layer or by each protocol in the transport layer.
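
             A partial sketch of such source-address validation follows.
             It is an illustration under stated assumptions, not a
             complete check: the function name is hypothetical, the
             local address mask is given in prefix-length form (which
             assumes contiguous mask bits, something this document does
             not require), and the {0,0} and {0,<Host-number>} cases
             used during initialization are not examined.

```python
import ipaddress


def source_is_valid(src: str, local_iface: str) -> bool:
    """Return False for a source address that the rules of this section
    forbid.  `local_iface` is e.g. "192.0.2.5/24"."""
    addr = ipaddress.IPv4Address(src)
    net = ipaddress.IPv4Interface(local_iface).network
    if addr == ipaddress.IPv4Address("255.255.255.255"):
        return False          # case (c): limited broadcast
    if addr.is_multicast:
        return False          # Class D is never a source address
    if addr.is_loopback:
        return False          # case (g): 127.x.x.x must not leave a host
    if addr == net.broadcast_address:
        return False          # directed broadcast for the local (sub)net
    return True
```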
1847
1848             DISCUSSION:
1849                  A mis-addressed datagram might be caused by a link-
1850                  layer broadcast of a unicast datagram or by a gateway
1851                  or host that is confused or mis-configured.
1852
1853                  An architectural goal for Internet hosts was to allow
1854                  IP addresses to be featureless 32-bit numbers, avoiding
1855                  algorithms that required a knowledge of the IP address
1856                  format.  Otherwise, any future change in the format or
1857                  interpretation of IP addresses will require host
1858                  software changes.  However, validation of broadcast and
1859                  multicast addresses violates this goal; a few other
1860                  violations are described elsewhere in this document.
1861
1862                  Implementers should be aware that applications
1863                  depending upon the all-subnets directed broadcast
1864                  address (f) may be unusable on some networks.  All-
1865                  subnets broadcast is not widely implemented in vendor
1866                  gateways at present, and even when it is implemented, a
1867                  particular network administration may disable it in the
1868                  gateway configuration.
1869
1870          3.2.1.4  Fragmentation and Reassembly: RFC-791 Section 3.2
1871
1872             The Internet model requires that every host support
1873             reassembly.  See Sections 3.3.2 and 3.3.3 for the
1874             requirements on fragmentation and reassembly.
1875
1876          3.2.1.5  Identification: RFC-791 Section 3.2
1877
1878             When sending an identical copy of an earlier datagram, a
1879             host MAY retain the same Identification field in
1880             the copy.
1881
1895             DISCUSSION:
1896                  Some Internet protocol experts have maintained that
1897                  when a host sends an identical copy of an earlier
1898                  datagram, the new copy should contain the same
1899                  Identification value as the original.  There are two
1900                  suggested advantages:  (1) if the datagrams are
1901                  fragmented and some of the fragments are lost, the
1902                  receiver may be able to reconstruct a complete datagram
1903                  from fragments of the original and the copies; (2) a
1904                  congested gateway might use the IP Identification field
1905                  (and Fragment Offset) to discard duplicate datagrams
1906                  from the queue.
1907
1908                  However, the observed patterns of datagram loss in the
1909                  Internet make it unlikely that retransmitted fragments
1910                  will fill reassembly gaps, while other
1911                  mechanisms (e.g., TCP repacketizing upon
1912                  retransmission) tend to prevent retransmission of an
1913                  identical datagram [IP:9].  Therefore, we believe that
1914                  retransmitting the same Identification field is not
1915                  useful.  Also, a connectionless transport protocol like
1916                  UDP would require the cooperation of the application
1917                  programs to retain the same Identification value in
1918                  identical datagrams.
1919
1920          3.2.1.6  Type-of-Service: RFC-791 Section 3.2
1921
1922             The "Type-of-Service" byte in the IP header is divided into
1923             two sections:  the Precedence field (high-order 3 bits), and
1924             a field that is customarily called "Type-of-Service" or
1925             "TOS" (low-order 5 bits).  In this document, all references
1926             to "TOS" or the "TOS field" refer to the low-order 5 bits
1927             only.
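
             The division of the byte described above can be sketched
             as follows; the function name split_tos_byte is an
             assumption made for illustration.

```python
def split_tos_byte(value: int):
    """Split the IP header Type-of-Service byte into
    (precedence, tos): high-order 3 bits and low-order 5 bits."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Type-of-Service is a single byte")
    precedence = value >> 5      # high-order 3 bits
    tos = value & 0x1F           # low-order 5 bits ("TOS" in this document)
    return precedence, tos
```

             For instance, the byte 10110100 (binary) splits into
             Precedence 101 and TOS 10100.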
1928
1929             The Precedence field is intended for Department of Defense
1930             applications of the Internet protocols.  The use of non-zero
1931             values in this field is outside the scope of this document
1932             and the IP standard specification.  Vendors should consult
1933             the Defense Communication Agency (DCA) for guidance on the
1934             IP Precedence field and its implications for other protocol
1935             layers.  However, vendors should note that the use of
1936             precedence will most likely require that its value be passed
1937             between protocol layers in just the same way as the TOS
1938             field is passed.
1939
1940             The IP layer MUST provide a means for the transport layer to
1941             set the TOS field of every datagram that is sent; the
1942             default is all zero bits.  The IP layer SHOULD pass received
1954             TOS values up to the transport layer.
1955
1956             The particular link-layer mappings of TOS contained in RFC-
1957             795 SHOULD NOT be implemented.
1958
1959             DISCUSSION:
1960                  While the TOS field has been little used in the past,
1961                  it is expected to play an increasing role in the near
1962                  future.  The TOS field is expected to be used to
1963                  control two aspects of gateway operations: routing and
1964                  queueing algorithms.  See Section 2 of [INTRO:1] for
1965                  the requirements on application programs to specify TOS
1966                  values.
1967
1968                  The TOS field may also be mapped into link-layer
1969                  service selectors.  This has been applied to provide
1970                  effective sharing of serial lines by different classes
1971                  of TCP traffic, for example.  However, the mappings
1972                  suggested in RFC-795 for networks that were included in
1973                  the Internet as of 1981 are now obsolete.
1974
1975          3.2.1.7  Time-to-Live: RFC-791 Section 3.2
1976
1977             A host MUST NOT send a datagram with a Time-to-Live (TTL)
1978             value of zero.
1979
1980             A host MUST NOT discard a datagram just because it was
1981             received with TTL less than 2.
1982
1983             The IP layer MUST provide a means for the transport layer to
1984             set the TTL field of every datagram that is sent.  When a
1985             fixed TTL value is used, it MUST be configurable.  The
1986             current suggested value will be published in the "Assigned
1987             Numbers" RFC.
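
             A sketch of how an IP layer might choose the TTL of an
             outgoing datagram under these rules is shown below.  The
             function name choose_ttl and its parameters are
             hypothetical; the configured value is deliberately a
             required argument, since this document does not fix one.

```python
def choose_ttl(requested, configured):
    """Return the TTL for an outgoing datagram.

    `requested` is the value set by the transport layer, or None if it
    set none; `configured` is the host's configurable fixed TTL."""
    ttl = configured if requested is None else requested
    if not 1 <= ttl <= 255:
        # A TTL of zero must never be sent; the field is 8 bits wide.
        raise ValueError("TTL must be in the range 1..255")
    return ttl
```

             Note that no corresponding check appears on the receive
             path: a host does not discard a datagram merely because it
             arrived with a TTL less than 2.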
1988
1989             DISCUSSION:
1990                  The TTL field has two functions: limit the lifetime of
1991                  TCP segments (see RFC-793 [TCP:1], p. 28), and
1992                  terminate Internet routing loops.  Although TTL is a
1993                  time in seconds, it also has some attributes of a hop-
1994                  count, since each gateway is required to reduce the TTL
1995                  field by at least one.
1996
1997                  The intent is that TTL expiration will cause a datagram
1998                  to be discarded by a gateway but not by the destination
1999                  host; however, hosts that act as gateways by forwarding
2000                  datagrams must follow the gateway rules for TTL.
2001
2013                  A higher-layer protocol may want to set the TTL in
2014                  order to implement an "expanding scope" search for some
2015                  Internet resource.  This is used by some diagnostic
2016                  tools, and is expected to be useful for locating the
2017                  "nearest" server of a given class using IP
2018                  multicasting, for example.  A particular transport
2019                  protocol may also want to specify its own TTL bound on
2020                  maximum datagram lifetime.
2021
2022                  A fixed value must be at least big enough for the
2023                  Internet "diameter," i.e., the longest possible path.
2024                  A reasonable value is about twice the diameter, to
2025                  allow for continued Internet growth.
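
The TTL rules above can be sketched as follows.  This is a minimal illustration, not a definitive implementation: the names are hypothetical, DEFAULT_TTL = 64 is an assumed placeholder (the suggested value is published in "Assigned Numbers"), and the doubling schedule for an "expanding scope" search is only one possible choice.

```python
# Assumed placeholder; the suggested value is published in "Assigned Numbers".
DEFAULT_TTL = 64


def outgoing_ttl(requested=None, default=DEFAULT_TTL):
    """Pick the TTL for a datagram that this host originates."""
    ttl = default if requested is None else requested
    if not 1 <= ttl <= 255:
        # TTL is an 8-bit field, and a host must not send a TTL of zero.
        raise ValueError("TTL must be in the range 1..255")
    return ttl


def handle_received(ttl, addressed_to_us):
    """Return ("deliver", ttl), ("forward", new_ttl), or ("discard", 0)."""
    if addressed_to_us:
        # The destination host does not discard a datagram for a low TTL.
        return ("deliver", ttl)
    # A host that forwards follows the gateway rules: reduce the TTL by
    # at least one and discard the datagram once the TTL is exhausted.
    new_ttl = ttl - 1
    if new_ttl <= 0:
        return ("discard", 0)
    return ("forward", new_ttl)


def expanding_scope_ttls(limit=DEFAULT_TTL):
    """Yield TTLs 1, 2, 4, ... capped at `limit` (assumed schedule)."""
    ttl = 1
    while ttl < limit:
        yield ttl
        ttl *= 2
    yield limit
```

A transport protocol would pass its own bound as `requested`; the configurable fixed value plays the role of `default`.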
2026
2027          3.2.1.8  Options: RFC-791 Section 3.2
2028
2029             There MUST be a means for the transport layer to specify IP
2030             options to be included in transmitted IP datagrams (see
2031             Section 3.4).
2032
2033             All IP options (except NOP or END-OF-LIST) received in
2034             datagrams MUST be passed to the transport layer (or to ICMP
2035             processing when the datagram is an ICMP message).  The IP
2036             and transport layer MUST each interpret those IP options
2037             that they understand and silently ignore the others.
2038
2039             Later sections of this document discuss specific IP option
2040             support required by each of ICMP, TCP, and UDP.
2041
2042             DISCUSSION:
2043                  Passing all received IP options to the transport layer
2044                  is a deliberate "violation of strict layering" that is
2045                  designed to ease the introduction of new transport-
2046                  relevant IP options in the future.  Each layer must
2047                  pick out any options that are relevant to its own
2048                  processing and ignore the rest.  For this purpose,
2049                  every IP option except NOP and END-OF-LIST will include
2050                  a specification of its own length.
2051
2052                  This document does not define the order in which a
2053                  receiver must process multiple options in the same IP
2054                  header.  Hosts sending multiple options must be aware
2055                  that this introduces an ambiguity in the meaning of
2056                  certain options when combined with a source-route
2057                  option.
2058
2059             IMPLEMENTATION:
2060                  The IP layer must not crash as the result of an option
2072                  length that is outside the possible range.  For
2073                  example, erroneous option lengths have been observed to
2074                  put some IP implementations into infinite loops.
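
A defensive option walker along these lines avoids the infinite loop described above.  It is a sketch under stated assumptions: the function name is hypothetical, the option list is assumed to be the raw octets following the fixed 20-octet header, and a malformed list is reported with an exception, leaving the caller to decide what to do with the datagram.

```python
END_OF_LIST = 0   # single-octet option, terminates the list
NOP = 1           # single-octet option, padding


def parse_ip_options(data: bytes):
    """Return a list of (option_type, option_data) pairs.

    Raises ValueError on a length that is outside the possible range,
    so that a bad length can never stall or overrun the loop.
    """
    options = []
    i = 0
    while i < len(data):
        opt_type = data[i]
        if opt_type == END_OF_LIST:
            break
        if opt_type == NOP:
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("option truncated before its length octet")
        length = data[i + 1]
        # The length counts the type and length octets themselves, so
        # anything below 2 (notably 0) would never advance the index.
        if length < 2 or i + length > len(data):
            raise ValueError("option length outside the possible range")
        options.append((opt_type, data[i + 2:i + length]))
        i += length
    return options
```

Each layer can then pick the option types it understands out of the returned list and ignore the rest.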
2075
2076             Here are the requirements for specific IP options:
2077
2078
2079             (a)  Security Option
2080
2081                  Some environments require the Security option in every
2082                  datagram; such a requirement is outside the scope of
2083                  this document and the IP standard specification.  Note,
2084                  however, that the security options described in RFC-791
2085                  and RFC-1038 are obsolete.  For DoD applications,
2086                  vendors should consult [IP:8] for guidance.
2087
2088
2089             (b)  Stream Identifier Option
2090
2091                  This option is obsolete; it SHOULD NOT be sent, and it
2092                  MUST be silently ignored if received.
2093
2094
2095             (c)  Source Route Options
2096
2097                  A host MUST support originating a source route and MUST
2098                  be able to act as the final destination of a source
2099                  route.
2100
2101                  If a host receives a datagram containing a completed
2102                  source route (i.e., the pointer points beyond the last
2103                  field), the datagram has reached its final destination;
2104                  the option as received (the recorded route) MUST be
2105                  passed up to the transport layer (or to ICMP message
2106                  processing).  This recorded route will be reversed and
2107                  used to form a return source route for reply datagrams
2108                  (see discussion of IP Options in Section 4).  When a
2109                  return source route is built, it MUST be correctly
2110                  formed even if the recorded route included the source
2111                  host (see case (B) in the discussion below).
2112
2113                  An IP header containing more than one Source Route
2114                  option MUST NOT be sent; the effect on routing of
2115                  multiple Source Route options is implementation-
2116                  specific.
2117
2118                  Section 3.3.5 presents the rules for a host acting as
2119                  an intermediate hop in a source route, i.e., forwarding
2131                  a source-routed datagram.
2132
2133                  DISCUSSION:
2134                       If a source-routed datagram is fragmented, each
2135                       fragment will contain a copy of the source route.
2136                       Since the processing of IP options (including a
2137                       source route) must precede reassembly, the
2138                       original datagram will not be reassembled until
2139                       the final destination is reached.
2140
2141                       Suppose a source-routed datagram is to be routed
2142                       from host S to host D via gateways G1, G2, ... Gn.
2143                       There was an ambiguity in the specification over
2144                       whether the source route option in a datagram sent
2145                       out by S should be (A) or (B):
2146
2147                           (A):  {>>G2, G3, ... Gn, D}     <--- CORRECT
2148
2149                           (B):  {S, >>G2, G3, ... Gn, D}  <---- WRONG
2150
2151                       (where >> represents the pointer).  If (A) is
2152                       sent, the datagram received at D will contain the
2153                       option: {G1, G2, ... Gn >>}, with S and D as the
2154                       IP source and destination addresses.  If (B) were
2155                       sent, the datagram received at D would again
2156                       contain S and D as the IP source and
2157                       destination addresses, but the option would be:
2158                       {S, G1, ... Gn >>}; i.e., the originating host
2159                       would be the first hop in the route.
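
One way to build a return source route from the recorded route is sketched below.  The function name is hypothetical, and the handling of case (B), which drops a leading entry equal to the IP source address before reversing, is an assumption about what "correctly formed" means rather than a rule quoted from this document.

```python
def return_source_route(recorded_route, ip_source):
    """Return (first_hop, remaining_route) for a reply datagram.

    `recorded_route` is the completed option as received at D, e.g.
    [G1, G2, ..., Gn]; `ip_source` is the IP source address S.
    """
    hops = list(recorded_route)
    # Case (B): the originator listed itself as the first hop.  Drop it
    # so that S does not appear twice in the return route (assumption).
    if hops and hops[0] == ip_source:
        hops = hops[1:]
    # Walk the recorded hops backwards and finish at the original source.
    path = list(reversed(hops)) + [ip_source]
    # The reply is sent to path[0]; the rest goes into the option.
    return path[0], path[1:]
```

With this sketch, cases (A) and (B) yield the same return route.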
2160
2161
2162             (d)  Record Route Option
2163
2164                  Implementation of originating and processing the Record
2165                  Route option is OPTIONAL.
2166
2167
2168             (e)  Timestamp Option
2169
2170                  Implementation of originating and processing the
2171                  Timestamp option is OPTIONAL.  If it is implemented,
2172                  the following rules apply:
2173
2174                  o    The originating host MUST record a timestamp in a
2175                       Timestamp option whose Internet address fields are
2176                       not pre-specified or whose first pre-specified
2177                       address is the host's interface address.
2178
2190                  o    The destination host MUST (if possible) add the
2191                       current timestamp to a Timestamp option before
2192                       passing the option to the transport layer or to
2193                       ICMP for processing.
2194
2195                  o    A timestamp value MUST follow the rules given in
2196                       Section 3.2.2.8 for the ICMP Timestamp message.
2197
2198
2199       3.2.2 Internet Control Message Protocol -- ICMP
2200
2201          ICMP messages are grouped into two classes.
2202
2203          *    ICMP error messages:
2205
2206                Destination Unreachable   (see Section 3.2.2.1)
2207                Redirect                  (see Section 3.2.2.2)
2208                Source Quench             (see Section 3.2.2.3)
2209                Time Exceeded             (see Section 3.2.2.4)
2210                Parameter Problem         (see Section 3.2.2.5)
2211
2212
2213          *    ICMP query messages:
2215
2216                Echo                      (see Section 3.2.2.6)
2217                Information               (see Section 3.2.2.7)
2218                Timestamp                 (see Section 3.2.2.8)
2219                Address Mask              (see Section 3.2.2.9)
2220
2221
2222          If an ICMP message of unknown type is received, it MUST be
2223          silently discarded.
2224
2225          Every ICMP error message includes the Internet header and at
2226          least the first 8 data octets of the datagram that triggered
2227          the error; more than 8 octets MAY be sent; this header and data
2228          MUST be unchanged from the received datagram.
2229
2230          In those cases where the Internet layer is required to pass an
2231          ICMP error message to the transport layer, the IP protocol
2232          number MUST be extracted from the original header and used to
2233          select the appropriate transport protocol entity to handle the
2234          error.
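
The demultiplexing step described above might look like the following sketch.  It assumes the ICMP error message is available as raw octets starting at the ICMP type field, with the usual 8-octet ICMP header ahead of the embedded Internet header; the function name is hypothetical.

```python
def embedded_protocol(icmp_error: bytes) -> int:
    """Return the IP protocol number of the datagram that triggered
    an ICMP error message."""
    original = icmp_error[8:]   # skip the 8-octet ICMP header
    if len(original) < 20:
        raise ValueError("embedded Internet header is truncated")
    version = original[0] >> 4
    ihl = original[0] & 0x0F    # header length in 32-bit words
    if version != 4 or ihl < 5 or len(original) < ihl * 4:
        raise ValueError("embedded Internet header is malformed")
    return original[9]          # protocol field of the original header
```

The returned number (6 for TCP, 17 for UDP) selects the transport protocol entity that receives the error.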
2235
2236          An ICMP error message SHOULD be sent with normal (i.e., zero)
2237          TOS bits.
2238
2249          An ICMP error message MUST NOT be sent as the result of
2250          receiving:
2251
2252          *    an ICMP error message, or
2253
2254          *    a datagram destined to an IP broadcast or IP multicast
2255               address, or
2256
2257          *    a datagram sent as a link-layer broadcast, or
2258
2259          *    a non-initial fragment, or
2260
2261          *    a datagram whose source address does not define a single
2262               host -- e.g., a zero address, a loopback address, a
2263               broadcast address, a multicast address, or a Class E
2264               address.
2265
2266          NOTE: THESE RESTRICTIONS TAKE PRECEDENCE OVER ANY REQUIREMENT
2267          ELSEWHERE IN THIS DOCUMENT FOR SENDING ICMP ERROR MESSAGES.
2268
2269          DISCUSSION:
2270               These rules will prevent the "broadcast storms" that have
2271               resulted from hosts returning ICMP error messages in
2272               response to broadcast datagrams.  For example, a broadcast
2273               UDP segment to a non-existent port could trigger a flood
2274               of ICMP Destination Unreachable datagrams from all
2275               machines that do not have a client for that destination
2276               port.  On a large Ethernet, the resulting collisions can
2277               render the network useless for a second or more.
2278
2279               Every datagram that is broadcast on the connected network
2280               should have a valid IP broadcast address as its IP
2281               destination (see Section 3.3.6).  However, some hosts
2282               violate this rule.  To be certain to detect broadcast
2283               datagrams, therefore, hosts are required to check for a
2284               link-layer broadcast as well as an IP-layer broadcast
2285               address.
2286
2287          IMPLEMENTATION:
2288               This requires that the link layer inform the IP layer when
2289               a link-layer broadcast datagram has been received; see
2290               Section 2.4.
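
The restrictions on sending ICMP error messages can be collected into a single predicate, sketched here.  All names are hypothetical.  Two assumptions are labeled in the code: any address in network 0 is treated as a "zero address", and the caller supplies the directed broadcast addresses of the connected networks, since they depend on the local address masks.

```python
import ipaddress

CLASS_E = ipaddress.ip_network("240.0.0.0/4")   # also covers 255.255.255.255
ZERO_NET = ipaddress.ip_network("0.0.0.0/8")    # assumption: "zero address"
LIMITED_BROADCAST = ipaddress.ip_address("255.255.255.255")


def may_send_icmp_error(src, dst, *, triggered_by_icmp_error=False,
                        link_layer_broadcast=False, fragment_offset=0,
                        directed_broadcasts=()):
    """Return True only if none of the listed restrictions applies."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    # Assumption: directed broadcast addresses are supplied by the caller.
    broadcasts = {LIMITED_BROADCAST}
    broadcasts |= {ipaddress.ip_address(a) for a in directed_broadcasts}

    if triggered_by_icmp_error:
        return False
    if dst.is_multicast or dst in broadcasts:
        return False
    if link_layer_broadcast:      # reported by the link layer
        return False
    if fragment_offset != 0:      # non-initial fragment
        return False
    if (src in ZERO_NET or src.is_loopback or src.is_multicast
            or src in broadcasts or src in CLASS_E):
        return False              # source does not define a single host
    return True
```

Because these restrictions take precedence, such a check would run before any other rule that calls for an ICMP error.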
2291
2292          3.2.2.1  Destination Unreachable: RFC-792
2293
2294             The following additional codes are hereby defined:
2295
2296                     6 = destination network unknown
2297
2308                     7 = destination host unknown
2309
2310                     8 = source host isolated
2311
2312                     9 = communication with destination network
2313                             administratively prohibited
2314
2315                    10 = communication with destination host
2316                             administratively prohibited
2317
2318                    11 = network unreachable for type of service
2319
2320                    12 = host unreachable for type of service
2321
2322             A host SHOULD generate Destination Unreachable messages with
2323             code:
2324
2325             2    (Protocol Unreachable), when the designated transport
2326                  protocol is not supported; or
2327
2328             3    (Port Unreachable), when the designated transport
2329                  protocol (e.g., UDP) is unable to demultiplex the
2330                  datagram but has no protocol mechanism to inform the
2331                  sender.
2332
2333             A Destination Unreachable message that is received MUST be
2334             reported to the transport layer.  The transport layer SHOULD
2335             use the information appropriately; for example, see Sections
2336             4.1.3.3, 4.2.3.9, and 4.2.4 below.  A transport protocol
2337             that has its own mechanism for notifying the sender that a
2338             port is unreachable (e.g., TCP, which sends RST segments)
2339             MUST nevertheless accept an ICMP Port Unreachable for the
2340             same purpose.
2341
2342             A Destination Unreachable message that is received with code
2343             0 (Net), 1 (Host), or 5 (Bad Source Route) may result from a
2344             routing transient and MUST therefore be interpreted as only
2345             a hint, not proof, that the specified destination is
2346             unreachable [IP:11].  For example, it MUST NOT be used as
2347             proof of a dead gateway (see Section 3.3.1).
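
For reference, the Destination Unreachable codes can be tabulated as below.  Codes 0 through 5 use the names from RFC-792, codes 6 through 12 are those defined above, and the helper name is hypothetical.

```python
DEST_UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
    6: "destination network unknown",
    7: "destination host unknown",
    8: "source host isolated",
    9: "communication with destination network administratively prohibited",
    10: "communication with destination host administratively prohibited",
    11: "network unreachable for type of service",
    12: "host unreachable for type of service",
}

# Codes that may result from a routing transient: a hint, not proof.
HINT_ONLY_CODES = frozenset({0, 1, 5})


def is_hint_only(code: int) -> bool:
    """True if the code must not be taken as proof of unreachability."""
    return code in HINT_ONLY_CODES
```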
2348
2349          3.2.2.2  Redirect: RFC-792
2350
2351             A host SHOULD NOT send an ICMP Redirect message; Redirects
2352             are to be sent only by gateways.
2353
2354             A host receiving a Redirect message MUST update its routing
2355             information accordingly.  Every host MUST be prepared to
2367             accept both Host and Network Redirects and to process them
2368             as described in Section 3.3.1.2 below.
2369
2370             A Redirect message SHOULD be silently discarded if the new
2371             gateway address it specifies is not on the same connected
2372             (sub-) net through which the Redirect arrived [INTRO:2,
2373             Appendix A], or if the source of the Redirect is not the
2374             current first-hop gateway for the specified destination (see
2375             Section 3.3.1).
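
The two acceptance tests for a Redirect can be sketched as a predicate.  The names are hypothetical, and the connected (sub-)net of the arrival interface is assumed to be available in prefix notation.

```python
import ipaddress


def redirect_acceptable(new_gateway, redirect_source,
                        arrival_network, current_first_hop):
    """Return False if the Redirect should be silently discarded."""
    network = ipaddress.ip_network(arrival_network)
    # Test 1: the new gateway is on the (sub-)net the Redirect arrived on.
    on_same_net = ipaddress.ip_address(new_gateway) in network
    # Test 2: the Redirect came from the current first-hop gateway
    # for the destination in question.
    from_first_hop = (ipaddress.ip_address(redirect_source)
                      == ipaddress.ip_address(current_first_hop))
    return on_same_net and from_first_hop
```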
2376
2377          3.2.2.3  Source Quench: RFC-792
2378
2379             A host MAY send a Source Quench message if it is
2380             approaching, or has reached, the point at which it is forced
2381             to discard incoming datagrams due to a shortage of
2382             reassembly buffers or other resources.  See Section 2.2.3 of
2383             [INTRO:2] for suggestions on when to send Source Quench.
2384
2385             If a Source Quench message is received, the IP layer MUST
2386             report it to the transport layer (or ICMP processing). In
2387             general, the transport or application layer SHOULD implement
2388             a mechanism to respond to Source Quench for any protocol
2389             that can send a sequence of datagrams to the same
2390             destination and which can reasonably be expected to maintain
2391             enough state information to make this feasible.  See Section
2392             4 for the handling of Source Quench by TCP and UDP.
2393
2394             DISCUSSION:
2395                  A Source Quench may be generated by the target host or
2396                  by some gateway in the path of a datagram.  The host
2397                  receiving a Source Quench should throttle itself back
2398                  for a period of time, then gradually increase the
2399                  transmission rate again.  The mechanism to respond to
2400                  Source Quench may be in the transport layer (for
2401                  connection-oriented protocols like TCP) or in the
2402                  application layer (for protocols that are built on top
2403                  of UDP).
2404
2405                  A mechanism has been proposed [IP:14] to make the IP
2406                  layer respond directly to Source Quench by controlling
2407                  the rate at which datagrams are sent; however, this
2408                  proposal is currently experimental and is not
2409                  recommended.
2410
2411          3.2.2.4  Time Exceeded: RFC-792
2412
2413             An incoming Time Exceeded message MUST be passed to the
2414             transport layer.
2415
2426             DISCUSSION:
2427                  A gateway will send a Time Exceeded Code 0 (In Transit)
2428                  message when it discards a datagram due to an expired
2429                  TTL field.  This indicates either a gateway routing
2430                  loop or too small an initial TTL value.
2431
2432                  A host may receive a Time Exceeded Code 1 (Reassembly
2433                  Timeout) message from a destination host that has timed
2434                  out and discarded an incomplete datagram; see Section
2435                  3.3.2 below.  In the future, receipt of this message
2436                  might be part of some "MTU discovery" procedure, to
2437                  discover the maximum datagram size that can be sent on
2438                  the path without fragmentation.
2439
2440          3.2.2.5  Parameter Problem: RFC-792
2441
2442             A host SHOULD generate Parameter Problem messages.  An
2443             incoming Parameter Problem message MUST be passed to the
2444             transport layer, and it MAY be reported to the user.
2445
2446             DISCUSSION:
2447                  The ICMP Parameter Problem message is sent to the
2448                  source host for any problem not specifically covered by
2449                  another ICMP message.  Receipt of a Parameter Problem
2450                  message generally indicates some local or remote
2451                  implementation error.
2452
2453             A new variant on the Parameter Problem message is hereby
2454             defined:
2455               Code 1 = required option is missing.
2456
2457             DISCUSSION:
2458                  This variant is currently in use in the military
2459                  community for a missing security option.
2460
2461          3.2.2.6  Echo Request/Reply: RFC-792
2462
2463             Every host MUST implement an ICMP Echo server function that
2464             receives Echo Requests and sends corresponding Echo Replies.
2465             A host SHOULD also implement an application-layer interface
2466             for sending an Echo Request and receiving an Echo Reply, for
2467             diagnostic purposes.
2468
2469             An ICMP Echo Request destined to an IP broadcast or IP
2470             multicast address MAY be silently discarded.
2471
2485             DISCUSSION:
2486                  This neutral provision results from a passionate debate
2487                  between those who feel that ICMP Echo to a broadcast
2488                  address provides a valuable diagnostic capability and
2489                  those who feel that misuse of this feature can too
2490                  easily create packet storms.
2491
2492             The IP source address in an ICMP Echo Reply MUST be the same
2493             as the specific-destination address (defined in Section
2494             3.2.1.3) of the corresponding ICMP Echo Request message.
2495
2496             Data received in an ICMP Echo Request MUST be entirely
2497             included in the resulting Echo Reply.  However, if sending
2498             the Echo Reply requires intentional fragmentation that is
2499             not implemented, the datagram MUST be truncated to maximum
2500             transmission size (see Section 3.3.3) and sent.
2501
2502             Echo Reply messages MUST be passed to the ICMP user
2503             interface, unless the corresponding Echo Request originated
2504             in the IP layer.
2505
2506             If a Record Route and/or Time Stamp option is received in an
2507             ICMP Echo Request, this option (these options) SHOULD be
2508             updated to include the current host and included in the IP
2509             header of the Echo Reply message, without "truncation".
2510             Thus, the recorded route will be for the entire round trip.
2511
2512             If a Source Route option is received in an ICMP Echo
2513             Request, the return route MUST be reversed and used as a
2514             Source Route option for the Echo Reply message.
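
A minimal sketch of the Echo server function follows.  It covers only the ICMP message itself (type, checksum, and the complete data) and the choice of addresses; IP options, fragmentation, and truncation are left out, and all names are hypothetical.

```python
import struct

ECHO_REQUEST, ECHO_REPLY = 8, 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_reply(request: bytes) -> bytes:
    """Turn an ICMP Echo Request message into the matching Echo Reply."""
    if len(request) < 8 or request[0] != ECHO_REQUEST:
        raise ValueError("not an ICMP Echo Request")
    rest = request[4:]   # identifier, sequence number, and all of the data
    zeroed = struct.pack("!BBH", ECHO_REPLY, 0, 0) + rest
    checksum = internet_checksum(zeroed)
    return struct.pack("!BBH", ECHO_REPLY, 0, checksum) + rest


def reply_addresses(request_source, specific_destination):
    """Return (source, destination) for the Echo Reply datagram."""
    # The reply is sent from the specific-destination address of the request.
    return specific_destination, request_source
```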
2515
2516          3.2.2.7  Information Request/Reply: RFC-792
2517
2518             A host SHOULD NOT implement these messages.
2519
2520             DISCUSSION:
2521                  The Information Request/Reply pair was intended to
2522                  support self-configuring systems such as diskless
2523                  workstations, to allow them to discover their IP
2524                  network numbers at boot time.  However, the RARP and
2525                  BOOTP protocols provide better mechanisms for a host to
2526                  discover its own IP address.
2527
2528          3.2.2.8  Timestamp and Timestamp Reply: RFC-792
2529
2530             A host MAY implement Timestamp and Timestamp Reply.  If they
2531             are implemented, the following rules MUST be followed.
2532
2544             o    The ICMP Timestamp server function returns a Timestamp
2545                  Reply to every Timestamp message that is received.  If
2546                  this function is implemented, it SHOULD be designed for
2547                  minimum variability in delay (e.g., implemented in the
2548                  kernel to avoid delay in scheduling a user process).
2549
2550             The following cases for Timestamp are to be handled
2551             according to the corresponding rules for ICMP Echo:
2552
2553             o    An ICMP Timestamp Request message to an IP broadcast or
2554                  IP multicast address MAY be silently discarded.
2555
2556             o    The IP source address in an ICMP Timestamp Reply MUST
2557                  be the same as the specific-destination address of the
2558                  corresponding Timestamp Request message.
2559
2560             o    If a Source Route option is received in a Timestamp
2561                  Request, the return route MUST be reversed and used as
2562                  a Source Route option for the Timestamp Reply message.
2563
2564             o    If a Record Route and/or Timestamp option is received
2565                  in a Timestamp Request, this (these) option(s) SHOULD
2566                  be updated to include the current host and included in
2567                  the IP header of the Timestamp Reply message.
2568
2569             o    Incoming Timestamp Reply messages MUST be passed up to
2570                  the ICMP user interface.
2571
2572             The preferred form for a timestamp value (the "standard
2573             value") is in units of milliseconds since midnight Universal
2574             Time.  However, it may be difficult to provide this value
2575             with millisecond resolution.  For example, many systems use
2576             clocks that update only at line frequency, 50 or 60 times
2577             per second.  Therefore, some latitude is allowed in a
2578             "standard value":
2579
2580             (a)  A "standard value" MUST be updated at least 15 times
2581                  per second (i.e., at most the six low-order bits of the
2582                  value may be undefined).
2583
2584             (b)  The accuracy of a "standard value" MUST approximate
2585                  that of operator-set CPU clocks, i.e., correct within a
2586                  few minutes.
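
A "standard value" can be computed as in the sketch below, which assumes a timezone-aware clock reading; the function names are hypothetical.  Six undefined low-order bits correspond to a granularity of 2**6 = 64 milliseconds, i.e. about 15.6 updates per second, which is where the figure of 15 in rule (a) comes from.

```python
from datetime import datetime, timezone


def standard_timestamp(now=None):
    """Milliseconds since midnight Universal Time.

    `now` must be a timezone-aware datetime; the current time is used
    when it is omitted.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    seconds = (now.hour * 60 + now.minute) * 60 + now.second
    return seconds * 1000 + now.microsecond // 1000


def meets_update_rate(updates_per_second):
    """Rule (a): the value must be updated at least 15 times per second."""
    return updates_per_second >= 15
```

A clock that ticks at line frequency (50 or 60 times per second) satisfies rule (a) under this check.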
2587
2603          3.2.2.9  Address Mask Request/Reply: RFC-950
2604
2605             A host MUST support the first, and MAY implement all three,
2606             of the following methods for determining the address mask(s)
2607             corresponding to its IP address(es):
2608
2609             (1)  static configuration information;
2610
2611             (2)  obtaining the address mask(s) dynamically as a side-
2612                  effect of the system initialization process (see
2613                  [INTRO:1]); and
2614
2615             (3)  sending ICMP Address Mask Request(s) and receiving ICMP
2616                  Address Mask Reply(s).
2617
2618             The choice of method to be used in a particular host MUST be
2619             configurable.
2620
2621             When method (3), the use of Address Mask messages, is
2622             enabled, then:
2623
2624             (a)  When it initializes, the host MUST broadcast an Address
2625                  Mask Request message on the connected network
2626                  corresponding to the IP address.  It MUST retransmit
2627                  this message a small number of times if it does not
2628                  receive an immediate Address Mask Reply.
2629
2630             (b)  Until it has received an Address Mask Reply, the host
2631                  SHOULD assume a mask appropriate for the address class
2632                  of the IP address, i.e., assume that the connected
2633                  network is not subnetted.
2634
2635             (c)  The first Address Mask Reply message received MUST be
2636                  used to set the address mask corresponding to the
2637                  particular local IP address.  This is true even if the
2638                  first Address Mask Reply message is "unsolicited", in
2639                  which case it will have been broadcast and may arrive
2640                  after the host has ceased to retransmit Address Mask
2641                  Requests.  Once the mask has been set by an Address
2642                  Mask Reply, later Address Mask Reply messages MUST be
2643                  (silently) ignored.
2644
2645             Conversely, if Address Mask messages are disabled, then no
2646             ICMP Address Mask Requests will be sent, and any ICMP
2647             Address Mask Replies received for that local IP address MUST
2648             be (silently) ignored.
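
The receiving side of these rules can be sketched as a small state holder.  The class and function names are hypothetical; broadcasting and retransmitting the Request in rule (a) are not modeled, and masks are handled as dotted-decimal strings for readability.

```python
import ipaddress


def classful_mask(address):
    """Mask appropriate for the address class (no subnetting assumed)."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "255.0.0.0"        # Class A
    if first_octet < 192:
        return "255.255.0.0"      # Class B
    if first_octet < 224:
        return "255.255.255.0"    # Class C
    raise ValueError("not a Class A, B, or C address")


class AddressMaskClient:
    """Tracks the address mask for one local IP address."""

    def __init__(self, address, use_icmp, configured_mask=None):
        self.use_icmp = use_icmp
        # Rule (b): until a Reply arrives, assume the class default.
        self.mask = configured_mask or classful_mask(address)
        self.set_by_reply = False

    def on_address_mask_reply(self, mask):
        """Return True if the Reply was used, False if it was ignored."""
        if not self.use_icmp or self.set_by_reply:
            return False          # disabled, or a later Reply: ignore
        # Rule (c): the first Reply wins, even if it was unsolicited.
        self.mask = mask
        self.set_by_reply = True
        return True
```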
2649
2650             A host SHOULD make some reasonableness check on any address
2662             mask it installs; see IMPLEMENTATION section below.
2663
2664             A system MUST NOT send an Address Mask Reply unless it is an
2665             authoritative agent for address masks.  An authoritative
2666             agent may be a host or a gateway, but it MUST be explicitly
2667             configured as an address mask agent.  Receiving an address
2668             mask via an Address Mask Reply does not give the receiver
2669             authority and MUST NOT be used as the basis for issuing
2670             Address Mask Replies.
2671
2672             With a statically configured address mask, there SHOULD be
2673             an additional configuration flag that determines whether the
2674             host is to act as an authoritative agent for this mask,
2675             i.e., whether it will answer Address Mask Request messages
2676             using this mask.
2677
2678             If it is configured as an agent, the host MUST broadcast an
2679             Address Mask Reply for the mask on the appropriate interface
2680             when it initializes.
2681
2682             See "System Initialization" in [INTRO:1] for more
2683             information about the use of Address Mask Request/Reply
2684             messages.
2685
2686             DISCUSSION:
2687                  Hosts that casually send Address Mask Replies with
2688                  invalid address masks have often been a serious
2689                  nuisance.  To prevent this, Address Mask Replies ought
2690                  to be sent only by authoritative agents that have been
2691                  selected by explicit administrative action.
2692
2693                  When an authoritative agent receives an Address Mask
2694                  Request message, it will send a unicast Address Mask
2695                  Reply to the source IP address.  If the network part of
2696                  this address is zero (see (a) and (b) in 3.2.1.3), the
2697                  Reply will be broadcast.
2698
2699                  Getting no reply to its Address Mask Request messages,
2700                  a host will assume there is no agent and use an
2701                  unsubnetted mask, but the agent may be only temporarily
2702                  unreachable.  An agent will broadcast an unsolicited
2703                  Address Mask Reply whenever it initializes, in order to
2704                  update the masks of all hosts that have initialized in
2705                  the meantime.
2706
2707             IMPLEMENTATION:
2708                  The following reasonableness check on an address mask
2709                  is suggested: the mask is not all 1 bits, and it is
2721                  either zero or else the 8 highest-order bits are on.
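
The suggested reasonableness check translates directly into a predicate on the 32-bit mask value; the function name is hypothetical.

```python
def mask_is_reasonable(mask: int) -> bool:
    """Apply the suggested check to a 32-bit address mask."""
    if mask == 0xFFFFFFFF:
        return False              # all 1 bits
    # Either zero, or the 8 highest-order bits are all on.
    return mask == 0 or (mask & 0xFF000000) == 0xFF000000
```

For example, 0xFFFFFF00 (255.255.255.0) passes, while 0x00FFFF00 does not.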
2722
2723       3.2.3  Internet Group Management Protocol IGMP
2724
2725          IGMP [IP:4] is a protocol used between hosts and gateways on a
2726          single network to establish hosts' membership in particular
2727          multicast groups.  The gateways use this information, in
2728          conjunction with a multicast routing protocol, to support IP
2729          multicasting across the Internet.
2730
2731          At this time, implementation of IGMP is OPTIONAL; see Section
2732          3.3.7 for more information.  Without IGMP, a host can still
2733          participate in multicasting local to its connected networks.
2734
2735    3.3  SPECIFIC ISSUES
2736
2737       3.3.1  Routing Outbound Datagrams
2738
2739          The IP layer chooses the correct next hop for each datagram it
2740          sends.  If the destination is on a connected network, the
2741          datagram is sent directly to the destination host; otherwise,
2742          it has to be routed to a gateway on a connected network.
2743
2744          3.3.1.1  Local/Remote Decision
2745
2746             To decide if the destination is on a connected network, the
2747             following algorithm MUST be used [see IP:3]:
2748
2749             (a)  The address mask (particular to a local IP address for
2750                  a multihomed host) is a 32-bit mask that selects the
2751                  network number and subnet number fields of the
2752                  corresponding IP address.
2753
2754             (b)  If the IP destination address bits extracted by the
2755                  address mask match the IP source address bits extracted
2756                  by the same mask, then the destination is on the
2757                  corresponding connected network, and the datagram is to
2758                  be transmitted directly to the destination host.
2759
2760             (c)  If not, then the destination is accessible only through
2761                  a gateway.  Selection of a gateway is described below
2762                  (3.3.1.2).
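
                 The comparison in steps (a) through (c) can be sketched
                 in Python as follows.  This is an illustration only: it
                 assumes dotted-decimal strings for the addresses and the
                 mask, and the function name is hypothetical.

```python
import ipaddress

def is_on_connected_network(src: str, dst: str, mask: str) -> bool:
    """Return True if dst should be sent to directly, False if via a gateway."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    m = int(ipaddress.IPv4Address(mask))
    # Step (b): compare the bits that the address mask extracts from each address.
    return (s & m) == (d & m)
```

                 A multihomed host would repeat the test with the address
                 and mask of each local interface.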
2763
2764             A special-case destination address is handled as follows:
2765
2766             *    For a limited broadcast or a multicast address, simply
2767                  pass the datagram to the link layer for the appropriate
2768                  interface.
2769
2780             *    For a (network or subnet) directed broadcast, the
2781                  datagram can use the standard routing algorithms.
2782
2783             The host IP layer MUST operate correctly in a minimal
2784             network environment, and in particular, when there are no
2785             gateways.  For example, if the IP layer of a host insists on
2786             finding at least one gateway to initialize, the host will be
2787             unable to operate on a single isolated broadcast net.
2788
2789          3.3.1.2  Gateway Selection
2790
2791             To efficiently route a series of datagrams to the same
2792             destination, the source host MUST keep a "route cache" of
2793             mappings to next-hop gateways.  A host uses the following
2794             basic algorithm on this cache to route a datagram; this
2795             algorithm is designed to put the primary routing burden on
2796             the gateways [IP:11].
2797
2798             (a)  If the route cache contains no information for a
2799                  particular destination, the host chooses a "default"
2800                  gateway and sends the datagram to it.  It also builds a
2801                  corresponding Route Cache entry.
2802
2803             (b)  If that gateway is not the best next hop to the
2804                  destination, the gateway will forward the datagram to
2805                  the best next-hop gateway and return an ICMP Redirect
2806                  message to the source host.
2807
2808             (c)  When it receives a Redirect, the host updates the
2809                  next-hop gateway in the appropriate route cache entry,
2810                  so later datagrams to the same destination will go
2811                  directly to the best gateway.
2812
2813             Since the subnet mask appropriate to the destination address
2814             is generally not known, a Network Redirect message SHOULD be
2815             treated identically to a Host Redirect message; i.e., the
2816             cache entry for the destination host (only) would be updated
2817             (or created, if an entry for that host did not exist) for
2818             the new gateway.
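
                 The host side of this algorithm might look like the
                 following Python sketch.  It assumes a cache keyed on
                 destination host address only and a single default
                 gateway; the class and method names are hypothetical.
                 Step (b) happens in the gateway and is not modelled.

```python
class RouteCache:
    """Toy route cache keyed on destination host address only."""

    def __init__(self, default_gateway: str):
        self.default_gateway = default_gateway
        self.entries = {}            # destination host -> next-hop gateway

    def next_hop(self, dst: str) -> str:
        # Step (a): no entry yet, so use the default gateway and
        # build a corresponding cache entry.
        if dst not in self.entries:
            self.entries[dst] = self.default_gateway
        return self.entries[dst]

    def handle_redirect(self, dst: str, new_gateway: str) -> None:
        # Step (c): a Network Redirect is treated like a Host Redirect,
        # so only the entry for this destination host is updated or created.
        self.entries[dst] = new_gateway
```

                 After handle_redirect() is called for a destination,
                 later next_hop() calls for that destination return the
                 new gateway; other destinations are unaffected.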
2819
2820             DISCUSSION:
2821                  This recommendation is to protect against gateways that
2822                  erroneously send Network Redirects for a subnetted
2823                  network, in violation of the gateway requirements
2824                  [INTRO:2].
2825
2826             When there is no route cache entry for the destination host
2827             address (and the destination is not on the connected
2839             network), the IP layer MUST pick a gateway from its list of
2840             "default" gateways.  The IP layer MUST support multiple
2841             default gateways.
2842
2843             As an extra feature, a host IP layer MAY implement a table
2844             of "static routes".  Each such static route MAY include a
2845             flag specifying whether it may be overridden by ICMP
2846             Redirects.
2847
2848             DISCUSSION:
2849                  A host generally needs to know at least one default
2850                  gateway to get started.  This information can be
2851                  obtained from a configuration file or else from the
2852                  host startup sequence, e.g., the BOOTP protocol (see
2853                  [INTRO:1]).
2854
2855                  It has been suggested that a host can augment its list
2856                  of default gateways by recording any new gateways it
2857                  learns about.  For example, it can record every gateway
2858                  to which it is ever redirected.  Such a feature, while
2859                  possibly useful in some circumstances, may cause
2860                  problems in other cases (e.g., gateways are not all
2861                  equal), and it is not recommended.
2862
2863                  A static route is typically a particular preset mapping
2864                  from destination host or network into a particular
2865                  next-hop gateway; it might also depend on the Type-of-
2866                  Service (see next section).  Static routes would be set
2867                  up by system administrators to override the normal
2868                  automatic routing mechanism, to handle exceptional
2869                  situations.  However, any static routing information is
2870                  a potential source of failure as configurations change
2871                  or equipment fails.
2872
2873          3.3.1.3  Route Cache
2874
2875             Each route cache entry needs to include the following
2876             fields:
2877
2878             (1)  Local IP address (for a multihomed host)
2879
2880             (2)  Destination IP address
2881
2882             (3)  Type(s)-of-Service
2883
2884             (4)  Next-hop gateway IP address
2885
2886             Field (2) MAY be the full IP address of the destination
2898             host, or only the destination network number.  Field (3),
2899             the TOS, SHOULD be included.
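
                 One possible in-memory layout for such an entry is
                 sketched below in Python.  The field names and types are
                 assumptions made for illustration; only the four fields
                 themselves come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RouteCacheEntry:
    local_addr: str        # (1) local IP address, for a multihomed host
    destination: str       # (2) destination host address or network number
    tos: Optional[int]     # (3) Type-of-Service; None if not tracked
    next_hop: str          # (4) next-hop gateway IP address

entry = RouteCacheEntry("192.0.2.10", "203.0.113.5", 0, "192.0.2.1")
```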
2900
2901             See Section 3.3.4.2 for a discussion of the implications of
2902             multihoming for the lookup procedure in this cache.
2903
2904             DISCUSSION:
2905                  Including the Type-of-Service field in the route cache
2906                  and considering it in the host route algorithm will
2907                  provide the necessary mechanism for the future when
2908                  Type-of-Service routing is commonly used in the
2909                  Internet.  See Section 3.2.1.6.
2910
2911                  Each route cache entry defines the endpoints of an
2912                  Internet path.  Although the connecting path may change
2913                  dynamically in an arbitrary way, the transmission
2914                  characteristics of the path tend to remain
2915                  approximately constant over a time period longer than a
2916                  single typical host-host transport connection.
2917                  Therefore, a route cache entry is a natural place to
2918                  cache data on the properties of the path.  Examples of
2919                  such properties might be the maximum unfragmented
2920                  datagram size (see Section 3.3.3), or the average
2921                  round-trip delay measured by a transport protocol.
2922                  This data will generally be both gathered and used by a
2923                  higher layer protocol, e.g., by TCP, or by an
2924                  application using UDP.  Experiments are currently in
2925                  progress on caching path properties in this manner.
2926
2927                  There is no consensus on whether the route cache should
2928                  be keyed on destination host addresses alone, or allow
2929                  both host and network addresses.  Those who favor the
2930                  use of only host addresses argue that:
2931
2932                  (1)  As required in Section 3.3.1.2, Redirect messages
2933                       will generally result in entries keyed on
2934                       destination host addresses; the simplest and most
2935                       general scheme would be to use host addresses
2936                       always.
2937
2938                  (2)  The IP layer may not always know the address mask
2939                       for a network address in a complex subnetted
2940                       environment.
2941
2942                  (3)  The use of only host addresses allows the
2943                       destination address to be used as a pure 32-bit
2944                       number, which may allow the Internet architecture
2945                       to be more easily extended in the future without
2957                       any change to the hosts.
2958
2959                  The opposing view is that allowing a mixture of
2960                  destination hosts and networks in the route cache:
2961
2962                  (1)  Saves memory space.
2963
2964                  (2)  Leads to a simpler data structure, easily
2965                       combining the cache with the tables of default and
2966                       static routes (see below).
2967
2968                  (3)  Provides a more useful place to cache path
2969                       properties, as discussed earlier.
2970
2971
2972             IMPLEMENTATION:
2973                  The cache needs to be large enough to include entries
2974                  for the maximum number of destination hosts that may be
2975                  in use at one time.
2976
2977                  A route cache entry may also include control
2978                  information used to choose an entry for replacement.
2979                  This might take the form of a "recently used" bit, a
2980                  use count, or a last-used timestamp, for example.  It
2981                  is recommended that it include the time of last
2982                  modification of the entry, for diagnostic purposes.
2983
2984                  An implementation may wish to reduce the overhead of
2985                  scanning the route cache for every datagram to be
2986                  transmitted.  This may be accomplished with a hash
2987                  table to speed the lookup, or by giving a connection-
2988                  oriented transport protocol a "hint" or temporary
2989                  handle on the appropriate cache entry, to be passed to
2990                  the IP layer with each subsequent datagram.
2991
2992                  Although we have described the route cache, the lists
2993                  of default gateways, and a table of static routes as
2994                  conceptually distinct, in practice they may be combined
2995                  into a single "routing table" data structure.
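
                      The hash-table lookup and the replacement control
                      information can be sketched together in Python.
                      This is one arrangement among many; the class name,
                      the key shape, and the injectable clock are all
                      assumptions made for illustration.

```python
import time

class BoundedRouteCache:
    """Hash-keyed cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int, clock=time.monotonic):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.clock = clock
        self.entries = {}     # (local, destination, tos) -> entry dict

    def lookup(self, key):
        entry = self.entries.get(key)         # dict lookup, no linear scan
        if entry is None:
            return None
        entry["last_used"] = self.clock()     # replacement control information
        return entry["next_hop"]

    def store(self, key, next_hop) -> None:
        now = self.clock()
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Evict the entry with the oldest last-used timestamp.
            victim = min(self.entries, key=lambda k: self.entries[k]["last_used"])
            del self.entries[victim]
        # last_modified is kept for diagnostic purposes, as recommended above.
        self.entries[key] = {"next_hop": next_hop,
                             "last_used": now,
                             "last_modified": now}
```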
2996
2997          3.3.1.4  Dead Gateway Detection
2998
2999             The IP layer MUST be able to detect the failure of a "next-
3000             hop" gateway that is listed in its route cache and to choose
3001             an alternate gateway (see Section 3.3.1.5).
3002
3003             Dead gateway detection is covered in some detail in RFC-816
3004             [IP:11].  Experience to date has not produced a complete
3016             algorithm which is totally satisfactory, though it has
3017             identified several forbidden paths and promising techniques.
3018
3019             *    A particular gateway SHOULD NOT be used indefinitely in
3020                  the absence of positive indications that it is
3021                  functioning.
3022
3023             *    Active probes such as "pinging" (i.e., using an ICMP
3024                  Echo Request/Reply exchange) are expensive and scale
3025                  poorly.  In particular, hosts MUST NOT actively check
3026                  the status of a first-hop gateway by simply pinging the
3027                  gateway continuously.
3028
3029             *    Even when it is the only effective way to verify a
3030                  gateway's status, pinging MUST be used only when
3031                  traffic is being sent to the gateway and when there is
3032                  no other positive indication to suggest that the
3033                  gateway is functioning.
3034
3035             *    To avoid pinging, the layers above and/or below the
3036                  Internet layer SHOULD be able to give "advice" on the
3037                  status of route cache entries when either positive
3038                  (gateway OK) or negative (gateway dead) information is
3039                  available.
3040
3041
3042             DISCUSSION:
3043                  If an implementation does not include an adequate
3044                  mechanism for detecting a dead gateway and re-routing,
3045                  a gateway failure may cause datagrams to apparently
3046                  vanish into a "black hole".  This failure can be
3047                  extremely confusing for users and difficult for network
3048                  personnel to debug.
3049
3050                  The dead-gateway detection mechanism must not cause
3051                  unacceptable load on the host, on connected networks,
3052                  or on first-hop gateway(s).  The exact constraints on
3053                  the timeliness of dead gateway detection and on
3054                  acceptable load may vary somewhat depending on the
3055                  nature of the host's mission, but a host generally
3056                  needs to detect a failed first-hop gateway quickly
3057                  enough that transport-layer connections will not break
3058                  before an alternate gateway can be selected.
3059
3060                  Passing advice from other layers of the protocol stack
3061                  complicates the interfaces between the layers, but it
3062                  is the preferred approach to dead gateway detection.
3063                  Advice can come from almost any part of the IP/TCP
3075                  architecture, but it is expected to come primarily from
3076                  the transport and link layers.  Here are some possible
3077                  sources for gateway advice:
3078
3079                  o    TCP or any connection-oriented transport protocol
3080                       should be able to give negative advice, e.g.,
3081                       triggered by excessive retransmissions.
3082
3083                  o    TCP may give positive advice when (new) data is
3084                       acknowledged.  Even though the route may be
3085                       asymmetric, an ACK for new data proves that the
3086                       acknowledged data must have been transmitted
3087                       successfully.
3088
3089                  o    An ICMP Redirect message from a particular gateway
3090                       should be used as positive advice about that
3091                       gateway.
3092
3093                  o    Link-layer information that reliably detects and
3094                       reports host failures (e.g., ARPANET Destination
3095                       Dead messages) should be used as negative advice.
3096
3097                  o    Failure to ARP or to re-validate ARP mappings may
3098                       be used as negative advice for the corresponding
3099                       IP address.
3100
3101                  o    Packets arriving from a particular link-layer
3102                       address are evidence that the system at this
3103                       address is alive.  However, turning this
3104                       information into advice about gateways requires
3105                       mapping the link-layer address into an IP address,
3106                       and then checking that IP address against the
3107                       gateways pointed to by the route cache.  This is
3108                       probably prohibitively inefficient.
3109
3110                  Note that positive advice that is given for every
3111                  datagram received may cause unacceptable overhead in
3112                  the implementation.
3113
3114                  While advice might be passed using required arguments
3115                  in all interfaces to the IP layer, some transport and
3116                  application layer protocols cannot deduce the correct
3117                  advice.  These interfaces must therefore allow a
3118                  neutral value for advice, since either always-positive
3119                  or always-negative advice leads to incorrect behavior.
3120
3121                  There is another technique for dead gateway detection
3122                  that has been commonly used but is not recommended.
3134                  This technique depends upon the host passively
3135                  receiving ("wiretapping") the Interior Gateway Protocol
3136                  (IGP) datagrams that the gateways are broadcasting to
3137                  each other.  This approach has the drawback that a host
3138                  needs to recognize all the interior gateway protocols
3139                  that gateways may use (see [INTRO:2]).  In addition, it
3140                  only works on a broadcast network.
3141
3142                  At present, pinging (i.e., using ICMP Echo messages) is
3143                  the mechanism for gateway probing when absolutely
3144                  required.  A successful ping guarantees that the
3145                  addressed interface and its associated machine are up,
3146                  but it does not guarantee that the machine is a gateway
3147                  as opposed to a host.  The normal inference is that if
3148                  a Redirect or other evidence indicates that a machine
3149                  was a gateway, successful pings will indicate that the
3150                  machine is still up and hence still a gateway.
3151                  However, since a host silently discards packets that a
3152                  gateway would forward or redirect, this assumption
3153                  could sometimes fail.  To avoid this problem, a new
3154                  ICMP message under development will ask "are you a
3155                  gateway?"
3156
3157             IMPLEMENTATION:
3158                  The following specific algorithm has been suggested:
3159
3160                  o    Associate a "reroute timer" with each gateway
3161                       pointed to by the route cache.  Initialize the
3162                       timer to a value Tr, which must be small enough to
3163                       allow detection of a dead gateway before transport
3164                       connections time out.
3165
3166                  o    Positive advice would reset the reroute timer to
3167                       Tr.  Negative advice would reduce or zero the
3168                       reroute timer.
3169
3170                  o    Whenever the IP layer used a particular gateway to
3171                       route a datagram, it would check the corresponding
3172                       reroute timer.  If the timer had expired (reached
3173                       zero), the IP layer would send a ping to the
3174                       gateway, followed immediately by the datagram.
3175
3176                  o    The ping (ICMP Echo) would be sent again if
3177                       necessary, up to N times.  If no ping reply was
3178                       received in N tries, the gateway would be assumed
3179                       to have failed, and a new first-hop gateway would
3180                       be chosen for all cache entries pointing to the
3181                       failed gateway.
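
                      The per-gateway state in the steps above can be
                      sketched in Python.  Several simplifications are
                      assumed here and are not part of the suggested
                      algorithm: time is advanced by explicit tick()
                      calls, negative advice always zeroes the timer,
                      and a repeat ping is sent only when another
                      datagram is routed through the gateway.  All names
                      are hypothetical.

```python
class GatewayMonitor:
    """Reroute timer for one gateway pointed to by the route cache."""

    def __init__(self, tr: float, max_pings: int):
        self.tr = tr                  # initial timer value Tr, in seconds
        self.max_pings = max_pings    # N, the ping retry limit
        self.timer = tr               # time remaining before a probe is due
        self.pings_sent = 0           # pings sent since the last positive advice
        self.failed = False

    def tick(self, elapsed: float) -> None:
        """Count the timer down as time passes."""
        self.timer = max(0.0, self.timer - elapsed)

    def positive_advice(self) -> None:
        """Evidence that the gateway works: reset the timer to Tr."""
        self.timer = self.tr
        self.pings_sent = 0

    def negative_advice(self) -> None:
        """Evidence of trouble: this sketch zeroes the timer outright."""
        self.timer = 0.0

    def ping_reply(self) -> None:
        """An ICMP Echo Reply is treated as positive advice."""
        self.positive_advice()

    def should_ping(self) -> bool:
        """Call when routing a datagram via this gateway.

        Returns True if a ping should be sent ahead of the datagram.
        """
        if self.failed or self.timer > 0:
            return False
        if self.pings_sent >= self.max_pings:
            self.failed = True        # N tries unanswered: caller must re-route
            return False
        self.pings_sent += 1
        return True
```

                      Once the failed flag is set, the caller would choose
                      a new first-hop gateway for every cache entry that
                      points to this gateway.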
3182
3193                  Note that the size of Tr is inversely related to the
3194                  amount of advice available.  Tr should be large enough
3195                  to ensure that:
3196
3197                  *    Any pinging will be at a low level (e.g., <10%) of
3198                       all packets sent to a gateway from the host, AND
3199
3200                  *    Pinging is infrequent (e.g., every 3 minutes).
3201
3202                  Since the recommended algorithm is concerned with the
3203                  gateways pointed to by route cache entries, rather than
3204                  the cache entries themselves, a two level data
3205                  structure (perhaps coordinated with ARP or similar
3206                  caches) may be desirable for implementing a route
3207                  cache.
3208
3209          3.3.1.5  New Gateway Selection
3210
3211             If the failed gateway is not the current default, the IP
3212             layer can immediately switch to a default gateway.  If it is
3213             the current default that failed, the IP layer MUST select a
3214             different default gateway (assuming more than one default is
3215             known) for the failed route and for establishing new routes.
3216
3217             DISCUSSION:
3218                  When a gateway does fail, the other gateways on the
3219                  connected network will learn of the failure through
3220                  some inter-gateway routing protocol.  However, this
3221                  will not happen instantaneously, since gateway routing
3222                  protocols typically have a settling time of 30-60
3223                  seconds.  If the host switches to an alternative
3224                  gateway before the gateways have agreed on the failure,
3225                  the new target gateway will probably forward the
3226                  datagram to the failed gateway and send a Redirect back
3227                  to the host pointing to the failed gateway (!).  The
3228                  result is likely to be a rapid oscillation in the
3229                  contents of the host's route cache during the gateway
3230                  settling period.  It has been proposed that the dead-
3231                  gateway logic should include some hysteresis mechanism
3232                  to prevent such oscillations.  However, experience has
3233                  not shown any harm from such oscillations, since
3234                  service cannot be restored to the host until the
3235                  gateways' routing information does settle down.
3236
3237             IMPLEMENTATION:
3238                  One implementation technique for choosing a new default
3239                  gateway is to simply round-robin among the default
3240                  gateways in the host's list.  Another is to rank the
3252                  gateways in priority order, and when the current
3253                  default gateway is not the highest priority one, to
3254                  "ping" the higher-priority gateways slowly to detect
3255                  when they return to service.  This pinging can be at a
3256                  very low rate, e.g., 0.005 per second (once per 200 s).
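
                      The round-robin technique can be sketched in Python
                      as follows.  The function name and the use of a
                      plain list for the host's default gateways are
                      assumptions made for illustration.

```python
def next_default_gateway(gateways, current):
    """Return the default gateway to try after `current` has failed.

    Returns None when no alternative default gateway is known.
    """
    if current not in gateways:
        return gateways[0] if gateways else None
    if len(gateways) < 2:
        return None                      # the failed gateway was the only one
    i = gateways.index(current)
    return gateways[(i + 1) % len(gateways)]   # wrap around at the end
```

                      The priority-order technique would instead keep the
                      list sorted by preference and probe the entries
                      ahead of the current default.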
3257
3258          3.3.1.6  Initialization
3259
3260             The following information MUST be configurable:
3261
3262             (1)  IP address(es).
3263
3264             (2)  Address mask(s).
3265
3266             (3)  A list of default gateways, with a preference level.
3267
3268             A manual method of entering this configuration data MUST be
3269             provided.  In addition, a variety of methods can be used to
3270             determine this information dynamically; see the section on
3271             "Host Initialization" in [INTRO:1].
3272
3273             DISCUSSION:
3274                  Some host implementations use "wiretapping" of gateway
3275                  protocols on a broadcast network to learn what gateways
3276                  exist.  A standard method for default gateway discovery
3277                  is under development.
3278
3279       3.3.2  Reassembly
3280
3281          The IP layer MUST implement reassembly of IP datagrams.
3282
3283          We designate the largest datagram size that can be reassembled
3284          by EMTU_R ("Effective MTU to receive"); this is sometimes
3285          called the "reassembly buffer size".  EMTU_R MUST be greater
3286          than or equal to 576, SHOULD be either configurable or
3287          indefinite, and SHOULD be greater than or equal to the MTU of
3288          the connected network(s).
3289
3290          DISCUSSION:
3291               A fixed EMTU_R limit should not be built into the code
3292               because some application layer protocols require EMTU_R
3293               values larger than 576.
3294
3295          IMPLEMENTATION:
3296               An implementation may use a contiguous reassembly buffer
3297               for each datagram, or it may use a more complex data
3298               structure that places no definite limit on the reassembled
3299               datagram size; in the latter case, EMTU_R is said to be
3311               "indefinite".
3312
3313               Logically, reassembly is performed by simply copying each
3314               fragment into the packet buffer at the proper offset.
3315               Note that fragments may overlap if successive
3316               retransmissions use different packetizing but the same
3317               reassembly Id.
3318
3319               The tricky part of reassembly is the bookkeeping to
3320               determine when all bytes of the datagram have been
3321               reassembled.  We recommend Clark's algorithm [IP:10] that
3322               requires no additional data space for the bookkeeping.
3323               However, note that, contrary to [IP:10], the first
3324               fragment header needs to be saved for inclusion in a
3325               possible ICMP Time Exceeded (Reassembly Timeout) message.
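
                   The copy-at-offset step and the completeness
                   bookkeeping can be sketched in Python as below.  Note
                   that this is not the algorithm of [IP:10]: for
                   clarity it keeps a separate list of received byte
                   ranges, and so does spend extra space on bookkeeping.
                   Offsets are assumed to be in octets, and all names
                   are hypothetical.

```python
class Reassembler:
    """Reassemble one datagram from fragments arriving in any order."""

    def __init__(self):
        self.data = bytearray()
        self.received = []       # sorted, merged [start, end) byte ranges
        self.total_len = None    # known once the last fragment (MF = 0) arrives

    def add_fragment(self, offset: int, payload: bytes, more_fragments: bool) -> None:
        end = offset + len(payload)
        if len(self.data) < end:
            self.data.extend(bytes(end - len(self.data)))   # grow the buffer
        self.data[offset:end] = payload    # overlapping bytes are overwritten
        if not more_fragments:
            self.total_len = end
        # Merge the new range into the list of ranges received so far.
        merged = []
        for s, e in sorted(self.received + [[offset, end]]):
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        self.received = merged

    def complete(self) -> bool:
        """True when every byte from 0 to the total length has arrived."""
        return (self.total_len is not None
                and self.received == [[0, self.total_len]])

    def datagram(self) -> bytes:
        return bytes(self.data[:self.total_len])
```

                   A real implementation would also save the first
                   fragment header and run a reassembly timer for each
                   datagram being reassembled.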
3326
3327          There MUST be a mechanism by which the transport layer can
3328          learn MMS_R, the maximum message size that can be received and
3329          reassembled in an IP datagram (see GET_MAXSIZES calls in
3330          Section 3.4).  If EMTU_R is not indefinite, then the value of
3331          MMS_R is given by:
3332
3333             MMS_R = EMTU_R - 20
3334
3335          since 20 is the minimum size of an IP header.
3336
3337          There MUST be a reassembly timeout.  The reassembly timeout
3338          value SHOULD be a fixed value, not set from the remaining TTL.
3339          It is recommended that the value lie between 60 seconds and 120
3340          seconds.  If this timeout expires, the partially-reassembled
3341          datagram MUST be discarded and an ICMP Time Exceeded message
3342          sent to the source host (if fragment zero has been received).
3343
3344          DISCUSSION:
3345               The IP specification says that the reassembly timeout
3346               should be the remaining TTL from the IP header, but this
3347               does not work well because gateways generally treat TTL as
3348               a simple hop count rather than an elapsed time.  If the
3349               reassembly timeout is too small, datagrams will be
3350               discarded unnecessarily, and communication may fail.  The
3351               timeout needs to be at least as large as the typical
3352               maximum delay across the Internet.  A realistic minimum
3353               reassembly timeout would be 60 seconds.
3354
3355               It has been suggested that a cache might be kept of
3356               round-trip times measured by transport protocols for
3357               various destinations, and that these values might be used
3358               to dynamically determine a reasonable reassembly timeout
3370               value.  Further investigation of this approach is
3371               required.
3372
3373               If the reassembly timeout is set too high, buffer
3374               resources in the receiving host will be tied up too long,
3375               and the MSL (Maximum Segment Lifetime) [TCP:1] will be
3376               larger than necessary.  The MSL controls the maximum rate
3377               at which fragmented datagrams can be sent using distinct
3378               values of the 16-bit Ident field; a larger MSL lowers the
3379               maximum rate.  The TCP specification [TCP:1] arbitrarily
3380               assumes a value of 2 minutes for MSL.  This sets an upper
3381               limit on a reasonable reassembly timeout value.
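
              The relation between MSL and the sending rate can be made
              concrete with a short calculation.  This is only a sketch: it
              assumes that a sender must not reuse an Ident value toward the
              same destination and protocol within one MSL, and the function
              name is hypothetical.

```python
IDENT_VALUES = 2 ** 16  # distinct values of the 16-bit Ident field


def max_datagram_rate(msl_seconds):
    """Upper bound on datagrams per second that can use distinct Ident values
    within one MSL."""
    return IDENT_VALUES / msl_seconds


for msl in (60, 120):
    print(f"MSL = {msl:3d} s -> at most {max_datagram_rate(msl):.0f} datagrams/s")
```

              With the 2-minute MSL assumed by [TCP:1] the bound is about 546
              datagrams per second; halving the MSL doubles it.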
3382
3383       3.3.3  Fragmentation
3384
3385          Optionally, the IP layer MAY implement a mechanism to fragment
3386          outgoing datagrams intentionally.
3387
3388          We designate by EMTU_S ("Effective MTU for sending") the
3389          maximum IP datagram size that may be sent, for a particular
3390          combination of IP source and destination addresses and perhaps
3391          TOS.
3392
3393          A host MUST implement a mechanism to allow the transport layer
3394          to learn MMS_S, the maximum transport-layer message size that
3395          may be sent for a given {source, destination, TOS} triplet (see
3396          GET_MAXSIZES call in Section 3.4).  If no local fragmentation
3397          is performed, the value of MMS_S will be:
3398
3399             MMS_S = EMTU_S - <IP header size>
3400
3401          and EMTU_S must be less than or equal to the MTU of the network
3402          interface corresponding to the source address of the datagram.
3403          Note that <IP header size> in this equation will be 20, unless
3404          the IP reserves space to insert IP options for its own purposes
3405          in addition to any options inserted by the transport layer.
3406
3407          A host that does not implement local fragmentation MUST ensure
3408          that the transport layer (for TCP) or the application layer
3409          (for UDP) obtains MMS_S from the IP layer and does not send a
3410          datagram exceeding MMS_S in size.
3411
3412          It is generally desirable to avoid local fragmentation and to
3413          choose EMTU_S low enough to avoid fragmentation in any gateway
3414          along the path.  In the absence of actual knowledge of the
3415          minimum MTU along the path, the IP layer SHOULD use
3416          EMTU_S <= 576 whenever the destination address is not on a
3417          connected network, and otherwise use the connected network's
3429          MTU.
3430
3431          The MTU of each physical interface MUST be configurable.
3432
3433          A host IP layer implementation MAY have a configuration flag
3434          "All-Subnets-MTU", indicating that the MTU of the connected
3435          network is to be used for destinations on different subnets
3436          within the same network, but not for other networks.  Thus,
3437          this flag causes the network class mask, rather than the subnet
3438          address mask, to be used to choose an EMTU_S.  For a multihomed
3439          host, an "All-Subnets-MTU" flag is needed for each network
3440          interface.
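
              The selection rules above might be sketched as follows for a
              single interface.  The function names, the use of prefix
              lengths in place of address masks, and the fixed 20-byte header
              are assumptions made for illustration; the sketch applies only
              when the path MTU is unknown and handles class A, B and C
              addresses only.

```python
import ipaddress

IP_HEADER_SIZE = 20        # bytes, assuming no IP options are reserved or inserted
DEFAULT_OFFNET_EMTU = 576  # used when the destination is not on a connected network


def classful_network(addr):
    """Return the class A/B/C network that contains addr (the network class mask)."""
    first_octet = int(addr) >> 24
    if first_octet < 128:
        prefix = 8
    elif first_octet < 192:
        prefix = 16
    else:
        prefix = 24
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def emtu_s(dst, iface_addr, subnet_prefix, iface_mtu, all_subnets_mtu=False):
    """Pick EMTU_S for dst when the minimum MTU along the path is unknown."""
    dst = ipaddress.ip_address(dst)
    local = ipaddress.ip_address(iface_addr)
    if all_subnets_mtu:
        connected = classful_network(local)  # class mask instead of subnet mask
    else:
        connected = ipaddress.ip_network(f"{local}/{subnet_prefix}", strict=False)
    if dst in connected:
        return iface_mtu
    return min(DEFAULT_OFFNET_EMTU, iface_mtu)  # EMTU_S never exceeds the interface MTU


def mms_s(emtu, ip_header_size=IP_HEADER_SIZE):
    """MMS_S = EMTU_S - <IP header size> when no local fragmentation is performed."""
    return emtu - ip_header_size
```

              For example, a host at 128.9.1.1 on a subnet with a 24-bit mask
              and an interface MTU of 1500 would use 576 for 128.9.2.5 by
              default, but 1500 if the "All-Subnets-MTU" flag is set.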
3441
3442          DISCUSSION:
3443               Picking the correct datagram size to use when sending data
3444               is a complex topic [IP:9].
3445
3446               (a)  In general, no host is required to accept an IP
3447                    datagram larger than 576 bytes (including header and
3448                    data), so a host must not send a larger datagram
3449                    without explicit knowledge or prior arrangement with
3450                    the destination host.  Thus, MMS_S is only an upper
3451                    bound on the datagram size that a transport protocol
3452                    may send; even when MMS_S exceeds 556, the transport
3453                    layer must limit its messages to 556 bytes in the
3454                    absence of other knowledge about the destination
3455                    host.
3456
3457               (b)  Some transport protocols (e.g., TCP) provide a way to
3458                    explicitly inform the sender about the largest
3459                    datagram the other end can receive and reassemble
3460                    [IP:7].  There is no corresponding mechanism in the
3461                    IP layer.
3462
3463                    A transport protocol that assumes an EMTU_R larger
3464                    than 576 (see Section 3.3.2) can send a datagram of
3465                    this larger size to another host that implements the
3466                    same protocol.
3467
3468               (c)  Hosts should ideally limit their EMTU_S for a given
3469                    destination to the minimum MTU of all the networks
3470                    along the path, to avoid any fragmentation.  IP
3471                    fragmentation, while formally correct, can create a
3472                    serious transport protocol performance problem,
3473                    because loss of a single fragment means all the
3474                    fragments in the segment must be retransmitted
3475                    [IP:9].
3476
3488               Since nearly all networks in the Internet currently
3489               support an MTU of 576 or greater, we strongly recommend
3490               the use of 576 for datagrams sent to non-local networks.
3491
3492               It has been suggested that a host could determine the MTU
3493               over a given path by sending a zero-offset datagram
3494               fragment and waiting for the receiver to time out the
3495               reassembly (which cannot complete!) and return an ICMP
3496               Time Exceeded message.  This message would include the
3497               largest remaining fragment header in its body.  More
3498               direct mechanisms are being experimented with, but have
3499               not yet been adopted (see e.g., RFC-1063).
3500
3501       3.3.4  Local Multihoming
3502
3503          3.3.4.1  Introduction
3504
3505             A multihomed host has multiple IP addresses, which we may
3506             think of as "logical interfaces".  These logical interfaces
3507             may be associated with one or more physical interfaces, and
3508             these physical interfaces may be connected to the same or
3509             different networks.
3510
3511             Here are some important cases of multihoming:
3512
3513             (a)  Multiple Logical Networks
3514
3515                  The Internet architects envisioned that each physical
3516                  network would have a single unique IP network (or
3517                  subnet) number.  However, LAN administrators have
3518                  sometimes found it useful to violate this assumption,
3519                  operating a LAN with multiple logical networks per
3520                  physical connected network.
3521
3522                  If a host connected to such a physical network is
3523                  configured to handle traffic for each of N different
3524                  logical networks, then the host will have N logical
3525                  interfaces.  These could share a single physical
3526                  interface, or might use N physical interfaces to the
3527                  same network.
3528
3529             (b)  Multiple Logical Hosts
3530
3531                  When a host has multiple IP addresses that all have the
3532                  same <Network-number> part (and the same <Subnet-
3533                  number> part, if any), the logical interfaces are known
3534                  as "logical hosts".  These logical interfaces might
3535                  share a single physical interface or might use separate
3547                  physical interfaces to the same physical network.
3548
3549             (c)  Simple Multihoming
3550
3551                  In this case, each logical interface is mapped into a
3552                  separate physical interface and each physical interface
3553                  is connected to a different physical network.  The term
3554                  "multihoming" was originally applied only to this case,
3555                  but it is now applied more generally.
3556
3557                  A host with embedded gateway functionality will
3558                  typically fall into the simple multihoming case.  Note,
3559                  however, that a host may be simply multihomed without
3560                  containing an embedded gateway, i.e., without
3561                  forwarding datagrams from one connected network to
3562                  another.
3563
3564                  This case presents the most difficult routing problems.
3565                  The choice of interface (i.e., the choice of first-hop
3566                  network) may significantly affect performance or even
3567                  reachability of remote parts of the Internet.
3568
3569
3570             Finally, we note another possibility that is NOT
3571             multihoming:  one logical interface may be bound to multiple
3572             physical interfaces, in order to increase the reliability or
3573             throughput between directly connected machines by providing
3574             alternative physical paths between them.  For instance, two
3575             systems might be connected by multiple point-to-point links.
3576             We call this "link-layer multiplexing".  With link-layer
3577             multiplexing, the protocols above the link layer are unaware
3578             that multiple physical interfaces are present; the link-
3579             layer device driver is responsible for multiplexing and
3580             routing packets across the physical interfaces.
3581
3582             In the Internet protocol architecture, a transport protocol
3583             instance ("entity") has no address of its own, but instead
3584             uses a single Internet Protocol (IP) address.  This has
3585             implications for the IP, transport, and application layers,
3586             and for the interfaces between them.  In particular, the
3587             application software may have to be aware of the multiple IP
3588             addresses of a multihomed host; in other cases, the choice
3589             can be made within the network software.
3590
3591          3.3.4.2  Multihoming Requirements
3592
3593             The following general rules apply to the selection of an IP
3594             source address for sending a datagram from a multihomed
3606             host.
3607
3608             (1)  If the datagram is sent in response to a received
3609                  datagram, the source address for the response SHOULD be
3610                  the specific-destination address of the request.  See
3611                  Sections 4.1.3.5 and 4.2.3.7 and the "General Issues"
3612                  section of [INTRO:1] for more specific requirements on
3613                  higher layers.
3614
3615                  Otherwise, a source address must be selected.
3616
3617             (2)  An application MUST be able to explicitly specify the
3618                  source address for initiating a connection or a
3619                  request.
3620
3621             (3)  In the absence of such a specification, the networking
3622                  software MUST choose a source address.  Rules for this
3623                  choice are described below.
3624
3625
3626             There are two key requirement issues related to multihoming:
3627
3628             (A)  A host MAY silently discard an incoming datagram whose
3629                  destination address does not correspond to the physical
3630                  interface through which it is received.
3631
3632             (B)  A host MAY restrict itself to sending (non-source-
3633                  routed) IP datagrams only through the physical
3634                  interface that corresponds to the IP source address of
3635                  the datagrams.
3636
3637
3638             DISCUSSION:
3639                  Internet host implementors have used two different
3640                  conceptual models for multihoming, briefly summarized
3641                  in the following discussion.  This document takes no
3642                  stand on which model is preferred; each seems to have a
3643                  place.  This ambivalence is reflected in the issues (A)
3644                  and (B) being optional.
3645
3646                  o    Strong ES Model
3647
3648                       The Strong ES (End System, i.e., host) model
3649                       emphasizes the host/gateway (ES/IS) distinction,
3650                       and would therefore substitute MUST for MAY in
3651                       issues (A) and (B) above.  It tends to model a
3652                       multihomed host as a set of logical hosts within
3653                       the same physical host.
3654
3665                       With respect to (A), proponents of the Strong ES
3666                       model note that automatic Internet routing
3667                       mechanisms could not route a datagram to a
3668                       physical interface that did not correspond to the
3669                       destination address.
3670
3671                       Under the Strong ES model, the route computation
3672                       for an outgoing datagram is the mapping:
3673
3674                          route(src IP addr, dest IP addr, TOS)
3675                                                         -> gateway
3676
3677                       Here the source address is included as a parameter
3678                       in order to select a gateway that is directly
3679                       reachable on the corresponding physical interface.
3680                       Note that this model logically requires that in
3681                       general there be at least one default gateway, and
3682                       preferably multiple defaults, for each IP source
3683                       address.
3684
3685                  o    Weak ES Model
3686
3687                       This view de-emphasizes the ES/IS distinction, and
3688                       would therefore substitute MUST NOT for MAY in
3689                       issues (A) and (B).  This model may be the more
3690                       natural one for hosts that wiretap gateway routing
3691                       protocols, and is necessary for hosts that have
3692                       embedded gateway functionality.
3693
3694                       The Weak ES Model may cause the Redirect mechanism
3695                       to fail.  If a datagram is sent out a physical
3696                       interface that does not correspond to the
3697                       destination address, the first-hop gateway will
3698                       not realize when it needs to send a Redirect.  On
3699                       the other hand, if the host has embedded gateway
3700                       functionality, then it has routing information
3701                       without listening to Redirects.
3702
3703                       In the Weak ES model, the route computation for an
3704                       outgoing datagram is the mapping:
3705
3706                          route(dest IP addr, TOS) -> gateway, interface
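
                 The difference between the two models on issues (A) and (B)
                 can be sketched as a pair of checks.  The function names
                 and the dictionary that maps each physical interface to its
                 IP addresses are hypothetical, and broadcast and multicast
                 destinations are deliberately left out.

```python
def accept_incoming(dst_addr, arrival_iface, iface_addrs, strong_es):
    """Issue (A): may a datagram that arrived on arrival_iface be accepted?

    iface_addrs maps a physical interface name to the set of IP addresses
    (logical interfaces) bound to it.
    """
    all_local = set().union(*iface_addrs.values())
    if dst_addr not in all_local:
        return False  # not addressed to this host at all
    if strong_es:
        # Strong ES: the destination must belong to the arrival interface.
        return dst_addr in iface_addrs[arrival_iface]
    return True  # Weak ES: any local address is accepted on any interface


def outgoing_ifaces(src_addr, iface_addrs, strong_es):
    """Issue (B): interfaces through which a non-source-routed datagram may leave."""
    if strong_es:
        return {i for i, addrs in iface_addrs.items() if src_addr in addrs}
    return set(iface_addrs)
```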
3707
3724          3.3.4.3  Choosing a Source Address
3725
3726             DISCUSSION:
3727                  When it sends an initial connection request (e.g., a
3728                  TCP "SYN" segment) or a datagram service request (e.g.,
3729                  a UDP-based query), the transport layer on a multihomed
3730                  host needs to know which source address to use.  If the
3731                  application does not specify it, the transport layer
3732                  must ask the IP layer to perform the conceptual
3733                  mapping:
3734
3735                      GET_SRCADDR(remote IP addr, TOS)
3736                                                -> local IP address
3737
3738                  Here TOS is the Type-of-Service value (see Section
3739                  3.2.1.6), and the result is the desired source address.
3740                  The following rules are suggested for implementing this
3741                  mapping:
3742
3743                  (a)  If the remote Internet address lies on one of the
3744                       (sub-) nets to which the host is directly
3745                       connected, a corresponding source address may be
3746                       chosen, unless the corresponding interface is
3747                       known to be down.
3748
3749                  (b)  The route cache may be consulted, to see if there
3750                       is an active route to the specified destination
3751                       network through any network interface; if so, a
3752                       local IP address corresponding to that interface
3753                       may be chosen.
3754
3755                  (c)  The table of static routes, if any (see Section
3756                       3.3.1.2) may be similarly consulted.
3757
3758                  (d)  The default gateways may be consulted.  If these
3759                       gateways are assigned to different interfaces, the
3760                       interface corresponding to the gateway with the
3761                       highest preference may be chosen.
3762
3763                  In the future, there may be a defined way for a
3764                  multihomed host to ask the gateways on all connected
3765                  networks for advice about the best network to use for a
3766                  given destination.
3767
3768             IMPLEMENTATION:
3769                  It will be noted that this process is essentially the
3770                  same as datagram routing (see Section 3.3.1), and
3771                  therefore hosts may be able to combine the
3783                  implementation of the two functions.
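
                 One possible reading of suggested rules (a) through (d) is
                 sketched below.  The data structures (interface records,
                 route tables given as pairs, gateway preferences as
                 integers) are assumptions chosen for brevity, and TOS is
                 omitted.

```python
import ipaddress


def get_srcaddr(remote, interfaces, route_cache=(), static_routes=(),
                default_gateways=()):
    """Choose a local source address for a datagram to remote, or None.

    interfaces:       list of dicts {"addr": str, "prefix": int, "up": bool}
    route_cache,
    static_routes:    iterables of (destination network str, local addr str)
    default_gateways: iterable of (preference int, local addr str);
                      a higher preference wins
    """
    remote = ipaddress.ip_address(remote)

    # (a) remote lies on a directly connected (sub)net whose interface is up
    for ifc in interfaces:
        net = ipaddress.ip_network(f"{ifc['addr']}/{ifc['prefix']}", strict=False)
        if remote in net and ifc["up"]:
            return ifc["addr"]

    # (b) an active route in the route cache, then (c) the static routes
    for table in (route_cache, static_routes):
        for dest_net, local_addr in table:
            if remote in ipaddress.ip_network(dest_net):
                return local_addr

    # (d) the interface of the default gateway with the highest preference
    gateways = list(default_gateways)
    if gateways:
        return max(gateways, key=lambda g: g[0])[1]
    return None
```

                 Because the lookup order mirrors datagram routing, a real
                 implementation would probably share its tables with the
                 routing function rather than take them as arguments.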
3784
3785       3.3.5  Source Route Forwarding
3786
3787          Subject to restrictions given below, a host MAY be able to act
3788          as an intermediate hop in a source route, forwarding a source-
3789          routed datagram to the next specified hop.
3790
3791          However, in performing this gateway-like function, the host
3792          MUST obey all the relevant rules for a gateway forwarding
3793          source-routed datagrams [INTRO:2].  This includes the following
3794          specific provisions, which override the corresponding host
3795          provisions given earlier in this document:
3796
3797          (A)  TTL (ref. Section 3.2.1.7)
3798
3799               The TTL field MUST be decremented and the datagram perhaps
3800               discarded as specified for a gateway in [INTRO:2].
3801
3802          (B)  ICMP Destination Unreachable (ref. Section 3.2.2.1)
3803
3804               A host MUST be able to generate Destination Unreachable
3805               messages with the following codes:
3806
3807               4    (Fragmentation Required but DF Set) when a source-
3808                    routed datagram cannot be fragmented to fit into the
3809                    target network;
3810
3811               5    (Source Route Failed) when a source-routed datagram
3812                    cannot be forwarded, e.g., because of a routing
3813                    problem or because the next hop of a strict source
3814                    route is not on a connected network.
3815
3816          (C)  IP Source Address (ref. Section 3.2.1.3)
3817
3818               A source-routed datagram being forwarded MAY (and normally
3819               will) have a source address that is not one of the IP
3820               addresses of the forwarding host.
3821
3822          (D)  Record Route Option (ref. Section 3.2.1.8d)
3823
3824               A host that is forwarding a source-routed datagram
3825               containing a Record Route option MUST update that option,
3826               if it has room.
3827
3828          (E)  Timestamp Option (ref. Section 3.2.1.8e)
3829
3830               A host that is forwarding a source-routed datagram
3842               containing a Timestamp Option MUST add the current
3843               timestamp to that option, according to the rules for this
3844               option.
3845
3846          To define the rules restricting host forwarding of source-
3847          routed datagrams, we use the term "local source-routing" if the
3848          next hop will be through the same physical interface through
3849          which the datagram arrived; otherwise, it is "non-local
3850          source-routing".
3851
3852          o    A host is permitted to perform local source-routing
3853               without restriction.
3854
3855          o    A host that supports non-local source-routing MUST have a
3856               configurable switch to disable forwarding, and this switch
3857               MUST default to disabled.
3858
3859          o    The host MUST satisfy all gateway requirements for
3860               configurable policy filters [INTRO:2] restricting non-
3861               local forwarding.
3862
3863          If a host receives a datagram with an incomplete source route
3864          but does not forward it for some reason, the host SHOULD return
3865          an ICMP Destination Unreachable (code 5, Source Route Failed)
3866          message, unless the datagram was itself an ICMP error message.
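
              The local/non-local distinction and the default-disabled switch
              can be sketched as a single decision function.  The names
              SourceRoutePolicy and may_forward_source_routed are
              hypothetical, and the gateway policy filters of [INTRO:2] are
              not modelled.

```python
from dataclasses import dataclass


@dataclass
class SourceRoutePolicy:
    # Configurable switch for non-local source-routing; defaults to disabled.
    forward_non_local: bool = False


def may_forward_source_routed(arrival_iface, next_hop_iface, policy):
    """Return True if the host may forward a source-routed datagram.

    Local source-routing (next hop through the arrival interface) is
    unrestricted; non-local source-routing requires the switch to be enabled.
    """
    if next_hop_iface == arrival_iface:
        return True
    return policy.forward_non_local
```

              When this function returns False for a datagram with an
              incomplete source route, the host would normally go on to
              return the ICMP Destination Unreachable (code 5) message
              described above.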
3867
3868       3.3.6  Broadcasts
3869
3870          Section 3.2.1.3 defined the four standard IP broadcast address
3871          forms:
3872
3873            Limited Broadcast:  {-1, -1}
3874
3875            Directed Broadcast:  {<Network-number>,-1}
3876
3877            Subnet Directed Broadcast:
3878                               {<Network-number>,<Subnet-number>,-1}
3879
3880            All-Subnets Directed Broadcast: {<Network-number>,-1,-1}
3881
3882          A host MUST recognize any of these forms in the destination
3883          address of an incoming datagram.
3884
3885          There is a class of hosts* that use non-standard broadcast
3886          address forms, substituting 0 for -1.  All hosts SHOULD
3887          recognize and accept any of these non-standard broadcast
3888          addresses as the destination address of an incoming datagram.
3889          A host MAY optionally have a configuration option to choose the
3890          0 or the -1 form of broadcast address, for each physical
3891          interface, but this option SHOULD default to the standard (-1)
3892          form.
3893
3894 _________________________
3895 *4.2BSD Unix and its derivatives, but not 4.3BSD.
3907
3908          When a host sends a datagram to a link-layer broadcast address,
3909          the IP destination address MUST be a legal IP broadcast or IP
3910          multicast address.
3911
3912          A host SHOULD silently discard a datagram that is received via
3913          a link-layer broadcast (see Section 2.4) but does not specify
3914          an IP multicast or broadcast destination address.
3915
3916          Hosts SHOULD use the Limited Broadcast address to broadcast to
3917          a connected network.
3918
3919
3920          DISCUSSION:
3921               Using the Limited Broadcast address instead of a Directed
3922               Broadcast address may improve system robustness.  Problems
3923               are often caused by machines that do not understand the
3924               plethora of broadcast addresses (see Section 3.2.1.3), or
3925               that may have different ideas about which broadcast
3926               addresses are in use.  The prime example of the latter is
3927               machines that do not understand subnetting but are
3928               attached to a subnetted net.  Sending a Subnet Broadcast
3929               for the connected network will confuse those machines,
3930               which will see it as a message to some other host.
3931
3932               There has been discussion on whether a datagram addressed
3933               to the Limited Broadcast address ought to be sent from all
3934               the interfaces of a multihomed host.  This specification
3935               takes no stand on the issue.
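
              A sketch of how a host might recognize the broadcast forms
              listed in this section, including the non-standard 0 forms, is
              given below.  The function names and the use of prefix lengths
              for the class and subnet masks are illustrative assumptions.

```python
import ipaddress

ALL_ONES = 0xFFFFFFFF


def broadcast_forms(addr, subnet_prefix, class_prefix, fill_ones=True):
    """Return the broadcast addresses that apply to interface address addr.

    subnet_prefix is the subnet mask length; class_prefix is the length of
    the class A/B/C network mask (8, 16 or 24).  fill_ones=False yields the
    non-standard forms that substitute 0 for -1.
    """
    a = int(ipaddress.ip_address(addr))
    net_mask = ALL_ONES ^ (ALL_ONES >> class_prefix)
    sub_mask = ALL_ONES ^ (ALL_ONES >> subnet_prefix)
    fill = ALL_ONES if fill_ones else 0
    forms = {
        fill,                                             # Limited
        (a & net_mask) | (fill & ~net_mask & ALL_ONES),   # Directed / All-Subnets
        (a & sub_mask) | (fill & ~sub_mask & ALL_ONES),   # Subnet Directed
    }
    return {str(ipaddress.ip_address(f)) for f in forms}


def is_broadcast(dst, addr, subnet_prefix, class_prefix):
    """True if dst matches a standard (-1) or a non-standard (0) broadcast form."""
    return (dst in broadcast_forms(addr, subnet_prefix, class_prefix, True)
            or dst in broadcast_forms(addr, subnet_prefix, class_prefix, False))
```

              The Directed and All-Subnets Directed forms produce the same
              32-bit value, so three distinct addresses result for a
              subnetted interface.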
3936
3937       3.3.7  IP Multicasting
3938
3939          A host SHOULD support local IP multicasting on all connected
3940          networks for which a mapping from Class D IP addresses to
3941          link-layer addresses has been specified (see below).  Support
3942          for local IP multicasting includes sending multicast datagrams,
3943          joining multicast groups and receiving multicast datagrams, and
3944          leaving multicast groups.  This implies support for all of
3945          [IP:4] except the IGMP protocol itself, which is OPTIONAL.
3946
3960          DISCUSSION:
3961               IGMP provides gateways that are capable of multicast
3962               routing with the information required to support IP
3963               multicasting across multiple networks.  At this time,
3964               multicast-routing gateways are in the experimental stage
3965               and are not widely available.  For hosts that are not
3966               connected to networks with multicast-routing gateways or
3967               that do not need to receive multicast datagrams
3968               originating on other networks, IGMP serves no purpose and
3969               is therefore optional for now.  However, the rest of
3970               [IP:4] is currently recommended for the purpose of
3971               providing IP-layer access to local network multicast
3972               addressing, as a preferable alternative to local broadcast
3973               addressing.  It is expected that IGMP will become
3974               recommended at some future date, when multicast-routing
3975               gateways have become more widely available.
3976
3977          If IGMP is not implemented, a host SHOULD still join the "all-
3978          hosts" group (224.0.0.1) when the IP layer is initialized and
3979          remain a member for as long as the IP layer is active.
3980
3981          DISCUSSION:
3982               Joining the "all-hosts" group will support strictly local
3983               uses of multicasting, e.g., a gateway discovery protocol,
3984               even if IGMP is not implemented.
3985
3986          The mapping of IP Class D addresses to local addresses is
3987          currently specified for the following types of networks:
3988
3989          o    Ethernet/IEEE 802.3, as defined in [IP:4].
3990
3991          o    Any network that supports broadcast but not multicast
3992               addressing: all IP Class D addresses map to the local
3993               broadcast address.
3994
3995          o    Any type of point-to-point link (e.g., SLIP or HDLC
3996               links): no mapping required.  All IP multicast datagrams
3997               are sent as-is, inside the local framing.
3998
3999          Mappings for other types of networks will be specified in the
4000          future.
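
              The three mappings listed above can be sketched as one
              function.  The Ethernet case follows the rule in [IP:4] of
              placing the low-order 23 bits of the Class D address into the
              Ethernet multicast address 01-00-5E-00-00-00; the function
              name, the network_type strings, and the return conventions are
              assumptions of this sketch.

```python
import ipaddress

ETHERNET_MULTICAST_BASE = 0x01005E000000


def class_d_to_link_address(group, network_type):
    """Map an IP Class D address to a link-layer address.

    Returns a MAC string for Ethernet, the token "broadcast" for networks
    that support broadcast but not multicast, and None where no mapping is
    required (point-to-point links).
    """
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D address")
    if network_type == "ethernet":
        mac = ETHERNET_MULTICAST_BASE | (int(addr) & 0x7FFFFF)  # low-order 23 bits
        return "-".join(f"{b:02X}" for b in mac.to_bytes(6, "big"))
    if network_type == "broadcast-only":
        return "broadcast"
    if network_type == "point-to-point":
        return None
    raise ValueError(f"no mapping specified for {network_type}")
```

              Since only 23 of the 28 group bits are carried, several groups
              share one Ethernet address (224.0.0.1 and 224.128.0.1 both map
              to 01-00-5E-00-00-01), so the IP layer still has to filter on
              the full destination address.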
4001
4002          A host SHOULD provide a way for higher-layer protocols or
4003          applications to determine which of the host's connected
4004          network(s) support IP multicast addressing.
4005
4019       3.3.8  Error Reporting
4020
4021          Wherever practical, hosts MUST return ICMP error datagrams on
4022          detection of an error, except in those cases where returning an
4023          ICMP error message is specifically prohibited.
4024
4025          DISCUSSION:
4026               A common phenomenon in datagram networks is the "black
4027               hole disease": datagrams are sent out, but nothing comes
4028               back.  Without any error datagrams, it is difficult for
4029               the user to figure out what the problem is.
4030
4031    3.4  INTERNET/TRANSPORT LAYER INTERFACE
4032
4033       The interface between the IP layer and the transport layer MUST
4034       provide full access to all the mechanisms of the IP layer,
4035       including options, Type-of-Service, and Time-to-Live.  The
4036       transport layer MUST either have mechanisms to set these interface
4037       parameters, or provide a path to pass them through from an
4038       application, or both.
4039
4040       DISCUSSION:
4041            Applications are urged to make use of these mechanisms where
4042            applicable, even when the mechanisms are not currently
4043            effective in the Internet (e.g., TOS).  This will allow these
4044            mechanisms to be immediately useful when they do become
4045            effective, without a large amount of retrofitting of host
4046            software.
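
            As one illustration of a pass-through path from an application,
            the sockets interface found on many systems lets a program set
            TTL and TOS per socket.  The sketch below uses Python's standard
            socket module; it assumes a platform that defines IP_TTL and
            IP_TOS, the helper name is hypothetical, and nothing here is
            mandated by this document.

```python
import socket


def open_udp_socket(ttl, tos):
    """Create a UDP socket whose outgoing datagrams carry the given TTL and TOS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


sock = open_udp_socket(ttl=32, tos=0x10)  # 0x10 sets the low-delay TOS bit
print(sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
sock.close()
```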
4047
4048       We now describe a conceptual interface between the transport layer
4049       and the IP layer, as a set of procedure calls.  This is an
4050       extension of the information in Section 3.3 of RFC-791 [IP:1].
4051
4052
4053       *    Send Datagram
4054
4055                 SEND(src, dst, prot, TOS, TTL, BufPTR, len, Id, DF, opt
4056                      => result )
4057
4058            where the parameters are defined in RFC-791.  Passing an Id
4059            parameter is optional; see Section 3.2.1.5.
4060
4061
4062       *    Receive Datagram
4063
4064                 RECV(BufPTR, prot
4065                      => result, src, dst, SpecDest, TOS, len, opt)
4066
4078            All the parameters are defined in RFC-791, except for:
4079
4080                 SpecDest = specific-destination address of datagram
4081                             (defined in Section 3.2.1.3)
4082
4083            The result parameter dst contains the datagram's destination
4084            address.  Since this may be a broadcast or multicast address,
4085            the SpecDest parameter (not shown in RFC-791) MUST be passed.
4086            The parameter opt contains all the IP options received in the
4087            datagram; these MUST also be passed to the transport layer.
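
                As an illustration only, the choice of SpecDest for a
                received datagram might be sketched in Python as shown
                here.  The sketch assumes the Section 3.2.1.3 rule that
                SpecDest is the header destination address unless that
                address is a broadcast or multicast address, in which
                case it is an address of the interface on which the
                datagram arrived.  The function name and the
                "address/prefix" argument form are hypothetical, and
                only the -1 broadcast forms are recognized.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def specific_destination(dst, arrival_interface):
    """Return the specific-destination address of a received datagram.

    dst               -- destination address taken from the IP header
    arrival_interface -- address/prefix of the interface the datagram
                         arrived on, e.g. "192.0.2.10/24" (a hypothetical
                         representation chosen for this sketch)
    """
    dst_addr = ipaddress.IPv4Address(dst)
    iface = ipaddress.IPv4Interface(arrival_interface)
    # Only the limited and directed "-1" broadcast forms are checked here;
    # a real host must recognize all broadcast address formats (Section 3.3.6).
    is_broadcast = dst_addr in (LIMITED_BROADCAST,
                                iface.network.broadcast_address)
    if is_broadcast or dst_addr.is_multicast:
        # Broadcast or multicast: substitute the arrival interface address.
        return iface.ip
    # Unicast: the header destination is already specific.
    return dst_addr
```

                A transport layer or application could then use the
                returned address as the source address of any reply.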
4088
4089
4090       *    Select Source Address
4091
4092                 GET_SRCADDR(remote, TOS)  -> local
4093
4094                 remote = remote IP address
4095                 TOS = Type-of-Service
4096                 local = local IP address
4097
4098            See Section 3.3.4.3.
4099
4100
4101       *    Find Maximum Datagram Sizes
4102
4103                 GET_MAXSIZES(local, remote, TOS) -> MMS_R, MMS_S
4104
4105                 MMS_R = maximum receive transport-message size.
4106                 MMS_S = maximum send transport-message size.
4107                 (local, remote, TOS defined above)
4108
4109            See Sections 3.3.2 and 3.3.3.
4110
4111
4112       *    Advice on Delivery Success
4113
4114                 ADVISE_DELIVPROB(sense, local, remote, TOS)
4115
4116            Here the parameter sense is a 1-bit flag indicating whether
4117            positive or negative advice is being given; see the
4118            discussion in Section 3.3.1.4. The other parameters were
4119            defined earlier.
4120
4121
4122       *    Send ICMP Message
4123
4124                 SEND_ICMP(src, dst, TOS, TTL, BufPTR, len, Id, DF, opt)
4125                      -> result
4126
4137                 (Parameters defined in RFC-791).
4138
4139            Passing an Id parameter is optional; see Section 3.2.1.5.
4140            The transport layer MUST be able to send certain ICMP
4141            messages:  Port Unreachable or any of the query-type
4142            messages.  This function could be considered to be a special
4143            case of the SEND() call, of course; we describe it separately
4144            for clarity.
4145
4146
4147       *    Receive ICMP Message
4148
4149                 RECV_ICMP(BufPTR ) -> result, src, dst, len, opt
4150
4151                 (Parameters defined in RFC-791).
4152
4153            The IP layer MUST pass certain ICMP messages up to the
4154            appropriate transport-layer routine.  This function could be
4155            considered to be a special case of the RECV() call, of
4156            course; we describe it separately for clarity.
4157
4158            For an ICMP error message, the data that is passed up MUST
4159            include the original Internet header plus all the octets of
4160            the original message that are included in the ICMP message.
4161            This data will be used by the transport layer to locate the
4162            connection state information, if any.
4163
4164            In particular, the following ICMP messages are to be passed
4165            up:
4166
4167            o    Destination Unreachable
4168
4169            o    Source Quench
4170
4171            o    Echo Reply (to ICMP user interface, unless the Echo
4172                 Request originated in the IP layer)
4173
4174            o    Timestamp Reply (to ICMP user interface)
4175
4176            o    Time Exceeded
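
                To illustrate how the data passed up with an ICMP error
                message can be used to locate connection state, the
                following Python sketch extracts a lookup key from it.
                It assumes the data begins with the original IPv4 header
                and that the original datagram carried TCP or UDP, whose
                first four octets are the source and destination ports.
                The names OriginalFlow and parse_icmp_error_data are
                hypothetical.

```python
import ipaddress
import struct
from typing import NamedTuple


class OriginalFlow(NamedTuple):
    """Key identifying the datagram that provoked the ICMP error."""
    protocol: int
    src: str
    dst: str
    src_port: int
    dst_port: int


def parse_icmp_error_data(data: bytes) -> OriginalFlow:
    """Parse the original IP header plus leading octets of its payload."""
    if len(data) < 20:
        raise ValueError("shorter than a minimal IP header")
    version = data[0] >> 4
    ihl = (data[0] & 0x0F) * 4          # header length in octets
    if version != 4 or ihl < 20:
        raise ValueError("not a valid IPv4 header")
    if len(data) < ihl + 4:
        raise ValueError("original transport header missing")
    protocol = data[9]
    src = str(ipaddress.IPv4Address(data[12:16]))
    dst = str(ipaddress.IPv4Address(data[16:20]))
    # Port fields are meaningful only for TCP (6) and UDP (17).
    src_port, dst_port = struct.unpack("!HH", data[ihl:ihl + 4])
    return OriginalFlow(protocol, src, dst, src_port, dst_port)
```

                The transport layer could use the returned tuple as a
                key into whatever table holds its connection state.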
4177
4178
4179       DISCUSSION:
4180            In the future, there may be additions to this interface to
4181            pass path data (see Section 3.3.1.3) between the IP and
4182            transport layers.
4183
4184
4196    3.5  INTERNET LAYER REQUIREMENTS SUMMARY
4197
4198
4199                                                  |        | | | |S| |
4200                                                  |        | | | |H| |F
4201                                                  |        | | | |O|M|o
4202                                                  |        | |S| |U|U|o
4203                                                  |        | |H| |L|S|t
4204                                                  |        |M|O| |D|T|n
4205                                                  |        |U|U|M| | |o
4206                                                  |        |S|L|A|N|N|t
4207                                                  |        |T|D|Y|O|O|t
4208 FEATURE                                          |SECTION | | | |T|T|e
4209 -------------------------------------------------|--------|-|-|-|-|-|--
4210                                                  |        | | | | | |
4211 Implement IP and ICMP                            |3.1     |x| | | | |
4212 Handle remote multihoming in application layer   |3.1     |x| | | | |
4213 Support local multihoming                        |3.1     | | |x| | |
4214 Meet gateway specs if forward datagrams          |3.1     |x| | | | |
4215 Configuration switch for embedded gateway        |3.1     |x| | | | |1
4216    Config switch default to non-gateway          |3.1     |x| | | | |1
4217    Auto-config based on number of interfaces     |3.1     | | | | |x|1
4218 Able to log discarded datagrams                  |3.1     | |x| | | |
4219    Record in counter                             |3.1     | |x| | | |
4220                                                  |        | | | | | |
4221 Silently discard Version != 4                    |3.2.1.1 |x| | | | |
4222 Verify IP checksum, silently discard bad dgram   |3.2.1.2 |x| | | | |
4223 Addressing:                                      |        | | | | | |
4224   Subnet addressing (RFC-950)                    |3.2.1.3 |x| | | | |
4225   Src address must be host's own IP address      |3.2.1.3 |x| | | | |
4226   Silently discard datagram with bad dest addr   |3.2.1.3 |x| | | | |
4227   Silently discard datagram with bad src addr    |3.2.1.3 |x| | | | |
4228 Support reassembly                               |3.2.1.4 |x| | | | |
4229 Retain same Id field in identical datagram       |3.2.1.5 | | |x| | |
4230                                                  |        | | | | | |
4231 TOS:                                             |        | | | | | |
4232   Allow transport layer to set TOS               |3.2.1.6 |x| | | | |
4233   Pass received TOS up to transport layer        |3.2.1.6 | |x| | | |
4234   Use RFC-795 link-layer mappings for TOS        |3.2.1.6 | | | |x| |
4235 TTL:                                             |        | | | | | |
4236   Send packet with TTL of 0                      |3.2.1.7 | | | | |x|
4237   Discard received packets with TTL < 2          |3.2.1.7 | | | | |x|
4238   Allow transport layer to set TTL               |3.2.1.7 |x| | | | |
4239   Fixed TTL is configurable                      |3.2.1.7 |x| | | | |
4240                                                  |        | | | | | |
4241 IP Options:                                      |        | | | | | |
4242   Allow transport layer to send IP options       |3.2.1.8 |x| | | | |
4243   Pass all IP options rcvd to higher layer       |3.2.1.8 |x| | | | |
4255   IP layer silently ignore unknown options       |3.2.1.8 |x| | | | |
4256   Security option                                |3.2.1.8a| | |x| | |
4257   Send Stream Identifier option                  |3.2.1.8b| | | |x| |
4258   Silently ignore Stream Identifier option       |3.2.1.8b|x| | | | |
4259   Record Route option                            |3.2.1.8d| | |x| | |
4260   Timestamp option                               |3.2.1.8e| | |x| | |
4261 Source Route Option:                             |        | | | | | |
4262   Originate & terminate Source Route options     |3.2.1.8c|x| | | | |
4263   Datagram with completed SR passed up to TL     |3.2.1.8c|x| | | | |
4264   Build correct (non-redundant) return route     |3.2.1.8c|x| | | | |
4265   Send multiple SR options in one header         |3.2.1.8c| | | | |x|
4266                                                  |        | | | | | |
4267 ICMP:                                            |        | | | | | |
4268   Silently discard ICMP msg with unknown type    |3.2.2   |x| | | | |
4269   Include more than 8 octets of orig datagram    |3.2.2   | | |x| | |
4270       Included octets same as received           |3.2.2   |x| | | | |
4271   Demux ICMP Error to transport protocol         |3.2.2   |x| | | | |
4272   Send ICMP error message with TOS=0             |3.2.2   | |x| | | |
4273   Send ICMP error message for:                   |        | | | | | |
4274    - ICMP error msg                              |3.2.2   | | | | |x|
4275    - IP b'cast or IP m'cast                      |3.2.2   | | | | |x|
4276    - Link-layer b'cast                           |3.2.2   | | | | |x|
4277    - Non-initial fragment                        |3.2.2   | | | | |x|
4278    - Datagram with non-unique src address        |3.2.2   | | | | |x|
4279   Return ICMP error msgs (when not prohibited)   |3.3.8   |x| | | | |
4280                                                  |        | | | | | |
4281   Dest Unreachable:                              |        | | | | | |
4282     Generate Dest Unreachable (code 2/3)         |3.2.2.1 | |x| | | |
4283     Pass ICMP Dest Unreachable to higher layer   |3.2.2.1 |x| | | | |
4284     Higher layer act on Dest Unreach             |3.2.2.1 | |x| | | |
4285       Interpret Dest Unreach as only hint        |3.2.2.1 |x| | | | |
4286   Redirect:                                      |        | | | | | |
4287     Host send Redirect                           |3.2.2.2 | | | |x| |
4288     Update route cache when recv Redirect        |3.2.2.2 |x| | | | |
4289     Handle both Host and Net Redirects           |3.2.2.2 |x| | | | |
4290     Discard illegal Redirect                     |3.2.2.2 | |x| | | |
4291   Source Quench:                                 |        | | | | | |
4292     Send Source Quench if buffering exceeded     |3.2.2.3 | | |x| | |
4293     Pass Source Quench to higher layer           |3.2.2.3 |x| | | | |
4294     Higher layer act on Source Quench            |3.2.2.3 | |x| | | |
4295   Time Exceeded: pass to higher layer            |3.2.2.4 |x| | | | |
4296   Parameter Problem:                             |        | | | | | |
4297     Send Parameter Problem messages              |3.2.2.5 | |x| | | |
4298     Pass Parameter Problem to higher layer       |3.2.2.5 |x| | | | |
4299     Report Parameter Problem to user             |3.2.2.5 | | |x| | |
4300                                                  |        | | | | | |
4301   ICMP Echo Request or Reply:                    |        | | | | | |
4302     Echo server                                  |3.2.2.6 |x| | | | |
4314     Echo client                                  |3.2.2.6 | |x| | | |
4315     Discard Echo Request to broadcast address    |3.2.2.6 | | |x| | |
4316     Discard Echo Request to multicast address    |3.2.2.6 | | |x| | |
4317     Use specific-dest addr as Echo Reply src     |3.2.2.6 |x| | | | |
4318     Send same data in Echo Reply                 |3.2.2.6 |x| | | | |
4319     Pass Echo Reply to higher layer              |3.2.2.6 |x| | | | |
4320     Reflect Record Route, Time Stamp options     |3.2.2.6 | |x| | | |
4321     Reverse and reflect Source Route option      |3.2.2.6 |x| | | | |
4322                                                  |        | | | | | |
4323   ICMP Information Request or Reply:             |3.2.2.7 | | | |x| |
4324   ICMP Timestamp and Timestamp Reply:            |3.2.2.8 | | |x| | |
4325     Minimize delay variability                   |3.2.2.8 | |x| | | |1
4326     Silently discard b'cast Timestamp            |3.2.2.8 | | |x| | |1
4327     Silently discard m'cast Timestamp            |3.2.2.8 | | |x| | |1
4328     Use specific-dest addr as TS Reply src       |3.2.2.8 |x| | | | |1
4329     Reflect Record Route, Time Stamp options     |3.2.2.8 | |x| | | |1
4330     Reverse and reflect Source Route option      |3.2.2.8 |x| | | | |1
4331     Pass Timestamp Reply to higher layer         |3.2.2.8 |x| | | | |1
4332     Obey rules for "standard value"              |3.2.2.8 |x| | | | |1
4333                                                  |        | | | | | |
4334   ICMP Address Mask Request and Reply:           |        | | | | | |
4335     Addr Mask source configurable                |3.2.2.9 |x| | | | |
4336     Support static configuration of addr mask    |3.2.2.9 |x| | | | |
4337     Get addr mask dynamically during booting     |3.2.2.9 | | |x| | |
4338     Get addr via ICMP Addr Mask Request/Reply    |3.2.2.9 | | |x| | |
4339       Retransmit Addr Mask Req if no Reply       |3.2.2.9 |x| | | | |3
4340       Assume default mask if no Reply            |3.2.2.9 | |x| | | |3
4341       Update address mask from first Reply only  |3.2.2.9 |x| | | | |3
4342     Reasonableness check on Addr Mask            |3.2.2.9 | |x| | | |
4343     Send unauthorized Addr Mask Reply msgs       |3.2.2.9 | | | | |x|
4344       Explicitly configured to be agent          |3.2.2.9 |x| | | | |
4345     Static config=> Addr-Mask-Authoritative flag |3.2.2.9 | |x| | | |
4346       Broadcast Addr Mask Reply when init.       |3.2.2.9 |x| | | | |3
4347                                                  |        | | | | | |
4348 ROUTING OUTBOUND DATAGRAMS:                      |        | | | | | |
4349   Use address mask in local/remote decision      |3.3.1.1 |x| | | | |
4350   Operate with no gateways on conn network       |3.3.1.1 |x| | | | |
4351   Maintain "route cache" of next-hop gateways    |3.3.1.2 |x| | | | |
4352   Treat Host and Net Redirect the same           |3.3.1.2 | |x| | | |
4353   If no cache entry, use default gateway         |3.3.1.2 |x| | | | |
4354     Support multiple default gateways            |3.3.1.2 |x| | | | |
4355   Provide table of static routes                 |3.3.1.2 | | |x| | |
4356     Flag: route overridable by Redirects         |3.3.1.2 | | |x| | |
4357   Key route cache on host, not net address       |3.3.1.3 | | |x| | |
4358   Include TOS in route cache                     |3.3.1.3 | |x| | | |
4359                                                  |        | | | | | |
4360   Able to detect failure of next-hop gateway     |3.3.1.4 |x| | | | |
4361   Assume route is good forever                   |3.3.1.4 | | | |x| |
4373   Ping gateways continuously                     |3.3.1.4 | | | | |x|
4374   Ping only when traffic being sent              |3.3.1.4 |x| | | | |
4375   Ping only when no positive indication          |3.3.1.4 |x| | | | |
4376   Higher and lower layers give advice            |3.3.1.4 | |x| | | |
4377   Switch from failed default g'way to another    |3.3.1.5 |x| | | | |
4378   Manual method of entering config info          |3.3.1.6 |x| | | | |
4379                                                  |        | | | | | |
4380 REASSEMBLY and FRAGMENTATION:                    |        | | | | | |
4381   Able to reassemble incoming datagrams          |3.3.2   |x| | | | |
4382     At least 576 byte datagrams                  |3.3.2   |x| | | | |
4383     EMTU_R configurable or indefinite            |3.3.2   | |x| | | |
4384   Transport layer able to learn MMS_R            |3.3.2   |x| | | | |
4385   Send ICMP Time Exceeded on reassembly timeout  |3.3.2   |x| | | | |
4386     Fixed reassembly timeout value               |3.3.2   | |x| | | |
4387                                                  |        | | | | | |
4388   Pass MMS_S to higher layers                    |3.3.3   |x| | | | |
4389   Local fragmentation of outgoing packets        |3.3.3   | | |x| | |
4390      Else don't send bigger than MMS_S           |3.3.3   |x| | | | |
4391   Send max 576 to off-net destination            |3.3.3   | |x| | | |
4392   All-Subnets-MTU configuration flag             |3.3.3   | | |x| | |
4393                                                  |        | | | | | |
4394 MULTIHOMING:                                     |        | | | | | |
4395   Reply with same addr as spec-dest addr         |3.3.4.2 | |x| | | |
4396   Allow application to choose local IP addr      |3.3.4.2 |x| | | | |
4397   Silently discard d'gram in "wrong" interface   |3.3.4.2 | | |x| | |
4398   Only send d'gram through "right" interface     |3.3.4.2 | | |x| | |4
4399                                                  |        | | | | | |
4400 SOURCE-ROUTE FORWARDING:                         |        | | | | | |
4401   Forward datagram with Source Route option      |3.3.5   | | |x| | |1
4402     Obey corresponding gateway rules             |3.3.5   |x| | | | |1
4403       Update TTL by gateway rules                |3.3.5   |x| | | | |1
4404       Able to generate ICMP err code 4, 5        |3.3.5   |x| | | | |1
4405       IP src addr not local host                 |3.3.5   | | |x| | |1
4406       Update Timestamp, Record Route options     |3.3.5   |x| | | | |1
4407     Configurable switch for non-local SRing      |3.3.5   |x| | | | |1
4408       Defaults to OFF                            |3.3.5   |x| | | | |1
4409     Satisfy gwy access rules for non-local SRing |3.3.5   |x| | | | |1
4410     If not forward, send Dest Unreach (cd 5)     |3.3.5   | |x| | | |2
4411                                                  |        | | | | | |
4412 BROADCAST:                                       |        | | | | | |
4413   Broadcast addr as IP source addr               |3.2.1.3 | | | | |x|
4414   Receive 0 or -1 broadcast formats OK           |3.3.6   | |x| | | |
4415   Config'ble option to send 0 or -1 b'cast       |3.3.6   | | |x| | |
4416     Default to -1 broadcast                      |3.3.6   | |x| | | |
4417   Recognize all broadcast address formats        |3.3.6   |x| | | | |
4418   Use IP b'cast/m'cast addr in link-layer b'cast |3.3.6   |x| | | | |
4419   Silently discard link-layer-only b'cast dg's   |3.3.6   | |x| | | |
4420   Use Limited Broadcast addr for connected net   |3.3.6   | |x| | | |
4432                                                  |        | | | | | |
4433 MULTICAST:                                       |        | | | | | |
4434   Support local IP multicasting (RFC-1112)       |3.3.7   | |x| | | |
4435   Support IGMP (RFC-1112)                        |3.3.7   | | |x| | |
4436   Join all-hosts group at startup                |3.3.7   | |x| | | |
4437   Higher layers learn i'face m'cast capability   |3.3.7   | |x| | | |
4438                                                  |        | | | | | |
4439 INTERFACE:                                       |        | | | | | |
4440   Allow transport layer to use all IP mechanisms |3.4     |x| | | | |
4441   Pass interface ident up to transport layer     |3.4     |x| | | | |
4442   Pass all IP options up to transport layer      |3.4     |x| | | | |
4443   Transport layer can send certain ICMP messages |3.4     |x| | | | |
4444   Pass spec'd ICMP messages up to transp. layer  |3.4     |x| | | | |
4445      Include IP hdr+8 octets or more from orig.  |3.4     |x| | | | |
4446   Able to leap tall buildings at a single bound  |3.5     | |x| | | |
4447
4448 Footnotes:
4449
4450 (1)  Only if feature is implemented.
4451
4452 (2)  This requirement is overruled if datagram is an ICMP error message.
4453
4454 (3)  Only if feature is implemented and is configured "on".
4455
4456 (4)  Unless has embedded gateway functionality or is source routed.
4457
4488 RFC1122                  TRANSPORT LAYER -- UDP             October 1989
4489
4490
4491 4. TRANSPORT PROTOCOLS
4492
4493    4.1  USER DATAGRAM PROTOCOL -- UDP
4494
4495       4.1.1  INTRODUCTION
4496
4497          The User Datagram Protocol UDP [UDP:1] offers only a minimal
4498          transport service -- non-guaranteed datagram delivery -- and
4499          gives applications direct access to the datagram service of the
4500          IP layer.  UDP is used by applications that do not require the
4501          level of service of TCP or that wish to use communications
4502          services (e.g., multicast or broadcast delivery) not available
4503          from TCP.
4504
4505          UDP is almost a null protocol; the only services it provides
4506          over IP are checksumming of data and multiplexing by port
4507          number.  Therefore, an application program running over UDP
4508          must deal directly with end-to-end communication problems that
4509          a connection-oriented protocol would have handled -- e.g.,
4510          retransmission for reliable delivery, packetization and
4511          reassembly, flow control, congestion avoidance, etc., when
4512          these are required.  The fairly complex coupling between IP and
4513          TCP will be mirrored in the coupling between UDP and many
4514          applications using UDP.
4515
4516       4.1.2  PROTOCOL WALK-THROUGH
4517
4518          There are no known errors in the specification of UDP.
4519
4520       4.1.3  SPECIFIC ISSUES
4521
4522          4.1.3.1  Ports
4523
4524             UDP well-known ports follow the same rules as TCP well-known
4525             ports; see Section 4.2.2.1 below.
4526
4527             If a datagram arrives addressed to a UDP port for which
4528             there is no pending LISTEN call, UDP SHOULD send an ICMP
4529             Port Unreachable message.
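
                 As a sketch only, an ICMP Port Unreachable message
                 (Destination Unreachable, type 3, code 3) for such a
                 datagram might be built as follows in Python.  The
                 function names are hypothetical.  The caller is assumed
                 to have already checked the cases in which an ICMP error
                 message must not be sent (Section 3.2.2), and only the
                 minimum of 8 octets of original data is quoted.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"                 # pad to a whole number of words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_port_unreachable(original_datagram: bytes) -> bytes:
    """Return the ICMP message body for a Port Unreachable error."""
    ihl = (original_datagram[0] & 0x0F) * 4
    # Quote the original IP header plus the first 8 octets of its data.
    quoted = original_datagram[:ihl + 8]
    # Type 3, code 3, checksum placeholder, 4 unused octets.
    header = struct.pack("!BBHI", 3, 3, 0, 0)
    checksum = internet_checksum(header + quoted)
    return struct.pack("!BBHI", 3, 3, checksum, 0) + quoted
```

                 The result would be handed to the IP layer, for example
                 through the SEND_ICMP() call of Section 3.4.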
4530
4531          4.1.3.2  IP Options
4532
4533             UDP MUST pass any IP option that it receives from the IP
4534             layer transparently to the application layer.
4535
4536             An application MUST be able to specify IP options to be sent
4537             in its UDP datagrams, and UDP MUST pass these options to the
4538             IP layer.
4539
4550             DISCUSSION:
4551                  At present, the only options that need be passed
4552                  through UDP are Source Route, Record Route, and Time
4553                  Stamp.  However, new options may be defined in the
4554                  future, and UDP need not and should not make any
4555                  assumptions about the format or content of options it
4556                  passes to or from the application; an exception to this
4557                  might be an IP-layer security option.
4558
4559                  An application based on UDP will need to obtain a
4560                  source route from a request datagram and supply a
4561                  reversed route for sending the corresponding reply.
4562
4563          4.1.3.3  ICMP Messages
4564
4565             UDP MUST pass to the application layer all ICMP error
4566             messages that it receives from the IP layer.  Conceptually
4567             at least, this may be accomplished with an upcall to the
4568             ERROR_REPORT routine (see Section 4.2.4.1).
4569
4570             DISCUSSION:
4571                  Note that ICMP error messages resulting from sending a
4572                  UDP datagram are received asynchronously.  A UDP-based
4573                  application that wants to receive ICMP error messages
4574                  is responsible for maintaining the state necessary to
4575                  demultiplex these messages when they arrive; for
4576                  example, the application may keep a pending receive
4577                  operation for this purpose.  The application is also
4578                  responsible for avoiding confusion from a delayed ICMP
4579                  error message resulting from an earlier use of the same
4580                  port(s).
4581
4582          4.1.3.4  UDP Checksums
4583
4584             A host MUST implement the facility to generate and validate
4585             UDP checksums.  An application MAY optionally be able to
4586             control whether a UDP checksum will be generated, but it
4587             MUST default to checksumming on.
4588
4589             If a UDP datagram is received with a checksum that is non-
4590             zero and invalid, UDP MUST silently discard the datagram.
4591             An application MAY optionally be able to control whether UDP
4592             datagrams without checksums should be discarded or passed to
4593             the application.
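
                 The receive-side rules above might be sketched in Python
                 as follows.  This is an illustration under stated
                 assumptions, not a definitive implementation: the names
                 accept_udp and require_checksum are hypothetical, the
                 addresses are 4-octet byte strings, and the pseudo-header
                 layout is the one given in the UDP specification
                 [UDP:1].

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def accept_udp(src: bytes, dst: bytes, segment: bytes,
               require_checksum: bool = False) -> bool:
    """Decide whether a received UDP segment may go to the application.

    require_checksum models the optional application control over
    datagrams that arrive without a checksum.
    """
    if len(segment) < 8:
        return False                    # too short to hold a UDP header
    (checksum_field,) = struct.unpack("!H", segment[6:8])
    if checksum_field == 0:
        # Sender generated no checksum: pass or discard per application.
        return not require_checksum
    # Pseudo-header: source, destination, zero, protocol 17, UDP length.
    pseudo = src + dst + struct.pack("!BBH", 0, 17, len(segment))
    # With a correct checksum in place the sum is all ones (0xFFFF);
    # anything else is invalid and the datagram is silently discarded.
    return ones_complement_sum(pseudo + segment) == 0xFFFF
```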
4594
4595             DISCUSSION:
4596                  Some applications that normally run only across local
4597                  area networks have chosen to turn off UDP checksums for
4609                  efficiency.  As a result, numerous cases of undetected
4610                  errors have been reported.  The advisability of ever
4611                  turning off UDP checksumming is very controversial.
4612
4613             IMPLEMENTATION:
4614                  There is a common implementation error in UDP
4615                  checksums.  Unlike the TCP checksum, the UDP checksum
4616                  is optional; the value zero is transmitted in the
4617                  checksum field of a UDP header to indicate the absence
4618                  of a checksum.  If the transmitter really calculates a
4619                  UDP checksum of zero, it must transmit the checksum as
4620                  all 1's (65535).  No special action is required at the
4621                  receiver, since zero and 65535 are equivalent in 1's
4622                  complement arithmetic.
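
                 The transmit-side rule can be illustrated with a short
                 Python sketch.  It assumes 4-octet byte-string addresses
                 and a segment whose checksum field is still zero; the
                 names udp_checksum_field and checksum_enabled are
                 hypothetical, and the pseudo-header layout is taken from
                 the UDP specification [UDP:1].

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum_field(src: bytes, dst: bytes, segment: bytes,
                       checksum_enabled: bool = True) -> int:
    """Return the value to place in the UDP checksum field.

    segment is the UDP header (checksum field zero) plus data.
    """
    if not checksum_enabled:
        return 0                        # zero means "no checksum"
    pseudo = src + dst + struct.pack("!BBH", 0, 17, len(segment))
    checksum = ~ones_complement_sum(pseudo + segment) & 0xFFFF
    # A computed value of zero would be mistaken for "no checksum",
    # so it is transmitted as all ones instead.
    return checksum if checksum != 0 else 0xFFFF
```

                 Checksumming is on by default in this sketch, matching
                 the requirement that it default to on.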
4623
4624          4.1.3.5  UDP Multihoming
4625
4626             When a UDP datagram is received, its specific-destination
4627             address MUST be passed up to the application layer.
4628
4629             An application program MUST be able to specify the IP source
4630             address to be used for sending a UDP datagram or to leave it
4631             unspecified (in which case the networking software will
4632             choose an appropriate source address).  There SHOULD be a
4633             way to communicate the chosen source address up to the
4634             application layer (e.g., so that the application can later
4635             receive a reply datagram only from the corresponding
4636             interface).
4637
4638             DISCUSSION:
4639                  A request/response application that uses UDP should use
4640                  a source address for the response that is the same as
4641                  the specific destination address of the request.  See
4642                  the "General Issues" section of [INTRO:1].
4643
4644          4.1.3.6  Invalid Addresses
4645
4646             A UDP datagram received with an invalid IP source address
4647             (e.g., a broadcast or multicast address) MUST be discarded
4648             by UDP or by the IP layer (see Section 3.2.1.3).
4649
4650             When a host sends a UDP datagram, the source address MUST be
4651             (one of) the IP address(es) of the host.
4652
4653       4.1.4  UDP/APPLICATION LAYER INTERFACE
4654
4655          The application interface to UDP MUST provide the full services
4656          of the IP/transport interface described in Section 3.4 of this
4668          document.  Thus, an application using UDP needs the functions
4669          of the GET_SRCADDR(), GET_MAXSIZES(), ADVISE_DELIVPROB(), and
4670          RECV_ICMP() calls described in Section 3.4.  For example,
4671          GET_MAXSIZES() can be used to learn the effective maximum UDP
4672          datagram size for a particular {interface,remote host,TOS}
4673          triplet.
4674
4675          An application-layer program MUST be able to set the TTL and
4676          TOS values as well as IP options for sending a UDP datagram,
4677          and these values MUST be passed transparently to the IP layer.
4678          UDP MAY pass the received TOS up to the application layer.
4679
4680       4.1.5  UDP REQUIREMENTS SUMMARY
4681
4682
4683                                                  |        | | | |S| |
4684                                                  |        | | | |H| |F
4685                                                  |        | | | |O|M|o
4686                                                  |        | |S| |U|U|o
4687                                                  |        | |H| |L|S|t
4688                                                  |        |M|O| |D|T|n
4689                                                  |        |U|U|M| | |o
4690                                                  |        |S|L|A|N|N|t
4691                                                  |        |T|D|Y|O|O|t
4692 FEATURE                                          |SECTION | | | |T|T|e
4693 -------------------------------------------------|--------|-|-|-|-|-|--
4694                                                  |        | | | | | |
4695     UDP                                          |        | | | | | |
4696 -------------------------------------------------|--------|-|-|-|-|-|--
4697                                                  |        | | | | | |
4698 UDP send Port Unreachable                        |4.1.3.1 | |x| | | |
4699                                                  |        | | | | | |
4700 IP Options in UDP                                |        | | | | | |
4701  - Pass rcv'd IP options to applic layer         |4.1.3.2 |x| | | | |
4702  - Applic layer can specify IP options in Send   |4.1.3.2 |x| | | | |
4703  - UDP passes IP options down to IP layer        |4.1.3.2 |x| | | | |
4704                                                  |        | | | | | |
4705 Pass ICMP msgs up to applic layer                |4.1.3.3 |x| | | | |
4706                                                  |        | | | | | |
4707 UDP checksums:                                   |        | | | | | |
4708  - Able to generate/check checksum               |4.1.3.4 |x| | | | |
4709  - Silently discard bad checksum                 |4.1.3.4 |x| | | | |
4710  - Sender Option to not generate checksum        |4.1.3.4 | | |x| | |
4711    - Default is to checksum                      |4.1.3.4 |x| | | | |
4712  - Receiver Option to require checksum           |4.1.3.4 | | |x| | |
4713                                                  |        | | | | | |
4714 UDP Multihoming                                  |        | | | | | |
4715  - Pass spec-dest addr to application            |4.1.3.5 |x| | | | |
4727  - Applic layer can specify Local IP addr        |4.1.3.5 |x| | | | |
4728  - Applic layer specify wild Local IP addr       |4.1.3.5 |x| | | | |
4729  - Applic layer notified of Local IP addr used   |4.1.3.5 | |x| | | |
4730                                                  |        | | | | | |
4731 Bad IP src addr silently discarded by UDP/IP     |4.1.3.6 |x| | | | |
4732 Only send valid IP source address                |4.1.3.6 |x| | | | |
4733 UDP Application Interface Services               |        | | | | | |
4734 Full IP interface of 3.4 for application         |4.1.4   |x| | | | |
4735  - Able to spec TTL, TOS, IP opts when send dg   |4.1.4   |x| | | | |
4736  - Pass received TOS up to applic layer          |4.1.4   | | |x| | |
4737
4783 RFC1122                  TRANSPORT LAYER -- TCP             October 1989
4784
4785
4786    4.2  TRANSMISSION CONTROL PROTOCOL -- TCP
4787
4788       4.2.1  INTRODUCTION
4789
4790          The Transmission Control Protocol (TCP) [TCP:1] is the primary
4791          virtual-circuit transport protocol for the Internet suite.  TCP
4792          provides reliable, in-sequence delivery of a full-duplex stream
4793          of octets (8-bit bytes).  TCP is used by those applications
4794          needing reliable, connection-oriented transport service, e.g.,
4795          mail (SMTP), file transfer (FTP), and virtual terminal service
4796          (Telnet); requirements for these application-layer protocols
4797          are described in [INTRO:1].
4798
4799       4.2.2  PROTOCOL WALK-THROUGH
4800
4801          4.2.2.1  Well-Known Ports: RFC-793 Section 2.7
4802
4803             DISCUSSION:
4804                  TCP reserves port numbers in the range 0-255 for
4805                  "well-known" ports, used to access services that are
4806                  standardized across the Internet.  The remainder of the
4807                  port space can be freely allocated to application
4808                  processes.  Current well-known port definitions are
4809                  listed in the RFC entitled "Assigned Numbers"
4810                  [INTRO:6].  A prerequisite for defining a new well-
4811                  known port is an RFC documenting the proposed service
4812                  in enough detail to allow new implementations.
4813
4814                  Some systems extend this notion by adding a third
4815                  subdivision of the TCP port space: reserved ports,
4816                  which are generally used for operating-system-specific
4817                  services.  For example, reserved ports might fall
4818                  between 256 and some system-dependent upper limit.
4819                  Some systems further choose to protect well-known and
4820                  reserved ports by permitting only privileged users to
4821                  open TCP connections with those port values.  This is
4822                  perfectly reasonable as long as the host does not
4823                  assume that all hosts protect their low-numbered ports
4824                  in this manner.
4825
4826          4.2.2.2  Use of Push: RFC-793 Section 2.8
4827
4828             When an application issues a series of SEND calls without
4829             setting the PUSH flag, the TCP MAY aggregate the data
4830             internally without sending it.  Similarly, when a series of
4831             segments is received without the PSH bit, a TCP MAY queue
4832             the data internally without passing it to the receiving
4833             application.
4834
4845             The PSH bit is not a record marker and is independent of
4846             segment boundaries.  The transmitter SHOULD collapse
4847             successive PSH bits when it packetizes data, to send the
4848             largest possible segment.
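
                 The collapsing of PSH bits during packetization can be
                 illustrated with a minimal Python sketch.  It is not
                 taken from RFC-793; the function name packetize() and
                 the (data, push) interface are assumptions made only
                 for illustration.

```python
def packetize(sends, mss):
    """Split queued SEND data into segments of at most `mss` octets.

    `sends` is a list of (data, push) pairs, one per SEND call.
    Returns a list of (payload, psh) pairs, one per segment.
    """
    stream = b""
    push_offsets = []  # stream offsets at which a pushed SEND ends
    for data, push in sends:
        stream += data
        if push and data:
            push_offsets.append(len(stream))

    segments = []
    start = 0
    while start < len(stream):
        end = min(start + mss, len(stream))
        # A single PSH bit stands for every push boundary that falls
        # inside this segment; the boundaries themselves are not kept.
        psh = any(start < off <= end for off in push_offsets)
        segments.append((stream[start:end], psh))
        start = end
    return segments
```

                 In this sketch, three pushed 10-octet SEND calls with
                 an MSS of 100 yield one 30-octet segment carrying a
                 single PSH bit, rather than three small segments.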
4849
4850             A TCP MAY implement PUSH flags on SEND calls.  If PUSH flags
4851             are not implemented, then the sending TCP: (1) MUST NOT
4852             buffer data indefinitely, and (2) MUST set the PSH bit in
4853             the last buffered segment (i.e., when there is no more
4854             queued data to be sent).
4855
4856             The discussion in RFC-793 on pages 48, 50, and 74
4857             erroneously implies that a received PSH flag must be passed
4858             to the application layer.  Passing a received PSH flag to
4859             the application layer is now OPTIONAL.
4860
4861             An application program is logically required to set the PUSH
4862             flag in a SEND call whenever it needs to force delivery of
4863             the data to avoid a communication deadlock.  However, a TCP
4864             SHOULD send a maximum-sized segment whenever possible, to
4865             improve performance (see Section 4.2.3.4).
4866
4867             DISCUSSION:
4868                  When the PUSH flag is not implemented on SEND calls,
4869                  i.e., when the application/TCP interface uses a pure
4870                  streaming model, responsibility for aggregating any
4871                  tiny data fragments to form reasonably sized segments
4872                  is partially borne by the application layer.
4873
4874                  Generally, an interactive application protocol must set
4875                  the PUSH flag at least in the last SEND call in each
4876                  command or response sequence.  A bulk transfer protocol
4877                  like FTP should set the PUSH flag on the last segment
4878                  of a file or when necessary to prevent buffer deadlock.
4879
4880                  At the receiver, the PSH bit forces buffered data to be
4881                  delivered to the application (even if less than a full
4882                  buffer has been received). Conversely, the lack of a
4883                  PSH bit can be used to avoid unnecessary wakeup calls
4884                  to the application process; this can be an important
4885                  performance optimization for large timesharing hosts.
4886                  Passing the PSH bit to the receiving application allows
4887                  an analogous optimization within the application.
4888
4889          4.2.2.3  Window Size: RFC-793 Section 3.1
4890
4891             The window size MUST be treated as an unsigned number, or
4892             else large window sizes will appear like negative windows
4904             and TCP will not work.  It is RECOMMENDED that
4905             implementations reserve 32-bit fields for the send and
4906             receive window sizes in the connection record and do all
4907             window computations with 32 bits.
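
                 The effect of misreading the window field as a signed
                 quantity can be shown with a short Python sketch.  The
                 function names are hypothetical; the field offset
                 (octets 14-15 of the TCP header) follows the RFC-793
                 header layout.

```python
import struct


def parse_window(header):
    """Read the 16-bit window field as an unsigned number."""
    (window,) = struct.unpack_from("!H", header, 14)
    return window


def parse_window_signed(header):
    """Incorrect variant: reads the same field as a signed number."""
    (window,) = struct.unpack_from("!h", header, 14)
    return window


# A 20-octet header that offers a window of 40000 octets.
header = bytearray(20)
struct.pack_into("!H", header, 14, 40000)
```

                 Here parse_window(header) yields 40000, while the
                 signed variant yields a negative window for the same
                 octets.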
4908
4909             DISCUSSION:
4910                  It is known that the window field in the TCP header is
4911                  too small for high-speed, long-delay paths.
4912                  Experimental TCP options have been defined to extend
4913                  the window size; see for example [TCP:11].  In
4914                  anticipation of the adoption of such an extension, TCP
4915                  implementors should treat windows as 32 bits.
4916
4917          4.2.2.4  Urgent Pointer: RFC-793 Section 3.1
4918
4919             The second sentence is in error: the urgent pointer points
4920             to the sequence number of the LAST octet (not LAST+1) in a
4921             sequence of urgent data.  The description on page 56 (last
4922             sentence) is correct.
4923
4924             A TCP MUST support a sequence of urgent data of any length.
4925
4926             A TCP MUST inform the application layer asynchronously
4927             whenever it receives an Urgent pointer and there was
4928             previously no pending urgent data, or whenever the Urgent
4929             pointer advances in the data stream.  There MUST be a way
4930             for the application to learn how much urgent data remains to
4931             be read from the connection, or at least to determine
4932             whether or not more urgent data remains to be read.
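
                 A minimal Python sketch of the corrected urgent
                 pointer arithmetic follows.  The function names and
                 the next_read_seq parameter (the sequence number of
                 the next octet the application will read) are
                 assumptions for illustration only.

```python
SEQ_MOD = 2 ** 32


def last_urgent_seq(seg_seq, urgent_pointer):
    """Sequence number of the LAST urgent octet (not LAST+1)."""
    return (seg_seq + urgent_pointer) % SEQ_MOD


def urgent_octets_remaining(next_read_seq, last_urgent):
    """Urgent octets the application has still to read (0 if none)."""
    diff = (last_urgent - next_read_seq) % SEQ_MOD
    # A difference of 2**31 or more means last_urgent lies behind
    # next_read_seq in sequence space, so nothing urgent remains.
    return diff + 1 if diff < 2 ** 31 else 0
```

                 The modulo arithmetic lets the same functions work
                 when the sequence space wraps around.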
4933
4934             DISCUSSION:
4935                  Although the Urgent mechanism may be used for any
4936                  application, it is normally used to send "interrupt"-
4937                  type commands to a Telnet program (see "Using Telnet
4938                  Synch Sequence" section in [INTRO:1]).
4939
4940                  The asynchronous or "out-of-band" notification will
4941                  allow the application to go into "urgent mode", reading
4942                  data from the TCP connection.  This allows control
4943                  commands to be sent to an application whose normal
4944                  input buffers are full of unprocessed data.
4945
4946             IMPLEMENTATION:
4947                  The generic ERROR-REPORT() upcall described in Section
4948                  4.2.4.1 is a possible mechanism for informing the
4949                  application of the arrival of urgent data.
4950
4963          4.2.2.5  TCP Options: RFC-793 Section 3.1
4964
4965             A TCP MUST be able to receive a TCP option in any segment.
4966             A TCP MUST ignore without error any TCP option it does not
4967             implement, assuming that the option has a length field (all
4968             TCP options defined in the future will have length fields).
4969             TCP MUST be prepared to handle an illegal option length
4970             (e.g., zero) without crashing; a suggested procedure is to
4971             reset the connection and log the reason.
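
                 The option-handling rules above can be sketched in
                 Python as follows.  This is an illustrative parser,
                 not a reference implementation: the exception class
                 and function name are hypothetical, and only the MSS
                 option is treated as "implemented".  The option kinds
                 0, 1 and 2 are those defined in RFC-793.

```python
class IllegalOptionLength(ValueError):
    """Malformed option; the caller may reset the connection and log it."""


KIND_EOL = 0  # End of Option List, a single octet (RFC-793)
KIND_NOP = 1  # No-Operation, a single octet (RFC-793)
KIND_MSS = 2  # Maximum Segment Size, total length 4 (RFC-793)


def parse_options(raw):
    """Return {kind: value} for implemented options, skipping the rest."""
    found = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        # All other kinds carry a length octet that counts the kind
        # octet, the length octet, and the option data.
        if i + 1 >= len(raw):
            raise IllegalOptionLength("missing length octet")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise IllegalOptionLength(f"kind {kind}: length {length}")
        if kind == KIND_MSS:
            if length != 4:
                raise IllegalOptionLength(f"MSS option length {length}")
            found[kind] = int.from_bytes(raw[i + 2:i + 4], "big")
        # Unimplemented kinds fall through and are ignored without error.
        i += length
    return found
```

                 Checking the length before advancing is what prevents
                 a zero length from causing an endless loop.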
4972
4973          4.2.2.6  Maximum Segment Size Option: RFC-793 Section 3.1
4974
4975             TCP MUST implement both sending and receiving the Maximum
4976             Segment Size option [TCP:4].
4977
4978             TCP SHOULD send an MSS (Maximum Segment Size) option in
4979             every SYN segment when its receive MSS differs from the
4980             default 536, and MAY send it always.
4981
4982             If an MSS option is not received at connection setup, TCP
4983             MUST assume a default send MSS of 536 (576-40) [TCP:4].
4984
4985             The maximum size of a segment that TCP really sends, the
4986             "effective send MSS," MUST be the smaller of the send MSS
4987             (which reflects the available reassembly buffer size at the
4988             remote host) and the largest size permitted by the IP layer:
4989
4990                Eff.snd.MSS =
4991
4992                   min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize
4993
4994             where:
4995
4996             *    SendMSS is the MSS value received from the remote host,
4997                  or the default 536 if no MSS option is received.
4998
4999             *    MMS_S is the maximum size for a transport-layer message
5000                  that TCP may send.
5001
5002             *    TCPhdrsize is the size of the TCP header; this is
5003                  normally 20, but may be larger if TCP options are to be
5004                  sent.
5005
5006             *    IPoptionsize is the size of any IP options that TCP
5007                  will pass to the IP layer with the current message.
5008
5009
5010             The MSS value to be sent in an MSS option must be less than
5022             or equal to:
5023
5024                MMS_R - 20
5025
5026             where MMS_R is the maximum size for a transport-layer
5027             message that can be received (and reassembled).  TCP obtains
5028             MMS_R and MMS_S from the IP layer; see the generic call
5029             GET_MAXSIZES in Section 3.4.
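
                 The two MSS computations above can be written out as
                 a short Python sketch.  The function and parameter
                 names are hypothetical, and the MMS_S and MMS_R values
                 used in the usage note are assumed examples, not
                 values taken from this document.

```python
DEFAULT_SEND_MSS = 536  # 576 - 40, used when no MSS option is received


def effective_send_mss(mms_s, send_mss=None, tcp_hdr_size=20,
                       ip_option_size=0):
    """Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    if send_mss is None:
        send_mss = DEFAULT_SEND_MSS
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size


def max_mss_to_advertise(mms_r):
    """Upper bound on the value sent in an MSS option: MMS_R - 20."""
    return mms_r - 20
```

                 For example, assuming MMS_S = 1480 and no MSS option
                 received, the effective send MSS is
                 min(536+20, 1480) - 20 - 0 = 536.  With a received MSS
                 of 1460 and a 32-octet TCP header, it is
                 min(1480, 1480) - 32 - 0 = 1448.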
5030
5031             DISCUSSION:
5032                  The choice of TCP segment size has a strong effect on
5033                  performance.  Larger segments increase throughput by
5034                  amortizing header size and per-datagram processing
5035                  overhead over more data bytes; however, if the packet
5036                  is so large that it causes IP fragmentation, efficiency
5037                  drops sharply if any fragments are lost [IP:9].
5038
5039                  Some TCP implementations send an MSS option only if the
5040                  destination host is on a non-connected network.
5041                  However, in general the TCP layer may not have the
5042                  appropriate information to make this decision, so it is
5043                  preferable to leave to the IP layer the task of
5044                  determining a suitable MTU for the Internet path.  We
5045                  therefore recommend that TCP always send the option (if
5046                  not 536) and that the IP layer determine MMS_R as
5047                  specified in 3.3.3 and 3.4.  A proposed IP-layer
5048                  mechanism to measure the MTU would then modify the IP
5049                  layer without changing TCP.
5050
5051          4.2.2.7  TCP Checksum: RFC-793 Section 3.1
5052
5053             Unlike the UDP checksum (see Section 4.1.3.4), the TCP
5054             checksum is never optional.  The sender MUST generate it and
5055             the receiver MUST check it.
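
                 As an illustration, the following Python sketch
                 generates and checks a TCP checksum using the 16-bit
                 one's complement sum over the RFC-793 pseudo-header
                 and the segment.  The function names are hypothetical,
                 and IPv4 addresses are assumed to be passed as 4-octet
                 byte strings.

```python
import struct

TCP_PROTOCOL = 6  # protocol number carried in the pseudo-header


def ones_complement_sum(data):
    """16-bit one's complement sum, padding odd-length data with zero."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudo-header + segment.

    A sender calls this with the checksum field (octets 16-17) zeroed
    and stores the result there.  A receiver calls it on the segment
    as received; a result of 0 means the checksum is good.
    """
    pseudo = src_ip + dst_ip + struct.pack(
        "!BBH", 0, TCP_PROTOCOL, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

                 A receiver built on this sketch would discard any
                 segment for which tcp_checksum() returns a nonzero
                 value.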
5056
5057          4.2.2.8  TCP Connection State Diagram: RFC-793 Section 3.2,
5058             page 23
5059
5060             There are several problems with this diagram:
5061
5062             (a)  The arrow from SYN-SENT to SYN-RCVD should be labeled
5063                  with "snd SYN,ACK", to agree with the text on page 68
5064                  and with Figure 8.
5065
5066             (b)  There could be an arrow from SYN-RCVD state to LISTEN
5067                  state, conditioned on receiving a RST after a passive
5068                  open (see text page 70).
5069
5081             (c)  It is possible to go directly from FIN-WAIT-1 to the
5082                  TIME-WAIT state (see page 75 of the spec).
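
                 The three amendments (a) through (c) can be recorded
                 as entries in a transition table, sketched here in
                 Python.  The event strings are informal paraphrases
                 chosen for this illustration, and the table is
                 deliberately incomplete: it holds only the amended
                 arrows, not the full RFC-793 diagram.

```python
# (state, event) -> (next state, action); None means no segment is sent.
AMENDED_TRANSITIONS = {
    # (a) label on the SYN-SENT to SYN-RCVD arrow
    ("SYN-SENT", "rcv SYN"): ("SYN-RCVD", "snd SYN,ACK"),
    # (b) possible return to LISTEN after a passive open
    ("SYN-RCVD", "rcv RST after passive OPEN"): ("LISTEN", None),
    # (c) direct path that skips CLOSING and FIN-WAIT-2
    ("FIN-WAIT-1", "rcv FIN with our FIN ACKed"): ("TIME-WAIT", "snd ACK"),
}


def next_state(state, event):
    """Look up an amended transition; unknown pairs leave state unchanged."""
    return AMENDED_TRANSITIONS.get((state, event), (state, None))
```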
5083
5084
5085          4.2.2.9  Initial Sequence Number Selection: RFC-793 Section
5086             3.3, page 27
5087
5088             A TCP MUST use the specified clock-driven selection of
5089             initial sequence numbers.
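
                 RFC-793 binds the initial sequence number generator
                 to a (possibly fictitious) 32-bit clock whose
                 low-order bit is incremented roughly every 4
                 microseconds.  A minimal Python sketch follows; the
                 function name is hypothetical, and the use of the
                 host's monotonic clock as the time source is an
                 assumption.

```python
import time

SEQ_MOD = 2 ** 32
TICK_MICROSECONDS = 4  # RFC-793: one increment about every 4 microseconds


def clock_driven_isn(now_us=None):
    """Map a microsecond clock reading onto a 32-bit sequence number."""
    if now_us is None:
        # Assumed time source; any steadily advancing clock would do.
        now_us = time.monotonic_ns() // 1000
    return (now_us // TICK_MICROSECONDS) % SEQ_MOD
```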
5090
5091          4.2.2.10  Simultaneous Open Attempts: RFC-793 Section 3.4, page
5092             32
5093
5094             There is an error in Figure 8: the packet on line 7 should
5095             be identical to the packet on line 5.
5096
5097             A TCP MUST support simultaneous open attempts.
5098
5099             DISCUSSION:
5100                  It sometimes surprises implementors that if two
5101                  applications attempt to simultaneously connect to each
5102                  other, only one connection is generated instead of two.
5103                  This was an intentional design decision; don't try to
5104                  "fix" it.
5105
5106          4.2.2.11  Recovery from Old Duplicate SYN: RFC-793 Section 3.4,
5107             page 33
5108
5109             Note that a TCP implementation MUST keep track of whether a
5110             connection has reached SYN-RCVD state as the result of a
5111             passive OPEN or an active OPEN.
5112
5113          4.2.2.12  RST Segment: RFC-793 Section 3.4
5114
5115             A TCP SHOULD allow a received RST segment to include data.
5116
5117             DISCUSSION:
5118                  It has been suggested that a RST segment could contain
5119                  ASCII text that encoded and explained the cause of the
5120                  RST.  No standard has yet been established for such
5121                  data.
5122
5123          4.2.2.13  Closing a Connection: RFC-793 Section 3.5
5124
5125             A TCP connection may terminate in two ways: (1) the normal
5126             TCP close sequence using a FIN handshake, and (2) an "abort"
5127             in which one or more RST segments are sent and the
5128             connection state is immediately discarded.  If a TCP
5140             connection is closed by the remote site, the local
5141             application MUST be informed whether it closed normally or
5142             was aborted.
5143
5144             The normal TCP close sequence delivers buffered data
5145             reliably in both directions.  Since the two directions of a
5146             TCP connection are closed independently, it is possible for
5147             a connection to be "half closed," i.e., closed in only one
5148             direction, and a host is permitted to continue sending data
5149             in the open direction on a half-closed connection.
5150
5151             A host MAY implement a "half-duplex" TCP close sequence, so
5152             that an application that has called CLOSE cannot continue to
5153             read data from the connection.  If such a host issues a
5154             CLOSE call while received data is still pending in TCP, or
5155             if new data is received after CLOSE is called, its TCP
5156             SHOULD send a RST to show that data was lost.
5157
5158             When a connection is closed actively, it MUST linger in
5159             TIME-WAIT state for a time 2xMSL (Maximum Segment Lifetime).
5160             However, it MAY accept a new SYN from the remote TCP to
5161             reopen the connection directly from TIME-WAIT state, if it:
5162
5163             (1)  assigns its initial sequence number for the new
5164                  connection to be larger than the largest sequence
5165                  number it used on the previous connection incarnation,
5166                  and
5167
5168             (2)  returns to TIME-WAIT state if the SYN turns out to be
5169                  an old duplicate.
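
                 Condition (1) can be sketched in Python as follows.
                 The helper names and the clock_isn parameter are
                 hypothetical; the MSL of 2 minutes is the RFC-793
                 value, which gives the 2xMSL period of 4 minutes.

```python
SEQ_MOD = 2 ** 32
MSL_SECONDS = 120                    # RFC-793 Maximum Segment Lifetime
TIME_WAIT_SECONDS = 2 * MSL_SECONDS  # linger time in TIME-WAIT state


def seq_gt(a, b):
    """True if sequence number a comes after b, modulo 2**32."""
    return a != b and (a - b) % SEQ_MOD < 2 ** 31


def isn_for_reopen(largest_seq_used, clock_isn):
    """Choose an ISN beyond every sequence number of the old incarnation.

    `clock_isn` is the value the normal ISN generator would supply.
    """
    if seq_gt(clock_isn, largest_seq_used):
        return clock_isn
    return (largest_seq_used + 1) % SEQ_MOD
```

                 Comparing modulo 2**32 keeps the test meaningful when
                 the previous incarnation ended near the top of the
                 sequence space.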
5170
5171
5172             DISCUSSION:
5173                  TCP's full-duplex data-preserving close is a feature
5174                  that is not included in the analogous ISO transport
5175                  protocol TP4.
5176
5177                  Some systems have not implemented half-closed
5178                  connections, presumably because they do not fit into
5179                  the I/O model of their particular operating system.  On
5180                  these systems, once an application has called CLOSE, it
5181                  can no longer read input data from the connection; this
5182                  is referred to as a "half-duplex" TCP close sequence.
5183
5184                  The graceful close algorithm of TCP requires that the
5185                  connection state remain defined on (at least) one end
5186                  of the connection, for a timeout period of 2xMSL, i.e.,
5187                  4 minutes.  During this period, the (remote socket,
5199                  local socket) pair that defines the connection is busy
5200                  and cannot be reused.  To shorten the time that a given
5201                  port pair is tied up, some TCPs allow a new SYN to be
5202                  accepted in TIME-WAIT state.
5203
5204          4.2.2.14  Data Communication: RFC-793 Section 3.7, page 40
5205
5206             Since RFC-793 was written, there has been extensive work on
5207             TCP algorithms to achieve efficient data communication.
5208             Later sections of the present document describe required and
5209             recommended TCP algorithms to determine when to send data
5210             (Section 4.2.3.4), when to send an acknowledgment (Section
5211             4.2.3.2), and when to update the window (Section 4.2.3.3).
5212
5213             DISCUSSION:
5214                  One important performance issue is "Silly Window
5215                  Syndrome" or "SWS" [TCP:5], a stable pattern of small
5216                  incremental window movements resulting in extremely
5217                  poor TCP performance.  Algorithms to avoid SWS are
5218                  described below for both the sending side (Section
5219                  4.2.3.4) and the receiving side (Section 4.2.3.3).
5220
5221                  In brief, SWS is caused by the receiver advancing the
5222                  right window edge whenever it has any new buffer space
5223                  available to receive data and by the sender using any
5224                  incremental window, no matter how small, to send more
5225                  data [TCP:5].  The result can be a stable pattern of
5226                  sending tiny data segments, even though both sender and
5227                  receiver have a large total buffer space for the
5228                  connection.  SWS can only occur during the transmission
5229                  of a large amount of data; if the connection goes
5230                  quiescent, the problem will disappear.  It is caused by
5231                  typical straightforward implementations of window
5232                  management, but the sender and receiver algorithms
5233                  given below will avoid it.
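
                 The stable pattern of tiny segments can be reproduced
                 with a toy Python model.  It is a deliberate
                 simplification under stated assumptions: one segment
                 per round, an application that reads a fixed number of
                 octets per round, a receiver that offers every free
                 octet, and a sender that uses any window.  All names
                 are hypothetical.

```python
def simulate_naive_window(buffer_size, app_read, mss, rounds):
    """Return the size of the segment sent in each round."""
    buffered = 0  # octets held in the receive buffer
    sizes = []
    for _ in range(rounds):
        window = buffer_size - buffered  # receiver offers all free space
        segment = min(window, mss)       # sender uses any window, however small
        if segment > 0:
            sizes.append(segment)
            buffered += segment
        buffered -= min(app_read, buffered)  # application consumes data
    return sizes
```

                 With a 1000-octet buffer, an MSS of 500 and an
                 application reading 50 octets per round, the segment
                 sizes settle at 50 octets once the buffer has filled,
                 even though the total buffer space is large.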
5234
5235                  Another important TCP performance issue is that some
5236                  applications, especially remote login to character-at-
5237                  a-time hosts, tend to send streams of one-octet data
5238                  segments.  To avoid deadlocks, every TCP SEND call from
5239                  such applications must be "pushed", either explicitly
5240                  by the application or else implicitly by TCP.  The
5241                  result may be a stream of TCP segments that contain one
5242                  data octet each, which makes very inefficient use of
5243                  the Internet and contributes to Internet congestion.
5244                  The Nagle Algorithm described in Section 4.2.3.4
5245                  provides a simple and effective solution to this
5246                  problem.  It does have the effect of clumping
5258                  characters over Telnet connections; this may initially
5259                  surprise users accustomed to single-character echo, but
5260                  user acceptance has not been a problem.
5261
5262                  Note that the Nagle algorithm and the send SWS
5263                  avoidance algorithm play complementary roles in
5264                  improving performance.  The Nagle algorithm discourages
5265                  sending tiny segments when the data to be sent
5266                  increases in small increments, while the SWS avoidance
5267                  algorithm discourages small segments resulting from the
5268                  right window edge advancing in small increments.
5269
5270                  A careless implementation can send two or more
5271                  acknowledgment segments per data segment received.  For
5272                  example, suppose the receiver acknowledges every data
5273                  segment immediately.  When the application program
5274                  subsequently consumes the data and increases the
5275                  available receive buffer space again, the receiver may
5276                  send a second acknowledgment segment to update the
5277                  window at the sender.  The extreme case occurs with
5278                  single-character segments on TCP connections using the
5279                  Telnet protocol for remote login service.  Some
5280                  implementations have been observed in which each
5281                  incoming 1-character segment generates three return
5282                  segments: (1) the acknowledgment, (2) a one byte
5283                  increase in the window, and (3) the echoed character,
5284                  respectively.
5285
5286          4.2.2.15  Retransmission Timeout: RFC-793 Section 3.7, page 41
5287
5288             The algorithm suggested in RFC-793 for calculating the
5289             retransmission timeout is now known to be inadequate; see
5290             Section 4.2.3.1 below.
5291
5292             Recent work by Jacobson [TCP:7] on Internet congestion and
5293             TCP retransmission stability has produced a transmission
5294             algorithm combining "slow start" with "congestion
5295             avoidance".  A TCP MUST implement this algorithm.
5296
5297             If a retransmitted packet is identical to the original
5298             packet (which implies not only that the data boundaries have
5299             not changed, but also that the window and acknowledgment
5300             fields of the header have not changed), then the same IP
5301             Identification field MAY be used (see Section 3.2.1.5).
5302
5303             IMPLEMENTATION:
5304                  Some TCP implementors have chosen to "packetize" the
5305                  data stream, i.e., to pick segment boundaries when
5317                  segments are originally sent and to queue these
5318                  segments in a "retransmission queue" until they are
5319                  acknowledged.  Another design (which may be simpler) is
5320                  to defer packetizing until each time data is
5321                  transmitted or retransmitted, so there will be no
5322                  segment retransmission queue.
5323
5324                  In an implementation with a segment retransmission
5325                  queue, TCP performance may be enhanced by repacketizing
5326                  the segments awaiting acknowledgment when the first
5327                  retransmission timeout occurs.  That is, the
5328                  outstanding segments that fitted would be combined into
5329                  one maximum-sized segment, with a new IP Identification
5330                  value.  The TCP would then retain this combined segment
5331                  in the retransmit queue until it was acknowledged.
5332                  However, if the first two segments in the
5333                  retransmission queue totalled more than one maximum-
5334                  sized segment, the TCP would retransmit only the first
5335                  segment using the original IP Identification field.
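
                 The repacketizing step described above might look
                 like the following Python sketch.  The function name,
                 the (ip_id, payload) queue representation, and the
                 choice to combine only whole segments are assumptions
                 made for illustration.

```python
def first_timeout_retransmit(queue, mss, new_ip_id):
    """Return (segment_to_send, new_queue) at the first timeout.

    `queue` is a non-empty list of (ip_id, payload) pairs, oldest first.
    """
    first_id, first_payload = queue[0]
    if len(queue) < 2 or len(first_payload) + len(queue[1][1]) > mss:
        # Nothing can be combined: resend the first segment unchanged,
        # keeping its original IP Identification value.
        return (first_id, first_payload), list(queue)

    combined = b""
    used = 0
    for _, payload in queue:
        if len(combined) + len(payload) > mss:
            break
        combined += payload
        used += 1
    # The combined segment is a new datagram, so it gets a new IP ID
    # and replaces the segments it absorbed in the queue.
    merged = (new_ip_id, combined)
    return merged, [merged] + list(queue[used:])
```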
5336
5337          4.2.2.16  Managing the Window: RFC-793 Section 3.7, page 41
5338
5339             A TCP receiver SHOULD NOT shrink the window, i.e., move the
5340             right window edge to the left.  However, a sending TCP MUST
5341             be robust against window shrinking, which may cause the
5342             "usable window" (see Section 4.2.3.4) to become negative.
5343
5344             If this happens, the sender SHOULD NOT send new data, but
5345             SHOULD retransmit normally the old unacknowledged data
5346             between SND.UNA and SND.UNA+SND.WND.  The sender MAY also
5347             retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT
5348             time out the connection if data beyond the right window edge
5349             is not acknowledged.  If the window shrinks to zero, the TCP
5350             MUST probe it in the standard way (see next Section).
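
                 A sender's reaction to a shrunken window can be
                 sketched in Python using the usable-window formula of
                 Section 4.2.3.4, U = SND.UNA + SND.WND - SND.NXT.  The
                 function names and the returned dictionary are
                 hypothetical, and sequence-number wraparound is
                 ignored to keep the sketch short.

```python
def usable_window(snd_una, snd_wnd, snd_nxt):
    """U = SND.UNA + SND.WND - SND.NXT; negative after a shrink."""
    return snd_una + snd_wnd - snd_nxt


def sender_actions(snd_una, snd_wnd, snd_nxt):
    """Summarize what the sender should do for the current window."""
    right_edge = snd_una + snd_wnd
    return {
        # No new data while the usable window is zero or negative.
        "may_send_new_data": usable_window(snd_una, snd_wnd, snd_nxt) > 0,
        # Old data inside the offered window is retransmitted normally.
        "retransmit_range": (snd_una, min(snd_nxt, right_edge)),
        # A window of zero is probed in the standard way.
        "probe_zero_window": snd_wnd == 0,
    }
```

                 For example, with SND.UNA = 1000, SND.NXT = 1800 and
                 a window shrunk to 500, the usable window is -300 and
                 only octets 1000 through 1499 are retransmitted
                 normally.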
5351
5352             DISCUSSION:
5353                  Many TCP implementations become confused if the window
5354                  shrinks from the right after data has been sent into a
5355                  larger window.  Note that TCP has a heuristic to select
5356                  the latest window update despite possible datagram
5357                  reordering; as a result, it may ignore a window update
5358                  with a smaller window than previously offered if
5359                  neither the sequence number nor the acknowledgment
5360                  number is increased.
5361
5376          4.2.2.17  Probing Zero Windows: RFC-793 Section 3.7, page 42
5377
5378             Probing of zero (offered) windows MUST be supported.
5379
5380             A TCP MAY keep its offered receive window closed
5381             indefinitely.  As long as the receiving TCP continues to
5382             send acknowledgments in response to the probe segments, the
5383             sending TCP MUST allow the connection to stay open.
5384
5385             DISCUSSION:
5386                  It is extremely important to remember that ACK
5387                  (acknowledgment) segments that contain no data are not
5388                  reliably transmitted by TCP.  If zero window probing is
5389                  not supported, a connection may hang forever when an
5390                  ACK segment that re-opens the window is lost.
5391
5392                  The delay in opening a zero window generally occurs
5393                  when the receiving application stops taking data from
5394                  its TCP.  For example, consider a printer daemon
5395                  application, stopped because the printer ran out of
5396                  paper.
5397
5398             The transmitting host SHOULD send the first zero-window
5399             probe when a zero window has existed for the retransmission
5400             timeout period (see Section 4.2.2.15), and SHOULD increase
5401             exponentially the interval between successive probes.
5402
5403             DISCUSSION:
5404                  This procedure minimizes delay if the zero-window
5405                  condition is due to a lost ACK segment containing a
5406                  window-opening update.  Exponential backoff is
5407                  recommended, possibly with some maximum interval not
5408                  specified here.  This procedure is similar to that of
5409                  the retransmission algorithm, and it may be possible to
5410                  combine the two procedures in the implementation.
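
                 The probe timing described above can be sketched as a
                 small generator.  This is an illustration only: the
                 function name and the 60-second cap are assumptions,
                 since the text leaves the maximum interval unspecified.

```python
from itertools import islice


def zero_window_probe_intervals(rto, max_interval=60.0):
    """Yield the wait, in seconds, before each successive zero-window probe.

    The first probe is sent after one retransmission timeout (rto);
    each later interval doubles.  max_interval is a hypothetical cap.
    """
    interval = rto
    while True:
        yield min(interval, max_interval)
        interval *= 2


# First six probe intervals for an RTO of 3 seconds.
first_six = list(islice(zero_window_probe_intervals(3.0), 6))
```

                 Sharing the doubling logic with the retransmission
                 timer is one way to combine the two procedures.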
5411
5412          4.2.2.18  Passive OPEN Calls:  RFC-793 Section 3.8
5413
5414             Every passive OPEN call either creates a new connection
5415             record in LISTEN state, or it returns an error; it MUST NOT
5416             affect any previously created connection record.
5417
5418             A TCP that supports multiple concurrent users MUST provide
5419             an OPEN call that will functionally allow an application to
5420             LISTEN on a port while a connection block with the same
5421             local port is in SYN-SENT or SYN-RECEIVED state.
5422
5423             DISCUSSION:
5435                  Some applications (e.g., SMTP servers) may need to
5436                  handle multiple connection attempts at about the same
5437                  time.  The probability of a connection attempt failing
5438                  is reduced by giving the application some means of
5439                  listening for a new connection at the same time that an
5440                  earlier connection attempt is going through the three-
5441                  way handshake.
5442
5443             IMPLEMENTATION:
5444                  Acceptable implementations of concurrent opens may
5445                  permit multiple passive OPEN calls, or they may allow
5446                  "cloning" of LISTEN-state connections from a single
5447                  passive OPEN call.
5448
5449          4.2.2.19  Time to Live: RFC-793 Section 3.9, page 52
5450
5451             RFC-793 specified that TCP was to request the IP layer to
5452             send TCP segments with TTL = 60.  This is obsolete; the TTL
5453             value used to send TCP segments MUST be configurable.  See
5454             Section 3.2.1.7 for discussion.
5455
5456          4.2.2.20  Event Processing: RFC-793 Section 3.9
5457
5458             While it is not strictly required, a TCP SHOULD be capable
5459             of queueing out-of-order TCP segments.  Change the "may" in
5460             the last sentence of the first paragraph on page 70 to
5461             "should".
5462
5463             DISCUSSION:
5464                  Some small-host implementations have omitted segment
5465                  queueing because of limited buffer space.  This
5466                  omission may be expected to adversely affect TCP
5467                  throughput, since loss of a single segment causes all
5468                  later segments to appear to be "out of sequence".
5469
5470             In general, the processing of received segments MUST be
5471             implemented to aggregate ACK segments whenever possible.
5472             For example, if the TCP is processing a series of queued
5473             segments, it MUST process them all before sending any ACK
5474             segments.
5475
5476             Here are some detailed error corrections and notes on the
5477             Event Processing section of RFC-793.
5478
5479             (a)  CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK
5480                  state, not CLOSING.
5481
5482             (b)  LISTEN state, check for SYN (pp. 65, 66): With a SYN
5494                  bit, if the security/compartment or the precedence is
5495                  wrong for the segment, a reset is sent.  The wrong form
5496                  of reset is shown in the text; it should be:
5497
5498                    <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>
5499
5500
5501             (c)  SYN-SENT state, Check for SYN, p. 68: When the
5502                  connection enters ESTABLISHED state, the following
5503                  variables must be set:
5504                     SND.WND <- SEG.WND
5505                     SND.WL1 <- SEG.SEQ
5506                     SND.WL2 <- SEG.ACK
5507
5508
5509             (d)  Check security and precedence, p. 71: The first heading
5510                  "ESTABLISHED STATE" should really be a list of all
5511                  states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-
5512                  1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and
5513                  TIME-WAIT.
5514
5515             (e)  Check SYN bit, p. 71:  "In SYN-RECEIVED state and if
5516                  the connection was initiated with a passive OPEN, then
5517                  return this connection to the LISTEN state and return.
5518                  Otherwise...".
5519
5520             (f)  Check ACK field, SYN-RECEIVED state, p. 72: When the
5521                  connection enters ESTABLISHED state, the variables
5522                  listed in (c) must be set.
5523
5524             (g)  Check ACK field, ESTABLISHED state, p. 72: The ACK is a
5525                  duplicate if SEG.ACK =< SND.UNA (the = was omitted).
5526                  Similarly, the window should be updated if: SND.UNA =<
5527                  SEG.ACK =< SND.NXT.
5528
5529             (h)  USER TIMEOUT, p. 77:
5530
5531                  It would be better to notify the application of the
5532                  timeout rather than letting TCP force the connection
5533                  closed.  However, see also Section 4.2.3.5.
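
                 Correction (g) above can be illustrated with a short
                 sketch.  The helper names are hypothetical, and the
                 modulo-2**32 comparison assumes that any two sequence
                 numbers being compared lie within 2**31 of each other.

```python
SEQ_MOD = 1 << 32


def seq_leq(a, b):
    """Return True if a =< b in 32-bit sequence space (assumed convention)."""
    return ((b - a) % SEQ_MOD) < (1 << 31)


def classify_ack(seg_ack, snd_una, snd_nxt):
    """Return (is_duplicate, may_update_window) for an incoming ACK.

    is_duplicate:       SEG.ACK =< SND.UNA
    may_update_window:  SND.UNA =< SEG.ACK =< SND.NXT
    """
    is_duplicate = seq_leq(seg_ack, snd_una)
    may_update_window = seq_leq(snd_una, seg_ack) and seq_leq(seg_ack, snd_nxt)
    return is_duplicate, may_update_window
```

                 Note that an ACK with SEG.ACK equal to SND.UNA is a
                 duplicate and is still eligible to update the window.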
5534
5535
5536          4.2.2.21  Acknowledging Queued Segments: RFC-793 Section 3.9
5537
5538             A TCP MAY send an ACK segment acknowledging RCV.NXT when a
5539             valid segment arrives that is in the window but not at the
5540             left window edge.
5541
5553             DISCUSSION:
5554                  RFC-793 (see page 74) was ambiguous about whether or
5555                  not an ACK segment should be sent when an out-of-order
5556                  segment was received, i.e., when SEG.SEQ was unequal to
5557                  RCV.NXT.
5558
5559                  One reason for ACKing out-of-order segments might be to
5560                  support an experimental algorithm known as "fast
5561                  retransmit".   With this algorithm, the sender uses the
5562                  "redundant" ACK's to deduce that a segment has been
5563                  lost before the retransmission timer has expired.  It
5564                  counts the number of times an ACK has been received
5565                  with the same value of SEG.ACK and with the same right
5566                  window edge.  If more than a threshold number of such
5567                  ACK's is received, then the segment containing the
5568                  octets starting at SEG.ACK is assumed to have been lost
5569                  and is retransmitted, without awaiting a timeout.  The
5570                  threshold is chosen to compensate for the maximum
5571                  likely segment reordering in the Internet.  There is
5572                  not yet enough experience with the fast retransmit
5573                  algorithm to determine how useful it is.
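
                 The counting step of fast retransmit might look like
                 the following sketch.  The class name and the default
                 threshold of 2 are assumptions; the text does not fix
                 a threshold value.

```python
class DupAckCounter:
    """Count redundant ACKs that repeat both SEG.ACK and the right window edge."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # hypothetical default
        self.last = None            # (SEG.ACK, right window edge) last seen
        self.duplicates = 0         # repeats of self.last after its first arrival

    def on_ack(self, seg_ack, right_edge):
        """Return True once, when the duplicate count first exceeds the threshold.

        A True result means the segment starting at seg_ack is presumed
        lost and should be retransmitted without awaiting a timeout.
        """
        key = (seg_ack, right_edge)
        if key == self.last:
            self.duplicates += 1
        else:
            self.last = key
            self.duplicates = 0
        return self.duplicates == self.threshold + 1
```

                 An ACK that moves SEG.ACK or the right window edge
                 resets the count, so ordinary window updates are not
                 mistaken for loss signals.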
5574
5575       4.2.3  SPECIFIC ISSUES
5576
5577          4.2.3.1  Retransmission Timeout Calculation
5578
5579             A host TCP MUST implement Karn's algorithm and Jacobson's
5580             algorithm for computing the retransmission timeout ("RTO").
5581
5582             o    Jacobson's algorithm for computing the smoothed round-
5583                  trip time ("RTT") incorporates a simple measure of the
5584                  variance [TCP:7].
5585
5586             o    Karn's algorithm for selecting RTT measurements ensures
5587                  that ambiguous round-trip times will not corrupt the
5588                  calculation of the smoothed round-trip time [TCP:6].
5589
5590             This implementation also MUST include "exponential backoff"
5591             for successive RTO values for the same segment.
5592             Retransmission of SYN segments SHOULD use the same algorithm
5593             as data segments.
5594
5595             DISCUSSION:
5596                  There were two known problems with the RTO calculations
5597                  specified in RFC-793.  First, the accurate measurement
5598                  of RTTs is difficult when there are retransmissions.
5599                  Second, the algorithm to compute the smoothed round-
5600                  trip time is inadequate [TCP:7], because it incorrectly
5612                  assumed that the variance in RTT values would be small
5613                  and constant.  These problems were solved by Karn's and
5614                  Jacobson's algorithms, respectively.
5615
5616                  The performance increase resulting from the use of
5617                  these improvements varies from noticeable to dramatic.
5618                  Jacobson's algorithm for incorporating the measured RTT
5619                  variance is especially important on a low-speed link,
5620                  where the natural variation of packet sizes causes a
5621                  large variation in RTT.  One vendor found link
5622                  utilization on a 9.6kb line went from 10% to 90% as a
5623                  result of implementing Jacobson's variance algorithm in
5624                  TCP.
5625
5626             The following values SHOULD be used to initialize the
5627             estimation parameters for a new connection:
5628
5629             (a)  RTT = 0 seconds.
5630
5631             (b)  RTO = 3 seconds.  (The smoothed variance is to be
5632                  initialized to the value that will result in this RTO).
5633
5634             The recommended upper and lower bounds on the RTO are known
5635             to be inadequate on large internets.  The lower bound SHOULD
5636             be measured in fractions of a second (to accommodate high
5637             speed LANs) and the upper bound should be 2*MSL, i.e., 240
5638             seconds.
5639
5640             DISCUSSION:
5641                  Experience has shown that these initialization values
5642                  are reasonable, and that in any case the Karn and
5643                  Jacobson algorithms make TCP behavior reasonably
5644                  insensitive to the initial parameter choices.
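
                 A minimal sketch of an RTO estimator combining these
                 requirements follows.  Only the initial RTO of 3
                 seconds, the 240-second upper bound, the rule of
                 ignoring samples from retransmitted segments (Karn),
                 and exponential backoff come from the text.  The gains
                 1/8 and 1/4, the multiplier 4, the first-sample rule,
                 and the 0.2-second lower bound are assumptions drawn
                 from common practice.

```python
class RtoEstimator:
    ALPHA = 1 / 8  # gain for the smoothed RTT (assumed value)
    BETA = 1 / 4   # gain for the smoothed deviation (assumed value)
    K = 4          # deviation multiplier (assumed value)

    def __init__(self, min_rto=0.2, max_rto=240.0):
        self.min_rto = min_rto      # assumed; the text says "fractions of a second"
        self.max_rto = max_rto      # 2*MSL, i.e., 240 seconds
        self.srtt = 0.0             # initial RTT = 0 seconds
        self.rttvar = 3.0 / self.K  # chosen so that the initial RTO is 3 seconds
        self.rto = 3.0
        self.have_sample = False

    def on_rtt_sample(self, rtt, retransmitted):
        """Fold in one RTT measurement, in seconds."""
        if retransmitted:
            return  # Karn: a sample from a retransmitted segment is ambiguous
        if not self.have_sample:
            # First-sample rule (assumption): seed both estimates directly.
            self.srtt = rtt
            self.rttvar = rtt / 2
            self.have_sample = True
        else:
            deviation = abs(self.srtt - rtt)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + self.K * self.rttvar
        self.rto = min(max(rto, self.min_rto), self.max_rto)

    def on_timeout(self):
        """Exponential backoff for successive timeouts of the same segment."""
        self.rto = min(self.rto * 2, self.max_rto)
```

                 The backed-off RTO stays in effect until the next
                 unambiguous sample arrives and recomputes it.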
5645
5646          4.2.3.2  When to Send an ACK Segment
5647
5648             A host that is receiving a stream of TCP data segments can
5649             increase efficiency in both the Internet and the hosts by
5650             sending fewer than one ACK (acknowledgment) segment per data
5651             segment received; this is known as a "delayed ACK" [TCP:5].
5652
5653             A TCP SHOULD implement a delayed ACK, but an ACK should not
5654             be excessively delayed; in particular, the delay MUST be
5655             less than 0.5 seconds, and in a stream of full-sized
5656             segments there SHOULD be an ACK for at least every second
5657             segment.
5658
5659             DISCUSSION:
5671                  A delayed ACK gives the application an opportunity to
5672                  update the window and perhaps to send an immediate
5673                  response.  In particular, in the case of character-mode
5674                  remote login, a delayed ACK can reduce the number of
5675                  segments sent by the server by a factor of 3 (ACK,
5676                  window update, and echo character all combined in one
5677                  segment).
5678
5679                  In addition, on some large multi-user hosts, a delayed
5680                  ACK can substantially reduce protocol processing
5681                  overhead by reducing the total number of packets to be
5682                  processed [TCP:5].  However, excessive delays on ACK's
5683                  can disturb the round-trip timing and packet "clocking"
5684                  algorithms [TCP:7].
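
                 The delayed-ACK rules of this section might be
                 sketched as follows.  The class, its method names, and
                 the default delay of 0.2 seconds are assumptions; the
                 text requires only that the delay be under 0.5 seconds
                 and that at least every second full-sized segment be
                 acknowledged.

```python
class DelayedAck:
    MAX_DELAY = 0.5  # the delay MUST be less than this many seconds

    def __init__(self, delay=0.2):
        if not 0 <= delay < self.MAX_DELAY:
            raise ValueError("delay must be less than 0.5 seconds")
        self.delay = delay      # hypothetical default
        self.unacked_full = 0   # full-sized segments not yet acknowledged
        self.deadline = None    # time by which a pending ACK must be sent

    def _reset(self):
        self.unacked_full = 0
        self.deadline = None

    def on_segment(self, now, full_sized):
        """Return True if an ACK should be sent immediately."""
        if full_sized:
            self.unacked_full += 1
        if self.unacked_full >= 2:
            self._reset()
            return True
        if self.deadline is None:
            self.deadline = now + self.delay
        return False

    def on_timer(self, now):
        """Return True if the pending ACK has waited long enough."""
        if self.deadline is not None and now >= self.deadline:
            self._reset()
            return True
        return False
```

                 An outgoing data segment would normally carry the
                 pending ACK and clear the timer; that path is omitted
                 here for brevity.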
5685
5686          4.2.3.3  When to Send a Window Update
5687
5688             A TCP MUST include a SWS avoidance algorithm in the receiver
5689             [TCP:5].
5690
5691             IMPLEMENTATION:
5692                  The receiver's SWS avoidance algorithm determines when
5693                  the right window edge may be advanced; this is
5694                  customarily known as "updating the window".  This
5695                  algorithm combines with the delayed ACK algorithm (see
5696                  Section 4.2.3.2) to determine when an ACK segment
5697                  containing the current window will really be sent by
5698                  the receiver.  We use the notation of RFC-793; see
5699                  Figures 4 and 5 in that document.
5700
5701                  The solution to receiver SWS is to avoid advancing the
5702                  right window edge RCV.NXT+RCV.WND in small increments,
5703                  even if data is received from the network in small
5704                  segments.
5705
5706                  Suppose the total receive buffer space is RCV.BUFF.  At
5707                  any given moment, RCV.USER octets of this total may be
5708                  tied up with data that has been received and
5709                  acknowledged but which the user process has not yet
5710                  consumed.  When the connection is quiescent, RCV.WND =
5711                  RCV.BUFF and RCV.USER = 0.
5712
5713                  Keeping the right window edge fixed as data arrives and
5714                  is acknowledged requires that the receiver offer less
5715                  than its full buffer space, i.e., the receiver must
5716                  specify a RCV.WND that keeps RCV.NXT+RCV.WND constant
5717                  as RCV.NXT increases.  Thus, the total buffer space
5718                  RCV.BUFF is generally divided into three parts:
5719
5731                  |<------- RCV.BUFF ---------------->|
5732                       1             2            3
5733              ----|---------|------------------|------|----
5734                         RCV.NXT               ^
5735                                            (Fixed)
5736
5737              1 - RCV.USER =  data received but not yet consumed;
5738              2 - RCV.WND =   space advertised to sender;
5739              3 - Reduction = space available but not yet
5740                              advertised.
5741
5742
5743                  The suggested SWS avoidance algorithm for the receiver
5744                  is to keep RCV.NXT+RCV.WND fixed until the reduction
5745                  satisfies:
5746
5747                       RCV.BUFF - RCV.USER - RCV.WND  >=
5748
5749                              min( Fr * RCV.BUFF, Eff.snd.MSS )
5750
5751                  where Fr is a fraction whose recommended value is 1/2,
5752                  and Eff.snd.MSS is the effective send MSS for the
5753                  connection (see Section 4.2.2.6).  When the inequality
5754                  is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.
5755
5756                  Note that the general effect of this algorithm is to
5757                  advance RCV.WND in increments of Eff.snd.MSS (for
5758                  realistic receive buffers:  Eff.snd.MSS < RCV.BUFF/2).
5759                  Note also that the receiver must use its own
5760                  Eff.snd.MSS, assuming it is the same as the sender's.
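
                 The receiver-side inequality can be written out
                 directly.  The function name and argument names are
                 illustrative; the quantities are those defined above,
                 with Fr defaulting to the recommended 1/2.

```python
def receiver_window(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss, fr=0.5):
    """Return the RCV.WND to advertise under receiver SWS avoidance.

    rcv_wnd is the currently advertised window, already reduced by any
    data that has arrived (so that RCV.NXT+RCV.WND has stayed fixed).
    """
    reduction = rcv_buff - rcv_user - rcv_wnd
    if reduction >= min(fr * rcv_buff, eff_snd_mss):
        return rcv_buff - rcv_user  # open the window to all free space
    return rcv_wnd                  # keep the right window edge fixed
```

                 For example, with an 8192-octet buffer and an
                 effective send MSS of 1460, freeing 500 octets leaves
                 the advertised window unchanged, while freeing 2192
                 octets re-opens it fully.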
5761
5762          4.2.3.4  When to Send Data
5763
5764             A TCP MUST include a SWS avoidance algorithm in the sender.
5765
5766             A TCP SHOULD implement the Nagle Algorithm [TCP:9] to
5767             coalesce short segments.  However, there MUST be a way for
5768             an application to disable the Nagle algorithm on an
5769             individual connection.  In all cases, sending data is also
5770             subject to the limitation imposed by the Slow Start
5771             algorithm (Section 4.2.2.15).
5772
5773             DISCUSSION:
5774                  The Nagle algorithm is generally as follows:
5775
5776                       If there is unacknowledged data (i.e., SND.NXT >
5777                       SND.UNA), then the sending TCP buffers all user
5789                       data (regardless of the PSH bit), until the
5790                       outstanding data has been acknowledged or until
5791                       the TCP can send a full-sized segment (Eff.snd.MSS
5792                       bytes; see Section 4.2.2.6).
5793
5794                  Some applications (e.g., real-time display window
5795                  updates) require that the Nagle algorithm be turned
5796                  off, so small data segments can be streamed out at the
5797                  maximum rate.
5798
5799             IMPLEMENTATION:
5800                  The sender's SWS avoidance algorithm is more difficult
5801                  than the receiver's, because the sender does not know
5802                  (directly) the receiver's total buffer space RCV.BUFF.
5803                  An approach which has been found to work well is for
5804                  the sender to calculate Max(SND.WND), the maximum send
5805                  window it has seen so far on the connection, and to use
5806                  this value as an estimate of RCV.BUFF.  Unfortunately,
5807                  this can only be an estimate; the receiver may at any
5808                  time reduce the size of RCV.BUFF.  To avoid a resulting
5809                  deadlock, it is necessary to have a timeout to force
5810                  transmission of data, overriding the SWS avoidance
5811                  algorithm.  In practice, this timeout should seldom
5812                  occur.
5813
5814                  The "useable window" [TCP:5] is:
5815
5816                       U = SND.UNA + SND.WND - SND.NXT
5817
5818                  i.e., the offered window less the amount of data sent
5819                  but not acknowledged.  If D is the amount of data
5820                  queued in the sending TCP but not yet sent, then the
5821                  following set of rules is recommended.
5822
5823                  Send data:
5824
5825                  (1)  if a maximum-sized segment can be sent, i.e., if:
5826
5827                            min(D,U) >= Eff.snd.MSS;
5828
5829
5830                  (2)  or if the data is pushed and all queued data can
5831                       be sent now, i.e., if:
5832
5833                           [SND.NXT = SND.UNA and] PUSHED and D <= U
5834
5835                       (the bracketed condition is imposed by the Nagle
5836                       algorithm);
5837
5848                  (3)  or if at least a fraction Fs of the maximum window
5849                       can be sent, i.e., if:
5850
5851                           [SND.NXT = SND.UNA and]
5852
5853                                   min(D,U) >= Fs * Max(SND.WND);
5854
5855
5856                  (4)  or if data is PUSHed and the override timeout
5857                       occurs.
5858
5859                  Here Fs is a fraction whose recommended value is 1/2.
5860                  The override timeout should be in the range 0.1 - 1.0
5861                  seconds.  It may be convenient to combine this timer
5862                  with the timer used to probe zero windows (Section
5863                  4.2.2.17).
5864
5865                  Finally, note that the SWS avoidance algorithm just
5866                  specified is to be used instead of the sender-side
5867                  algorithm contained in [TCP:5].
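
                 Rules (1) through (4) can be collected into a single
                 predicate, sketched below.  The function and parameter
                 names are assumptions, sequence-number wraparound is
                 ignored for clarity, and the nagle flag stands for the
                 per-connection switch that disables the bracketed
                 condition.

```python
def should_send(d, snd_una, snd_nxt, snd_wnd, max_snd_wnd, eff_snd_mss,
                pushed, nagle=True, override_timeout_expired=False, fs=0.5):
    """Decide whether the sender may transmit now under SWS avoidance.

    d is the amount of data queued but not yet sent; max_snd_wnd is
    Max(SND.WND), the largest send window seen so far.
    """
    if d <= 0:
        return False  # nothing queued
    u = snd_una + snd_wnd - snd_nxt  # usable window
    # The bracketed Nagle condition: no unacknowledged data outstanding.
    nagle_ok = (snd_nxt == snd_una) or not nagle
    if min(d, u) >= eff_snd_mss:                   # rule (1)
        return True
    if nagle_ok and pushed and d <= u:             # rule (2)
        return True
    if nagle_ok and min(d, u) >= fs * max_snd_wnd: # rule (3)
        return True
    if pushed and override_timeout_expired:        # rule (4)
        return True
    return False
```

                 With Nagle enabled, a 100-octet pushed write is held
                 while earlier data is unacknowledged; with Nagle
                 disabled, the same write is sent at once under rule
                 (2).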
5868
5869          4.2.3.5  TCP Connection Failures
5870
5871             Excessive retransmission of the same segment by TCP
5872             indicates some failure of the remote host or the Internet
5873             path.  This failure may be of short or long duration.  The
5874             following procedure MUST be used to handle excessive
5875             retransmissions of data segments [IP:11]:
5876
5877             (a)  There are two thresholds R1 and R2 measuring the amount
5878                  of retransmission that has occurred for the same
5879                  segment.  R1 and R2 might be measured in time units or
5880                  as a count of retransmissions.
5881
5882             (b)  When the number of transmissions of the same segment
5883                  reaches or exceeds threshold R1, pass negative advice
5884                  (see Section 3.3.1.4) to the IP layer, to trigger
5885                  dead-gateway diagnosis.
5886
5887             (c)  When the number of transmissions of the same segment
5888                  reaches a threshold R2 greater than R1, close the
5889                  connection.
5890
5891             (d)  An application MUST be able to set the value for R2 for
5892                  a particular connection.  For example, an interactive
5893                  application might set R2 to "infinity," giving the user
5894                  control over when to disconnect.
5895
5907             (e)  TCP SHOULD inform the application of the delivery
5908                  problem (unless such information has been disabled by
5909                  the application; see Section 4.2.4.1), when R1 is
5910                  reached and before R2.  This will allow a remote login
5911                  (User Telnet) application program to inform the user,
5912                  for example.
5913
5914             The value of R1 SHOULD correspond to at least 3
5915             retransmissions, at the current RTO.  The value of R2 SHOULD
5916             correspond to at least 100 seconds.
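
                 The threshold procedure above might be sketched as
                 follows, assuming R1 is kept as a retransmission count
                 and R2 as elapsed time.  The function name and the
                 returned labels are illustrative; the defaults are the
                 minimum values suggested in the text.

```python
import math


def retransmission_action(retransmits, elapsed, r1=3, r2=100.0):
    """Classify the state of a segment that is being retransmitted.

    retransmits: retransmissions of this segment so far (compared to R1)
    elapsed:     seconds since its first transmission (compared to R2)
    """
    if elapsed >= r2:
        return "close"  # R2 reached: close the connection
    if retransmits >= r1:
        # R1 reached: pass negative advice to IP, inform the application
        return "advise_ip_and_notify_app"
    return "continue"


# An interactive application may set R2 to "infinity".
still_open = retransmission_action(50, 1.0e6, r2=math.inf)
```

                 Keeping R2 as a per-connection parameter is what lets
                 the application set it.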
5917
5918             An attempt to open a TCP connection could fail with
5919             excessive retransmissions of the SYN segment or by receipt
5920             of a RST segment or an ICMP Port Unreachable.  SYN
5921             retransmissions MUST be handled in the general way just
5922             described for data retransmissions, including notification
5923             of the application layer.
5924
5925             However, the values of R1 and R2 may be different for SYN
5926             and data segments.  In particular, R2 for a SYN segment MUST
5927             be set large enough to provide retransmission of the segment
5928             for at least 3 minutes.  The application can close the
5929             connection (i.e., give up on the open attempt) sooner, of
5930             course.
5931
5932             DISCUSSION:
5933                  Some Internet paths have significant setup times, and
5934                  the number of such paths is likely to increase in the
5935                  future.
5936
5937          4.2.3.6  TCP Keep-Alives
5938
5939             Implementors MAY include "keep-alives" in their TCP
5940             implementations, although this practice is not universally
5941             accepted.  If keep-alives are included, the application MUST
5942             be able to turn them on or off for each TCP connection, and
5943             they MUST default to off.
5944
5945             Keep-alive packets MUST only be sent when no data or
5946             acknowledgment packets have been received for the
5947             connection within an interval.  This interval MUST be
5948             configurable and MUST default to no less than two hours.
5949
5950             It is extremely important to remember that ACK segments that
5951             contain no data are not reliably transmitted by TCP.
5952             Consequently, if a keep-alive mechanism is implemented it
5953             MUST NOT interpret failure to respond to any specific probe
5954             as a dead connection.
5955
5966             An implementation SHOULD send a keep-alive segment with no
5967             data; however, it MAY be configurable to send a keep-alive
5968             segment containing one garbage octet, for compatibility with
5969             erroneous TCP implementations.
5970
5971             DISCUSSION:
5972                  A "keep-alive" mechanism periodically probes the other
5973                  end of a connection when the connection is otherwise
5974                  idle, even when there is no data to be sent.  The TCP
5975                  specification does not include a keep-alive mechanism
5976                  because it could:  (1) cause perfectly good connections
5977                  to break during transient Internet failures; (2)
5978                  consume unnecessary bandwidth ("if no one is using the
5979                  connection, who cares if it is still good?"); and (3)
5980                  cost money for an Internet path that charges for
5981                  packets.
5982
5983                  Some TCP implementations, however, have included a
5984                  keep-alive mechanism.  To confirm that an idle
5985                  connection is still active, these implementations send
5986                  a probe segment designed to elicit a response from the
5987                  peer TCP.  Such a segment generally contains SEG.SEQ =
5988                  SND.NXT-1 and may or may not contain one garbage octet
5989                  of data.  Note that on a quiet connection SND.NXT =
5990                  RCV.NXT, so that this SEG.SEQ will be outside the
5991                  window.  Therefore, the probe causes the receiver to
5992                  return an acknowledgment segment, confirming that the
5993                  connection is still live.  If the peer has dropped the
5994                  connection due to a network partition or a crash, it
5995                  will respond with a RST instead of an acknowledgment
5996                  segment.
5997
5998                  Unfortunately, some misbehaved TCP implementations fail
5999                  to respond to a segment with SEG.SEQ = SND.NXT-1 unless
6000                  the segment contains data.  Alternatively, an
6001                  implementation could determine whether a peer responded
6002                  correctly to keep-alive packets with no garbage data
6003                  octet.
6004
6005                  A TCP keep-alive mechanism should only be invoked in
6006                  server applications that might otherwise hang
6007                  indefinitely and consume resources unnecessarily if a
6008                  client crashes or aborts a connection during a network
6009                  failure.
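
                 The keep-alive requirements can be sketched as a small
                 state machine.  The class, the 75-second probe spacing,
                 and the limit of five unanswered probes are
                 assumptions; the text requires only that keep-alives
                 default to off, that the idle interval default to no
                 less than two hours, and that no single unanswered
                 probe be treated as a dead connection.

```python
SEQ_MOD = 1 << 32


def probe_seq(snd_nxt):
    """Sequence number for a keep-alive probe: SEG.SEQ = SND.NXT-1."""
    return (snd_nxt - 1) % SEQ_MOD


class KeepAlive:
    def __init__(self, enabled=False, idle_interval=7200.0,
                 probe_interval=75.0, max_unanswered=5):
        self.enabled = enabled                # MUST default to off
        self.idle_interval = idle_interval    # default: two hours
        self.probe_interval = probe_interval  # hypothetical probe spacing
        self.max_unanswered = max_unanswered  # hypothetical; must exceed 1
        self.last_heard = 0.0
        self.last_probe = None
        self.unanswered = 0

    def on_segment_received(self, now):
        """Any data or ACK from the peer restarts the idle interval."""
        self.last_heard = now
        self.last_probe = None
        self.unanswered = 0

    def poll(self, now):
        """Return "off", "wait", "probe", or "dead"."""
        if not self.enabled:
            return "off"
        if now - self.last_heard < self.idle_interval:
            return "wait"
        if self.last_probe is not None and now - self.last_probe < self.probe_interval:
            return "wait"
        if self.unanswered >= self.max_unanswered:
            return "dead"
        self.last_probe = now
        self.unanswered += 1
        return "probe"
```

                 A connection is reported dead only after several
                 consecutive probes, each given a full probe interval,
                 have gone unanswered.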
6010
6025          4.2.3.7  TCP Multihoming
6026
6027             If an application on a multihomed host does not specify the
6028             local IP address when actively opening a TCP connection,
6029             then the TCP MUST ask the IP layer to select a local IP
6030             address before sending the (first) SYN.  See the function
6031             GET_SRCADDR() in Section 3.4.
6032
6033             At all other times, a previous segment has either been sent
6034             or received on this connection, and TCP MUST use the same
6035             local address that was used in those previous
6036             segments.
6037
6038          4.2.3.8  IP Options
6039
6040             When received options are passed up to TCP from the IP
6041             layer, TCP MUST ignore options that it does not understand.
6042
6043             A TCP MAY support the Time Stamp and Record Route options.
6044
6045             An application MUST be able to specify a source route when
6046             it actively opens a TCP connection, and this MUST take
6047             precedence over a source route received in a datagram.
6048
6049             When a TCP connection is OPENed passively and a packet
6050             arrives with a completed IP Source Route option (containing
6051             a return route), TCP MUST save the return route and use it
6052             for all segments sent on this connection.  If a different
6053             source route arrives in a later segment, the later
6054             definition SHOULD override the earlier one.
6055
6056          4.2.3.9  ICMP Messages
6057
6058             TCP MUST act on an ICMP error message passed up from the IP
6059             layer, directing it to the connection that created the
6060             error.  The necessary demultiplexing information can be
6061             found in the IP header contained within the ICMP message.
6062
6063             o    Source Quench
6064
6065                  TCP MUST react to a Source Quench by slowing
6066                  transmission on the connection.  The RECOMMENDED
6067                  procedure is for a Source Quench to trigger a "slow
6068                  start," as if a retransmission timeout had occurred.
6069
6070             o    Destination Unreachable -- codes 0, 1, 5
6071
6072                  Since these Unreachable messages indicate soft error
6084                  conditions, TCP MUST NOT abort the connection, and it
6085                  SHOULD make the information available to the
6086                  application.
6087
6088                  DISCUSSION:
6089                       TCP could report the soft error condition directly
6090                       to the application layer with an upcall to the
6091                       ERROR_REPORT routine, or it could merely note the
6092                       message and report it to the application only when
6093                       and if the TCP connection times out.
6094
6095             o    Destination Unreachable -- codes 2-4
6096
6097                  These are hard error conditions, so TCP SHOULD abort
6098                  the connection.
6099
6100             o    Time Exceeded -- codes 0, 1
6101
6102                  This should be handled the same way as Destination
6103                  Unreachable codes 0, 1, 5 (see above).
6104
6105             o    Parameter Problem
6106
6107                  This should be handled the same way as Destination
6108                  Unreachable codes 0, 1, 5 (see above).
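
                 The dispositions listed above can be summarized as a
                 small lookup.  The following is a minimal sketch only:
                 the Action names and the classify_icmp() function are
                 hypothetical, and only the ICMP type numbers are taken
                 from RFC-792.

```python
from enum import Enum


class Action(Enum):
    SLOW_START = "slow start"          # Source Quench
    REPORT_SOFT = "report soft error"  # do not abort; inform the application
    ABORT = "abort connection"         # hard error


# ICMP type numbers as assigned in RFC-792.
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
PARAMETER_PROBLEM = 12


def classify_icmp(icmp_type, code):
    """Return the Action suggested by the list above, or None if not covered."""
    if icmp_type == SOURCE_QUENCH:
        return Action.SLOW_START
    if icmp_type == DEST_UNREACHABLE:
        if code in (0, 1, 5):
            return Action.REPORT_SOFT
        if code in (2, 3, 4):
            return Action.ABORT
        return None
    if icmp_type == TIME_EXCEEDED and code in (0, 1):
        return Action.REPORT_SOFT
    if icmp_type == PARAMETER_PROBLEM:
        return Action.REPORT_SOFT
    return None
```

                 A real TCP would first use the IP header carried inside
                 the ICMP message to find the connection, and only then
                 apply the resulting action to that connection.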
6109
6110
6111          4.2.3.10  Remote Address Validation
6112
6113             A TCP implementation MUST reject as an error a local OPEN
6114             call for an invalid remote IP address (e.g., a broadcast or
6115             multicast address).
6116
6117             An incoming SYN with an invalid source address must be
6118             ignored either by TCP or by the IP layer (see Section
6119             3.2.1.3).
6120
6121             A TCP implementation MUST silently discard an incoming SYN
6122             segment that is addressed to a broadcast or multicast
6123             address.
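
                 One way to express the address test is sketched below,
                 using Python's standard ipaddress module.  The function
                 name and the local_networks parameter are assumptions
                 made for illustration; how a host learns its attached
                 networks is outside this sketch.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_invalid_remote(addr, local_networks=()):
    """Return True if addr is a multicast or broadcast IPv4 address.

    local_networks lists directly attached networks (e.g. "192.0.2.0/24")
    whose directed-broadcast addresses should also be rejected.
    """
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast or ip == LIMITED_BROADCAST:
        return True
    return any(
        ip == ipaddress.IPv4Network(net).broadcast_address
        for net in local_networks
    )
```

                 The same predicate could serve both cases above: an OPEN
                 call would fail with an error when it returns True for
                 the remote address, and an incoming SYN would be
                 silently discarded when it returns True for the
                 destination address.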
6124
6125          4.2.3.11  TCP Traffic Patterns
6126
6127             IMPLEMENTATION:
6128                  The TCP protocol specification [TCP:1] gives the
6129                  implementor much freedom in designing the algorithms
6130                  that control the message flow over the connection --
6131                  packetizing, managing the window, sending
6143                  acknowledgments, etc.  These design decisions are
6144                  difficult because a TCP must adapt to a wide range of
6145                  traffic patterns.  Experience has shown that a TCP
6146                  implementor needs to verify the design on two extreme
6147                  traffic patterns:
6148
6149                  o    Single-character Segments
6150
6151                       Even if the sender is using the Nagle Algorithm,
6152                       when a TCP connection carries remote login traffic
6153                       across a low-delay LAN the receiver will generally
6154                       get a stream of single-character segments.  If
6155                       remote terminal echo mode is in effect, the
6156                       receiver's system will generally echo each
6157                       character as it is received.
6158
6159                  o    Bulk Transfer
6160
6161                       When TCP is used for bulk transfer, the data
6162                       stream should be made up (almost) entirely of
6163                       segments of the size of the effective MSS.
6164                       Although TCP uses a sequence number space with
6165                       byte (octet) granularity, in bulk-transfer mode
6166                       its operation should be as if TCP used a sequence
6167                       space that counted only segments.
6168
6169                  Experience has furthermore shown that a single TCP can
6170                  effectively and efficiently handle these two extremes.
6171
6172                  The most important tool for verifying a new TCP
6173                  implementation is a packet trace program.  There is a
6174                  large volume of experience showing the importance of
6175                  tracing a variety of traffic patterns with other TCP
6176                  implementations and studying the results carefully.
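
                 When studying a trace of a bulk transfer, the property
                 described above can be checked mechanically.  The
                 helpers below are a hypothetical sketch: they assume the
                 trace has already been reduced to a list of segment
                 payload lengths and that the effective MSS is known.

```python
def segment_stream(data, eff_mss):
    """Split data into segments of eff_mss octets; only the last may be short."""
    if eff_mss <= 0:
        raise ValueError("effective MSS must be positive")
    return [data[i:i + eff_mss] for i in range(0, len(data), eff_mss)]


def looks_like_bulk(segment_lengths, eff_mss):
    """Return True if every segment except possibly the last is full-sized."""
    return all(n == eff_mss for n in segment_lengths[:-1])
```

                 A stream of single-character segments fails the second
                 check, which is expected for the remote login pattern
                 rather than a sign of a fault.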
6177
6178
6179          4.2.3.12  Efficiency
6180
6181             IMPLEMENTATION:
6182                  Extensive experience has led to the following
6183                  suggestions for efficient implementation of TCP:
6184
6185                  (a)  Don't Copy Data
6186
6187                       In bulk data transfer, the primary CPU-intensive
6188                       tasks are copying data from one place to another
6189                       and checksumming the data.  It is vital to
6190                       minimize the number of copies of TCP data.  Since
6202                       the ultimate speed limitation may be fetching data
6203                       across the memory bus, it may be useful to combine
6204                       the copy with checksumming, doing both with a
6205                       single memory fetch.
6206
6207                  (b)  Hand-Craft the Checksum Routine
6208
6209                       A good TCP checksumming routine is typically two
6210                       to five times faster than a simple and direct
6211                       implementation of the definition.  Great care and
6212                       clever coding are often required and advisable to
6213                       make the checksumming code "blazing fast".  See
6214                       [TCP:10].
6215
6216                  (c)  Code for the Common Case
6217
6218                       TCP protocol processing can be complicated, but
6219                       for most segments there are only a few simple
6220                       decisions to be made.  Per-segment processing will
6221                       be greatly speeded up by coding the main line to
6222                       minimize the number of decisions in the most
6223                       common case.
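
                 Suggestion (a) can be illustrated by a routine that
                 copies and checksums in one pass.  This is a sketch of
                 the structure only; the function name is hypothetical,
                 and an interpreted loop like this one shows the
                 single-pass idea rather than the hand-tuned speed that
                 suggestion (b) calls for.

```python
def copy_and_checksum(src, dst):
    """Copy src into dst and return the 16-bit Internet checksum of src.

    src is a bytes-like object; dst is a writable buffer (e.g. bytearray)
    at least as long as src.  Each octet is fetched once and used for both
    the copy and the one's-complement sum.
    """
    n = len(src)
    if len(dst) < n:
        raise ValueError("destination buffer too small")
    total = 0
    i = 0
    while i + 1 < n:
        hi, lo = src[i], src[i + 1]
        dst[i] = hi
        dst[i + 1] = lo
        total += (hi << 8) | lo
        i += 2
    if i < n:
        # Odd length: pad the final octet with a zero octet on the right.
        dst[i] = src[i]
        total += src[i] << 8
    # Fold the carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

                 For the eight octets 00 01 f2 03 f4 f5 f6 f7 the
                 function returns 0x220d and leaves an identical copy in
                 the destination buffer.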
6224
6225
6226       4.2.4  TCP/APPLICATION LAYER INTERFACE
6227
6228          4.2.4.1  Asynchronous Reports
6229
6230             There MUST be a mechanism for reporting soft TCP error
6231             conditions to the application.  Generically, we assume this
6232             takes the form of an application-supplied ERROR_REPORT
6233             routine that may be upcalled [INTRO:7] asynchronously from
6234             the transport layer:
6235
6236                ERROR_REPORT(local connection name, reason, subreason)
6237
6238             The precise encoding of the reason and subreason parameters
6239             is not specified here.  However, the conditions that are
6240             reported asynchronously to the application MUST include:
6241
6242             *    ICMP error message arrived (see 4.2.3.9)
6243
6244             *    Excessive retransmissions (see 4.2.3.5)
6245
6246             *    Urgent pointer advance (see 4.2.2.4).
6247
6248             However, an application program that does not want to
6249             receive such ERROR_REPORT calls SHOULD be able to
6261             effectively disable these calls.
6262
6263             DISCUSSION:
6264                  These error reports generally reflect soft errors that
6265                  can be ignored without harm by many applications.  It
6266                  has been suggested that these error report calls should
6267                  default to "disabled," but this is not required.
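
                 The upcall arrangement can be sketched as an optional
                 callback held per connection.  Everything below is
                 hypothetical (the Connection class, its attribute names,
                 and the reason strings); the encoding of reason and
                 subreason is deliberately left open by the text above.

```python
class Connection:
    """Toy connection object that delivers asynchronous soft-error reports."""

    def __init__(self, name, error_report=None):
        self.name = name
        # None models an application that has disabled ERROR_REPORT calls.
        self.error_report = error_report

    def soft_error(self, reason, subreason):
        """Upcall the application's routine, if any; return True if called."""
        if self.error_report is None:
            return False
        self.error_report(self.name, reason, subreason)
        return True
```

                 An application that wants reports passes a routine
                 taking (local connection name, reason, subreason); one
                 that does not simply leaves the callback unset, which
                 also makes "disabled" the default in this sketch.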
6268
6269          4.2.4.2  Type-of-Service
6270
6271             The application layer MUST be able to specify the Type-of-
6272             Service (TOS) for segments that are sent on a connection.
6273             It is not required, but the application SHOULD be able to
6274             change the TOS during the connection lifetime.  TCP SHOULD
6275             pass the current TOS value without change to the IP layer,
6276             when it sends segments on the connection.
6277
6278             The TOS will be specified independently in each direction on
6279             the connection, so that the receiver application will
6280             specify the TOS used for ACK segments.
6281
6282             TCP MAY pass the most recently received TOS up to the
6283             application.
6284
6285             DISCUSSION:
6286                  Some applications (e.g., SMTP) change the nature of
6287                  their communication during the lifetime of a
6288                  connection, and therefore would like to change the TOS
6289                  specification.
6290
6291                  Note also that the OPEN call specified in RFC-793
6292                  includes a parameter ("options") in which the caller
6293                  can specify IP options such as source route, record
6294                  route, or timestamp.
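
                 On systems with a Berkeley-style socket interface, the
                 ability to set and later change the TOS is commonly
                 exposed through the IP_TOS socket option.  The sketch
                 below uses Python's socket module and assumes a platform
                 on which socket.IP_TOS is defined; the helper names are
                 hypothetical.

```python
import socket


def set_tos(sock, tos):
    """Set the TOS octet used for segments sent on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)


def get_tos(sock):
    """Return the TOS value currently configured for sending."""
    return sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
```

                 Calling set_tos() again during the life of a connection
                 corresponds to changing the TOS as described above.
                 Note that get_tos() reports the locally configured
                 sending value; it is not the "most recently received
                 TOS" that TCP MAY pass up to the application.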
6295
6296          4.2.4.3  Flush Call
6297
6298             Some TCP implementations have included a FLUSH call, which
6299             will empty the TCP send queue of any data for which the user
6300             has issued SEND calls but which is still to the right of the
6301             current send window.  That is, it flushes as much queued
6302             send data as possible without losing sequence number
6303             synchronization.  This is useful for implementing the "abort
6304             output" function of Telnet.
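
                 The effect of such a call can be modeled with a toy send
                 queue.  This is a sketch under simplifying assumptions:
                 the SendQueue class is hypothetical, the queue is
                 indexed by offset from the oldest unacknowledged octet,
                 and everything inside the send window is treated as
                 possibly already transmitted and therefore kept.

```python
class SendQueue:
    """Toy model of a TCP send queue supporting a FLUSH operation."""

    def __init__(self, snd_wnd):
        self.snd_wnd = snd_wnd     # current send window, in octets
        self.data = bytearray()    # index 0 is the oldest unacknowledged octet

    def send(self, payload):
        """Queue data from a SEND call."""
        self.data += payload

    def flush(self):
        """Discard queued data to the right of the send window.

        Returns the number of octets discarded.
        """
        keep = min(len(self.data), self.snd_wnd)
        dropped = len(self.data) - keep
        del self.data[keep:]
        return dropped
```

                 Because only octets that cannot yet have been sent are
                 removed, the sequence numbers already in use are
                 unaffected and the two ends stay synchronized.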
6305
6320          4.2.4.4  Multihoming
6321
6322             The user interface outlined in sections 2.7 and 3.8 of RFC-
6323             793 needs to be extended for multihoming.  The OPEN call
6324             MUST have an optional parameter:
6325
6326                 OPEN( ... [local IP address,] ... )
6327
6328             to allow the specification of the local IP address.
6329
6330             DISCUSSION:
6331                  Some TCP-based applications need to specify the local
6332                  IP address to be used to open a particular connection;
6333                  FTP is an example.
6334
6335             IMPLEMENTATION:
6336                  A passive OPEN call with a specified "local IP address"
6337                  parameter will await an incoming connection request to
6338                  that address.  If the parameter is unspecified, a
6339                  passive OPEN will await an incoming connection request
6340                  to any local IP address, and then bind the local IP
6341                  address of the connection to the particular address
6342                  that is used.
6343
6344                  For an active OPEN call, a specified "local IP address"
6345                  parameter will be used for opening the connection.  If
6346                  the parameter is unspecified, the networking software
6347                  will choose an appropriate local IP address (see
6348                  Section 3.3.4.2) for the connection.
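
                 In a Berkeley-style socket interface the optional local
                 IP address corresponds to the address given to bind().
                 The wrappers below are a hypothetical sketch using
                 Python's socket module; passive_open() and active_open()
                 are illustrative names, not calls defined by RFC-793.

```python
import socket


def passive_open(port, local_ip=""):
    """Listen on port; an empty local_ip accepts on any local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((local_ip, port))
    s.listen(1)
    return s


def active_open(remote, local_ip=None):
    """Connect to remote (host, port), optionally from a chosen local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if local_ip is not None:
        # Port 0 lets the system pick the local port; only the address is fixed.
        s.bind((local_ip, 0))
    s.connect(remote)
    return s
```

                 With an unspecified address on the passive side, the
                 local address of each accepted connection can be read
                 back with getsockname(); it is the particular address
                 to which the incoming request was sent.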
6349
6350       4.2.5  TCP REQUIREMENT SUMMARY
6351
6352                                                  |        | | | |S| |
6353                                                  |        | | | |H| |F
6354                                                  |        | | | |O|M|o
6355                                                  |        | |S| |U|U|o
6356                                                  |        | |H| |L|S|t
6357                                                  |        |M|O| |D|T|n
6358                                                  |        |U|U|M| | |o
6359                                                  |        |S|L|A|N|N|t
6360                                                  |        |T|D|Y|O|O|t
6361 FEATURE                                          |SECTION | | | |T|T|e
6362 -------------------------------------------------|--------|-|-|-|-|-|--
6363                                                  |        | | | | | |
6364 Push flag                                        |        | | | | | |
6365   Aggregate or queue un-pushed data              |4.2.2.2 | | |x| | |
6366   Sender collapse successive PSH flags           |4.2.2.2 | |x| | | |
6367   SEND call can specify PUSH                     |4.2.2.2 | | |x| | |
6379     If cannot: sender buffer indefinitely        |4.2.2.2 | | | | |x|
6380     If cannot: PSH last segment                  |4.2.2.2 |x| | | | |
6381   Notify receiving ALP of PSH                    |4.2.2.2 | | |x| | |1
6382   Send max size segment when possible            |4.2.2.2 | |x| | | |
6383                                                  |        | | | | | |
6384 Window                                           |        | | | | | |
6385   Treat as unsigned number                       |4.2.2.3 |x| | | | |
6386   Handle as 32-bit number                        |4.2.2.3 | |x| | | |
6387   Shrink window from right                       |4.2.2.16| | | |x| |
6388   Robust against shrinking window                |4.2.2.16|x| | | | |
6389   Receiver's window closed indefinitely          |4.2.2.17| | |x| | |
6390   Sender probe zero window                       |4.2.2.17|x| | | | |
6391     First probe after RTO                        |4.2.2.17| |x| | | |
6392     Exponential backoff                          |4.2.2.17| |x| | | |
6393   Allow window stay zero indefinitely            |4.2.2.17|x| | | | |
6394   Sender timeout OK conn with zero wind          |4.2.2.17| | | | |x|
6395                                                  |        | | | | | |
6396 Urgent Data                                      |        | | | | | |
6397   Pointer points to last octet                   |4.2.2.4 |x| | | | |
6398   Arbitrary length urgent data sequence          |4.2.2.4 |x| | | | |
6399   Inform ALP asynchronously of urgent data       |4.2.2.4 |x| | | | |1
6400   ALP can learn if/how much urgent data Q'd      |4.2.2.4 |x| | | | |1
6401                                                  |        | | | | | |
6402 TCP Options                                      |        | | | | | |
6403   Receive TCP option in any segment              |4.2.2.5 |x| | | | |
6404   Ignore unsupported options                     |4.2.2.5 |x| | | | |
6405   Cope with illegal option length                |4.2.2.5 |x| | | | |
6406   Implement sending & receiving MSS option       |4.2.2.6 |x| | | | |
6407   Send MSS option unless 536                     |4.2.2.6 | |x| | | |
6408   Send MSS option always                         |4.2.2.6 | | |x| | |
6409   Send-MSS default is 536                        |4.2.2.6 |x| | | | |
6410   Calculate effective send seg size              |4.2.2.6 |x| | | | |
6411                                                  |        | | | | | |
6412 TCP Checksums                                    |        | | | | | |
6413   Sender compute checksum                        |4.2.2.7 |x| | | | |
6414   Receiver check checksum                        |4.2.2.7 |x| | | | |
6415                                                  |        | | | | | |
6416 Use clock-driven ISN selection                   |4.2.2.9 |x| | | | |
6417                                                  |        | | | | | |
6418 Opening Connections                              |        | | | | | |
6419   Support simultaneous open attempts             |4.2.2.10|x| | | | |
6420   SYN-RCVD remembers last state                  |4.2.2.11|x| | | | |
6421   Passive Open call interfere with others        |4.2.2.18| | | | |x|
6422   Function: simultan. LISTENs for same port      |4.2.2.18|x| | | | |
6423   Ask IP for src address for SYN if necc.        |4.2.3.7 |x| | | | |
6424     Otherwise, use local addr of conn.           |4.2.3.7 |x| | | | |
6425   OPEN to broadcast/multicast IP Address         |4.2.3.10| | | | |x|
6426   Silently discard seg to bcast/mcast addr       |4.2.3.10|x| | | | |
6438                                                  |        | | | | | |
6439 Closing Connections                              |        | | | | | |
6440   RST can contain data                           |4.2.2.12| |x| | | |
6441   Inform application of aborted conn             |4.2.2.13|x| | | | |
6442   Half-duplex close connections                  |4.2.2.13| | |x| | |
6443     Send RST to indicate data lost               |4.2.2.13| |x| | | |
6444   In TIME-WAIT state for 2xMSL seconds           |4.2.2.13|x| | | | |
6445     Accept SYN from TIME-WAIT state              |4.2.2.13| | |x| | |
6446                                                  |        | | | | | |
6447 Retransmissions                                  |        | | | | | |
6448   Jacobson Slow Start algorithm                  |4.2.2.15|x| | | | |
6449   Jacobson Congestion-Avoidance algorithm        |4.2.2.15|x| | | | |
6450   Retransmit with same IP ident                  |4.2.2.15| | |x| | |
6451   Karn's algorithm                               |4.2.3.1 |x| | | | |
6452   Jacobson's RTO estimation alg.                 |4.2.3.1 |x| | | | |
6453   Exponential backoff                            |4.2.3.1 |x| | | | |
6454   SYN RTO calc same as data                      |4.2.3.1 | |x| | | |
6455   Recommended initial values and bounds          |4.2.3.1 | |x| | | |
6456                                                  |        | | | | | |
6457 Generating ACK's:                                |        | | | | | |
6458   Queue out-of-order segments                    |4.2.2.20| |x| | | |
6459   Process all Q'd before send ACK                |4.2.2.20|x| | | | |
6460   Send ACK for out-of-order segment              |4.2.2.21| | |x| | |
6461   Delayed ACK's                                  |4.2.3.2 | |x| | | |
6462     Delay < 0.5 seconds                          |4.2.3.2 |x| | | | |
6463     Every 2nd full-sized segment ACK'd           |4.2.3.2 |x| | | | |
6464   Receiver SWS-Avoidance Algorithm               |4.2.3.3 |x| | | | |
6465                                                  |        | | | | | |
6466 Sending data                                     |        | | | | | |
6467   Configurable TTL                               |4.2.2.19|x| | | | |
6468   Sender SWS-Avoidance Algorithm                 |4.2.3.4 |x| | | | |
6469   Nagle algorithm                                |4.2.3.4 | |x| | | |
6470     Application can disable Nagle algorithm      |4.2.3.4 |x| | | | |
6471                                                  |        | | | | | |
6472 Connection Failures:                             |        | | | | | |
6473   Negative advice to IP on R1 retxs              |4.2.3.5 |x| | | | |
6474   Close connection on R2 retxs                   |4.2.3.5 |x| | | | |
6475   ALP can set R2                                 |4.2.3.5 |x| | | | |1
6476   Inform ALP of  R1<=retxs<R2                    |4.2.3.5 | |x| | | |1
6477   Recommended values for R1, R2                  |4.2.3.5 | |x| | | |
6478   Same mechanism for SYNs                        |4.2.3.5 |x| | | | |
6479     R2 at least 3 minutes for SYN                |4.2.3.5 |x| | | | |
6480                                                  |        | | | | | |
6481 Send Keep-alive Packets:                         |4.2.3.6 | | |x| | |
6482   - Application can request                      |4.2.3.6 |x| | | | |
6483   - Default is "off"                             |4.2.3.6 |x| | | | |
6484   - Only send if idle for interval               |4.2.3.6 |x| | | | |
6485   - Interval configurable                        |4.2.3.6 |x| | | | |
6497   - Default at least 2 hrs.                      |4.2.3.6 |x| | | | |
6498   - Tolerant of lost ACK's                       |4.2.3.6 |x| | | | |
6499                                                  |        | | | | | |
6500 IP Options                                       |        | | | | | |
6501   Ignore options TCP doesn't understand          |4.2.3.8 |x| | | | |
6502   Time Stamp support                             |4.2.3.8 | | |x| | |
6503   Record Route support                           |4.2.3.8 | | |x| | |
6504   Source Route:                                  |        | | | | | |
6505     ALP can specify                              |4.2.3.8 |x| | | | |1
6506       Overrides src rt in datagram               |4.2.3.8 |x| | | | |
6507     Build return route from src rt               |4.2.3.8 |x| | | | |
6508     Later src route overrides                    |4.2.3.8 | |x| | | |
6509                                                  |        | | | | | |
6510 Receiving ICMP Messages from IP                  |4.2.3.9 |x| | | | |
6511   Dest. Unreach (0,1,5) => inform ALP            |4.2.3.9 | |x| | | |
6512   Dest. Unreach (0,1,5) => abort conn            |4.2.3.9 | | | | |x|
6513   Dest. Unreach (2-4) => abort conn              |4.2.3.9 | |x| | | |
6514   Source Quench => slow start                    |4.2.3.9 | |x| | | |
6515   Time Exceeded => tell ALP, don't abort         |4.2.3.9 | |x| | | |
6516   Param Problem => tell ALP, don't abort         |4.2.3.9 | |x| | | |
6517                                                  |        | | | | | |
6518 Address Validation                               |        | | | | | |
6519   Reject OPEN call to invalid IP address         |4.2.3.10|x| | | | |
6520   Reject SYN from invalid IP address             |4.2.3.10|x| | | | |
6521   Silently discard SYN to bcast/mcast addr       |4.2.3.10|x| | | | |
6522                                                  |        | | | | | |
6523 TCP/ALP Interface Services                       |        | | | | | |
6524   Error Report mechanism                         |4.2.4.1 |x| | | | |
6525   ALP can disable Error Report Routine           |4.2.4.1 | |x| | | |
6526   ALP can specify TOS for sending                |4.2.4.2 |x| | | | |
6527     Passed unchanged to IP                       |4.2.4.2 | |x| | | |
6528   ALP can change TOS during connection           |4.2.4.2 | |x| | | |
6529   Pass received TOS up to ALP                    |4.2.4.2 | | |x| | |
6530   FLUSH call                                     |4.2.4.3 | | |x| | |
6531   Optional local IP addr parm. in OPEN           |4.2.4.4 |x| | | | |
6532 -------------------------------------------------|--------|-|-|-|-|-|--
6533 -------------------------------------------------|--------|-|-|-|-|-|--
6534
6535 FOOTNOTES:
6536
6537 (1)  "ALP" means Application-Layer program.
6538
6539
6556 5.  REFERENCES
6557
6558 INTRODUCTORY REFERENCES
6559
6560
6561 [INTRO:1] "Requirements for Internet Hosts -- Application and Support,"
6562      IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123,
6563      October 1989.
6564
6565 [INTRO:2]  "Requirements for Internet Gateways,"  R. Braden and J.
6566      Postel, RFC-1009, June 1987.
6567
6568 [INTRO:3]  "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006,
6569      (three volumes), SRI International, December 1985.
6570
6571 [INTRO:4]  "Official Internet Protocols," J. Reynolds and J. Postel,
6572      RFC-1011, May 1987.
6573
6574      This document is republished periodically with new RFC numbers; the
6575      latest version must be used.
6576
6577 [INTRO:5]  "Protocol Document Order Information," O. Jacobsen and J.
6578      Postel, RFC-980, March 1986.
6579
6580 [INTRO:6]  "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May
6581      1987.
6582
6583      This document is republished periodically with new RFC numbers; the
6584      latest version must be used.
6585
6586 [INTRO:7] "Modularity and Efficiency in Protocol Implementation," D.
6587      Clark, RFC-817, July 1982.
6588
6589 [INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM
6590      SOSP, Orcas Island, Washington, December 1985.
6591
6592
6593 Secondary References:
6594
6595
6596 [INTRO:9]  "A Protocol for Packet Network Intercommunication," V. Cerf
6597      and R. Kahn, IEEE Transactions on Communications, May 1974.
6598
6599 [INTRO:10]  "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.
6600      Cohen, Computer Networks, Vol. 5, No. 4, July 1981.
6601
6602 [INTRO:11]  "The DARPA Internet Protocol Suite," B. Leiner, J. Postel,
6603      R. Cole and D. Mills, Proceedings INFOCOM 85, IEEE, Washington DC,
6615      March 1985.  Also in: IEEE Communications Magazine, March 1985.
6616      Also available as ISI-RS-85-153.
6617
6618 [INTRO:12] "Final Text of DIS8473, Protocol for Providing the
6619      Connectionless Mode Network Service," ANSI, published as RFC-994,
6620      March 1986.
6621
6622 [INTRO:13] "End System to Intermediate System Routing Exchange
6623      Protocol," ANSI X3S3.3, published as RFC-995, April 1986.
6624
6625
6626 LINK LAYER REFERENCES
6627
6628
6629 [LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893,
6630      April 1984.
6631
6632 [LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826,
6633      November 1982.
6634
6635 [LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet
6636      Networks," C. Hornig, RFC-894, April 1984.
6637
6638 [LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802
6639      Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.
6640
6641      This RFC contains a great deal of information of importance to
6642      Internet implementers planning to use IEEE 802 networks.
6643
6644
6645 IP LAYER REFERENCES
6646
6647
6648 [IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.
6649
6650 [IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792,
6651      September 1981.
6652
6653 [IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel,
6654      RFC-950, August 1985.
6655
6656 [IP:4]  "Host Extensions for IP Multicasting," S. Deering, RFC-1112,
6657      August 1989.
6658
6659 [IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department
6660      of Defense, August 1983.
6661
6662      This specification, as amended by RFC-963, is intended to describe
6674      the Internet Protocol but has some serious omissions (e.g., the
6675      mandatory subnet extension [IP:3] and the optional multicasting
6676      extension [IP:4]).  It is also out of date.  If there is a
6677      conflict, RFC-791, RFC-792, and RFC-950 must be taken as
6678      authoritative, while the present document is authoritative over
6679      all.
6680
6681 [IP:6] "Some Problems with the Specification of the Military Standard
6682      Internet Protocol," D. Sidhu, RFC-963, November 1985.
6683
6684 [IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel,
6685      RFC-879, November 1983.
6686
6687      Discusses and clarifies the relationship between the TCP Maximum
6688      Segment Size option and the IP datagram size.
6689
6690 [IP:8] "Internet Protocol Security Options,"  B. Schofield, RFC-1108,
6691      October 1989.
6692
6693 [IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM
6694      SIGCOMM-87, August 1987.  Published as ACM Comp Comm Review, Vol.
6695      17, no. 5.
6696
6697      This useful paper discusses the problems created by Internet
6698      fragmentation and presents alternative solutions.
6699
6700 [IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July
6701      1982.
6702
6703      This and the following paper should be read by every implementor.
6704
6705 [IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.
6706
6707 SECONDARY IP REFERENCES:
6708
6709
6710 [IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.
6711      Mogul, RFC-922, October 1984.
6712
6713      This RFC first described directed broadcast addresses.  However,
6714      the bulk of the RFC is concerned with gateways, not hosts.
6715
6716 [IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July
6717      1982.
6718
6719 [IP:14] "Something a Host Could Do with Source Quench: The Source Quench
6720      Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July
6721      1987.
6722
6723
6733 UDP REFERENCES:
6734
6735
6736 [UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.
6737
6738
6739 TCP REFERENCES:
6740
6741
6742 [TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September
6743      1981.
6744
6745
6746 [TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of
6747      Defense, August 1984.
6748
6749      This specification as amended by RFC-964 is intended to describe
6750      the same protocol as RFC-793 [TCP:1].  If there is a conflict,
6751      RFC-793 takes precedence, and the present document is authoritative
6752      over both.
6753
6754
6755 [TCP:3] "Some Problems with the Specification of the Military Standard
6756      Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964,
6757      November 1985.
6758
6759
6760 [TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel,
6761      RFC-879, November 1983.
6762
6763
6764 [TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813,
6765      July 1982.
6766
6767
6768 [TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM
6769      SIGCOMM-87, August 1987.
6770
6771
6772 [TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88,
6773      August 1988.
6774
6775
6776 SECONDARY TCP REFERENCES:
6777
6778
6779 [TCP:8] "Modularity and Efficiency in Protocol Implementation," D.
6780      Clark, RFC-817, July 1982.
6781
6782
6792 [TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.
6793
6794
6795 [TCP:10] "Computing the Internet Checksum," R. Braden, D. Borman, and C.
6796      Partridge, RFC-1071, September 1988.
6797
6798
6799 [TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden,
6800      RFC-1072, October 1988.
6801
6802
6803 Security Considerations
6804
6805    There are many security issues in the communication layers of host
6806    software, but a full discussion is beyond the scope of this RFC.
6807
6808    The Internet architecture generally provides little protection
6809    against spoofing of IP source addresses, so any security mechanism
6810    that is based upon verifying the IP source address of a datagram
6811    should be treated with suspicion.  However, in restricted
6812    environments some source-address checking may be possible.  For
6813    example, there might be a secure LAN whose gateway to the rest of the
6814    Internet discarded any incoming datagram with a source address that
6815    spoofed the LAN address.  In this case, a host on the LAN could use
6816    the source address to test for local vs. remote source.  This problem
6817    is complicated by source routing, and some have suggested that
6818    source-routed datagram forwarding by hosts (see Section 3.3.5) should
6819    be outlawed for security reasons.
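
        The gateway check in the secure-LAN example can be sketched as
        follows.  The function and its parameters are hypothetical, and
        the sketch ignores source routing, which the paragraph above
        notes as a complication.

```python
import ipaddress


def gateway_accepts(src, arrived_from_outside, lan):
    """Return False for an outside datagram whose source claims a LAN address.

    src is the datagram's IP source address, lan is the protected network
    in prefix notation (e.g. "192.0.2.0/24").
    """
    claims_lan = ipaddress.ip_address(src) in ipaddress.ip_network(lan)
    return not (arrived_from_outside and claims_lan)
```

        With such a filter in place at the gateway, a host on the LAN
        that sees a LAN source address can infer that the datagram did
        not enter from the rest of the Internet.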
6820
6821    Security-related issues are mentioned in sections concerning the IP
6822    Security option (Section 3.2.1.8), the ICMP Parameter Problem message
6823    (Section 3.2.2.5), IP options in UDP datagrams (Section 4.1.3.2), and
6824    reserved TCP ports (Section 4.2.2.1).
6825
6826 Author's Address
6827
6828    Robert Braden
6829    USC/Information Sciences Institute
6830    4676 Admiralty Way
6831    Marina del Rey, CA 90292-6695
6832
6833    Phone: (213) 822 1511
6834
6835    EMail: Braden@ISI.EDU