.\" $NetBSD: tcp.4,v 1.23 2007/06/20 15:29:17 christos Exp $
.\" $FreeBSD: tcp.4,v 1.11.2.16 2004/02/16 22:21:47 bms Exp $
.\"
.\" Copyright (c) 1983, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\"
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Ft int
.Fn socket AF_INET6 SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way transmission of data.
It is a byte-stream protocol used to support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard Internet address format and, in addition, provides
a per-host collection of
.Dq port addresses .
Thus, each address is composed of an Internet address specifying
the host and network, with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
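.Pp
As an illustration of the active and passive roles described above,
the following minimal C sketch creates a passive socket on the loopback
address, connects an active socket to it, and passes one byte across.
The helper names
.Fn make_passive
and
.Fn loopback_roundtrip
are hypothetical and exist only for this example;
error handling is reduced to the bare minimum.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a passive socket on the loopback address.
 * Port 0 lets the system choose a port, which is stored in *port in
 * network byte order.  Returns the socket, or -1 on failure.
 */
static int
make_passive(in_port_t *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	/* bind(2) comes first; listen(2) then makes the socket passive. */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 1) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}
	*port = sin.sin_port;
	return s;
}

/*
 * Hypothetical helper: connect an active socket to the passive one,
 * accept the connection, and send one byte from the active side.
 * Returns the byte received on the accepted socket, or -1 on failure.
 */
static int
loopback_roundtrip(void)
{
	struct sockaddr_in sin;
	in_port_t port = 0;
	char sent = 'x', got = 0;
	int lsock, asock, csock = -1, rv = -1;

	if ((lsock = make_passive(&port)) == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = port;

	/* Sockets are created active, so connect(2) can be used directly. */
	asock = socket(AF_INET, SOCK_STREAM, 0);
	if (asock != -1 &&
	    connect(asock, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
	    (csock = accept(lsock, NULL, NULL)) != -1 &&
	    write(asock, &sent, 1) == 1 &&
	    read(csock, &got, 1) == 1)
		rv = got;

	if (csock != -1)
		close(csock);
	if (asock != -1)
		close(asock);
	close(lsock);
	return rv;
}
```
.Ed
.Pp
In this sketch only the passive socket is passed to
.Xr accept 2 ,
and only the active socket is passed to
.Xr connect 2 .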
.Pp
Passive sockets may
.Dq underspecify
their location to match incoming connection requests from multiple networks.
This technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified at this time; if the port is not
specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the socket is the address associated with the
network interface through which packets are being transmitted and received.
Normally this address corresponds to the peer entity's network.
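.Pp
The wildcard addressing described above might be used as in the
following C sketch, which binds
.Dv INADDR_ANY
with an unspecified port and then asks the system which port it assigned.
The function name
.Fn listen_wildcard
and the backlog of 5 are assumptions made for this example only.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on all networks with a system-assigned
 * port.  Returns the socket and stores the assigned port (host byte
 * order) in *port, or returns -1 on failure.
 */
static int
listen_wildcard(unsigned short *port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s == -1)
		return -1;
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = htons(0);			/* system picks the port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1 ||
	    getsockname(s, (struct sockaddr *)&sin, &len) == -1) {
		close(s);
		return -1;
	}
	*port = ntohs(sin.sin_port);
	return s;
}
```
.Ed
.Pp
A caller that needs a fixed port would store it with
.Fn htons
in
.Va sin_port
instead of 0.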
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_KEEPINTVL
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
Therefore,
.Tn TCP
provides a boolean option,
.Dv TCP_NODELAY ,
defined in
.Aq Pa netinet/tcp.h ,
to defeat this algorithm.
.It Dv TCP_MAXSEG
By default, the sending and receiving
.Tn TCP
negotiate the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
In the current release, only outgoing traffic is digested;
digests on incoming traffic are not verified.
The current default behavior for the system is to respond to a system
advertising this option with TCP-MD5; this may change.
.Pp
One common use for this option in a
.Nx
router deployment is to enable such
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
Only IPv4 (AF_INET) sessions are supported.
.Pp
For this option to function correctly, the
administrator must add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry must have an SPI of 0x1000 and can therefore only be specified
on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination, the outgoing traffic
will have an invalid digest option prepended, and the following error message
will be visible on the system console:
.Em "tcp_signature_compute: SADB lookup failed for %d.%d.%d.%d" .
.It Dv TCP_KEEPIDLE
.\" XXX: We always do it.
.\" When the
.\" .Dv SO_KEEPALIVE
.\" option is enabled,
TCP probes a connection that
has been idle for some amount of time.
The default value for this idle period is 4 hours.
The
.Dv TCP_KEEPIDLE
option can be used to affect this value for a given socket, and specifies
the number of seconds of idle time between keepalive probes.
This option takes an integer
value, with a value greater than 0.
.\" range of 1 to N (where N is
.\" .Dv net.inet.tcp.keepidle ).
.\" which is defined in the
.\" .In sys/protosw.h
.It Dv TCP_KEEPINTVL
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe after some
amount of time.
The default value for this retransmit interval is 150 seconds.
The
.Dv TCP_KEEPINTVL
option can be used to affect this value for
a given socket, and specifies the number of seconds to wait before
retransmitting a keepalive probe.
This option takes an integer
value, with a value greater than 0.
.\" range of 1 to N (where N is the
.\" .Dv net.inet.tcp.keepintvl ).
.It Dv TCP_KEEPCNT
When the
.Dv SO_KEEPALIVE
option is enabled, TCP probes a connection that
has been idle for some amount of time.
If the remote system does not
respond to a keepalive probe, TCP retransmits the probe a certain
number of times before a connection is considered to be broken.
The default value for this keepalive probe retransmit limit is 8.
The
.Dv TCP_KEEPCNT
option can be used to affect this value for a given socket,
and specifies the maximum number of keepalive probes to be sent.
This option takes an integer
value, with a value greater than 0.
.\" range of 0 to N (where N is the
.\" .Dv net.inet.tcp.keepcnt ).
.It Dv TCP_KEEPINIT
If a TCP connection cannot be established within some amount of time,
TCP will time out the connect attempt.
The default value for this initial connection establishment timeout
can be changed for a given socket with the
.Dv TCP_KEEPINIT
option, which specifies the number of seconds to wait before the connect
attempt is timed out.
For passive connections, the
.Dv TCP_KEEPINIT
option value is inherited from the listening socket.
This option takes an integer
value, with a value greater than 0.
.\" range of 0 to N (where N is the
.\" .Dv net.inet.tcp.keepinit ).
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 .
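.Pp
A minimal C sketch of how the option level and some of the options
above might be used follows.
The helper names
.Fn tcp_level ,
.Fn enable_nodelay ,
and
.Fn tune_keepalive
are hypothetical.
The fallback to
.Dv IPPROTO_TCP
when the protocol lookup fails and the
.Li #if
guard for systems without the keepalive options are assumptions of this
example, not documented behavior.
.Bd -literal -offset indent
```c
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>

/* Look up the option level for TCP; fall back to IPPROTO_TCP (assumed). */
static int
tcp_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return pe != NULL ? pe->p_proto : IPPROTO_TCP;
}

/*
 * Set the boolean TCP_NODELAY option and read it back.
 * Returns 1 if the option reads back as enabled, 0 if not, -1 on error.
 */
static int
enable_nodelay(int s)
{
	int on = 1, val = 0;
	socklen_t len = sizeof(val);
	int level = tcp_level();

	if (setsockopt(s, level, TCP_NODELAY, &on, sizeof(on)) == -1)
		return -1;
	if (getsockopt(s, level, TCP_NODELAY, &val, &len) == -1)
		return -1;
	return val != 0;
}

/*
 * Enable keepalive probing and set the idle time, probe interval
 * (both in seconds), and probe count for one socket.
 * Returns 0 on success, -1 on error.
 */
static int
tune_keepalive(int s, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
	int on = 1;
	int level = tcp_level();

	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return -1;
#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
	if (setsockopt(s, level, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1 ||
	    setsockopt(s, level, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1 ||
	    setsockopt(s, level, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return -1;
#else
	/* The per-socket keepalive options are not available here. */
	(void)level; (void)idle; (void)intvl; (void)cnt;
#endif
	return 0;
}
```
.Ed
.Pp
For example,
.Li "tune_keepalive(s, 60, 10, 5)"
would request a first probe after 60 seconds of idle time, a retransmit
every 10 seconds, and at most 5 probes.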
.Pp
In the historical
.Tn TCP
implementation, if the
.Dv TCP_NODELAY
option was set on a passive socket, the sockets returned by
.Xr accept 2
erroneously did not have the
.Dv TCP_NODELAY
option set; the behavior was corrected to inherit
.Dv TCP_NODELAY .
.Pp
Options at the
.Tn IP
network level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
There are many adjustable parameters that control various aspects
of
.Tn TCP
behavior, including:
.Bl -bullet -compact
.It
RFC 1323 extensions for high performance
.It
Send/receive buffer sizes
.It
Default maximum segment size (MSS)
.It
Hughes/Touch/Heidemann Congestion Window Monitoring algorithm
.It
newReno algorithm for congestion control
.It
Logging of connection refusals
.It
RST packet rate limits
.It
SACK (Selective Acknowledgment)
.It
ECN (Explicit Congestion Notification)
.It
Congestion window increase method: traditional packet counting or
RFC 3465 Appropriate Byte Counting
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width [EADDRNOTAVAIL]
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.El
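.Pp
As a hedged illustration of how one of these errors can be observed,
the C sketch below binds a second socket to an address and port that a
listening socket already holds and reports the resulting
.Va errno .
The function name
.Fn second_bind_errno
is hypothetical, and the sketch assumes that neither socket has
.Dv SO_REUSEADDR
or a similar option set.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno produced by binding a second
 * socket to a port that is already in use, 0 if that bind unexpectedly
 * succeeds, or -1 if the setup itself fails.
 */
static int
second_bind_errno(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int a, b, err = -1;

	a = socket(AF_INET, SOCK_STREAM, 0);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a == -1 || b == -1)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(a, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(a, 1) == -1 ||
	    getsockname(a, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	/* sin now holds the port assigned to the first socket. */
	if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err = errno;
	else
		err = 0;
out:
	if (a != -1)
		close(a);
	if (b != -1)
		close(b);
	return err;
}
```
.Ed
.Pp
Under the stated assumptions the value returned is expected to be
.Er EADDRINUSE .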
.Sh SEE ALSO
.Rs
.%T "Transmission Control Protocol"
.%R RFC 793
.Re
.Rs
.%T "Requirements for Internet Hosts -- Communication Layers"
.%R RFC 1122
.Re
.Sh HISTORY
The
.Nm
protocol stack appeared in
.Bx 4.2 .