Documentation/networking/tcp.txt

TCP protocol
============

Last updated: 9 February 2008

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:
snd_cwnd                The size of the congestion window
snd_ssthresh            Slow start threshold. We are in slow start if
                        snd_cwnd is less than this.
snd_cwnd_cnt            A counter used to slow down the rate of increase
                        once we exceed slow start threshold.
snd_cwnd_clamp          This is the maximum size that snd_cwnd can grow to.
snd_cwnd_stamp          Timestamp for when congestion window last validated.
snd_cwnd_used           Used as a highwater mark for how much of the
                        congestion window is in use. It is used to adjust
                        snd_cwnd down when the link is limited by the
                        application rather than the network.
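
As an illustration of how these fields interact, the following userspace
sketch models a simplified Reno-style window increase. The struct and the
model_* function names are hypothetical and are not the kernel's code; only
the field names come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the congestion fields; not the kernel's tcp_sock. */
struct cwnd_model {
	uint32_t snd_cwnd;       /* congestion window, in segments */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted toward the next increase */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* We are in slow start while snd_cwnd is below snd_ssthresh. */
int model_in_slow_start(const struct cwnd_model *m)
{
	assert(m != NULL);
	return m->snd_cwnd < m->snd_ssthresh;
}

/*
 * Account for one ACK. In slow start the window grows by one segment
 * per ACK. Past the threshold, snd_cwnd_cnt slows the growth to about
 * one segment per window's worth of ACKs. The clamp is never exceeded.
 */
void model_on_ack(struct cwnd_model *m)
{
	assert(m != NULL);
	if (model_in_slow_start(m)) {
		if (m->snd_cwnd < m->snd_cwnd_clamp)
			m->snd_cwnd++;
		return;
	}
	if (++m->snd_cwnd_cnt >= m->snd_cwnd) {
		if (m->snd_cwnd < m->snd_cwnd_clamp)
			m->snd_cwnd++;
		m->snd_cwnd_cnt = 0;
	}
}
```

Starting from snd_cwnd = 2 and snd_ssthresh = 4, two ACKs bring the window
to 4; after that, four more ACKs are needed for the next increase.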

As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, name, ssthresh,
cong_avoid, and min_cwnd must be valid.
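
The minimum requirement above can be sketched in userspace as a validity
check on an ops structure. The struct below is a hypothetical, cut-down
stand-in for tcp_congestion_ops: the callback signatures, the demo_*
callbacks and the -1 error value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for tcp_congestion_ops; signatures are assumed. */
struct model_cong_ops {
	const char *name;                                  /* required */
	unsigned int (*ssthresh)(unsigned int cwnd);       /* required */
	void (*cong_avoid)(unsigned int *cwnd,
			   unsigned int acked);            /* required */
	unsigned int (*min_cwnd)(unsigned int cwnd);       /* required */
	void (*init)(void);                                /* optional */
};

/* Return 0 if every required member is set, -1 otherwise. */
int model_validate_ops(const struct model_cong_ops *ops)
{
	if (ops == NULL || ops->name == NULL || ops->name[0] == '\0')
		return -1;
	if (!ops->ssthresh || !ops->cong_avoid || !ops->min_cwnd)
		return -1;
	return 0;
}

/* Example callbacks: halve the window on loss, never going below 2. */
unsigned int demo_ssthresh(unsigned int cwnd)
{
	return cwnd / 2 > 2 ? cwnd / 2 : 2;
}

/* Grow the window by the number of segments acknowledged. */
void demo_cong_avoid(unsigned int *cwnd, unsigned int acked)
{
	assert(cwnd != NULL);
	*cwnd += acked;
}

/* Lower bound for the window after a loss event. */
unsigned int demo_min_cwnd(unsigned int cwnd)
{
	return cwnd / 2;
}
```

An ops structure with all four members filled in passes the check; clearing
any one of them makes it fail, which mirrors a registration being refused.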

Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space.  This is preallocated space, so
it is important to check that your private data fits in it. Alternatively,
space can be allocated elsewhere and a pointer to it stored here.
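
One way to make the size check hard to forget is to do it at build time.
The userspace sketch below assumes a hypothetical MODEL_CA_PRIV_SIZE and a
hypothetical model_ca() accessor standing in for tcp_ca(tp); the real size
of the preallocated area depends on the kernel version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the preallocated private area. */
#define MODEL_CA_PRIV_SIZE (16 * sizeof(uint32_t))

/* Stand-in for the socket; only the private area is modelled. */
struct model_sock {
	uint32_t ca_priv[MODEL_CA_PRIV_SIZE / sizeof(uint32_t)];
};

/* Example private state for a made-up RTT-based algorithm. */
struct demo_priv {
	uint32_t min_rtt;
	uint32_t cnt_rtt;
	uint32_t base_rtt;
};

/* Fail the build, rather than corrupt memory, if the state is too big. */
_Static_assert(sizeof(struct demo_priv) <= MODEL_CA_PRIV_SIZE,
	       "demo_priv does not fit in ca_priv");

/* Return a pointer to the preallocated private space. */
void *model_ca(struct model_sock *sk)
{
	assert(sk != NULL);
	return sk->ca_priv;
}

/* Initialise the private state in place; nothing is allocated. */
void demo_init(struct model_sock *sk)
{
	struct demo_priv *p = model_ca(sk);

	memset(p, 0, sizeof(*p));
	p->min_rtt = UINT32_MAX;
}
```

If demo_priv grows past the reserved size, compilation stops with the
message above instead of the module silently overwriting adjacent fields.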

There are three kinds of congestion control algorithms currently: The
simplest ones are derived from TCP reno (highspeed, scalable) and just
provide an alternative to the congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics.  There are also round trip time based algorithms like
Vegas and Westwood+.

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The congestion control mechanism that is used is determined by the setting
of the sysctl net.ipv4.tcp_congestion_control. The default congestion
control will be the last one registered (LIFO), so if you built everything
as modules, the default will be reno. If you build with the defaults from
Kconfig, then CUBIC will be builtin (not a module) and it will end up the
default.

If you really want a particular default value then you will need
to set it with the sysctl.  If you use a sysctl, the module will be autoloaded
if needed and you will get the expected protocol. If you ask for an
unknown congestion method, then the sysctl attempt will fail.
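
From a userspace program, the sysctl can be read and set through its file
under /proc/sys. The sketch below is a minimal example; the helper names
are hypothetical, and writing the real sysctl file normally requires root.
The path is passed in as a parameter so the helpers can be tried on an
ordinary file first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* File that backs the net.ipv4.tcp_congestion_control sysctl. */
#define TCP_CC_SYSCTL_PATH "/proc/sys/net/ipv4/tcp_congestion_control"

/* Read the first line of path into buf, without the trailing newline.
 * Returns 0 on success, -1 on failure. */
int read_sysctl_string(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Write value to path. Returns 0 on success, -1 on failure, for example
 * when the caller lacks permission or the kernel rejects the value. */
int write_sysctl_string(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* A rejected value only shows up when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_sysctl_string(TCP_CC_SYSCTL_PATH, "cubic") and checking the
return value is then enough to detect an unknown congestion method.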

If you remove a TCP congestion control module, then you will get the next
available one. Since reno cannot be built as a module, and cannot be
deleted, it will always be available.
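
The LIFO default and the fallback on removal can be modelled with a small
stack of names. This is a userspace sketch with hypothetical model_*
helpers; it assumes that "next available" means the most recently
registered mechanism that is still present.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_CA 8

/* names[count - 1] is the most recently registered mechanism. */
struct model_registry {
	const char *names[MODEL_MAX_CA];
	size_t count;
};

/* reno is always present, as it cannot be built as a module. */
void model_registry_init(struct model_registry *r)
{
	assert(r != NULL);
	r->count = 0;
	r->names[r->count++] = "reno";
}

/* Register a mechanism; returns -1 if the table is full or the name
 * is already taken. */
int model_register(struct model_registry *r, const char *name)
{
	size_t i;

	if (r->count == MODEL_MAX_CA)
		return -1;
	for (i = 0; i < r->count; i++)
		if (strcmp(r->names[i], name) == 0)
			return -1;
	r->names[r->count++] = name;
	return 0;
}

/* Remove a mechanism; reno and unknown names are refused with -1. */
int model_unregister(struct model_registry *r, const char *name)
{
	size_t i;

	if (strcmp(name, "reno") == 0)
		return -1;
	for (i = 0; i < r->count; i++) {
		if (strcmp(r->names[i], name) == 0) {
			/* Close the gap, keeping registration order. */
			for (; i + 1 < r->count; i++)
				r->names[i] = r->names[i + 1];
			r->count--;
			return 0;
		}
	}
	return -1;
}

/* The default is the last one registered (LIFO). */
const char *model_default(const struct model_registry *r)
{
	return r->names[r->count - 1];
}
```

With only reno present the default is reno; registering bic and then cubic
makes cubic the default, and removing cubic falls back to bic.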

How the new TCP output machine [nyi] works
==========================================

Data is kept on a single queue. The skb->users flag tells us if the frame is
one that has been queued already. To add a frame we throw it on the end. ACK
processing walks down the list from the start.

We keep a set of control flags:

        sk->tcp_pend_event

                TCP_PEND_ACK                    Ack needed
                TCP_ACK_NOW                     Needed now
                TCP_WINDOW                      Window update check
                TCP_WINZERO                     Zero probing

        sk->transmit_queue              The transmission frame begin
        sk->transmit_new                First new frame pointer
        sk->transmit_end                Where to add frames

        sk->tcp_last_tx_ack             Last ack seen
        sk->tcp_dup_ack                 Dup ack count for fast retransmit
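
Since this design is not yet implemented, the following is only a userspace
sketch of how the three queue pointers could cooperate. The struct layout,
the end_seq field and the model_* functions are assumptions made for
illustration; only the pointer names come from the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One queued frame; end_seq is the sequence number just past its data. */
struct model_frame {
	uint32_t end_seq;
	struct model_frame *next;
};

struct model_out {
	struct model_frame *transmit_queue; /* oldest unacked frame */
	struct model_frame *transmit_new;   /* first frame never sent */
	struct model_frame *transmit_end;   /* where to add frames */
};

/* Add a frame at the end of the single queue. Returns 0 or -1. */
int model_queue_frame(struct model_out *q, uint32_t end_seq)
{
	struct model_frame *f = malloc(sizeof(*f));

	assert(q != NULL);
	if (!f)
		return -1;
	f->end_seq = end_seq;
	f->next = NULL;
	if (q->transmit_end)
		q->transmit_end->next = f;
	else
		q->transmit_queue = f;
	q->transmit_end = f;
	if (!q->transmit_new)
		q->transmit_new = f;
	return 0;
}

/* Mark up to max new frames as sent; returns how many were sent. */
unsigned int model_send_new(struct model_out *q, unsigned int max)
{
	unsigned int n = 0;

	while (q->transmit_new && n < max) {
		q->transmit_new = q->transmit_new->next;
		n++;
	}
	return n;
}

/* Walk from the start of the list, freeing sent frames fully covered
 * by ack. Unsent frames are never released. Returns frames freed. */
unsigned int model_ack(struct model_out *q, uint32_t ack)
{
	unsigned int n = 0;

	while (q->transmit_queue &&
	       q->transmit_queue != q->transmit_new &&
	       (int32_t)(ack - q->transmit_queue->end_seq) >= 0) {
		struct model_frame *f = q->transmit_queue;

		q->transmit_queue = f->next;
		if (!q->transmit_queue)
			q->transmit_end = NULL;
		free(f);
		n++;
	}
	return n;
}
```

The signed comparison in model_ack() keeps the test correct when sequence
numbers wrap around 32 bits.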

Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise queue and compute the body
checksum in the copy.

When a write is done we try to clear any pending events and piggyback them.
If the window is full we queue full-sized frames. On the first timeout in
zero window we split this.

On a timer we walk the retransmit list to send any retransmits, update the
backoff timers, etc. A change of route table stamp causes a change of header
and recompute. We add any new TCP-level headers and refinish the checksum
before sending.