net_qos.txt

31mar02

Abbreviations:
    packet(s) - pkt(s)
    interrupt(s) - intrp(s)
    Fast Forwarding - FF
Up to kernel 2.2.x the network subsystem was not threaded, and only 1 pkt at a
time could enter the system.
The softnet patch, introduced in 2.3.43, creates a backlog queue per processor,
so the network stack can process as many pkts concurrently as there are
processors.
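The per-processor backlog idea can be pictured with a small model. This is only
a sketch: the names make_backlogs, enqueue and process_step are invented here
and are not kernel functions, and the "concurrency" is simulated as one step in
which every CPU handles one pkt from its own queue.

```python
from collections import deque


def make_backlogs(num_cpus):
    """One independent backlog queue per processor."""
    return [deque() for _ in range(num_cpus)]


def enqueue(backlogs, cpu, pkt):
    """The CPU that received the pkt queues it on its own backlog."""
    backlogs[cpu].append(pkt)


def process_step(backlogs):
    """Each CPU takes one pkt from its own queue; returns the pkts handled."""
    return [queue.popleft() for queue in backlogs if queue]


if __name__ == "__main__":
    backlogs = make_backlogs(4)
    for n in range(8):
        enqueue(backlogs, n % 4, "pkt%d" % n)
    # Four pkts, one per CPU, are handled in the same step.
    print(process_step(backlogs))
```

With a single shared queue (the 2.2.x situation) the same step would hand out
only one pkt, however many processors were idle.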
From 2.3.58, IRQ affinity was introduced. On an SMP machine a set of processors
can be dedicated to network processing by attaching interfaces to that set,
while the other processors handle other types of workloads.
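As a rough illustration of IRQ affinity: on kernels that expose
/proc/irq/<N>/smp_affinity, the set of processors allowed to service an IRQ is
written as a hexadecimal bitmask, one bit per CPU. The sketch below only
computes that mask and prints the command. The IRQ number 24 and the CPU set
are made-up values, and the helper names are hypothetical.

```python
def cpu_mask(cpus):
    """Return the hex bitmask string with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def affinity_path(irq):
    """Path of the per-IRQ affinity file for the given IRQ number."""
    return "/proc/irq/%d/smp_affinity" % irq


if __name__ == "__main__":
    # Assumed setup: the NIC uses IRQ 24 and CPUs 0 and 1 do network work.
    print("echo %s > %s" % (cpu_mask([0, 1]), affinity_path(24)))
```

The real IRQ number of an interface can be read from /proc/interrupts.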
A study on 2.3.99 called Fast Forwarding (FF) found that Linux reaches
congestion collapse at 60 Kpps (K pkts per sec).

source: http://lwn.net/2001/features/KernelSummit/ (March 2001)
        The network driver API (Kernel Summit)
Jamal Hadi Salim led a session describing changes to the network driver
interface. The stock Linux kernel performs poorly under very heavy network
loads: as the number of pkts received goes up, the number of pkts actually
processed begins to drop, until it approaches zero in especially hostile
situations. The desire, of course, is to change that. A number of problems
have been identified in the current networking stack, including:
* In heavy load situations, too many intrps are generated. When several
  tens of thousands of pkts must be dispatched every second, the system
  simply does not have the resources to deal with a hardware intrp for every
  pkt.
* When load gets heavy, the system needs to start dropping pkts. Currently,
  pkts are dropped far too late, after resources have been expended on them.
* Pkts are examined and classified several times; it would be better to do
  this once and remember the result.
* On SMP systems, pkts can be reordered in the networking subsystem, leading
  to suboptimal performance later on.
* Heavy load on a single interface can lead to unfair behavior as the other
  interfaces on the system are ignored.
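The collapse under load described above can be reproduced with a
back-of-the-envelope model. The numbers are invented for illustration and are
not measurements: assume each hardware intrp costs 10 us of CPU, protocol
processing costs another 10 us per pkt, and intrp handling always preempts
protocol processing. The function name is hypothetical.

```python
def delivered_pps(offered_pps, intrp_cost_us=10.0, proc_cost_us=10.0):
    """Pkts per second that finish processing with one intrp per pkt."""
    cpu_us = 1_000_000.0                     # CPU time available in one second
    intrp_us = offered_pps * intrp_cost_us   # spent first: intrps preempt all else
    left_us = max(0.0, cpu_us - intrp_us)    # what remains for protocol work
    return min(offered_pps, left_us / proc_cost_us)


if __name__ == "__main__":
    for rate in (20_000, 50_000, 60_000, 80_000, 100_000):
        print(rate, int(delivered_pps(rate)))
```

With these assumed costs the delivered rate follows the offered rate up to
50 Kpps, then falls as intrps eat the CPU, and reaches zero at 100 Kpps. The
shape, not the exact figures, is the point.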
Jamal's work (done with Robert Olsson and Alexey Kuznetsov) has focused
primarily on the first two problems. The 1st thing that has been done is to
provide a mechanism to tell drivers that the networking load is high. The
drivers should then tell their interfaces to cut back on intrps. After all,
when a heavy stream of pkts is coming in, there will always be a few of them
waiting in the DMA buffers, and the intrps carry little new information.
When intrps are off, the networking code will instead poll drivers when it is
ready to accept new pkts. Each interface has a quota stating how many pkts
will be accepted; this quota limits the pkt traffic into the rest of the
kernel, and distributes processing more fairly across the interfaces. If the
traffic is heavy, it is entirely likely that the DMA rings for one or more
drivers will overflow, since the kernel is not polling often enough. Once
that happens, pkts will be dropped by the interface itself (it has, after
all, no place to put them). Thus the kernel need not process them at all, and
they do not even cross the I/O bus.
The end result is an order of magnitude increase in the number of pkts a
Linux system can route. This work is clearly successful, and will likely show
up in 2.5 in some form.
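The quota and ring-overflow behavior can be sketched as a toy model. Nothing
here is the real driver interface: the class Interface, the function
poll_round, and the ring size and quota values are all assumptions chosen for
the example.

```python
from collections import deque


class Interface:
    """A NIC modelled as a bounded DMA ring plus a per-poll quota."""

    def __init__(self, name, ring_size, quota):
        self.name = name
        self.ring = deque()
        self.ring_size = ring_size
        self.quota = quota
        self.dropped = 0   # pkts discarded by the "hardware": ring was full

    def receive(self, count):
        """Pkts arrive from the wire; those that do not fit are dropped."""
        for _ in range(count):
            if len(self.ring) < self.ring_size:
                self.ring.append(self.name)
            else:
                self.dropped += 1

    def poll(self):
        """Hand at most `quota` pkts to the stack and return them."""
        taken = []
        while self.ring and len(taken) < self.quota:
            taken.append(self.ring.popleft())
        return taken


def poll_round(interfaces):
    """One pass of the stack over all interfaces, in order."""
    return {nic.name: len(nic.poll()) for nic in interfaces}


if __name__ == "__main__":
    busy = Interface("eth0", ring_size=64, quota=16)
    quiet = Interface("eth1", ring_size=64, quota=16)
    busy.receive(1000)   # flood: only 64 fit, the rest are dropped at the ring
    quiet.receive(5)
    print(poll_round([busy, quiet]), busy.dropped)
```

The flooded interface gets 16 pkts into the stack per round, the quiet one
still gets all 5 of its pkts through, and the 936 excess pkts cost the kernel
nothing because they never left the adaptor.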

source: http://lwn.net/2001/1004/kernel.php3 (October 2001)
The NAPI work is based on the techniques discussed before, but the work has
progressed since then. It has not, perhaps, received the degree of attention
that it should have, though this discussion has raised its profile somewhat.
Now it might become truly widely known...
NAPI works with modern network adaptors which implement a "ring" of DMA
buffers; each pkt, as it is received, is placed into the next buffer in the
ring. Normally, the processor is interrupted for each pkt, and the system is
expected to empty the pkt from the ring. The NAPI patch responds to the 1st
intrp by telling the adaptor to stop interrupting; it will then check the ring
occasionally as it processes pkts and pull new ones without the need for
further intrps.
People who have been on the net for a long time might appreciate this analogy:
back in the 1980's, many of us had our systems configured to beep (intrp) at
us every time an email message arrived. In 2001, beeping mail notifiers are
far less common. There's almost always new mail, so there's no need for the
system to be obnoxious about it. Similarly, on a loaded system, there will
always be new pkts to process, so there is no need for all those intrps.
When the networking code checks an interface and finds that no more pkts have
arrived, intrps are re-enabled and polling stops.
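The switch between intrp mode and polling mode can be sketched as a small
state machine. This is a model, not the NAPI driver interface: the class Nic,
its fields, and the budget of 16 pkts per poll are assumptions made for the
example.

```python
from collections import deque


class Nic:
    """Toy adaptor: raises an intrp on arrival only while intrps are enabled."""

    def __init__(self):
        self.ring = deque()
        self.intrps_enabled = True
        self.intrp_count = 0
        self.poll_pending = False

    def pkt_arrives(self, pkt):
        self.ring.append(pkt)
        if self.intrps_enabled:
            # First pkt of a burst: take one intrp, then switch to polling.
            self.intrp_count += 1
            self.intrps_enabled = False
            self.poll_pending = True

    def poll(self, budget):
        """Process up to `budget` pkts; re-enable intrps once the ring is empty."""
        done = 0
        while self.ring and done < budget:
            self.ring.popleft()
            done += 1
        if not self.ring:
            self.intrps_enabled = True
            self.poll_pending = False
        return done


if __name__ == "__main__":
    nic = Nic()
    for n in range(100):          # a burst of 100 pkts
        nic.pkt_arrives(n)
    while nic.poll_pending:
        nic.poll(16)
    print(nic.intrp_count)        # the whole burst cost a single intrp
```

A pkt that arrives after the ring has been drained finds intrps enabled again
and raises a fresh intrp, so a lightly loaded system keeps the old low-latency
behavior.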
NAPI takes things a little farther by eliminating the pkt backlog queue
currently maintained in the 2.4 network stack. Instead, the adaptor's DMA ring
becomes that queue. In this way, system memory is conserved, pkts are less
likely to be reordered, and, if the load requires that pkts be dropped, they
will be disposed of before ever being copied into the kernel.
NAPI requires some changes to the network driver interface, of course. The
changes have been designed to be incremental. Drivers which have not been
converted will continue to function as always (well, at least, as in 2.4.x),
but the higher performance enabled by NAPI will require modifications.