.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================

Driver Release Notes for Linux

This document describes the Linux driver for the Chelsio 10Gb Ethernet Network
Controller. This driver supports the Chelsio N210 NIC and is backward
compatible with the Chelsio N110 model 10Gb NICs.


Adaptive Interrupts (adaptive-rx)
---------------------------------

This feature provides an adaptive algorithm that adjusts the interrupt
coalescing parameters, allowing the driver to dynamically adapt the latency
settings to achieve the highest performance during various types of network
load.

The interface used to control this feature is ethtool. Please see the
ethtool manpage for additional usage information.

By default, adaptive-rx is disabled.
To enable adaptive-rx::

    ethtool -C <interface> adaptive-rx on

To disable adaptive-rx, use ethtool::

    ethtool -C <interface> adaptive-rx off

After disabling adaptive-rx, the timer latency value will be set to 50us.
You may set the timer latency after disabling adaptive-rx::

    ethtool -C <interface> rx-usecs <microseconds>

An example to set the timer latency value to 100us on eth0::

    ethtool -C eth0 rx-usecs 100

You may also provide a timer latency value while disabling adaptive-rx::

    ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

If adaptive-rx is disabled and a timer latency value is specified, the timer
will be set to the specified value until changed by the user or until
adaptive-rx is enabled.

To view the status of the adaptive-rx and timer latency values::

    ethtool -c <interface>

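The ethtool invocations above can be wrapped in a small helper. The following is a minimal sketch and not part of the driver: the function name ``coalesce_cmd`` is hypothetical, and it only prints the command line so that it can be reviewed before being run.

```shell
#!/bin/sh
# coalesce_cmd IFACE on|off [USECS]   (hypothetical helper name)
# Prints the ethtool command that would be run; it does not execute it.
coalesce_cmd() {
    iface=$1
    mode=$2
    usecs=${3:-}
    case $mode in
        on)
            echo "ethtool -C $iface adaptive-rx on"
            ;;
        off)
            if [ -n "$usecs" ]; then
                # Disable adaptive-rx and set the timer latency in one call.
                echo "ethtool -C $iface adaptive-rx off rx-usecs $usecs"
            else
                echo "ethtool -C $iface adaptive-rx off"
            fi
            ;;
        *)
            echo "usage: coalesce_cmd IFACE on|off [USECS]" >&2
            return 1
            ;;
    esac
}
```

For example, ``coalesce_cmd eth0 off 100`` prints the combined disable-and-set command shown earlier for eth0; the output can be piped to ``sh`` as root once it looks right.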

TCP Segmentation Offloading (TSO) Support
-----------------------------------------

This feature, also known as "large send", enables a system's protocol stack
to offload portions of outbound TCP processing to a network interface card,
thereby reducing system CPU utilization and enhancing performance.

The interface used to control this feature is ethtool version 1.8 or higher.
Please see the ethtool manpage for additional usage information.

By default, TSO is enabled.
To disable TSO::

    ethtool -K <interface> tso off

To enable TSO::

    ethtool -K <interface> tso on

To view the status of TSO::

    ethtool -k <interface>

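When the TSO state needs to be checked from a script, the ``ethtool -k`` output can be filtered. This is a sketch under an assumption: it expects a feature line of the form ``tcp-segmentation-offload: on`` (some older ethtool versions print spaces instead of hyphens), and ``tso_state`` is a hypothetical helper name.

```shell
# tso_state: read `ethtool -k <interface>` output on stdin and print the
# first word after the TSO feature label ("on" or "off").
# Assumption: the line reads "tcp-segmentation-offload: on|off", possibly
# with spaces instead of hyphens and a trailing note such as "[fixed]".
tso_state() {
    awk -F': *' '
        /^[ \t]*tcp[- ]segmentation[- ]offload:/ {
            split($2, words, " ")
            print words[1]
            exit
        }'
}
```

Typical use would be ``ethtool -k eth0 | tso_state``.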

The following information is provided as an example of how to change system
parameters for "performance tuning" and what values to use. You may or may not
want to change these system parameters, depending on your server/workstation
application. Doing so is not warranted in any way by Chelsio Communications,
and is done at "YOUR OWN RISK". Chelsio will not be held responsible for loss
of data or damage to equipment.

Your distribution may have a different way of doing things, or you may prefer
a different method. These commands are shown only to provide an example of
what to do and are by no means definitive.

Any of the following system changes will last only until you reboot
your system. You may want to write a script that runs at boot-up and
includes the optimal settings for your system.

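One possible shape for such a boot-up script is sketched below. It is only an illustration: ``emit_sysctl_cmds`` is a hypothetical helper, and the idea of keeping the settings in a ``key=value`` file is an assumption, not something the driver requires. The helper prints ``sysctl -w`` commands instead of running them, so the result can be inspected first.

```shell
#!/bin/sh
# emit_sysctl_cmds: read "key=value" lines on stdin, skip blank lines and
# lines starting with '#', and print one `sysctl -w` command per setting.
# Nothing is executed here.
emit_sysctl_cmds() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*) continue ;;
        esac
        key=${line%%=*}      # text before the first '='
        value=${line#*=}     # text after the first '='
        printf 'sysctl -w %s="%s"\n' "$key" "$value"
    done
}
```

A boot-time script could then run something like ``emit_sysctl_cmds < tuning.conf | sh``, where ``tuning.conf`` is a hypothetical file holding the values chosen for your system.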

Setting PCI Latency Timer::

Disabling TCP timestamp::

    sysctl -w net.ipv4.tcp_timestamps=0

Disabling SACK::

    sysctl -w net.ipv4.tcp_sack=0

Setting large number of incoming connection requests::

    sysctl -w net.ipv4.tcp_max_syn_backlog=3000

Setting maximum receive socket buffer size::

    sysctl -w net.core.rmem_max=1024000

Setting maximum send socket buffer size::

    sysctl -w net.core.wmem_max=1024000

Set smp_affinity (on a multiprocessor system) to a single CPU::

    echo 1 > /proc/irq/<interrupt_number>/smp_affinity

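The value written to ``smp_affinity`` is a hexadecimal CPU bitmask, so ``1`` selects CPU 0. To bind the interrupt to a different CPU, the mask can be computed as in this small sketch (``cpu_mask`` is a hypothetical helper name, and it only covers CPU numbers that fit in the shell's integer width):

```shell
# cpu_mask CPU: print the hexadecimal smp_affinity bitmask that selects only
# CPU number CPU (0-based). Bit N of the mask corresponds to CPU N.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}
```

For instance, ``cpu_mask 3`` prints ``8``, which could then be written to ``/proc/irq/<interrupt_number>/smp_affinity`` in place of ``1``.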

Setting default receive socket buffer size::

    sysctl -w net.core.rmem_default=524287

Setting default send socket buffer size::

    sysctl -w net.core.wmem_default=524287

Setting maximum option memory buffers::

    sysctl -w net.core.optmem_max=524287

Setting maximum backlog (# of unprocessed packets before kernel drops)::

    sysctl -w net.core.netdev_max_backlog=300000

Setting TCP read buffers (min/default/max)::

    sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

Setting TCP write buffers (min/default/max)::

    sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

Setting TCP buffer space (min/pressure/max)::

    sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"


TCP window size for single connections:

The receive buffer (RX_WINDOW) size must be at least as large as the
Bandwidth-Delay Product of the communication link between the sender and
receiver. Due to the variations of RTT, you may want to increase the buffer
size up to 2 times the Bandwidth-Delay Product. See page 289 of
"TCP/IP Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

At 10Gb speeds, use the following formula::

    RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
    Example for RTT with 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000

RX_WINDOW sizes of 256KB - 512KB should be sufficient.

Setting the min, default, and max receive buffer (RX_WINDOW) size::

    sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"

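The formula above can be evaluated with shell integer arithmetic. The sketch below uses a hypothetical helper, ``rx_window_bytes``, and takes the RTT in microseconds so that no fractional values are needed; it assumes the link rate is given in whole Gb/s.

```shell
# rx_window_bytes GBPS RTT_US: print the Bandwidth-Delay Product in bytes.
# bytes = (GBPS * 10^9 / 8) bytes per second * (RTT_US / 10^6) seconds
#       = GBPS * 125 * RTT_US
rx_window_bytes() {
    echo $(( $1 * 125 * $2 ))
}
```

``rx_window_bytes 10 100`` reproduces the worked example (10Gb, 100us RTT), and doubling the result gives the upper end of the range suggested above.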

TCP window size for multiple connections:

The receive buffer (RX_WINDOW) size may be calculated the same as single
connections, but should be divided by the number of connections. The
smaller window prevents congestion and facilitates better pacing,
especially if/when MAC level flow control does not work well or when it is
not supported on the machine. Experimentation may be necessary to attain
the correct value. This method is provided as a starting point for the
correct receive buffer size.

Setting the min, max, and default receive buffer (RX_WINDOW) size is
performed in the same manner as single connection.

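As a starting point for that experimentation, the per-connection window can be estimated by dividing the Bandwidth-Delay Product by the connection count. This is a rough sketch with a hypothetical helper name, ``per_conn_window``; it rounds down and ignores any minimum buffer size the kernel may enforce.

```shell
# per_conn_window GBPS RTT_US CONNS: Bandwidth-Delay Product in bytes
# (GBPS * 125 * RTT_US) divided evenly across CONNS connections, using
# integer division (rounded down).
per_conn_window() {
    echo $(( $1 * 125 * $2 / $3 ))
}
```

For example, ``per_conn_window 10 100 4`` splits the 125,000-byte window from the single-connection example across four connections.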

The following messages are the most common messages logged by syslog. These
may be found in /var/log/messages.

::

    Chelsio Network Driver - version 2.1.1
    eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit
    eth#: link is up at 10 Gbps, full duplex


These issues have been identified during testing. The following information
is provided as workarounds for the problems. In some cases, a problem is
inherent to Linux or to a particular Linux distribution and/or hardware.

1. Large number of TCP retransmits on a multiprocessor (SMP) system.

   On a system with multiple CPUs, the interrupt (IRQ) for the network
   controller may be bound to more than one CPU. This can cause TCP
   retransmits if packet data is split across different CPUs and
   reassembled in a different order than expected.

   To eliminate the TCP retransmits, set smp_affinity on the particular
   interrupt to a single CPU. You can locate the interrupt (IRQ) used on
   the N110/N210 by using ifconfig::

       ifconfig <dev_name> | grep Interrupt

   Set the smp_affinity to a single CPU::

       echo 1 > /proc/irq/<interrupt_number>/smp_affinity

   It is strongly recommended that you do not run the irqbalance daemon on
   your system, as this will change any smp_affinity setting you have
   applied. The irqbalance daemon runs on a 10 second interval and binds
   interrupts to the least loaded CPU determined by the daemon. To disable
   this daemon::

       chkconfig --level 2345 irqbalance off

   By default, some Linux distributions enable the kernel feature,
   irqbalance, which performs the same function as the daemon. To disable
   this feature, add the ``noirqbalance`` option to the kernel line of your
   bootloader configuration.

   Example using the Grub bootloader::

       title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
       kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
       initrd /initrd-2.4.21-27.ELsmp.img

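If the ifconfig output on your system does not include an Interrupt field, the IRQ can usually be read from ``/proc/interrupts`` instead. The sketch below assumes the interrupt line ends with the interface name (for example ``eth0``); drivers that register per-queue names would need a different match. ``irq_for_dev`` is a hypothetical helper name.

```shell
# irq_for_dev DEV: read /proc/interrupts-style text on stdin and print the
# IRQ number of every line whose last field is exactly DEV.
irq_for_dev() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }'
}
```

Typical use would be ``irq_for_dev eth0 < /proc/interrupts``, with the result substituted for ``<interrupt_number>`` in the smp_affinity path.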

2. After running insmod, the driver is loaded and the incorrect network
   interface is brought up without running ifup.

   When using 2.4.x kernels, including RHEL kernels, the Linux kernel
   invokes a script named "hotplug". This script is primarily used to
   automatically bring up USB devices when they are plugged in; however,
   the script also attempts to automatically bring up a network interface
   after loading the kernel module. The hotplug script does this by scanning
   the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
   for HWADDR=<mac_address>.

   If the hotplug script does not find the HWADDR within any of the
   ifcfg-eth# files, it will bring up the device with the next available
   interface name. If this interface is already configured for a different
   network card, your new interface will have an incorrect IP address and
   network settings.

   To solve this issue, you can add the HWADDR=<mac_address> key to the
   interface config file of your network controller.

   To disable this "hotplug" feature, you may add the driver (module name)
   to the "blacklist" file located in /etc/hotplug. However, it has been
   noted that this does not work for network devices because the net.agent
   script does not use the blacklist file. Instead, remove or rename the
   net.agent script located in /etc/hotplug to disable this feature.

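Before loading the driver, it can be useful to confirm that an interface config file already carries the HWADDR key. The following sketch is illustrative only: ``has_hwaddr`` is a hypothetical helper, and it assumes the key is written unquoted at the start of a line, as in ``HWADDR=<mac_address>``.

```shell
# has_hwaddr MAC: read an ifcfg-eth# file on stdin and return exit status 0
# if it contains an unquoted HWADDR= line for MAC (case-insensitive match),
# non-zero otherwise.
has_hwaddr() {
    grep -qi "^HWADDR=$1[[:space:]]*\$"
}
```

For example, ``has_hwaddr <mac_address> < /etc/sysconfig/network-scripts/ifcfg-eth2 || echo "HWADDR missing"`` would flag a file that still needs the key; the file name ``ifcfg-eth2`` is only an example.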

3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
   on an AMD Opteron system with HyperTransport PCI-X Tunnel chipset.

   If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
   chipset, you may experience the "133-MHz Mode Split Completion Data
   Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
   PCI-X bus.

   AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
   can provide stale data via split completion cycles to a PCI-X card that
   is operating at 133 Mhz", causing data corruption.

   AMD provides three workarounds for this problem; however, Chelsio
   recommends the first option for best performance with this bug:

     For 133MHz secondary bus operation, limit the transaction length and
     the number of outstanding transactions, via BIOS configuration
     programming of the PCI-X card, to the following:

       Data Length (bytes): 1k

       Total allowed outstanding transactions: 2

   Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
   section 56, "133-MHz Mode Split Completion Data Corruption" for more
   details on this bug and the workarounds suggested by AMD.

   It may be possible to work outside AMD's recommended PCI-X settings. Try
   increasing the Data Length to 2k bytes for increased performance. If you
   have issues with these settings, please revert to the "safe" settings
   and duplicate the problem before submitting a bug or asking for support.

   The default setting on most systems is 8 outstanding transactions
   and 2k bytes data length.


4. On multiprocessor systems, it has been noted that an application which
   is handling 10Gb networking can switch between CPUs, causing degraded
   and/or unstable performance.

   If running on an SMP system and taking performance measurements, it
   is suggested you either run the latest netperf-2.4.0+ or use a binding
   tool such as Tim Hockin's procstate utilities (runon)
   <http://www.hockin.org/~thockin/procstate/>.

   Binding netserver and netperf (or other applications) to particular
   CPUs can make a significant difference in performance measurements.
   You may need to experiment to find which CPU to bind the application to
   in order to achieve the best performance for your system.

   If you are developing an application designed for 10Gb networking,
   please keep in mind you may want to look at the ``sched_setaffinity``
   and ``sched_getaffinity`` system calls to bind your application.

   If you are just running user-space applications such as ftp, telnet,
   etc., you may want to try the runon tool provided by Tim Hockin's
   procstate utility. You could also try binding the interface to a
   particular CPU::

       runon 0 ifup eth0

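Where the runon tool is not available, the ``taskset`` utility from util-linux offers comparable CPU binding through its ``-c <cpu-list>`` option. This is an alternative that the original notes do not mention, so treat the sketch below as an assumption-based illustration; ``bind_cmd`` is a hypothetical helper that only prints the command line.

```shell
# bind_cmd CPU COMMAND [ARGS...]: print a taskset command line that would run
# COMMAND pinned to CPU number CPU. Nothing is executed.
bind_cmd() {
    cpu=$1
    shift
    echo "taskset -c $cpu $*"
}
```

For example, ``bind_cmd 0 netserver`` prints a command that would start netserver bound to CPU 0; repeating a measurement with different CPU numbers is one way to carry out the experimentation suggested in issue 4.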

If you have problems with the software or hardware, please contact our
customer support team via email at support@chelsio.com or check our website
at http://www.chelsio.com

-------------------------------------------------------------------------------

::

    Chelsio Communications
    http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.