2 .\" This file and its contents are supplied under the terms of the
3 .\" Common Development and Distribution License ("CDDL"), version 1.0.
4 .\" You may only use this file in accordance with the terms of version
7 .\" A full copy of the text of the CDDL should have accompanied this
8 .\" source. A copy of the CDDL is also available via the Internet at
9 .\" http://www.illumos.org/license/CDDL.
12 .\" Copyright 2016 Joyent, Inc.
22 .Nd MAC networking device driver overview
24 .In sys/mac_provider.h
31 framework provides a means for implementing high-performance networking
32 device drivers. It is the successor to the GLD interfaces and is
33 sometimes referred to as the GLDv3. The remainder of this manual
34 introduces the aspects of writing device drivers that leverage the MAC
35 framework. While both the GLDv3 and MAC framework refer to the same thing, in
36 this manual page we use the term the
38 to refer to the device driver interface.
40 MAC device drivers are character devices. They define the standard
45 entry points to initialize the module, as well as
51 The main interface with MAC is through a series of callbacks defined in
54 structure. These callbacks control all the aspects of the device. They
55 include sending data, getting and setting
56 properties, controlling MAC address filters, and managing
59 The MAC framework takes care of many aspects of the device driver's
60 management. A device that uses the MAC framework does not have to worry
61 about creating device nodes or implementing
65 routines. In addition, all of the work to interact with
67 is taken care of automatically and transparently.
68 .Ss Initializing MAC Support
69 For a device to be used in the framework, it must register with the
70 framework and take specific actions during
77 All device drivers have to define a
79 structure which is pointed to by a
81 structure and the corresponding NULL-terminated
85 structure should have a
87 structure defined for it; however, it does not need to implement any of
92 Normally, in a driver's
94 entry point, it passes its
98 To properly register with MAC, the driver must call
102 If for some reason the
104 function fails, then the driver must be removed by a call to
105 .Xr mac_fini_ops 9F .
107 Conversely, in the driver's
109 routine, it should call
111 after it successfully calls
113 For an example of how to use the
117 functions, see the examples section in
118 .Xr mac_init_ops 9F .
119 .Ss Registering with MAC
120 Every instance of a device should register separately with MAC.
121 To register with MAC, a driver must allocate a
123 structure, fill it in, and then call
124 .Xr mac_register 9F .
127 structure contains information about the device and all of the required
128 function pointers that will be used as callbacks by the framework.
130 These steps should all be taken during a device's
132 entry point. It is recommended that the driver perform this sequence of
133 steps after the device has finished its initialization of the chipset
134 and interrupts, though interrupts should not be enabled at that point.
137 it will start receiving callbacks from the MAC framework.
139 To allocate the registration structure, the driver should call
141 Device drivers should always pass the symbol
145 Upon successful completion, the driver will receive a
147 structure which it should fill in. The structure and its members are
149 .Xr mac_register 9S .
153 structure is not allocated as a part of the
155 structure. In general, device drivers declare this statically. See the
157 section for more information on how to fill it out.
159 Once the structure has been filled in, the driver should call
161 to register itself with MAC. The handle that it obtains when registering
162 should be stored in the driver's soft state. It will be used in various
163 other support functions and callbacks.
165 If the call is successful, then the device driver
166 should enable interrupts and finish any other initialization required.
169 failed, then it should unwind its initialization and should return
175 The MAC framework interacts with a device driver through a series of
176 callbacks. These callbacks are described in their individual manual
177 pages and the collection of callbacks is indicated in the
179 manual page. This section does not focus on the specific functions, but
180 rather on interactions between them and the rest of the device driver
183 A device driver should make no assumptions about when the various
184 callbacks will be called and whether or not they will be called
186 For example, a device driver may be asked to transmit data through a call to its
188 entry point while it is being asked to get a device property through a
192 As such, while some calls may be serialized to the device, such as setting
193 properties, the device driver should always presume that all of its data needs
194 to be protected with locks.
195 While the device is holding locks, it is safe for it to call the following MAC
197 .Bl -bullet -offset indent -compact
199 .Xr mac_hcksum_get 9F
201 .Xr mac_hcksum_set 9F
205 .Xr mac_maxsdu_update 9F
207 .Xr mac_prop_info_set_default_link_flowctrl 9F
209 .Xr mac_prop_info_set_default_str 9F
211 .Xr mac_prop_info_set_default_uint8 9F
213 .Xr mac_prop_info_set_default_uint32 9F
215 .Xr mac_prop_info_set_default_uint64 9F
217 .Xr mac_prop_info_set_perm 9F
219 .Xr mac_prop_info_set_range_uint32 9F
222 Any other MAC-related routines should not be called with locks held,
224 .Xr mac_link_update 9F
227 Other routines in the DDI may be called while locks are held; however,
228 device driver writers should be careful about calling blocking routines
229 while locks are held or in interrupt context, though it is generally
232 A device driver will often receive data by means of an
233 interrupt. When that interrupt occurs, the device driver will receive
234 one or more frames with optional metadata. Often each frame has a
235 corresponding descriptor which has information about whether or not
236 there were errors or whether or not the device successfully checksummed
239 During a single interrupt, a device driver should process a fixed number
240 of frames. For each frame the device driver should:
241 .Bl -enum -offset indent
243 First check whether or not the frame has errors.
244 If errors were detected, then the frame should not be sent to the operating
246 It is recommended that devices keep kstats (see
248 for more information) and bump the counter whenever such an error is
249 detected. If the device distinguishes between the types of errors, then
250 separate kstats for each class of error are recommended. See the
252 section for more information on the various error cases that should be
255 Once the frame has been determined to be valid, the device driver should
256 transform the frame into a
260 for more information on how to transform and prepare a message block.
262 If the device supports hardware checksumming (see the
264 section for more information on checksumming), then the device driver
265 should set the corresponding checksumming information with a call to
266 .Xr mac_hcksum_set 9F .
268 It should then append this new message block to the
270 of the message block chain, linking it to the
272 pointer. It is vitally important that all the frames be chained in the
273 order that they were received. If the device driver mistakenly reorders
274 frames, then it may degrade performance in the TCP stack and
275 potentially affect application correctness.
278 Once all the frames have been processed and assembled, the device driver
279 should deliver them to the rest of the operating system by calling
281 The device driver should try to give as many mblk_t structures to the
286 once for every assembled mblk_t.
288 The device driver must not hold any locks across the call to
290 When this function is called, received data will be pushed through the
291 networking stack and some replies may be generated and given to the
294 It is not the device driver's responsibility to determine whether or not
295 the system can keep up with a driver's delivery rate of frames. The rest
296 of the networking stack will handle issues related to keeping up
297 appropriately and ensure that kernel memory is not exhausted by packets
298 that are not being processed.
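The receive steps above can be modeled in userland C. This is a minimal sketch, not driver code: `sketch_mblk_t` and `sketch_rx_desc_t` are hypothetical stand-ins for mblk_t and a device receive descriptor, and the `cksum_flags` field stands in for the metadata that mac_hcksum_set(9F) would attach. It shows a fixed per-interrupt budget, errored frames being counted rather than delivered, and tail-appending through b_next so that arrival order is preserved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userland stand-in for mblk_t: only the linkage pointers
 * discussed in the text are modeled, plus a field standing in for the
 * checksum metadata that mac_hcksum_set(9F) would record.
 */
typedef struct sketch_mblk {
	struct sketch_mblk *b_next;	/* next message (next frame) */
	struct sketch_mblk *b_cont;	/* next block of the same frame */
	uint32_t cksum_flags;		/* stand-in for checksum metadata */
	int id;				/* hypothetical frame identifier */
} sketch_mblk_t;

/* Hypothetical receive descriptor as a device might report it. */
typedef struct {
	bool error;		/* hardware flagged the frame as bad */
	uint32_t hw_cksum;	/* checksum result reported by the device */
	sketch_mblk_t *mp;	/* frame already turned into a message */
} sketch_rx_desc_t;

/*
 * Process at most 'budget' descriptors starting at '*idx'. Good frames
 * are appended, in order, to a b_next chain; bad frames are counted in
 * '*ierrors' (a kstat in a real driver) and their buffers would be
 * recycled. Returns the head of the chain to hand to MAC.
 */
static sketch_mblk_t *
sketch_rx_ring(sketch_rx_desc_t *ring, size_t nring, size_t *idx,
    size_t budget, uint64_t *ierrors)
{
	sketch_mblk_t *head = NULL, **tailp = &head;

	for (size_t n = 0; n < budget && *idx < nring; n++, (*idx)++) {
		sketch_rx_desc_t *d = &ring[*idx];

		if (d->error) {
			(*ierrors)++;
			continue;
		}
		/* Checksum metadata goes on the first mblk of the frame. */
		d->mp->cksum_flags = d->hw_cksum;
		d->mp->b_next = NULL;
		*tailp = d->mp;		/* append at the tail: keeps order */
		tailp = &d->mp->b_next;
	}
	return (head);
}
```

In a real driver the returned chain would be passed to mac_rx(9F) in a single call, after any ring lock has been dropped, rather than once per frame.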
300 Finally, the device driver should make sure that any other housekeeping
301 activities required for the ring are taken care of such that more data
303 .Ss Transmitting Data and Back Pressure
304 A device driver will be asked to transmit a message block chain by
307 entry point called. While the driver is processing the message blocks,
308 it may run out of resources. For example, a transmit descriptor ring may
309 become full. At that point, the device driver should return the
310 remaining unprocessed frames. The act of returning frames indicates that
311 the device has asserted flow control.
312 Once this has been done, no additional calls will be made to the
313 driver's transmit entry point and the back pressure will be propagated
314 throughout the rest of the networking stack.
316 At some point in the future when resources have become available again,
317 for example after an interrupt indicating that some portion of the
318 transmit ring has been sent, then the device driver must notify the
319 system that it can continue transmission. To do this, the
321 .Xr mac_tx_update 9F .
322 After that point, the driver will receive calls to its
324 entry point again. As mentioned in the section on callbacks, the device
325 driver should avoid holding any particular locks across the call to
326 .Xr mac_tx_update 9F .
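The transmit and back-pressure contract can be sketched with a hypothetical ring that only counts free descriptors. Here `sketch_tx` plays the role of the mc_tx(9E) entry point and returns whatever it could not queue, while `sketch_tx_reclaim` stands in for the transmit-completion interrupt and reports when the driver should call mac_tx_update(9F). All names are illustrative assumptions, not part of the MAC API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mblk_t; only b_next is modeled. */
typedef struct sketch_mblk {
	struct sketch_mblk *b_next;
	int id;
} sketch_mblk_t;

/* Hypothetical transmit ring that only tracks free descriptors. */
typedef struct {
	size_t free_descs;
	bool blocked;	/* true once frames were handed back to MAC */
	size_t sent;
} sketch_tx_ring_t;

/*
 * Consume frames while descriptors remain. If the ring fills, return
 * the unprocessed remainder (asserting flow control) and remember it.
 */
static sketch_mblk_t *
sketch_tx(sketch_tx_ring_t *r, sketch_mblk_t *chain)
{
	while (chain != NULL) {
		if (r->free_descs == 0) {
			r->blocked = true;
			return (chain);	/* remaining frames go back to MAC */
		}
		sketch_mblk_t *next = chain->b_next;
		chain->b_next = NULL;
		r->free_descs--;	/* "program" one descriptor */
		r->sent++;
		chain = next;
	}
	return (NULL);	/* everything was accepted */
}

/*
 * Completion path: reclaim descriptors and return true when the driver
 * should call mac_tx_update(9F), which it must do with no locks held.
 */
static bool
sketch_tx_reclaim(sketch_tx_ring_t *r, size_t completed)
{
	r->free_descs += completed;
	if (r->blocked && r->free_descs > 0) {
		r->blocked = false;
		return (true);
	}
	return (false);
}
```

Remembering whether frames were returned lets the completion path avoid notifying MAC on every interrupt; that is a design choice of this sketch rather than a requirement stated in this manual.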
327 .Ss Interrupt Coalescing
328 For devices operating at higher data rates, interrupt coalescing is an
329 important part of a well-functioning device and may impact the
330 performance of the device. Not all devices support interrupt
331 coalescing. If interrupt coalescing is supported on the device, it is
332 recommended that device driver writers provide private properties for
333 their device to control the interrupt coalescing rate. This will make it
334 much easier to perform experiments and observe the impact of different
335 interrupt rates on the rest of the system.
336 .Ss MAC Address Filter Management
337 The MAC framework will attempt to use as many MAC address filters as a
338 device has. To program a multicast address filter, the driver's
340 entry point will be called. If the device driver runs out of filters, it
341 should not take any special action and just return the appropriate error
342 as documented in the corresponding manual pages for the entry points.
343 The framework will ensure that the device is placed in promiscuous mode
346 It is the responsibility of the device driver to keep track of the
347 data link's state. Many devices provide a means of receiving an
348 interrupt when the state of the link changes. When such a change
349 happens, the driver should update its internal data structures and then
351 .Xr mac_link_update 9F
352 to inform the MAC layer that this has occurred. If the device driver
353 does not properly inform the system about link changes, then various
354 features like link aggregations and other mechanisms that leverage the
355 link state will not work correctly.
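A minimal sketch of link-state tracking, assuming a hypothetical link-change interrupt that reports only "up" or "down". The enum values are placeholders for the real link_state_t, and `sketch_link_update` is a hypothetical stand-in for mac_link_update(9F). The driver updates its cached state first and notifies MAC afterwards, outside the (elided) soft-state lock; skipping the notification when nothing changed is this sketch's own choice.

```c
#include <stdbool.h>

/* Placeholder stand-ins for link_state_t values. */
typedef enum {
	SKETCH_LINK_STATE_UNKNOWN = -1,
	SKETCH_LINK_STATE_DOWN = 0,
	SKETCH_LINK_STATE_UP = 1
} sketch_link_state_t;

/* Hypothetical per-instance soft state. */
typedef struct {
	sketch_link_state_t link_state;	/* driver's cached link state */
	unsigned notifications;		/* how often MAC was informed */
} sketch_softc_t;

/* Hypothetical stand-in for mac_link_update(9F). */
static void
sketch_link_update(sketch_softc_t *sc, sketch_link_state_t st)
{
	(void) st;
	sc->notifications++;
}

/* Link-status-change handler. */
static void
sketch_link_intr(sketch_softc_t *sc, bool hw_link_up)
{
	sketch_link_state_t nstate = hw_link_up ?
	    SKETCH_LINK_STATE_UP : SKETCH_LINK_STATE_DOWN;
	bool changed;

	/* A real driver would take its soft-state lock here... */
	changed = (sc->link_state != nstate);
	sc->link_state = nstate;
	/* ...and drop it here, before calling into MAC. */

	if (changed)
		sketch_link_update(sc, nstate);
}
```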
356 .Ss Link Speed and Auto-negotiation
357 Many networking devices support more than one speed at which they
358 can operate. The selection of a speed is often performed through
359 .Em auto-negotiation ,
360 though some devices allow the user to control what speeds are advertised
363 Logically, there are two different sets of things that the device driver
364 needs to keep track of while it's operating:
367 The supported speeds in hardware.
369 The enabled speeds from the user.
372 By default, when a link first comes up, the device driver should
373 generally configure the link to support the common set of speeds and
374 perform auto-negotiation.
376 A user can control what speeds a device advertises via auto-negotiation
377 and whether or not it performs auto-negotiation at all by using a series
378 of properties that have
380 in the name. These are read/write properties and there is one for each
381 speed supported in the operating system. For a full list of them, see
386 In addition to these properties, there is a corresponding set of
389 in the name. These are similar to the
391 family of properties, but they are read-only and indicate what the
392 device has actually negotiated. While they are generally similar to the
394 family of properties, they may change depending on power settings. See
396 .Sy Ethernet Link Properties
399 for more information.
401 It's worth discussing how these different values get used throughout the
402 different entry points. The first entry point to consider is the
404 entry point. For a given speed, the driver should check whether or not
405 the hardware supports this speed. If it does, it should fill in the
406 default value that the hardware takes and whether or not the property is
407 writable. The properties should also be updated to indicate whether or
408 not it is writable. This holds for both the
412 family of properties.
414 The next entry point is
416 Here, the device should first check whether the given speed is
417 supported. If it is not, then the driver should return
419 If it is, then it should return the current value of the property.
421 The last property entry point is the
423 entry point. Here, the same logic applies. Before the driver considers
424 whether or not the property is writable, it should first check whether
425 or not it's a supported property. If it's not, then it should return
427 Otherwise, it should proceed to check whether the property is writable,
428 and if it is writable and the new value is valid, then it should update
429 the property and restart the link's negotiation.
431 Finally, there is the
433 entry point. Several of the statistics that are queried relate to
434 auto-negotiation and hardware capabilities. When a statistic relates to
435 the hardware supporting a given speed, the
437 properties should be ignored. The only thing that should be consulted is
438 what the hardware itself supports. Otherwise, the statistics should look
439 at what is currently being advertised by the device.
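The logic above can be sketched with one bit per speed in three masks: what the hardware supports, what the user enabled, and what is advertised. Everything here is hypothetical: the `SPD_*` bits replace real MAC_PROP_* identifiers, `can_toggle` models hardware that cannot change its advertisement, and the error numbers (ENOTSUP, EPERM, EINVAL) are assumptions, so consult the entry-point manual pages for the exact values. The speed helper follows the MAC_PROP_SPEED convention of bits per second.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit per speed; a real driver keys off MAC_PROP_* ids. */
#define	SPD_100FDX	0x1u
#define	SPD_1000FDX	0x2u
#define	SPD_10GFDX	0x4u

typedef struct {
	uint32_t hw_supported;	/* speeds the hardware can run at */
	uint32_t en;		/* speeds the user enabled (EN_ properties) */
	uint32_t adv;		/* speeds being advertised (ADV_ properties) */
	bool can_toggle;	/* hypothetical: advertisement is changeable */
	unsigned renegotiations;	/* link resets triggered by setprop */
} sketch_speeds_t;

/* getprop for an EN_ property: unsupported speeds are rejected first. */
static int
sketch_get_en(const sketch_speeds_t *s, uint32_t bit, uint8_t *val)
{
	if ((s->hw_supported & bit) == 0)
		return (ENOTSUP);	/* assumed error number */
	*val = (s->en & bit) != 0;
	return (0);
}

/* setprop: supported? writable? valid? then update and renegotiate. */
static int
sketch_set_en(sketch_speeds_t *s, uint32_t bit, uint8_t val)
{
	if ((s->hw_supported & bit) == 0)
		return (ENOTSUP);	/* assumed error number */
	if (!s->can_toggle)
		return (EPERM);		/* assumed: read-only on this hardware */
	if (val > 1)
		return (EINVAL);	/* assumed: only 0 or 1 are valid */
	if (val)
		s->en |= bit;
	else
		s->en &= ~bit;
	s->adv = s->en;		/* simplification: reprogram the PHY here */
	s->renegotiations++;	/* ...and restart auto-negotiation */
	return (0);
}

/* getstat: "hardware supports" statistics ignore the EN_ settings. */
static bool
sketch_stat_cap(const sketch_speeds_t *s, uint32_t bit)
{
	return ((s->hw_supported & bit) != 0);
}

/* getstat: advertisement statistics reflect what is advertised now. */
static bool
sketch_stat_adv(const sketch_speeds_t *s, uint32_t bit)
{
	return ((s->adv & bit) != 0);
}

/* MAC_PROP_SPEED is in bits per second: 100 Mbit/s is 100000000. */
static uint64_t
sketch_mbps_to_bps(uint64_t mbps)
{
	return (mbps * 1000000ULL);
}
```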
440 .Ss Unregistering from MAC
443 routine, it should unregister the device instance from MAC by calling
444 .Xr mac_unregister 9F
445 on the handle that it obtained when it registered. If the call to
446 .Xr mac_unregister 9F
447 failed, then the device is likely still in use and the driver should
450 .Ss Interacting with Devices
451 Administrators always interact with devices through the
453 command line interface. The state of devices such as whether the link is
458 various link properties such as the
465 are all exposed. It is also the preferred way that these properties are
468 While device tunables may be presented in a
470 file, it is recommended instead to expose such things through
472 private properties, whether explicitly documented or not.
474 Capabilities in the MAC framework are optional features that a device
475 may support; they indicate which hardware features the device
476 provides. The two capabilities that the system currently supports are
477 related to the hardware's ability to perform large send offload (LSO),
478 often also known as TCP segmentation offload, and its ability to
479 calculate and verify the checksums present in IPv4, IPv6, and protocol
480 headers such as TCP and UDP.
482 The MAC framework will query a device for support of a capability
485 function. Each capability has its own constant and may have
486 corresponding data that goes along with it and a specific structure that
487 the device is required to fill in. Note, the set of capabilities changes
488 over time and there are also private capabilities in the system. Several
489 of the capabilities are used in the implementation of the MAC framework.
491 .Sy MAC_CAPAB_RINGS ,
492 represent features that have not been stabilized and thus both API and
493 binary compatibility for them is not guaranteed. It is important that
494 the device driver handles unknown capabilities correctly. For more
498 The following capabilities are
499 stable and defined in the system:
503 capability indicates to the system that the device driver supports some
504 amount of checksumming. The specific data for this capability is a
507 To indicate no support for any kind of checksumming, the driver should
508 either set this value to zero or simply return that it doesn't support
511 Note, the values that the driver declares in this capability indicate
512 what it can do when it transmits data. If the driver can only
513 verify checksums when receiving data, then it should not indicate that
514 it supports this capability. The following set of flags may be combined
515 through a bitwise inclusive OR:
517 .It Sy HCKSUM_INET_PARTIAL
518 This indicates that the hardware can calculate a partial checksum for
519 both IPv4 and IPv6; however, it requires the pseudo-header checksum be
520 calculated for it. The pseudo-header checksum will be available for the
522 .Xr mac_hcksum_get 9F .
523 Note this does not imply that the hardware is capable of calculating the
524 IPv4 header checksum. That should be indicated with the
525 .Sy HCKSUM_IPHDRCKSUM No flag .
526 .It Sy HCKSUM_INET_FULL_V4
527 This indicates that the hardware will fully calculate the L4 checksum
528 for outgoing IPv4 packets and does not require a pseudo-header checksum.
529 Note this does not imply that the hardware is capable of calculating the
530 IPv4 header checksum. That should be indicated with the
531 .Sy HCKSUM_IPHDRCKSUM No flag .
532 .It Sy HCKSUM_INET_FULL_V6
533 This indicates that the hardware will fully calculate the L4 checksum
534 for outgoing IPv6 packets and does not require a pseudo-header checksum.
535 .It Sy HCKSUM_IPHDRCKSUM
536 This indicates that the hardware supports calculating the checksum for
537 the IPv4 header itself.
540 When in a driver's transmit function, the driver will be processing a
541 single frame. It should call
542 .Xr mac_hcksum_get 9F
543 to see what checksum flags are set on it. Note that the flags that are
544 set on it are different from the ones described above and are documented
545 in its manual page. These flags indicate how the driver is expected to
546 program the hardware and what checksumming is required. Not all frames
547 will require hardware checksumming or will ask the hardware to checksum
550 If a driver supports offloading the receive checksum and verification,
551 it should check to see what the hardware indicated was verified. The
552 driver should then call
553 .Xr mac_hcksum_set 9F .
554 The flags used are different from the ones above and are discussed in
556 .Xr mac_hcksum_set 9F
557 manual page. If there is no checksum information available or the driver
558 does not support checksumming, then it should simply not call
559 .Xr mac_hcksum_set 9F .
561 Note that the checksum flags should be set on the first
562 mblk_t that makes up a given message. In other words, if multiple
563 mblk_t structures are linked together by the
565 member to describe a single frame, then the flags should only be set on the
566 first mblk_t of that set. However, each distinct message should have the
567 checksum bits set on it, if applicable. In other words, each mblk_t that
568 is linked together by the
570 pointer may have checksum flags set.
572 It is recommended that device drivers provide a private property or
574 property to control whether or not checksumming is enabled for both rx
575 and tx; however, the default disposition is recommended to be enabled
576 for both. This way, if hardware bugs are found in the checksumming
577 implementation, checksumming can be disabled without requiring software updates.
578 The transmit property should be checked when determining how to reply to
580 and the receive property should be checked in the context of the receive
585 capability indicates that the driver supports various forms of large
586 send offload (LSO). The private data is a pointer to a
588 structure. At the moment, LSO support is limited to TCP inside of IPv4.
589 This structure has the following members which are used to indicate
590 various types of LSO support.
591 .Bd -literal -offset indent
592 t_uscalar_t lso_flags;
593 lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
598 member is used to indicate which members are valid and should be
599 considered. Each flag represents a different form of LSO. The member
600 should be set to the bitwise inclusive OR of the following values:
601 .Bl -tag -width Dv -offset indent
602 .It Sy LSO_TX_BASIC_TCP_IPV4
603 This indicates hardware support for performing TCP segmentation
604 offloading over IPv4. When this flag is set, the
605 .Sy lso_basic_tcp_ipv4
606 member must be filled in.
610 .Sy lso_basic_tcp_ipv4
611 member is a structure with the following members:
612 .Bd -literal -offset indent
615 .Bd -filled -offset indent
618 member should be set to the maximum size of the TCP data
619 payload that can be offloaded to the hardware.
622 As with checksumming, it is recommended that driver writers provide a
623 means for disabling LSO support even if it is enabled by default.
624 This allows problems that surface with LSO to be worked around
625 without requiring additional driver work.
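The capability rules above can be sketched as a getcapab-style switch. This is a userland model under stated assumptions: the `SKETCH_*` constants, `sketch_capab_lso_t`, and the soft-state switches are hypothetical placeholders for the real MAC_CAPAB_*, HCKSUM_*, and LSO_* definitions in the system headers, whose numeric values differ. It shows transmit checksum flags combined with a bitwise OR, LSO reporting a maximum payload, administrator switches that turn an offload off, and unknown capabilities being declined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values; the real constants come from system headers. */
typedef enum {
	SKETCH_CAPAB_HCKSUM = 1,
	SKETCH_CAPAB_LSO,
	SKETCH_CAPAB_UNKNOWN	/* any capability the driver does not know */
} sketch_capab_t;

#define	SKETCH_HCKSUM_INET_FULL_V4	0x01u
#define	SKETCH_HCKSUM_INET_FULL_V6	0x02u
#define	SKETCH_HCKSUM_IPHDRCKSUM	0x04u
#define	SKETCH_LSO_TX_BASIC_TCP_IPV4	0x01u

/* Hypothetical mirror of the LSO structure described above. */
typedef struct {
	uint32_t lso_flags;
	struct {
		uint32_t lso_max;
	} lso_basic_tcp_ipv4;
} sketch_capab_lso_t;

/* Hypothetical soft state with administrator on/off switches. */
typedef struct {
	bool tx_hcksum_enable;
	bool lso_enable;
	uint32_t hw_lso_max;	/* largest TCP payload the device accepts */
} sketch_softc_t;

/* Return true and fill in 'data' if supported; otherwise false. */
static bool
sketch_getcapab(sketch_softc_t *sc, sketch_capab_t cap, void *data)
{
	switch (cap) {
	case SKETCH_CAPAB_HCKSUM: {
		uint32_t *flags = data;

		if (!sc->tx_hcksum_enable)
			return (false);	/* disabled: report no support */
		*flags = SKETCH_HCKSUM_INET_FULL_V4 |
		    SKETCH_HCKSUM_INET_FULL_V6 | SKETCH_HCKSUM_IPHDRCKSUM;
		return (true);
	}
	case SKETCH_CAPAB_LSO: {
		sketch_capab_lso_t *lso = data;

		if (!sc->lso_enable)
			return (false);
		lso->lso_flags = SKETCH_LSO_TX_BASIC_TCP_IPV4;
		lso->lso_basic_tcp_ipv4.lso_max = sc->hw_lso_max;
		return (true);
	}
	default:
		/* Never claim support for a capability we do not know. */
		return (false);
	}
}
```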
627 Properties in the MAC framework represent aspects of a link. These
628 include things like the link's current state and MTU. Many of the
629 properties in the system are focused on auto-negotiation and
630 controlling what link speeds are advertised. Information about
631 properties is covered by three different device entry points. The
633 entry point obtains metadata about the property. The
635 entry point obtains the property. The
637 entry point updates the property to a new value.
639 Many of the properties listed below are read-only. Each property
640 indicates whether it's read-only or it's read/write. However, driver
641 writers may not implement the ability to set all writable properties.
642 Many of these depend on the card itself. In particular, all properties
643 that relate to auto-negotiation and are read/write may not be updated
644 if the hardware in question does not support toggling what link speeds
645 are auto-negotiated. While copper Ethernet often does not have this
646 restriction, it often exists with various fiber standards and PHYs.
648 The following properties are the subset of MAC framework properties that
649 driver writers should be aware of and handle. While other properties
650 exist in the system, driver writers should always return an error when a
651 property not listed below is encountered. See
655 for more information on how to handle them.
657 .It Sy MAC_PROP_DUPLEX
667 property is used to indicate whether or not the link is duplex. A duplex
668 link may have traffic flowing in both directions at the same time. The
670 is an enumeration which may be set to any of the following values:
672 .It Sy LINK_DUPLEX_UNKNOWN
673 The current state of the link is unknown. This may be because the link
674 has not negotiated to a specific speed or it is down.
675 .It Sy LINK_DUPLEX_HALF
676 The link is running at half duplex. Communication may travel in only one
677 direction on the link at a given time.
678 .It Sy LINK_DUPLEX_FULL
679 The link is running at full duplex. Communication may travel in both
680 directions on the link simultaneously.
682 .It Sy MAC_PROP_SPEED
692 property stores the current link speed in bits per second. A link
693 that is running at 100 Mbit/s would store the value 100000000ULL. A link
694 that is running at 40 Gbit/s would store the value 40000000000ULL.
695 .It Sy MAC_PROP_STATUS
705 property is used to indicate the current state of the link. It indicates
706 whether the link is up or down. The
708 is an enumeration which may be set to any of the following values:
710 .It Sy LINK_STATE_UNKNOWN
711 The current state of the link is unknown. This may be because the
714 endpoint has not been called so it has not attempted to start the link.
715 .It Sy LINK_STATE_DOWN
716 The link is down. This may be because of a negotiation problem, a cable
717 problem, or some other device specific issue.
719 The link is up. If auto-negotiation is in use, it should have completed.
720 Traffic should be able to flow over the link, barring other issues.
722 .It Sy MAC_PROP_AUTONEG
732 property indicates whether or not the device is currently configured to
733 perform auto-negotiation. A value of
735 indicates that auto-negotiation is disabled. A
737 value indicates that auto-negotiation is enabled. Devices should
738 generally default to enabling auto-negotiation.
740 When getting this property, the device driver should return the current
741 state. When setting this property, if the device supports operating in
742 the requested mode, then the device driver should reset the link to
743 negotiate to the new speed after updating any internal registers.
754 property determines the maximum transmission unit (MTU). This indicates
755 the maximum size packet that the device can transmit, ignoring its own
756 headers. For an Ethernet device, this would exclude the size of the
757 Ethernet header and any VLAN headers that would be placed. It is up to
758 the driver to ensure that any MTU value that it accepts, once its margin
759 and header sizes are added, does not exceed its maximum frame size.
761 By default, drivers for Ethernet should initialize this value and the
764 When getting this property, the driver should return its current
765 recorded MTU. When setting this property, the driver should first
766 validate that it is within the device's valid range and then it must
768 .Xr mac_maxsdu_update 9F .
769 Note that the call may fail. If the call completes successfully, the
770 driver should update the hardware with the new value of the MTU and
771 perform any other work needed to handle it.
773 If the device does not support changing the MTU after the device's
775 entry point has been called, then driver writers should return
777 .It Sy MAC_PROP_FLOWCTRL
780 .Sy link_flowctrl_t |
786 .Sy MAC_PROP_FLOWCTRL
787 property manages the configuration of pause frames as part of Ethernet
788 flow control. Note, this only describes what this device will advertise.
789 What is actually enabled may be different and is subject to the rules of
790 auto-negotiation. The
792 is an enumeration that may be set to one of the following values:
794 .It Sy LINK_FLOWCTRL_NONE
795 Flow control is disabled. No pause frames should be generated or
797 .It Sy LINK_FLOWCTRL_RX
798 The device can receive pause frames; however, it should not generate
800 .It Sy LINK_FLOWCTRL_TX
801 The device can generate pause frames; however, it does not support
803 .It Sy LINK_FLOWCTRL_BI
804 The device supports both sending and receiving pause frames.
807 When getting this property, the device driver should return the way that
808 it has configured the device, not what the device has actually
809 negotiated. When setting the property, it should update the hardware and
810 allow the link to potentially perform auto-negotiation again.
813 The remaining properties are all about various auto-negotiation link
814 speeds. They fall into two different buckets: properties with
816 in the name and properties with
818 in the name. For any given supported speed, there is one of each. The
820 set of properties are read/write properties that control what should be
821 advertised by the device. When these are retrieved, they should return
822 the current value of the property. When they are set, they should change
823 how the hardware advertises the specific speed and trigger any kind of
824 link reset and auto-negotiation, if enabled, to occur.
828 set of properties are read-only properties. They are meant to reflect
829 what has actually been negotiated. These may be different from the
831 family of properties, especially when different power management
832 settings are at play.
835 .Sx Link Speed and Auto-negotiation
836 section for more information.
838 The properties are ordered in increasing link speed:
840 .It Sy MAC_PROP_ADV_10HDX_CAP
849 .Sy MAC_PROP_ADV_10HDX_CAP
850 property describes whether or not 10 Mbit/s half-duplex support is
852 .It Sy MAC_PROP_EN_10HDX_CAP
861 .Sy MAC_PROP_EN_10HDX_CAP
862 property describes whether or not 10 Mbit/s half-duplex support is
864 .It Sy MAC_PROP_ADV_10FDX_CAP
873 .Sy MAC_PROP_ADV_10FDX_CAP
874 property describes whether or not 10 Mbit/s full-duplex support is
876 .It Sy MAC_PROP_EN_10FDX_CAP
885 .Sy MAC_PROP_EN_10FDX_CAP
886 property describes whether or not 10 Mbit/s full-duplex support is
888 .It Sy MAC_PROP_ADV_100HDX_CAP
897 .Sy MAC_PROP_ADV_100HDX_CAP
898 property describes whether or not 100 Mbit/s half-duplex support is
900 .It Sy MAC_PROP_EN_100HDX_CAP
909 .Sy MAC_PROP_EN_100HDX_CAP
910 property describes whether or not 100 Mbit/s half-duplex support is
912 .It Sy MAC_PROP_ADV_100FDX_CAP
921 .Sy MAC_PROP_ADV_100FDX_CAP
922 property describes whether or not 100 Mbit/s full-duplex support is
924 .It Sy MAC_PROP_EN_100FDX_CAP
933 .Sy MAC_PROP_EN_100FDX_CAP
934 property describes whether or not 100 Mbit/s full-duplex support is
936 .It Sy MAC_PROP_ADV_100T4_CAP
945 .Sy MAC_PROP_ADV_100T4_CAP
946 property describes whether or not 100 Mbit/s Ethernet using the
947 100BASE-T4 standard is
949 .It Sy MAC_PROP_EN_100T4_CAP
958 .Sy MAC_PROP_EN_100T4_CAP
959 property describes whether or not 100 Mbit/s Ethernet using the
960 100BASE-T4 standard is
962 .It Sy MAC_PROP_ADV_1000HDX_CAP
971 .Sy MAC_PROP_ADV_1000HDX_CAP
972 property describes whether or not 1 Gbit/s half-duplex support is
974 .It Sy MAC_PROP_EN_1000HDX_CAP
983 .Sy MAC_PROP_EN_1000HDX_CAP
984 property describes whether or not 1 Gbit/s half-duplex support is
986 .It Sy MAC_PROP_ADV_1000FDX_CAP
995 .Sy MAC_PROP_ADV_1000FDX_CAP
996 property describes whether or not 1 Gbit/s full-duplex support is
998 .It Sy MAC_PROP_EN_1000FDX_CAP
1007 .Sy MAC_PROP_EN_1000FDX_CAP
1008 property describes whether or not 1 Gbit/s full-duplex support is
1010 .It Sy MAC_PROP_ADV_2500FDX_CAP
1011 .Bd -filled -compact
1019 .Sy MAC_PROP_ADV_2500FDX_CAP
1020 property describes whether or not 2.5 Gbit/s full-duplex support is
1022 .It Sy MAC_PROP_EN_2500FDX_CAP
1023 .Bd -filled -compact
1031 .Sy MAC_PROP_EN_2500FDX_CAP
1032 property describes whether or not 2.5 Gbit/s full-duplex support is
1034 .It Sy MAC_PROP_ADV_5000FDX_CAP
1035 .Bd -filled -compact
1043 .Sy MAC_PROP_ADV_5000FDX_CAP
1044 property describes whether or not 5.0 Gbit/s full-duplex support is
1046 .It Sy MAC_PROP_EN_5000FDX_CAP
1047 .Bd -filled -compact
1055 .Sy MAC_PROP_EN_5000FDX_CAP
1056 property describes whether or not 5.0 Gbit/s full-duplex support is
1058 .It Sy MAC_PROP_ADV_10GFDX_CAP
1059 .Bd -filled -compact
1067 .Sy MAC_PROP_ADV_10GFDX_CAP
1068 property describes whether or not 10 Gbit/s full-duplex support is
1070 .It Sy MAC_PROP_EN_10GFDX_CAP
1071 .Bd -filled -compact
1079 .Sy MAC_PROP_EN_10GFDX_CAP
1080 property describes whether or not 10 Gbit/s full-duplex support is
1082 .It Sy MAC_PROP_ADV_40GFDX_CAP
1083 .Bd -filled -compact
1091 .Sy MAC_PROP_ADV_40GFDX_CAP
1092 property describes whether or not 40 Gbit/s full-duplex support is
1094 .It Sy MAC_PROP_EN_40GFDX_CAP
1095 .Bd -filled -compact
1103 .Sy MAC_PROP_EN_40GFDX_CAP
1104 property describes whether or not 40 Gbit/s full-duplex support is
1106 .It Sy MAC_PROP_ADV_100GFDX_CAP
1107 .Bd -filled -compact
1115 .Sy MAC_PROP_ADV_100GFDX_CAP
1116 property describes whether or not 100 Gbit/s full-duplex support is
1118 .It Sy MAC_PROP_EN_100GFDX_CAP
1119 .Bd -filled -compact
1127 .Sy MAC_PROP_EN_100GFDX_CAP
1128 property describes whether or not 100 Gbit/s full-duplex support is
1131 .Ss Private Properties
1132 In addition to the defined properties above, drivers are allowed to
1133 define private properties. These private properties are device-specific
1134 properties. All private properties share the same constant,
1135 .Sy MAC_PROP_PRIVATE .
1136 Properties are distinguished by a name, which is a character string. The
1137 list of such private properties is defined when registering with mac in
1144 The driver may define whatever semantics it wants for these private
1145 properties. They will not be listed when running
1147 unless explicitly requested by name. All such properties should start
1148 with a leading underscore character and then consist of alphanumeric
1149 ASCII characters and additional underscores or hyphens.
1152 .Sy MAC_PROP_PRIVATE
1153 may show up in all three property-related entry points:
1154 .Xr mc_propinfo 9E ,
1158 Device drivers should tell the different properties apart by using the
1160 function to compare the name against the set of properties that it knows about.
1161 When encountering a property name that it does not recognize, it should treat
1162 it like any other unknown property.
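.Pp
A minimal sketch of that comparison for a set request follows. It is
plain user-space C so that it compiles on its own, and everything in it
is hypothetical: the xx_ prefix, the private property names
_intr_throttle and _rx_ring_size, and the xx_set_priv_prop() helper. It
also assumes that the value arrives as a NUL-terminated string and that
ENOTSUP is the appropriate result for a name the driver does not
recognize.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-instance soft state for an imaginary "xx" driver. */
typedef struct xx_state {
	unsigned int	xx_intr_throttle;
	unsigned int	xx_rx_ring_size;
} xx_state_t;

/*
 * Hypothetical handler for a MAC_PROP_PRIVATE set request. The name is
 * compared against each private property the driver knows about; any
 * other name is treated like any other unknown property.
 */
int
xx_set_priv_prop(xx_state_t *xx, const char *name, const char *val)
{
	unsigned int *target;
	unsigned long v;
	char *end;

	if (strcmp(name, "_intr_throttle") == 0)
		target = &xx->xx_intr_throttle;
	else if (strcmp(name, "_rx_ring_size") == 0)
		target = &xx->xx_rx_ring_size;
	else
		return (ENOTSUP);	/* unknown name: not ours */

	/* Reject empty strings, trailing characters, and out-of-range values. */
	errno = 0;
	v = strtoul(val, &end, 10);
	if (end == val || *end != 0 || errno != 0 || v > UINT_MAX)
		return (EINVAL);

	*target = (unsigned int)v;
	return (0);
}
```

A get request would follow the same chain of strcmp() calls and fall
through to the same unknown-property result.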
1164 The MAC framework defines a couple of different sets of statistics which
1165 are based on various standards for devices to implement. Statistics are
1166 retrieved through the
1168 entry point. Some statistics are required for all devices, and there is
1169 a separate set of Ethernet-specific statistics. Not
1170 all devices will support every statistic. In many cases, several device
1171 registers will need to be combined to create the proper stat.
1173 In general, if the device is not keeping track of these statistics, then
1174 it is recommended that the driver store these values as a
1176 to ensure that overflow does not occur.
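.Pp
As an illustration, the sketch below assumes a hypothetical device whose
octet counter is a free-running 32-bit register that wraps rather than
saturates. The driver folds each reading into a 64-bit software total.
Because the delta is computed with unsigned 32-bit arithmetic, the total
stays correct across a wrap, provided the register wraps at most once
between readings. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 64-bit soft copy of one hardware statistic. */
typedef struct xx_stat64 {
	uint64_t	xs_total;	/* value reported to the MAC framework */
	uint32_t	xs_last_hw;	/* previous raw register reading */
} xx_stat64_t;

/*
 * Fold a new reading of a free-running 32-bit counter into the 64-bit
 * total. Unsigned arithmetic gives the right delta even if the register
 * wrapped once since the previous reading.
 */
void
xx_stat64_update(xx_stat64_t *s, uint32_t hw)
{
	s->xs_total += (uint32_t)(hw - s->xs_last_hw);
	s->xs_last_hw = hw;
}
```

If a device's registers instead clear on read, each reading would simply
be added to the 64-bit total.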
1178 If a device does not support a specific statistic, then it is fine to
1179 return that it is not supported. The same return value should be used for
1180 unrecognized statistics. See
1182 for more information on the proper way to handle these.
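.Pp
A sketch of that pattern, loosely modelled on a driver's
.Xr mc_getstat 9E
entry point, follows. The enumeration is a hypothetical stand-in for the
real MAC_STAT_* and ETHER_STAT_* constants, and the function is ordinary
user-space C rather than the real entry point. A statistic that the
imaginary device does not track and a value the driver has never heard
of both fall through to the same ENOTSUP return, which is assumed here
to be the "not supported" result.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few MAC_STAT_* / ETHER_STAT_* values. */
typedef enum xx_stat {
	XX_STAT_IFSPEED,
	XX_STAT_RBYTES,
	XX_STAT_OBYTES,
	XX_STAT_SQE_ERRORS	/* not tracked by this imaginary device */
} xx_stat_t;

/* Hypothetical driver-maintained counters. */
typedef struct xx_stats {
	uint64_t	xs_speed;	/* bits per second */
	uint64_t	xs_rbytes;
	uint64_t	xs_obytes;
} xx_stats_t;

int
xx_getstat(const xx_stats_t *xs, unsigned int stat, uint64_t *val)
{
	switch (stat) {
	case XX_STAT_IFSPEED:
		*val = xs->xs_speed;
		break;
	case XX_STAT_RBYTES:
		*val = xs->xs_rbytes;
		break;
	case XX_STAT_OBYTES:
		*val = xs->xs_obytes;
		break;
	default:
		/* Unsupported and unrecognized statistics are handled alike. */
		return (ENOTSUP);
	}
	return (0);
}
```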
1183 .Ss General Device Statistics
1184 The following statistics are based on MIB-II statistics from both RFC
1187 .It Sy MAC_STAT_IFSPEED
1188 The device's current speed in bits per second.
1189 .It Sy MAC_STAT_MULTIRCV
1190 The total number of received multicast packets.
1191 .It Sy MAC_STAT_BRDCSTRCV
1192 The total number of received broadcast packets.
1193 .It Sy MAC_STAT_MULTIXMT
1194 The total number of transmitted multicast packets.
1195 .It Sy MAC_STAT_BRDCSTXMT
1196 The total number of transmitted broadcast packets.
1197 .It Sy MAC_STAT_NORCVBUF
1198 The total number of packets discarded by the hardware due to a lack of
1200 .It Sy MAC_STAT_IERRORS
1201 The total number of errors detected on input.
1202 .It Sy MAC_STAT_UNKNOWNS
1203 The total number of received packets that were discarded because they
1204 were of an unknown protocol.
1205 .It Sy MAC_STAT_NOXMTBUF
1206 The total number of outgoing packets dropped due to a lack of transmit
1208 .It Sy MAC_STAT_OERRORS
1209 The total number of outgoing packets that resulted in errors.
1210 .It Sy MAC_STAT_COLLISIONS
1211 The total number of collisions encountered by the transmitter.
1212 .It Sy MAC_STAT_RBYTES
1215 received by the device, regardless of packet type.
1216 .It Sy MAC_STAT_IPACKETS
1219 received by the device, regardless of packet type.
1220 .It Sy MAC_STAT_OBYTES
1223 transmitted by the device, regardless of packet type.
1224 .It Sy MAC_STAT_OPACKETS
1227 sent by the device, regardless of packet type.
1228 .It Sy MAC_STAT_UNDERFLOWS
1229 The total number of packets that were smaller than the minimum sized
1230 packet for the device and were therefore dropped.
1231 .It Sy MAC_STAT_OVERFLOWS
1232 The total number of packets that were larger than the maximum sized
1233 packet for the device and were therefore dropped.
1235 .Ss Ethernet Specific Statistics
1236 The following statistics are specific to Ethernet devices. They refer to
1237 values from RFC 1643 and include various MII/GMII specific stats. Many
1238 of these are also defined in IEEE 802.3.
1240 .It Sy ETHER_STAT_ADV_CAP_1000FDX
1241 Indicates that the device is advertising support for 1 Gbit/s
1242 full-duplex operation.
1243 .It Sy ETHER_STAT_ADV_CAP_1000HDX
1244 Indicates that the device is advertising support for 1 Gbit/s
1245 half-duplex operation.
1246 .It Sy ETHER_STAT_ADV_CAP_100FDX
1247 Indicates that the device is advertising support for 100 Mbit/s
1248 full-duplex operation.
1249 .It Sy ETHER_STAT_ADV_CAP_100GFDX
1250 Indicates that the device is advertising support for 100 Gbit/s
1251 full-duplex operation.
1252 .It Sy ETHER_STAT_ADV_CAP_100HDX
1253 Indicates that the device is advertising support for 100 Mbit/s
1254 half-duplex operation.
1255 .It Sy ETHER_STAT_ADV_CAP_100T4
1256 Indicates that the device is advertising support for 100 Mbit/s
1257 100BASE-T4 operation.
1258 .It Sy ETHER_STAT_ADV_CAP_10FDX
1259 Indicates that the device is advertising support for 10 Mbit/s
1260 full-duplex operation.
1261 .It Sy ETHER_STAT_ADV_CAP_10GFDX
1262 Indicates that the device is advertising support for 10 Gbit/s
1263 full-duplex operation.
1264 .It Sy ETHER_STAT_ADV_CAP_10HDX
1265 Indicates that the device is advertising support for 10 Mbit/s
1266 half-duplex operation.
1267 .It Sy ETHER_STAT_ADV_CAP_2500FDX
1268 Indicates that the device is advertising support for 2.5 Gbit/s
1269 full-duplex operation.
1270 .It Sy ETHER_STAT_ADV_CAP_40GFDX
1271 Indicates that the device is advertising support for 40 Gbit/s
1272 full-duplex operation.
1273 .It Sy ETHER_STAT_ADV_CAP_5000FDX
1274 Indicates that the device is advertising support for 5.0 Gbit/s
1275 full-duplex operation.
1276 .It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
1277 Indicates that the device is advertising support for receiving pause
1279 .It Sy ETHER_STAT_ADV_CAP_AUTONEG
1280 Indicates that the device is advertising support for auto-negotiation.
1281 .It Sy ETHER_STAT_ADV_CAP_PAUSE
1282 Indicates that the device is advertising support for generating pause
1284 .It Sy ETHER_STAT_ADV_REMFAULT
1285 Indicates that the device is advertising support for detecting faults in
1286 the remote link peer.
1287 .It Sy ETHER_STAT_ALIGN_ERRORS
1288 Indicates the number of times an alignment error was generated by the
1289 Ethernet device. This is a count of packets that were not an integral
1290 number of octets and failed the FCS check.
1291 .It Sy ETHER_STAT_CAP_1000FDX
1292 Indicates the device supports 1 Gbit/s full-duplex operation.
1293 .It Sy ETHER_STAT_CAP_1000HDX
1294 Indicates the device supports 1 Gbit/s half-duplex operation.
1295 .It Sy ETHER_STAT_CAP_100FDX
1296 Indicates the device supports 100 Mbit/s full-duplex operation.
1297 .It Sy ETHER_STAT_CAP_100GFDX
1298 Indicates the device supports 100 Gbit/s full-duplex operation.
1299 .It Sy ETHER_STAT_CAP_100HDX
1300 Indicates the device supports 100 Mbit/s half-duplex operation.
1301 .It Sy ETHER_STAT_CAP_100T4
1302 Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
1303 .It Sy ETHER_STAT_CAP_10FDX
1304 Indicates the device supports 10 Mbit/s full-duplex operation.
1305 .It Sy ETHER_STAT_CAP_10GFDX
1306 Indicates the device supports 10 Gbit/s full-duplex operation.
1307 .It Sy ETHER_STAT_CAP_10HDX
1308 Indicates the device supports 10 Mbit/s half-duplex operation.
1309 .It Sy ETHER_STAT_CAP_2500FDX
1310 Indicates the device supports 2.5 Gbit/s full-duplex operation.
1311 .It Sy ETHER_STAT_CAP_40GFDX
1312 Indicates the device supports 40 Gbit/s full-duplex operation.
1313 .It Sy ETHER_STAT_CAP_5000FDX
1314 Indicates the device supports 5.0 Gbit/s full-duplex operation.
1315 .It Sy ETHER_STAT_CAP_ASMPAUSE
1316 Indicates that the device supports the ability to receive pause frames.
1317 .It Sy ETHER_STAT_CAP_AUTONEG
1318 Indicates that the device supports the ability to perform link
1320 .It Sy ETHER_STAT_CAP_PAUSE
1321 Indicates that the device supports the ability to transmit pause frames.
1322 .It Sy ETHER_STAT_CAP_REMFAULT
1323 Indicates that the device supports the ability to detect a remote
1324 fault in a link peer.
1325 .It Sy ETHER_STAT_CARRIER_ERRORS
1326 Indicates the number of times that the Ethernet carrier sense condition
1327 was lost or not asserted.
1328 .It Sy ETHER_STAT_DEFER_XMTS
1329 Indicates the number of frames for which the device's first transmission
1330 attempt was deferred because the medium was busy.
1331 .It Sy ETHER_STAT_EX_COLLISIONS
1332 Indicates the number of frames that failed to send due to an excessive
1333 number of collisions.
1334 .It Sy ETHER_STAT_FCS_ERRORS
1335 Indicates the number of times that a frame check sequence failed.
1336 .It Sy ETHER_STAT_FIRST_COLLISIONS
1337 Indicates the number of times that a frame was eventually transmitted
1338 successfully, but only after a single collision.
1339 .It Sy ETHER_STAT_JABBER_ERRORS
1340 Indicates the number of frames that were received that were both larger
1341 than the maximum packet size and failed the frame check sequence.
1342 .It Sy ETHER_STAT_LINK_ASMPAUSE
1343 Indicates whether the link is currently configured to accept pause
1345 .It Sy ETHER_STAT_LINK_AUTONEG
1346 Indicates whether the current link state is a result of
1348 .It Sy ETHER_STAT_LINK_DUPLEX
1349 Indicates the current duplex state of the link. The values used here
1350 should be the same as documented for
1351 .Sy MAC_PROP_DUPLEX .
1352 .It Sy ETHER_STAT_LINK_PAUSE
1353 Indicates whether the link is currently configured to generate pause
1355 .It Sy ETHER_STAT_LP_CAP_1000FDX
1356 Indicates the remote device supports 1 Gbit/s full-duplex operation.
1357 .It Sy ETHER_STAT_LP_CAP_1000HDX
1358 Indicates the remote device supports 1 Gbit/s half-duplex operation.
1359 .It Sy ETHER_STAT_LP_CAP_100FDX
1360 Indicates the remote device supports 100 Mbit/s full-duplex operation.
1361 .It Sy ETHER_STAT_LP_CAP_100GFDX
1362 Indicates the remote device supports 100 Gbit/s full-duplex operation.
1363 .It Sy ETHER_STAT_LP_CAP_100HDX
1364 Indicates the remote device supports 100 Mbit/s half-duplex operation.
1365 .It Sy ETHER_STAT_LP_CAP_100T4
1366 Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
1367 .It Sy ETHER_STAT_LP_CAP_10FDX
1368 Indicates the remote device supports 10 Mbit/s full-duplex operation.
1369 .It Sy ETHER_STAT_LP_CAP_10GFDX
1370 Indicates the remote device supports 10 Gbit/s full-duplex operation.
1371 .It Sy ETHER_STAT_LP_CAP_10HDX
1372 Indicates the remote device supports 10 Mbit/s half-duplex operation.
1373 .It Sy ETHER_STAT_LP_CAP_2500FDX
1374 Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
1375 .It Sy ETHER_STAT_LP_CAP_40GFDX
1376 Indicates the remote device supports 40 Gbit/s full-duplex operation.
1377 .It Sy ETHER_STAT_LP_CAP_5000FDX
1378 Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
1379 .It Sy ETHER_STAT_LP_CAP_ASMPAUSE
1380 Indicates that the remote device supports the ability to receive pause
1382 .It Sy ETHER_STAT_LP_CAP_AUTONEG
1383 Indicates that the remote device supports the ability to perform link
1385 .It Sy ETHER_STAT_LP_CAP_PAUSE
1386 Indicates that the remote device supports the ability to transmit pause
1388 .It Sy ETHER_STAT_LP_CAP_REMFAULT
1389 Indicates that the remote device supports the ability to detect a
1390 remote fault in a link peer.
1391 .It Sy ETHER_STAT_MACRCV_ERRORS
1392 Indicates the number of times that the internal MAC layer encountered an
1393 error when attempting to receive and process a frame.
1394 .It Sy ETHER_STAT_MACXMT_ERRORS
1395 Indicates the number of times that the internal MAC layer encountered an
1396 error when attempting to process and transmit a frame.
1397 .It Sy ETHER_STAT_MULTI_COLLISIONS
1398 Indicates the number of times that a frame was eventually transmitted
1399 successfully, but only after more than one collision.
1400 .It Sy ETHER_STAT_SQE_ERRORS
1401 Indicates the number of times that an SQE error occurred. The specific
1402 conditions for this error are documented in IEEE 802.3.
1403 .It Sy ETHER_STAT_TOOLONG_ERRORS
1404 Indicates the number of frames that were received that were longer than
1405 the maximum frame size supported by the device.
1406 .It Sy ETHER_STAT_TOOSHORT_ERRORS
1407 Indicates the number of frames that were received that were shorter than
1408 the minimum frame size supported by the device.
1409 .It Sy ETHER_STAT_TX_LATE_COLLISIONS
1410 Indicates the number of times a collision was detected late on the
1412 .It Sy ETHER_STAT_XCVR_ADDR
1413 Indicates the MII/GMII address of the receiver.
1414 .It Sy ETHER_STAT_XCVR_ID
1415 Indicates the ID of the MII/GMII receiver.
1416 .It Sy ETHER_STAT_XCVR_INUSE
1417 Indicates what kind of receiver is in use. The following values may be
1420 .It Sy XCVR_UNDEFINED
1421 The receiver type is undefined by the hardware.
1423 There is no receiver in use by the hardware.
1425 The receiver supports 10BASE-T operation.
1427 The receiver supports 100BASE-T4 operation.
1429 The receiver supports 100BASE-TX operation.
1431 The receiver supports 100BASE-T2 operation.
1433 The receiver supports 1000BASE-X operation. This is used for all fiber
1436 The receiver supports 1000BASE-T operation. This is used for all copper
1440 .Ss Device Specific kstats
1441 In addition to the defined statistics above, if the device driver
1442 maintains additional statistics or the device provides additional
1443 statistics, it should create its own kstats through the
1445 function to allow operators to observe them.
1446 .Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
1447 Device drivers are the first line of defense for dealing with broken
1448 devices and bugs in their firmware. While most devices will rarely fail,
1449 it is important that particular attention be paid to RAS (Reliability,
1450 Availability, and Serviceability) when designing and implementing the
1451 device driver. While everything
1452 described in this section is optional, it is highly recommended that
1453 all new device drivers follow these guidelines.
1455 The Fault Management Architecture (FMA) provides facilities for
1456 detecting and reporting various classes of defects and faults.
1457 Specifically for networking device drivers, issues that should be
1458 detected and reported include:
1459 .Bl -bullet -offset indent
1461 Device internal uncorrectable errors
1463 Device internal correctable errors
1465 PCI and PCI Express transport errors
1467 Device temperature alarms
1469 Device transmission stalls
1471 Device communication timeouts
1473 A high rate of invalid interrupts
1476 All such errors fall into three primary categories:
1477 .Bl -enum -offset indent
1479 Errors detected by the Fault Management Architecture
1481 Errors detected by the device and indicated to the device driver
1483 Errors detected by the device driver
1485 .Ss Fault Management Setup and Teardown
1486 Drivers should initialize support for the fault management framework by
1491 routine. By registering with the fault management framework, a device
1492 driver is given the chance to detect and notice transport errors as well
1493 as report other errors that exist. While a device driver does not need to
1494 register for all of the capabilities described in
1495 .Xr ddi_fm_init 9F ,
1496 we suggest that device drivers at least register the
1497 .Sy DDI_FM_EREPORT_CAPABLE
1498 capability so that the driver can report issues that it detects.
1500 If the driver registers with the fault management framework during its
1502 entry point, it must call
1507 .Ss Transport Errors
1508 Many modern networking devices leverage PCI or PCI Express. As such,
1509 there are two primary ways that device drivers access data: they either
1510 memory map device registers and use routines like
1514 or they use direct memory access (DMA).
1515 New device drivers should always enable checking of the transport layer by
1516 marking their support in the
1517 .Xr ddi_device_acc_attr 9S
1518 structure and using routines like
1519 .Xr ddi_fm_acc_err_get 9F
1521 .Xr ddi_fm_dma_err_get 9F
1522 to detect if errors have occurred.
1523 .Ss Device Indicated Errors
1524 Many devices have capabilities to announce to a device driver that a
1525 correctable or uncorrectable error has occurred. Other
1526 devices have the ability to indicate that various physical issues have
1527 occurred such as a fan failing or a temperature sensor having fired.
1529 Drivers should wire themselves to receive notifications when these
1530 events occur. The means and capabilities will vary from device to
1531 device. For example, some devices will generate information about these
1532 notifications through special interrupts. Other devices may have a
1533 register that software can poll. In the cases where polling is required,
1534 driver writers should try not to poll too frequently and should
1535 generally only poll when the device is actively being used, e.g. between
1541 .Ss Driver Transmit Stall Detection
1542 One of the primary responsibilities of a hardened device driver is to
1543 perform transmit stall detection. The core idea behind tx stall
1544 detection is that the driver should record when it last observed
1545 data being successfully transmitted. Most devices
1546 should be transmitting data on a regular basis as long as the link is
1547 up. If it is not, then this may indicate that the device is stuck and
1548 needs to be reset. At this time, the MAC framework does not provide any
1549 resources for performing these checks; however, polling on each
1550 individual transmit ring for the last completion time while something is
1551 actively being transmitted through the use of routines such as
1553 may be a reasonable starting point.
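.Pp
One way such a check might look is sketched below. All of it is
hypothetical: the per-ring fields, the five-second threshold, and the
assumption that a periodic callback passes in the current time in
nanoseconds, which a real driver might obtain from
.Xr gethrtime 9F .
A ring is treated as stalled only when it has descriptors outstanding
and none of them has completed within the threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stall threshold: 5 seconds, in nanoseconds. */
#define	XX_TX_STALL_NS	(5ULL * 1000000000ULL)

/* Hypothetical per-transmit-ring bookkeeping. */
typedef struct xx_tx_ring {
	uint32_t	xtr_outstanding;	/* descriptors owned by hardware */
	uint64_t	xtr_last_progress;	/* time of last completion, ns */
} xx_tx_ring_t;

/* Called when descriptors are handed to the hardware. */
void
xx_tx_post(xx_tx_ring_t *r, uint32_t ndesc, uint64_t now)
{
	/* An idle ring starts a fresh clock; old timestamps are stale. */
	if (r->xtr_outstanding == 0)
		r->xtr_last_progress = now;
	r->xtr_outstanding += ndesc;
}

/* Called from the transmit completion path. */
void
xx_tx_complete(xx_tx_ring_t *r, uint32_t ndesc, uint64_t now)
{
	r->xtr_outstanding -= ndesc;
	r->xtr_last_progress = now;
}

/* Called periodically; true means the ring appears to be stuck. */
bool
xx_tx_stalled(const xx_tx_ring_t *r, uint64_t now)
{
	if (r->xtr_outstanding == 0)
		return (false);		/* nothing pending, nothing to detect */
	return (now - r->xtr_last_progress > XX_TX_STALL_NS);
}
```

A real driver would update these fields under the transmit ring's lock
and, on detecting a stall, proceed with the error handling described
below.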
1554 .Ss Driver Command Timeout Detection
1555 Each device is programmed in different ways. Some devices are programmed
1556 through asynchronous commands while others are programmed by writing
1557 directly to memory mapped registers. If a device receives asynchronous
1558 replies to commands, then the device driver should set reasonable
1559 timeouts for all such commands and be prepared to detect when they expire. If a timeout
1560 occurs, the driver should presume that there is an issue with the
1561 hardware and proceed to abort the command or reset the device.
1563 Many devices do not have such a communication mechanism. However,
1564 whenever there is some activity where the device driver must wait, then
1565 it should be prepared for the fact that the device may never get back to
1566 it and react appropriately by performing some kind of device reset.
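.Pp
For devices that reply to commands asynchronously, the bookkeeping can
be as small as the sketch below. The structure, the status values, and
the assumption that the reply path sets a flag are all hypothetical.
Wherever the driver waits, it would check the command and, on
XX_CMD_TIMEDOUT, abort the command or reset the device.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum xx_cmd_status {
	XX_CMD_PENDING,
	XX_CMD_DONE,
	XX_CMD_TIMEDOUT
} xx_cmd_status_t;

/* Hypothetical record of one asynchronous command sent to the device. */
typedef struct xx_cmd {
	uint64_t	xc_issued;	/* time the command was posted, ns */
	uint64_t	xc_timeout;	/* how long the driver will wait, ns */
	bool		xc_replied;	/* set by the reply/interrupt path */
} xx_cmd_t;

/*
 * Classify a command. The reply is checked first so that a reply which
 * arrives right at the deadline is not reported as a timeout.
 */
xx_cmd_status_t
xx_cmd_check(const xx_cmd_t *c, uint64_t now)
{
	if (c->xc_replied)
		return (XX_CMD_DONE);
	if (now - c->xc_issued >= c->xc_timeout)
		return (XX_CMD_TIMEDOUT);
	return (XX_CMD_PENDING);
}
```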
1567 .Ss Reacting to Errors
1568 When any of the above categories of errors has been triggered, the
1569 behavior that the device driver should take depends on the kind of
1570 error. If a fatal error was detected, for example, a transport error, a
1571 transmit stall, or an uncorrectable error indicated by the device, then
1572 it is important that the driver take the following steps:
1574 .Bl -enum -offset indent
1576 Set a flag in the device driver's state that indicates that it has hit
1577 an error condition. When this error condition flag is asserted,
1578 transmitted packets should be accepted and dropped and actions that would
1579 require writing to the device state should fail with an error. This flag
1580 should remain until the device has been successfully restarted.
1582 If the error was not a transport error indicated by the fault
1583 management architecture, for example, a transport error that the driver
1584 itself detected, then the device driver should post an
1586 indicating what has occurred with the
1587 .Xr ddi_fm_ereport_post 9F
1590 The device driver should indicate that the device's service was lost
1592 .Xr ddi_fm_service_impact 9F
1594 .Sy DDI_SERVICE_LOST .
1596 At this point the device driver should issue a device reset through some
1597 device-specific means.
1599 When the device reset has been completed, then the device driver should
1600 restore all of the programmed state to the device. This includes things
1601 like the current MTU, advertised auto-negotiation speeds, MAC address
1604 Finally, when service has been restored, the device driver should call
1605 .Xr ddi_fm_service_impact 9F
1607 .Sy DDI_SERVICE_RESTORED .
1610 When a non-fatal error occurs, then the device driver should submit an
1611 ereport and should optionally mark the device degraded using
1612 .Xr ddi_fm_service_impact 9F
1614 .Sy DDI_SERVICE_DEGRADED
1615 value depending on the nature of the problem that has occurred.
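.Pp
The error flag from the first step above might be handled along the
lines of this sketch. The names are hypothetical, EIO is an assumed
choice of error for rejected writes, and locking is omitted. The
ereport, service impact, and reset steps are left out so that only the
flag's effect on the transmit and configuration paths is shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver soft state. */
typedef struct xx_state {
	bool		xx_in_error;		/* set on a fatal error */
	uint64_t	xx_tx_sent;		/* packets handed to hardware */
	uint64_t	xx_tx_err_drops;	/* packets dropped while in error */
	uint32_t	xx_mtu;
} xx_state_t;

/* Record that the device hit a fatal error. */
void
xx_set_error(xx_state_t *xx)
{
	xx->xx_in_error = true;
}

/* Transmit: while in error, accept each packet but drop it. */
void
xx_tx(xx_state_t *xx)
{
	if (xx->xx_in_error) {
		xx->xx_tx_err_drops++;	/* a real driver would free the mblk */
		return;
	}
	xx->xx_tx_sent++;		/* a real driver would post descriptors */
}

/* Anything that writes device state fails while in error. */
int
xx_set_mtu(xx_state_t *xx, uint32_t mtu)
{
	if (xx->xx_in_error)
		return (EIO);
	xx->xx_mtu = mtu;
	return (0);
}

/* Called once the reset has completed and state has been restored. */
void
xx_clear_error(xx_state_t *xx)
{
	xx->xx_in_error = false;
}
```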
1617 Device drivers should never make the decision to remove a device from
1618 service based on errors that have occurred nor should they panic the
1619 system. Rather, the device driver should always try to notify the
1620 operating system with various ereports and allow its policy decisions to
1621 occur. The decision to retire a device lies in the hands of the fault
1622 management architecture. It knows more about the operator's intent and
1623 the surrounding system's state than the device driver itself does and it
1624 will make the call to offline and retire the device if it is required.
1626 When resetting a device, a device driver must exercise caution. If a
1627 device driver has not been written to plan for a device reset, then it
1628 may not correctly restore the device's state after such a reset. Such
1629 state should be stored in the instance's private state data as the MAC
1630 framework does not know about device resets and will not inform the
1631 device again about the expected, programmed state.
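.Pp
A minimal sketch of keeping that state in the driver follows. The device
registers are modelled as a plain structure, and every name in it is
hypothetical. Each setter records the value in the driver's soft copy
before programming the device, so that after a reset the driver can
replay the soft copy without any help from the MAC framework.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical device registers, modelled as a plain structure. */
typedef struct xx_hw {
	uint32_t	xh_mtu;
	uint8_t		xh_macaddr[6];
	uint32_t	xh_adv_speeds;	/* bitmask of advertised speeds */
} xx_hw_t;

/* The driver's own copy of everything it has programmed. */
typedef struct xx_soft {
	uint32_t	xs_mtu;
	uint8_t		xs_macaddr[6];
	uint32_t	xs_adv_speeds;
} xx_soft_t;

/* Setters update the soft copy first, then the device. */
void
xx_set_mtu(xx_soft_t *s, xx_hw_t *hw, uint32_t mtu)
{
	s->xs_mtu = mtu;
	hw->xh_mtu = mtu;
}

/* After a reset the device has lost its state; replay the soft copy. */
void
xx_restore(const xx_soft_t *s, xx_hw_t *hw)
{
	hw->xh_mtu = s->xs_mtu;
	memcpy(hw->xh_macaddr, s->xs_macaddr, sizeof (hw->xh_macaddr));
	hw->xh_adv_speeds = s->xs_adv_speeds;
}
```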
1633 One wrinkle with device resets is that many networking cards show up as
1634 multiple PCI functions on a single device, for example, each port may
1635 show up as a separate function and thus have a separate instance of the
1636 device driver attached. When resetting a function, device driver writers
1637 should carefully read the device programming manuals and verify whether
1638 or not a reset impacts only the stalled function or if it impacts all
1639 functions across the device.
1641 If the only way to reset a given function is to reset the entire device, then
1642 this may require more coordination and work on the part of the device
1643 driver to ensure that all the other instances are correctly restored.
1644 In cases where this occurs, some devices offer ways of injecting
1645 interrupts onto those other functions to notify them that this is
1648 The networking stack manages framed data through the use of the
1650 structure. The mblk allows for a single message to be made up of
1651 individual blocks. Each part is linked together through its
1653 member. However, it also allows for multiple messages to be chained
1654 together through the use of the
1656 member. While the networking stack works with these structures, device
1657 drivers generally work with DMA regions. There are two different
1658 strategies that device drivers use for handling these two different
1659 cases: copying and binding.
1661 The first way that device drivers handle interfacing between the two is
1662 by having two separate regions of memory.
1663 One part is memory which has been allocated for DMA through a call to
1664 .Xr ddi_dma_mem_alloc 9F
1665 and the other is memory associated with the memory block.
1667 In this case, a driver will use
1669 to copy memory between the two distinct regions. When transmitting a
1670 packet, it will copy the memory from the mblk_t to the DMA region. When
1671 receiving a packet, it will allocate an mblk_t through the
1673 routine, copy the memory across with
1675 and then increment the mblk_t's
1679 If, when receiving, memory is not available for a new message block,
1680 then the frame should be skipped and effectively dropped. A kstat should
1681 be bumped when such an occasion occurs.
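.Pp
The copying receive path can be sketched in user space as follows. The
xx_msg_t structure is a hypothetical stand-in that only mimics the read
and write pointers of an mblk_t,
.Xr malloc 3C
and
.Xr memcpy 3C
stand in for the kernel routines described above, and the norcvbuf
counter is a hypothetical kstat. In a real driver the receive buffer
would also need to be synchronized for the CPU, for example with
.Xr ddi_dma_sync 9F ,
before the copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the read/write pointers of an mblk_t. */
typedef struct xx_msg {
	uint8_t	*b_rptr;
	uint8_t	*b_wptr;
	uint8_t	xm_data[];	/* flexible array member holds the frame */
} xx_msg_t;

/*
 * Copy one received frame out of a receive buffer into a newly
 * allocated message. If allocation fails, the frame is dropped and the
 * hypothetical norcvbuf counter is incremented.
 */
xx_msg_t *
xx_rx_copy(const uint8_t *dmabuf, size_t len, uint64_t *norcvbuf)
{
	xx_msg_t *mp = malloc(sizeof (*mp) + len);

	if (mp == NULL) {
		(*norcvbuf)++;
		return (NULL);
	}
	mp->b_rptr = mp->xm_data;
	memcpy(mp->b_rptr, dmabuf, len);
	mp->b_wptr = mp->b_rptr + len;	/* mark the copied bytes as valid */
	return (mp);
}
```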
1683 An alternative approach to copying data is to use DMA binding. When
1684 using DMA binding, the OS takes care of mapping between DMA memory and
1685 normal kernel memory. The exact process is a bit different between
1686 transmit and receive.
1688 When transmitting, a device driver has an mblk_t and needs to call the
1689 .Xr ddi_dma_addr_bind_handle 9F
1690 function to bind it to an already existing DMA handle. At that point, it
1691 will receive various DMA cookies that it can use to obtain the addresses
1692 to program the device with for transmitting data. Once the transmit is
1693 done, the driver must then make sure to call
1695 to release the data. It must not call
1697 before it receives an interrupt from the device indicating that the data
1698 has been transmitted, otherwise it risks sending arbitrary kernel
1701 When receiving data, the driver can perform a similar operation. First,
1702 it must bind its receive buffer memory so that the device can access it
1703 through a call to the
1704 .Xr ddi_dma_addr_bind_handle 9F
1705 function if it has not already. Once it has, it must then call
1707 to try to create a new mblk_t that uses the associated memory. It
1708 can then pass that mblk_t up to the stack.
1710 When deciding which of these options to use, there are many different
1711 considerations that must be made. The answer as to whether to bind
1712 memory or to copy data is not always simple.
1714 The first thing to remember is that DMA resources may be finite on a
1715 given platform. Consider the case of receiving data. A device driver
1716 that binds one of its receive descriptors may not get it back for quite
1717 some time as it may be used by the kernel until an application actually
1718 consumes it. Device drivers that try to bind memory for receive often
1719 work with the constraint that they must be able to replace that DMA
1720 memory with another DMA descriptor. If they were not replaced, then
1721 eventually the device would not be able to receive additional data into
1724 On the other hand, particularly for larger frames, copying every packet
1725 from one buffer to another can be a source of additional latency and
1726 memory waste in the system. For larger copies, the cost of copying may
1727 dwarf any potential cost of performing DMA binding.
1729 Device driver authors who are unsure of what to do should
1730 first employ the copying method to simplify the act of writing the
1731 device driver. The copying method is simpler and also allows the device
1732 driver author not to worry about allocated DMA memory that is still
1733 outstanding when it is asked to unload.
1735 If device driver writers are worried about the cost, it is recommended
1736 to make the choice of whether to copy or bind DMA data configurable through
1737 a separate private property for both transmitting and receiving. That
1738 private property should indicate the frame size at which
1739 to switch from one method to the other. This way, data can be gathered
1740 to determine what the impact of each method is on a given platform.
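.Pp
A minimal sketch of the resulting decision follows, assuming the driver
has already parsed two such private properties into a per-instance
structure. The field names, the direction enumeration, and the
convention that SIZE_MAX means "always copy" are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direction selector. */
typedef enum xx_dir {
	XX_DIR_RX,
	XX_DIR_TX
} xx_dir_t;

/* Hypothetical thresholds, one per direction, in bytes. */
typedef struct xx_dma_tunables {
	size_t	xdt_rx_bind_thresh;
	size_t	xdt_tx_bind_thresh;
} xx_dma_tunables_t;

/*
 * Frames at or above the threshold for their direction are bound;
 * smaller frames are copied. SIZE_MAX effectively disables binding.
 */
bool
xx_should_bind(const xx_dma_tunables_t *t, xx_dir_t dir, size_t len)
{
	size_t thresh = (dir == XX_DIR_RX) ?
	    t->xdt_rx_bind_thresh : t->xdt_tx_bind_thresh;

	return (len >= thresh);
}
```

Keeping the two thresholds separate lets an operator measure the receive
and transmit trade-offs independently on a given platform.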
1753 .Xr mc_getcapab 9E ,
1756 .Xr mc_multicst 9E ,
1758 .Xr mc_propinfo 9E ,
1759 .Xr mc_setpromisc 9E ,
1768 .Xr ddi_dma_addr_bind_handle 9F ,
1769 .Xr ddi_dma_mem_alloc 9F ,
1770 .Xr ddi_fm_acc_err_get 9F ,
1771 .Xr ddi_fm_dma_err_get 9F ,
1772 .Xr ddi_fm_ereport_post 9F ,
1773 .Xr ddi_fm_fini 9F ,
1774 .Xr ddi_fm_init 9F ,
1775 .Xr ddi_fm_service_impact 9F ,
1780 .Xr kstat_create 9F ,
1782 .Xr mac_fini_ops 9F ,
1783 .Xr mac_hcksum_get 9F ,
1784 .Xr mac_hcksum_set 9F ,
1785 .Xr mac_init_ops 9F ,
1786 .Xr mac_link_update 9F ,
1787 .Xr mac_lso_get 9F ,
1788 .Xr mac_maxsdu_update 9F ,
1789 .Xr mac_prop_info_set_default_link_flowctrl 9F ,
1790 .Xr mac_prop_info_set_default_str 9F ,
1791 .Xr mac_prop_info_set_default_uint32 9F ,
1792 .Xr mac_prop_info_set_default_uint64 9F ,
1793 .Xr mac_prop_info_set_default_uint8 9F ,
1794 .Xr mac_prop_info_set_perm 9F ,
1795 .Xr mac_prop_info_set_range_uint32 9F ,
1796 .Xr mac_register 9F ,
1798 .Xr mac_unregister 9F ,
1799 .Xr mod_install 9F ,
1804 .Xr ddi_device_acc_attr 9S ,
1806 .Xr mac_callbacks 9S ,
1807 .Xr mac_register 9S ,
1814 .%T RFC 1213 Management Information Base for Network Management of
1815 .%T TCP/IP-based internets: MIB-II
1821 .%T RFC 1573 Evolution of the Interfaces Group of MIB-II
1826 .%T RFC 1643 Definitions of Managed Objects for the Ethernet-like