2 .\" This file and its contents are supplied under the terms of the
3 .\" Common Development and Distribution License ("CDDL"), version 1.0.
4 .\" You may only use this file in accordance with the terms of version
7 .\" A full copy of the text of the CDDL should have accompanied this
8 .\" source. A copy of the CDDL is also available via the Internet at
9 .\" http://www.illumos.org/license/CDDL.
12 .\" Copyright 2016 Joyent, Inc.
20 .Nd MAC networking device driver overview
22 .In sys/mac_provider.h
29 framework provides a means for implementing high-performance networking
31 It is the successor to the GLD interfaces and is sometimes referred to as the
The remainder of this manual introduces the aspects of writing device drivers
34 that leverage the MAC framework.
35 While both the GLDv3 and MAC framework refer to the same thing, in this manual
36 page we use the term the
38 to refer to the device driver interface.
40 MAC device drivers are character devices.
41 They define the standard
46 entry points to initialize the module, as well as
52 The main interface with MAC is through a series of callbacks defined in
56 These callbacks control all the aspects of the device.
They range from sending data and getting and setting properties to controlling
MAC address filters and managing promiscuous mode.
60 The MAC framework takes care of many aspects of the device driver's
62 A device that uses the MAC framework does not have to worry about creating
63 device nodes or implementing
68 In addition, all of the work to interact with
70 is taken care of automatically and transparently.
71 .Ss Initializing MAC Support
72 For a device to be used in the framework, it must register with the
73 framework and take specific actions during
80 All device drivers have to define a
82 structure which is pointed to by a
84 structure and the corresponding NULL-terminated
89 structure should have a
91 structure defined for it; however, it does not need to implement any of
96 Normally, in a driver's
98 entry point, it passes its
100 structure directly to
102 To properly register with MAC, the driver must call
106 If for some reason the
108 function fails, then the driver must be removed by a call to
109 .Xr mac_fini_ops 9F .
111 Conversely, in the driver's
113 routine, it should call
115 after it successfully calls
117 For an example of how to use the
121 functions, see the examples section in
122 .Xr mac_init_ops 9F .
123 .Ss Registering with MAC
124 Every instance of a device should register separately with MAC.
125 To register with MAC, a driver must allocate a
127 structure, fill it in, and then call
128 .Xr mac_register 9F .
131 structure contains information about the device and all of the required
132 function pointers that will be used as callbacks by the framework.
134 These steps should all be taken during a device's
137 It is recommended that the driver perform this sequence of steps after the
138 device has finished its initialization of the chipset and interrupts, though
139 interrupts should not be enabled at that point.
142 it will start receiving callbacks from the MAC framework.
144 To allocate the registration structure, the driver should call
Device drivers should generally pass the symbol
150 Upon successful completion, the driver will receive a
152 structure which it should fill in.
153 The structure and its members are documented in
154 .Xr mac_register 9S .
158 structure is not allocated as a part of the
161 In general, device drivers declare this statically.
164 section for more information on how to fill it out.
166 Once the structure has been filled in, the driver should call
168 to register itself with MAC.
169 The handle that it uses to register with should be part of the driver's soft
171 It will be used in various other support functions and callbacks.
173 If the call is successful, then the device driver
174 should enable interrupts and finish any other initialization required.
177 failed, then it should unwind its initialization and should return
183 The MAC framework interacts with a device driver through a series of
185 These callbacks are described in their individual manual pages and the
186 collection of callbacks is indicated in the
189 This section does not focus on the specific functions, but rather on
190 interactions between them and the rest of the device driver framework.
192 A device driver should make no assumptions about when the various
193 callbacks will be called and whether or not they will be called
195 For example, a device driver may be asked to transmit data through a call to its
197 entry point while it is being asked to get a device property through a
201 As such, while some calls may be serialized to the device, such as setting
202 properties, the device driver should always presume that all of its data needs
203 to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
206 .Bl -bullet -offset indent -compact
208 .Xr mac_hcksum_get 9F
210 .Xr mac_hcksum_set 9F
214 .Xr mac_maxsdu_update 9F
216 .Xr mac_prop_info_set_default_link_flowctrl 9F
218 .Xr mac_prop_info_set_default_str 9F
220 .Xr mac_prop_info_set_default_uint8 9F
222 .Xr mac_prop_info_set_default_uint32 9F
224 .Xr mac_prop_info_set_default_uint64 9F
226 .Xr mac_prop_info_set_perm 9F
228 .Xr mac_prop_info_set_range_uint32 9F
231 Any other MAC related routines should not be called with locks held,
233 .Xr mac_link_update 9F
236 Other routines in the DDI may be called while locks are held; however,
237 device driver writers should be careful about calling blocking routines
238 while locks are held or in interrupt context, though it is generally
241 A device driver will often receive data through the means of an
243 When that interrupt occurs, the device driver will receive one or more frames
244 with optional metadata.
245 Often each frame has a corresponding descriptor which has information about
246 whether or not there were errors or whether or not the device successfully
247 checksummed the packet.
249 During a single interrupt, a device driver should process a fixed number
251 For each frame the device driver should:
252 .Bl -enum -offset indent
254 First check whether or not the frame has errors.
255 If errors were detected, then the frame should not be sent to the operating
257 It is recommended that devices keep kstats (see
259 for more information) and bump the counter whenever such an error is
261 If the device distinguishes between the types of errors, then separate kstats
262 for each class of error are recommended.
265 section for more information on the various error cases that should be
268 Once the frame has been determined to be valid, the device driver should
269 transform the frame into a
273 for more information on how to transform and prepare a message block.
275 If the device supports hardware checksumming (see the
277 section for more information on checksumming), then the device driver
278 should set the corresponding checksumming information with a call to
279 .Xr mac_hcksum_set 9F .
281 It should then append this new message block to the
283 of the message block chain, linking it to the
286 It is vitally important that all the frames be chained in the order that they
288 If the device driver mistakenly reorders frames, then it may cause performance
289 impacts in the TCP stack and potentially impact application correctness.
292 Once all the frames have been processed and assembled, the device driver
293 should deliver them to the rest of the operating system by calling
295 The device driver should try to give as many mblk_t structures to the
301 once for every assembled mblk_t.
303 The device driver must not hold any locks across the call to
305 When this function is called, received data will be pushed through the
306 networking stack and some replies may be generated and given to the
309 It is not the device driver's responsibility to determine whether or not
310 the system can keep up with a driver's delivery rate of frames.
311 The rest of the networking stack will handle issues related to keeping up
312 appropriately and ensure that kernel memory is not exhausted by packets
313 that are not being processed.
315 Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
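.Pp
The receive steps above can be sketched in plain C as follows.
This is a minimal userland model, not driver code: the types
pkt_t, rx_desc_t, and rx_ring_t and the function rx_ring_poll are
hypothetical names invented for this illustration, with pkt_t standing in for
an mblk_t and its p_next member standing in for the chain pointer.
It shows a fixed per-interrupt budget, the error check with a counter that a
driver would export as a kstat, and a head and tail pair that keeps frames in
arrival order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mblk_t; only the chain pointer is modeled. */
typedef struct pkt {
    struct pkt *p_next;
    int p_id;
} pkt_t;

/* Hypothetical receive descriptor as written back by hardware. */
typedef struct rx_desc {
    int rd_ready;   /* non-zero once a frame has been written */
    int rd_error;   /* non-zero if hardware flagged an error */
    pkt_t rd_pkt;   /* storage for the converted frame */
} rx_desc_t;

typedef struct rx_ring {
    rx_desc_t *rr_descs;
    size_t rr_ndescs;
    size_t rr_next;     /* next descriptor to examine */
    uint64_t rr_errors; /* a driver would export this as a kstat */
} rx_ring_t;

/*
 * Process at most 'budget' frames and return the head of a chain that is
 * linked in arrival order, or NULL if no valid frame was found.
 */
pkt_t *
rx_ring_poll(rx_ring_t *ring, size_t budget)
{
    pkt_t *head = NULL, *tail = NULL;

    assert(ring->rr_ndescs > 0);
    for (size_t i = 0; i < budget; i++) {
        rx_desc_t *d = &ring->rr_descs[ring->rr_next];

        if (!d->rd_ready)
            break;
        d->rd_ready = 0;    /* hand the descriptor back */
        ring->rr_next = (ring->rr_next + 1) % ring->rr_ndescs;

        if (d->rd_error) {
            ring->rr_errors++;  /* count it, do not deliver it */
            continue;
        }

        /* Append at the tail so that ordering is preserved. */
        d->rd_pkt.p_next = NULL;
        if (tail == NULL)
            head = &d->rd_pkt;
        else
            tail->p_next = &d->rd_pkt;
        tail = &d->rd_pkt;
    }
    return (head);
}
```

In a real driver the returned chain would be handed to the framework in a
single call, made only after every driver lock has been dropped.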
318 .Ss Transmitting Data and Back Pressure
319 A device driver will be asked to transmit a message block chain by
323 While the driver is processing the message blocks, it may run out of resources.
324 For example, a transmit descriptor ring may become full.
325 At that point, the device driver should return the remaining unprocessed frames.
326 The act of returning frames indicates that the device has asserted flow control.
327 Once this has been done, no additional calls will be made to the
328 driver's transmit entry point and the back pressure will be propagated
329 throughout the rest of the networking stack.
331 At some point in the future when resources have become available again,
332 for example after an interrupt indicating that some portion of the
333 transmit ring has been sent, then the device driver must notify the
334 system that it can continue transmission.
335 To do this, the driver should call
336 .Xr mac_tx_update 9F .
337 After that point, the driver will receive calls to its
As mentioned in the section on callbacks, the device driver should avoid holding
any locks across the call to
342 .Xr mac_tx_update 9F .
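.Pp
The back pressure protocol described above can be modeled with a short
userland sketch.
The types pkt_t and tx_ring_t and the functions tx_ring_send and
tx_ring_reclaim are hypothetical names made up for this example; they are not
part of the MAC interfaces.
The send routine consumes frames until the ring is full and returns the
unprocessed remainder, and the reclaim routine reports when the driver would
need to call
.Xr mac_tx_update 9F .

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mblk_t chain element. */
typedef struct pkt {
    struct pkt *p_next;
} pkt_t;

typedef struct tx_ring {
    size_t tr_free;     /* free transmit descriptors */
    int tr_blocked;     /* set once frames have been handed back */
} tx_ring_t;

/*
 * Consume frames from 'chain' until the ring is full. Returns NULL if
 * everything was queued, otherwise the first frame that was not sent,
 * still linked to the rest of the unprocessed chain.
 */
pkt_t *
tx_ring_send(tx_ring_t *ring, pkt_t *chain)
{
    assert(ring != NULL);
    while (chain != NULL) {
        pkt_t *next;

        if (ring->tr_free == 0) {
            ring->tr_blocked = 1;   /* flow control asserted */
            return (chain);
        }
        next = chain->p_next;
        chain->p_next = NULL;
        ring->tr_free--;    /* stands in for programming a descriptor */
        chain = next;
    }
    return (NULL);
}

/*
 * Account for 'ndone' completed descriptors. Returns non-zero when the
 * driver had asserted flow control and must now notify the framework.
 */
int
tx_ring_reclaim(tx_ring_t *ring, size_t ndone)
{
    assert(ring != NULL);
    ring->tr_free += ndone;
    if (ring->tr_blocked && ring->tr_free > 0) {
        ring->tr_blocked = 0;
        return (1);
    }
    return (0);
}
```

Returning a flag from the reclaim routine, rather than notifying from inside
it, makes it straightforward for the caller to drop its locks before making
the notification.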
343 .Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may affect the
performance of the device.
347 Not all devices support interrupt coalescing.
348 If interrupt coalescing is supported on the device, it is recommended that
349 device driver writers provide private properties for their device to control the
350 interrupt coalescing rate.
351 This will make it much easier to perform experiments and observe the impact of
352 different interrupt rates on the rest of the system.
353 .Ss MAC Address Filter Management
354 The MAC framework will attempt to use as many MAC address filters as a
356 To program a multicast address filter, the driver's
358 entry point will be called.
359 If the device driver runs out of filters, it should not take any special action
360 and just return the appropriate error as documented in the corresponding manual
361 pages for the entry points.
362 The framework will ensure that the device is placed in promiscuous mode
365 It is the responsibility of the device driver to keep track of the
367 Many devices provide a means of receiving an interrupt when the state of the
369 When such a change happens, the driver should update its internal data
370 structures and then call
371 .Xr mac_link_update 9F
372 to inform the MAC layer that this has occurred.
373 If the device driver does not properly inform the system about link changes,
374 then various features like link aggregations and other mechanisms that leverage
375 the link state will not work correctly.
376 .Ss Link Speed and Auto-negotiation
377 Many networking devices support more than one possible speed that they
379 The selection of a speed is often performed through
380 .Em auto-negotiation ,
381 though some devices allow the user to control what speeds are advertised
384 Logically, there are two different sets of things that the device driver
385 needs to keep track of while it's operating:
388 The supported speeds in hardware.
390 The enabled speeds from the user.
393 By default, when a link first comes up, the device driver should
394 generally configure the link to support the common set of speeds and
395 perform auto-negotiation.
397 A user can control what speeds a device advertises via auto-negotiation
398 and whether or not it performs auto-negotiation at all by using a series
399 of properties that have
402 These are read/write properties and there is one for each speed supported in the
404 For a full list of them, see the
408 In addition to these properties, there is a corresponding set of
412 These are similar to the
414 family of properties, but they are read-only and indicate what the
415 device has actually negotiated.
416 While they are generally similar to the
418 family of properties, they may change depending on power settings.
420 .Sy Ethernet Link Properties
423 for more information.
425 It's worth discussing how these different values get used throughout the
426 different entry points.
427 The first entry point to consider is the
430 For a given speed, the driver should consult whether or not the hardware
432 If it does, it should fill in the default value that the hardware takes and
433 whether or not the property is writable.
The property's permissions should also be updated to indicate whether or not it
is writable.
435 This holds for both the
439 family of properties.
441 The next entry point is
443 Here, the device should first consult whether the given speed is
445 If it is not, then the driver should return
If it is, then it should return the current value of the property.
The last property entry point is the
452 Here, the same logic applies.
453 Before the driver considers whether or not the property is writable, it should
454 first check whether or not it's a supported property.
455 If it's not, then it should return
457 Otherwise, it should proceed to check whether the property is writable,
458 and if it is and a valid value, then it should update the property and
459 restart the link's negotiation.
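.Pp
The order of checks for getting and setting a speed property can be sketched
as below.
This is a userland model under stated assumptions: speed_cap_t,
prop_result_t, speed_cap_get, and speed_cap_set are hypothetical names, and
the result codes are placeholders that a real driver would map to the error
values documented in the manual pages for the entry points.
The sketch also assumes that the only valid values are 0 and 1.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder result codes; a driver would return errno values instead. */
typedef enum {
    PROP_OK,
    PROP_UNSUPPORTED,
    PROP_READONLY,
    PROP_BADVALUE
} prop_result_t;

/* Hypothetical per-speed state kept by the driver. */
typedef struct speed_cap {
    int sc_hw_supported;    /* hardware can run at this speed */
    int sc_writable;        /* hardware allows toggling advertisement */
    int sc_enabled;         /* current value of the property */
    int sc_restarts;        /* times negotiation was restarted */
} speed_cap_t;

/* Get: support is checked first, then the current value is returned. */
prop_result_t
speed_cap_get(const speed_cap_t *cap, int *valp)
{
    assert(valp != NULL);
    if (!cap->sc_hw_supported)
        return (PROP_UNSUPPORTED);
    *valp = cap->sc_enabled;
    return (PROP_OK);
}

/* Set: support, then writability, then validity, then update and restart. */
prop_result_t
speed_cap_set(speed_cap_t *cap, int val)
{
    if (!cap->sc_hw_supported)
        return (PROP_UNSUPPORTED);
    if (!cap->sc_writable)
        return (PROP_READONLY);
    if (val != 0 && val != 1)
        return (PROP_BADVALUE);
    cap->sc_enabled = val;
    cap->sc_restarts++;     /* stands in for restarting negotiation */
    return (PROP_OK);
}
```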
461 Finally, there is the
464 Several of the statistics that are queried relate to auto-negotiation and
465 hardware capabilities.
466 When a statistic relates to the hardware supporting a given speed, the
468 properties should be ignored.
469 The only thing that should be consulted is what the hardware itself supports.
470 Otherwise, the statistics should look at what is currently being advertised by
472 .Ss Unregistering from MAC
475 routine, it should unregister the device instance from MAC by calling
476 .Xr mac_unregister 9F
with the handle that it used when registering.
479 .Xr mac_unregister 9F
480 failed, then the device is likely still in use and the driver should
483 .Ss Interacting with Devices
484 Administrators always interact with devices through the
486 command line interface.
487 The state of devices such as whether the link is considered
491 various link properties such as the
499 It is also the preferred way that these properties are set and configured.
501 While device tunables may be presented in a
503 file, it is recommended instead to expose such things through
505 private properties, whether explicitly documented or not.
507 Capabilities in the MAC Framework are optional features that a device
508 supports which indicate various hardware features that the device
The two current capabilities that the system supports are the ability of
hardware to perform large send offloads (LSO), often also known as TCP
segmentation offload, and the ability of hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
515 The MAC framework will query a device for support of a capability
519 Each capability has its own constant and may have corresponding data that goes
520 along with it and a specific structure that the device is required to fill in.
521 Note, the set of capabilities changes over time and there are also private
522 capabilities in the system.
523 Several of the capabilities are used in the implementation of the MAC framework.
525 .Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus neither API nor
binary compatibility is guaranteed for them.
528 It is important that the device driver handles unknown capabilities correctly.
529 For more information, see
532 The following capabilities are
533 stable and defined in the system:
537 capability indicates to the system that the device driver supports some
538 amount of checksumming.
539 The specific data for this capability is a pointer to a
541 To indicate no support for any kind of checksumming, the driver should
542 either set this value to zero or simply return that it doesn't support
545 Note, the values that the driver declares in this capability indicate
546 what it can do when it transmits data.
547 If the driver can only verify checksums when receiving data, then it should not
548 indicate that it supports this capability.
549 The following set of flags may be combined through a bitwise inclusive OR:
551 .It Sy HCKSUM_INET_PARTIAL
552 This indicates that the hardware can calculate a partial checksum for
553 both IPv4 and IPv6; however, it requires the pseudo-header checksum be
555 The pseudo-header checksum will be available for the mblk_t when calling
556 .Xr mac_hcksum_get 9F .
557 Note this does not imply that the hardware is capable of calculating the
558 IPv4 header checksum.
That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
561 .It Sy HCKSUM_INET_FULL_V4
562 This indicates that the hardware will fully calculate the L4 checksum
563 for outgoing IPv4 packets and does not require a pseudo-header checksum.
564 Note this does not imply that the hardware is capable of calculating the
565 IPv4 header checksum.
566 That should be indicated with the
567 .Sy HCKSUM_IPHDRCKSUM .
568 .It Sy HCKSUM_INET_FULL_V6
569 This indicates that the hardware will fully calculate the L4 checksum
570 for outgoing IPv6 packets and does not require a pseudo-header checksum.
571 .It Sy HCKSUM_IPHDRCKSUM
572 This indicates that the hardware supports calculating the checksum for
573 the IPv4 header itself.
576 When in a driver's transmit function, the driver will be processing a
579 .Xr mac_hcksum_get 9F
580 to see what checksum flags are set on it.
581 Note that the flags that are set on it are different from the ones described
582 above and are documented in its manual page.
583 These flags indicate how the driver is expected to program the hardware and what
584 checksumming is required.
585 Not all frames will require hardware checksumming or will ask the hardware to
588 If a driver supports offloading the receive checksum and verification,
589 it should check to see what the hardware indicated was verified.
590 The driver should then call
591 .Xr mac_hcksum_set 9F .
592 The flags used are different from the ones above and are discussed in
594 .Xr mac_hcksum_set 9F
596 If there is no checksum information available or the driver does not support
597 checksumming, then it should simply not call
598 .Xr mac_hcksum_set 9F .
600 Note that the checksum flags should be set on the first
601 mblk_t that makes up a given message.
602 In other words, if multiple mblk_t structures are linked together by the
604 member to describe a single frame, then it should only be called on the
605 first mblk_t of that set.
606 However, each distinct message should have the checksum bits set on it, if
608 In other words, each mblk_t that is linked together by the
610 pointer may have checksum flags set.
612 It is recommended that device drivers provide a private property or
614 property to control whether or not checksumming is enabled for both rx
615 and tx; however, the default disposition is recommended to be enabled
617 This way if hardware bugs are found in the checksumming implementation, they can
618 be disabled without requiring software updates.
619 The transmit property should be checked when determining how to reply to
621 and the receive property should be checked in the context of the receive
626 capability indicates that the driver supports various forms of large
628 The private data is a pointer to a
631 At the moment, LSO support is limited to TCP inside of IPv4.
632 This structure has the following members which are used to indicate
633 various types of LSO support.
634 .Bd -literal -offset indent
635 t_uscalar_t lso_flags;
lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
641 member is used to indicate which members are valid and should be
643 Each flag represents a different form of LSO.
644 The member should be set to the bitwise inclusive OR of the following values:
645 .Bl -tag -width Dv -offset indent
646 .It Sy LSO_TX_BASIC_TCP_IPV4
647 This indicates hardware support for performing TCP segmentation
648 offloading over IPv4.
649 When this flag is set, the
650 .Sy lso_basic_tcp_ipv4
651 member must be filled in.
655 .Sy lso_basic_tcp_ipv4
656 member is a structure with the following members:
657 .Bd -literal -offset indent
660 .Bd -filled -offset indent
663 member should be set to the maximum size of the TCP data
664 payload that can be offloaded to the hardware.
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without
requiring additional driver work.
672 Properties in the MAC framework represent aspects of a link.
673 These include things like the link's current state and MTU.
674 Many of the properties in the system are focused around auto-negotiation and
675 controlling what link speeds are advertised.
676 Information about properties is covered by three different device entry points.
679 entry point obtains metadata about the property.
682 entry point obtains the property.
685 entry point updates the property to a new value.
687 Many of the properties listed below are read-only.
688 Each property indicates whether it's read-only or it's read/write.
689 However, driver writers may not implement the ability to set all writable
691 Many of these depend on the card itself.
692 In particular, all properties that relate to auto-negotiation and are read/write
693 may not be updated if the hardware in question does not support toggling what
694 link speeds are auto-negotiated.
While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and PHYs.
698 The following properties are the subset of MAC framework properties that
699 driver writers should be aware of and handle.
700 While other properties exist in the system, driver writers should always return
701 an error when a property not listed below is encountered.
706 for more information on how to handle them.
708 .It Sy MAC_PROP_DUPLEX
718 property is used to indicate whether or not the link is duplex.
719 A duplex link may have traffic flowing in both directions at the same time.
722 is an enumeration which may be set to any of the following values:
724 .It Sy LINK_DUPLEX_UNKNOWN
725 The current state of the link is unknown.
726 This may be because the link has not negotiated to a specific speed or it is
728 .It Sy LINK_DUPLEX_HALF
729 The link is running at half duplex.
730 Communication may travel in only one direction on the link at a given time.
731 .It Sy LINK_DUPLEX_FULL
732 The link is running at full duplex.
733 Communication may travel in both directions on the link simultaneously.
735 .It Sy MAC_PROP_SPEED
745 property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
747 A link that is running at 40 Gbit/s would store the value 40000000000ULL.
748 .It Sy MAC_PROP_STATUS
758 property is used to indicate the current state of the link.
759 It indicates whether the link is up or down.
762 is an enumeration which may be set to any of the following values:
764 .It Sy LINK_STATE_UNKNOWN
765 The current state of the link is unknown.
766 This may be because the driver's
entry point has not been called, so it has not attempted to start the link.
769 .It Sy LINK_STATE_DOWN
771 This may be because of a negotiation problem, a cable problem, or some other
772 device specific issue.
775 If auto-negotiation is in use, it should have completed.
776 Traffic should be able to flow over the link, barring other issues.
778 .It Sy MAC_PROP_AUTONEG
788 property indicates whether or not the device is currently configured to
789 perform auto-negotiation.
792 indicates that auto-negotiation is disabled.
795 value indicates that auto-negotiation is enabled.
796 Devices should generally default to enabling auto-negotiation.
798 When getting this property, the device driver should return the current
800 When setting this property, if the device supports operating in the requested
801 mode, then the device driver should reset the link to negotiate to the new speed
802 after updating any internal registers.
813 property determines the maximum transmission unit (MTU).
814 This indicates the maximum size packet that the device can transmit, ignoring
816 For an Ethernet device, this would exclude the size of the Ethernet header and
817 any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, once its
margin and header sizes are added, does not exceed its maximum frame size.
821 By default, drivers for Ethernet should initialize this value and the
824 When getting this property, the driver should return its current
826 When setting this property, the driver should first validate that it is within
827 the device's valid range and then it must call
828 .Xr mac_maxsdu_update 9F .
829 Note that the call may fail.
830 If the call completes successfully, the driver should update the hardware with
831 the new value of the MTU and perform any other work needed to handle it.
833 If the device does not support changing the MTU after the device's
835 entry point has been called, then driver writers should return
837 .It Sy MAC_PROP_FLOWCTRL
840 .Sy link_flowctrl_t |
846 .Sy MAC_PROP_FLOWCTRL
847 property manages the configuration of pause frames as part of Ethernet
849 Note, this only describes what this device will advertise.
850 What is actually enabled may be different and is subject to the rules of
854 is an enumeration that may be set to one of the following values:
856 .It Sy LINK_FLOWCTRL_NONE
857 Flow control is disabled.
858 No pause frames should be generated or honored.
859 .It Sy LINK_FLOWCTRL_RX
860 The device can receive pause frames; however, it should not generate
862 .It Sy LINK_FLOWCTRL_TX
863 The device can generate pause frames; however, it does not support
865 .It Sy LINK_FLOWCTRL_BI
866 The device supports both sending and receiving pause frames.
869 When getting this property, the device driver should return the way that
870 it has configured the device, not what the device has actually
872 When setting the property, it should update the hardware and allow the link to
873 potentially perform auto-negotiation again.
876 The remaining properties are all about various auto-negotiation link
878 They fall into two different buckets: properties with
880 in the name and properties with
883 For any given supported speed, there is one of each.
886 set of properties are read/write properties that control what should be
887 advertised by the device.
888 When these are retrieved, they should return the current value of the property.
889 When they are set, they should change how the hardware advertises the specific
890 speed and trigger any kind of link reset and auto-negotiation, if enabled, to
895 set of properties are read-only properties.
896 They are meant to reflect what has actually been negotiated.
897 These may be different from the
899 family of properties, especially when different power management
900 settings are at play.
903 .Sx Link Speed and Auto-negotiation
904 section for more information.
906 The properties are ordered in increasing link speed:
908 .It Sy MAC_PROP_ADV_10HDX_CAP
917 .Sy MAC_PROP_ADV_10HDX_CAP
918 property describes whether or not 10 Mbit/s half-duplex support is
920 .It Sy MAC_PROP_EN_10HDX_CAP
929 .Sy MAC_PROP_EN_10HDX_CAP
930 property describes whether or not 10 Mbit/s half-duplex support is
932 .It Sy MAC_PROP_ADV_10FDX_CAP
941 .Sy MAC_PROP_ADV_10FDX_CAP
942 property describes whether or not 10 Mbit/s full-duplex support is
944 .It Sy MAC_PROP_EN_10FDX_CAP
953 .Sy MAC_PROP_EN_10FDX_CAP
954 property describes whether or not 10 Mbit/s full-duplex support is
956 .It Sy MAC_PROP_ADV_100HDX_CAP
965 .Sy MAC_PROP_ADV_100HDX_CAP
966 property describes whether or not 100 Mbit/s half-duplex support is
968 .It Sy MAC_PROP_EN_100HDX_CAP
977 .Sy MAC_PROP_EN_100HDX_CAP
978 property describes whether or not 100 Mbit/s half-duplex support is
980 .It Sy MAC_PROP_ADV_100FDX_CAP
989 .Sy MAC_PROP_ADV_100FDX_CAP
990 property describes whether or not 100 Mbit/s full-duplex support is
992 .It Sy MAC_PROP_EN_100FDX_CAP
1001 .Sy MAC_PROP_EN_100FDX_CAP
1002 property describes whether or not 100 Mbit/s full-duplex support is
1004 .It Sy MAC_PROP_ADV_100T4_CAP
1005 .Bd -filled -compact
1013 .Sy MAC_PROP_ADV_100T4_CAP
1014 property describes whether or not 100 Mbit/s Ethernet using the
1015 100BASE-T4 standard is
1017 .It Sy MAC_PROP_EN_100T4_CAP
1018 .Bd -filled -compact
.Sy MAC_PROP_EN_100T4_CAP
1027 property describes whether or not 100 Mbit/s Ethernet using the
1028 100BASE-T4 standard is
1030 .It Sy MAC_PROP_ADV_1000HDX_CAP
1031 .Bd -filled -compact
1039 .Sy MAC_PROP_ADV_1000HDX_CAP
1040 property describes whether or not 1 Gbit/s half-duplex support is
1042 .It Sy MAC_PROP_EN_1000HDX_CAP
1043 .Bd -filled -compact
1051 .Sy MAC_PROP_EN_1000HDX_CAP
1052 property describes whether or not 1 Gbit/s half-duplex support is
1054 .It Sy MAC_PROP_ADV_1000FDX_CAP
1055 .Bd -filled -compact
1063 .Sy MAC_PROP_ADV_1000FDX_CAP
1064 property describes whether or not 1 Gbit/s full-duplex support is
1066 .It Sy MAC_PROP_EN_1000FDX_CAP
1067 .Bd -filled -compact
1075 .Sy MAC_PROP_EN_1000FDX_CAP
1076 property describes whether or not 1 Gbit/s full-duplex support is
1078 .It Sy MAC_PROP_ADV_2500FDX_CAP
1079 .Bd -filled -compact
1087 .Sy MAC_PROP_ADV_2500FDX_CAP
1088 property describes whether or not 2.5 Gbit/s full-duplex support is
1090 .It Sy MAC_PROP_EN_2500FDX_CAP
1091 .Bd -filled -compact
1099 .Sy MAC_PROP_EN_2500FDX_CAP
1100 property describes whether or not 2.5 Gbit/s full-duplex support is
1102 .It Sy MAC_PROP_ADV_5000FDX_CAP
1103 .Bd -filled -compact
1111 .Sy MAC_PROP_ADV_5000FDX_CAP
1112 property describes whether or not 5.0 Gbit/s full-duplex support is
1114 .It Sy MAC_PROP_EN_5000FDX_CAP
1115 .Bd -filled -compact
1123 .Sy MAC_PROP_EN_5000FDX_CAP
1124 property describes whether or not 5.0 Gbit/s full-duplex support is
1126 .It Sy MAC_PROP_ADV_10GFDX_CAP
1127 .Bd -filled -compact
1135 .Sy MAC_PROP_ADV_10GFDX_CAP
1136 property describes whether or not 10 Gbit/s full-duplex support is
1138 .It Sy MAC_PROP_EN_10GFDX_CAP
1139 .Bd -filled -compact
1147 .Sy MAC_PROP_EN_10GFDX_CAP
1148 property describes whether or not 10 Gbit/s full-duplex support is
1150 .It Sy MAC_PROP_ADV_40GFDX_CAP
1151 .Bd -filled -compact
1159 .Sy MAC_PROP_ADV_40GFDX_CAP
1160 property describes whether or not 40 Gbit/s full-duplex support is
1162 .It Sy MAC_PROP_EN_40GFDX_CAP
1163 .Bd -filled -compact
1171 .Sy MAC_PROP_EN_40GFDX_CAP
1172 property describes whether or not 40 Gbit/s full-duplex support is
1174 .It Sy MAC_PROP_ADV_100GFDX_CAP
1175 .Bd -filled -compact
1183 .Sy MAC_PROP_ADV_100GFDX_CAP
1184 property describes whether or not 100 Gbit/s full-duplex support is
1186 .It Sy MAC_PROP_EN_100GFDX_CAP
1187 .Bd -filled -compact
1195 .Sy MAC_PROP_EN_100GFDX_CAP
1196 property describes whether or not 100 Gbit/s full-duplex support is
1199 .Ss Private Properties
1200 In addition to the defined properties above, drivers are allowed to
1201 define private properties.
1202 These private properties are device-specific properties.
1203 All private properties share the same constant,
1204 .Sy MAC_PROP_PRIVATE .
1205 Properties are distinguished by a name, which is a character string.
The list of such private properties is defined when registering with MAC in the
1212 The driver may define whatever semantics it wants for these private
1214 They will not be listed when running
1216 unless explicitly requested by name.
1217 All such properties should start with a leading underscore character and then
1218 consist of alphanumeric ASCII characters and additional underscores or hyphens.
1221 .Sy MAC_PROP_PRIVATE
1222 may show up in all three property related entry points:
1223 .Xr mc_propinfo 9E ,
1227 Device drivers should tell the different properties apart by using the
1229 function to compare the property name to the set of properties that it knows about.
1230 When encountering properties that it doesn't know, it should treat them
1231 like all other unknown properties.
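.Pp
The following is a minimal user-space sketch of telling private
properties apart by name.
The state structure, the function name, and the two property names are
hypothetical, and returning
.Er ENOTSUP
for an unknown name is an assumption about how unknown properties are
reported.
.Bd -literal -offset indent
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-instance state; field names are illustrative. */
typedef struct example_state {
	uint32_t es_rx_copy_thresh;
	uint32_t es_tx_copy_thresh;
} example_state_t;

/*
 * Look up a private property by name and copy out its value.
 * Unknown names are assumed to be reported with ENOTSUP, the same
 * as any other property the driver does not know about.
 */
static int
example_get_priv_prop(const example_state_t *esp, const char *name,
    uint32_t *valp)
{
	if (strcmp(name, "_rx_copy_thresh") == 0) {
		*valp = esp->es_rx_copy_thresh;
		return (0);
	}
	if (strcmp(name, "_tx_copy_thresh") == 0) {
		*valp = esp->es_tx_copy_thresh;
		return (0);
	}
	return (ENOTSUP);
}
.Ed
.Pp
Both example names follow the naming rule given above: a leading
underscore followed by alphanumeric characters and underscores.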
1233 The MAC framework defines a couple of different sets of statistics which
1234 are based on various standards for devices to implement.
1235 Statistics are retrieved through the
1238 There are statistics that are required for all devices, as well as a
1239 separate set of Ethernet specific statistics.
1240 Not all devices will support every statistic.
1241 In many cases, several device registers will need to be combined to create the
1244 In general, if the device is not keeping track of these statistics, then
1245 it is recommended that the driver store these values as a
1247 to ensure that overflow does not occur.
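.Pp
As an illustration of keeping a 64-bit software total, the sketch below
folds readings of a hardware counter into a wider value.
It assumes a free-running 32-bit register that wraps around rather than
clearing on read; the structure and function names are hypothetical.
.Bd -literal -offset indent
#include <stdint.h>

/* Hypothetical software copy of one hardware counter. */
typedef struct example_counter {
	uint32_t ec_last;	/* last raw register value observed */
	uint64_t ec_total;	/* running 64-bit total for the stat */
} example_counter_t;

/*
 * Fold a new raw reading into the 64-bit total.  Unsigned 32-bit
 * subtraction yields the correct delta even if the register has
 * wrapped once since the previous reading.
 */
static void
example_counter_update(example_counter_t *ecp, uint32_t raw)
{
	uint32_t delta = raw - ecp->ec_last;

	ecp->ec_last = raw;
	ecp->ec_total += delta;
}
.Ed
.Pp
This only works if the register is read often enough that it cannot wrap
more than once between readings.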
1249 If a device does not support a specific statistic, then it is fine to
1250 return that it is not supported.
1251 The same should be done for unrecognized statistics.
1254 for more information on the proper way to handle these.
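.Pp
A statistics handler might be shaped roughly as follows.
This is a user-space sketch: the identifiers stand in for the real
statistic constants, the counters are hypothetical, and the use of
.Er ENOTSUP
for both unsupported and unrecognized statistics is an assumption.
.Bd -literal -offset indent
#include <errno.h>
#include <stdint.h>

/* Stand-ins for real statistic identifiers; values are arbitrary. */
enum { EX_STAT_IERRORS = 1, EX_STAT_IPACKETS = 2 };

/* Hypothetical 64-bit software copies of device counters. */
typedef struct example_stats {
	uint64_t es_crc_errors;
	uint64_t es_len_errors;
	uint64_t es_rx_packets;
} example_stats_t;

static int
example_getstat(const example_stats_t *sp, unsigned int stat,
    uint64_t *valp)
{
	switch (stat) {
	case EX_STAT_IERRORS:
		/* One statistic built from two device counters. */
		*valp = sp->es_crc_errors + sp->es_len_errors;
		return (0);
	case EX_STAT_IPACKETS:
		*valp = sp->es_rx_packets;
		return (0);
	default:
		/* Unsupported and unrecognized are treated alike. */
		return (ENOTSUP);
	}
}
.Ed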
1255 .Ss General Device Statistics
1256 The following statistics are based on MIB-II statistics from both RFC
1259 .It Sy MAC_STAT_IFSPEED
1260 The device's current speed in bits per second.
1261 .It Sy MAC_STAT_MULTIRCV
1262 The total number of received multicast packets.
1263 .It Sy MAC_STAT_BRDCSTRCV
1264 The total number of received broadcast packets.
1265 .It Sy MAC_STAT_MULTIXMT
1266 The total number of transmitted multicast packets.
1267 .It Sy MAC_STAT_BRDCSTXMT
1268 The total number of transmitted broadcast packets.
1269 .It Sy MAC_STAT_NORCVBUF
1270 The total number of packets discarded by the hardware due to a lack of
1272 .It Sy MAC_STAT_IERRORS
1273 The total number of errors detected on input.
1274 .It Sy MAC_STAT_UNKNOWNS
1275 The total number of received packets that were discarded because they
1276 were of an unknown protocol.
1277 .It Sy MAC_STAT_NOXMTBUF
1278 The total number of outgoing packets dropped due to a lack of transmit
1280 .It Sy MAC_STAT_OERRORS
1281 The total number of outgoing packets that resulted in errors.
1282 .It Sy MAC_STAT_COLLISIONS
1283 The total number of collisions encountered by the transmitter.
1284 .It Sy MAC_STAT_RBYTES
1287 received by the device, regardless of packet type.
1288 .It Sy MAC_STAT_IPACKETS
1291 received by the device, regardless of packet type.
1292 .It Sy MAC_STAT_OBYTES
1295 transmitted by the device, regardless of packet type.
1296 .It Sy MAC_STAT_OPACKETS
1299 sent by the device, regardless of packet type.
1300 .It Sy MAC_STAT_UNDERFLOWS
1301 The total number of packets that were smaller than the minimum sized
1302 packet for the device and were therefore dropped.
1303 .It Sy MAC_STAT_OVERFLOWS
1304 The total number of packets that were larger than the maximum sized
1305 packet for the device and were therefore dropped.
1307 .Ss Ethernet Specific Statistics
1308 The following statistics are specific to Ethernet devices.
1309 They refer to values from RFC 1643 and include various MII/GMII specific stats.
1310 Many of these are also defined in IEEE 802.3.
1312 .It Sy ETHER_STAT_ADV_CAP_1000FDX
1313 Indicates that the device is advertising support for 1 Gbit/s
1314 full-duplex operation.
1315 .It Sy ETHER_STAT_ADV_CAP_1000HDX
1316 Indicates that the device is advertising support for 1 Gbit/s
1317 half-duplex operation.
1318 .It Sy ETHER_STAT_ADV_CAP_100FDX
1319 Indicates that the device is advertising support for 100 Mbit/s
1320 full-duplex operation.
1321 .It Sy ETHER_STAT_ADV_CAP_100GFDX
1322 Indicates that the device is advertising support for 100 Gbit/s
1323 full-duplex operation.
1324 .It Sy ETHER_STAT_ADV_CAP_100HDX
1325 Indicates that the device is advertising support for 100 Mbit/s
1326 half-duplex operation.
1327 .It Sy ETHER_STAT_ADV_CAP_100T4
1328 Indicates that the device is advertising support for 100 Mbit/s
1329 100BASE-T4 operation.
1330 .It Sy ETHER_STAT_ADV_CAP_10FDX
1331 Indicates that the device is advertising support for 10 Mbit/s
1332 full-duplex operation.
1333 .It Sy ETHER_STAT_ADV_CAP_10GFDX
1334 Indicates that the device is advertising support for 10 Gbit/s
1335 full-duplex operation.
1336 .It Sy ETHER_STAT_ADV_CAP_10HDX
1337 Indicates that the device is advertising support for 10 Mbit/s
1338 half-duplex operation.
1339 .It Sy ETHER_STAT_ADV_CAP_2500FDX
1340 Indicates that the device is advertising support for 2.5 Gbit/s
1341 full-duplex operation.
1342 .It Sy ETHER_STAT_ADV_CAP_40GFDX
1343 Indicates that the device is advertising support for 40 Gbit/s
1344 full-duplex operation.
1345 .It Sy ETHER_STAT_ADV_CAP_5000FDX
1346 Indicates that the device is advertising support for 5.0 Gbit/s
1347 full-duplex operation.
1348 .It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
1349 Indicates that the device is advertising support for receiving pause
1351 .It Sy ETHER_STAT_ADV_CAP_AUTONEG
1352 Indicates that the device is advertising support for auto-negotiation.
1353 .It Sy ETHER_STAT_ADV_CAP_PAUSE
1354 Indicates that the device is advertising support for generating pause
1356 .It Sy ETHER_STAT_ADV_REMFAULT
1357 Indicates that the device is advertising support for detecting faults in
1358 the remote link peer.
1359 .It Sy ETHER_STAT_ALIGN_ERRORS
1360 Indicates the number of times an alignment error was generated by the
1362 This is a count of packets that were not an integral number of octets and failed
1364 .It Sy ETHER_STAT_CAP_1000FDX
1365 Indicates the device supports 1 Gbit/s full-duplex operation.
1366 .It Sy ETHER_STAT_CAP_1000HDX
1367 Indicates the device supports 1 Gbit/s half-duplex operation.
1368 .It Sy ETHER_STAT_CAP_100FDX
1369 Indicates the device supports 100 Mbit/s full-duplex operation.
1370 .It Sy ETHER_STAT_CAP_100GFDX
1371 Indicates the device supports 100 Gbit/s full-duplex operation.
1372 .It Sy ETHER_STAT_CAP_100HDX
1373 Indicates the device supports 100 Mbit/s half-duplex operation.
1374 .It Sy ETHER_STAT_CAP_100T4
1375 Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
1376 .It Sy ETHER_STAT_CAP_10FDX
1377 Indicates the device supports 10 Mbit/s full-duplex operation.
1378 .It Sy ETHER_STAT_CAP_10GFDX
1379 Indicates the device supports 10 Gbit/s full-duplex operation.
1380 .It Sy ETHER_STAT_CAP_10HDX
1381 Indicates the device supports 10 Mbit/s half-duplex operation.
1382 .It Sy ETHER_STAT_CAP_2500FDX
1383 Indicates the device supports 2.5 Gbit/s full-duplex operation.
1384 .It Sy ETHER_STAT_CAP_40GFDX
1385 Indicates the device supports 40 Gbit/s full-duplex operation.
1386 .It Sy ETHER_STAT_CAP_5000FDX
1387 Indicates the device supports 5.0 Gbit/s full-duplex operation.
1388 .It Sy ETHER_STAT_CAP_ASMPAUSE
1389 Indicates that the device supports the ability to receive pause frames.
1390 .It Sy ETHER_STAT_CAP_AUTONEG
1391 Indicates that the device supports the ability to perform link
1393 .It Sy ETHER_STAT_CAP_PAUSE
1394 Indicates that the device supports the ability to transmit pause frames.
1395 .It Sy ETHER_STAT_CAP_REMFAULT
1396 Indicates that the device supports the ability of detecting a remote
1397 fault in a link peer.
1398 .It Sy ETHER_STAT_CARRIER_ERRORS
1399 Indicates the number of times that the Ethernet carrier sense condition
1400 was lost or not asserted.
1401 .It Sy ETHER_STAT_DEFER_XMTS
1402 Indicates the number of frames for which the device was unable to
1403 transmit the frame due to being busy and had to try again.
1404 .It Sy ETHER_STAT_EX_COLLISIONS
1405 Indicates the number of frames that failed to send due to an excessive
1406 number of collisions.
1407 .It Sy ETHER_STAT_FCS_ERRORS
1408 Indicates the number of times that a frame check sequence failed.
1409 .It Sy ETHER_STAT_FIRST_COLLISIONS
1410 Indicates the number of times that a frame was eventually transmitted
1411 successfully, but only after a single collision.
1412 .It Sy ETHER_STAT_JABBER_ERRORS
1413 Indicates the number of frames that were received that were both larger
1414 than the maximum packet size and failed the frame check sequence.
1415 .It Sy ETHER_STAT_LINK_ASMPAUSE
1416 Indicates whether the link is currently configured to accept pause
1418 .It Sy ETHER_STAT_LINK_AUTONEG
1419 Indicates whether the current link state is a result of
1421 .It Sy ETHER_STAT_LINK_DUPLEX
1422 Indicates the current duplex state of the link.
1423 The values used here should be the same as documented for
1424 .Sy MAC_PROP_DUPLEX .
1425 .It Sy ETHER_STAT_LINK_PAUSE
1426 Indicates whether the link is currently configured to generate pause
1428 .It Sy ETHER_STAT_LP_CAP_1000FDX
1429 Indicates the remote device supports 1 Gbit/s full-duplex operation.
1430 .It Sy ETHER_STAT_LP_CAP_1000HDX
1431 Indicates the remote device supports 1 Gbit/s half-duplex operation.
1432 .It Sy ETHER_STAT_LP_CAP_100FDX
1433 Indicates the remote device supports 100 Mbit/s full-duplex operation.
1434 .It Sy ETHER_STAT_LP_CAP_100GFDX
1435 Indicates the remote device supports 100 Gbit/s full-duplex operation.
1436 .It Sy ETHER_STAT_LP_CAP_100HDX
1437 Indicates the remote device supports 100 Mbit/s half-duplex operation.
1438 .It Sy ETHER_STAT_LP_CAP_100T4
1439 Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
1440 .It Sy ETHER_STAT_LP_CAP_10FDX
1441 Indicates the remote device supports 10 Mbit/s full-duplex operation.
1442 .It Sy ETHER_STAT_LP_CAP_10GFDX
1443 Indicates the remote device supports 10 Gbit/s full-duplex operation.
1444 .It Sy ETHER_STAT_LP_CAP_10HDX
1445 Indicates the remote device supports 10 Mbit/s half-duplex operation.
1446 .It Sy ETHER_STAT_LP_CAP_2500FDX
1447 Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
1448 .It Sy ETHER_STAT_LP_CAP_40GFDX
1449 Indicates the remote device supports 40 Gbit/s full-duplex operation.
1450 .It Sy ETHER_STAT_LP_CAP_5000FDX
1451 Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
1452 .It Sy ETHER_STAT_LP_CAP_ASMPAUSE
1453 Indicates that the remote device supports the ability to receive pause
1455 .It Sy ETHER_STAT_LP_CAP_AUTONEG
1456 Indicates that the remote device supports the ability to perform link
1458 .It Sy ETHER_STAT_LP_CAP_PAUSE
1459 Indicates that the remote device supports the ability to transmit pause
1461 .It Sy ETHER_STAT_LP_CAP_REMFAULT
1462 Indicates that the remote device supports the ability of detecting a
1463 remote fault in a link peer.
1464 .It Sy ETHER_STAT_MACRCV_ERRORS
1465 Indicates the number of times that the internal MAC layer encountered an
1466 error when attempting to receive and process a frame.
1467 .It Sy ETHER_STAT_MACXMT_ERRORS
1468 Indicates the number of times that the internal MAC layer encountered an
1469 error when attempting to process and transmit a frame.
1470 .It Sy ETHER_STAT_MULTI_COLLISIONS
1471 Indicates the number of times that a frame was eventually transmitted
1472 successfully, but only after more than one collision.
1473 .It Sy ETHER_STAT_SQE_ERRORS
1474 Indicates the number of times that an SQE error occurred.
1475 The specific conditions for this error are documented in IEEE 802.3.
1476 .It Sy ETHER_STAT_TOOLONG_ERRORS
1477 Indicates the number of frames that were received that were longer than
1478 the maximum frame size supported by the device.
1479 .It Sy ETHER_STAT_TOOSHORT_ERRORS
1480 Indicates the number of frames that were received that were shorter than
1481 the minimum frame size supported by the device.
1482 .It Sy ETHER_STAT_TX_LATE_COLLISIONS
1483 Indicates the number of times a collision was detected late on the
1485 .It Sy ETHER_STAT_XCVR_ADDR
1486 Indicates the address of the MII/GMII transceiver.
1487 .It Sy ETHER_STAT_XCVR_ID
1488 Indicates the ID of the MII/GMII transceiver.
1489 .It Sy ETHER_STAT_XCVR_INUSE
1490 Indicates what kind of transceiver is in use.
1491 The following values may be used:
1493 .It Sy XCVR_UNDEFINED
1494 The transceiver type is undefined by the hardware.
1496 There is no transceiver in use by the hardware.
1498 The transceiver supports 10BASE-T operation.
1500 The transceiver supports 100BASE-T4 operation.
1502 The transceiver supports 100BASE-TX operation.
1504 The transceiver supports 100BASE-T2 operation.
1506 The transceiver supports 1000BASE-X operation.
1507 This is used for all fiber transceivers.
1509 The transceiver supports 1000BASE-T operation.
1510 This is used for all copper transceivers.
1513 .Ss Device Specific kstats
1514 In addition to the defined statistics above, if the device driver
1515 maintains additional statistics or the device provides additional
1516 statistics, it should create its own kstats through the
1518 function to allow operators to observe them.
1519 .Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
1520 Device drivers are the first line of defense for dealing with broken
1521 devices and bugs in their firmware.
1522 While most devices will rarely fail, it is important that particular attention
1523 is paid to RAS (Reliability, Availability, and Serviceability) when designing
1524 and implementing the device driver.
1525 While everything described in this section is optional, it is highly recommended
1526 that all new device drivers follow these guidelines.
1528 The Fault Management Architecture (FMA) provides facilities for
1529 detecting and reporting various classes of defects and faults.
1530 Specifically for networking device drivers, issues that should be
1531 detected and reported include:
1532 .Bl -bullet -offset indent
1534 Device internal uncorrectable errors
1536 Device internal correctable errors
1538 PCI and PCI Express transport errors
1540 Device temperature alarms
1542 Device transmission stalls
1544 Device communication timeouts
1546 High rates of invalid interrupts
1549 All such errors fall into three primary categories:
1550 .Bl -enum -offset indent
1552 Errors detected by the Fault Management Architecture
1554 Errors detected by the device and indicated to the device driver
1556 Errors detected by the device driver
1558 .Ss Fault Management Setup and Teardown
1559 Drivers should initialize support for the fault management framework by
1565 By registering with the fault management framework, a device driver is given the
1566 chance to detect and notice transport errors as well as report other errors that
1568 While a device driver does not need to declare every capability
1569 described in
1570 .Xr ddi_fm_init 9F ,
1571 we suggest that device drivers at least register the
1572 .Sy DDI_FM_EREPORT_CAPABLE
1573 capability so as to allow the driver to report issues that it detects.
1575 If the driver registers with the fault management framework during its
1577 entry point, it must call
1582 .Ss Transport Errors
1583 Many modern networking devices leverage PCI or PCI Express.
1584 As such, there are two primary ways that device drivers access data: they either
1585 memory map device registers and use routines like
1589 or they use direct memory access (DMA).
1590 New device drivers should always enable checking of the transport layer by
1591 marking their support in the
1592 .Xr ddi_device_acc_attr_t 9S
1593 structure and using routines like
1594 .Xr ddi_fm_acc_err_get 9F
1596 .Xr ddi_fm_dma_err_get 9F
1597 to detect if errors have occurred.
1598 .Ss Device Indicated Errors
1599 Many devices have capabilities to announce to a device driver that a
1600 correctable or uncorrectable error has occurred.
1601 Other devices have the ability to indicate that various physical issues have
1602 occurred such as a fan failing or a temperature sensor having fired.
1604 Drivers should wire themselves to receive notifications when these
1606 The means and capabilities will vary from device to device.
1607 For example, some devices will generate information about these notifications
1608 through special interrupts.
1609 Other devices may have a register that software can poll.
1610 In the cases where polling is required, driver writers should try not to poll
1611 too frequently and should generally only poll when the device is actively being
1612 used, e.g. between calls to the
1617 .Ss Driver Transmit Stall Detection
1618 One of the primary responsibilities of a hardened device driver is to
1619 perform transmit stall detection.
1620 The core idea behind tx stall detection is that the driver should record
1621 the last time that it saw data being successfully transmitted.
1622 Most devices should be transmitting data on a regular basis as long as the link
1624 If it is not, then this may indicate that the device is stuck and needs to be
1626 At this time, the MAC framework does not provide any resources for performing
1627 these checks; however, polling on each individual transmit ring for the last
1628 completion time while something is actively being transmitted through the use of
1631 may be a reasonable starting point.
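.Pp
The per-ring check described above can be sketched in user-space C as
follows.
The ring structure, its fields, and the time limit are hypothetical, and
the periodic timer that would call the check is left out.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping kept by the driver. */
typedef struct example_tx_ring {
	uint64_t etr_last_done_ns;	/* time of last completion */
	uint32_t etr_pending;		/* descriptors given to hardware */
} example_tx_ring_t;

/*
 * A ring is considered stalled when work is outstanding but no
 * transmit has completed within limit_ns.  An idle ring is never
 * reported as stalled.
 */
static bool
example_tx_ring_stalled(const example_tx_ring_t *rp, uint64_t now_ns,
    uint64_t limit_ns)
{
	if (rp->etr_pending == 0)
		return (false);
	return (now_ns - rp->etr_last_done_ns > limit_ns);
}
.Ed
.Pp
The driver would update the completion time whenever it reclaims a
transmitted descriptor and would run the check on every ring at a modest
interval.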
1632 .Ss Driver Command Timeout Detection
1633 Each device is programmed in different ways.
1634 Some devices are programmed through asynchronous commands while others are
1635 programmed by writing directly to memory mapped registers.
1636 If a device replies to commands asynchronously, then the device driver
1637 should set reasonable timeouts for all such commands and plan on detecting when they expire.
1638 If a timeout occurs, the driver should presume that there is an issue with the
1639 hardware and proceed to abort the command or reset the device.
1641 Many devices do not have such a communication mechanism.
1642 However, whenever there is some activity where the device driver must wait, then
1643 it should be prepared for the fact that the device may never get back to
1644 it and react appropriately by performing some kind of device reset.
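.Pp
One way to bound such a wait is to poll for completion a limited number
of times, as in the user-space sketch below.
The callback type, the function names, and the use of
.Er ETIMEDOUT
are all assumptions made for illustration; the fake completion check
stands in for reading a device status register.
.Bd -literal -offset indent
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*example_done_f)(void *);

/*
 * Poll for completion at most max_polls times.  A real driver would
 * delay between polls; that is omitted here.
 */
static int
example_wait_cmd(example_done_f done, void *arg, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; i++) {
		if (done(arg))
			return (0);
	}
	return (ETIMEDOUT);
}

/* Stand-in for a device status check: done after *arg more polls. */
static bool
example_fake_done(void *arg)
{
	uint32_t *left = arg;

	if (*left == 0)
		return (true);
	(*left)--;
	return (false);
}
.Ed
.Pp
On a timeout, the caller would treat the device as faulty and abort the
command or reset the device.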
1645 .Ss Reacting to Errors
1646 When any of the above categories of errors has been triggered, the
1647 behavior that the device driver should take depends on the kind of
1649 If a fatal error occurred, for example, a transport error or a transmit stall was
1650 detected, or the device indicated that an uncorrectable error occurred, then it is
1651 important that the driver take the following steps:
1652 .Bl -enum -offset indent
1654 Set a flag in the device driver's state that indicates that it has hit
1656 When this error condition flag is asserted, transmitted packets should be
1657 accepted and dropped and actions that would require writing to the device state
1658 should fail with an error.
1659 This flag should remain until the device has been successfully restarted.
1661 If the error was not a transport error that was indicated by the fault
1662 management architecture, e.g. a transport error that was detected, then
1663 the device driver should post an
1665 indicating what has occurred with the
1666 .Xr ddi_fm_ereport_post 9F
1669 The device driver should indicate that the device's service was lost
1671 .Xr ddi_fm_service_impact 9F
1673 .Sy DDI_SERVICE_LOST .
1675 At this point the device driver should issue a device reset through some
1676 device-specific means.
1678 When the device reset has been completed, then the device driver should
1679 restore all of the programmed state to the device.
1680 This includes things like the current MTU, advertised auto-negotiation speeds,
1681 MAC address filters, and more.
1683 Finally, when service has been restored, the device driver should call
1684 .Xr ddi_fm_service_impact 9F
1686 .Sy DDI_SERVICE_RESTORED .
1689 When a non-fatal error occurs, then the device driver should submit an
1690 ereport and should optionally mark the device degraded using
1691 .Xr ddi_fm_service_impact 9F
1693 .Sy DDI_SERVICE_DEGRADED
1694 value depending on the nature of the problem that has occurred.
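.Pp
The ordering of the fatal error steps above can be modeled with the
following user-space sketch.
It does not call the real DDI routines: each step is recorded by a stub,
and the comments name the call or action that a driver would perform at
that point.
All type and function names are hypothetical.
.Bd -literal -offset indent
#include <stdbool.h>

enum example_step {
	EX_EREPORT, EX_LOST, EX_RESET, EX_RESTORE, EX_RESTORED
};

typedef struct example_dev {
	bool ed_error;			/* fatal error flag */
	enum example_step ed_log[8];	/* order in which steps ran */
	int ed_nlog;
} example_dev_t;

static void
example_note(example_dev_t *dp, enum example_step s)
{
	if (dp->ed_nlog < 8)
		dp->ed_log[dp->ed_nlog++] = s;
}

/* Returns true if a frame would be handed to the hardware. */
static bool
example_tx(const example_dev_t *dp)
{
	/* While the flag is set, frames are accepted and dropped. */
	return (!dp->ed_error);
}

static void
example_fatal_error(example_dev_t *dp, bool fm_reported)
{
	dp->ed_error = true;
	if (!fm_reported)
		example_note(dp, EX_EREPORT);	/* post an ereport */
	example_note(dp, EX_LOST);	/* DDI_SERVICE_LOST */
	example_note(dp, EX_RESET);	/* device-specific reset */
	example_note(dp, EX_RESTORE);	/* MTU, filters, speeds */
	example_note(dp, EX_RESTORED);	/* DDI_SERVICE_RESTORED */
	dp->ed_error = false;
}
.Ed
.Pp
In a real driver the flag would only be cleared after the reset and the
restoration of state had both succeeded.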
1696 Device drivers should never make the decision to remove a device from
1697 service based on errors that have occurred nor should they panic the
1699 Rather, the device driver should always try to notify the operating system with
1700 various ereports and allow its policy decisions to occur.
1701 The decision to retire a device lies in the hands of the fault management
1703 It knows more about the operator's intent and the surrounding system's state
1704 than the device driver itself does and it will make the call to offline and
1705 retire the device if it is required.
1707 When resetting a device, a device driver must exercise caution.
1708 If a device driver has not been written to plan for a device reset, then it
1709 may not correctly restore the device's state after such a reset.
1710 Such state should be stored in the instance's private state data as the MAC
1711 framework does not know about device resets and will not inform the
1712 device again about the expected, programmed state.
1714 One wrinkle with device resets is that many networking cards show up as
1715 multiple PCI functions on a single device, for example, each port may
1716 show up as a separate function and thus have a separate instance of the
1717 device driver attached.
1718 When resetting a function, device driver writers should carefully read the
1719 device programming manuals and verify whether or not a reset impacts only the
1720 stalled function or if it impacts all functions across the device.
1722 If the only way to reset a given function is to reset the entire device, then
1723 this may require more coordination and work on the part of the device
1724 driver to ensure that all the other instances are correctly restored.
1725 In cases where this occurs, some devices offer ways of injecting
1726 interrupts onto those other functions to notify them that this is
1729 The networking stack manages framed data through the use of the
1732 The mblk allows for a single message to be made up of individual blocks.
1733 Each part is linked together through its
1736 However, it also allows for multiple messages to be chained together through the
1740 While the networking stack works with these structures, device drivers generally
1741 work with DMA regions.
1742 There are two different strategies that device drivers use for handling these
1743 two different cases: copying and binding.
1745 The first way that device drivers handle interfacing between the two is
1746 by having two separate regions of memory.
1747 One part is memory which has been allocated for DMA through a call to
1748 .Xr ddi_dma_alloc 9F
1749 and the other is memory associated with the memory block.
1751 In this case, a driver will use
1753 to copy memory between the two distinct regions.
1754 When transmitting a packet, it will copy the memory from the mblk_t to the DMA
1756 When receiving data, it will allocate an mblk_t through the
1758 routine, copy the memory across with
1760 and then increment the mblk_t's
1764 If, when receiving, memory is not available for a new message block,
1765 then the frame should be skipped and effectively dropped.
1766 A kstat should be bumped when such an occasion occurs.
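.Pp
The receive side of the copying approach can be sketched in user-space C
as follows.
The block structure is a simplified stand-in for a message block, and
.Fn malloc
and
.Fn memcpy
stand in for the kernel allocation and copy routines; all names are
hypothetical.
.Bd -literal -offset indent
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a message block. */
typedef struct example_blk {
	uint8_t *eb_rptr;	/* start of valid data */
	uint8_t *eb_wptr;	/* one past the end of valid data */
	uint8_t eb_buf[];	/* storage for the frame */
} example_blk_t;

/*
 * Copy a received frame out of the DMA buffer into a new block.
 * If no memory is available, count the drop and skip the frame.
 */
static example_blk_t *
example_rx_copy(const uint8_t *dma_buf, size_t len, uint64_t *dropsp)
{
	example_blk_t *bp = malloc(sizeof (*bp) + len);

	if (bp == NULL) {
		(*dropsp)++;
		return (NULL);
	}
	bp->eb_rptr = bp->eb_wptr = bp->eb_buf;
	memcpy(bp->eb_wptr, dma_buf, len);
	bp->eb_wptr += len;	/* advance the write pointer */
	return (bp);
}
.Ed
.Pp
Because the data has been copied, the DMA buffer can be handed straight
back to the hardware once the function returns.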
1768 An alternative approach to copying data is to use DMA binding.
1769 When using DMA binding, the OS takes care of mapping between DMA memory and
1770 normal device memory.
1771 The exact process is a bit different between transmit and receive.
1773 When transmitting, a device driver has an mblk_t and needs to call the
1774 .Xr ddi_dma_addr_bind_handle 9F
1775 function to bind it to an already existing DMA handle.
1776 At that point, it will receive various DMA cookies that it can use to obtain the
1777 addresses to program the device with for transmitting data.
1778 Once the transmit is done, the driver must then make sure to call
1780 to release the data.
1783 before it receives an interrupt from the device indicating that the data
1784 has been transmitted, otherwise it risks sending arbitrary kernel
1787 When receiving data, the device can perform a similar operation.
1788 First, it must bind the DMA memory into the kernel's virtual memory address
1789 space through a call to the
1790 .Xr ddi_dma_addr_bind_handle 9F
1791 function if it has not already.
1792 Once it has, it must then call
1794 to try and create a new mblk_t which leverages the associated memory.
1795 It can then pass that mblk_t up to the stack.
1797 When deciding which of these options to use, there are many different
1798 considerations that must be made.
1799 The answer as to whether to bind memory or to copy data is not always simple.
1801 The first thing to remember is that DMA resources may be finite on a
1803 Consider the case of receiving data.
1804 A device driver that binds one of its receive descriptors may not get it back
1805 for quite some time as it may be used by the kernel until an application
1806 actually consumes it.
1807 Device drivers that try to bind memory for receive often work with the
1808 constraint that they must be able to replace that DMA memory with another DMA
1810 If they were not replaced, then eventually the device would not be able to
1811 receive additional data into the ring.
1813 On the other hand, particularly for larger frames, copying every packet
1814 from one buffer to another can be a source of additional latency and
1815 memory waste in the system.
1816 For larger copies, the cost of copying may dwarf any potential cost of
1817 performing DMA binding.
1819 Device driver authors who are unsure of what to do should
1820 first employ the copying method to simplify the act of writing the
1822 The copying method is simpler and also allows the device driver author not to
1823 worry about allocated DMA memory that is still outstanding when it is asked to
1826 If device driver writers are worried about the cost, it is recommended
1827 to make the decision as to whether or not to copy or bind DMA data
1828 a separate private property for both transmitting and receiving.
1829 That private property should indicate the frame size at which
1830 to switch from one method to the other.
1831 This way, data can be gathered to determine what the impact of each method is on
1845 .Xr mc_getcapab 9E ,
1848 .Xr mc_multicst 9E ,
1850 .Xr mc_propinfo 9E ,
1851 .Xr mc_setpromisc 9E ,
1860 .Xr ddi_dma_addr_bind_handle 9F ,
1861 .Xr ddi_dma_alloc 9F ,
1862 .Xr ddi_fm_acc_err_get 9F ,
1863 .Xr ddi_fm_dma_err_get 9F ,
1864 .Xr ddi_fm_ereport_post 9F ,
1865 .Xr ddi_fm_fini 9F ,
1866 .Xr ddi_fm_init 9F ,
1867 .Xr ddi_fm_service_impact 9F ,
1872 .Xr kstat_create 9F ,
1874 .Xr mac_fini_ops 9F ,
1875 .Xr mac_hcksum_get 9F ,
1876 .Xr mac_hcksum_set 9F ,
1877 .Xr mac_init_ops 9F ,
1878 .Xr mac_link_update 9F ,
1879 .Xr mac_lso_get 9F ,
1880 .Xr mac_maxsdu_update 9F ,
1881 .Xr mac_prop_info_set_default_link_flowctrl 9F ,
1882 .Xr mac_prop_info_set_default_str 9F ,
1883 .Xr mac_prop_info_set_default_uint32 9F ,
1884 .Xr mac_prop_info_set_default_uint64 9F ,
1885 .Xr mac_prop_info_set_default_uint8 9F ,
1886 .Xr mac_prop_info_set_perm 9F ,
1887 .Xr mac_prop_info_set_range_uint32 9F ,
1888 .Xr mac_register 9F ,
1890 .Xr mac_unregister 9F ,
1893 .Xr mod_install 9F ,
1898 .Xr ddi_device_acc_attr_t 9S ,
1900 .Xr kstat_create 9S ,
1901 .Xr mac_callbacks 9S ,
1902 .Xr mac_register 9S ,
1909 .%T RFC 1213 Management Information Base for Network Management of
1910 .%T TCP/IP-based internets: MIB-II
1916 .%T RFC 1573 Evolution of the Interfaces Group of MIB-II
1921 .%T RFC 1643 Definitions of Managed Objects for the Ethernet-like