.. SPDX-License-Identifier: GPL-2.0

While performing the hardware offloading process, much of the hardware
specifics cannot be presented. These details are useful for debugging, and
``devlink-dpipe`` provides a standardized way to provide visibility into the
offloading process.

For example, the routing longest prefix match (LPM) algorithm used by the
Linux kernel may differ from the hardware implementation. The pipeline debug
API (DPIPE) is aimed at providing the user visibility into the ASIC's
pipeline in a generic way.

The hardware offload process is expected to be done in such a way that the
user cannot distinguish between the hardware and software implementations.
In this process, hardware specifics are neglected. In reality, those details
can be highly significant and should be exposed in some way.

This problem is made even more complex when one wishes to offload the
control path of the whole networking stack to a switch ASIC. Due to
differences in the hardware and software models, some processes cannot be
represented correctly.

One example is the kernel's LPM algorithm, which in many cases differs
greatly from the hardware implementation. The configuration API is the same,
but one cannot rely on the Forwarding Information Base (FIB) to look like
the Level Path Compression trie (LPC-trie) in hardware.

In many situations, analyzing a system failure based solely on the kernel's
dump may not be enough. By combining this data with complementary
information about the underlying hardware, this debugging can be made
easier; additionally, the information can be useful when debugging

The ``devlink-dpipe`` interface closes this gap. The hardware's pipeline is
modeled as a graph of match/action tables. Each table represents a specific
hardware block. This model is not new; it was first used by the P4 language.

Traditionally it has been used as an alternative model for hardware
configuration, but the ``devlink-dpipe`` interface uses it for visibility
purposes as a standard complementary tool. The system's view from
``devlink-dpipe`` should change according to the changes made by the
standard configuration tools.

For example, it is quite common to implement Access Control Lists (ACL)
using Ternary Content Addressable Memory (TCAM). The TCAM can be divided
into TCAM regions. Complex TC filters can have multiple rules with different
priorities and different lookup keys. Hardware TCAM regions, on the other
hand, have a predefined lookup key. Offloading the TC filter rules using the
TCAM engine can result in multiple TCAM regions being interconnected in a
chain (which may affect the data path latency). In response to a new TC
filter, new tables should be created describing those regions.

The ``DPIPE`` model introduces several objects:

  * headers
  * tables
  * entries

A ``header`` describes packet formats and provides names for fields within
the packet. A ``table`` describes hardware blocks. An ``entry`` describes
the actual content of a specific table.

The hardware pipeline is not port specific, but rather describes the whole
ASIC. Thus, it is tied to the top of the ``devlink`` infrastructure.

Drivers can register and unregister tables at run time in order to support
dynamic behavior. This dynamic behavior is mandatory for describing hardware
blocks like TCAM regions, which can be allocated and freed dynamically.

``devlink-dpipe`` generally is not intended for configuration. The exception
is hardware counting for a specific table.

The following commands are used to obtain the ``dpipe`` objects from
userspace:

  * ``table_get``: Receive a table's description.
  * ``headers_get``: Receive a device's supported headers.
  * ``entries_get``: Receive a table's current entries.
  * ``counters_set``: Enable or disable counters on a table.

The driver should implement the following operations for each table:

  * ``matches_dump``: Dump the supported matches.
  * ``actions_dump``: Dump the supported actions.
  * ``entries_dump``: Dump the actual content of the table.
  * ``counters_set_update``: Synchronize hardware with counters enabled or
    disabled.
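
As a rough illustration of how these four operations fit together, the
following userspace sketch models one table as a set of callbacks. The names
``demo_table_ops`` and ``demo_table``, the callback signatures, and the
printed strings are hypothetical simplifications chosen for this example;
they are not the kernel's actual definitions.

.. code:: c

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical, simplified callback set; not the kernel's real struct. */
    struct demo_table_ops {
        int (*matches_dump)(void *priv);
        int (*actions_dump)(void *priv);
        int (*entries_dump)(void *priv, bool counters_enabled);
        int (*counters_set_update)(void *priv, bool enable);
    };

    /* Driver-private state for one hypothetical table. */
    struct demo_table {
        bool hw_counters_on;
        unsigned int n_entries;
    };

    int demo_matches_dump(void *priv)
    {
        (void)priv;
        puts("match: meta.rif_port exact");
        return 0;
    }

    int demo_actions_dump(void *priv)
    {
        (void)priv;
        puts("action: ethernet.daddr set");
        return 0;
    }

    int demo_entries_dump(void *priv, bool counters_enabled)
    {
        struct demo_table *t = priv;

        printf("entries: %u, counters: %s\n", t->n_entries,
               counters_enabled ? "on" : "off");
        return 0;
    }

    int demo_counters_set_update(void *priv, bool enable)
    {
        struct demo_table *t = priv;

        /* Stand-in for programming the hardware counters. */
        t->hw_counters_on = enable;
        return 0;
    }

    const struct demo_table_ops demo_ops = {
        .matches_dump = demo_matches_dump,
        .actions_dump = demo_actions_dump,
        .entries_dump = demo_entries_dump,
        .counters_set_update = demo_counters_set_update,
    };

In this sketch, a ``counters_set`` request would end up calling
``counters_set_update`` so that the private state (standing in for the
hardware) follows the requested setting before the next ``entries_dump``.
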

Similar to P4, headers and fields are used to describe a table's behavior.
There is a slight difference between the standard protocol headers and
specific ASIC metadata. The protocol headers should be declared in the
``devlink`` core API. ASIC metadata, on the other hand, is driver specific
and should be defined in the driver. Additionally, each driver-specific
devlink documentation file should document the driver-specific ``dpipe``
headers it implements. The headers and fields are identified by enumeration.

In order to provide further visibility, some ASIC metadata fields could be
mapped to kernel objects. For example, internal router interface indexes can
be directly mapped to the net device ifindex. FIB table indexes used by
different Virtual Routing and Forwarding (VRF) tables can be mapped to
internal routing table indexes.

Matches are kept primitive and close to hardware operation. Match types like
LPM are not supported, because LPM is exactly the kind of process that
should be described in full detail. Examples of matches:

  * ``field_exact``: Exact match on a specific field.
  * ``field_exact_mask``: Exact match on a specific field after masking.
  * ``field_range``: Match on a specific range.

The IDs of the header and the field should be specified in order to
identify the specific field. Furthermore, the header index should be
specified in order to distinguish multiple headers of the same type in a
packet.
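
The three match primitives listed above can be sketched as a single
comparison function. This is only an illustration: the ``demo_`` names and
the struct layout are hypothetical, fields are assumed to fit in 32 bits,
and the range is assumed to be inclusive at both ends.

.. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical names mirroring the three match types listed above. */
    enum demo_match_type {
        DEMO_MATCH_FIELD_EXACT,
        DEMO_MATCH_FIELD_EXACT_MASK,
        DEMO_MATCH_FIELD_RANGE,
    };

    struct demo_match {
        enum demo_match_type type;
        uint32_t value; /* expected value, or lower bound of a range */
        uint32_t mask;  /* used by FIELD_EXACT_MASK only */
        uint32_t high;  /* upper bound (inclusive), FIELD_RANGE only */
    };

    /* Return true if the field value satisfies the match. */
    bool demo_match_field(const struct demo_match *m, uint32_t field)
    {
        switch (m->type) {
        case DEMO_MATCH_FIELD_EXACT:
            return field == m->value;
        case DEMO_MATCH_FIELD_EXACT_MASK:
            return (field & m->mask) == (m->value & m->mask);
        case DEMO_MATCH_FIELD_RANGE:
            return field >= m->value && field <= m->high;
        }
        return false;
    }

A ``field_exact_mask`` match with value ``0x0a000000`` and mask
``0xff000000`` accepts any IPv4 address in 10.0.0.0/8, which is how a single
LPM stage can be expressed without a dedicated LPM match type.
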

Similar to matches, actions are kept primitive and close to hardware
operation. For example:

  * ``field_modify``: Modify the field value.
  * ``field_inc``: Increment the field value.
  * ``push_header``: Add a header.
  * ``pop_header``: Remove a header.
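
A minimal sketch of the two field-level actions follows. The ``demo_`` names
are hypothetical, fields are assumed to fit in 32 bits, and ``push_header``
and ``pop_header`` are left out for brevity.

.. code:: c

    #include <stdint.h>

    /* Hypothetical names for the two field-level actions listed above. */
    enum demo_action_type {
        DEMO_ACTION_FIELD_MODIFY,
        DEMO_ACTION_FIELD_INC,
    };

    struct demo_action {
        enum demo_action_type type;
        uint32_t value; /* new value for FIELD_MODIFY; unused for FIELD_INC */
    };

    /* Apply one action to a field in place. */
    void demo_apply_action(const struct demo_action *a, uint32_t *field)
    {
        switch (a->type) {
        case DEMO_ACTION_FIELD_MODIFY:
            *field = a->value;
            break;
        case DEMO_ACTION_FIELD_INC:
            (*field)++;
            break;
        }
    }
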

Entries of a specific table can be dumped on demand. Each entry is
identified with an index, and its properties are described by a list of
match/action values and a specific counter. By dumping the tables' content,
the interactions between tables can be resolved.

The following is an example of the abstraction model of the L3 part of the
Mellanox Spectrum ASIC. The blocks are described in the order they appear in
the pipeline. The table sizes in the following examples are not real
hardware sizes and are provided for demonstration purposes.

The LPM algorithm can be implemented as a list of hash tables. Each hash
table contains routes with the same prefix length. The root of the list is
/32, and in case of a miss the hardware will continue to the next hash
table. The depth of the search will affect the data path latency.

In case of a hit, the entry contains information about the next stage of the
pipeline, which resolves the MAC address. The next stage can be either the
local host table for directly connected routes, or the adjacency table for
next-hops. The ``meta.lpm_prefix`` field is used to connect two LPM tables.

.. code::

    table lpm_prefix_16 {
      counters_enabled: true,
      match: { meta.vr_id: exact,
               ipv4.dst_addr: exact_mask,
               ipv6.dst_addr: exact_mask,
               meta.lpm_prefix: exact },
      action: { meta.adj_index: set,
                meta.adj_group_size: set,
                meta.lpm_prefix: set },
    }
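
The search order described above can be sketched in userspace C. This is a
simplified illustration under several assumptions: IPv4 only, a linear scan
in place of a real hash lookup, and array order standing in for the
``meta.lpm_prefix`` chaining. All ``demo_`` names are hypothetical.

.. code:: c

    #include <stddef.h>
    #include <stdint.h>

    /* One route; a real table would hash on the masked address. */
    struct demo_route {
        uint32_t prefix;    /* network address, host byte order */
        uint32_t adj_index; /* result of the lookup */
    };

    /* One table per prefix length; all routes in it share that length. */
    struct demo_lpm_table {
        unsigned int prefix_len;
        const struct demo_route *routes;
        size_t n_routes;
    };

    uint32_t demo_prefix_mask(unsigned int len)
    {
        return len ? (uint32_t)(~UINT32_C(0) << (32 - len)) : 0;
    }

    /*
     * Walk the tables in the given order (longest prefix first). Returns 0
     * and fills *adj_index on a hit, -1 on a miss. *depth counts the number
     * of tables visited, which models the latency cost of the search.
     */
    int demo_lpm_lookup(const struct demo_lpm_table *tables, size_t n_tables,
                        uint32_t dst, uint32_t *adj_index,
                        unsigned int *depth)
    {
        *depth = 0;
        for (size_t i = 0; i < n_tables; i++) {
            uint32_t key = dst & demo_prefix_mask(tables[i].prefix_len);

            (*depth)++;
            for (size_t j = 0; j < tables[i].n_routes; j++) {
                if (tables[i].routes[j].prefix == key) {
                    *adj_index = tables[i].routes[j].adj_index;
                    return 0;
                }
            }
        }
        return -1;
    }

With a /32 table followed by a /16 table, a destination that only matches
the /16 route is found at depth 2, while a host route is found at depth 1.
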

In the case of local routes, the LPM lookup already resolves the egress
router interface (RIF), yet the exact MAC address is not known. The local
host table is a hash table that uses the output interface ID combined with
the destination IP address as a key. The result is the MAC address.

.. code::

      counters_enabled: true,
      match: { meta.rif_port: exact,
               ipv4.dst_addr: exact},
      action: { ethernet.daddr: set }
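
A hash table keyed on the (RIF, destination IP) pair can be sketched as
follows. The bucket count, the hash function, the linear probing scheme, and
all ``demo_`` names are assumptions made for this illustration and say
nothing about how real hardware organizes the table.

.. code:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define DEMO_HOST_BUCKETS 8 /* arbitrary demo size */

    struct demo_host_entry {
        bool used;
        uint32_t rif;
        uint32_t dst_ip;
        uint8_t mac[6];
    };

    /* Toy hash over the combined key; chosen only for illustration. */
    unsigned int demo_host_hash(uint32_t rif, uint32_t dst_ip)
    {
        return ((rif * 31u) ^ dst_ip) % DEMO_HOST_BUCKETS;
    }

    /* Insert or update an entry. Returns 0, or -1 if the table is full. */
    int demo_host_insert(struct demo_host_entry *tbl, uint32_t rif,
                         uint32_t dst_ip, const uint8_t mac[6])
    {
        unsigned int h = demo_host_hash(rif, dst_ip);

        for (unsigned int i = 0; i < DEMO_HOST_BUCKETS; i++) {
            struct demo_host_entry *e = &tbl[(h + i) % DEMO_HOST_BUCKETS];

            if (!e->used || (e->rif == rif && e->dst_ip == dst_ip)) {
                e->used = true;
                e->rif = rif;
                e->dst_ip = dst_ip;
                memcpy(e->mac, mac, 6);
                return 0;
            }
        }
        return -1;
    }

    /* Return the MAC for (rif, dst_ip), or NULL on a miss. */
    const uint8_t *demo_host_lookup(const struct demo_host_entry *tbl,
                                    uint32_t rif, uint32_t dst_ip)
    {
        unsigned int h = demo_host_hash(rif, dst_ip);

        for (unsigned int i = 0; i < DEMO_HOST_BUCKETS; i++) {
            const struct demo_host_entry *e =
                &tbl[(h + i) % DEMO_HOST_BUCKETS];

            if (!e->used)
                return NULL;
            if (e->rif == rif && e->dst_ip == dst_ip)
                return e->mac;
        }
        return NULL;
    }
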

For remote routes, the adjacency table performs equal-cost multipath (ECMP)
selection. The LPM lookup yields an ECMP group size and an index that serves
as a global offset into this table. Concurrently, a hash of the packet is
generated. Based on the ECMP group size and the packet's hash, a local
offset is generated. Multiple LPM entries can point to the same adjacency
group.

.. code::

      counters_enabled: true,
      match: { meta.adj_index: exact,
               meta.adj_group_size: exact,
               meta.packet_hash_index: exact },
      action: { ethernet.daddr: set,
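
The offset arithmetic can be written out as a short sketch. The text does
not specify how the local offset is derived from the hash and the group
size; reducing the hash modulo the group size is an assumption made here,
and ``demo_adj_entry_index`` is a hypothetical name.

.. code:: c

    #include <stdint.h>

    /*
     * Pick the adjacency entry for a packet. adj_index is the global offset
     * of the group and adj_group_size its number of entries, both taken
     * from the LPM result. Assumption: local offset = hash % group size.
     */
    uint32_t demo_adj_entry_index(uint32_t adj_index,
                                  uint32_t adj_group_size,
                                  uint32_t packet_hash)
    {
        uint32_t local = adj_group_size ? packet_hash % adj_group_size : 0;

        return adj_index + local;
    }

For a group of four entries starting at global offset 100, a packet hash of
10 selects entry 102, and every packet with the same hash keeps using the
same next hop.
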

Once the egress RIF and destination MAC have been resolved by previous
tables, this table performs multiple operations such as TTL decrement and
MTU check. The forward/drop decision is then taken, and the port L3
statistics are updated based on the packet's type (broadcast, unicast,
multicast).

.. code::

      counters_enabled: true,
      match: { meta.rif_port: exact,
               meta.is_l3_unicast: exact,
               meta.is_l3_broadcast: exact,
               meta.is_l3_multicast: exact },
      action: { meta.l3_drop: set,
                meta.l3_forward: set }
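
The forward/drop decision can be sketched as follows. The drop conditions
used here (TTL of 1 or less, packet length above the MTU), the per-type
counter layout, and all ``demo_`` names are assumptions made for this
illustration and are not taken from the hardware.

.. code:: c

    #include <stdbool.h>
    #include <stdint.h>

    enum demo_l3_type {
        DEMO_L3_UNICAST,
        DEMO_L3_MULTICAST,
        DEMO_L3_BROADCAST,
        DEMO_L3_TYPE_COUNT,
    };

    struct demo_packet {
        enum demo_l3_type type;
        uint8_t ttl;
        uint32_t len;
    };

    struct demo_erif {
        uint32_t mtu;
        uint64_t forwarded[DEMO_L3_TYPE_COUNT]; /* per-type L3 statistics */
        uint64_t dropped[DEMO_L3_TYPE_COUNT];
    };

    /* Return true if the packet is forwarded, false if it is dropped. */
    bool demo_erif_process(struct demo_erif *rif, struct demo_packet *pkt)
    {
        /* Assumed drop conditions: expiring TTL or oversized packet. */
        if (pkt->ttl <= 1 || pkt->len > rif->mtu) {
            rif->dropped[pkt->type]++;
            return false;
        }

        pkt->ttl--;
        rif->forwarded[pkt->type]++;
        return true;
    }
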