The `mesh` dialect contains a set of attributes, operations and interfaces that
are useful for representing sharding and communication on a device mesh.

## Collective Communication Operations
There are a number of operations in the Mesh dialect to facilitate
communication between devices in a mesh.
It is assumed that the user is familiar with collective operations.
[Wikipedia](https://en.wikipedia.org/wiki/Collective_operation) has a good
explanation.
The main addition is that the collectives in this dialect have mesh
semantics.

The operation attributes `mesh` and `mesh_axes` specify a list of device mesh
axes that partition the devices into disjoint groups.
The collective operation is performed between devices in the same group.
Devices that have the same coordinates outside of the axes in `mesh_axes` are
in the same group.
A group is described by its multi-index along the axes outside of `mesh_axes`.
For example, if we have a device mesh of size `2x3x4x5` and the partition mesh
axes list is `[0, 1]`, then devices are partitioned into the groups
`{ { (i, j, k, m) | 0<=i<2, 0<=j<3 } | 0<=k<4, 0<=m<5 }`.
The device groups are indexed by `{ (k, m) | 0<=k<4, 0<=m<5 }`.
Devices `(1, 0, 2, 3)` and `(1, 1, 2, 3)` are in the same group.
Device `(1, 0, 2, 4)` is in another group.
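
The grouping rule above can be checked with a small Python sketch. It is not part of the dialect; the helper name `device_groups` is hypothetical and only enumerates multi-indices to show which devices end up together.

```python
from itertools import product


def device_groups(mesh_shape, mesh_axes):
    """Map each group index to the device multi-indices in that group.

    Hypothetical helper for illustration only, not an MLIR API.
    """
    # The axes that are NOT in mesh_axes identify the group.
    group_axes = [a for a in range(len(mesh_shape)) if a not in mesh_axes]
    groups = {}
    for device in product(*(range(n) for n in mesh_shape)):
        group_index = tuple(device[a] for a in group_axes)
        groups.setdefault(group_index, []).append(device)
    return groups


groups = device_groups((2, 3, 4, 5), [0, 1])
print(len(groups))          # number of groups, one per (k, m)
print(len(groups[(2, 3)]))  # devices per group, one per (i, j)
```

With the `2x3x4x5` mesh and axes `[0, 1]`, this yields 20 groups of 6 devices each, and devices `(1, 0, 2, 3)` and `(1, 1, 2, 3)` both land in group `(2, 3)`.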
Some collective operations like all-to-all and all-gather care about the
order of devices.
The order of devices in a device group is induced by the order of axes in
`mesh_axes`.
The axes are ordered from outer to inner.
If we have an axis list `[3, 1]` then device `(i, 1, k, 0)` will precede
both devices `(i, 0, k, 1)` and `(i, 2, k, 0)`.
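
As a rough illustration of this ordering, the following Python sketch lists the members of one group in group order. The function name `ordered_group` is an assumption made for this example; it relies on `itertools.product` varying its last factor fastest, so the first axis in the list acts as the outermost one.

```python
from itertools import product


def ordered_group(mesh_shape, mesh_axes, device):
    """List the devices in the group of `device`, in group order.

    Hypothetical helper for illustration only, not an MLIR API.
    """
    members = []
    # The first axis in mesh_axes is the outermost (slowest varying) one.
    for in_group in product(*(range(mesh_shape[a]) for a in mesh_axes)):
        coords = list(device)
        for axis, value in zip(mesh_axes, in_group):
            coords[axis] = value
        members.append(tuple(coords))
    return members


order = ordered_group((2, 3, 4, 5), [3, 1], (0, 0, 0, 0))
print(order[:4])
# [(0, 0, 0, 0), (0, 1, 0, 0), (0, 2, 0, 0), (0, 0, 0, 1)]
```

Here `(0, 1, 0, 0)` comes before both `(0, 2, 0, 0)` and `(0, 0, 0, 1)`, matching the example with `i = k = 0`.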

Some operations like `broadcast`, `scatter` and `send` specify devices in each
device group.
These devices are represented with their multi-index over the mesh axes that
are not constant within a device group.
These are the axes specified by the `mesh_axes` attribute.

For example, on a 3D mesh an operation with `mesh_axes = [0, 2]` would specify
an in-group device with `(i, j)`. Then for each group with index `g` on the
second axis, the in-group device would be `(i, g, j)`.
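
The mapping from an in-group index to a full mesh multi-index can be sketched in Python as follows. The helper `full_device_index` and its parameters are hypothetical names chosen for this example, and it assumes the group index lists the remaining axes in increasing order.

```python
def full_device_index(rank, mesh_axes, in_group_index, group_index):
    """Combine an in-group index and a group index into a mesh multi-index.

    Hypothetical helper for illustration only, not an MLIR API.
    """
    coords = [None] * rank
    # Axes listed in mesh_axes take their values from the in-group index.
    for axis, value in zip(mesh_axes, in_group_index):
        coords[axis] = value
    # Assumption: the remaining axes, in increasing order, take the group index.
    group_axes = [a for a in range(rank) if a not in mesh_axes]
    for axis, value in zip(group_axes, group_index):
        coords[axis] = value
    return tuple(coords)


# In-group device (i, j) = (1, 4) in the group with index g = 7.
print(full_device_index(3, [0, 2], (1, 4), (7,)))  # (1, 7, 4)
```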

Collectives that involve the whole device group to perform a single operation
are pure. The exceptions are `send` and `recv`.

There is an assumption that the execution is SPMD.
Not only does each process run the same program, but at the point of
execution of a collective operation, all processes are in a coherent state.
All compiler transformations must be consistent.
Collective operations in the IR that may correspond to the same runtime
collective operation must be transformed in a consistent manner.
For example, if a collective operation is optimized out, then it must also
not appear in any path of execution on any process.

Having the operations as `Pure` implies that if an interpreter is to execute
the IR containing the `mesh` collectives, all processes would execute the same
line when they reach a pure collective operation.
This requirement stems from the need to be compatible with general optimization
passes like dead code elimination and common sub-expression elimination.

[include "Dialects/MeshOps.md"]

[include "Dialects/MeshAttrs.md"]