3 The `omp` dialect is for representing directives, clauses and other definitions
4 of the [OpenMP programming model](https://www.openmp.org). This directive-based
5 programming model, defined for the C, C++ and Fortran programming languages,
6 provides abstractions to simplify the development of parallel and accelerated
7 programs. All versions of the OpenMP specification can be found
8 [here](https://www.openmp.org/specifications/).
10 Operations in this MLIR dialect generally correspond to a single OpenMP
11 directive, taking arguments that represent their supported clauses, though this
is not always the case. For detailed information on the operations, types and
other definitions in this dialect, refer to the automatically generated
[ODS Documentation](ODS.md).

## Operation Naming Conventions

20 This section aims to standardize how dialect operation names are chosen, to
21 ensure a level of consistency. There are two categories of names: tablegen names
22 and assembly names. The former also corresponds to the C++ class that is
generated for the operation, whereas the latter is used to represent it in
textual MLIR.

26 Tablegen names are CamelCase, with the first letter capitalized and an "Op"
27 suffix, whereas assembly names are snake_case, with all lowercase letters and
28 words separated by underscores.
30 If the operation corresponds to a directive, clause or other kind of definition
31 in the OpenMP specification, it must use the same name split into words in the
32 same way. For example, the `target data` directive would become `TargetDataOp` /
`omp.target_data`, whereas `taskloop` would become `TaskloopOp` /
`omp.taskloop`.

36 Operations intended to carry extra information for another particular operation
37 or clause must be named after that other operation or clause, followed by the
38 name of the additional information. The assembly name must use a period to
39 separate both parts. For example, the operation used to define some extra
40 mapping information is named `MapInfoOp` / `omp.map.info`. The same rules are
41 followed if multiple operations are created for different variants of the same
42 directive, e.g. `atomic` becomes `Atomic{Read,Write,Update,Capture}Op` /
43 `omp.atomic.{read,write,update,capture}`.
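
The naming rules above can be sketched as a pair of small C++ helpers. This is only an illustration of the convention: `tablegenName` and `assemblyName` are hypothetical names that do not exist in MLIR, and the split between "main" words and "extra information" words is assumed to be supplied by the caller.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: builds the tablegen name from the words of a directive
// name, e.g. {"target", "data"} -> "TargetDataOp".
std::string tablegenName(const std::vector<std::string> &words) {
  std::string name;
  for (const std::string &word : words) {
    if (word.empty())
      continue;
    name += static_cast<char>(std::toupper(static_cast<unsigned char>(word[0])));
    name += word.substr(1);
  }
  return name + "Op";
}

// Hypothetical helper: builds the assembly name. `words` are joined with
// underscores; `extraWords` (additional information or a directive variant)
// are appended after a period.
std::string assemblyName(const std::vector<std::string> &words,
                         const std::vector<std::string> &extraWords = {}) {
  auto join = [](const std::vector<std::string> &parts) {
    std::string joined;
    for (const std::string &part : parts) {
      if (!joined.empty())
        joined += '_';
      joined += part;
    }
    return joined;
  };
  std::string name = "omp." + join(words);
  if (!extraWords.empty())
    name += "." + join(extraWords);
  return name;
}
```

For instance, `assemblyName({"map"}, {"info"})` yields `omp.map.info`, while `tablegenName({"map", "info"})` yields `MapInfoOp`.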

## Clause-Based Operation Definition

47 One main feature of the OpenMP specification is that, even though the set of
48 clauses that could be applied to a given directive is independent from other
49 directives, these clauses can generally apply to multiple directives. Since
50 clauses usually define which arguments the corresponding MLIR operation takes,
51 it is possible (and preferred) to define OpenMP dialect operations based on the
52 list of clauses taken by the corresponding directive. This makes it simpler to
keep their representation consistent across operations and minimizes redundancy.

56 To achieve this, the base `OpenMP_Clause` tablegen class has been created. It is
57 intended to be used to create clause definitions that can be then attached to
58 multiple `OpenMP_Op` definitions, resulting in the latter inheriting by default
59 all properties defined by clauses attached, similarly to the trait mechanism.
60 This mechanism is implemented in
61 [OpenMPOpBase.td](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPOpBase.td).
65 OpenMP clause definitions are located in
66 [OpenMPClauses.td](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPClauses.td).
67 For each clause, an `OpenMP_Clause` subclass and a definition based on it must
68 be created. The subclass must take a `bit` template argument for each of the
69 properties it can populate on associated `OpenMP_Op`s. These must be forwarded
70 to the base class. The definition must be an instantiation of the base class
71 where all these template arguments are set to `false`. The definition's name
72 must be `OpenMP_<Name>Clause`, whereas its base class' must be
73 `OpenMP_<Name>ClauseSkip`. Following this pattern makes it possible to
74 optionally skip the inheritance of some properties when defining operations:
75 [more info](#overriding-clause-inherited-properties).
77 Clauses can define the following properties:
- `list<Traits> traits`: To be used when having a certain clause always
  implies some op trait, like the `map` clause and the
  `MapClauseOwningInterface`.
- `dag(ins) arguments`: Mandatory property holding values and attributes
  used to represent the clause. Argument names use snake_case and should
  contain the clause name to avoid name clashes between clauses. Variadic
  arguments (non-attributes) must contain the "_vars" suffix.
- `string {req,opt}AssemblyFormat`: Optional formatting strings to produce
  custom human-friendly printers and parsers for arguments associated with the
  clause. They will be combined with the assembly formats of other clauses as
  explained [below](#adding-an-operation).
- `string description`: Optional description text to describe the clause and
  its representation.
- `string extraClassDeclaration`: Optional C++ declarations to be added to
  operation classes including the clause.

```tablegen
class OpenMP_ExampleClauseSkip<
    bit traits = false, bit arguments = false, bit assemblyFormat = false,
    bit description = false, bit extraClassDeclaration = false
  > : OpenMP_Clause<traits, arguments, assemblyFormat, description,
                    extraClassDeclaration> {
  let arguments = (ins
    Optional<AnyType>:$example_var
  );

  let optAssemblyFormat = [{
    `example` `(` $example_var `:` type($example_var) `)`
  }];

  let description = [{
    The `example_var` argument defines the variable to which the EXAMPLE clause
    applies.
  }];
}

def OpenMP_ExampleClause : OpenMP_ExampleClauseSkip<>;
```


### Adding an Operation

120 Operations in the OpenMP dialect, located in
121 [OpenMPOps.td](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td),
122 can be defined like any other regular operation by just specifying a `mnemonic`
123 and optional list of `traits` when inheriting from `OpenMP_Op`, and then
124 defining the expected `description`, `arguments`, etc. properties inside of its
125 body. However, in most cases, basing the operation definition on its list of
126 accepted clauses is significantly simpler because some of the properties can
127 just be inherited from these clauses.
129 In general, the way to achieve this is to specify, in addition to the `mnemonic`
130 and optional list of `traits`, a list of `clauses` where all the applicable
131 `OpenMP_<Name>Clause` definitions are added. Then, the only properties that
132 would have to be defined in the operation's body are the `summary` and
`description`. For the latter, only the operation itself has to be described;
the description of its clause-inherited arguments is appended through the
inherited `clausesDescription` property. By convention, the list of
136 clauses for an operation must be specified in alphabetical order.
138 If the operation is intended to have a single region, this is better achieved by
setting the `singleRegion=true` template argument of `OpenMP_Op` rather than manually
140 populating the `regions` property of the operation, because that way the default
141 `assemblyFormat` is also updated correspondingly.

```tablegen
def ExampleOp : OpenMP_Op<"example", traits = [
    AttrSizedOperandSegments, ...
  ], clauses = [
    OpenMP_AlignedClause, OpenMP_IfClause, OpenMP_LinearClause, ...
  ], singleRegion = true> {
  let summary = "example construct";
  let description = [{
    The example construct represents...
  }] # clausesDescription;
}
```

158 This is possible because the `arguments`, `assemblyFormat` and
159 `extraClassDeclaration` properties of the operation are by default
160 populated by concatenating the corresponding properties of the clauses on the
161 list. In the case of the `assemblyFormat`, this involves combining the
162 `reqAssemblyFormat` and the `optAssemblyFormat` properties. The
163 `reqAssemblyFormat` of all clauses is concatenated first and separated using
164 spaces, whereas the `optAssemblyFormat` is wrapped in an `oilist()` and
165 interleaved with "|" instead of spaces. The resulting `assemblyFormat` contains
166 the required assembly format strings, followed by the optional assembly format
167 strings, optionally the `$region` and the `attr-dict`.
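
The concatenation rule described above can be modeled with plain strings. The following C++ sketch is not the tablegen implementation: `ClauseFormat` and `combineAssemblyFormat` are hypothetical names, and the exact spacing of the generated string is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two format strings a clause may define.
struct ClauseFormat {
  std::string reqAssemblyFormat; // may be empty
  std::string optAssemblyFormat; // may be empty
};

// Joins the non-empty strings in `parts` with `separator`.
static std::string joinNonEmpty(const std::vector<std::string> &parts,
                                const std::string &separator) {
  std::string result;
  for (const std::string &part : parts) {
    if (part.empty())
      continue;
    if (!result.empty())
      result += separator;
    result += part;
  }
  return result;
}

// Required formats come first, separated by spaces. Optional formats are
// interleaved with "|" and wrapped in oilist(). The region (if any) and the
// attribute dictionary close the format string.
std::string combineAssemblyFormat(const std::vector<ClauseFormat> &clauses,
                                  bool singleRegion) {
  std::vector<std::string> required, optional;
  for (const ClauseFormat &clause : clauses) {
    required.push_back(clause.reqAssemblyFormat);
    optional.push_back(clause.optAssemblyFormat);
  }
  std::string optionalPart = joinNonEmpty(optional, " | ");
  if (!optionalPart.empty())
    optionalPart = "oilist(" + optionalPart + ")";

  std::vector<std::string> pieces = {joinNonEmpty(required, " "), optionalPart};
  if (singleRegion)
    pieces.push_back("$region");
  pieces.push_back("attr-dict");
  return joinNonEmpty(pieces, " ");
}
```

With one clause that only has a required format and another that only has an optional one, the result places the required string first, then the `oilist(...)` group, then `$region` and `attr-dict`.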

### Overriding Clause-Inherited Properties

171 Although the clause-based definition of operations can greatly reduce work, it's
172 also somewhat restrictive, since there may be some situations where only part of
the operation definition can be automated in that manner. For fine-grained
control over the properties inherited from each clause, two features are available:
- Inhibition of properties. By using `OpenMP_<Name>ClauseSkip` tablegen
  classes, the list of properties copied from the clause to the operation can
  be selected. For example, `OpenMP_IfClauseSkip<assemblyFormat = true>` would
  result in every property defined for the `OpenMP_IfClause` except for the
  `assemblyFormat` being used to initially populate the properties of the
  operation.
- Augmentation of properties. There are times when there is a need to add to
  a clause-populated operation property. Instead of overriding the property in
  the definition of the operation and having to manually replicate what would
  otherwise be automatically populated before adding to it, some internal
  properties are defined to hold this default value: `clausesArgs`,
  `clausesAssemblyFormat`, `clauses{Req,Opt}AssemblyFormat` and
  `clausesExtraClassDeclaration`.

190 In the following example, assuming both the `OpenMP_InReductionClause` and the
191 `OpenMP_ReductionClause` define a `getReductionVars` extra class declaration,
192 we skip the conflicting `extraClassDeclaration`s inherited by both clauses and
193 provide another implementation, without having to also re-define other
194 declarations inherited from the `OpenMP_AllocateClause`:

```tablegen
def ExampleOp : OpenMP_Op<"example", traits = [
    AttrSizedOperandSegments, ...
  ], clauses = [
    OpenMP_AllocateClause,
    OpenMP_InReductionClauseSkip<extraClassDeclaration = true>,
    OpenMP_ReductionClauseSkip<extraClassDeclaration = true>
  ], singleRegion = true> {
  let summary = "example construct";
  let description = [{
    This operation represents...
  }] # clausesDescription;

  // Override the clause-populated extraClassDeclaration and add the default
  // back via appending clausesExtraClassDeclaration to it. This has the effect
  // of adding one declaration. Since this property is skipped for the
  // InReduction and Reduction clauses, clausesExtraClassDeclaration won't
  // incorporate the definition of this property for these clauses.
  let extraClassDeclaration = [{
    SmallVector<Value> getReductionVars() {
      // Concatenate inReductionVars and reductionVars and return the result...
    }
  }] # clausesExtraClassDeclaration;
}
```

222 These features are intended for complex edge cases, but an effort should be made
223 to avoid having to use them, since they may introduce inconsistencies and
224 complexity to the dialect.

### Tablegen Verification Pass

228 As a result of the implicit way in which fundamental properties of MLIR
229 operations are populated following this approach, and the ability to override
them, forgetting to append clause-inherited values might result in hard-to-debug
tablegen errors.

233 For this reason, the `-verify-openmp-ops` tablegen pseudo-backend was created.
234 It runs before any other tablegen backends are triggered for the
235 [OpenMPOps.td](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td)
236 file and warns any time a property defined for a clause is not found in the
237 corresponding operation, except if it is explicitly skipped as described
238 [above](#overriding-clause-inherited-properties). This way, in case of a later
239 tablegen failure while processing OpenMP dialect operations, earlier messages
240 triggered by that pass can point to a likely solution.

### Operand Structures

244 One consequence of basing the representation of operations on the set of values
245 and attributes defined for each clause applicable to the corresponding OpenMP
directive is that operation argument lists tend to be long. This makes C++
operation builders difficult to work with and makes it easy to pass arguments
in the wrong order by mistake, which may sometimes introduce hard-to-detect
problems.

Operand structures are provided as a solution to this issue. The main idea behind
252 them is that there is one defined for each clause, holding a set of fields that
253 contain the data needed to initialize each of the arguments associated with that
254 clause. Clause operand structures are aggregated into operation operand
255 structures via class inheritance. Then, a custom builder is defined for each
256 operation taking the corresponding operand structure as a parameter. Since each
257 argument is a named member of the structure, it becomes much simpler to set up
258 the desired arguments to create a new operation.
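
The idea can be sketched in self-contained C++ as follows. Everything here is a stand-in: `Value` replaces `mlir::Value`, the clause structures and their field names are assumed for illustration, and `buildParallel` returns a descriptive string where a real builder would create an operation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for mlir::Value, used only to keep this sketch self-contained.
// An empty name models an absent optional operand.
struct Value {
  std::string name;
};

// Hypothetical per-clause operand structures: one field per clause argument.
struct IfClauseOps {
  Value ifExpr;
};
struct NumThreadsClauseOps {
  Value numThreads;
};
struct PrivateClauseOps {
  std::vector<Value> privateVars;
};

// Operation operand structure: empty, aggregating the clause structures
// through inheritance.
struct ParallelOperands : IfClauseOps, NumThreadsClauseOps, PrivateClauseOps {};

// Hypothetical builder taking the operand structure as its only parameter.
// It summarizes which operands were set instead of creating an operation.
std::string buildParallel(const ParallelOperands &operands) {
  std::string text = "omp.parallel";
  if (!operands.ifExpr.name.empty())
    text += " if(" + operands.ifExpr.name + ")";
  if (!operands.numThreads.name.empty())
    text += " num_threads(" + operands.numThreads.name + ")";
  if (!operands.privateVars.empty()) {
    text += " private(";
    for (std::size_t i = 0; i < operands.privateVars.size(); ++i) {
      if (i != 0)
        text += ", ";
      text += operands.privateVars[i].name;
    }
    text += ")";
  }
  return text;
}
```

A caller sets only the members it needs by name (for example `operands.numThreads = Value{"%n"};`), so argument order no longer matters.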
260 Ad-hoc operand structures available for use within the ODS definition of custom
261 operation builders might be defined in
262 [OpenMPClauseOperands.h](https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/OpenMP/OpenMPClauseOperands.h).
263 However, this is generally not needed for clause-based operation definitions.
The `-gen-openmp-clause-ops` tablegen backend, triggered when building the `omp`
265 dialect, will automatically produce structures in the following way:
- It will create a `<Name>ClauseOps` structure for each `OpenMP_Clause`
  definition with one field per argument.
  - The name of each field will match the tablegen name of the corresponding
    argument, except for replacing snake case with camel case.
  - The type of the field will be obtained from the corresponding tablegen
    argument's type:
    - Values are represented with `mlir::Value`, except for `Variadic`, which
      makes it an `llvm::SmallVector<mlir::Value>`.
    - `OptionalAttr` is represented by the translation of its `baseAttr`.
    - `TypedArrayAttrBase`-based attribute types are represented by wrapping
      the translation of their `elementAttr` in an `llvm::SmallVector`. The
      only exception for this case is if the `elementAttr` is a "scalar" (i.e.
      non array-like) attribute type, in which case the more generic
      `mlir::Attribute` will be used in place of its `storageType`.
    - For `ElementsAttrBase`-based attribute types a best effort is attempted
      to obtain an element type (`llvm::APInt`, `llvm::APFloat` or
      `DenseArrayAttrBase`'s `returnType`) to be wrapped in an
      `llvm::SmallVector`. If it cannot be obtained, which will happen with
      non-builtin direct subclasses of `ElementsAttrBase`, a warning will be
      emitted and the `storageType` (i.e. specific `mlir::Attribute` subclass)
      will be used instead.
    - Other attribute types will be represented with their `storageType`.
- It will create a `<Name>Operands` structure for each operation, which is an
  empty structure subclassing all operand structures defined for the
  corresponding `OpenMP_Op`'s clauses.
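
As an illustration of the field naming rule in the list above, the snake case to camel case conversion could look like the following C++ sketch. `toFieldName` is a hypothetical helper written for this example; it mirrors the described rule and is not taken from the backend.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts a snake_case tablegen argument name into the
// camelCase field name, e.g. "private_vars" -> "privateVars".
std::string toFieldName(const std::string &argumentName) {
  std::string fieldName;
  bool capitalizeNext = false;
  for (char c : argumentName) {
    if (c == '_') {
      // Drop the underscore and capitalize the letter that follows it.
      capitalizeNext = true;
      continue;
    }
    if (capitalizeNext && !fieldName.empty())
      fieldName += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    else
      fieldName += c;
    capitalizeNext = false;
  }
  return fieldName;
}
```

Under this rule, the `example_var` argument used earlier in this document would map to a field named `exampleVar`.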

### Entry Block Argument-Defining Clauses

294 In their MLIR representation, certain OpenMP clauses introduce a mapping between
295 values defined outside the operation they are applied to and entry block
296 arguments for the region of that MLIR operation. This enables, for example, the
297 introduction of private copies of the same underlying variable defined outside
298 the MLIR operation the clause is attached to. Currently, clauses with this
299 property can be classified into three main categories:
300 - Map-like clauses: `map`, `use_device_addr` and `use_device_ptr`.
301 - Reduction-like clauses: `in_reduction`, `reduction` and `task_reduction`.
302 - Privatization clauses: `private`.
304 All three kinds of entry block argument-defining clauses use a similar custom
305 assembly format representation, only differing based on the different pieces of
306 information attached to each kind. Below, one example of each is shown:

```mlir
omp.target map_entries(%x -> %x.m, %y -> %y.m : !llvm.ptr, !llvm.ptr) {
  // Use %x.m, %y.m in place of %x and %y...
}

omp.wsloop reduction(@add.i32 %x -> %x.r, byref @add.f32 %y -> %y.r : !llvm.ptr, !llvm.ptr) {
  // Use %x.r, %y.r in place of %x and %y...
}

omp.parallel private(@x.privatizer %x -> %x.p, @y.privatizer %y -> %y.p : !llvm.ptr, !llvm.ptr) {
  // Use %x.p, %y.p in place of %x and %y...
}
```

322 As a consequence of parsing and printing the operation's first region entry
323 block argument names together with the custom assembly format of these clauses,
324 entry block arguments (i.e. the `^bb0(...):` line) must not be explicitly
325 defined for these operations. Additionally, it is not possible to implement this
326 feature while allowing each clause to be independently parsed and printed,
327 because they need to be printed/parsed together with the corresponding
operation's first region. There must also be a well-defined order in which
multiple such clauses are specified for a given operation.

331 The parsing/printing of these clauses together with the region provides the
332 ability to define entry block arguments directly after the `->`. Forcing a
333 specific ordering between these clauses makes the block argument ordering
334 well-defined, which is the property used to easily match each clause with the
335 entry block arguments defined by it.
337 Custom printers and parsers for operation regions based on the entry block
338 argument-defining clauses they take are implemented based on the
339 `{parse,print}BlockArgRegion` functions, which take care of the sorting and
340 formatting of each kind of clause, minimizing code duplication resulting from
341 this approach. One example of the custom assembly format of an operation taking
342 the `private` and `reduction` clauses is the following:

```tablegen
let assemblyFormat = clausesAssemblyFormat # [{
  custom<PrivateReductionRegion>($region, $private_vars, type($private_vars),
      $private_syms, $reduction_vars, type($reduction_vars), $reduction_byref,
      $reduction_syms) attr-dict
}];
```

352 The `BlockArgOpenMPOpInterface` has been introduced to simplify the addition and
353 handling of these kinds of clauses. It holds `num<ClauseName>BlockArgs()`
functions that by default return 0, to be overridden by each clause through the
355 `extraClassDeclaration` property. Based on these functions and the expected
356 alphabetical sorting between entry block argument-defining clauses, it
357 implements `get<ClauseName>BlockArgs()` functions that are the intended method
358 of accessing clause-defined block arguments.
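
The relationship between the per-clause counts and the block argument ranges can be modeled as below. This C++ sketch is an assumption-laden simplification: `blockArgRange` is a hypothetical function, clauses are identified by plain strings, and the real interface exposes one accessor per clause rather than a lookup by name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Half-open range [begin, end) of entry block argument indices.
using ArgRange = std::pair<std::size_t, std::size_t>;

// `numBlockArgs` maps a clause name to the number of entry block arguments it
// defines. Clauses missing from the map define none, mirroring the default
// return value of 0. Arguments are laid out following the alphabetical order
// of clause names, which is also the iteration order of std::map.
ArgRange blockArgRange(const std::map<std::string, std::size_t> &numBlockArgs,
                       const std::string &clause) {
  std::size_t begin = 0;
  for (const auto &entry : numBlockArgs) {
    if (entry.first >= clause)
      break;
    begin += entry.second; // arguments of clauses sorted before `clause`
  }
  auto it = numBlockArgs.find(clause);
  std::size_t count = it == numBlockArgs.end() ? 0 : it->second;
  return {begin, begin + count};
}
```

For an operation with one `map`, three `private` and two `reduction` entry block arguments, the `private` arguments would occupy indices 1 to 3 under this model.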

## Loop-Associated Directives

362 Loop-associated OpenMP constructs are represented in the dialect as loop wrapper
363 operations. These implement the `LoopWrapperInterface`, which enforces a series
364 of restrictions upon the operation:
365 - It has the `NoTerminator` and `SingleBlock` traits;
366 - It contains a single region; and
367 - Its only block contains exactly one operation, which must be another loop
368 wrapper or `omp.loop_nest` operation.
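
The nesting restriction in the last item can be illustrated with a toy model in C++. The `Op` structure and `isValidLoopWrapper` function are hypothetical and only capture the "exactly one nested operation" rule; traits and region counts are not modeled.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for an operation with a single region holding one block.
struct Op {
  std::string name;
  bool isLoopWrapper = false;
  std::vector<std::unique_ptr<Op>> body; // operations in the only block
};

// Convenience constructor for the toy operations used in this sketch.
std::unique_ptr<Op> makeOp(const std::string &name, bool isLoopWrapper) {
  std::unique_ptr<Op> op(new Op());
  op->name = name;
  op->isLoopWrapper = isLoopWrapper;
  return op;
}

// A loop wrapper is valid if its block holds exactly one operation, which is
// either omp.loop_nest or another valid loop wrapper.
bool isValidLoopWrapper(const Op &op) {
  if (!op.isLoopWrapper || op.body.size() != 1)
    return false;
  const Op &nested = *op.body.front();
  if (nested.name == "omp.loop_nest")
    return true;
  return isValidLoopWrapper(nested);
}
```

In this model, an `omp.wsloop` wrapping an `omp.simd` wrapping an `omp.loop_nest` is accepted, whereas an empty wrapper or one containing an unrelated operation is rejected.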
370 This approach splits the representation for a loop nest and the loop-associated
371 constructs that specify how its iterations are executed, possibly across various
372 SIMD lanes (`omp.simd`), threads (`omp.wsloop`), teams of threads
373 (`omp.distribute`) or tasks (`omp.taskloop`). The ability to directly nest
374 multiple loop wrappers to impact the execution of a single loop nest is used to
375 represent composite constructs in a modular way.
377 The `omp.loop_nest` operation represents a collapsed rectangular loop nest that
378 must always be wrapped by at least one loop wrapper, which defines how it is
379 intended to be executed. It serves as a simpler and more restrictive
380 representation of OpenMP loops while a more general approach to support
381 non-rectangular loop nests, loop transformations and non-perfectly nested loops
382 based on a new `omp.canonical_loop` definition is developed.
The following example shows how a `parallel {do,for}` construct would be
represented in the dialect:

```mlir
omp.parallel ... {
  omp.wsloop ... {
    omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
      %x = memref.load %a[%i] : memref<?xf32>
      %y = memref.load %b[%i] : memref<?xf32>
      %sum = arith.addf %x, %y : f32
      memref.store %sum, %c[%i] : memref<?xf32>
      omp.yield
    }
  }
  omp.terminator
}
```


### Loop Transformations

405 In addition to the worksharing loop-associated constructs described above, the
406 OpenMP specification also defines a set of loop transformation constructs. They
407 replace the associated loop(s) before worksharing constructs are executed on the
408 generated loop(s). Some examples of such constructs are `tile` and `unroll`.
410 A general approach for representing these types of OpenMP constructs has not yet
411 been implemented, but it is closely linked to the `omp.canonical_loop` work.
Nevertheless, the loop transformation defined by the `collapse` clause for
loop-associated worksharing constructs can be represented by introducing
multiple bounds, step and induction variables to the `omp.loop_nest` operation.

## Compound Construct Representation

418 The OpenMP specification defines certain shortcuts that allow specifying
419 multiple constructs in a single directive, which are referred to as compound
420 constructs (e.g. `parallel do` contains the `parallel` and `do` constructs).
421 These can be further classified into [combined](#combined-constructs) and
422 [composite](#composite-constructs) constructs. This section describes how they
423 are represented in the dialect.
425 When clauses are specified for compound constructs, the OpenMP specification
426 defines a set of rules to decide to which leaf constructs they apply, as well as
427 potentially introducing some other implicit clauses. These rules must be taken
428 into account by those creating the MLIR representation, since it is a per-leaf
429 representation that expects these rules to have already been followed.

### Combined Constructs

433 Combined constructs are semantically equivalent to specifying one construct
434 immediately nested inside another. This property is used to simplify the dialect
435 by representing them through the operations associated to each leaf construct.
For example, `target teams` is represented as an `omp.teams` operation nested
immediately inside an `omp.target` operation.

### Composite Constructs

452 Composite constructs are similar to combined constructs in that they specify the
453 effect of one construct being applied immediately after another. However, they
454 group together constructs that cannot be directly nested into each other.
455 Specifically, they group together multiple loop-associated constructs that apply
456 to the same collapsed loop nest.
As of version 5.2 of the OpenMP specification, the list of composite constructs
is the following:
- `{do,for} simd`;
- `distribute simd`;
- `distribute parallel {do,for}`;
- `distribute parallel {do,for} simd`; and
- `taskloop simd`.

466 Even though the list of composite constructs is relatively short and it would
467 also be possible to create dialect operations for each, it was decided to
468 allow attaching multiple loop wrappers to a single loop instead. This minimizes
469 redundancy in the dialect and maximizes its modularity, since there is a single
470 operation for each leaf construct regardless of whether it can be part of a
471 composite construct. On the other hand, this means the `omp.loop_nest` operation
472 will have to be interpreted differently depending on how many and which loop
473 wrappers are attached to it.
475 To simplify the detection of operations taking part in the representation of a
476 composite construct, the `ComposableOpInterface` was introduced. Its purpose is
477 to handle the `omp.composite` discardable dialect attribute that can optionally
478 be attached to these operations. Operation verifiers will ensure its presence is
479 consistent with the context the operation appears in, so that it is valid when
the attribute is present if and only if it represents a leaf of a composite
construct.

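For example, the `distribute simd` composite construct is represented as
follows:

```mlir
omp.distribute ... {
  omp.simd ... {
    omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
      ...
      omp.yield
    }
  } {omp.composite}
} {omp.composite}
```
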
497 One exception to this is the representation of the
498 `distribute parallel {do,for}` composite construct. The presence of a
499 block-associated `parallel` leaf construct would introduce many problems if it
500 was allowed to work as a loop wrapper. In this case, the "hoisted `omp.parallel`
501 representation" is used instead. This consists in making `omp.parallel` the
502 parent operation, with a nested `omp.loop_nest` wrapped by `omp.distribute` and
503 `omp.wsloop` (and `omp.simd`, in the `distribute parallel {do,for} simd` case).
505 This approach works because `parallel` is a parallelism-generating construct,
506 whereas `distribute` is a worksharing construct impacting the higher level
507 `teams` construct, making the ordering between these constructs not cause
508 semantic mismatches. This property is also exploited by LLVM's SPMD-mode.
515 omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {