This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1. **conversion** of the IR to a set of dialects translatable to LLVM IR, for
   example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
   dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
   [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2. **translation** of MLIR dialects to LLVM IR.

This flow allows non-trivial transformations to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR and
reduces churn in case of changes.

Note that many different dialects can be lowered to LLVM, but the lowerings are
provided as different sets of patterns with different passes available to
`mlir-opt`. However, the separate passes are primarily useful for testing and
prototyping, and using the collection of patterns together is highly
recommended. One place this is important and visible is the ControlFlow
dialect's branching operations, which will fail to apply if their types
mismatch with the blocks they jump to in the

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).
## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-finalize-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that became
noop can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
is not specific to the LLVM dialect and can remove any noop casts.
### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

The complex type is converted into an LLVM dialect literal structure type with two

The elemental type is converted recursively using these rules.

```mlir
!llvm.struct<(f32, f32)>
```

#### Index Type

The index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to `i64`. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as the *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format. Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`

The default memref descriptor is a struct with the following fields:

1. The pointer to the data buffer as allocated, referred to as the "allocated
   pointer". This is only useful for deallocating the memref.
2. The pointer to the properly aligned data that the memref indexes,
   referred to as the "aligned pointer".
3. A converted `index`-type integer containing the distance, in number
   of elements, between the beginning of the (aligned) buffer and the first
   element to be accessed through the memref, referred to as the "offset".
4. An array containing as many converted `index`-type integers as the rank of
   the memref: the array represents the size, in number of elements, of the
   memref along the given dimension.
5. A second array containing as many converted `index`-type integers as the
   rank of the memref: the second array represents the "stride" (in the tensor
   abstraction sense), i.e. the number of consecutive elements of the
   underlying buffer one needs to jump over to get to the next logically

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value. This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers and the offset.

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```
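To make the field layout concrete, the descriptor can be mirrored on the C++ side. The following is only a sketch and not part of MLIR: `RankedDescriptor` and `makeContiguous` are hypothetical names, `index` is assumed to convert to `i64`, and the buffer is assumed to be contiguous and row-major with a zero offset.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the default ranked descriptor (rank N >= 1).
// Rank-0 memrefs omit both arrays and would need a separate definition.
template <typename T, std::size_t N>
struct RankedDescriptor {
  T *allocated;             // Pointer as returned by the allocator.
  T *aligned;               // Pointer used for indexing.
  std::int64_t offset;      // Distance, in elements, to the first element.
  std::int64_t sizes[N];    // Size of each dimension, in elements.
  std::int64_t strides[N];  // Stride of each dimension, in elements.
};

// Fills a descriptor for a contiguous row-major buffer: the innermost
// dimension has stride 1 and every other stride is the product of the
// sizes of the dimensions that follow it.
template <typename T, std::size_t N>
RankedDescriptor<T, N> makeContiguous(T *data,
                                      const std::array<std::int64_t, N> &shape) {
  RankedDescriptor<T, N> d{};
  d.allocated = data;
  d.aligned = data;
  d.offset = 0;
  std::int64_t stride = 1;
  for (std::size_t i = N; i-- > 0;) {
    d.sizes[i] = shape[i];
    d.strides[i] = stride;
    stride *= shape[i];
  }
  return d;
}
```

For a `2x4` buffer this yields sizes `[2, 4]` and strides `[4, 1]`, which is the "natural" stride assignment for a memref without an explicit layout.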
#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as the *unranked descriptor*. It contains:

1. a converted `index`-typed integer representing the dynamic rank of the
2. a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
   the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or on
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
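As an illustration of how a rank-polymorphic library function might consume such a descriptor, here is a minimal C++ sketch. All names (`UnrankedDescriptor`, `DescriptorHeader`, `Ranked2D`, `numElements`) are hypothetical, and the code assumes that `index` converts to `i64` and that the ranked descriptor has no padding beyond natural alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the unranked descriptor.
struct UnrankedDescriptor {
  std::int64_t rank;  // Dynamic rank.
  void *descriptor;   // Type-erased pointer to a ranked descriptor.
};

// Leading fields shared by ranked descriptors of every rank.
struct DescriptorHeader {
  void *allocated;
  void *aligned;
  std::int64_t offset;
};

// Example ranked descriptor of rank 2, used only to exercise the sketch.
struct Ranked2D {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::int64_t sizes[2];
  std::int64_t strides[2];
};

// Multiplies all sizes together without knowing the rank at compile time.
// The sizes array starts right after the header fields.
inline std::int64_t numElements(const UnrankedDescriptor &u) {
  const char *sizes =
      static_cast<const char *>(u.descriptor) + sizeof(DescriptorHeader);
  std::int64_t count = 1;
  for (std::int64_t i = 0; i < u.rank; ++i) {
    std::int64_t size;
    std::memcpy(&size, sizes + i * sizeof(std::int64_t), sizeof(size));
    count *= size;
  }
  return count;
}
```

Reading the sizes through `std::memcpy` keeps the sketch independent of the element type, which is erased in the unranked form.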
#### Function Types

Function types are converted to LLVM dialect function types as follows:

- function argument and result types are converted recursively using these
- if a function type has multiple results, they are wrapped into an LLVM
  dialect literal structure type since LLVM function types must have exactly
- if a function type has no results, the corresponding LLVM dialect function
  type will have one `!llvm.void` result since LLVM function types must have a
- function types used in arguments of another function type are wrapped in an
  LLVM dialect pointer type to comply with LLVM IR expectations;
- the structs corresponding to `memref` types, both ranked and unranked,
  appearing as function arguments are unbundled into individual function
  arguments to allow for specifying metadata such as aliasing information on
- the conversion of `memref`-typed arguments is subject to
  [calling conventions](TargetLLVMIR.md#calling-conventions);
- if a function type has the boolean attribute `func.varargs` set, the
  converted LLVM function will be variadic.

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its result aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function:
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>

// If "func.varargs" attribute is set:
(i32) -> () attributes { "func.varargs" = true }
// the corresponding LLVM function will be variadic:
!llvm.func<void (i32, ...)>
```

Conversion patterns are available to convert built-in function operations and
standard call operations targeting those functions using these conversion rules.
#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```
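The nesting rule can be sketched as a small C++ helper that renders the converted type as text. This only illustrates the rule stated above and is not how the type converter is implemented; `convertVectorType` is a hypothetical name and the element type is passed in already converted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Renders the converted type of an n-D vector: the innermost dimension stays
// a one-dimensional vector and every outer dimension becomes an array level.
inline std::string convertVectorType(const std::vector<std::int64_t> &shape,
                                     const std::string &element) {
  assert(!shape.empty() && "vector types have at least one dimension");
  std::string result =
      "vector<" + std::to_string(shape.back()) + " x " + element + ">";
  // Wrap from the second-innermost dimension outwards.
  for (std::size_t i = shape.size() - 1; i-- > 0;)
    result = "array<" + std::to_string(shape[i]) + " x " + result + ">";
  // One-dimensional vectors are kept as built-in vector types.
  return shape.size() > 1 ? "!llvm." + result : result;
}
```

With shape `{4, 8}` and element `f32`, the helper produces the same `!llvm.array<4 x vector<8 x f32>>` spelling as the example above.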
#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.
### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.

#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.

```mlir
func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
// Enclosing caller; the name `@bar` is assumed here.
func.func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
}
```
#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
individual scalar arguments.

This convention is implemented in the conversion of `func.func` and `func.call` to
the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations. Conversions from other
dialects should take this convention into account.

This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

```mlir
func.func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
// Enclosing caller; the name `@bar` is assumed here.
func.func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```
#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take the address of an SSA value containing
the ranked memref, which must be stored in some memory instead. The caller is in
charge of ensuring the thread safety and management of the allocated memory, in
particular the deallocation.

```mlir
func.func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,               // Rank.
               %arg1: !llvm.ptr<i8>) {   // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
// Enclosing caller; the name `@bar` is assumed here.
func.func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0): (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on the stack and has the lifetime of the function. (*Note:*
due to the function-length lifetime, creation of multiple unranked memref
descriptors, e.g., in a loop, may lead to stack overflows.) If an unranked
descriptor has to be returned from a function, the ranked descriptor it points
to is copied into dynamically allocated memory, and the pointer in the unranked
descriptor is updated accordingly. The allocation happens immediately before
returning. It is the responsibility of the caller to free the dynamically
allocated memory. The default conversion of `func.call` and
`func.call_indirect` copies the ranked descriptor to newly allocated memory on
the caller's stack. Thus, the convention of the ranked memref descriptor pointed
to by an unranked memref descriptor being stored on the stack is respected.
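The copy performed before returning an unranked descriptor can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the code emitted by the conversion: all names are hypothetical, `index` is assumed to convert to `i64`, and the ranked descriptor is assumed to contain no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical mirror of the unranked descriptor.
struct UnrankedDescriptor {
  std::int64_t rank;
  void *descriptor;
};

// Example ranked descriptor of rank 1, used only to exercise the sketch.
struct Ranked1D {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::int64_t sizes[1];
  std::int64_t strides[1];
};

// Size in bytes of a ranked descriptor: two pointers, the offset, then
// `rank` sizes and `rank` strides.
inline std::size_t rankedDescriptorSize(std::int64_t rank) {
  return 2 * sizeof(void *) +
         sizeof(std::int64_t) * (1 + 2 * static_cast<std::size_t>(rank));
}

// Copies the pointed-to ranked descriptor into heap memory so that it
// outlives the stack frame that created it. The caller must `free` the
// result; a null `descriptor` signals allocation failure.
inline UnrankedDescriptor copyToHeap(const UnrankedDescriptor &source) {
  std::size_t bytes = rankedDescriptorSize(source.rank);
  void *copy = std::malloc(bytes);
  if (copy)
    std::memcpy(copy, source.descriptor, bytes);
  return {source.rank, copy};
}
```

Only the descriptor is copied; the data buffer it refers to is shared between the original and the copy.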
#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures, where the default descriptor
structures are still used. This convention further restricts the supported cases

- `memref` types with default layout.
- `memref` types with all dimensions statically known.
- `memref` values allocated in such a way that the allocated and aligned
  pointers match. Alternatively, the same function must handle allocation and
  deallocation since only one pointer is passed to any callee.

```mlir
func.func @callee(memref<2x4xf32>)

func.func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
  return
}

// Gets converted to the following.

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointer are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  // For a row-major 2x4 memref, the sizes are [2, 4] and the strides [4, 1].
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
  llvm.return
}
```

#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.
### Generic allocation and deallocation functions

When converting the MemRef dialect, allocations and deallocations are converted
into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
and `free`. However, it is possible to convert them to more generic functions
which can be implemented by a runtime library, thus allowing custom allocation
strategies or runtime profiling. When the conversion pass is instructed to
do so, the names of the callees are
`_mlir_memref_to_llvm_alloc`, `_mlir_memref_to_llvm_aligned_alloc` and
`_mlir_memref_to_llvm_free`. Their signatures are the same as those of `malloc`,
`aligned_alloc` and `free`.
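A runtime library could provide these three functions along the following lines. This is a minimal sketch, not a reference implementation: the `liveAllocations` counter is a hypothetical stand-in for whatever profiling a runtime wants to do, and `std::aligned_alloc` is assumed to be available (C++17 with a C11-conforming C library).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical profiling state: number of buffers currently allocated.
static std::atomic<std::size_t> liveAllocations{0};

// Same signature as `malloc`.
extern "C" void *_mlir_memref_to_llvm_alloc(std::size_t size) {
  void *ptr = std::malloc(size);
  if (ptr)
    ++liveAllocations;
  return ptr;
}

// Same signature as `aligned_alloc`: alignment first, then size.
extern "C" void *_mlir_memref_to_llvm_aligned_alloc(std::size_t alignment,
                                                    std::size_t size) {
  void *ptr = std::aligned_alloc(alignment, size);
  if (ptr)
    ++liveAllocations;
  return ptr;
}

// Same signature as `free`.
extern "C" void _mlir_memref_to_llvm_free(void *ptr) {
  if (ptr)
    --liveAllocations;
  std::free(ptr);
}
```

The `extern "C"` linkage keeps the symbol names unmangled so that they match the callee names emitted by the conversion.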
### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to a MemRef argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
template<typename T, size_t N>
struct MemRefDescriptor {
  // Fields follow the `{T*, T*, i64, i64[N], i64[N]}` layout given above.
  T *allocated;
  T *aligned;
  int64_t offset;
  int64_t sizes[N];
  int64_t strides[N];
};
```

Furthermore, function results are rewritten to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option will do the following. For *external* functions declared

1. Declare a new function `_mlir_ciface_<original name>` where memref arguments
   are converted to pointer-to-struct and the remaining arguments are converted
   as usual. Results are converted to a special argument if they are of struct
2. Add a body to the original function (making it non-external) that
   1. allocates memref descriptors,
   3. potentially allocates space for the result struct, and
   4. passes the pointers to these into the newly declared interface function,
   5. collects the result of the call (potentially from the result struct),
   6. returns it to the caller.

For (non-external) functions defined in the MLIR module:

1. Define a new function `_mlir_ciface_<original name>` where memref arguments
   are converted to pointer-to-struct and the remaining arguments are converted
   as usual. Results are converted to a special argument if they are of struct
2. Populate the body of the newly defined function with IR that
   1. loads descriptors from pointers;
   2. unpacks descriptor into individual non-aggregate values;
   3. passes these values into the original function;
   4. collects the results of the call and
   5. either copies the results into the result struct or returns them to the

```mlir
func.func @qux(%arg0: memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                  array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stored descriptor will be freed on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```
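On the C++ side, the external interface function called by the converted `@qux` could be implemented as sketched below. What `@qux` actually does is not specified here, so the body (doubling every element) is an arbitrary placeholder; the descriptor template mirrors the layout given earlier, with `index` assumed to convert to `i64`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the `{T*, T*, i64, i64[N], i64[N]}` descriptor layout.
template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::int64_t offset;
  std::int64_t sizes[N];
  std::int64_t strides[N];
};

// Interface function with C linkage, receiving the descriptor by pointer.
// Placeholder behavior: doubles every element of the 2-D memref in place.
extern "C" void _mlir_ciface_qux(MemRefDescriptor<float, 2> *m) {
  for (std::int64_t i = 0; i < m->sizes[0]; ++i)
    for (std::int64_t j = 0; j < m->sizes[1]; ++j)
      m->aligned[m->offset + i * m->strides[0] + j * m->strides[1]] *= 2.0f;
}
```

Accessing elements through the aligned pointer, the offset and the strides keeps the function correct for strided views as well as for contiguous buffers.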
```mlir
func.func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
  return %arg0 : memref<?x?xf32>
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
               %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
    -> !llvm.memref_2d {
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
  llvm.return %7 : !llvm.memref_2d
}

// Interface function callable from C.
// The result is passed through the first pointer argument.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
  %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
  llvm.store %8, %arg0 : !llvm.memref_2d_ptr
  llvm.return
}
```

Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the effect
of C compatibility on intra-module calls or calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module in
a (parallel) loop, storing a memref descriptor on the stack can lead to
stack exhaustion and/or concurrent access to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore, when
targeting accelerators with separate memory spaces such as GPUs, stack-allocated
descriptors passed by pointer would have to be transferred to the device memory,
which introduces significant overhead. In such situations, auxiliary interface
functions are executed on the host and only pass the values through the device
function invocation mechanism.

Limitation: C interfaces currently cannot be generated for variadic functions,
whether external or not, because C functions are unable to "forward" variadic
arguments like this:

```c
void foo(int x, ...) {
  // ERROR: no way to forward variadic arguments.
}
```
789 ### Address Computation
791 Accesses to a memref element are transformed into an access to an element of the
792 buffer pointed to by the descriptor. The position of the element in the buffer
793 is calculated by linearizing memref indices in row-major order (lexically first
794 index is the slowest varying, similar to C, but accounting for strides). The
795 computation of the linear address is emitted as arithmetic operation in the LLVM
796 IR dialect. Strides are extracted from the memref descriptor.
800 An access to a memref with indices:
803 %0 = memref.load %m[%1,%2,%3,%4] : memref<?x?x4x8xf32, offset: ?>
806 is transformed into the equivalent of the following code:

    // %m denotes the descriptor obtained by converting the memref value.
    // Compute the linearized index from strides.
    // When strides or, in absence of explicit strides, the corresponding sizes are
    // dynamic, extract the stride value from the descriptor.
    %stride1 = llvm.extractvalue %m[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                              array<4 x i64>, array<4 x i64>)>
    %addr1 = llvm.mul %stride1, %1 : i64

    // When the stride or, in absence of explicit strides, the trailing sizes are
    // known statically, this value is used as a constant. The natural value of
    // strides is the product of all sizes following the current dimension.
    %stride2 = llvm.mlir.constant(32 : index) : i64
    %addr2 = llvm.mul %stride2, %2 : i64
    %addr3 = llvm.add %addr1, %addr2 : i64

    %stride3 = llvm.mlir.constant(8 : index) : i64
    %addr4 = llvm.mul %stride3, %3 : i64
    %addr5 = llvm.add %addr3, %addr4 : i64

    // Multiplication with the known unit stride can be omitted.
    %addr6 = llvm.add %addr5, %4 : i64

    // If the linear offset is known to be zero, it can also be omitted. If it is
    // dynamic, it is extracted from the descriptor.
    %offset = llvm.extractvalue %m[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                          array<4 x i64>, array<4 x i64>)>
    %addr7 = llvm.add %addr6, %offset : i64

    // All accesses are based on the aligned pointer.
    %aligned = llvm.extractvalue %m[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                           array<4 x i64>, array<4 x i64>)>

    // Compute the address of the accessed element.
    %ptr = llvm.getelementptr %aligned[%addr7]
        : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>

    // Perform the actual load.
    %0 = llvm.load %ptr : !llvm.ptr<f32>

For stores, the address computation code is identical and only the actual store
operation is different.

Note: the conversion does not perform any sort of common subexpression
elimination when emitting memref accesses.

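
The "natural" strides mentioned in the example, where each stride is the product of all sizes following the current dimension, can be worked out as follows. This is a small illustrative sketch; `naturalStrides` is a hypothetical name, and the concrete sizes used in the note after the block are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: strides of a contiguous row-major buffer, where
// strides[i] is the product of sizes[i + 1], ..., sizes[rank - 1].
std::vector<int64_t> naturalStrides(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> strides(sizes.size(), 1);
  // Walk from the innermost dimension outwards, accumulating the product.
  for (std::size_t i = sizes.size(); i-- > 1;) {
    assert(sizes[i] > 0 && "sizes must be positive");
    strides[i - 1] = strides[i] * sizes[i];
  }
  return strides;
}
```

For a shape whose two trailing sizes are 4 and 8, the last three strides are 32, 8 and 1 regardless of the leading sizes, which matches the constants in the example above.
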
Utility classes common to many conversions to the LLVM dialect can be found
under `lib/Conversion/LLVMCommon`. They include the following.

- `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
- `LLVMTypeConverter` implements the default type conversion as described
  above.
- `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
  dialect-specific functionality.
- `VectorConvertOpToLLVMPattern` extends the previous class to automatically
  unroll operations on higher-dimensional vectors into lists of operations on
  one-dimensional vectors before converting them.
- `StructBuilder` provides a convenient API for building IR that creates or
  accesses values of LLVM dialect structure types; it is the base class of
  `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexBuilder` for the
  built-in types convertible to LLVM dialect structure types.

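
The unrolling of higher-dimensional vector operations can be pictured as enumerating the positions of all one-dimensional vectors inside the n-D vector and applying the operation at each position. The sketch below is a toy model of that enumeration, not the MLIR implementation; `unrollPositions` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: for a vector shape such as {2, 3, 4}, returns the
// positions (index tuples over all but the innermost dimension) of the 1-D
// vectors on which a 1-D operation would be emitted.
std::vector<std::vector<int64_t>>
unrollPositions(const std::vector<int64_t> &shape) {
  assert(!shape.empty() && "expected at least one dimension");
  // Start with a single empty position and extend it one dimension at a time.
  std::vector<std::vector<int64_t>> positions(1);
  for (std::size_t d = 0; d + 1 < shape.size(); ++d) {
    std::vector<std::vector<int64_t>> next;
    for (const std::vector<int64_t> &prefix : positions) {
      for (int64_t i = 0; i < shape[d]; ++i) {
        std::vector<int64_t> position = prefix;
        position.push_back(i);
        next.push_back(std::move(position));
      }
    }
    positions = std::move(next);
  }
  return positions;
}
```

A shape of `{2, 3, 4}` yields six positions, one per 1-D vector of four elements, while a 1-D shape yields a single empty position.
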
## Translation to LLVM IR

MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
operations can be translated to LLVM IR modules using the following scheme.

- Module-level globals are translated to LLVM IR global values.
- Module-level metadata is translated to LLVM IR metadata, which can be later
  augmented with additional metadata defined on specific ops.
- All functions are declared in the module so that they can be referenced.
- Each function is then translated separately and has access to the complete
  mappings between MLIR and LLVM IR globals, metadata, and functions.
- Within a function, blocks are traversed in topological order and translated
  to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
  of the block arguments, but not connected to their source blocks.
- Within each block, operations are translated in order. Each operation has
  access to the same mappings as the function and additionally to the mapping
  of values between MLIR and LLVM IR, including PHI nodes. Operations with
  regions are responsible for translating the regions they contain.
- After operations in a function are translated, the PHI nodes of blocks in
  this function are connected to their source values, which are now available.

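
The reason for creating PHI nodes first and connecting them last is that a block argument may receive a value defined in a block translated later, for example through a loop back-edge. The following toy model sketches the two phases. All types and names (`Block`, `Branch`, `Phi`, `translateBlocks`) are hypothetical stand-ins, not MLIR or LLVM classes, and the blocks are assumed to be given in the traversal order already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Branch {
  std::size_t target;                 // Index of the successor block.
  std::vector<std::string> operands;  // Values passed to its block arguments.
};
struct Block {
  std::vector<std::string> arguments; // Block arguments.
  std::vector<Branch> branches;       // Outgoing edges of the terminator.
};
struct Phi {
  std::string name;
  // (predecessor block index, incoming value) pairs.
  std::vector<std::pair<std::size_t, std::string>> incoming;
};

std::vector<std::vector<Phi>> translateBlocks(const std::vector<Block> &blocks) {
  std::vector<std::vector<Phi>> phis(blocks.size());
  // Phase 1: create one unconnected PHI per block argument.
  for (std::size_t b = 0; b < blocks.size(); ++b)
    for (const std::string &argument : blocks[b].arguments)
      phis[b].push_back(Phi{argument, {}});
  // Phase 2: once every value exists, connect PHIs to their sources.
  for (std::size_t b = 0; b < blocks.size(); ++b) {
    for (const Branch &branch : blocks[b].branches) {
      assert(branch.target < blocks.size() && "unknown successor block");
      std::vector<Phi> &targetPhis = phis[branch.target];
      assert(branch.operands.size() == targetPhis.size() &&
             "operand/argument count mismatch");
      for (std::size_t i = 0; i < targetPhis.size(); ++i)
        targetPhis[i].incoming.emplace_back(b, branch.operands[i]);
    }
  }
  return phis;
}
```

For a loop header whose argument is fed by both the entry block and the loop body itself, the PHI ends up with two incoming pairs, the second of which refers to a value that did not exist when the PHI was created.
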
The translation mechanism provides extension hooks for translating custom
operations to LLVM IR via a dialect interface `LLVMTranslationDialectInterface`:

- `convertOperation` translates an operation that belongs to the current
  dialect to LLVM IR given an `IRBuilderBase` and various mappings;
- `amendOperation` performs additional actions on an operation if it contains
  a dialect attribute that belongs to the current dialect, for example sets up
  instruction-level metadata.

Dialects containing operations or attributes that want to be translated to LLVM
IR must provide an implementation of this interface and register it with the
system. Note that registration may happen without creating the dialect, for
example, in a separate library to avoid the need for the "main" dialect library
to depend on LLVM IR libraries. The implementations of these methods may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
utilities for accessing it.

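
The idea of registering translation hooks separately from the dialect itself can be modeled with a small registry keyed by dialect namespace. This is a toy sketch only: `ToyOp`, `ConvertHook` and `ToyTranslationRegistry` are hypothetical names, and the real interface operates on MLIR operations and an LLVM IR builder rather than on strings.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for an operation; the name has the form "dialect.opname".
struct ToyOp {
  std::string name;
};

// Toy stand-in for a translation hook: appends its output to `out` and
// returns true on success.
using ConvertHook = std::function<bool(const ToyOp &, std::string &)>;

class ToyTranslationRegistry {
public:
  // Registration needs only the dialect namespace, not the dialect itself.
  void registerDialect(const std::string &dialect, ConvertHook hook) {
    hooks[dialect] = std::move(hook);
  }

  // Dispatches to the hook of the dialect the operation belongs to.
  bool convertOperation(const ToyOp &op, std::string &out) const {
    std::string dialect = op.name.substr(0, op.name.find('.'));
    auto it = hooks.find(dialect);
    if (it == hooks.end())
      return false; // No translation registered for this dialect.
    return it->second(op, out);
  }

private:
  std::map<std::string, ConvertHook> hooks;
};
```

An operation whose dialect has no registered hook fails to translate, which mirrors the requirement that dialects provide and register an implementation.
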
Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a
small, relatively stable set of instructions and types that MLIR intends to
model fully. Therefore, the extension mechanism is provided only for LLVM IR
constructs that are more often extended: intrinsics and metadata. The primary
goal of the extension mechanism is to support sets of intrinsics, for example
those representing a particular instruction set. The extension mechanism does
not allow for customizing type or block translation, nor does it support custom
module-level operations. Such transformations should be performed within MLIR
and target the corresponding MLIR constructs.

## Translation from LLVM IR

An experimental flow allows one to import a substantially limited subset of LLVM
IR into MLIR, producing LLVM dialect operations.

    mlir-translate -import-llvm filename.ll