# LLVM IR Target

This document describes the mechanisms of producing LLVM IR from MLIR. The
overall flow is two-stage:

1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
    example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
    dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
    [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
2.  **translation** of MLIR dialects to LLVM IR.

This flow allows non-trivial transformations to be performed within MLIR
using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
are expected to closely match the corresponding LLVM IR instructions and
intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR and
reduces the churn in case of changes.

Note that many different dialects can be lowered to LLVM but are provided as
different sets of patterns and have different passes available to `mlir-opt`.
However, this is primarily useful for testing and prototyping, and using the
collection of patterns together is highly recommended. One place where this is
important and visible is the ControlFlow dialect's branching operations, which
will fail to apply if their operand types do not match the argument types of
the blocks they jump to in the parent op.

SPIR-V to LLVM dialect conversion has a
[dedicated document](SPIRVToLLVMDialectConversion.md).

[TOC]

## Conversion to the LLVM Dialect

Conversion to the LLVM dialect from other dialects is the first step to produce
LLVM IR. All non-trivial IR modifications are expected to happen at this stage
or before. The conversion is *progressive*: most passes convert one dialect to
the LLVM dialect and keep operations from other dialects intact. For example,
the `-finalize-memref-to-llvm` pass will only convert operations from the
`memref` dialect but will not convert operations from other dialects even if
they use or produce `memref`-typed values.

The process relies on the [Dialect Conversion](DialectConversion.md)
infrastructure and, in particular, on the
[materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
to support progressive lowering by injecting `unrealized_conversion_cast`
operations between converted and unconverted operations. After multiple partial
conversions to the LLVM dialect are performed, the cast operations that became
no-ops can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
is not specific to the LLVM dialect and can remove any no-op casts.

### Conversion of Built-in Types

Built-in types have a default conversion to LLVM dialect types provided by the
`LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
this type converter to support other types. Extra care must be taken if the
conversion rules for built-in types are overridden: all conversions must use the
same type converter.

#### LLVM Dialect-compatible Types

The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
LLVM dialect are kept as is.

#### Complex Type

Complex type is converted into an LLVM dialect literal structure type with two
elements:

-   real part;
-   imaginary part.

The element type is converted recursively using these rules.

Example:

```mlir
  complex<f32>
  // ->
  !llvm.struct<(f32, f32)>
```

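Because of this layout, a complex value can be exchanged with C or C++ as two adjacent floats. The following sketch is an illustration only: it assumes `f32` corresponds to `float` on the target, and `ComplexF32` and `toStruct` are hypothetical names. It relies on the C++ guarantee that `std::complex<float>` has the same layout as `float[2]`, real part first.

```cpp
#include <cassert>
#include <complex>
#include <cstring>

// Hypothetical C++ mirror of `!llvm.struct<(f32, f32)>`: real part first,
// imaginary part second, matching the element order of the converted struct.
struct ComplexF32 {
  float real;
  float imag;
};

static_assert(sizeof(ComplexF32) == 2 * sizeof(float),
              "no padding expected between two floats");

// Copies the bytes of a std::complex<float> into the mirror struct. This is
// well-defined because std::complex<float> is layout-compatible with float[2].
ComplexF32 toStruct(const std::complex<float> &c) {
  ComplexF32 s;
  std::memcpy(&s, &c, sizeof(s));
  return s;
}
```

The same reasoning applies to `complex<f64>` and `std::complex<double>`.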
#### Index Type

Index type is converted into an LLVM dialect integer type with the bitwidth
specified by the [data layout](DataLayout.md) of the closest module. For
example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
the type converter configuration, which is often exposed as a pass option by
conversion passes.

Example:

```mlir
  index
  // -> on x86_64
  i64
```

#### Ranked MemRef Types

Ranked memref types are converted into an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *descriptor*. Only memrefs in the
**[strided form](Dialects/Builtin.md#strided-memref)** can be converted to the
LLVM dialect with the default descriptor format. Memrefs with other, less
trivial layouts should be converted into the strided form first, e.g., by
materializing the non-trivial address remapping due to layout as `affine.apply`
operations.

The default memref descriptor is a struct with the following fields:

1.  The pointer to the data buffer as allocated, referred to as "allocated
    pointer". This is only useful for deallocating the memref.
2.  The pointer to the properly aligned data that the memref indexes,
    referred to as "aligned pointer".
3.  A converted `index`-type integer containing the distance in number
    of elements between the beginning of the (aligned) buffer and the first
    element to be accessed through the memref, referred to as "offset".
4.  An array containing as many converted `index`-type integers as the rank of
    the memref: the array represents the size, in number of elements, of the
    memref along each dimension.
5.  A second array containing as many converted `index`-type integers as the
    rank of the memref: the second array represents the "stride" (in tensor
    abstraction sense), i.e. the number of consecutive elements of the
    underlying buffer one needs to jump over to get to the next logically
    indexed element along each dimension.

For constant memref dimensions, the corresponding size entry is a constant whose
runtime value matches the static value. This normalization serves as an ABI for
the memref type to interoperate with externally linked functions. In the
particular case of rank `0` memrefs, the size and stride arrays are omitted,
resulting in a struct containing two pointers and an offset.

Examples:

```mlir
// Assuming index is converted to i64.

memref<f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
memref<1 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<? x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                 array<1 x i64>, array<1 x i64>)>
memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                               array<5 x i64>, array<5 x i64>)>
memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                             array<5 x i64>, array<5 x i64>)>

// Memref types can have vectors as element types
memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr<vector<4 x f32>>,
                                             ptr<vector<4 x f32>>, i64,
                                             array<2 x i64>, array<2 x i64>)>
```

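To make the field order and the strided addressing rule concrete, here is a minimal C++ sketch of the five-field descriptor. It assumes `index` converts to `i64` (so the integer fields are `int64_t`); `RankedDescriptor`, `makeRowMajor` and `at` are hypothetical helpers, not MLIR APIs. For the identity (row-major) layout, the innermost stride is 1 and each outer stride is the product of the inner sizes; the element at indices `i` lives at `aligned[offset + sum_k i_k * strides[k]]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the default ranked descriptor, assuming `index`
// converts to i64. Field order follows the numbered list above.
template <typename T, std::size_t N>
struct RankedDescriptor {
  T *allocated;                    // 1. pointer as allocated (for dealloc)
  T *aligned;                      // 2. aligned pointer used for indexing
  int64_t offset;                  // 3. offset, in elements
  std::array<int64_t, N> sizes;    // 4. size along each dimension
  std::array<int64_t, N> strides;  // 5. stride along each dimension
};

// Builds a descriptor for a contiguous buffer with the identity (row-major)
// layout: the innermost stride is 1 and each outer stride is the product of
// the sizes of all inner dimensions.
template <typename T, std::size_t N>
RankedDescriptor<T, N> makeRowMajor(T *data,
                                    const std::array<int64_t, N> &sizes) {
  RankedDescriptor<T, N> d{data, data, 0, sizes, {}};
  int64_t running = 1;
  for (std::size_t k = N; k-- > 0;) {
    d.strides[k] = running;
    running *= sizes[k];
  }
  return d;
}

// Strided-form addressing: aligned[offset + sum_k(indices[k] * strides[k])].
template <typename T, std::size_t N>
T &at(const RankedDescriptor<T, N> &d, const std::array<int64_t, N> &indices) {
  int64_t linear = d.offset;
  for (std::size_t k = 0; k < N; ++k)
    linear += indices[k] * d.strides[k];
  return d.aligned[linear];
}
```

For `memref<2x3x4xf32>` with the identity layout, this yields sizes `[2, 3, 4]` and strides `[12, 4, 1]`.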
#### Unranked MemRef Types

Unranked memref types are converted to an LLVM dialect literal structure type
that contains the dynamic information associated with the memref object,
referred to as *unranked descriptor*. It contains:

1.  a converted `index`-typed integer representing the dynamic rank of the
    memref;
2.  a type-erased pointer (`!llvm.ptr<i8>`) to a ranked memref descriptor with
    the contents listed above.

This descriptor is primarily intended for interfacing with rank-polymorphic
library functions. The pointer to the ranked memref descriptor points to some
*allocated* memory, which may reside on the stack of the current function or on
the heap. Conversion patterns for operations producing unranked memrefs are
expected to manage the allocation. Note that this may lead to stack allocations
(`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
current function.

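The following C++ sketch, assuming `index` converts to `i64`, shows how a rank-polymorphic consumer might read an unranked descriptor: it dispatches on the rank and casts the type-erased pointer back to a ranked descriptor of that rank. `UnrankedDescriptor`, `Ranked2D` and `numElements` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the unranked descriptor: a rank plus a type-erased
// pointer to a ranked descriptor stored somewhere in memory.
struct UnrankedDescriptor {
  int64_t rank;
  void *descriptor;
};

// Hypothetical rank-2 ranked descriptor (index assumed to convert to i64).
struct Ranked2D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[2];
  int64_t strides[2];
};

// A rank-polymorphic consumer dispatches on `rank`; this sketch only handles
// rank 2 and returns the number of elements, or -1 for any other rank.
int64_t numElements(const UnrankedDescriptor &u) {
  if (u.rank != 2)
    return -1;
  const auto *r = static_cast<const Ranked2D *>(u.descriptor);
  return r->sizes[0] * r->sizes[1];
}
```

Because the unranked descriptor only points to the ranked one, the memory holding the ranked descriptor must outlive every use of the unranked value.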
#### Function Types

Function types are converted to LLVM dialect function types as follows:

-   function argument and result types are converted recursively using these
    rules;
-   if a function type has multiple results, they are wrapped into an LLVM
    dialect literal structure type since LLVM function types must have exactly
    one result;
-   if a function type has no results, the corresponding LLVM dialect function
    type will have one `!llvm.void` result since LLVM function types must have a
    result;
-   function types used as arguments or results of another function type are
    wrapped in an LLVM dialect pointer type to comply with LLVM IR expectations;
-   the structs corresponding to `memref` types, both ranked and unranked,
    appearing as function arguments are unbundled into individual function
    arguments to allow for specifying metadata such as aliasing information on
    individual pointers;
-   the conversion of `memref`-typed arguments is subject to
    [calling conventions](TargetLLVMIR.md#calling-conventions);
-   if a function has the boolean attribute `func.varargs` set to `true`, the
    converted LLVM function will be variadic.

Examples:

```mlir
// Zero-ary function type with no results:
() -> ()
// is converted to a zero-ary function with `void` result.
!llvm.func<void ()>

// Unary function with one result:
(i32) -> (i64)
// has its argument and result type converted, before creating the LLVM dialect
// function type.
!llvm.func<i64 (i32)>

// Binary function with one result:
(i32, f32) -> (i64)
// has its arguments handled separately
!llvm.func<i64 (i32, f32)>

// Binary function with two results:
(i32, f32) -> (i64, f64)
// has its results aggregated into a structure type.
!llvm.func<struct<(i64, f64)> (i32, f32)>

// Function-typed arguments or results in higher-order functions:
(() -> ()) -> (() -> ())
// are converted into pointers to functions.
!llvm.func<ptr<func<void ()>> (ptr<func<void ()>>)>

// These rules apply recursively: a function type taking a function that takes
// another function
( ( (i32) -> (i64) ) -> () ) -> ()
// is converted into a function type taking a pointer-to-function that takes
// another pointer-to-function.
!llvm.func<void (ptr<func<void (ptr<func<i64 (i32)>>)>>)>

// A memref descriptor appearing as function argument:
(memref<f32>) -> ()
// gets converted into a list of individual scalar components of a descriptor.
!llvm.func<void (ptr<f32>, ptr<f32>, i64)>

// The list of arguments is linearized and one can freely mix memref and other
// types in this list:
(memref<f32>, f32) -> ()
// which gets converted into a flat list.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, f32)>

// For nD ranked memref descriptors:
(memref<?x?xf32>) -> ()
// the converted signature will contain 2n+1 `index`-typed integer arguments,
// offset, n sizes and n strides, per memref argument type.
!llvm.func<void (ptr<f32>, ptr<f32>, i64, i64, i64, i64, i64)>

// Same rules apply to unranked descriptors:
(memref<*xf32>) -> ()
// which get converted into their components.
!llvm.func<void (i64, ptr<i8>)>

// However, returning a memref from a function is not affected:
() -> (memref<?xf32>)
// gets converted to a function returning a descriptor structure.
!llvm.func<struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)> ()>

// If multiple memref-typed results are returned:
() -> (memref<f32>, memref<f64>)
// their descriptor structures are additionally packed into another structure,
// potentially with other non-memref typed results.
!llvm.func<struct<(struct<(ptr<f32>, ptr<f32>, i64)>,
                   struct<(ptr<f64>, ptr<f64>, i64)>)> ()>

// If "func.varargs" attribute is set:
(i32) -> () attributes { "func.varargs" = true }
// the corresponding LLVM function will be variadic:
!llvm.func<void (i32, ...)>
```

Conversion patterns are available to convert `func.func` operations, and the
`func.call` operations targeting them, using these conversion rules.

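The unbundling rule determines how many LLVM-level arguments a converted signature has: a ranked memref of rank n contributes two pointers, one offset, n sizes and n strides (3 + 2n values), and an unranked memref contributes two. The C++ sketch below computes this count; the `ArgKind` description of an argument is a hypothetical stand-in for real MLIR types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a function argument for this sketch: either a
// non-memref scalar, a ranked memref of a given rank, or an unranked memref.
struct ArgKind {
  enum Kind { Scalar, RankedMemRef, UnrankedMemRef } kind;
  int rank = 0;  // only meaningful for RankedMemRef
};

// Number of LLVM-level arguments one MLIR argument expands into under the
// default calling convention.
int unbundledCount(const ArgKind &a) {
  switch (a.kind) {
  case ArgKind::Scalar:
    return 1;
  case ArgKind::RankedMemRef:
    // allocated ptr + aligned ptr + offset + `rank` sizes + `rank` strides
    return 3 + 2 * a.rank;
  case ArgKind::UnrankedMemRef:
    // rank + type-erased pointer to the ranked descriptor
    return 2;
  }
  return 0;  // not reached for valid kinds
}

// Total argument count of the converted LLVM function type.
int convertedArity(const std::vector<ArgKind> &args) {
  int total = 0;
  for (const ArgKind &a : args)
    total += unbundledCount(a);
  return total;
}
```

Checked against the examples above: `(memref<f32>) -> ()` yields 3 arguments, `(memref<f32>, f32) -> ()` yields 4, `(memref<?x?xf32>) -> ()` yields 7, and `(memref<*xf32>) -> ()` yields 2.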
#### Multi-dimensional Vector Types

LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
be multi-dimensional. Vector types cannot be nested in either IR. In the
one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
size with element type converted using these conversion rules. In the
n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
of one-dimensional vectors.

Examples:

```mlir
vector<4x8 x f32>
// ->
!llvm.array<4 x vector<8 x f32>>

memref<2 x vector<4x8 x f32>>
// ->
!llvm.struct<(ptr<array<4 x vector<8xf32>>>, ptr<array<4 x vector<8xf32>>>,
              i64, array<1 x i64>, array<1 x i64>)>
```

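The nesting rule can be expressed as a construction over the shape: the innermost dimension becomes a 1-D vector and each outer dimension wraps the result in an array. This C++ sketch produces the LLVM dialect spelling as a string purely for illustration; `convertVectorType` is a hypothetical helper, not the `LLVMTypeConverter` API, and the spacing of its output follows the examples above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Builds the LLVM dialect spelling of an MLIR vector type with the given
// shape and (already converted) element type. The innermost dimension becomes
// a 1-D vector; every outer dimension wraps it in an array.
std::string convertVectorType(const std::vector<int64_t> &shape,
                              const std::string &elementType) {
  assert(!shape.empty() && "vector types have at least one dimension");
  std::string result =
      "vector<" + std::to_string(shape.back()) + " x " + elementType + ">";
  if (shape.size() == 1)
    return result;  // 1-D vectors map directly to 1-D vectors
  // Wrap from the second-innermost dimension outwards.
  for (std::size_t k = shape.size() - 1; k-- > 0;)
    result = "array<" + std::to_string(shape[k]) + " x " + result + ">";
  return "!llvm." + result;
}
```

For instance, `vector<2x4x8xf32>` becomes `!llvm.array<2 x array<4 x vector<8 x f32>>>`.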
#### Tensor Types

Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
be [bufferized](Bufferization.md) before being converted.

### Calling Conventions

Calling conventions provide a mechanism to customize the conversion of function
and function call operations without changing how individual types are handled
elsewhere. They are implemented simultaneously by the default type converter and
by the conversion patterns for the relevant operations.

#### Function Result Packing

In case of multi-result functions, the returned values are inserted into a
structure-typed value before being returned and extracted from it at the call
site. This transformation is a part of the conversion and is transparent to the
definitions and uses of the values being returned.

Example:

```mlir
func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
  return %arg0, %arg1 : i32, i64
}
func.func @bar() {
  %0 = arith.constant 42 : i32
  %1 = arith.constant 17 : i64
  %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
  "use_i32"(%2#0) : (i32) -> ()
  "use_i64"(%2#1) : (i64) -> ()
  return
}

// is transformed into

llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
  // insert the values into a structure
  %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>

  // return the structure value
  llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
  %0 = llvm.mlir.constant(42 : i32) : i32
  %1 = llvm.mlir.constant(17 : i64) : i64

  // call and extract the values from the structure
  %2 = llvm.call @foo(%0, %1)
     : (i32, i64) -> !llvm.struct<(i32, i64)>
  %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
  %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>

  // use as before
  "use_i32"(%3) : (i32) -> ()
  "use_i64"(%4) : (i64) -> ()
  llvm.return
}
```

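As an analogy only, the same packing can be written in C++: the callee returns one struct holding both results and the caller extracts its fields. This is not code produced by MLIR; `FooResult` is a hypothetical name, and the sum in `bar` merely stands in for the `"use_i32"` and `"use_i64"` operations.

```cpp
#include <cassert>
#include <cstdint>

// C++ analogue of the packed result type `!llvm.struct<(i32, i64)>`.
struct FooResult {
  int32_t first;
  int64_t second;
};

// Counterpart of the converted @foo: both results are inserted into one
// structure value, which is returned as a single result.
FooResult foo(int32_t arg0, int64_t arg1) {
  FooResult packed{};
  packed.first = arg0;   // like llvm.insertvalue ... [0]
  packed.second = arg1;  // like llvm.insertvalue ... [1]
  return packed;
}

// Counterpart of the converted @bar: the caller receives the structure and
// extracts the individual results before using them.
int64_t bar() {
  FooResult r = foo(42, 17);
  int32_t v0 = r.first;   // like llvm.extractvalue ... [0]
  int64_t v1 = r.second;  // like llvm.extractvalue ... [1]
  return v0 + v1;         // stands in for the "use_*" operations
}
```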
#### Default Calling Convention for Ranked MemRef

The default calling convention converts `memref`-typed function arguments to
LLVM dialect literal structs
[defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
individual scalar arguments.

This convention is implemented in the conversion of `func.func` and `func.call`
to the LLVM dialect. The conversion of `func.func` unbundles each descriptor into
individual arguments in the signature and packs them back into a descriptor at
the function entry, while the conversion of `func.call` unpacks the descriptor
into individual values at the call site. This keeps the descriptor transparently
usable by other operations. Conversions from other dialects should take this
convention into account.

This specific convention is motivated by the necessity to specify alignment and
aliasing attributes on the raw pointers underpinning the memref.

Examples:

```mlir
func.func @foo(%arg0: memref<?xf32>) -> () {
  "use"(%arg0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @foo(%arg0: !llvm.ptr<f32>,  // Allocated pointer.
               %arg1: !llvm.ptr<f32>,  // Aligned pointer.
               %arg2: i64,             // Offset.
               %arg3: i64,             // Size in dim 0.
               %arg4: i64) {           // Stride in dim 0.
  // Populate memref descriptor structure.
  %0 = llvm.mlir.undef : !llvm.memref_1d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
  %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d

  // Descriptor is now usable as a single value.
  "use"(%5) : (!llvm.memref_1d) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<?xf32>)
  call @foo(%0) : (memref<?xf32>) -> ()
  return
}

// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>

llvm.func @bar() {
  %0 = "get"() : () -> !llvm.memref_1d

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
  %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2, %3, %4, %5)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
  llvm.return
}
```

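The unbundling and re-packing can be mirrored in C++ to show what crosses the call boundary. In this sketch, which assumes `index` converts to `i64`, the hypothetical `MemRef1D` mirrors `!llvm.memref_1d`; unlike the MLIR `@foo`, the stand-in `foo` returns the sum of the viewed elements so that the round trip can be checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 1-D descriptor mirroring !llvm.memref_1d (index -> int64_t).
struct MemRef1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};

// Stand-in for the converted @foo: it receives the unbundled fields and packs
// them back into a descriptor before using it. Here "use" sums the elements
// through the strided addressing rule.
float foo(float *allocated, float *aligned, int64_t offset, int64_t size0,
          int64_t stride0) {
  MemRef1D d{allocated, aligned, offset, {size0}, {stride0}};
  float sum = 0;
  for (int64_t i = 0; i < d.sizes[0]; ++i)
    sum += d.aligned[d.offset + i * d.strides[0]];
  return sum;
}

// Stand-in for the converted @bar: it holds a descriptor as a single value
// and unpacks it into individual arguments at the call site.
float bar(const MemRef1D &d) {
  return foo(d.allocated, d.aligned, d.offset, d.sizes[0], d.strides[0]);
}
```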
#### Default Calling Convention for Unranked MemRef

For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the *calling convention* does not require allocation, *casting* to
unranked memref does since one cannot take the address of an SSA value
containing the ranked memref, which must be stored in some memory instead. The
caller is in charge of ensuring the thread safety and management of the
allocated memory, in particular the deallocation.

Example:

```mlir
func.func @foo(%arg0: memref<*xf32>) -> () {
  "use"(%arg0) : (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @foo(%arg0: i64,              // Rank.
               %arg1: !llvm.ptr<i8>) {  // Type-erased pointer to descriptor.
  // Pack the unranked memref descriptor.
  %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>

  "use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
  llvm.return
}
```

```mlir
func.func @bar() {
  %0 = "get"() : () -> (memref<*xf32>)
  call @foo(%0): (memref<*xf32>) -> ()
  return
}

// Gets converted to the following.

llvm.func @bar() {
  %0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)

  // Unpack the memref descriptor.
  %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
  %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>

  // Pass individual values to the callee.
  llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
  llvm.return
}
```

**Lifetime.** The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on the stack and has the lifetime of the function. (*Note:*
due to function-length lifetime, creation of multiple unranked memref
descriptors, e.g., in a loop, may lead to stack overflows.) If an unranked
descriptor has to be returned from a function, the ranked descriptor it points
to is copied into dynamically allocated memory, and the pointer in the unranked
descriptor is updated accordingly. The allocation happens immediately before
returning. It is the responsibility of the caller to free the dynamically
allocated memory. The default conversion of `func.call` and `func.call_indirect`
copies the ranked descriptor to newly allocated memory on the caller's stack.
Thus, the convention that the ranked memref descriptor pointed to by an unranked
memref descriptor is stored on the stack is respected.

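The return path described above can be sketched in C++: the callee copies its stack-resident ranked descriptor into heap memory just before returning, and the caller copies it back onto its own stack and frees the heap copy. All names are hypothetical, `index` is assumed to convert to `i64`, and `malloc`/`free` stand in for whatever allocation functions the conversion actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical descriptors (index assumed to convert to i64).
struct Ranked1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};
struct Unranked {
  int64_t rank;
  void *descriptor;  // type-erased pointer to a ranked descriptor
};

// Models a callee returning an unranked memref: its ranked descriptor lives on
// the callee's stack, so it is copied into heap memory right before returning.
// The caller becomes responsible for freeing that copy.
Unranked makeUnranked(float *data, int64_t size) {
  Ranked1D local{data, data, 0, {size}, {1}};  // stack, dies on return
  void *heap = std::malloc(sizeof(Ranked1D));
  if (heap == nullptr)
    std::abort();
  std::memcpy(heap, &local, sizeof(Ranked1D));
  return Unranked{1, heap};
}

// Models the caller side: copy the pointed-to descriptor into caller-owned
// stack memory, free the heap copy, and keep using the stack copy.
int64_t callerSize(float *data, int64_t size) {
  Unranked u = makeUnranked(data, size);
  Ranked1D onStack;
  std::memcpy(&onStack, u.descriptor, sizeof(Ranked1D));
  std::free(u.descriptor);
  u.descriptor = &onStack;
  return static_cast<Ranked1D *>(u.descriptor)->sizes[0];
}
```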
#### Bare Pointer Calling Convention for Ranked MemRef

The "bare pointer" calling convention converts `memref`-typed function arguments
to a *single* pointer to the aligned data. Note that this does *not* apply to
uses of `memref` outside of function signatures; the default descriptor
structures are still used there. This convention further restricts the supported
cases to the following.

-   `memref` types with default layout.
-   `memref` types with all dimensions statically known.
-   `memref` values allocated in such a way that the allocated and aligned
    pointers match. Alternatively, the same function must handle allocation and
    deallocation since only one pointer is passed to any callee.

Examples:

```mlir
func.func private @callee(memref<2x4xf32>)

func.func @caller(%0 : memref<2x4xf32>) {
  call @callee(%0) : (memref<2x4xf32>) -> ()
  return
}

// ->

!descriptor = !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                            array<2xi64>, array<2xi64>)>

llvm.func @callee(!llvm.ptr<f32>)

llvm.func @caller(%arg0: !llvm.ptr<f32>) {
  // A descriptor value is defined at the function entry point.
  %0 = llvm.mlir.undef : !descriptor

  // Both the allocated and aligned pointer are set up to the same value.
  %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
  %2 = llvm.insertvalue %arg0, %1[1] : !descriptor

  // The offset is set up to zero.
  %3 = llvm.mlir.constant(0 : index) : i64
  %4 = llvm.insertvalue %3, %2[2] : !descriptor

  // The sizes and strides are derived from the statically known values.
  %5 = llvm.mlir.constant(2 : index) : i64
  %6 = llvm.mlir.constant(4 : index) : i64
  %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
  %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
  %9 = llvm.mlir.constant(1 : index) : i64
  %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
  %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor

  // The function call corresponds to extracting the aligned data pointer.
  %12 = llvm.extractvalue %11[1] : !descriptor
  llvm.call @callee(%12) : (!llvm.ptr<f32>) -> ()
  llvm.return
}
```

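In C++ terms, the bare-pointer convention reduces the callee's parameter to a plain `float *`, while the caller rebuilds a full descriptor from static information. This sketch mirrors the IR above under the assumption that `index` converts to `i64`; `Desc2x4`, `caller` and `callee` are hypothetical names, and the callee body is invented so the example has something to check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the descriptor for memref<2x4xf32>.
struct Desc2x4 {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[2];
  int64_t strides[2];
};

// Under the bare-pointer convention the callee only sees the aligned data
// pointer; the shape is known statically from the signature.
float callee(float *data) {
  // Element [1][3] of a 2x4 row-major buffer: 1 * 4 + 3 = 7.
  return data[1 * 4 + 3];
}

// Mirrors the converted @caller: rebuild a full descriptor from the single
// incoming pointer (both pointers equal, offset 0, static sizes and
// row-major strides), then pass only the aligned pointer to the callee.
float caller(float *arg0) {
  Desc2x4 d{arg0, arg0, 0, {2, 4}, {4, 1}};
  return callee(d.aligned);
}
```

Row-major strides for `memref<2x4xf32>` are `[4, 1]`: stepping along dimension 0 skips a whole row of 4 elements.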
#### Bare Pointer Calling Convention For Unranked MemRef

The "bare pointer" calling convention does not support unranked memrefs as their
shape cannot be known at compile time.

### Generic allocation and deallocation functions

When converting the MemRef dialect, allocations and deallocations are converted
into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
and `free`. However, it is possible to convert them to more generic functions
which can be implemented by a runtime library, thus allowing custom allocation
strategies or runtime profiling. When the conversion pass is instructed to
perform such operation, the names of the callees are
`_mlir_memref_to_llvm_alloc`, `_mlir_memref_to_llvm_aligned_alloc` and
`_mlir_memref_to_llvm_free`. Their signatures are the same as those of `malloc`,
`aligned_alloc` and `free`.

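A runtime library providing these functions only needs to match the `malloc`, `aligned_alloc` and `free` signatures. The sketch below is one possible implementation, assuming a C++17 standard library for `std::aligned_alloc`; the live-allocation counter is a hypothetical illustration of the profiling use case, not part of any MLIR API. The functions are declared `extern "C"` so that their unmangled symbol names match the callees emitted by the conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative profiling state: number of allocations not yet freed.
static std::size_t liveAllocations = 0;

extern "C" void *_mlir_memref_to_llvm_alloc(std::size_t size) {
  void *ptr = std::malloc(size);
  if (ptr != nullptr)
    ++liveAllocations;
  return ptr;
}

extern "C" void *_mlir_memref_to_llvm_aligned_alloc(std::size_t alignment,
                                                    std::size_t size) {
  // std::aligned_alloc expects `size` to be a multiple of `alignment`.
  void *ptr = std::aligned_alloc(alignment, size);
  if (ptr != nullptr)
    ++liveAllocations;
  return ptr;
}

extern "C" void _mlir_memref_to_llvm_free(void *ptr) {
  if (ptr != nullptr)
    --liveAllocations;
  std::free(ptr);
}
```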
### C-compatible wrapper emission

In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to each MemRef argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.

More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.

```cpp
#include <cstddef>
#include <cstdint>

template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};
```

Furthermore, we also rewrite function results to pointer parameters if the
rewritten function result has a struct type. The special result parameter is
added as the first parameter and is of pointer-to-struct type.

If enabled, the option does the following for *external* functions declared
in the MLIR module:

1.  Declare a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Add a body to the original function (making it non-external) that
    1.  allocates memref descriptors,
    2.  populates them,
    3.  potentially allocates space for the result struct, and
    4.  passes the pointers to these into the newly declared interface function,
        then
    5.  collects the result of the call (potentially from the result struct),
        and
    6.  returns it to the caller.

For (non-external) functions defined in the MLIR module, it does the following:

1.  Define a new function `_mlir_ciface_<original name>` where memref arguments
    are converted to pointer-to-struct and the remaining arguments are converted
    as usual. Results are converted to a special argument if they are of struct
    type.
2.  Populate the body of the newly defined function with IR that
    1.  loads descriptors from pointers;
    2.  unpacks descriptors into individual non-aggregate values;
    3.  passes these values into the original function;
    4.  collects the results of the call; and
    5.  either copies the results into the result struct or returns them to the
        caller.

Examples:

```mlir
func.func private @qux(memref<?x?xf32>)

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>

// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  // Populate memref descriptor (as per calling convention).
  %0 = llvm.mlir.undef : !llvm.memref_2d
  %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
  %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
  %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
  %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
  %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
  %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
  %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d

  // Store the descriptor in a stack-allocated space.
  %8 = llvm.mlir.constant(1 : index) : i64
  %9 = llvm.alloca %8 x !llvm.memref_2d
     : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>
  llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                        array<2xi64>, array<2xi64>)>>

  // Call the interface function.
  llvm.call @_mlir_ciface_qux(%9)
     : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                          array<2xi64>, array<2xi64>)>>) -> ()

  // The stack-allocated descriptor is released on return.
  llvm.return
}

// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
                                              array<2xi64>, array<2xi64>)>>)
```

```mlir
func.func @foo(%arg0: memref<?x?xf32>) {
  return
}

// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>

// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
               %arg2: i64, %arg3: i64, %arg4: i64,
               %arg5: i64, %arg6: i64) {
  llvm.return
}

// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
  // Load the descriptor.
  %0 = llvm.load %arg0 : !llvm.memref_2d_ptr

  // Unpack the descriptor as per calling convention.
  %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
  %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
  %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
  %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
  %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
  %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
  %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
  llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
    : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
       i64, i64) -> ()
  llvm.return
}
```

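The interface function above can be mirrored by hand in C++ to show the load, unpack and forward steps. This is a sketch, not generated code: `ciface_foo` is a hypothetical name chosen to avoid clashing with the real `_mlir_ciface_foo` symbol, `std::intptr_t` is assumed to match the converted `index` type, and the stand-in `foo` returns the element count instead of nothing so the forwarding can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as the MemRefDescriptor template shown earlier.
template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Stand-in for the original function with unpacked arguments for
// `memref<?x?xf32>`: sizes come before strides, as in the IR above.
std::intptr_t foo(float *allocated, float *aligned, std::intptr_t offset,
                  std::intptr_t size0, std::intptr_t size1,
                  std::intptr_t stride0, std::intptr_t stride1) {
  (void)allocated; (void)aligned; (void)offset; (void)stride0; (void)stride1;
  return size0 * size1;
}

// Hand-written analogue of the interface function: load the descriptor
// through the pointer, unpack it into scalars, and call the original function.
std::intptr_t ciface_foo(MemRefDescriptor<float, 2> *arg0) {
  MemRefDescriptor<float, 2> d = *arg0;
  return foo(d.allocated, d.aligned, d.offset, d.sizes[0], d.sizes[1],
             d.strides[0], d.strides[1]);
}
```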
 722 ```mlir
 723 func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
 724   return %arg0 : memref<?x?xf32>
 725 }
 726
 727 // Gets converted into the following
 728 // (using type alias for brevity):
 729 !llvm.memref_2d = !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>
 730 !llvm.memref_2d_ptr = !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<2xi64>, array<2xi64>)>>
 731
 732 // Function with unpacked arguments.
 733 llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
 734                %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
 735     -> !llvm.memref_2d {
 736   %0 = llvm.mlir.undef : !llvm.memref_2d
 737   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
 738   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
 739   %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
 740   %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
 741   %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
 742   %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
 743   %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
 744   llvm.return %7 : !llvm.memref_2d
 745 }
 746
 747 // Interface function callable from C.
 748 llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
 749   %0 = llvm.load %arg1 : !llvm.memref_2d_ptr
 750   %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
 751   %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
 752   %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
 753   %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
 754   %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
 755   %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
 756   %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
 757   %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
 758     : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
 759   llvm.store %8, %arg0 : !llvm.memref_2d_ptr
 760   llvm.return
 761 }
 762 ```
 763
Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention because it minimizes the effect
of C compatibility on intra-module calls and on calls between MLIR-generated
functions. In particular, when an MLIR module calls external functions inside a
(parallel) loop, storing a memref descriptor on the stack for each call can lead
to stack exhaustion and/or concurrent accesses to the same address. The
auxiliary interface function serves as an allocation scope in this case.
Furthermore, when targeting accelerators with separate memory spaces such as
GPUs, stack-allocated descriptors passed by pointer would have to be transferred
to device memory, which introduces significant overhead. In such situations, the
auxiliary interface functions are executed on the host and only pass the values
through the device function invocation mechanism.
 776
Limitation: C interfaces cannot currently be generated for variadic functions,
whether they are external or not, because a C function is unable to "forward"
its variadic arguments to another function:

```c
void bar(int, ...);

void foo(int x, ...) {
  // ERROR: C has no syntax to forward the variadic arguments of `foo`.
  bar(x, ...);
}
```
 788
 789 ### Address Computation
 790
Accesses to a memref element are transformed into an access to an element of the
buffer pointed to by the descriptor. The position of the element in the buffer
is calculated by linearizing memref indices in row-major order (the lexically
first index is the slowest varying, similar to C, but accounting for strides).
The computation of the linear address is emitted as arithmetic operations in the
LLVM dialect. Dynamic strides and offsets are extracted from the memref
descriptor, while static ones are materialized as constants.
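The arithmetic can be sketched in C as follows. This illustrates the computation only, not the conversion's code, and `linearize` and `row_major_strides` are hypothetical helper names. The result is an element index rather than a byte offset, since it is used to index the aligned pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Linear position of element (indices[0], ..., indices[rank-1]):
   offset + sum over i of indices[i] * strides[i]. */
int64_t linearize(int64_t offset, const int64_t *strides,
                  const int64_t *indices, int rank) {
  assert(rank >= 0);
  int64_t pos = offset;
  for (int i = 0; i < rank; ++i)
    pos += indices[i] * strides[i];
  return pos;
}

/* "Natural" row-major strides: each stride is the product of all sizes
   following the current dimension, so the last stride is 1. */
void row_major_strides(const int64_t *sizes, int64_t *strides, int rank) {
  int64_t running = 1;
  for (int i = rank - 1; i >= 0; --i) {
    strides[i] = running;
    running *= sizes[i];
  }
}
```

For instance, a contiguous buffer of shape `2x3x4x8` has strides `[96, 32, 8, 1]`, so the element at indices `[1, 2, 3, 4]` with offset 5 is at position 5 + 96 + 64 + 24 + 4 = 193.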
 797
 798 Examples:
 799
 800 An access to a memref with indices:
 801
 802 ```mlir
%0 = memref.load %m[%1, %2, %3, %4]
    : memref<?x?x4x8xf32, strided<[?, 32, 8, 1], offset: ?>>
 804 ```
 805
 806 is transformed into the equivalent of the following code:
 807
 808 ```mlir
// Compute the linearized index from strides. Here `%m` stands for the
// converted memref, i.e. its descriptor of type
// !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<4xi64>, array<4xi64>)>.
// When strides or, in the absence of explicit strides, the corresponding sizes
// are dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue %m[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                      array<4xi64>, array<4xi64>)>
%addr1 = arith.muli %stride1, %1 : i64

// When the stride or, in the absence of explicit strides, the trailing sizes
// are known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = arith.muli %stride2, %2 : i64
%addr3 = arith.addi %addr1, %addr2 : i64

%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = arith.muli %stride3, %3 : i64
%addr5 = arith.addi %addr3, %addr4 : i64

// Multiplication with the known unit stride can be omitted.
%addr6 = arith.addi %addr5, %4 : i64

// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue %m[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                  array<4xi64>, array<4xi64>)>
%addr7 = arith.addi %addr6, %offset : i64

// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue %m[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
                                                   array<4xi64>, array<4xi64>)>

// Compute the address of the accessed element.
%ptr = llvm.getelementptr %aligned[%addr7]
     : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
 844
 845 // Perform the actual load.
 846 %0 = llvm.load %ptr : !llvm.ptr<f32>
 847 ```
 848
 849 For stores, the address computation code is identical and only the actual store
 850 operation is different.
 851
 852 Note: the conversion does not perform any sort of common subexpression
 853 elimination when emitting memref accesses.
 854
 855 ### Utility Classes
 856
 857 Utility classes common to many conversions to the LLVM dialect can be found
 858 under `lib/Conversion/LLVMCommon`. They include the following.
 859
 860 -   `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
 861 -   `LLVMTypeConverter` implements the default type conversion as described
 862     above.
 863 -   `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
 864     dialect-specific functionality.
-   `VectorConvertOpToLLVMPattern` extends the previous class to automatically
    unroll operations on higher-dimensional vectors into lists of operations on
    one-dimensional vectors.
-   `StructBuilder` provides a convenient API for building IR that creates or
    accesses values of LLVM dialect structure types; it is the base class of
    `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexBuilder`, which
    handle the built-in types convertible to LLVM dialect structure types.
 872
 873 ## Translation to LLVM IR
 874
 875 MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
 876 operations can be translated to LLVM IR modules using the following scheme.
 877
 878 -   Module-level globals are translated to LLVM IR global values.
 879 -   Module-level metadata are translated to LLVM IR metadata, which can be later
 880     augmented with additional metadata defined on specific ops.
 881 -   All functions are declared in the module so that they can be referenced.
 882 -   Each function is then translated separately and has access to the complete
 883     mappings between MLIR and LLVM IR globals, metadata, and functions.
 884 -   Within a function, blocks are traversed in topological order and translated
 885     to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
 886     of the block arguments, but not connected to their source blocks.
 887 -   Within each block, operations are translated in their order. Each operation
 888     has access to the same mappings as the function and additionally to the
    mapping of values between MLIR and LLVM IR, including PHI nodes. Operations
    with regions are responsible for translating the regions they contain.
 891 -   After operations in a function are translated, the PHI nodes of blocks in
 892     this function are connected to their source values, which are now available.
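PHI nodes are left unconnected at first because a value passed to a block argument may be defined in a block that is translated later, as on a loop back edge. The toy C model below (hypothetical types and names, not the MLIR or LLVM API) shows only the second phase: once every value has a translation, each recorded branch operand becomes an incoming entry of its successor's PHI.

```c
#include <assert.h>

enum { MAX_INCOMING = 8 };

/* Hypothetical record of a branch passing `value_id` to the (single)
   argument of block `to_block`, from block `from_block`. */
typedef struct {
  int from_block;
  int to_block;
  int value_id;
} BranchOperand;

/* Hypothetical PHI node created, still empty, for a block argument. */
typedef struct {
  int num_incoming;
  int incoming_block[MAX_INCOMING];
  int incoming_value[MAX_INCOMING];
} Phi;

/* Second phase: `translated[v]` is the translated value of value v, or -1 if
   it has not been translated. By now all blocks are done, so values defined
   after their use in a PHI (e.g. on back edges) are available. */
void connect_phis(Phi *phis, const BranchOperand *ops, int num_ops,
                  const int *translated) {
  for (int i = 0; i < num_ops; ++i) {
    Phi *phi = &phis[ops[i].to_block];
    assert(phi->num_incoming < MAX_INCOMING);
    assert(translated[ops[i].value_id] >= 0);
    phi->incoming_block[phi->num_incoming] = ops[i].from_block;
    phi->incoming_value[phi->num_incoming] = translated[ops[i].value_id];
    phi->num_incoming++;
  }
}
```

For a loop where block 0 enters the header (block 1) and block 2 branches back to it, the header's PHI ends up with one incoming entry per predecessor.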
 893
 894 The translation mechanism provides extension hooks for translating custom
 895 operations to LLVM IR via a dialect interface `LLVMTranslationDialectInterface`:
 896
 897 -   `convertOperation` translates an operation that belongs to the current
 898     dialect to LLVM IR given an `IRBuilderBase` and various mappings;
 899 -   `amendOperation` performs additional actions on an operation if it contains
 900     a dialect attribute that belongs to the current dialect, for example sets up
 901     instruction-level metadata.
 902
 903 Dialects containing operations or attributes that want to be translated to LLVM
 904 IR must provide an implementation of this interface and register it with the
 905 system. Note that registration may happen without creating the dialect, for
 906 example, in a separate library to avoid the need for the "main" dialect library
to depend on LLVM IR libraries. The implementations of these methods may use the
[`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
object provided to them, which holds the state of the translation and contains
numerous utilities.
 912
 913 Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a
 914 small, relatively stable set of instructions and types that MLIR intends to
 915 model fully. Therefore, the extension mechanism is provided only for LLVM IR
 916 constructs that are more often extended -- intrinsics and metadata. The primary
 917 goal of the extension mechanism is to support sets of intrinsics, for example
 918 those representing a particular instruction set. The extension mechanism does
 919 not allow for customizing type or block translation, nor does it support custom
 920 module-level operations. Such transformations should be performed within MLIR
 921 and target the corresponding MLIR constructs.
 922
 923 ## Translation from LLVM IR
 924
 925 An experimental flow allows one to import a substantially limited subset of LLVM
 926 IR into MLIR, producing LLVM dialect operations.
 927
 928 ```
 929   mlir-translate -import-llvm filename.ll
 930 ```