mlir/docs/TargetLLVMIR.md

   1 # LLVM IR Target
   2
   3 This document describes the mechanisms of producing LLVM IR from MLIR. The
   4 overall flow is two-stage:
   5
   6 1.  **conversion** of the IR to a set of dialects translatable to LLVM IR, for
   7     example [LLVM Dialect](Dialects/LLVM.md) or one of the hardware-specific
   8     dialects derived from LLVM IR intrinsics such as [AMX](Dialects/AMX.md),
   9     [X86Vector](Dialects/X86Vector.md) or [ArmNeon](Dialects/ArmNeon.md);
  10 2.  **translation** of MLIR dialects to LLVM IR.
  11
  12 This flow allows the non-trivial transformation to be performed within MLIR
  13 using MLIR APIs and makes the translation between MLIR and LLVM IR *simple* and
  14 potentially bidirectional. As a corollary, dialect ops translatable to LLVM IR
  15 are expected to closely match the corresponding LLVM IR instructions and
  16 intrinsics. This minimizes the dependency on LLVM IR libraries in MLIR as well
  17 as reduces the churn in case of changes.
  18
  19 Note that many different dialects can be lowered to LLVM but are provided as
  20 different sets of patterns and have different passes available to mlir-opt.
  21 However, this is primarily useful for testing and prototyping, and using the
  22 collection of patterns together is highly recommended. One place this is
  23 important and visible is the ControlFlow dialect's branching operations which
  24 will fail to apply if their types mismatch with the blocks they jump to in the
  25 parent op.
  26
  27 SPIR-V to LLVM dialect conversion has a
  28 [dedicated document](SPIRVToLLVMDialectConversion.md).
  29
  30 [TOC]
  31
  32 ## Conversion to the LLVM Dialect
  33
  34 Conversion to the LLVM dialect from other dialects is the first step to produce
  35 LLVM IR. All non-trivial IR modifications are expected to happen at this stage
  36 or before. The conversion is *progressive*: most passes convert one dialect to
  37 the LLVM dialect and keep operations from other dialects intact. For example,
  38 the `-finalize-memref-to-llvm` pass will only convert operations from the
  39 `memref` dialect but will not convert operations from other dialects even if
  40 they use or produce `memref`-typed values.
  41
  42 The process relies on the [Dialect Conversion](DialectConversion.md)
  43 infrastructure and, in particular, on the
  44 [materialization](DialectConversion.md#type-conversion) hooks of `TypeConverter`
  45 to support progressive lowering by injecting `unrealized_conversion_cast`
  46 operations between converted and unconverted operations. After multiple partial
  47 conversions to the LLVM dialect are performed, the cast operations that became
  48 noop can be removed by the `-reconcile-unrealized-casts` pass. The latter pass
  49 is not specific to the LLVM dialect and can remove any noop casts.
  50
  51 ### Conversion of Built-in Types
  52
  53 Built-in types have a default conversion to LLVM dialect types provided by the
  54 `LLVMTypeConverter` class. Users targeting the LLVM dialect can reuse and extend
  55 this type converter to support other types. Extra care must be taken if the
  56 conversion rules for built-in types are overridden: all conversions must use the
  57 same type converter.
  58
  59 #### LLVM Dialect-compatible Types
  60
  61 The types [compatible](Dialects/LLVM.md#built-in-type-compatibility) with the
  62 LLVM dialect are kept as is.
  63
  64 #### Complex Type
  65
  66 Complex type is converted into an LLVM dialect literal structure type with two
  67 elements:
  68
  69 -   real part;
  70 -   imaginary part.
  71
  72 The elemental type is converted recursively using these rules.
  73
  74 Example:
  75
  76 ```mlir
  77   complex<f32>
  78   // ->
  79   !llvm.struct<(f32, f32)>
  80 ```
  81
  82 #### Index Type
  83
  84 Index type is converted into an LLVM dialect integer type with the bitwidth
  85 specified by the [data layout](DataLayout.md) of the closest module. For
  86 example, on x86-64 CPUs it converts to i64. This behavior can be overridden by
  87 the type converter configuration, which is often exposed as a pass option by
  88 conversion passes.
  89
  90 Example:
  91
  92 ```mlir
  93   index
  94   // -> on x86_64
  95   i64
  96 ```
  97
  98 #### Ranked MemRef Types
  99
 100 Ranked memref types are converted into an LLVM dialect literal structure type
 101 that contains the dynamic information associated with the memref object,
 102 referred to as *descriptor*. Only memrefs in the
 103 **[strided form](Dialects/Builtin.md/#strided-memref)** can be converted to the
 104 LLVM dialect with the default descriptor format. Memrefs with other, less
 105 trivial layouts should be converted into the strided form first, e.g., by
 106 materializing the non-trivial address remapping due to layout as `affine.apply`
 107 operations.
 108
 109 The default memref descriptor is a struct with the following fields:
 110
 111 1.  The pointer to the data buffer as allocated, referred to as "allocated
 112     pointer". This is only useful for deallocating the memref.
 113 2.  The pointer to the properly aligned data that the memref indexes,
 114     referred to as "aligned pointer".
 115 3.  A converted `index`-type integer containing the distance in number
 116     of elements between the beginning of the (aligned) buffer and the first
 117     element to be accessed through the memref, referred to as "offset".
 118 4.  An array containing as many converted `index`-type integers as the rank of
 119     the memref: the array represents the size, in number of elements, of the
 120     memref along the given dimension.
 121 5.  A second array containing as many converted `index`-type integers as the
 122     rank of memref: the second array represents the "stride" (in tensor
 123     abstraction sense), i.e. the number of consecutive elements of the
 124     underlying buffer one needs to jump over to get to the next logically
 125     indexed element.
 126
 127 For constant memref dimensions, the corresponding size entry is a constant whose
 128 runtime value matches the static value. This normalization serves as an ABI for
 129 the memref type to interoperate with externally linked functions. In the
 130 particular case of rank `0` memrefs, the size and stride arrays are omitted,
 131 resulting in a struct containing two pointers + offset.
 132
 133 Examples:
 134
 135 ```mlir
 136 // Assuming index is converted to i64.
 137
 138 memref<f32> -> !llvm.struct<(ptr, ptr, i64)>
 139 memref<1 x f32> -> !llvm.struct<(ptr, ptr, i64,
 140                                  array<1 x i64>, array<1 x i64>)>
 141 memref<? x f32> -> !llvm.struct<(ptr, ptr, i64,
 142                                  array<1 x i64>, array<1 x i64>)>
 143 memref<10x42x42x43x123 x f32> -> !llvm.struct<(ptr, ptr, i64,
 144                                                array<5 x i64>, array<5 x i64>)>
 145 memref<10x?x42x?x123 x f32> -> !llvm.struct<(ptr, ptr, i64,
 146                                              array<5 x i64>, array<5 x i64>)>
 147
 148 // Memref types can have vectors as element types
 149 memref<1x? x vector<4xf32>> -> !llvm.struct<(ptr, ptr, i64, array<2 x i64>,
 150                                              array<2 x i64>)>
 151 ```
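
The descriptor fields combine into a single address computation: the element
at a given multi-index lives at `aligned[offset + sum(index[d] * stride[d])]`.
The following C++ sketch illustrates this under the assumption that `index` is
converted to a 64-bit integer. The names `StridedMemRef` and `elementAt` are
hypothetical and are not part of MLIR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the default descriptor for a rank-N memref,
// assuming `index` is converted to a 64-bit integer.
template <typename T, std::size_t N>
struct StridedMemRef {
  T *allocated;                         // Pointer as returned by the allocator.
  T *aligned;                           // Aligned pointer used for indexing.
  std::int64_t offset;                  // Offset from `aligned`, in elements.
  std::array<std::int64_t, N> sizes;    // Size of each dimension, in elements.
  std::array<std::int64_t, N> strides;  // Stride of each dimension, in elements.
};

// Returns a reference to the element at `indices`, i.e.
// aligned[offset + sum over d of indices[d] * strides[d]].
template <typename T, std::size_t N>
T &elementAt(const StridedMemRef<T, N> &m,
             const std::array<std::int64_t, N> &indices) {
  std::int64_t linear = m.offset;
  for (std::size_t d = 0; d < N; ++d) {
    assert(indices[d] >= 0 && indices[d] < m.sizes[d] && "index out of bounds");
    linear += indices[d] * m.strides[d];
  }
  return m.aligned[linear];
}
```

Note that the allocated pointer does not participate in indexing; it is only
carried along so that the buffer can be deallocated later.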
 152
 153 #### Unranked MemRef Types
 154
 155 Unranked memref types are converted to an LLVM dialect literal structure type that
 156 contains the dynamic information associated with the memref object, referred to
 157 as *unranked descriptor*. It contains:
 158
 159 1.  a converted `index`-typed integer representing the dynamic rank of the
 160     memref;
 161 2.  a type-erased pointer (`!llvm.ptr`) to a ranked memref descriptor with
 162     the contents listed above.
 163
 164 This descriptor is primarily intended for interfacing with rank-polymorphic
 165 library functions. The pointer to the ranked memref descriptor points to some
 166 *allocated* memory, which may reside on the stack of the current function or on
 167 the heap. Conversion patterns for operations producing unranked memrefs are expected
 168 to manage the allocation. Note that this may lead to stack allocations
 169 (`llvm.alloca`) being performed in a loop and not reclaimed until the end of the
 170 current function.
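
As an illustration of how a rank-polymorphic library function might consume
such a descriptor, the C++ sketch below reads a dimension size through the
type-erased pointer. It assumes a 64-bit `index` and that the ranked descriptor
fields are stored contiguously without padding. The names `UnrankedMemRef`,
`RankedMemRef2DF32` and `dimSize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the unranked descriptor, assuming a 64-bit `index`.
struct UnrankedMemRef {
  std::int64_t rank;  // Dynamic rank of the memref.
  void *descriptor;   // Type-erased pointer to the ranked descriptor.
};

// Hypothetical ranked descriptor of a 2-d memref of f32, for illustration.
struct RankedMemRef2DF32 {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::int64_t sizes[2];
  std::int64_t strides[2];
};

// Reads the size of dimension `dim` from the ranked descriptor behind `m`.
// The ranked descriptor holds two pointers, the offset, `rank` sizes and
// `rank` strides, so the sizes start after two pointers and one integer.
inline std::int64_t dimSize(const UnrankedMemRef &m, std::int64_t dim) {
  assert(dim >= 0 && dim < m.rank && "dimension out of range");
  const char *bytes = static_cast<const char *>(m.descriptor);
  const char *sizes = bytes + 2 * sizeof(void *) + sizeof(std::int64_t);
  std::int64_t size;
  std::memcpy(&size, sizes + dim * sizeof(std::int64_t), sizeof(size));
  return size;
}
```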
 171
 172 #### Function Types
 173
 174 Function types are converted to LLVM dialect function types as follows:
 175
 176 -   function argument and result types are converted recursively using these
 177     rules;
 178 -   if a function type has multiple results, they are wrapped into an LLVM
 179     dialect literal structure type since LLVM function types must have exactly
 180     one result;
 181 -   if a function type has no results, the corresponding LLVM dialect function
 182     type will have one `!llvm.void` result since LLVM function types must have a
 183     result;
 184 -   function types used in arguments of another function type are wrapped in an
 185     LLVM dialect pointer type to comply with LLVM IR expectations;
 186 -   the structs corresponding to `memref` types, both ranked and unranked,
 187     appearing as function arguments are unbundled into individual function
 188     arguments to allow for specifying metadata such as aliasing information on
 189     individual pointers;
 190 -   the conversion of `memref`-typed arguments is subject to
 191     [calling conventions](TargetLLVMIR.md#calling-conventions);
 192 -   if a function has the boolean attribute `func.varargs` set, the
 193     converted LLVM function will be variadic.
 194
 195 Examples:
 196
 197 ```mlir
 198 // Zero-ary function type with no results:
 199 () -> ()
 200 // is converted to a zero-ary function with `void` result.
 201 !llvm.func<void ()>
 202
 203 // Unary function with one result:
 204 (i32) -> (i64)
 205 // has its argument and result type converted, before creating the LLVM dialect
 206 // function type.
 207 !llvm.func<i64 (i32)>
 208
 209 // Binary function with one result:
 210 (i32, f32) -> (i64)
 211 // has its arguments handled separately
 212 !llvm.func<i64 (i32, f32)>
 213
 214 // Binary function with two results:
 215 (i32, f32) -> (i64, f64)
 216 // has its result aggregated into a structure type.
 217 !llvm.func<struct<(i64, f64)> (i32, f32)>
 218
 219 // Function-typed arguments or results in higher-order functions:
 220 (() -> ()) -> (() -> ())
 221 // are converted into opaque pointers.
 222 !llvm.func<ptr (ptr)>
 223
 224 // A memref descriptor appearing as function argument:
 225 (memref<f32>) -> ()
 226 // gets converted into a list of individual scalar components of a descriptor.
 227 !llvm.func<void (ptr, ptr, i64)>
 228
 229 // The list of arguments is linearized and one can freely mix memref and other
 230 // types in this list:
 231 (memref<f32>, f32) -> ()
 232 // which gets converted into a flat list.
 233 !llvm.func<void (ptr, ptr, i64, f32)>
 234
 235 // For nD ranked memref descriptors:
 236 (memref<?x?xf32>) -> ()
 237 // the converted signature will contain 2n+1 `index`-typed integer arguments,
 238 // offset, n sizes and n strides, per memref argument type.
 239 !llvm.func<void (ptr, ptr, i64, i64, i64, i64, i64)>
 240
 241 // Same rules apply to unranked descriptors:
 242 (memref<*xf32>) -> ()
 243 // which get converted into their components.
 244 !llvm.func<void (i64, ptr)>
 245
 246 // However, returning a memref from a function is not affected:
 247 () -> (memref<?xf32>)
 248 // gets converted to a function returning a descriptor structure.
 249 !llvm.func<struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)> ()>
 250
 251 // If multiple memref-typed results are returned:
 252 () -> (memref<f32>, memref<f64>)
 253 // their descriptor structures are additionally packed into another structure,
 254 // potentially with other non-memref typed results.
 255 !llvm.func<struct<(struct<(ptr, ptr, i64)>,
 256                    struct<(ptr, ptr, i64)>)> ()>
 257
 258 // If "func.varargs" attribute is set:
 259 (i32) -> () attributes { "func.varargs" = true }
 260 // the corresponding LLVM function will be variadic:
 261 !llvm.func<void (i32, ...)>
 262 ```
 263
 264 Conversion patterns are available to convert built-in function operations and
 265 standard call operations targeting those functions using these conversion rules.
 266
 267 #### Multi-dimensional Vector Types
 268
 269 LLVM IR only supports *one-dimensional* vectors, unlike MLIR where vectors can
 270 be multi-dimensional. Vector types cannot be nested in either IR. In the
 271 one-dimensional case, MLIR vectors are converted to LLVM IR vectors of the same
 272 size with element type converted using these conversion rules. In the
 273 n-dimensional case, MLIR vectors are converted to (n-1)-dimensional array types
 274 of one-dimensional vectors.
 275
 276 Examples:
 277
 278 ```mlir
 279 vector<4x8 x f32>
 280 // ->
 281 !llvm.array<4 x vector<8 x f32>>
 282
 283 memref<2 x vector<4x8 x f32>>
 284 // ->
 285 !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
 286 ```
 287
 288 #### Tensor Types
 289
 290 Tensor types cannot be converted to the LLVM dialect. Operations on tensors must
 291 be [bufferized](Bufferization.md) before being converted.
 292
 293 ### Conversion of LLVM Container Types with Non-Compatible Element Types
 294
 295 Progressive lowering may result in LLVM container types, such
 296 as LLVM dialect structures, containing non-compatible types:
 297 `!llvm.struct<(index)>`. Such types are converted recursively using the rules
 298 described above.
 299
 300 Identified structures are converted to _new_ structures that have their
 301 identifiers prefixed with `_Converted.` since the bodies of identified types
 302 cannot be updated once initialized. Such names are considered _reserved_ and
 303 must not appear in the input code (in practice, C reserves names starting with
 304 `_` and a capital, and `.` cannot appear in valid C types anyway). If they do
 305 and have a different body than the result of the conversion, the type conversion
 306 will stop.
 307
 308 ### Calling Conventions
 309
 310 Calling conventions provide a mechanism to customize the conversion of function
 311 and function call operations without changing how individual types are handled
 312 elsewhere. They are implemented simultaneously by the default type converter and
 313 by the conversion patterns for the relevant operations.
 314
 315 #### Function Result Packing
 316
 317 In case of multi-result functions, the returned values are inserted into a
 318 structure-typed value before being returned and extracted from it at the call
 319 site. This transformation is a part of the conversion and is transparent to the
 320 definitions and uses of the values being returned.
 321
 322 Example:
 323
 324 ```mlir
 325 func.func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
 326   return %arg0, %arg1 : i32, i64
 327 }
 328 func.func @bar() {
 329   %0 = arith.constant 42 : i32
 330   %1 = arith.constant 17 : i64
 331   %2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
 332   "use_i32"(%2#0) : (i32) -> ()
 333   "use_i64"(%2#1) : (i64) -> ()
 334 }
 335
 336 // is transformed into
 337
 338 llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
 339   // insert the values into a structure
 340   %0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
 341   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
 342   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>
 343
 344   // return the structure value
 345   llvm.return %2 : !llvm.struct<(i32, i64)>
 346 }
 347 llvm.func @bar() {
 348   %0 = llvm.mlir.constant(42 : i32) : i32
 349   %1 = llvm.mlir.constant(17 : i64) : i64
 350
 351   // call and extract the values from the structure
 352   %2 = llvm.call @foo(%0, %1)
 353      : (i32, i64) -> !llvm.struct<(i32, i64)>
 354   %3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
 355   %4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>
 356
 357   // use as before
 358   "use_i32"(%3) : (i32) -> ()
 359   "use_i64"(%4) : (i64) -> ()
 360 }
 361 ```
 362
 363 #### Default Calling Convention for Ranked MemRef
 364
 365 The default calling convention converts `memref`-typed function arguments to
 366 LLVM dialect literal structs
 367 [defined above](TargetLLVMIR.md#ranked-memref-types) before unbundling them into
 368 individual scalar arguments.
 369
 372 This convention is implemented in the conversion of `func.func` and `func.call` to
 373 the LLVM dialect, with the former unpacking the descriptor into a set of
 374 individual values and the latter packing those values back into a descriptor so
 375 as to make it transparently usable by other operations. Conversions from other
 376 dialects should take this convention into account.
 377
 378 This specific convention is motivated by the necessity to specify alignment and
 379 aliasing attributes on the raw pointers underpinning the memref.
 380
 381 Examples:
 382
 383 ```mlir
 384 func.func @foo(%arg0: memref<?xf32>) -> () {
 385   "use"(%arg0) : (memref<?xf32>) -> ()
 386   return
 387 }
 388
 389 // Gets converted to the following
 390 // (using type alias for brevity):
 391 !llvm.memref_1d = !llvm.struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)>
 392
 393 llvm.func @foo(%arg0: !llvm.ptr,       // Allocated pointer.
 394                %arg1: !llvm.ptr,       // Aligned pointer.
 395                %arg2: i64,             // Offset.
 396                %arg3: i64,             // Size in dim 0.
 397                %arg4: i64) {           // Stride in dim 0.
 398   // Populate memref descriptor structure.
 399   %0 = llvm.mlir.undef : !llvm.memref_1d
 400   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
 401   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
 402   %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
 403   %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
 404   %5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d
 405
 406   // Descriptor is now usable as a single value.
 407   "use"(%5) : (!llvm.memref_1d) -> ()
 408   llvm.return
 409 }
 410 ```
 411
 412 ```mlir
 413 func.func @bar() {
 414   %0 = "get"() : () -> (memref<?xf32>)
 415   call @foo(%0) : (memref<?xf32>) -> ()
 416   return
 417 }
 418
 419 // Gets converted to the following
 420 // (using type alias for brevity):
 421 !llvm.memref_1d = !llvm.struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)>
 422
 423 llvm.func @bar() {
 424   %0 = "get"() : () -> !llvm.memref_1d
 425
 426   // Unpack the memref descriptor.
 427   %1 = llvm.extractvalue %0[0] : !llvm.memref_1d
 428   %2 = llvm.extractvalue %0[1] : !llvm.memref_1d
 429   %3 = llvm.extractvalue %0[2] : !llvm.memref_1d
 430   %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
 431   %5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d
 432
 433   // Pass individual values to the callee.
 434   llvm.call @foo(%1, %2, %3, %4, %5) : (!llvm.ptr, !llvm.ptr, i64, i64, i64) -> ()
 435   llvm.return
 436 }
 437 ```
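
Seen from C++, a function converted with this convention takes the descriptor
fields as separate scalar arguments. The sketch below assumes a 64-bit `index`.
The function `sumExpanded` is a hypothetical stand-in for a converted function
and, unlike `@foo` above, returns a value so that the effect is observable.
`MemRef1DF32` and `callWithDescriptor` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 1-d descriptor mirroring
// !llvm.struct<(ptr, ptr, i64, array<1xi64>, array<1xi64>)>.
struct MemRef1DF32 {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::int64_t sizes[1];
  std::int64_t strides[1];
};

// Stand-in for a function converted with the default convention: the single
// memref<?xf32> argument arrives as five scalar arguments.
extern "C" float sumExpanded(float *allocated, float *aligned,
                             std::int64_t offset, std::int64_t size,
                             std::int64_t stride) {
  (void)allocated;  // Only needed for deallocation.
  assert(size >= 0 && "size must be non-negative");
  float sum = 0.0f;
  for (std::int64_t i = 0; i < size; ++i)
    sum += aligned[offset + i * stride];
  return sum;
}

// Caller side: unpack the descriptor into scalars, mirroring what the
// converted `func.call` does with `llvm.extractvalue`.
inline float callWithDescriptor(const MemRef1DF32 &m) {
  return sumExpanded(m.allocated, m.aligned, m.offset, m.sizes[0],
                     m.strides[0]);
}
```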
 438
 439 #### Default Calling Convention for Unranked MemRef
 440
 441 For unranked memrefs, the list of function arguments always contains two
 442 elements, same as the unranked memref descriptor: an integer rank, and a
 443 type-erased (`!llvm.ptr`) pointer to the ranked memref descriptor. Note that
 444 while the *calling convention* does not require allocation, *casting* to
 445 unranked memref does since one cannot take an address of an SSA value containing
 446 the ranked memref, which must be stored in some memory instead. The caller is in
 447 charge of ensuring the thread safety and management of the allocated memory, in
 448 particular the deallocation.
 449
 450 Examples:
 451
 452 ```mlir
 453 func.func @foo(%arg0: memref<*xf32>) -> () {
 454   "use"(%arg0) : (memref<*xf32>) -> ()
 455   return
 456 }
 457
 458 // Gets converted to the following.
 459
 460 llvm.func @foo(%arg0: i64,             // Rank.
 461                %arg1: !llvm.ptr) { // Type-erased pointer to descriptor.
 462   // Pack the unranked memref descriptor.
 463   %0 = llvm.mlir.undef : !llvm.struct<(i64, ptr)>
 464   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr)>
 465   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr)>
 466
 467   "use"(%2) : (!llvm.struct<(i64, ptr)>) -> ()
 468   llvm.return
 469 }
 470 ```
 471
 472 ```mlir
 473 func.func @bar() {
 474   %0 = "get"() : () -> (memref<*xf32>)
 475   call @foo(%0): (memref<*xf32>) -> ()
 476   return
 477 }
 478
 479 // Gets converted to the following.
 480
 481 llvm.func @bar() {
 482   %0 = "get"() : () -> (!llvm.struct<(i64, ptr)>)
 483
 484   // Unpack the memref descriptor.
 485   %1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr)>
 486   %2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr)>
 487
 488   // Pass individual values to the callee.
 489   llvm.call @foo(%1, %2) : (i64, !llvm.ptr) -> ()
 490   llvm.return
 491 }
 492 ```
 493
 494 **Lifetime.** The second element of the unranked memref descriptor points to
 495 some memory in which the ranked memref descriptor is stored. By convention, this
 496 memory is allocated on stack and has the lifetime of the function. (*Note:* due
 497 to function-length lifetime, creation of multiple unranked memref descriptors,
 498 e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to
 499 be returned from a function, the ranked descriptor it points to is copied into
 500 dynamically allocated memory, and the pointer in the unranked descriptor is
 501 updated accordingly. The allocation happens immediately before returning. It is
 502 the responsibility of the caller to free the dynamically allocated memory. The
 503 default conversion of `func.call` and `func.call_indirect` copies the ranked
 504 descriptor to newly allocated memory on the caller's stack. Thus, the convention
 505 of the ranked memref descriptor pointed to by an unranked memref descriptor
 506 being stored on stack is respected.
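
The copy performed when returning an unranked memref can be sketched in C++ as
follows. The sketch assumes a 64-bit `index`, so that a ranked descriptor of
rank `r` occupies two pointers plus `1 + 2 * r` 64-bit integers. The names
`UnrankedMemRef`, `rankedDescriptorSize` and `copyForReturn` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical mirror of the unranked descriptor.
struct UnrankedMemRef {
  std::int64_t rank;
  void *descriptor;
};

// Size in bytes of a ranked descriptor of the given rank: two pointers,
// followed by the offset, `rank` sizes and `rank` strides.
inline std::size_t rankedDescriptorSize(std::int64_t rank) {
  assert(rank >= 0 && "rank must be non-negative");
  return 2 * sizeof(void *) +
         (1 + 2 * static_cast<std::size_t>(rank)) * sizeof(std::int64_t);
}

// Copies the ranked descriptor, typically stored on the callee's stack, into
// dynamically allocated memory so that it outlives the callee's frame. The
// caller is responsible for calling free() on the returned pointer.
inline UnrankedMemRef copyForReturn(const UnrankedMemRef &m) {
  std::size_t bytes = rankedDescriptorSize(m.rank);
  void *heap = std::malloc(bytes);
  if (!heap)
    std::abort();
  std::memcpy(heap, m.descriptor, bytes);
  return UnrankedMemRef{m.rank, heap};
}
```

Only the descriptor is copied; the data buffer it refers to is left untouched.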
 507
 508 #### Bare Pointer Calling Convention for Ranked MemRef
 509
 510 The "bare pointer" calling convention converts `memref`-typed function arguments
 511 to a *single* pointer to the aligned data. Note that this does *not* apply to
 512 uses of `memref` outside of function signatures; the default descriptor
 513 structures are still used. This convention further restricts the supported cases
 514 to the following.
 515
 516 -   `memref` types with default layout.
 517 -   `memref` types with all dimensions statically known.
 518 -   `memref` values allocated in such a way that the allocated and aligned
 519     pointer match. Alternatively, the same function must handle allocation and
 520     deallocation since only one pointer is passed to any callee.
 521
 522 Examples:
 523
 524 ```mlir
 525 func.func @callee(memref<2x4xf32>)
 526
 527 func.func @caller(%0 : memref<2x4xf32>) {
 528   call @callee(%0) : (memref<2x4xf32>) -> ()
 529 }
 530
 531 // ->
 532
 533 !descriptor = !llvm.struct<(ptr, ptr, i64,
 534                             array<2xi64>, array<2xi64>)>
 535
 536 llvm.func @callee(!llvm.ptr)
 537
 538 llvm.func @caller(%arg0: !llvm.ptr) {
 539   // A descriptor value is defined at the function entry point.
 540   %0 = llvm.mlir.undef : !descriptor
 541
 542   // Both the allocated and aligned pointer are set up to the same value.
 543   %1 = llvm.insertvalue %arg0, %0[0] : !descriptor
 544   %2 = llvm.insertvalue %arg0, %1[1] : !descriptor
 545
 546   // The offset is set up to zero.
 547   %3 = llvm.mlir.constant(0 : index) : i64
 548   %4 = llvm.insertvalue %3, %2[2] : !descriptor
 549
 550   // The sizes and strides are derived from the statically known values.
 551   %5 = llvm.mlir.constant(2 : index) : i64
 552   %6 = llvm.mlir.constant(4 : index) : i64
 553   %7 = llvm.insertvalue %5, %4[3, 0] : !descriptor
 554   %8 = llvm.insertvalue %6, %7[3, 1] : !descriptor
 555   %9 = llvm.mlir.constant(1 : index) : i64
 556   %10 = llvm.insertvalue %6, %8[4, 0] : !descriptor
 557   %11 = llvm.insertvalue %9, %10[4, 1] : !descriptor
 558
 559   // The function call corresponds to extracting the aligned data pointer.
 560   %12 = llvm.extractvalue %11[1] : !descriptor
 561   llvm.call @callee(%12) : (!llvm.ptr) -> ()
 562 }
 563 ```
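
From the C++ side, a function using the bare pointer convention looks like an
ordinary function taking a data pointer, and everything else is recovered from
the static type. The sketch below rebuilds the descriptor for `memref<2x4xf32>`
with the default layout, assuming a 64-bit `index`. The names `MemRef2DF32`,
`descriptorFromBarePointer` and `lastElementBarePointer` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical descriptor for a 2-d memref of f32.
struct MemRef2DF32 {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::int64_t sizes[2];
  std::int64_t strides[2];
};

// Rebuilds the descriptor of memref<2x4xf32> from a bare pointer. With the
// default (row-major) layout the strides are {4, 1} and the offset is zero.
inline MemRef2DF32 descriptorFromBarePointer(float *ptr) {
  MemRef2DF32 d;
  d.allocated = ptr;  // Allocated and aligned pointers are assumed equal.
  d.aligned = ptr;
  d.offset = 0;
  d.sizes[0] = 2;
  d.sizes[1] = 4;
  d.strides[0] = 4;
  d.strides[1] = 1;
  return d;
}

// Stand-in for a callee using the bare pointer convention: it receives only
// the data pointer and returns the element at position [1][3].
extern "C" float lastElementBarePointer(float *ptr) {
  assert(ptr && "data pointer must not be null");
  MemRef2DF32 d = descriptorFromBarePointer(ptr);
  return d.aligned[d.offset + 1 * d.strides[0] + 3 * d.strides[1]];
}
```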
 564
 565 #### Bare Pointer Calling Convention For Unranked MemRef
 566
 567 The "bare pointer" calling convention does not support unranked memrefs as their
 568 shape cannot be known at compile time.
 569
 570 ### Generic allocation and deallocation functions
 571
 572 When converting the MemRef dialect, allocations and deallocations are converted
 573 into calls to `malloc` (`aligned_alloc` if aligned allocations are requested)
 574 and `free`. However, it is possible to convert them to more generic functions
 575 which can be implemented by a runtime library, thus allowing custom allocation
 576 strategies or runtime profiling. When the conversion pass is instructed to
 577 do so, the names of the callees are
 578 `_mlir_memref_to_llvm_alloc`, `_mlir_memref_to_llvm_aligned_alloc` and
 579 `_mlir_memref_to_llvm_free`. Their signatures are the same as those of `malloc`,
 580 `aligned_alloc` and `free`.
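
A runtime library could provide these functions along the following lines.
This is a minimal sketch rather than a reference implementation: it forwards to
the C allocator and maintains a counter of live allocations as a simple form of
profiling. The counter `liveAllocations` is a hypothetical addition, and
`std::aligned_alloc` assumes a C++17 standard library that provides it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative profiling state (not part of MLIR): number of live buffers.
static std::size_t liveAllocations = 0;

// Same signature as `malloc`.
extern "C" void *_mlir_memref_to_llvm_alloc(std::size_t size) {
  void *ptr = std::malloc(size);
  if (ptr)
    ++liveAllocations;
  return ptr;
}

// Same signature as `aligned_alloc`: alignment first, then size.
extern "C" void *_mlir_memref_to_llvm_aligned_alloc(std::size_t alignment,
                                                    std::size_t size) {
  void *ptr = std::aligned_alloc(alignment, size);
  if (ptr)
    ++liveAllocations;
  return ptr;
}

// Same signature as `free`.
extern "C" void _mlir_memref_to_llvm_free(void *ptr) {
  if (!ptr)
    return;
  assert(liveAllocations > 0 && "free without a matching allocation");
  --liveAllocations;
  std::free(ptr);
}
```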
 581
 582 ### C-compatible wrapper emission
 583
 584 In practical cases, it may be desirable to have externally-facing functions with
 585 a single argument corresponding to each MemRef argument. When interfacing with
 586 LLVM IR produced from C, the code needs to respect the corresponding calling
 587 convention. The conversion to the LLVM dialect provides an option to generate
 588 wrapper functions that take memref descriptors as pointers-to-struct compatible
 589 with data types produced by Clang when compiling C sources. The generation of
 590 such wrapper functions can additionally be controlled at a function granularity
 591 by setting the `llvm.emit_c_interface` unit attribute.
 592
 593 More specifically, a memref argument is converted into a pointer-to-struct
 594 argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
 595 `T` is the converted element type and `N` is the memref rank. This type is
 596 compatible with that produced by Clang for the following C++ structure template
 597 instantiations or their equivalents in C.
 598
 599 ```cpp
 600 template<typename T, size_t N>
 601 struct MemRefDescriptor {
 602   T *allocated;
 603   T *aligned;
 604   intptr_t offset;
 605   intptr_t sizes[N];
 606   intptr_t strides[N];
 607 };
 608 ```
 609
 610 Furthermore, we also rewrite function results to pointer parameters if the
 611 rewritten function result has a struct type. The special result parameter is
 612 added as the first parameter and is of pointer-to-struct type.
 613
 614 If enabled, the option will do the following. For *external* functions declared
 615 in the MLIR module:
 616
 617 1.  Declare a new function `_mlir_ciface_<original name>` where memref arguments
 618     are converted to pointer-to-struct and the remaining arguments are converted
 619     as usual. Results are converted to a special argument if they are of struct
 620     type.
 621 2.  Add a body to the original function (making it non-external) that
 622     1.  allocates memref descriptors,
 623     2.  populates them,
 624     3.  potentially allocates space for the result struct, and
 625     4.  passes the pointers to these into the newly declared interface function,
 626         then
 627     5.  collects the result of the call (potentially from the result struct),
 628         and
 629     6.  returns it to the caller.
 630
 631 For (non-external) functions defined in the MLIR module:
 632
 633 1.  Define a new function `_mlir_ciface_<original name>` where memref arguments
 634     are converted to pointer-to-struct and the remaining arguments are converted
 635     as usual. Results are converted to a special argument if they are of struct
 636     type.
 637 2.  Populate the body of the newly defined function with IR that
 638     1.  loads descriptors from pointers;
 639     2.  unpacks descriptor into individual non-aggregate values;
 640     3.  passes these values into the original function;
 641     4.  collects the results of the call and
 642     5.  either copies the results into the result struct or returns them to the
 643         caller.
 644
 645 Examples:
 646
 647 ```mlir
 648
 649 func.func @qux(%arg0: memref<?x?xf32>)
 650
 651 // Gets converted into the following
 652 // (using type alias for brevity):
 653 !llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>
 654
 655 // Function with unpacked arguments.
 656 llvm.func @qux(%arg0: !llvm.ptr, %arg1: !llvm.ptr,
 657                %arg2: i64, %arg3: i64, %arg4: i64,
 658                %arg5: i64, %arg6: i64) {
 659   // Populate memref descriptor (as per calling convention).
 660   %0 = llvm.mlir.undef : !llvm.memref_2d
 661   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
 662   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
 663   %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
 664   %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
 665   %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
 666   %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
 667   %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
 668
 669   // Store the descriptor in a stack-allocated space.
 670   %8 = llvm.mlir.constant(1 : index) : i64
 671   %9 = llvm.alloca %8 x !llvm.memref_2d
 672      : (i64) -> !llvm.ptr
 673   llvm.store %7, %9 : !llvm.memref_2d, !llvm.ptr
 674
 675   // Call the interface function.
 676   llvm.call @_mlir_ciface_qux(%9) : (!llvm.ptr) -> ()
 677
 678   // The stored descriptor will be freed on return.
 679   llvm.return
 680 }
 681
 682 // Interface function.
 683 llvm.func @_mlir_ciface_qux(!llvm.ptr)
 684 ```
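
The external function can then be implemented in C++ by defining the interface
function. The sketch below assumes `f32` elements and repeats the descriptor
template so that it is self-contained. The body, which writes the row-major
linear index into each element, is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Illustrative implementation of the external function `qux`: the wrapper
// generated in the MLIR module passes a pointer to the descriptor.
extern "C" void _mlir_ciface_qux(MemRefDescriptor<float, 2> *m) {
  assert(m && "descriptor pointer must not be null");
  for (std::intptr_t i = 0; i < m->sizes[0]; ++i) {
    for (std::intptr_t j = 0; j < m->sizes[1]; ++j) {
      // Address computation of the strided form.
      float &element =
          m->aligned[m->offset + i * m->strides[0] + j * m->strides[1]];
      element = static_cast<float>(i * m->sizes[1] + j);
    }
  }
}
```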
 685
 686 ```mlir
 687 func.func @foo(%arg0: memref<?x?xf32>) {
 688   return
 689 }
 690
 691 // Gets converted into the following
 692 // (using type alias for brevity):
 693 !llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>
 694
 695 // Function with unpacked arguments.
 696 llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr,
 697                %arg2: i64, %arg3: i64, %arg4: i64,
 698                %arg5: i64, %arg6: i64) {
 699   llvm.return
 700 }
 701
 702 // Interface function callable from C.
 703 llvm.func @_mlir_ciface_foo(%arg0: !llvm.ptr) {
 704   // Load the descriptor.
 705   %0 = llvm.load %arg0 : !llvm.ptr -> !llvm.memref_2d
 706
 707   // Unpack the descriptor as per calling convention.
 708   %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
 709   %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
 710   %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
 711   %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
 712   %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
 713   %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
 714   %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
 715   llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
 716     : (!llvm.ptr, !llvm.ptr, i64, i64, i64,
 717        i64, i64) -> ()
 718   llvm.return
 719 }
 720 ```
 721
 722 ```mlir
 723 func.func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
 724   return %arg0 : memref<?x?xf32>
 725 }
 726
 727 // Gets converted into the following
 728 // (using type alias for brevity):
 729 !llvm.memref_2d = !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>
 730
 731 // Function with unpacked arguments.
 732 llvm.func @foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64,
 733                %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
 734     -> !llvm.memref_2d {
 735   %0 = llvm.mlir.undef : !llvm.memref_2d
 736   %1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
 737   %2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
 738   %3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
 739   %4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
 740   %5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
 741   %6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
 742   %7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
 743   llvm.return %7 : !llvm.memref_2d
 744 }
 745
 746 // Interface function callable from C.
 747 llvm.func @_mlir_ciface_foo(%arg0: !llvm.ptr, %arg1: !llvm.ptr) {
 748   %0 = llvm.load %arg1 : !llvm.ptr -> !llvm.memref_2d
 749   %1 = llvm.extractvalue %0[0] : !llvm.memref_2d
 750   %2 = llvm.extractvalue %0[1] : !llvm.memref_2d
 751   %3 = llvm.extractvalue %0[2] : !llvm.memref_2d
 752   %4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
 753   %5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
 754   %6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
 755   %7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
 756   %8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
 757     : (!llvm.ptr, !llvm.ptr, i64, i64, i64, i64, i64) -> !llvm.memref_2d
 758   llvm.store %8, %arg0 : !llvm.memref_2d, !llvm.ptr
 759   llvm.return
 760 }
 761 ```
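
Seen from the C side, the second example can be sketched roughly as follows. The
struct name `MemRef2DF32` and the helper `call_foo` are hypothetical, and the
definition of `_mlir_ciface_foo` below is only a stand-in that mimics the
identity function so that the sketch compiles by itself; in a real build that
symbol would come from the compiled MLIR module.

```c
#include <assert.h>
#include <stdint.h>

// C mirror of !llvm.struct<(ptr, ptr, i64, array<2xi64>, array<2xi64>)>.
// The struct name is hypothetical; f32 elements are assumed.
typedef struct {
  float *allocated;   // pointer returned by the allocator
  float *aligned;     // pointer used for element accesses
  int64_t offset;     // linear offset, in elements
  int64_t sizes[2];   // size of each dimension
  int64_t strides[2]; // stride of each dimension, in elements
} MemRef2DF32;

// Stand-in for the generated wrapper (assumption: behaves like @foo above,
// which returns its argument). The result is written through the first
// pointer, the argument is read through the second one.
void _mlir_ciface_foo(MemRef2DF32 *result, MemRef2DF32 *arg) {
  assert(result && arg);
  *result = *arg;
}

// Hypothetical caller: builds a descriptor for a contiguous row-major buffer
// on the stack and passes it by pointer.
MemRef2DF32 call_foo(float *data, int64_t rows, int64_t cols) {
  MemRef2DF32 arg = {data, data, 0, {rows, cols}, {cols, 1}};
  MemRef2DF32 result;
  _mlir_ciface_foo(&result, &arg);
  return result;
}
```

Both descriptors live in the caller's stack frame, which is what makes the
interface function act as the allocation scope.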
 762
 763 Rationale: Introducing auxiliary functions for C-compatible interfaces is
 764 preferred to modifying the calling convention since it minimizes the effect of
 765 C compatibility on intra-module calls or calls between MLIR-generated
 766 functions. In particular, when calling external functions from an MLIR module in
 767 a (parallel) loop, storing a memref descriptor on the stack can lead to stack
 768 exhaustion and/or concurrent access to the same address. The auxiliary
 769 interface function serves as an allocation scope in this case. Furthermore, when
 770 targeting accelerators with separate memory spaces such as GPUs, stack-allocated
 771 descriptors passed by pointer would have to be transferred to the device memory,
 772 which introduces significant overhead. In such situations, auxiliary interface
 773 functions are executed on the host and only pass the values through the device
 774 function invocation mechanism.
 775
 776 Limitation: Currently, the C interface cannot be generated for variadic
 777 functions, whether external or not, because C functions are unable to
 778 "forward" variadic arguments like this:
 779 ```c
 780 void bar(int, ...);
 781
 782 void foo(int x, ...) {
 783   // ERROR: no way to forward variadic arguments.
 784   bar(x, ...);
 785 }
 786 ```
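
For context, the only forwarding that standard C offers is handing a `va_list`
to a callee that was written to accept one, in the style of `printf` and
`vprintf`. The sketch below illustrates this with the hypothetical functions
`sum` and `vsum`. It does not lift the limitation above, since an arbitrary
variadic function has no `va_list`-taking counterpart to forward to.

```c
#include <assert.h>
#include <stdarg.h>

// Hypothetical callee that accepts an already-started argument list.
int vsum(int count, va_list args) {
  assert(count >= 0);
  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(args, int);
  return total;
}

// Variadic wrapper: it cannot re-expand "..." at a call site, but it can pass
// its va_list to a callee that takes a va_list parameter.
int sum(int count, ...) {
  va_list args;
  va_start(args, count);
  int total = vsum(count, args);
  va_end(args);
  return total;
}
```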
 787
 788 ### Address Computation
 789
 790 Accesses to a memref element are transformed into an access to an element of the
 791 buffer pointed to by the descriptor. The position of the element in the buffer
 792 is calculated by linearizing memref indices in row-major order (lexically first
 793 index is the slowest varying, similar to C, but accounting for strides). The
 794 computation of the linear address is emitted as arithmetic operations in the
 795 LLVM dialect. Strides are extracted from the memref descriptor.
 796
 797 Examples:
 798
 799 An access to a memref with indices:
 800
 801 ```mlir
 802 %0 = memref.load %m[%1,%2,%3,%4] : memref<?x?x4x8xf32, offset: ?>
 803 ```
 804
 805 is transformed into the equivalent of the following code:
 806
 807 ```mlir
 808 // Compute the linearized index from strides.
 809 // When strides or, in absence of explicit strides, the corresponding sizes are
 810 // dynamic, extract the stride value from the descriptor.
 811 %stride1 = llvm.extractvalue %m[4, 0] : !llvm.struct<(ptr, ptr, i64,
 812                                                       array<4xi64>, array<4xi64>)>
 813 %addr1 = arith.muli %stride1, %1 : i64
 814
 815 // When the stride or, in absence of explicit strides, the trailing sizes are
 816 // known statically, this value is used as a constant. The natural value of
 817 // strides is the product of all sizes following the current dimension.
 818 %stride2 = llvm.mlir.constant(32 : index) : i64
 819 %addr2 = arith.muli %stride2, %2 : i64
 820 %addr3 = arith.addi %addr1, %addr2 : i64
 821
 822 %stride3 = llvm.mlir.constant(8 : index) : i64
 823 %addr4 = arith.muli %stride3, %3 : i64
 824 %addr5 = arith.addi %addr3, %addr4 : i64
 825
 826 // Multiplication with the known unit stride can be omitted.
 827 %addr6 = arith.addi %addr5, %4 : i64
 828
 829 // If the linear offset is known to be zero, it can also be omitted. If it is
 830 // dynamic, it is extracted from the descriptor.
 831 %offset = llvm.extractvalue %m[2] : !llvm.struct<(ptr, ptr, i64,
 832                                                   array<4xi64>, array<4xi64>)>
 833 %addr7 = arith.addi %addr6, %offset : i64
 834
 835 // All accesses are based on the aligned pointer.
 836 %aligned = llvm.extractvalue %m[1] : !llvm.struct<(ptr, ptr, i64,
 837                                                    array<4xi64>, array<4xi64>)>
 838
 839 // Compute the address of the accessed element.
 840 %ptr = llvm.getelementptr %aligned[%addr7]
 841      : (!llvm.ptr, i64) -> !llvm.ptr, f32
 842
 843 // Perform the actual load.
 844 %0 = llvm.load %ptr : !llvm.ptr -> f32
 845 ```
 846
 847 For stores, the address computation code is identical and only the actual store
 848 operation is different.
 849
 850 Note: the conversion does not perform any sort of common subexpression
 851 elimination when emitting memref accesses.
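
The same computation can be written as a short C sketch, assuming a rank-4
descriptor of `f32` elements. The struct and function names are hypothetical,
and unlike the emitted IR the sketch always reads strides and the offset from
the descriptor instead of folding statically known values into constants.

```c
#include <assert.h>
#include <stdint.h>

// C mirror of !llvm.struct<(ptr, ptr, i64, array<4xi64>, array<4xi64>)>.
// The struct name is hypothetical; f32 elements are assumed.
typedef struct {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[4];
  int64_t strides[4];
} MemRef4DF32;

// Address of element [idx[0], idx[1], idx[2], idx[3]]:
//   aligned + offset + sum over d of idx[d] * strides[d]
float *element_address(const MemRef4DF32 *m, const int64_t idx[4]) {
  assert(m && idx);
  int64_t linear = m->offset;
  for (int d = 0; d < 4; ++d)
    linear += idx[d] * m->strides[d];
  // All accesses are based on the aligned pointer.
  return m->aligned + linear;
}

// A load dereferences the computed address; a store would write through it.
float load_element(const MemRef4DF32 *m, const int64_t idx[4]) {
  return *element_address(m, idx);
}
```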
 852
 853 ### Utility Classes
 854
 855 Utility classes common to many conversions to the LLVM dialect can be found
 856 under `lib/Conversion/LLVMCommon`. They include the following.
 857
 858 -   `LLVMConversionTarget` specifies all LLVM dialect operations as legal.
 859 -   `LLVMTypeConverter` implements the default type conversion as described
 860     above.
 861 -   `ConvertOpToLLVMPattern` extends the conversion pattern class with LLVM
 862     dialect-specific functionality.
 863 -   `VectorConvertOpToLLVMPattern` extends the previous class to automatically
 864     unroll operations on higher-dimensional vectors into lists of operations on
 865     one-dimensional vectors.
 866 -   `StructBuilder` provides a convenient API for building IR that creates or
 867     accesses values of LLVM dialect structure types; it is the base class of
 868     `MemRefDescriptor`, `UnrankedMemRefDescriptor` and `ComplexBuilder` for the
 869     built-in types convertible to LLVM dialect structure types.
 870
 871 ## Translation to LLVM IR
 872
 873 MLIR modules containing `llvm.func`, `llvm.mlir.global` and `llvm.metadata`
 874 operations can be translated to LLVM IR modules using the following scheme.
 875
 876 -   Module-level globals are translated to LLVM IR global values.
 877 -   Module-level metadata are translated to LLVM IR metadata, which can be later
 878     augmented with additional metadata defined on specific ops.
 879 -   All functions are declared in the module so that they can be referenced.
 880 -   Each function is then translated separately and has access to the complete
 881     mappings between MLIR and LLVM IR globals, metadata, and functions.
 882 -   Within a function, blocks are traversed in topological order and translated
 883     to LLVM IR basic blocks. In each basic block, PHI nodes are created for each
 884     of the block arguments, but not connected to their source blocks.
 885 -   Within each block, operations are translated in order. Each operation
 886     has access to the same mappings as the function and additionally to the
 887     mapping of values between MLIR and LLVM IR, including PHI nodes. Operations
 888     with regions are responsible for translating the regions they contain.
 889 -   After operations in a function are translated, the PHI nodes of blocks in
 890     this function are connected to their source values, which are now available.
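
The reason PHI nodes are connected only at the end can be illustrated with a
toy model in C. This is a sketch and not the actual translation code: the
`Block` and `Phi` types, the integer value ids and `translate_function` are all
hypothetical, each block has at most one argument, and blocks are assumed to be
listed in topological order already.

```c
#include <assert.h>

enum { MAX_INCOMING = 4, MAX_VALUES = 16 };

// Toy stand-in for a PHI node: (incoming value, predecessor block) pairs.
typedef struct {
  int num_incoming;
  int values[MAX_INCOMING];
  int preds[MAX_INCOMING];
} Phi;

// Toy stand-in for a block with at most one argument and two successors.
typedef struct {
  int has_argument;  // nonzero if the block takes a block argument
  int defines;       // id of the value defined in this block, or -1
  int successors[2]; // successor block indices, -1 if unused
  int operands[2];   // value id forwarded to each successor's argument
} Block;

void translate_function(const Block *blocks, int num_blocks, Phi *phis) {
  int translated[MAX_VALUES] = {0};

  // Pass 1: create an unconnected PHI per block, then "translate" the block's
  // operations, modeled here as marking the defined value as available.
  for (int b = 0; b < num_blocks; ++b) {
    phis[b].num_incoming = 0;
    if (blocks[b].defines >= 0) {
      assert(blocks[b].defines < MAX_VALUES);
      translated[blocks[b].defines] = 1;
    }
  }

  // Pass 2: all values are available now, including those flowing along back
  // edges, so every PHI can be connected to its sources.
  for (int b = 0; b < num_blocks; ++b) {
    for (int s = 0; s < 2; ++s) {
      int succ = blocks[b].successors[s];
      if (succ < 0 || !blocks[succ].has_argument)
        continue;
      int value = blocks[b].operands[s];
      assert(value >= 0 && value < MAX_VALUES && translated[value]);
      Phi *phi = &phis[succ];
      assert(phi->num_incoming < MAX_INCOMING);
      phi->values[phi->num_incoming] = value;
      phi->preds[phi->num_incoming] = b;
      ++phi->num_incoming;
    }
  }
}
```

With a loop block that branches back to itself, the value forwarded along the
back edge is defined after the PHI of that block has been created, so it could
not have been connected during the first pass.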
 891
 892 The translation mechanism provides extension hooks for translating custom
 893 operations to LLVM IR via a dialect interface `LLVMTranslationDialectInterface`:
 894
 895 -   `convertOperation` translates an operation that belongs to the current
 896     dialect to LLVM IR given an `IRBuilderBase` and various mappings;
 897 -   `amendOperation` performs additional actions on an operation if it contains
 898     a dialect attribute that belongs to the current dialect, for example sets up
 899     instruction-level metadata.
 900
 901 Dialects containing operations or attributes that want to be translated to LLVM
 902 IR must provide an implementation of this interface and register it with the
 903 system. Note that registration may happen without creating the dialect, for
 904 example, in a separate library to avoid the need for the "main" dialect library
 905 to depend on LLVM IR libraries. The implementations of these methods may use
 906 the
 907 [`ModuleTranslation`](https://mlir.llvm.org/doxygen/classmlir_1_1LLVM_1_1ModuleTranslation.html)
 908 object provided to them which holds the state of the translation and contains
 909 numerous utilities.
 910
 911 Note that this extension mechanism is *intentionally restrictive*. LLVM IR has a
 912 small, relatively stable set of instructions and types that MLIR intends to
 913 model fully. Therefore, the extension mechanism is provided only for LLVM IR
 914 constructs that are more often extended -- intrinsics and metadata. The primary
 915 goal of the extension mechanism is to support sets of intrinsics, for example
 916 those representing a particular instruction set. The extension mechanism does
 917 not allow for customizing type or block translation, nor does it support custom
 918 module-level operations. Such transformations should be performed within MLIR
 919 and target the corresponding MLIR constructs.
 920
 921 ## Translation from LLVM IR
 922
 923 An experimental flow allows one to import a substantially limited subset of LLVM
 924 IR into MLIR, producing LLVM dialect operations.
 925
 926 ```
 927   mlir-translate -import-llvm filename.ll
 928 ```