# Chapter 1: Combining Existing Transformations

## Introduction

The Transform dialect allows one to precisely target transformations at specific operations in the IR and to chain them, that is, to apply a transformation to operations produced by the previous transformation. To achieve this, transformations are expressed as other operations in the IR. We call the IR containing these operations transform IR, and the IR that is being transformed payload IR.

Transform IR operations operate on values that may be associated with payload IR operations, values or attributes. We call the first two kinds of values operation and value handles, respectively. We call the last kind of values parameters.

The application of transform IR always starts from one top-level operation. In the C++ API, this operation is passed to the `applyTransforms` function. This top-level operation specifies if other transformations should be performed and how. The most common top-level operation, `transform.named_sequence`, merely applies other transform operations listed in its body one after the other, similarly to a function or a macro.

Let us illustrate this with a simple sequence of transformations on the common “fully connected + bias + ReLU” ML layer, which boils down to performing a matrix multiplication, followed by an (elementwise) matrix addition and taking an elementwise maximum with 0. This can be expressed using the following IR:

```mlir
func.func @fc_relu(%lhs: tensor<512x512xf32>, %rhs: tensor<512x512xf32>,
                   %bias: tensor<512x512xf32>, %output: tensor<512x512xf32>)
                   -> tensor<512x512xf32> {
  // Matrix-matrix multiplication.
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
                          outs(%output: tensor<512x512xf32>) -> tensor<512x512xf32>

  // Elementwise addition.
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
    ins(%matmul, %bias : tensor<512x512xf32>, tensor<512x512xf32>)
    outs(%output : tensor<512x512xf32>) -> tensor<512x512xf32>

  // Elementwise max with 0 (ReLU).
  %c0f = arith.constant 0.0 : f32
  %relued = linalg.elemwise_binary { fun = #linalg.binary_fn<max_signed> }
    ins(%biased, %c0f : tensor<512x512xf32>, f32)
    outs(%output : tensor<512x512xf32>) -> tensor<512x512xf32>
  func.return %relued : tensor<512x512xf32>
}
```

## Top-Level Sequence Operation

For performance reasons, we would like to tile and fuse these operations to exploit cache locality. This is a sequence of transformations that need to be performed one after another, so we naturally start with the corresponding top-level transform operation.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op,
      %arg1: !transform.op<"linalg.matmul">,
      %arg2: !transform.op<"linalg.elemwise_binary">) {
    transform.yield
  }
}
```

There are several aspects worth noticing in this operation.

Its special name, `@__transform_main`, and the first argument are mandated by the interpreter pass, similarly to how the entry point of C programs needs to be called `main` and may have the `int (int argc, char** argv)` signature. This argument will be associated with the top-level payload operation, most often the operation that the pass is applied to. Note that none of this is required when applying the transformation _programmatically_ via `applyTransforms` or `applyNamedSequence`.

The remaining entry block arguments are optional and can be associated with payload attributes, operations or values that are useful in the sequence. These are also specified when calling `applyTransforms`. In our case, we are interested in the matrix multiplication and elementwise operations that we are going to tile and fuse.

All value handles have Transform dialect types. These types specify certain properties of the payload IR entities associated with them. In this example, `transform.any_op` indicates that the handle is associated with arbitrary payload operations. In contrast, `transform.op<"X">` indicates that the handle is associated _only_ with payload operations of kind `X`. These constraints are verified when the handle/payload association is created. For entry block arguments of top-level transform operations, this happens early in the `applyTransforms` function. If the constraints are not satisfied, the transform application fails and produces diagnostics for the user.

Finally, the operation is wrapped in a module with the `transform.with_named_sequence` attribute that triggers all necessary verifications if multiple named sequences exist.
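
To make the three kinds of transform IR values more concrete, here is a minimal sketch that derives a value handle and a parameter next to the operation handles received as arguments. The `transform.get_result` and `transform.param.constant` operations are not used elsewhere in this chapter; they are assumed here to be available in the Transform dialect with the syntax shown, so treat this as an illustration rather than a reference.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op,
      %arg1: !transform.op<"linalg.matmul">) {
    // Value handle: associated with result #0 of each payload operation
    // associated with the operation handle %arg1.
    %result = transform.get_result %arg1[0]
        : (!transform.op<"linalg.matmul">) -> !transform.any_value

    // Parameter: associated with a payload attribute (here, a constant
    // integer) rather than with a payload operation or value.
    %size = transform.param.constant 32 : i64 -> !transform.param<i64>

    // Report the location of the payload value, for inspection only.
    transform.debug.emit_remark_at %result, "matmul result"
        : !transform.any_value
    transform.yield
  }
}
```

The types make the kind of each value visible at a glance: `!transform.any_op` and `!transform.op<"...">` for operation handles, `!transform.any_value` for value handles, and `!transform.param<...>` for parameters.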

## Failure Propagation

The Transform dialect infrastructure has a particular mechanism for handling diagnostics that supports recoverable errors. It is best understood by considering the (unnamed) sequence operation that has a mandatory attribute specifying the failure propagation mode. There are two options:

*   “propagate” makes the sequence transformation fail if any of the nested transformations fails;
*   “suppress” makes the sequence succeed even if one of the nested transformations fails, but without attempting to perform the transformations following the failed one in the sequence.

The latter allows the transformation script surrounding the sequence to continue despite errors within the sequence, assuming they are recoverable. As we are only building the transformation script, it is preferable to propagate failures so we know when something did not apply.
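
As a hedged illustration of the “suppress” mode, the sketch below nests an unnamed sequence inside the top-level named sequence. It assumes that `%arg2` is associated with exactly the two elementwise operations of the payload above, and that `transform.split_handle` reports a recoverable failure when asked to produce more handles than there are payload operations, as its documentation describes. Neither assumption is checked here.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op,
      %arg1: !transform.op<"linalg.matmul">,
      %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Nested sequence that suppresses recoverable failures from its body.
    transform.sequence %arg2 : !transform.op<"linalg.elemwise_binary">
        failures(suppress) {
    ^bb0(%elemwise: !transform.op<"linalg.elemwise_binary">):
      // Assumed to fail recoverably: three handles are requested, but only
      // two payload operations are associated with %elemwise.
      %a, %b, %c = transform.split_handle %elemwise
          : (!transform.op<"linalg.elemwise_binary">)
          -> (!transform.any_op, !transform.any_op, !transform.any_op)
      // Skipped: the sequence stops at the first failed transformation.
      transform.debug.emit_remark_at %a, "not reached" : !transform.any_op
    }

    // The surrounding script continues because the failure was suppressed.
    transform.debug.emit_remark_at %arg1, "the script continues"
        : !transform.op<"linalg.matmul">
    transform.yield
  }
}
```

With `failures(propagate)` on the nested sequence instead, the same failure would be expected to propagate and make the whole named sequence fail.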

To check or debug a transform sequence, it is possible to print various entities associated with the transform IR values. For example, we can print the operations associated with the handles:

```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op,
     %arg1: !transform.op<"linalg.matmul">,
     %arg2: !transform.op<"linalg.elemwise_binary">):
  transform.debug.emit_remark_at %arg1, "matmul"
      : !transform.op<"linalg.matmul">
  transform.debug.emit_remark_at %arg2, "elemwise_binaries"
      : !transform.op<"linalg.elemwise_binary">
  transform.yield
}
```

## Transform Dialect Interpreter

Since we don’t want to recompile the compiler every time we change a transformation, we can use a Transform dialect interpreter pass to apply this transformation sequence to the payload IR. As we will see in the next chapter, it is possible to define custom passes or even integrate the transform interpreter into a larger pass. For now, we can use the existing test pass:

```sh
$ mlir-opt sequence.mlir --pass-pipeline="
    builtin.module(transform-interpreter{
        debug-bind-trailing-args=linalg.matmul,linalg.elemwise_binary})"
```

The `sequence.mlir` file contains _both_ the payload IR function _and_ the transform IR sequence nested in the same module. The transform interpreter pass will apply the `@__transform_main` named sequence to the anchor operation of the pass. In our case, we also asked the interpreter pass to associate the two extra arguments of the top-level sequence with all `linalg.matmul` and `linalg.elemwise_binary` payload operations through the respective pass options. Running this pass results in the expected remarks:

```sh
sequence.mlir:7:13: remark: matmul
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
            ^
sequence.mlir:7:13: note: see current operation: %0 = linalg.matmul ins(%arg0, %arg1 : tensor<512x512xf32>, tensor<512x512xf32>) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
sequence.mlir:10:13: remark: elemwise_binaries
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
            ^
sequence.mlir:10:13: note: see current operation: %1 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>} ins(%0, %arg2 : tensor<512x512xf32>, tensor<512x512xf32>) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
sequence.mlir:14:13: remark: elemwise_binaries
  %relued = linalg.elemwise_binary { fun = #linalg.binary_fn<max_signed> }
            ^
sequence.mlir:14:13: note: see current operation: %2 = linalg.elemwise_binary {fun = #linalg.binary_fn<max_signed>} ins(%1, %cst : tensor<512x512xf32>, f32) outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
```

Note that `%arg2` is associated with both elementwise payload operations. Any handle is associated with a list of entities. Individual transformations may or may not care about the order of elements in that list.
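
Binding trailing arguments through pass options is mainly a debugging convenience. As a sketch of an alternative, the handles can be obtained inside the sequence itself by matching payload operations nested under the root handle. This assumes the `transform.structured.match` operation, which is not introduced in this chapter, with the syntax shown below.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op) {
    // Collect all payload operations of the given kinds nested under the
    // payload operation associated with %arg0.
    %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg0
        : (!transform.any_op) -> !transform.any_op
    %elemwise = transform.structured.match ops{["linalg.elemwise_binary"]} in %arg0
        : (!transform.any_op) -> !transform.any_op

    transform.debug.emit_remark_at %matmul, "matmul" : !transform.any_op
    transform.debug.emit_remark_at %elemwise, "elemwise_binaries"
        : !transform.any_op
    transform.yield
  }
}
```

With this form, the interpreter needs no extra bindings, so a plain `mlir-opt sequence.mlir --transform-interpreter` invocation should be sufficient. As before, `%elemwise` would be associated with both elementwise operations.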

## Specifying Transformations

Now that we have handles to the operations we want to transform, we are ready to apply the transformations. Let us first try tiling the matmul operation itself.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // The actual tiling transformation takes tile sizes as attributes. The
    // first result is the tiled operation, the second is the generated loop.
    %tiled, %loop = transform.structured.tile_using_forall %arg1
                    tile_sizes [4, 32]
      : (!transform.op<"linalg.matmul">)
     -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The transformation returns two handles, as indicated in its [documentation](https://mlir.llvm.org/docs/Dialects/Transform/#transformstructuredtile_using_forall-transformtileusingforallop):

*   A handle to the tiled operation (here, `linalg.matmul`) operating on the subset of the original data.
*   A handle to the `scf.forall` “multi-for” loop around tensors.

Running this transformation with the same command as above produces the expected tiled code.

```mlir
func.func @fc_relu(%arg0: tensor<512x512xf32>,
                   %arg1: tensor<512x512xf32>,
                   %arg2: tensor<512x512xf32>,
                   %arg3: tensor<512x512xf32>) -> tensor<512x512xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %0 = scf.forall (%arg4, %arg5) in (128, 16) shared_outs(%arg6 = %arg3) -> (tensor<512x512xf32>) {
    %3 = affine.apply affine_map<(d0) -> (d0 * 4)>(%arg4)
    %4 = affine.apply affine_map<(d0) -> (d0 * 32)>(%arg5)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0] [4, 512] [1, 1]
                     : tensor<512x512xf32> to tensor<4x512xf32>
    %extracted_slice_0 = tensor.extract_slice %arg1[0, %4] [512, 32] [1, 1]
                       : tensor<512x512xf32> to tensor<512x32xf32>
    %extracted_slice_1 = tensor.extract_slice %arg6[%3, %4] [4, 32] [1, 1]
                      : tensor<512x512xf32> to tensor<4x32xf32>
    %5 = linalg.matmul
         ins(%extracted_slice, %extracted_slice_0
             : tensor<4x512xf32>, tensor<512x32xf32>)
         outs(%extracted_slice_1 : tensor<4x32xf32>) -> tensor<4x32xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %5 into %arg6[%3, %4] [4, 32] [1, 1]
          : tensor<4x32xf32> into tensor<512x512xf32>
    }
  }
  %1 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
    ins(%0, %arg2 : tensor<512x512xf32>, tensor<512x512xf32>)
    outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
  %2 = linalg.elemwise_binary {fun = #linalg.binary_fn<max_signed>}
    ins(%1, %cst : tensor<512x512xf32>, f32)
    outs(%arg3 : tensor<512x512xf32>) -> tensor<512x512xf32>
  return %2 : tensor<512x512xf32>
}
```

Besides producing new handles, the tiling transform operation _consumes_ the operand handle. This means that the handle is _invalidated_ after this operation, and is no longer supposed to be used. Transform operations are required to mark all their operands as either consumed or readonly. Transform operations usually consume the operand if the associated payload operations are erased or recreated (which means erased and created anew with similar structure). As handles are essentially references to payload operations, they would become dangling if the payload no longer exists.

## Handle Invalidation and Expensive Checks Mode

Undefined behavior is difficult to grapple with when it does happen, so the Transform dialect interpreter defaults to performing a set of additional, potentially expensive, checks that detect most undefined behavior in the transform IR. For example, if we wanted to use the `%arg1` handle after it is consumed, it would cause undefined behavior that manifests as an assertion in the debug build, and likely as a segmentation fault in the release mode.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // The actual tiling transformation takes tile sizes as attributes.
    %tiled, %loop = transform.structured.tile_using_forall %arg1 tile_sizes [4, 32]
        : (!transform.op<"linalg.matmul">) -> (!transform.any_op, !transform.any_op)

    // This is trying to use an invalidated handle leading to undefined behavior.
    transform.debug.emit_remark_at %arg1, "remark" : !transform.op<"linalg.matmul">
    transform.yield
  }
}
```

However, with the expensive checks enabled in the interpreter, a nice diagnostic is produced:

```sh
sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %mm, "elemwise_binaries" : !transform.any_op
  ^
sequence.mlir:26:9: note: handle to invalidated ops
  %mm = transform.cast %matmul : !transform.op<"linalg.matmul"> to !transform.any_op
        ^
sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %loop, %tiled = transform.structured.tile_using_forall %mm tile_sizes [4, 32]
```

When compile-time performance is a concern, and the transformation sequence is sufficiently stable, it is possible to disable expensive checks in the interpreter for improved performance by providing the `disable-expensive-checks` option to the pass or by setting the corresponding flag in the `TransformOptions` passed into `applyTransforms`.

One may observe that some operations such as `transform.cast` do not consume the operand (because they don’t erase the corresponding operation). So what would happen if we tried to use that operand instead?

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // We can cast one type to another as long as operations are compatible
    // with both types. This creates "aliasing" handles.
    %casted = transform.cast %arg1 : !transform.op<"linalg.matmul">
        to !transform.any_op

    // The actual tiling transformation takes tile sizes as attributes.
    %tiled, %loop = transform.structured.tile_using_forall %arg1
                    tile_sizes [4, 32]
      : (!transform.op<"linalg.matmul">)
     -> (!transform.any_op, !transform.any_op)

    // Consuming an operand invalidates the consumed handle and any other handle
    // that is associated with the same payload operations, or payload
    // operations nested in them.
    transform.debug.emit_remark_at %casted, "remark"
      : !transform.any_op
    transform.yield
  }
}
```

Both `%arg1` and `%casted` reference the same payload operation. Extending the reference analogy, these references alias. Naturally, when the payload operation is erased, all references to it become dangling. This is also the case for handles. In fact, consuming an operand invalidates the operand handle as well as any other handle that is associated with any of the same payload operations. The payload IR consideration is recursive: a handle associated with a payload operation _nested_ in the erased one is also invalidated (because erasing the operation also erases its regions and all contained operations). The expensive-checks mode can also handle this case.

```sh
sequence.mlir:28:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %matmul, "elemwise_binaries" : !transform.op<"linalg.matmul">
  ^
sequence.mlir:21:29: note: handle to invalidated ops
^bb0(%root: !transform.any_op, %matmul: !transform.op<"linalg.matmul">, %elemwise: !transform.op<"linalg.elemwise_binary">):
                            ^
sequence.mlir:27:19: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %loop, %tiled = transform.structured.tile_using_forall %mm tile_sizes [4, 32]
```
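
For contrast, a sketch of a sequence that avoids the invalidated handles altogether: after tiling, it only refers to the handles _produced_ by the tiling operation. It assumes, consistently with the result list in the previous section, that the first result is the tiled operation and the second one is the loop.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Tiling consumes %arg1 and produces fresh handles (assumed order:
    // tiled operation first, loop second).
    %tiled, %loop = transform.structured.tile_using_forall %arg1
                    tile_sizes [4, 32]
      : (!transform.op<"linalg.matmul">)
     -> (!transform.any_op, !transform.any_op)

    // %arg1 and any alias of it must not be used from here on. The handles
    // returned by the transformation are valid and can be inspected.
    transform.debug.emit_remark_at %tiled, "tiled matmul" : !transform.any_op
    transform.debug.emit_remark_at %loop, "tile loop" : !transform.any_op
    transform.yield
  }
}
```

The general pattern is that a transformation that consumes a handle typically returns new handles to whatever replaced the payload, and subsequent transformations should be chained on those.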

## Chaining Transformations with Handles

Going back to the transformation sequence, we have tiled the matrix multiplication, but we also want to tile and fuse the elementwise operations. The typical way of doing this in the structured operations paradigm is to tile the last operation in some acyclic dataflow graph, and then progressively fuse the operations that produce its operands. This removes the need to explicitly tile all operations as fusion can adapt their sizes and inject recomputation if desired. So instead of tiling the matmul operation, we are going to tile the last operation in the chain, and then fuse the preceding operations into the loops produced by tiling.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Since the %arg2 handle is associated with both elementwise operations,
    // we need to split it into two handles so we can target only the second
    // elementwise operation.
    %add, %max = transform.split_handle %arg2
        : (!transform.op<"linalg.elemwise_binary">)
        -> (!transform.any_op, !transform.any_op)

    // The actual tiling transformation takes tile sizes as attributes. It
    // produces a handle to the loop generated during tiling.
    %tiled_max, %loop =
        transform.structured.tile_using_forall %max tile_sizes [8, 32]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

    // We can now fuse the other operations into the loop. Here, we fuse
    // operations one by one. This requires the operation that is being fused to
    // define the value used within the loop, so the order of such fusions is
    // important. We could also use "transform.merge_handles" to obtain a single
    // handle to all operations and give it to `fuse_into_containing_op` that
    // would take care of the ordering in this case.
    %add_fused, %loop_0 =
        transform.structured.fuse_into_containing_op %add into %loop
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %matmul_fused, %loop_1 =
        transform.structured.fuse_into_containing_op %arg1 into %loop_0
          : (!transform.op<"linalg.matmul">, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    transform.yield
  }
}
```

This achieves the desired tiling and fusion.
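
The comment in the sequence above mentions `transform.merge_handles` as an alternative to fusing operations one by one. A minimal sketch of that variant could look as follows. The `transform.cast` is added on the assumption that `transform.merge_handles` requires all of its operands to have the same type; the handle names `%matmul`, `%producers` and `%fused` are ours.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    %add, %max = transform.split_handle %arg2
        : (!transform.op<"linalg.elemwise_binary">)
        -> (!transform.any_op, !transform.any_op)
    %tiled_max, %loop =
        transform.structured.tile_using_forall %max tile_sizes [8, 32]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

    // Bring both producer handles to the same type, then merge them into a
    // single handle associated with the "add" and the matmul operations.
    %matmul = transform.cast %arg1 : !transform.op<"linalg.matmul">
        to !transform.any_op
    %producers = transform.merge_handles %add, %matmul : !transform.any_op

    // A single fusion step; the fusion is expected to order the producers
    // by itself, as noted in the comment of the previous example.
    %fused, %loop_0 =
        transform.structured.fuse_into_containing_op %producers into %loop
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The trade-off is that `%fused` is then a single handle to both fused operations, so targeting only the fused “add” later, as the next section does, would require separating them again.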

## More Handle Invalidation

Finally, let us assume there exists an efficient microkernel, or a hardware instruction expressed as an intrinsic function, for a 4x4 matrix multiplication. For this purpose, we need to tile the fused operation to the desired size, and then outline it. The resulting function call can then be replaced with a call to the microkernel.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
       %arg0: !transform.any_op,
       %arg1: !transform.op<"linalg.matmul">,
       %arg2: !transform.op<"linalg.elemwise_binary">) {
    // Since the %arg2 handle is associated with both elementwise operations,
    // we need to split it into two handles so we can target only the second
    // elementwise operation.
    %add, %max = transform.split_handle %arg2
        : (!transform.op<"linalg.elemwise_binary">)
          -> (!transform.any_op, !transform.any_op)

    // The actual tiling transformation takes tile sizes as attributes. It
    // produces a handle to the loop generated during tiling.
    %tiled, %loop = transform.structured.tile_using_forall %max
                    tile_sizes [8, 32]
        : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

    // We can now fuse the other operations into the loop. Here, we fuse
    // operations one by one. This requires the operation that is being fused to
    // define the value used within the loop, so the order of such fusions is
    // important. We could also use "transform.merge_handles" to obtain a single
    // handle to all operations and give it to `fuse_into_containing_op` that
    // would take care of the ordering in this case.
    %add_fused, %loop_0 =
        transform.structured.fuse_into_containing_op %add into %loop
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %matmul_fused, %loop_1 =
        transform.structured.fuse_into_containing_op %arg1 into %loop_0
          : (!transform.op<"linalg.matmul">, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    // Tile again to get the desired size. Note that this time this tiles the
    // "add" operation and fuses matmul into the loop, but doesn't affect the
    // "max" operation. This illustrates the precise targeting with the
    // transform dialect. Otherwise, it is difficult to differentiate "add" and
    // "max", both of which have the same kind.
    %tiled_2, %loop_2 =
        transform.structured.tile_using_forall %add_fused tile_sizes [4, 4]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    %matmul_fused_2, %loop_3 =
        transform.structured.fuse_into_containing_op %matmul_fused into %loop_2
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)

    // Since outlining is currently only implemented for region-holding
    // operations such as loops, use tiling to size 1 to materialize the outer
    // loop that is going to be outlined.
    %_, %outline_target =
        transform.structured.tile_using_forall %tiled_2 tile_sizes [1]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.structured.fuse_into_containing_op %matmul_fused_2
        into %outline_target
          : (!transform.any_op, !transform.any_op)
            -> (!transform.any_op, !transform.any_op)
    %func, %call = transform.loop.outline %outline_target
                   {func_name = "outlined"}
        : (!transform.any_op) -> (!transform.any_op, !transform.op<"func.call">)

    transform.yield
  }
}
```

This additional transformation also illustrates handle invalidation for nested operations. The `transform.loop.outline` operation consumes the handle to the loop, which invalidates it and all handles to any operations nested in it, such as `%2`. Attempting to use this handle will cause undefined behavior. (Note that it isn’t strictly necessary for this specific form of the outlining to consume the operand as the implementation only _moves_ the region without recreating the operations, but the author of the transformation chose to invalidate the handle anyway.)

Attempting to access the fusion result after outlining produces the following error:

```sh
test/Examples/transform/Ch1/invalidation-2.mlir:109:3: error: op uses a handle invalidated by a previously executed transform op
  transform.debug.emit_remark_at %outline_target, "outlined loop" : !transform.any_op
  ^
test/Examples/transform/Ch1/invalidation-2.mlir:102:25: note: handle to invalidated ops
  %outline_target, %_ = transform.structured.tile_using_forall %tiled_2 tile_sizes [1]
                        ^
test/Examples/transform/Ch1/invalidation-2.mlir:106:18: note: invalidated by this transform op that consumes its operand #0 and invalidates all handles to payload IR entities associated with this operand and entities nested in them
  %func, %call = transform.loop.outline %outline_target {func_name = "outlined"}
                 ^
test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: ancestor payload op
  %biased = linalg.elemwise_binary { fun = #linalg.binary_fn<add> }
            ^
test/Examples/transform/Ch1/invalidation-2.mlir:24:13: note: nested payload op
  %matmul = linalg.matmul ins(%lhs, %rhs: tensor<512x512xf32>, tensor<512x512xf32>)
```

Note that the “add” elementwise operation is indicated as payload ancestor because it was used to produce the tile loop, and the loop therefore has its location.

Finally, we would like to replace the call to the outlined function with a call to the microkernel. Unfortunately, the Transform dialect doesn’t have support for this transformation (and cannot have it if the call is rewritten to a custom, out-of-tree operation). Therefore, we need to define new transform operations. The next chapters will describe how this can be done.

## Tracking IR Modifications

The Transform dialect automatically tracks all IR changes that are made as part
of transform ops. (Implementations must use the provided rewriter to modify IR.)
If a payload op is erased, it is automatically removed from all handles that it
is currently associated with. If a payload op is replaced, the transform dialect
tries to find the replacement op and updates all handles accordingly. If a
multi-result op is replaced with values that are defined by multiple ops, or if
an op is replaced with an op of a different type, an error is produced. This is
because it is unclear whether the direct replacements actually represent the
computation of the original op. There are ways to customize this behavior. More
details can be found in the documentation of `transform::TrackingListener`.