llvm/docs/HowToUpdateDebugInfo.rst

   1 =======================================================
   2 How to Update Debug Info: A Guide for LLVM Pass Authors
   3 =======================================================
   4
   5 .. contents::
   6    :local:
   7
   8 Introduction
   9 ============
  10
  11 Certain kinds of code transformations can inadvertently result in a loss of
  12 debug info, or worse, make debug info misrepresent the state of a program.
  13
  14 This document specifies how to correctly update debug info in various kinds of
  15 code transformations, and offers suggestions for how to create targeted debug
  16 info tests for arbitrary transformations.
  17
  18 For more on the philosophy behind LLVM debugging information, see
  19 :doc:`SourceLevelDebugging`.
  20
  21 Rules for updating debug locations
  22 ==================================
  23
  24 .. _WhenToPreserveLocation:
  25
  26 When to preserve an instruction location
  27 ----------------------------------------
  28
  29 A transformation should preserve the debug location of an instruction if the
  30 instruction remains in its basic block, or if its basic block is folded
  31 into a predecessor that branches unconditionally. The APIs to use are
  32 ``IRBuilder`` or ``Instruction::setDebugLoc``.
  33
  34 The purpose of this rule is to ensure that common block-local optimizations
  35 preserve the ability to set breakpoints on source locations corresponding to
  36 the instructions they touch. Debugging, crash logs, and SamplePGO accuracy
  37 would be severely impacted if that ability were lost.
  38
  39 Examples of transformations that should follow this rule include:
  40
  41 * Instruction scheduling. Block-local instruction reordering should not drop
  42   source locations, even though this may lead to jumpy single-stepping
  43   behavior.
  44
  45 * Simple jump threading. For example, if block ``B1`` unconditionally jumps to
  46   ``B2``, *and* is its unique predecessor, instructions from ``B2`` can be
  47   hoisted into ``B1``. Source locations from ``B2`` should be preserved.
  48
  49 * Peephole optimizations that replace or expand an instruction, like ``(add X
  50   X) => (shl X 1)``. The location of the ``shl`` instruction should be the same
  51   as the location of the ``add`` instruction.
  52
  53 * Tail duplication. For example, if blocks ``B1`` and ``B2`` both
  54   unconditionally branch to ``B3`` and ``B3`` can be folded into its
  55   predecessors, source locations from ``B3`` should be preserved.
  56
  57 Examples of transformations for which this rule *does not* apply include:
  58
  59 * LICM. E.g., if an instruction is moved from the loop body to the preheader,
  60   the rule for :ref:`dropping locations<WhenToDropLocation>` applies.
  61
  62 In addition to the rule above, a transformation should also preserve the debug
  63 location of an instruction that is moved between basic blocks, if the
  64 destination block already contains an instruction with an identical debug
  65 location.
  66
  67 Examples of transformations that should follow this rule include:
  68
  69 * Moving instructions between basic blocks. For example, if instruction ``I1``
  70   in ``BB1`` is moved before ``I2`` in ``BB2``, the source location of ``I1``
  71   can be preserved if it has the same source location as ``I2``.
  72
  73 .. _WhenToMergeLocation:
  74
  75 When to merge instruction locations
  76 -----------------------------------
  77
  78 A transformation should merge instruction locations if it replaces multiple
  79 instructions with a single merged instruction, *and* that merged instruction
  80 does not correspond to any of the original instructions' locations. The API to
  81 use is ``Instruction::applyMergedLocation``.
  82
  83 The purpose of this rule is a) to ensure that the single merged instruction
  84 has a location with an accurate scope attached, and b) to prevent misleading
  85 single-stepping (or breakpoint) behavior. Often, merged instructions are memory
  86 accesses which can trap: having an accurate scope attached greatly assists in
  87 crash triage by identifying the (possibly inlined) function where the bad
  88 memory access occurred. This rule is also meant to assist SamplePGO by banning
  89 scenarios in which a sample of a block containing a merged instruction is
  90 misattributed to a block containing one of the instructions-to-be-merged.
  91
  92 Examples of transformations that should follow this rule include:
  93
  94 * Merging identical loads/stores which occur on both sides of a CFG diamond
  95   (see the ``MergedLoadStoreMotion`` pass).
  96
  97 * Merging identical loop-invariant stores (see the LICM utility
  98   ``llvm::promoteLoopAccessesToScalars``).
  99
 100 * Peephole optimizations which combine multiple instructions together, like
 101   ``(add (mul A B) C) => llvm.fma.f32(A, B, C)``.  Note that the location of
 102   the ``fma`` does not exactly correspond to the locations of either the
 103   ``mul`` or the ``add`` instructions.
 104
 105 Examples of transformations for which this rule *does not* apply include:
 106
 107 * Block-local peepholes which delete redundant instructions, like
 108   ``(sext (zext i8 %x to i16) to i32) => (zext i8 %x to i32)``. The inner
 109   ``zext`` is modified but remains in its block, so the rule for
 110   :ref:`preserving locations<WhenToPreserveLocation>` should apply.
 111
 112 * Converting an if-then-else CFG diamond into a ``select``. Preserving the
 113   debug locations of speculated instructions can make it seem like a condition
 114   is true when it's not (or vice versa), which leads to a confusing
 115   single-stepping experience. The rule for
 116   :ref:`dropping locations<WhenToDropLocation>` should apply here.
 117
 118 * Hoisting identical instructions which appear in several successor blocks into
 119   a predecessor block (see ``BranchFolder::HoistCommonCodeInSuccs``). In this
 120   case there is no single merged instruction. The rule for
 121   :ref:`dropping locations<WhenToDropLocation>` applies.
 122
 123 .. _WhenToDropLocation:
 124
 125 When to drop an instruction location
 126 ------------------------------------
 127
 128 A transformation should drop debug locations if the rules for
 129 :ref:`preserving<WhenToPreserveLocation>` and
 130 :ref:`merging<WhenToMergeLocation>` debug locations do not apply. The API to
 131 use is ``Instruction::dropLocation()``.
 132
 133 The purpose of this rule is to prevent erratic or misleading single-stepping
 134 behavior in situations in which an instruction has no clear, unambiguous
 135 relationship to a source location.
 136
 137 To handle an instruction without a location, the DWARF generator
 138 defaults to allowing the last-set location after a label to cascade forward, or
 139 to setting a line 0 location with viable scope information if no previous
 140 location is available.
 141
 142 See the discussion in the section about
 143 :ref:`merging locations<WhenToMergeLocation>` for examples of when the rule for
 144 dropping locations applies.
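
As a compact restatement of the three rules in this section, the decision can
be sketched as a small self-contained C++ function. This is only an
illustrative model, not LLVM code: ``LocUpdate``, ``MoveFacts``, its field
names, and ``chooseLocationUpdate`` are all hypothetical. A real pass would
answer these questions from the IR and then call the API named in each rule.

.. code-block:: cpp

   // Illustrative model only: none of these names exist in LLVM.
   enum class LocUpdate {
     Preserve, // keep the location (IRBuilder or Instruction::setDebugLoc)
     Merge,    // Instruction::applyMergedLocation
     Drop      // Instruction::dropLocation()
   };

   // Hypothetical facts a pass knows about the instruction it is touching.
   struct MoveFacts {
     bool StaysInBlock = false;          // remains in its basic block
     bool FoldedIntoUncondPred = false;  // block folded into a predecessor
                                         // that branches unconditionally
     bool DestHasIdenticalLoc = false;   // destination block already holds an
                                         // instruction with the same location
     bool ReplacesMultipleInsts = false; // one merged instruction replaces several
     bool MatchesAnOriginalLoc = false;  // merged instruction corresponds to one
                                         // of the original locations
   };

   LocUpdate chooseLocationUpdate(const MoveFacts &F) {
     // Merge rule. Checked first so that a block-local combine such as the
     // fma example is merged rather than preserved; this ordering is a
     // modelling choice, the text does not prescribe one.
     if (F.ReplacesMultipleInsts && !F.MatchesAnOriginalLoc)
       return LocUpdate::Merge;
     // Preserve rule, including the "identical location in the destination
     // block" extension.
     if (F.StaysInBlock || F.FoldedIntoUncondPred || F.DestHasIdenticalLoc)
       return LocUpdate::Preserve;
     // Neither rule applies, so the location is dropped.
     return LocUpdate::Drop;
   }

Under this model, the ``(add X X) => (shl X 1)`` peephole sets only
``StaysInBlock`` and yields ``Preserve``, while an LICM hoist into the
preheader sets none of the fields and yields ``Drop``.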
 145
 146 Rules for updating debug values
 147 ===============================
 148
 149 Deleting an IR-level Instruction
 150 --------------------------------
 151
 152 When an ``Instruction`` is deleted, its debug uses change to ``undef``. This is
 153 a loss of debug info: the value of one or more source variables becomes
 154 unavailable, starting with the ``llvm.dbg.value(undef, ...)``. When there is no
 155 way to reconstitute the value of the lost instruction, this is the best
 156 possible outcome. However, it's often possible to do better:
 157
 158 * If the dying instruction can be RAUW'd, do so. The
 159   ``Value::replaceAllUsesWith`` API transparently updates debug uses of the
 160   dying instruction to point to the replacement value.
 161
 162 * If the dying instruction cannot be RAUW'd, call ``llvm::salvageDebugInfo`` on
 163   it. This makes a best-effort attempt to rewrite debug uses of the dying
 164   instruction by describing its effect as a ``DIExpression``.
 165
 166 * If one of the **operands** of a dying instruction would become trivially
 167   dead, use ``llvm::replaceAllDbgUsesWith`` to rewrite the debug uses of that
 168   operand. Consider the following example function:
 169
 170 .. code-block:: llvm
 171
 172   define i16 @foo(i16 %a) {
 173     %b = sext i16 %a to i32
 174     %c = and i32 %b, 15
 175     call void @llvm.dbg.value(metadata i32 %c, ...)
 176     %d = trunc i32 %c to i16
 177     ret i16 %d
 178   }
 179
 180 Now, here's what happens after the unnecessary truncation instruction ``%d`` is
 181 replaced with a simplified instruction:
 182
 183 .. code-block:: llvm
 184
 185   define i16 @foo(i16 %a) {
 186     call void @llvm.dbg.value(metadata i32 undef, ...)
 187     %simplified = and i16 %a, 15
 188     ret i16 %simplified
 189   }
 190
 191 Note that after deleting ``%d``, all uses of its operand ``%c`` become
 192 trivially dead. The debug use which used to point to ``%c`` is now ``undef``,
 193 and debug info is needlessly lost.
 194
 195 To solve this problem, do:
 196
 197 .. code-block:: cpp
 198
 199   llvm::replaceAllDbgUsesWith(%c, theSimplifiedAndInstruction, ...)
 200
 201 This results in better debug info because the debug use of ``%c`` is preserved:
 202
 203 .. code-block:: llvm
 204
 205   define i16 @foo(i16 %a) {
 206     %simplified = and i16 %a, 15
 207     call void @llvm.dbg.value(metadata i16 %simplified, ...)
 208     ret i16 %simplified
 209   }
 210
 211 You may have noticed that ``%simplified`` is narrower than ``%c``: this is not
 212 a problem, because ``llvm::replaceAllDbgUsesWith`` takes care of inserting the
 213 necessary conversion operations into the DIExpressions of updated debug uses.
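
Why a conversion is enough in this example can be checked with ordinary
arithmetic. The sketch below is a plain C++ model of the two functions above,
not LLVM code, and the function names are made up for illustration. It
compares the original 32-bit ``%c`` with the 16-bit ``%simplified`` widened
back to 32 bits, for every possible value of ``%a``.

.. code-block:: cpp

   #include <cstdint>

   // Model of the original IR:
   //   %b = sext i16 %a to i32
   //   %c = and i32 %b, 15
   std::int32_t originalC(std::int16_t A) {
     std::int32_t B = A; // sign extension from 16 to 32 bits
     return B & 15;
   }

   // Model of the rewritten IR:
   //   %simplified = and i16 %a, 15
   std::int16_t simplified(std::int16_t A) {
     return static_cast<std::int16_t>(A & 15);
   }

   // Returns true if widening %simplified to 32 bits reproduces %c for all
   // 65536 possible inputs.
   bool wideningRecoversC() {
     for (std::int32_t V = INT16_MIN; V <= INT16_MAX; ++V) {
       std::int16_t A = static_cast<std::int16_t>(V);
       if (originalC(A) != static_cast<std::int32_t>(simplified(A)))
         return false;
     }
     return true;
   }

Because the masked value always fits in four bits, sign extension and zero
extension give the same result here. In general the kind of conversion
matters, which is the part the text above attributes to
``llvm::replaceAllDbgUsesWith``.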
 214
 215 Deleting a MIR-level MachineInstr
 216 ---------------------------------
 217
 218 TODO
 219
 220 How to automatically convert tests into debug info tests
 221 ========================================================
 222
 223 .. _IRDebugify:
 224
 225 Mutation testing for IR-level transformations
 226 ---------------------------------------------
 227
 228 An IR test case for a transformation can, in many cases, be automatically
 229 mutated to test debug info handling within that transformation. This is a
 230 simple way to test for proper debug info handling.
 231
 232 The ``debugify`` utility pass
 233 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 234
 235 The ``debugify`` testing utility is just a pair of passes: ``debugify`` and
 236 ``check-debugify``.
 237
 238 The first applies synthetic debug information to every instruction of the
 239 module, and the second checks that this DI is still available after an
 240 optimization has occurred, reporting any errors/warnings while doing so.
 241
 242 The instructions are assigned sequentially increasing line locations, and are
 243 immediately used by debug value intrinsics everywhere possible.
 244
 245 For example, here is a module before:
 246
 247 .. code-block:: llvm
 248
 249    define void @f(i32* %x) {
 250    entry:
 251      %x.addr = alloca i32*, align 8
 252      store i32* %x, i32** %x.addr, align 8
 253      %0 = load i32*, i32** %x.addr, align 8
 254      store i32 10, i32* %0, align 4
 255      ret void
 256    }
 257
 258 and after running ``opt -debugify``:
 259
 260 .. code-block:: llvm
 261
 262    define void @f(i32* %x) !dbg !6 {
 263    entry:
 264      %x.addr = alloca i32*, align 8, !dbg !12
 265      call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12
 266      store i32* %x, i32** %x.addr, align 8, !dbg !13
 267      %0 = load i32*, i32** %x.addr, align 8, !dbg !14
 268      call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14
 269      store i32 10, i32* %0, align 4, !dbg !15
 270      ret void, !dbg !16
 271    }
 272
 273    !llvm.dbg.cu = !{!0}
 274    !llvm.debugify = !{!3, !4}
 275    !llvm.module.flags = !{!5}
 276
 277    !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
 278    !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
 279    !2 = !{}
 280    !3 = !{i32 5}
 281    !4 = !{i32 2}
 282    !5 = !{i32 2, !"Debug Info Version", i32 3}
 283    !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
 284    !7 = !DISubroutineType(types: !2)
 285    !8 = !{!9, !11}
 286    !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
 287    !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
 288    !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
 289    !12 = !DILocation(line: 1, column: 1, scope: !6)
 290    !13 = !DILocation(line: 2, column: 1, scope: !6)
 291    !14 = !DILocation(line: 3, column: 1, scope: !6)
 292    !15 = !DILocation(line: 4, column: 1, scope: !6)
 293    !16 = !DILocation(line: 5, column: 1, scope: !6)
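
The mechanism can also be pictured with a small stand-alone model. The C++
below is a sketch under simplifying assumptions, not the implementation of
either pass: ``Inst``, ``debugifyModel``, and ``missingLines`` are
hypothetical names, only line numbers are modelled (no variables or scopes),
and a line value of 0 stands for "no debug location".

.. code-block:: cpp

   #include <string>
   #include <vector>

   // Stand-in for an instruction; Line == 0 means "no debug location" here.
   struct Inst {
     std::string Text;
     unsigned Line;
   };

   // Model of debugify: number the instructions 1, 2, 3, ... in order.
   // Returns how many line numbers were handed out.
   unsigned debugifyModel(std::vector<Inst> &Insts) {
     unsigned Next = 0;
     for (Inst &I : Insts)
       I.Line = ++Next;
     return Next;
   }

   // Model of check-debugify: list every line in 1..NumLines that no
   // remaining instruction carries any more.
   std::vector<unsigned> missingLines(const std::vector<Inst> &Insts,
                                      unsigned NumLines) {
     std::vector<bool> Seen(NumLines + 1, false);
     for (const Inst &I : Insts)
       if (I.Line >= 1 && I.Line <= NumLines)
         Seen[I.Line] = true;
     std::vector<unsigned> Missing;
     for (unsigned L = 1; L <= NumLines; ++L)
       if (!Seen[L])
         Missing.push_back(L);
     return Missing;
   }

With five instructions, as in the module above, ``debugifyModel`` hands out
lines 1 through 5. If a transformation then clears the location of the third
instruction, ``missingLines`` returns a list containing only 3, which is the
kind of loss ``check-debugify`` is meant to report.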
 294
 295 Using ``debugify``
 296 ^^^^^^^^^^^^^^^^^^
 297
 298 A simple way to use ``debugify`` is as follows:
 299
 300 .. code-block:: bash
 301
 302   $ opt -debugify -pass-to-test -check-debugify sample.ll
 303
 304 This will inject synthetic DI into ``sample.ll``, run ``pass-to-test``, and
 305 then check for missing DI. The ``-check-debugify`` step can of course be
 306 omitted in favor of more customizable FileCheck directives.
 307
 308 Some other ways to run debugify are available:
 309
 310 .. code-block:: bash
 311
 312    # Same as the above example.
 313    $ opt -enable-debugify -pass-to-test sample.ll
 314
 315    # Suppresses verbose debugify output.
 316    $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll
 317
 318    # Prepend -debugify before and append -check-debugify -strip after
 319    # each pass in the pipeline (similar to -verify-each).
 320    $ opt -debugify-each -O2 sample.ll
 321
 322 In order for ``check-debugify`` to work, the DI must be coming from
 323 ``debugify``. Thus, modules with existing DI will be skipped.
 324
 325 ``debugify`` can be used to test a backend, e.g.:
 326
 327 .. code-block:: bash
 328
 329    $ opt -debugify < sample.ll | llc -o -
 330
 331 There is also a MIR-level debugify pass that can be run before each backend
 332 pass; see
 333 :ref:`Mutation testing for MIR-level transformations<MIRDebugify>`.
 334
 335 ``debugify`` in regression tests
 336 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 337
 338 The output of the ``debugify`` pass must be stable enough to use in regression
 339 tests. Changes to this pass are not allowed to break existing tests.
 340
 341 .. note::
 342
 343    Regression tests must be robust. Avoid hardcoding line/variable numbers in
 344    check lines. In cases where this can't be avoided (say, if a test wouldn't
 345    be precise enough), moving the test to its own file is preferred.
 346
 347 Test original debug info preservation in optimizations
 348 ------------------------------------------------------
 349
 350 In addition to automatically generating debug info, the checks provided by
 351 the ``debugify`` utility pass can also be used to test the preservation of
 352 pre-existing debug info metadata. They can be run as follows:
 353
 354 .. code-block:: bash
 355
 356   # Run the pass while checking original Debug Info preservation.
 357   $ opt -verify-debuginfo-preserve -pass-to-test sample.ll
 358
 359   # Check the preservation of original Debug Info after each pass.
 360   $ opt -verify-each-debuginfo-preserve -O2 sample.ll
 361
 362 Furthermore, the issues that have been found can be exported into a JSON file
 363 as follows:
 364
 365 .. code-block:: bash
 366
 367   $ opt -verify-debuginfo-preserve -verify-di-preserve-export=sample.json -pass-to-test sample.ll
 368
 369 The ``llvm/utils/llvm-original-di-preservation.py`` script can then be used
 370 to generate an HTML page that presents the reported issues in a more
 371 human-readable form:
 372
 373 .. code-block:: bash
 374
 375   $ llvm-original-di-preservation.py sample.json sample.html
 376
 377 Testing of original debug info preservation can also be invoked from the
 378 front end as follows:
 379
 380 .. code-block:: bash
 381
 382   # Test each pass.
 383   $ clang -Xclang -fverify-debuginfo-preserve -g -O2 sample.c
 384
 385   # Test each pass and export the issues report into a JSON file.
 386   $ clang -Xclang -fverify-debuginfo-preserve -Xclang -fverify-debuginfo-preserve-export=sample.json -g -O2 sample.c
 387
 388 Note that there are some known false positives in the source location and
 389 debug intrinsic checks; these are to be addressed in future work.
 390
 391 .. _MIRDebugify:
 392
 393 Mutation testing for MIR-level transformations
 394 ----------------------------------------------
 395
 396 A variant of the ``debugify`` utility described in
 397 :ref:`Mutation testing for IR-level transformations<IRDebugify>` can be used
 398 for MIR-level transformations as well: much like the IR-level pass,
 399 ``mir-debugify`` assigns sequentially increasing line locations to each
 400 ``MachineInstr`` in a ``Module``. The MIR-level ``mir-check-debugify`` pass is
 401 similar to the IR-level ``check-debugify`` pass.
 402
 403 For example, here is a snippet before:
 404
 405 .. code-block:: llvm
 406
 407   name:            test
 408   body:             |
 409     bb.1 (%ir-block.0):
 410       %0:_(s32) = IMPLICIT_DEF
 411       %1:_(s32) = IMPLICIT_DEF
 412       %2:_(s32) = G_CONSTANT i32 2
 413       %3:_(s32) = G_ADD %0, %2
 414       %4:_(s32) = G_SUB %3, %1
 415
 416 and after running ``llc -run-pass=mir-debugify``:
 417
 418 .. code-block:: llvm
 419
 420   name:            test
 421   body:             |
 422     bb.0 (%ir-block.0):
 423       %0:_(s32) = IMPLICIT_DEF debug-location !12
 424       DBG_VALUE %0(s32), $noreg, !9, !DIExpression(), debug-location !12
 425       %1:_(s32) = IMPLICIT_DEF debug-location !13
 426       DBG_VALUE %1(s32), $noreg, !11, !DIExpression(), debug-location !13
 427       %2:_(s32) = G_CONSTANT i32 2, debug-location !14
 428       DBG_VALUE %2(s32), $noreg, !9, !DIExpression(), debug-location !14
 429       %3:_(s32) = G_ADD %0, %2, debug-location !DILocation(line: 4, column: 1, scope: !6)
 430       DBG_VALUE %3(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !6)
 431       %4:_(s32) = G_SUB %3, %1, debug-location !DILocation(line: 5, column: 1, scope: !6)
 432       DBG_VALUE %4(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !6)
 433
 434 By default, ``mir-debugify`` inserts ``DBG_VALUE`` instructions **everywhere**
 435 it is legal to do so.  In particular, every (non-PHI) machine instruction that
 436 defines a register must be followed by a ``DBG_VALUE`` use of that def.  If
 437 an instruction does not define a register but can be followed by a debug
 438 instruction, MIRDebugify inserts a ``DBG_VALUE`` that references a constant.
 439 ``DBG_VALUE`` insertion can be disabled by setting ``-debugify-level=locations``.
 440
 441 To run MIRDebugify once, simply insert ``mir-debugify`` into your ``llc``
 442 invocation, like:
 443
 444 .. code-block:: bash
 445
 446   # Before some other pass.
 447   $ llc -run-pass=mir-debugify,other-pass ...
 448
 449   # After some other pass.
 450   $ llc -run-pass=other-pass,mir-debugify ...
 451
 452 To run MIRDebugify before each pass in a pipeline, use
 453 ``-debugify-and-strip-all-safe``. This can be combined with ``-start-before``
 454 and ``-start-after``. For example:
 455
 456 .. code-block:: bash
 457
 458   $ llc -debugify-and-strip-all-safe -run-pass=... <other llc args>
 459   $ llc -debugify-and-strip-all-safe -O1 <other llc args>
 460
 461 To check debug info after each pass in a pipeline, use
 462 ``-debugify-check-and-strip-all-safe``. This can also be combined with
 463 ``-start-before`` and ``-start-after``. For example:
 464
 465 .. code-block:: bash
 466
 467   $ llc -debugify-check-and-strip-all-safe -run-pass=... <other llc args>
 468   $ llc -debugify-check-and-strip-all-safe -O1 <other llc args>
 469
 470 To check all debug info from a test, use ``mir-check-debugify``, like:
 471
 472 .. code-block:: bash
 473
 474   $ llc -run-pass=mir-debugify,other-pass,mir-check-debugify
 475
 476 To strip out all debug info from a test, use ``mir-strip-debug``, like:
 477
 478 .. code-block:: bash
 479
 480   $ llc -run-pass=mir-debugify,other-pass,mir-strip-debug
 481
 482 It can be useful to combine ``mir-debugify``, ``mir-check-debugify`` and/or
 483 ``mir-strip-debug`` to identify backend transformations which break in
 484 the presence of debug info. For example, to run the AArch64 backend tests
 485 with all normal passes "sandwiched" in between MIRDebugify and
 486 MIRStripDebugify mutation passes, run:
 487
 488 .. code-block:: bash
 489
 490   $ llvm-lit test/CodeGen/AArch64 -Dllc="llc -debugify-and-strip-all-safe"
 491
 492 Using LostDebugLocObserver
 493 --------------------------
 494
 495 TODO