=======================================================
How to Update Debug Info: A Guide for LLVM Pass Authors
=======================================================

.. contents::
   :local:

Introduction
============

Certain kinds of code transformations can inadvertently result in a loss of
debug info, or worse, make debug info misrepresent the state of a program.

This document specifies how to correctly update debug info in various kinds of
code transformations, and offers suggestions for how to create targeted debug
info tests for arbitrary transformations.

For more on the philosophy behind LLVM debugging information, see
:doc:`SourceLevelDebugging`.

Rules for updating debug locations
==================================

.. _WhenToPreserveLocation:

When to preserve an instruction location
----------------------------------------

A transformation should preserve the debug location of an instruction if the
instruction remains in its basic block, or if its basic block is folded into a
predecessor that branches unconditionally. The APIs to use are ``IRBuilder``
or ``Instruction::setDebugLoc``.

The purpose of this rule is to ensure that common block-local optimizations
preserve the ability to set breakpoints on source locations corresponding to
the instructions they touch. Debugging, crash logs, and SamplePGO accuracy
would be severely impacted if that ability were lost.

Examples of transformations that should follow this rule include:

* Instruction scheduling. Block-local instruction reordering should not drop
  source locations, even though this may lead to jumpy single-stepping
  behavior.

* Simple jump threading. For example, if block ``B1`` unconditionally jumps to
  ``B2``, *and* is its unique predecessor, instructions from ``B2`` can be
  hoisted into ``B1``. Source locations from ``B2`` should be preserved.

* Peephole optimizations that replace or expand an instruction, like ``(add X
  X) => (shl X 1)``. The location of the ``shl`` instruction should be the same
  as the location of the ``add`` instruction.

* Tail duplication. For example, if blocks ``B1`` and ``B2`` both
  unconditionally branch to ``B3`` and ``B3`` can be folded into its
  predecessors, source locations from ``B3`` should be preserved.

Examples of transformations for which this rule *does not* apply include:

* LICM. E.g., if an instruction is moved from the loop body to the preheader,
  the rule for :ref:`dropping locations<WhenToDropLocation>` applies.

In addition to the rule above, a transformation should also preserve the debug
location of an instruction that is moved between basic blocks, if the
destination block already contains an instruction with an identical debug
location.

Examples of transformations that should follow this rule include:

* Moving instructions between basic blocks. For example, if instruction ``I1``
  in ``BB1`` is moved before ``I2`` in ``BB2``, the source location of ``I1``
  can be preserved if it has the same source location as ``I2``.

.. _WhenToMergeLocation:

When to merge instruction locations
-----------------------------------

A transformation should merge instruction locations if it replaces multiple
instructions with a single merged instruction, *and* that merged instruction
does not correspond to any of the original instructions' locations. The API to
use is ``Instruction::applyMergedLocation``.

The purpose of this rule is a) to ensure that the single merged instruction
has a location with an accurate scope attached, and b) to prevent misleading
single-stepping (or breakpoint) behavior. Often, merged instructions are memory
accesses which can trap: having an accurate scope attached greatly assists in
crash triage by identifying the (possibly inlined) function where the bad
memory access occurred. This rule is also meant to assist SamplePGO by banning
scenarios in which a sample of a block containing a merged instruction is
misattributed to a block containing one of the instructions-to-be-merged.

Examples of transformations that should follow this rule include:

* Merging identical loads/stores which occur on both sides of a CFG diamond
  (see the ``MergedLoadStoreMotion`` pass).

* Merging identical loop-invariant stores (see the LICM utility
  ``llvm::promoteLoopAccessesToScalars``).

* Peephole optimizations which combine multiple instructions together, like
  ``(add (mul A B) C) => llvm.fma.f32(A, B, C)``.  Note that the location of
  the ``fma`` does not exactly correspond to the locations of either the
  ``mul`` or the ``add`` instructions.

Examples of transformations for which this rule *does not* apply include:

* Block-local peepholes which delete redundant instructions, like
  ``(sext (zext i8 %x to i16) to i32) => (zext i8 %x to i32)``. The inner
  ``zext`` is modified but remains in its block, so the rule for
  :ref:`preserving locations<WhenToPreserveLocation>` should apply.

* Converting an if-then-else CFG diamond into a ``select``. Preserving the
  debug locations of speculated instructions can make it seem like a condition
  is true when it's not (or vice versa), which leads to a confusing
  single-stepping experience. The rule for
  :ref:`dropping locations<WhenToDropLocation>` should apply here.

* Hoisting identical instructions which appear in several successor blocks into
  a predecessor block (see ``BranchFolder::HoistCommonCodeInSuccs``). In this
  case there is no single merged instruction. The rule for
  :ref:`dropping locations<WhenToDropLocation>` applies.

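As a rough illustration of what "an accurate scope" means for a merged
location, the following self-contained sketch models a debug location as a
(line, scope) pair and merges two of them. It is a deliberately simplified
stand-in, not LLVM's implementation: ``Loc``, ``ScopeTree``,
``nearestCommonScope`` and ``mergeLocations`` are hypothetical names, scopes
are plain strings with an explicit parent map, and the assumed policy (keep
the location if both inputs are identical, otherwise use line 0 in the nearest
common scope) only approximates what ``Instruction::applyMergedLocation``
does.

.. code-block:: cpp

  #include <map>
  #include <set>
  #include <string>

  // Toy model of a debug location: a source line plus the name of its scope.
  struct Loc {
    unsigned Line;
    std::string Scope;
    bool operator==(const Loc &Other) const {
      return Line == Other.Line && Scope == Other.Scope;
    }
  };

  // Maps each scope to its parent scope; the outermost scope has no entry.
  // The map is assumed to be acyclic.
  using ScopeTree = std::map<std::string, std::string>;

  // Walks up from both scopes and returns the innermost scope they share,
  // or an empty string if they share none.
  std::string nearestCommonScope(const ScopeTree &Parents, std::string A,
                                 std::string B) {
    std::set<std::string> Ancestors;
    for (;;) {
      Ancestors.insert(A);
      auto It = Parents.find(A);
      if (It == Parents.end())
        break;
      A = It->second;
    }
    for (;;) {
      if (Ancestors.count(B))
        return B;
      auto It = Parents.find(B);
      if (It == Parents.end())
        return "";
      B = It->second;
    }
  }

  // Assumed policy: identical locations survive; otherwise the merged
  // location has line 0 and the nearest common scope.
  Loc mergeLocations(const ScopeTree &Parents, const Loc &A, const Loc &B) {
    if (A == B)
      return A;
    return Loc{0, nearestCommonScope(Parents, A.Scope, B.Scope)};
  }

In this model, merging a load from a ``then`` scope with a load from an
``else`` scope of the same function yields a line 0 location scoped to that
function, which keeps crash attribution correct without pointing at either
branch.
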
.. _WhenToDropLocation:

When to drop an instruction location
------------------------------------

A transformation should drop debug locations if the rules for
:ref:`preserving<WhenToPreserveLocation>` and
:ref:`merging<WhenToMergeLocation>` debug locations do not apply. The API to
use is ``Instruction::dropLocation()``.

The purpose of this rule is to prevent erratic or misleading single-stepping
behavior in situations in which an instruction has no clear, unambiguous
relationship to a source location.

To handle an instruction without a location, the DWARF generator
defaults to allowing the last-set location after a label to cascade forward, or
to setting a line 0 location with viable scope information if no previous
location is available.

See the discussion in the section about
:ref:`merging locations<WhenToMergeLocation>` for examples of when the rule for
dropping locations applies.

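The three rules for debug locations can be read as a small decision procedure.
The sketch below is one possible reading, not code from LLVM:
``TransformFacts`` and ``chooseLocationUpdate`` are hypothetical names, the
boolean fields are a simplification of what a pass actually knows, and the
precedence (check the merge rule first, then the preserve rule, and drop
otherwise) is an assumption chosen so that the examples in the sections above
come out as described.

.. code-block:: cpp

  // Which update a pass should apply to an instruction's debug location.
  enum class LocationUpdate { Preserve, Merge, Drop };

  // Hypothetical summary of what a pass knows about the instruction it
  // created or moved.
  struct TransformFacts {
    bool StaysInBlock;                 // Instruction remains in its block.
    bool FoldedIntoUnconditionalPred;  // Its block was folded into a
                                       // predecessor that branches
                                       // unconditionally.
    bool DestHasIdenticalLocation;     // Moved next to an instruction with
                                       // an identical location.
    bool ReplacesMultipleInstructions; // One merged instruction replaces
                                       // several originals.
    bool MatchesAnOriginalLocation;    // The merged instruction corresponds
                                       // to one original's location.
  };

  LocationUpdate chooseLocationUpdate(const TransformFacts &F) {
    // Merge rule: a single merged instruction with no matching original.
    if (F.ReplacesMultipleInstructions && !F.MatchesAnOriginalLocation)
      return LocationUpdate::Merge;
    // Preserve rule, including the "identical location in destination" case.
    if (F.StaysInBlock || F.FoldedIntoUnconditionalPred ||
        F.DestHasIdenticalLocation)
      return LocationUpdate::Preserve;
    // Neither rule applies, so the location is dropped.
    return LocationUpdate::Drop;
  }

Under these assumptions, the ``(add X X) => (shl X 1)`` peephole maps to
``Preserve``, the ``fma`` combine maps to ``Merge``, and hoisting an
instruction out of a loop in LICM maps to ``Drop``.
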
Rules for updating debug values
===============================

Deleting an IR-level Instruction
--------------------------------

When an ``Instruction`` is deleted, its debug uses change to ``undef``. This is
a loss of debug info: the value of one or more source variables becomes
unavailable, starting with the ``llvm.dbg.value(undef, ...)``. When there is no
way to reconstitute the value of the lost instruction, this is the best
possible outcome. However, it's often possible to do better:

* If the dying instruction can be RAUW'd, do so. The
  ``Value::replaceAllUsesWith`` API transparently updates debug uses of the
  dying instruction to point to the replacement value.

* If the dying instruction cannot be RAUW'd, call ``llvm::salvageDebugInfo`` on
  it. This makes a best-effort attempt to rewrite debug uses of the dying
  instruction by describing its effect as a ``DIExpression``.

* If one of the **operands** of a dying instruction would become trivially
  dead, use ``llvm::replaceAllDbgUsesWith`` to rewrite the debug uses of that
  operand. Consider the following example function:

.. code-block:: llvm

  define i16 @foo(i16 %a) {
    %b = sext i16 %a to i32
    %c = and i32 %b, 15
    call void @llvm.dbg.value(metadata i32 %c, ...)
    %d = trunc i32 %c to i16
    ret i16 %d
  }

Now, here's what happens after the unnecessary truncation instruction ``%d`` is
replaced with a simplified instruction:

.. code-block:: llvm

  define i16 @foo(i16 %a) {
    call void @llvm.dbg.value(metadata i32 undef, ...)
    %simplified = and i16 %a, 15
    ret i16 %simplified
  }

Note that after deleting ``%d``, all uses of its operand ``%c`` become
trivially dead. The debug use which used to point to ``%c`` is now ``undef``,
and debug info is needlessly lost.

To solve this problem, do:

.. code-block:: cpp

  llvm::replaceAllDbgUsesWith(%c, theSimplifiedAndInstruction, ...)

This results in better debug info because the debug use of ``%c`` is preserved:

.. code-block:: llvm

  define i16 @foo(i16 %a) {
    %simplified = and i16 %a, 15
    call void @llvm.dbg.value(metadata i16 %simplified, ...)
    ret i16 %simplified
  }

You may have noticed that ``%simplified`` is narrower than ``%c``: this is not
a problem, because ``llvm::replaceAllDbgUsesWith`` takes care of inserting the
necessary conversion operations into the DIExpressions of updated debug uses.

Deleting a MIR-level MachineInstr
---------------------------------

TODO

Rules for updating ``DIAssignID`` Attachments
=============================================

``DIAssignID`` metadata attachments are used by Assignment Tracking, which is
currently an experimental debug mode.

See :doc:`AssignmentTracking` for how to update them and for more info on
Assignment Tracking.

How to automatically convert tests into debug info tests
========================================================

.. _IRDebugify:

Mutation testing for IR-level transformations
---------------------------------------------

An IR test case for a transformation can, in many cases, be automatically
mutated to test debug info handling within that transformation. This is a
simple way to test for proper debug info handling.

The ``debugify`` utility pass
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``debugify`` testing utility is just a pair of passes: ``debugify`` and
``check-debugify``.

The first applies synthetic debug information to every instruction of the
module, and the second checks that this DI is still available after an
optimization has occurred, reporting any errors/warnings while doing so.

The instructions are assigned sequentially increasing line locations, and are
immediately used by debug value intrinsics everywhere possible.

For example, here is a module before:

.. code-block:: llvm

   define void @f(i32* %x) {
   entry:
     %x.addr = alloca i32*, align 8
     store i32* %x, i32** %x.addr, align 8
     %0 = load i32*, i32** %x.addr, align 8
     store i32 10, i32* %0, align 4
     ret void
   }

and after running ``opt -debugify``:

.. code-block:: llvm

   define void @f(i32* %x) !dbg !6 {
   entry:
     %x.addr = alloca i32*, align 8, !dbg !12
     call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12
     store i32* %x, i32** %x.addr, align 8, !dbg !13
     %0 = load i32*, i32** %x.addr, align 8, !dbg !14
     call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14
     store i32 10, i32* %0, align 4, !dbg !15
     ret void, !dbg !16
   }

   !llvm.dbg.cu = !{!0}
   !llvm.debugify = !{!3, !4}
   !llvm.module.flags = !{!5}

   !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
   !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
   !2 = !{}
   !3 = !{i32 5}
   !4 = !{i32 2}
   !5 = !{i32 2, !"Debug Info Version", i32 3}
   !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
   !7 = !DISubroutineType(types: !2)
   !8 = !{!9, !11}
   !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
   !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
   !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
   !12 = !DILocation(line: 1, column: 1, scope: !6)
   !13 = !DILocation(line: 2, column: 1, scope: !6)
   !14 = !DILocation(line: 3, column: 1, scope: !6)
   !15 = !DILocation(line: 4, column: 1, scope: !6)
   !16 = !DILocation(line: 5, column: 1, scope: !6)

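The numbering scheme visible in the example above can be mimicked with a small
standalone model. This is only a sketch of the idea, not the pass itself:
``ToyInst``, ``DebugifySummary`` and ``toyDebugify`` are hypothetical names,
and the model assumes that every instruction receives the next line number
while only value-producing instructions receive a synthetic variable. The real
pass also creates the compile unit, subprogram and type metadata, which is
omitted here.

.. code-block:: cpp

  #include <string>
  #include <vector>

  // Toy stand-in for an IR instruction.
  struct ToyInst {
    std::string Text;
    bool ProducesValue; // False for instructions such as store and ret.
    unsigned Line;      // Synthetic line number; 0 means "no location".
    unsigned Var;       // Synthetic variable number; 0 means "no variable".
  };

  // Mirrors the two counts recorded in the example's !llvm.debugify node.
  struct DebugifySummary {
    unsigned NumLines;
    unsigned NumVars;
  };

  // Assigns sequentially increasing lines to every instruction and a fresh
  // variable to every value-producing instruction.
  DebugifySummary toyDebugify(std::vector<ToyInst> &Insts) {
    unsigned NextLine = 1, NextVar = 1;
    for (ToyInst &I : Insts) {
      I.Line = NextLine++;
      if (I.ProducesValue)
        I.Var = NextVar++;
    }
    return {NextLine - 1, NextVar - 1};
  }

Applied to the five instructions of ``@f`` (``alloca``, ``store``, ``load``,
``store``, ``ret``), this model yields five lines and two variables, with the
second variable attached to line 3, which matches the ``!3``, ``!4`` and
``!11`` nodes in the listing.
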
Using ``debugify``
^^^^^^^^^^^^^^^^^^

A simple way to use ``debugify`` is as follows:

.. code-block:: bash

  $ opt -debugify -pass-to-test -check-debugify sample.ll

This will inject synthetic DI into ``sample.ll``, run ``pass-to-test``, and
then check for missing DI. The ``-check-debugify`` step can of course be
omitted in favor of more customizable FileCheck directives.

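Conceptually, the check for missing DI compares the line numbers that were
injected with the ones that survive the pass under test. The following sketch
shows that comparison in isolation. ``missingLines`` is a hypothetical helper,
not part of LLVM, and it assumes lines were numbered ``1..NumLines`` as in the
synthetic DI; the real ``check-debugify`` pass also checks variables and
reports its findings as diagnostics.

.. code-block:: cpp

  #include <set>
  #include <vector>

  // Returns, in increasing order, every synthetic line in 1..NumLines that
  // no longer appears on any instruction after the pass under test has run.
  std::vector<unsigned>
  missingLines(unsigned NumLines,
               const std::vector<unsigned> &LinesAfterPass) {
    std::set<unsigned> Seen(LinesAfterPass.begin(), LinesAfterPass.end());
    std::vector<unsigned> Missing;
    for (unsigned Line = 1; Line <= NumLines; ++Line)
      if (!Seen.count(Line))
        Missing.push_back(Line);
    return Missing;
  }

A line that appears on several instructions after the pass (for example,
because an instruction was duplicated) still counts as present in this model;
only lines that vanish entirely are reported.
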
Some other ways to run debugify are available:

.. code-block:: bash

   # Same as the above example.
   $ opt -enable-debugify -pass-to-test sample.ll

   # Suppresses verbose debugify output.
   $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll

   # Prepend -debugify before and append -check-debugify -strip after
   # each pass in the pipeline (similar to -verify-each).
   $ opt -debugify-each -O2 sample.ll

In order for ``check-debugify`` to work, the DI must be coming from
``debugify``. Thus, modules with existing DI will be skipped.

``debugify`` can be used to test a backend, e.g.:

.. code-block:: bash

   $ opt -debugify < sample.ll | llc -o -

There is also a MIR-level debugify pass that can be run before each backend
pass; see
:ref:`Mutation testing for MIR-level transformations<MIRDebugify>`.

``debugify`` in regression tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output of the ``debugify`` pass must be stable enough to use in regression
tests. Changes to this pass are not allowed to break existing tests.

.. note::

   Regression tests must be robust. Avoid hardcoding line/variable numbers in
   check lines. In cases where this can't be avoided (say, if a test wouldn't
   be precise enough), moving the test to its own file is preferred.

Test original debug info preservation in optimizations
------------------------------------------------------

In addition to automatically generating debug info, the checks provided by
the ``debugify`` utility pass can also be used to test the preservation of
pre-existing debug info metadata. They can be run as follows:

.. code-block:: bash

  # Run the pass while checking original Debug Info preservation.
  $ opt -verify-debuginfo-preserve -pass-to-test sample.ll

  # Check the preservation of original Debug Info after each pass.
  $ opt -verify-each-debuginfo-preserve -O2 sample.ll

To speed up the analysis, limit the number of observed functions:

.. code-block:: bash

  # Test up to 100 functions (per compile unit) per pass.
  $ opt -verify-each-debuginfo-preserve -O2 -debugify-func-limit=100 sample.ll

Note that running ``-verify-each-debuginfo-preserve`` on big projects can be
very time consuming. We therefore suggest using ``-debugify-func-limit`` with
a suitable limit to prevent extremely long builds.

Furthermore, the issues that have been found can be exported into a JSON file
as follows:

.. code-block:: bash

  $ opt -verify-debuginfo-preserve -verify-di-preserve-export=sample.json -pass-to-test sample.ll

The ``llvm/utils/llvm-original-di-preservation.py`` script can then be used
to generate an HTML page with the issues reported in a more human-readable
form, as follows:

.. code-block:: bash

  $ llvm-original-di-preservation.py sample.json sample.html

Testing of original debug info preservation can be invoked from the front-end
level as follows:

.. code-block:: bash

  # Test each pass.
  $ clang -Xclang -fverify-debuginfo-preserve -g -O2 sample.c

  # Test each pass and export the issues report into the JSON file.
  $ clang -Xclang -fverify-debuginfo-preserve -Xclang -fverify-debuginfo-preserve-export=sample.json -g -O2 sample.c

Note that there are some known false positives in the source location and
debug intrinsic checks; these will be addressed in future work.

.. _MIRDebugify:

Mutation testing for MIR-level transformations
----------------------------------------------

A variant of the ``debugify`` utility described in
:ref:`Mutation testing for IR-level transformations<IRDebugify>` can be used
for MIR-level transformations as well: much like the IR-level pass,
``mir-debugify`` assigns sequentially increasing line locations to each
``MachineInstr`` in a ``Module``. The MIR-level ``mir-check-debugify`` pass is
similar to the IR-level ``check-debugify`` pass.

For example, here is a snippet before:

.. code-block:: llvm

  name:            test
  body:             |
    bb.1 (%ir-block.0):
      %0:_(s32) = IMPLICIT_DEF
      %1:_(s32) = IMPLICIT_DEF
      %2:_(s32) = G_CONSTANT i32 2
      %3:_(s32) = G_ADD %0, %2
      %4:_(s32) = G_SUB %3, %1

and after running ``llc -run-pass=mir-debugify``:

.. code-block:: llvm

  name:            test
  body:             |
    bb.0 (%ir-block.0):
      %0:_(s32) = IMPLICIT_DEF debug-location !12
      DBG_VALUE %0(s32), $noreg, !9, !DIExpression(), debug-location !12
      %1:_(s32) = IMPLICIT_DEF debug-location !13
      DBG_VALUE %1(s32), $noreg, !11, !DIExpression(), debug-location !13
      %2:_(s32) = G_CONSTANT i32 2, debug-location !14
      DBG_VALUE %2(s32), $noreg, !9, !DIExpression(), debug-location !14
      %3:_(s32) = G_ADD %0, %2, debug-location !DILocation(line: 4, column: 1, scope: !6)
      DBG_VALUE %3(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !6)
      %4:_(s32) = G_SUB %3, %1, debug-location !DILocation(line: 5, column: 1, scope: !6)
      DBG_VALUE %4(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !6)

By default, ``mir-debugify`` inserts ``DBG_VALUE`` instructions **everywhere**
it is legal to do so.  In particular, every (non-PHI) machine instruction that
defines a register must be followed by a ``DBG_VALUE`` use of that def.  If
an instruction does not define a register, but can be followed by a debug
instruction, MIRDebugify inserts a ``DBG_VALUE`` that references a constant.
Insertion of ``DBG_VALUE`` instructions can be disabled by setting
``-debugify-level=locations``.

To run MIRDebugify once, simply insert ``mir-debugify`` into your ``llc``
invocation, like:

.. code-block:: bash

  # Before some other pass.
  $ llc -run-pass=mir-debugify,other-pass ...

  # After some other pass.
  $ llc -run-pass=other-pass,mir-debugify ...

To run MIRDebugify before each pass in a pipeline, use
``-debugify-and-strip-all-safe``. This can be combined with ``-start-before``
and ``-start-after``. For example:

.. code-block:: bash

  $ llc -debugify-and-strip-all-safe -run-pass=... <other llc args>
  $ llc -debugify-and-strip-all-safe -O1 <other llc args>

To also check the debug info after each pass in a pipeline, use
``-debugify-check-and-strip-all-safe``. This can also be combined with
``-start-before`` and ``-start-after``. For example:

.. code-block:: bash

  $ llc -debugify-check-and-strip-all-safe -run-pass=... <other llc args>
  $ llc -debugify-check-and-strip-all-safe -O1 <other llc args>

To check all debug info from a test, use ``mir-check-debugify``, like:

.. code-block:: bash

  $ llc -run-pass=mir-debugify,other-pass,mir-check-debugify

To strip out all debug info from a test, use ``mir-strip-debug``, like:

.. code-block:: bash

  $ llc -run-pass=mir-debugify,other-pass,mir-strip-debug

It can be useful to combine ``mir-debugify``, ``mir-check-debugify`` and/or
``mir-strip-debug`` to identify backend transformations which break in
the presence of debug info. For example, to run the AArch64 backend tests
with all normal passes "sandwiched" in between MIRDebugify and
MIRStripDebugify mutation passes, run:

.. code-block:: bash

  $ llvm-lit test/CodeGen/AArch64 -Dllc="llc -debugify-and-strip-all-safe"

Using LostDebugLocObserver
--------------------------

TODO