llvm/docs/InstrRefDebugInfo.md

# Instruction referencing for debug info

This document explains how LLVM uses value tracking, or instruction
referencing, to determine variable locations for debug info in the code
generation stage of compilation. This content is aimed at those working on code
generation targets and optimisation passes. It may also be of interest to anyone
curious about low-level debug info handling.

# Problem statement

At the end of compilation, LLVM must produce a DWARF location list (or similar)
describing what register or stack location a variable can be found in, for each
instruction in that variable's lexical scope. We could track the virtual
register that the variable resides in through compilation; however, this is
vulnerable to register optimisations during regalloc, and to instruction
movements.

# Solution: instruction referencing

Rather than identifying the virtual register that a variable value resides in,
LLVM in instruction referencing mode refers to the machine instruction
and operand position that the value is defined in. Consider the LLVM IR way of
referring to instruction values:

```llvm
%2 = add i32 %0, %1
  #dbg_value(metadata i32 %2,
```

In LLVM IR, the IR Value is synonymous with the instruction that computes the
value, to the extent that in memory a Value is a pointer to the computing
instruction. Instruction referencing implements this relationship in the
codegen backend of LLVM, after instruction selection. Consider the X86 machine
IR (MIR) below, with its instruction referencing debug info, corresponding to
the earlier LLVM IR:

```text
%2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
```

While the function remains in SSA form, virtual register `%2` is sufficient to
identify the value computed by the instruction. However, the function
eventually leaves SSA form, and register optimisations will obscure which
register the desired value is in. A more consistent way of identifying
the instruction's value is to refer to the `MachineOperand` where the value is
defined, independently of which register is defined by that `MachineOperand`. In
the code above, the `DBG_INSTR_REF` instruction refers to instruction number
one, operand zero, while the `ADD32rr` has a `debug-instr-number` attribute
attached indicating that it is instruction number one.
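
The lookup described above can be sketched as a toy model. This is not LLVM
code: `ToyInstr`, `InstrRef` and `resolveRef` are hypothetical names invented
for illustration, and registers are represented as plain strings. It only shows
why an (instruction number, operand index) pair keeps identifying the value
after the register named by the operand changes.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  unsigned DebugInstrNum;            // 0 means "no number attached"
  std::vector<std::string> Operands; // register named by each operand
};

// Hypothetical stand-in for DBG_INSTR_REF: (instruction number, operand index).
using InstrRef = std::pair<unsigned, unsigned>;

// Return the register currently named by the referenced operand, or nullopt
// if no instruction carries that number (the "optimised out" case).
std::optional<std::string> resolveRef(const std::vector<ToyInstr> &Fn,
                                      InstrRef Ref) {
  for (const ToyInstr &MI : Fn)
    if (MI.DebugInstrNum != 0 && MI.DebugInstrNum == Ref.first &&
        Ref.second < MI.Operands.size())
      return MI.Operands[Ref.second];
  return std::nullopt;
}
```

In this sketch, rewriting operand zero of instruction number one from `%2` to
`$eax` changes what `resolveRef` returns without touching the reference itself.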

De-coupling variable locations from registers avoids difficulties involving
register allocation and optimisation, but requires additional instrumentation
when the instructions themselves are optimised. Optimisations that replace
instructions with optimised versions that compute the same value must either
preserve the instruction number, or record a substitution from the old
instruction / operand number pair to the new instruction / operand number pair
(see `MachineFunction::substituteDebugValuesForInst`). If debug info
maintenance is not performed, or an instruction is eliminated as dead code, the
variable location is safely dropped and marked "optimised out". The exception
is instructions that are mutated rather than replaced, which always need debug
info maintenance.

# Register allocator considerations

When the register allocator runs, debugging instructions do not directly refer
to any virtual registers, and thus there is no need for expensive location
maintenance during regalloc (i.e. `LiveDebugVariables`). Debug instructions are
unlinked from the function, then linked back in after register allocation
completes.

The exception is `PHI` instructions: these become implicit definitions at
control flow merges once regalloc finishes, and any debug numbers attached to
`PHI` instructions are lost. To circumvent this, debug numbers of `PHI`s are
recorded at the start of register allocation (`phi-node-elimination`), then
`DBG_PHI` instructions are inserted after regalloc finishes. This requires some
maintenance of which register a variable is located in during regalloc, but at
single positions (block entry points) rather than ranges of instructions.

An example, before regalloc:

```text
bb.2:
  %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
```

After:

```text
bb.2:
  DBG_PHI $rax, 1
```

# `LiveDebugValues`

After optimisations and code layout complete, information about variable
values must be translated into variable locations, i.e. registers and stack
slots. This is performed in the [`LiveDebugValues` pass][LiveDebugValues], where
the debug instructions and machine code are separated out into two independent
functions:
 * One that assigns values to variable names,
 * One that assigns values to machine registers and stack slots.

LLVM's existing SSA tools are used to place `PHI`s for each function, between
variable values and the values contained in machine locations, with value
propagation eliminating any unnecessary `PHI`s. The two can then be joined up
to map variables to values, then values to locations, for each instruction in
the function.

Key to this process is being able to identify the movement of values between
registers and stack locations, so that the location of values can be preserved
for the full time that they are resident in the machine.
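
As a rough illustration of the final join, the sketch below assumes both
mappings have already been computed for a single program point. It is a toy
model rather than the pass's actual data structures: `ValueID` and
`locateVariable` are hypothetical names, and variables and locations are plain
strings.

```cpp
#include <map>
#include <optional>
#include <string>

using ValueID = unsigned; // hypothetical value number

// Join "variable -> value" with "location -> value" at one program point.
// Returns the first location (in map order) holding the variable's value,
// or nullopt if the variable is unknown or its value is in no location.
std::optional<std::string>
locateVariable(const std::map<std::string, ValueID> &VarToValue,
               const std::map<std::string, ValueID> &LocToValue,
               const std::string &Var) {
  auto VarIt = VarToValue.find(Var);
  if (VarIt == VarToValue.end())
    return std::nullopt;
  for (const auto &[Loc, Val] : LocToValue)
    if (Val == VarIt->second)
      return Loc;
  return std::nullopt;
}
```

Keeping the two maps separate means a register being overwritten only changes
the location map; the variable's value assignment is untouched, and the
variable simply has no location until its value appears somewhere again.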

# Required target support and transition guide

Instruction referencing will work on any target, but likely with poor coverage.
Supporting instruction referencing well requires:
 * Target hooks to be implemented to allow `LiveDebugValues` to follow values
   through the machine,
 * Target-specific optimisations to be instrumented, to preserve instruction
   numbers.

## Target hooks

`TargetInstrInfo::isCopyInstrImpl` must be implemented to recognise any
instructions that are copy-like; `LiveDebugValues` uses this to identify when
values move between registers.

`TargetInstrInfo::isLoadFromStackSlotPostFE` and
`TargetInstrInfo::isStoreToStackSlotPostFE` are needed to identify spill and
restore instructions. Each should return the destination or source register
respectively. `LiveDebugValues` will track the movement of a value from / to
the stack slot. In addition, any instruction that writes to a stack spill slot
should have a `MachineMemOperand` attached, so that `LiveDebugValues` can
recognise that a slot has been clobbered.
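
To show why these hooks matter, here is a toy model of value tracking driven by
the kinds of events the hooks let `LiveDebugValues` recognise. Everything in it
is an assumption made for illustration (`Kind`, `Event`, `trackLocations`,
string-named locations); it does not call or reproduce any LLVM API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical event kinds, standing in for what the target hooks report.
enum class Kind { Def, Copy, Spill, Restore, Clobber };

struct Event {
  Kind K;
  std::string Dst; // register or stack slot written
  std::string Src; // location read (unused for Def and Clobber)
  unsigned Value;  // value number defined (Def only)
};

// Apply events in order and return which value each location holds.
std::map<std::string, unsigned>
trackLocations(const std::vector<Event> &Events) {
  std::map<std::string, unsigned> Locs;
  for (const Event &E : Events) {
    switch (E.K) {
    case Kind::Def:
      Locs[E.Dst] = E.Value;
      break;
    case Kind::Copy:
    case Kind::Spill:
    case Kind::Restore: {
      // The value moves: the destination now holds whatever the source held.
      auto It = Locs.find(E.Src);
      if (It != Locs.end())
        Locs[E.Dst] = It->second;
      else
        Locs.erase(E.Dst); // unknown source, so the destination is unknown
      break;
    }
    case Kind::Clobber:
      Locs.erase(E.Dst);
      break;
    }
  }
  return Locs;
}
```

If a spill were not recognised as a move in this model, the value would appear
lost as soon as its register was clobbered, even though it still lives in the
stack slot.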

## Target-specific optimisation instrumentation

Optimisations come in two flavours: those that mutate a `MachineInstr` to make
it do something different, and those that create a new instruction to replace
the operation of the old.

The former _must_ be instrumented. The relevant question is whether any
register def in any operand will produce a different value as a result of the
mutation. If the answer is yes, then there is a risk that a `DBG_INSTR_REF`
instruction referring to that operand will end up assigning the different
value to a variable, presenting the debugging developer with an unexpected
variable value. In such scenarios, call `MachineInstr::dropDebugNumber()` on the
mutated instruction to erase its instruction number. Any `DBG_INSTR_REF`
referring to it will produce an empty variable location instead, which appears
as "optimised out" in the debugger.
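
The rule for mutated instructions can be sketched with a toy type. Only the
method name `dropDebugNumber` mirrors the real `MachineInstr` method; the
`ToyInstr` struct, the `mutateInPlace` helper and the opcode strings are
hypothetical and exist purely for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a machine instruction; not LLVM's MachineInstr.
struct ToyInstr {
  std::string Opcode;
  unsigned DebugInstrNum = 0; // 0 means "no number attached"
  void dropDebugNumber() { DebugInstrNum = 0; }
};

// Mutate an instruction in place. The caller states whether any def now
// produces a different value; if so, the number is erased so that debug
// references to it resolve to nothing instead of to the wrong value.
void mutateInPlace(ToyInstr &MI, std::string NewOpcode, bool DefValueChanges) {
  MI.Opcode = std::move(NewOpcode);
  if (DefValueChanges)
    MI.dropDebugNumber();
}
```

The design choice mirrored here is that a missing location is preferable to a
misleading one: when in doubt about whether the value changed, dropping the
number is the conservative option.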

For the latter flavour of optimisation, to increase coverage you should record
an instruction number substitution: a mapping from the old instruction number /
operand pair to the new instruction number / operand pair. Consider replacing
a three-address add instruction with a two-address add:

```text
%2:gr32 = ADD32rr %0, %1, debug-instr-number 1
```

becomes

```text
%2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
```

with a substitution from "instruction number 1 operand 0" to "instruction number
2 operand 0" recorded in the `MachineFunction`. In `LiveDebugValues`, each
`DBG_INSTR_REF` is mapped through the substitution table to find the most
recent instruction number / operand number of the value it refers to.
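
A minimal sketch of mapping a reference through a substitution table follows.
It assumes the table is a simple ordered map and that substitutions may chain;
`InstrOperand`, `SubstitutionTable` and `followSubstitutions` are hypothetical
names, and the real pass may store and search its table differently.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// (instruction number, operand index)
using InstrOperand = std::pair<unsigned, unsigned>;
// old (instruction, operand) -> new (instruction, operand)
using SubstitutionTable = std::map<InstrOperand, InstrOperand>;

// Follow substitutions until reaching a pair with no further entry.
// The step count is bounded by the table size so that a malformed, cyclic
// table cannot make the loop run forever.
InstrOperand followSubstitutions(const SubstitutionTable &Subs,
                                 InstrOperand Ref) {
  for (std::size_t Step = 0; Step < Subs.size(); ++Step) {
    auto It = Subs.find(Ref);
    if (It == Subs.end())
      break;
    Ref = It->second;
  }
  return Ref;
}
```

With entries (1, 0) to (2, 0) and (2, 0) to (3, 1), a reference to instruction
1 operand 0 resolves to instruction 3 operand 1, while a reference with no
entry is returned unchanged.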

Use `MachineFunction::substituteDebugValuesForInst` to automatically produce
substitutions between an old and new instruction. It assumes that any operand
that is a def in the old instruction is a def in the new instruction at the
same operand position. This works most of the time, as in the example above.

If operand numbers do not line up between the old and new instruction, use
`MachineInstr::getDebugInstrNum` to acquire the instruction number for the new
instruction, and `MachineFunction::makeDebugValueSubstitution` to record the
mapping between register definitions in the old and new instructions. If some
values computed by the old instruction are no longer computed by the new
instruction, record no substitution: `LiveDebugValues` will safely drop the
now unavailable variable value.

Should your target clone instructions, much as the `TailDuplicator`
optimisation pass does, do not attempt to preserve the instruction numbers or
record any substitutions. `MachineFunction::CloneMachineInstr` should drop the
instruction number of any cloned instruction, to avoid duplicate numbers
appearing to `LiveDebugValues`. Dealing with duplicated instructions is a
natural extension to instruction referencing that's currently unimplemented.

[LiveDebugValues]: project:SourceLevelDebugging.rst#LiveDebugValues expansion of variable locations