# Debug info migration: From intrinsics to records

We're planning on removing debug info intrinsics from LLVM, as they're slow, unwieldy and can confuse optimisation passes if they're not expecting them. Instead of having a sequence of instructions that looks like this:

```text
    %add = add i32 %foo, %bar
    call void @llvm.dbg.value(metadata %add, ...
    %sub = sub i32 %add, %tosub
    call void @llvm.dbg.value(metadata %sub, ...
    call void @a_normal_function()
```

with `dbg.value` intrinsics representing debug info records, it would instead be printed as:

```text
    %add = add i32 %foo, %bar
      #dbg_value(%add, ...
    %sub = sub i32 %add, %tosub
      #dbg_value(%sub, ...
    call void @a_normal_function()
```

The debug records are not instructions, do not appear in the instruction list, and won't appear in your optimisation passes unless you go digging for them deliberately.
# Great, what do I need to do!

Very little -- we've already instrumented all of LLVM to handle these new records ("`DbgRecords`") and behave identically to past LLVM behaviour. This is currently being turned on by default, so that `DbgRecords` will be used by default in memory, IR, and bitcode.
## API Changes

There are two significant changes to be aware of. Firstly, we're adding a single bit of debug-relevant data to the `BasicBlock::iterator` class, so that we can determine whether a range is intended to include the debug info at the beginning of a block or not. That means when writing passes that insert LLVM IR instructions, you need to identify positions with `BasicBlock::iterator` rather than just a bare `Instruction *`. Most of the time this means that after identifying where you intend to insert something, you must also call `getIterator` on the instruction position; however, when inserting at the start of a block you _must_ use `getFirstInsertionPt`, `getFirstNonPHIIt` or `begin` and use that iterator to insert, rather than just fetching a pointer to the first instruction.

The second matter is that if you transfer sequences of instructions from one place to another manually, i.e. repeatedly using `moveBefore` where you might have used `splice`, then you should instead use the method `moveBeforePreserving`. `moveBeforePreserving` will transfer debug info records with the instruction they're attached to. This happens automatically today: if you use `moveBefore` on every element of an instruction sequence, then debug intrinsics will be moved in the normal course of your code, but we lose this behaviour with non-instruction debug info.
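Both changes can be pictured with a small toy model. This is not LLVM code. `ToyInst`, `ToyBlock`, `ToyPos`, `toyPrint`, `toyInsertBefore` and `toyMoveBefore` are hypothetical names. The model assumes that each instruction simply owns the debug records printed immediately before it.

Three behaviours are modelled:

- **Head bit set.** This is the kind of position the text says to obtain from `begin`, `getFirstInsertionPt` or `getFirstNonPHIIt`. A new instruction inserted there lands in front of the attached records.
- **Plain position.** A new instruction lands behind the attached records.
- **Non-preserving move.** The moved instruction's records are left where they were.

To keep the sketch short, moves are only modelled onto a destination that has no records of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model, not LLVM's API: each instruction owns the debug records that are
// printed immediately before it.
struct ToyInst {
  std::string Name;
  std::vector<std::string> Records;
};
using ToyBlock = std::vector<ToyInst>;

// An insertion position: an instruction index plus the extra "head" bit.
struct ToyPos {
  std::size_t Index;
  bool HeadBit;
};

// Flatten a block into the order in which it would be printed.
std::vector<std::string> toyPrint(const ToyBlock &BB) {
  std::vector<std::string> Out;
  for (const ToyInst &I : BB) {
    Out.insert(Out.end(), I.Records.begin(), I.Records.end());
    Out.push_back(I.Name);
  }
  return Out;
}

// Insert a new instruction before BB[Pos.Index].
void toyInsertBefore(ToyBlock &BB, ToyPos Pos, const std::string &Name) {
  assert(Pos.Index < BB.size() && "position must name an instruction");
  ToyInst New{Name, {}};
  if (!Pos.HeadBit) {
    // No head bit: the new instruction lands after the records attached to
    // the position, so it takes them over.
    New.Records = std::move(BB[Pos.Index].Records);
    BB[Pos.Index].Records.clear();
  }
  // With the head bit the records stay put and the new instruction is
  // printed in front of them.
  BB.insert(BB.begin() + static_cast<std::ptrdiff_t>(Pos.Index), std::move(New));
}

// Move BB[From] to just before BB[To]. Preserve == false models moveBefore,
// Preserve == true models moveBeforePreserving. For simplicity the destination
// instruction must carry no records of its own.
void toyMoveBefore(ToyBlock &BB, std::size_t From, std::size_t To,
                   bool Preserve) {
  assert(From != To && To < BB.size() && From + 1 < BB.size() &&
         BB[To].Records.empty() && "unsupported move in this toy model");
  ToyInst Moved = std::move(BB[From]);
  BB.erase(BB.begin() + static_cast<std::ptrdiff_t>(From));
  if (!Preserve) {
    // The records do not travel: they now precede the instruction that used
    // to follow the moved one (which is BB[From] after the erase).
    std::vector<std::string> &Next = BB[From].Records;
    Next.insert(Next.begin(), Moved.Records.begin(), Moved.Records.end());
    Moved.Records.clear();
  }
  if (To > From)
    --To; // account for the erased slot
  BB.insert(BB.begin() + static_cast<std::ptrdiff_t>(To), std::move(Moved));
}
```

The following examples use a block printed as `#dbg_value(%a)`, `%a`, `ret`:

- Inserting `%new` with the head bit set gives `%new`, `#dbg_value(%a)`, `%a`, `ret`.
- Inserting `%new` without the head bit gives `#dbg_value(%a)`, `%new`, `%a`, `ret`.
- Moving `%a` to just before `ret` with the non-preserving variant leaves `#dbg_value(%a)` behind, in front of whatever followed `%a`.
- The preserving variant carries `#dbg_value(%a)` along with `%a`.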
For a more in-depth overview of how to update existing code to support debug records, see [the guide below](#how-to-update-existing-code).

## Textual IR Changes

As we change from using debug intrinsics to debug records, any tools that depend on parsing IR produced by LLVM will need to handle the new format. For the most part, the difference between the printed form of a debug intrinsic call and a debug record is trivial:

1. An extra 2 spaces of indentation are added.
2. The text `(tail|notail|musttail)? call void @llvm.dbg.<type>` is replaced with `#dbg_<type>`.
3. The leading `metadata ` is removed from each argument to the intrinsic.
4. The DILocation changes from being an instruction attachment with the format `!dbg !<Num>` to being an ordinary argument, i.e. `!<Num>`, that is passed as the final argument to the debug record.

Following these rules, we have this example of a debug intrinsic and the equivalent debug record:

```text
; Debug intrinsic:
  call void @llvm.dbg.value(metadata i32 %add, metadata !10, metadata !DIExpression()), !dbg !20
; Debug record:
    #dbg_value(i32 %add, !10, !DIExpression(), !20)
```
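To show how mechanical these rules are, the sketch below applies them to a single printed call using `std::regex`. The function name `intrinsicToRecord` is hypothetical. The sketch makes two assumptions:

- The whole call, including its `!dbg` attachment, sits on one line.
- No other text on that line contains `metadata `.

This is not how LLVM itself prints records.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: rewrite one printed debug intrinsic call into the
// equivalent debug record by applying the four rules above.
std::string intrinsicToRecord(const std::string &Line) {
  // Rule 2: "(tail|notail|musttail)? call void @llvm.dbg.<type>" -> "#dbg_<type>".
  static const std::regex CallPrefix(
      R"(((tail|notail|musttail) )?call void @llvm\.dbg\.([a-z]+))");
  // Rule 3: drop the leading "metadata " from every argument.
  static const std::regex MetadataKeyword("metadata ");
  // Rule 4: the trailing ", !dbg !N" attachment becomes the final argument.
  static const std::regex DbgAttachment(R"(\), !dbg (![0-9]+))");

  std::string Out = std::regex_replace(Line, CallPrefix, "#dbg_$3");
  Out = std::regex_replace(Out, MetadataKeyword, "");
  Out = std::regex_replace(Out, DbgAttachment, ", $1)");
  // Rule 1: two extra spaces of indentation.
  return "  " + Out;
}
```

Applied to the intrinsic in the example above, it produces the record line shown there. The record line gains two more spaces of indentation than the input had.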
### Test updates

Any tests downstream of the main LLVM repo that test the IR output of LLVM may break as a result of the change to using records. Updating an individual test to expect records instead of intrinsics should be trivial, given the update rules above. Updating many tests may be burdensome, however; to update the lit tests in the main repository, the following steps were used:

1. Collect the list of failing lit tests into a single file, `failing-tests.txt`, separated by (and ending with) newlines.
2. Use the following line to split the failing tests into tests that use update_test_checks and tests that don't:
    ```
    $ while IFS= read -r f; do grep -q "Assertions have been autogenerated by" "$f" && echo "$f" >> update-checks-tests.txt || echo "$f" >> manual-tests.txt; done < failing-tests.txt
    ```
3. For the tests that use update_test_checks, run the appropriate update_test_checks script. For the main LLVM repo, this was achieved with:
    ```
    $ xargs ./llvm/utils/update_test_checks.py --opt-binary ./build/bin/opt < update-checks-tests.txt
    $ xargs ./llvm/utils/update_cc_test_checks.py --llvm-bin ./build/bin/ < update-checks-tests.txt
    ```
4. The remaining tests can be updated manually, although if there is a large number of tests then the following scripts may be useful. Firstly, a script used to extract the check-line prefixes from a file:
    ```
    $ cat ./get-checks.sh
    #!/bin/bash

    # Always add CHECK, since it's more effort than it's worth to filter files where
    # every RUN line uses other check prefixes.
    # Then detect every instance of "check-prefix(es)=..." and add the
    # comma-separated arguments as extra checks.
    for filename in "$@"
    do
      echo "$filename,CHECK"
      allchecks=$(grep -Eo 'check-prefix(es)?[ =][A-Z0-9_,-]+' "$filename" | sed -E 's/.+[= ]([A-Z0-9_,-]+).*/\1/g; s/,/\n/g')
      for check in $allchecks; do
        echo "$filename,$check"
      done
    done
    ```
    Then a second script performs the work of actually updating the check-lines in each of the failing tests, with a series of simple substitution patterns:
    ```
    $ cat ./substitute-checks.sh
    #!/bin/bash

    file="$1"
    check="$2"

    # Any test that explicitly tests debug intrinsic output is not suitable to
    # update by this script.
    if grep -q "write-experimental-debuginfo=false" "$file"; then
        exit 0
    fi

    sed -i -E -e "
    /(#|;|\/\/).*$check[A-Z0-9_\-]*:/!b
    /DIGlobalVariableExpression/b
    /!llvm.dbg./bpostcall
    s/((((((no|must)?tail )?call.*)?void )?@)?llvm.)?dbg\.([a-z]+)/#dbg_\7/
    :postcall
    /declare #dbg_/d
    s/metadata //g
    s/DIExpression\(([^)]*)\)\)(,( !dbg)?)?/DIExpression(\1),/
    /#dbg_/!b
    s/((\))?(,) )?!dbg (![0-9]+)/\3\4\2/
    s/((\))?(, ))?!dbg/\3/
    " "$file"
    ```
    Both of these scripts combined can be used on the list in `manual-tests.txt` as follows:
    ```
    $ cat manual-tests.txt | xargs ./get-checks.sh | sort | uniq | awk -F ',' '{ system("./substitute-checks.sh " $1 " " $2) }'
    ```
    These scripts dealt successfully with the vast majority of checks in `clang/test` and `llvm/test`.
5. Verify the resulting tests pass, and detect any failing tests:
    ```
    $ xargs ./build/bin/llvm-lit -q < failing-tests.txt
      LLVM :: DebugInfo/Generic/dbg-value-lower-linenos.ll
      LLVM :: Transforms/HotColdSplit/transfer-debug-info.ll
      LLVM :: Transforms/ObjCARC/basic.ll
      LLVM :: Transforms/ObjCARC/ensure-that-exception-unwind-path-is-visited.ll
      LLVM :: Transforms/SafeStack/X86/debug-loc2.ll

    Total Discovered Tests: 295
    ```
6. Some tests may have failed. The update scripts are simplistic and preserve no context across lines, so there are cases that they will not handle; the remaining cases must be updated manually (or handled by further scripts).
## C-API changes

Some new functions that have been added are temporary and will be deprecated in the future. The intention is that they'll help downstream projects adapt during the transition period.

```
Deleted functions
-----------------
LLVMDIBuilderInsertDeclareBefore   # Insert a debug record (new debug info format) instead of a debug intrinsic (old debug info format).
LLVMDIBuilderInsertDeclareAtEnd    # Same as above.
LLVMDIBuilderInsertDbgValueBefore  # Same as above.
LLVMDIBuilderInsertDbgValueAtEnd   # Same as above.

New functions (to be deprecated)
--------------------------------
LLVMIsNewDbgInfoFormat     # Returns true if the module is in the new non-instruction mode.
LLVMSetIsNewDbgInfoFormat  # Convert to the requested debug info format.

New functions (no plans to deprecate)
-------------------------------------
LLVMGetFirstDbgRecord                    # Obtain the first debug record attached to an instruction.
LLVMGetLastDbgRecord                     # Obtain the last debug record attached to an instruction.
LLVMGetNextDbgRecord                     # Get next debug record or NULL.
LLVMGetPreviousDbgRecord                 # Get previous debug record or NULL.
LLVMDIBuilderInsertDeclareRecordBefore   # Insert a debug record (new debug info format).
LLVMDIBuilderInsertDeclareRecordAtEnd    # Same as above. See info below.
LLVMDIBuilderInsertDbgValueRecordBefore  # Same as above. See info below.
LLVMDIBuilderInsertDbgValueRecordAtEnd   # Same as above. See info below.

LLVMPositionBuilderBeforeDbgRecords          # See info below.
LLVMPositionBuilderBeforeInstrAndDbgRecords  # See info below.
```
`LLVMDIBuilderInsertDeclareRecordBefore`, `LLVMDIBuilderInsertDeclareRecordAtEnd`, `LLVMDIBuilderInsertDbgValueRecordBefore` and `LLVMDIBuilderInsertDbgValueRecordAtEnd` replace the deleted `LLVMDIBuilderInsertDeclareBefore`-style functions.

`LLVMPositionBuilderBeforeDbgRecords` and `LLVMPositionBuilderBeforeInstrAndDbgRecords` behave the same as `LLVMPositionBuilder` and `LLVMPositionBuilderBefore`, except that the insertion position is set before the debug records that precede the target instruction. Note that this doesn't mean that debug intrinsics before the chosen instruction are skipped. Only debug records are skipped, because unlike debug intrinsics they are not themselves instructions.

If you don't know which function to call, follow this rule: if you are trying to insert at the start of a block, or purposefully skip debug intrinsics to determine the insertion point for any other reason, then call the new functions.

`LLVMPositionBuilder` and `LLVMPositionBuilderBefore` are unchanged. They insert before the indicated instruction but after any attached debug records.

`LLVMGetFirstDbgRecord`, `LLVMGetLastDbgRecord`, `LLVMGetNextDbgRecord` and `LLVMGetPreviousDbgRecord` can be used for iterating over debug records attached to instructions (provided as `LLVMValueRef`). Forward iteration looks like this:

```c
LLVMDbgRecordRef DbgRec;
for (DbgRec = LLVMGetFirstDbgRecord(Inst); DbgRec;
     DbgRec = LLVMGetNextDbgRecord(DbgRec)) {
  // do something with DbgRec
}
```

Reverse iteration looks like this:

```c
LLVMDbgRecordRef DbgRec;
for (DbgRec = LLVMGetLastDbgRecord(Inst); DbgRec;
     DbgRec = LLVMGetPreviousDbgRecord(DbgRec)) {
  // do something with DbgRec
}
```
# The new "Debug Record" model

Below is a brief overview of the new representation that replaces debug intrinsics; for an instructive guide on updating old code, see [here](#how-to-update-existing-code).

## What exactly have you replaced debug intrinsics with?

We're using a dedicated C++ class called `DbgRecord` to store debug info, with a one-to-one relationship between each instance of a debug intrinsic and each `DbgRecord` object in any LLVM IR program. These `DbgRecord`s are represented in the IR as non-instruction debug records, as described in the [Source Level Debugging](project:SourceLevelDebugging.rst#Debug Records) document. This class has a set of subclasses that store exactly the same information as is stored in debugging intrinsics. Each one also has almost entirely the same set of methods, which behave in the same way:

- https://llvm.org/docs/doxygen/classllvm_1_1DbgRecord.html
- https://llvm.org/docs/doxygen/classllvm_1_1DbgVariableRecord.html
- https://llvm.org/docs/doxygen/classllvm_1_1DbgLabelRecord.html

This allows you to treat a `DbgVariableRecord` as if it's a `dbg.value`/`dbg.declare`/`dbg.assign` intrinsic most of the time, for example in generic (auto-param) lambdas, and the same for `DbgLabelRecord` and `dbg.label`s.
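The generic-lambda pattern can be illustrated with two hypothetical stand-in types, `ToyDbgValueIntrinsic` and `ToyDbgVariableRecord`. These are not LLVM classes. Like the real pair, they share method names without sharing a base class, so a single `auto`-parameter lambda can work on both. The helper `describeAll` is also illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: unrelated types that expose the same method names,
// in the way DbgVariableRecord mirrors the interface of the dbg.value intrinsic.
struct ToyDbgValueIntrinsic {
  std::string Var;
  std::string Val;
  const std::string &getVariable() const { return Var; }
  const std::string &getValue() const { return Val; }
};

struct ToyDbgVariableRecord {
  std::string Var;
  std::string Val;
  const std::string &getVariable() const { return Var; }
  const std::string &getValue() const { return Val; }
};

// Collects "variable=value" strings from both kinds of debug info.
std::vector<std::string>
describeAll(const std::vector<ToyDbgValueIntrinsic> &Intrinsics,
            const std::vector<ToyDbgVariableRecord> &Records) {
  std::vector<std::string> Out;
  // One generic lambda handles both types; each call instantiates it for the
  // concrete argument type at compile time.
  auto Describe = [&Out](const auto &DbgVar) {
    Out.push_back(DbgVar.getVariable() + "=" + DbgVar.getValue());
  };
  for (const ToyDbgValueIntrinsic &DVI : Intrinsics)
    Describe(DVI);
  for (const ToyDbgVariableRecord &DVR : Records)
    Describe(DVR);
  return Out;
}
```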
## How do these `DbgRecords` fit into the instruction stream?

```text
                 +---------------+          +---------------+
---------------->|  Instruction  +--------->|  Instruction  |
                 +-------+-------+          +---------------+
                         |
                         |
                         |
                         v
                  +-------------+
          <-------+  DbgMarker  |<-------
         /        +-------------+        \
        /                                 \
       /                                   \
      v                                     ^
 +-------------+    +-------------+   +-------------+
 |  DbgRecord  +--->|  DbgRecord  +-->|  DbgRecord  |
 +-------------+    +-------------+   +-------------+
```

Each instruction has a pointer to a `DbgMarker` (which will become optional), which contains a list of `DbgRecord` objects. No debugging records appear in the instruction list at all. `DbgRecord`s have a parent pointer to their owning `DbgMarker`, and each `DbgMarker` has a pointer back to its owning instruction.

Not shown are the links from `DbgRecord` to other parts of the `Value`/`Metadata` hierarchy. `DbgRecord` subclasses have tracking pointers to the DIMetadata that they use, and `DbgVariableRecord` has references to `Value`s that are stored in a `DebugValueUser` base class. This refers to a `ValueAsMetadata` object referring to `Value`s, via the `TrackingMetadata` facility.

The various kinds of debug intrinsic (value, declare, assign, label) are all stored in `DbgRecord` subclasses. A "RecordKind" field distinguishes `DbgLabelRecord`s from `DbgVariableRecord`s, and a `LocationType` field in the `DbgVariableRecord` class further disambiguates the various debug variable intrinsics it can represent.
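The ownership shape described above can be sketched in standard C++. This is a simplified, hypothetical model: `ToyInstruction`, `ToyMarker`, `ToyRecord` and the two enums are illustrative names, not LLVM's classes.

The model keeps these parts of the real design:

- the instruction-to-marker-to-records chain;
- the two back-pointers;
- an optional marker;
- a kind/location-type pair standing in for `RecordKind` and `LocationType`.

It simplifies the rest. A `std::list` stands in for LLVM's own list type, and the metadata tracking is left out entirely.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

struct ToyMarker;
struct ToyInstruction;

// Illustrative stand-ins for the RecordKind and LocationType fields.
enum class ToyRecordKind { Value, Label };
enum class ToyLocationType { Declare, Value, Assign };

struct ToyRecord {
  ToyRecordKind Kind;
  ToyLocationType Type;        // ignored for label records
  std::string Name;            // variable or label name
  ToyMarker *Parent = nullptr; // back-pointer to the owning marker
};

struct ToyMarker {
  ToyInstruction *MarkedInstr = nullptr; // back-pointer to the owning instruction
  std::list<ToyRecord> Records;          // printed before MarkedInstr, in order

  ToyRecord &append(ToyRecord R) {
    R.Parent = this;
    Records.push_back(std::move(R));
    return Records.back();
  }
};

struct ToyInstruction {
  std::string Name;
  std::unique_ptr<ToyMarker> Marker; // created lazily, so it is optional

  ToyMarker &getOrCreateMarker() {
    if (!Marker) {
      Marker = std::make_unique<ToyMarker>();
      Marker->MarkedInstr = this;
    }
    return *Marker;
  }
};
```

A toy instruction starts without a marker. Adding the first record creates one, and from then on both back-pointers lead from the record to its marker and on to the instruction.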
# How to update existing code

Any existing code that interacts with debug intrinsics in some way will need to be updated to interact with debug records in the same way. A few quick rules to keep in mind when updating code:

- Debug records will not be seen when iterating over instructions; to find the debug records that appear immediately before an instruction, you'll need to iterate over `Instruction::getDbgRecordRange()`.
- Debug records have interfaces that are identical to those of debug intrinsics, meaning that any code that operates on debug intrinsics can be trivially applied to debug records as well. The exceptions are `Instruction` or `CallInst` methods that don't logically apply to debug records, and the `isa`/`cast`/`dyn_cast` methods, which are replaced by methods on the `DbgRecord` class itself.
- Debug records cannot appear in a module that also contains debug intrinsics; the two are mutually exclusive. As debug records are the future format, handling records correctly should be prioritized in new code.
- Until support for intrinsics is no longer present, a valid hotfix for code that only handles debug intrinsics and is non-trivial to update is to convert the module to the intrinsic format using `Module::setIsNewDbgInfoFormat`, and convert it back afterwards.
  - This can also be performed within a lexical scope for a module or an individual function using the class `ScopedDbgInfoFormatSetter`:
    ```cpp
    void handleModule(Module &M) {
      {
        ScopedDbgInfoFormatSetter FormatSetter(M, false);
        handleModuleWithDebugIntrinsics(M);
      }
      // Module returns to previous debug info format after exiting the above block.
    }
    ```
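The scoped setter follows the usual RAII save-and-restore pattern. The sketch below shows that pattern on a hypothetical `ToyModule` that holds a single format flag, using an illustrative class named `ToyScopedFormatSetter`. It assumes nothing about how `ScopedDbgInfoFormatSetter` is actually implemented, beyond the restore-on-exit behaviour described above.

```cpp
#include <cassert>

// Hypothetical stand-in for a module: only tracks which debug info format it uses.
struct ToyModule {
  bool IsNewDbgInfoFormat = true;
  void setIsNewDbgInfoFormat(bool NewFormat) { IsNewDbgInfoFormat = NewFormat; }
};

// RAII helper: switch the format for the lifetime of this object, then restore
// whatever format was in use before.
class ToyScopedFormatSetter {
  ToyModule &M;
  bool OldFormat;

public:
  ToyScopedFormatSetter(ToyModule &M, bool NewFormat)
      : M(M), OldFormat(M.IsNewDbgInfoFormat) {
    M.setIsNewDbgInfoFormat(NewFormat);
  }
  ~ToyScopedFormatSetter() { M.setIsNewDbgInfoFormat(OldFormat); }

  // Copying would restore the format twice, so forbid it.
  ToyScopedFormatSetter(const ToyScopedFormatSetter &) = delete;
  ToyScopedFormatSetter &operator=(const ToyScopedFormatSetter &) = delete;
};
```

The setter's destructor restores the previous value, which is why the inner braces in `handleModule` above are what bound the temporary switch.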
Below is a rough guide on how existing code that currently supports debug intrinsics can be updated to support debug records.

## Creating debug records

Debug records will automatically be created by the `DIBuilder` class when the new format is enabled. As with instructions, it is also possible to call `DbgRecord::clone` to create an unattached copy of an existing record.

## Skipping debug records, ignoring debug-uses of `Values`, stably counting instructions, etc.

This will all happen transparently without needing to think about it!

```cpp
for (Instruction &I : BB) {
  // Old: Skips debug intrinsics
  if (isa<DbgInfoIntrinsic>(&I))
    continue;
  // New: No extra code needed, debug records are skipped by default.
  ...
}
```
## Finding debug records

Utilities such as `findDbgUsers` and the like now have an optional argument that will return the set of `DbgVariableRecord` records that refer to a `Value`. You should be able to treat them the same as intrinsics.

```cpp
// Old:
  SmallVector<DbgVariableIntrinsic *> DbgUsers;
  findDbgUsers(DbgUsers, V);
  for (auto *DVI : DbgUsers) {
    if (DVI->getParent() != BB)
      DVI->replaceVariableLocationOp(V, New);
  }
// New:
  SmallVector<DbgVariableIntrinsic *> DbgUsers;
  SmallVector<DbgVariableRecord *> DVRUsers;
  findDbgUsers(DbgUsers, V, &DVRUsers);
  for (auto *DVI : DbgUsers)
    if (DVI->getParent() != BB)
      DVI->replaceVariableLocationOp(V, New);
  for (auto *DVR : DVRUsers)
    if (DVR->getParent() != BB)
      DVR->replaceVariableLocationOp(V, New);
```
## Examining debug records at positions

Call `Instruction::getDbgRecordRange()` to get the range of `DbgRecord` objects that are attached to an instruction.

```cpp
for (Instruction &I : BB) {
  // Old: Uses a data member of a debug intrinsic, and then skips to the next
  // instruction.
  if (DbgInfoIntrinsic *DII = dyn_cast<DbgInfoIntrinsic>(&I)) {
    recordDebugLocation(DII->getDebugLoc());
    continue;
  }
  // New: Iterates over the debug records that appear before `I`, and treats
  // them identically to the intrinsic block above.
  // NB: This should always appear at the top of the for-loop, so that we
  // process the debug records preceding `I` before `I` itself.
  for (DbgRecord &DR : I.getDbgRecordRange()) {
    recordDebugLocation(DR.getDebugLoc());
  }
  processInstruction(I);
}
```

This range can also be passed through the function `filterDbgVars` to iterate specifically over `DbgVariableRecord`s, which are the more commonly used kind.

```cpp
for (Instruction &I : BB) {
  // Old: If `I` is a DbgVariableIntrinsic we record the variable, and apply
  // extra logic if it is an `llvm.dbg.declare`.
  if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I)) {
    recordVariable(DVI->getVariable());
    if (DbgDeclareInst *DDI = dyn_cast<DbgDeclareInst>(DVI))
      recordDeclareAddress(DDI->getAddress());
    continue;
  }
  // New: `filterDbgVars` is used to iterate over only DbgVariableRecords.
  for (DbgVariableRecord &DVR : filterDbgVars(I.getDbgRecordRange())) {
    recordVariable(DVR.getVariable());
    // Debug variable records are not cast to subclasses; simply call the
    // appropriate `isDbgX()` check, and use the methods as normal.
    if (DVR.isDbgDeclare())
      recordDeclareAddress(DVR.getAddress());
  }
}
```
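The filtering step can be pictured with a small stand-alone sketch. Everything in it is hypothetical: `ToyDbgRecord`, `ToyDbgVariableRecord`, `ToyDbgLabelRecord` and `toyFilterDbgVars` are illustrative names. The sketch uses `dynamic_cast` and eagerly builds a vector of pointers. This is only one way to express "keep the variable records, in order", and the real `filterDbgVars` need not work like this.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record hierarchy standing in for DbgRecord and its subclasses.
struct ToyDbgRecord {
  virtual ~ToyDbgRecord() = default;
};
struct ToyDbgVariableRecord : ToyDbgRecord {
  std::string Variable;
  explicit ToyDbgVariableRecord(std::string V) : Variable(std::move(V)) {}
};
struct ToyDbgLabelRecord : ToyDbgRecord {
  std::string Label;
  explicit ToyDbgLabelRecord(std::string L) : Label(std::move(L)) {}
};

// Keep only the variable records, preserving their original order.
std::vector<ToyDbgVariableRecord *>
toyFilterDbgVars(const std::vector<std::unique_ptr<ToyDbgRecord>> &Records) {
  std::vector<ToyDbgVariableRecord *> Vars;
  for (const auto &R : Records)
    if (auto *DVR = dynamic_cast<ToyDbgVariableRecord *>(R.get()))
      Vars.push_back(DVR);
  return Vars;
}
```

Given records for variable `x`, a label, and variable `y`, the filter yields `x` and `y` in that order and skips the label.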
## Processing individual debug records

In most cases, any code that operates on debug intrinsics can be extracted to a template function or auto lambda (if it is not already in one) that can be applied to both debug intrinsics and debug records. Keep in mind the main exception: `isa`/`cast`/`dyn_cast` do not apply to `DbgVariableRecord` types.

```cpp
// Old: Function that operates on debug variable intrinsics in a BasicBlock, and
// collects llvm.dbg.declares.
void processDbgInfoInBlock(BasicBlock &BB,
                           SmallVectorImpl<DbgDeclareInst*> &DeclareIntrinsics) {
  for (Instruction &I : BB) {
    if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I)) {
      processVariableValue(DebugVariable(DVI), DVI->getValue());
      if (DbgDeclareInst *DDI = dyn_cast<DbgDeclareInst>(DVI))
        DeclareIntrinsics.push_back(DDI);
      else if (!isa<Constant>(DVI->getValue()))
        DVI->setKillLocation();
    }
  }
}

// New: Template function is used to deduplicate handling of intrinsics and
// debug records.
// An overloaded function is also used to handle isa/cast/dyn_cast operations
// for intrinsics and records, since those functions cannot be directly applied
// to DbgVariableRecords.
DbgDeclareInst *DynCastToDeclare(DbgVariableIntrinsic *DVI) {
  return dyn_cast<DbgDeclareInst>(DVI);
}
DbgVariableRecord *DynCastToDeclare(DbgVariableRecord *DVR) {
  return DVR->isDbgDeclare() ? DVR : nullptr;
}

template<typename DbgVarTy, typename DbgDeclTy>
void processDbgVariable(DbgVarTy *DbgVar,
                        SmallVectorImpl<DbgDeclTy*> &Declares) {
  processVariableValue(DebugVariable(DbgVar), DbgVar->getValue());
  if (DbgDeclTy *DbgDeclare = DynCastToDeclare(DbgVar))
    Declares.push_back(DbgDeclare);
  else if (!isa<Constant>(DbgVar->getValue()))
    DbgVar->setKillLocation();
}

void processDbgInfoInBlock(BasicBlock &BB,
                           SmallVectorImpl<DbgDeclareInst*> &DeclareIntrinsics,
                           SmallVectorImpl<DbgVariableRecord*> &DeclareRecords) {
  for (Instruction &I : BB) {
    if (DbgVariableIntrinsic *DVI = dyn_cast<DbgVariableIntrinsic>(&I))
      processDbgVariable(DVI, DeclareIntrinsics);
    // filterDbgVars yields references, so pass the record's address.
    for (DbgVariableRecord &DVR : filterDbgVars(I.getDbgRecordRange()))
      processDbgVariable(&DVR, DeclareRecords);
  }
}
```
## Moving and deleting debug records

You can use `DbgRecord::removeFromParent` to unlink a `DbgRecord` from its marker, and then `BasicBlock::insertDbgRecordBefore` or `BasicBlock::insertDbgRecordAfter` to re-insert the `DbgRecord` somewhere else. You cannot insert a `DbgRecord` at an arbitrary point in a list of `DbgRecord`s (if you're doing this with `llvm.dbg.value`s then it's unlikely to be correct).

Erase `DbgRecord`s by calling `eraseFromParent`.

```cpp
// Old: Move a debug intrinsic to the start of the block, and delete all other intrinsics for the same variable in the block.
void moveDbgIntrinsicToStart(DbgVariableIntrinsic *DVI) {
  BasicBlock *ParentBB = DVI->getParent();
  DVI->removeFromParent();
  // make_early_inc_range allows erasing the current element while iterating.
  for (Instruction &I : make_early_inc_range(*ParentBB)) {
    if (auto *BlockDVI = dyn_cast<DbgVariableIntrinsic>(&I))
      if (BlockDVI->getVariable() == DVI->getVariable())
        BlockDVI->eraseFromParent();
  }
  DVI->insertBefore(ParentBB->getFirstInsertionPt());
}

// New: Perform the same operation, but for a debug record.
void moveDbgRecordToStart(DbgVariableRecord *DVR) {
  BasicBlock *ParentBB = DVR->getParent();
  DVR->removeFromParent();
  for (Instruction &I : *ParentBB) {
    for (DbgVariableRecord &BlockDVR :
         make_early_inc_range(filterDbgVars(I.getDbgRecordRange())))
      if (BlockDVR.getVariable() == DVR->getVariable())
        BlockDVR.eraseFromParent();
  }
  // Re-insert through the block, as described above.
  ParentBB->insertDbgRecordBefore(DVR, ParentBB->getFirstInsertionPt());
}
```
## What about dangling debug records?

If you have a block like so:

```text
    foo:
      %bar = add i32 %baz...
      dbg.value(metadata i32 %bar,...
      br label %xyzzy
```

your optimisation pass may wish to erase the terminator and then do something to the block. This is easy to do when debug info is kept in instructions. With `DbgRecord`s, however, once the terminator is erased there is no trailing instruction in the block above to attach the variable information to. For such degenerate blocks, `DbgRecord`s are stored temporarily in a map in `LLVMContext`. They are re-inserted when a terminator is reinserted into the block, or when another instruction is inserted at `end()`.

This can technically lead to trouble in the vanishingly rare scenario where an optimisation pass erases a terminator and then decides to erase the whole block. (We recommend not doing that.)
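The parking mechanism can be sketched with a toy model. `ToyInstr`, `ToyBasicBlock` and `ToyContext` are hypothetical names, and so are their methods. The sketch treats a block as a list of instructions, each owning the records printed before it. The real bookkeeping lives in `LLVMContext` and `BasicBlock` and differs in detail.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ToyInstr {
  std::string Name;
  std::vector<std::string> Records; // records printed before this instruction
};

struct ToyBasicBlock {
  std::vector<ToyInstr> Insts;
};

// Stand-in for the per-context side table: records that currently have no
// following instruction to attach to, keyed by their block.
struct ToyContext {
  std::map<const ToyBasicBlock *, std::vector<std::string>> TrailingRecords;

  // Erase the last instruction (for example the terminator). Its records have
  // nowhere else to go, so they are parked in the side table.
  void eraseLast(ToyBasicBlock &BB) {
    assert(!BB.Insts.empty() && "nothing to erase");
    const std::vector<std::string> &Orphans = BB.Insts.back().Records;
    if (!Orphans.empty()) {
      std::vector<std::string> &Parked = TrailingRecords[&BB];
      Parked.insert(Parked.end(), Orphans.begin(), Orphans.end());
    }
    BB.Insts.pop_back();
  }

  // Insert an instruction at end(); any parked records are re-attached to it.
  void insertAtEnd(ToyBasicBlock &BB, const std::string &Name) {
    ToyInstr New{Name, {}};
    auto It = TrailingRecords.find(&BB);
    if (It != TrailingRecords.end()) {
      New.Records = std::move(It->second);
      TrailingRecords.erase(It);
    }
    BB.Insts.push_back(std::move(New));
  }
};
```

For example, take a block printed as `%bar`, `#dbg_value(%bar)`, `br`. Erasing `br` parks the record in the side table. Appending `ret` afterwards gives `%bar`, `#dbg_value(%bar)`, `ret`. If the block were destroyed while its entry was still parked, the map would keep a key for a block that no longer exists. That is the toy counterpart of the rare problem mentioned above.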
## Anything else?

The above guide does not comprehensively cover every pattern that could apply to debug intrinsics. As mentioned at the [start of the guide](#how-to-update-existing-code), you can temporarily convert the target module from debug records to intrinsics as a stopgap measure. Most operations that can be performed on debug intrinsics have exact equivalents for debug records. If you encounter any exceptions, reading the class docs (linked [here](#what-exactly-have-you-replaced-debug-intrinsics-with)) may give some insight, there may be examples in the existing codebase, and you can always ask for help on the [forums](https://discourse.llvm.org/tag/debuginfo).