   1 Formatter Bytecode
   2 ==================
   3
   4 Background
   5 ----------
   6
   7 LLDB provides rich customization options to display data types (see :doc:`/use/variable/`). To use custom data formatters, developers need to edit the global `~/.lldbinit` file to make sure they are found and loaded. In addition to this rather manual workflow, developers or library authors can ship data formatters with their code in a format that allows LLDB to automatically find them and run them securely.
   8
   9 An end-to-end example of such a workflow is the Swift `DebugDescription` macro (see https://www.swift.org/blog/announcing-swift-6/#debugging ) that translates Swift string interpolation into LLDB summary strings, and puts them into a `.lldbsummaries` section, where LLDB can find them.
  10
  11 This document describes a minimal bytecode tailored to running LLDB formatters. It defines a human-readable assembler representation for the language, an efficient binary encoding, a virtual machine for evaluating it, and a format for embedding formatters into binary containers.
  12
  13 Goals
  14 ~~~~~
  15
  16 Provide an efficient and secure encoding for data formatters that can be used as a compilation target from user-friendly representations (such as DIL, Swift DebugDescription, or NatVis).
  17
  18 Non-goals
  19 ~~~~~~~~~
  20
  21 While humans could write the assembler syntax, making it user-friendly is not a goal. It is meant to be used as a compilation target for higher-level, language-specific affordances.
  22
  23 Design of the virtual machine
  24 -----------------------------
  25
  26 The LLDB formatter virtual machine uses a stack-based bytecode, comparable to DWARF expressions, but with higher-level data types and functions.
  27
  28 The virtual machine has two stacks: a data stack and a control stack. The control stack is kept separate to make it easier to reason about the security aspects of the virtual machine.
  29
  30 Data types
  31 ~~~~~~~~~~
  32
  33 All objects on the data stack must have one of the following data types. These data types are "host" data types, in LLDB parlance.
  34
  35 * *String* (UTF-8)
  36 * *Int* (64 bit)
  37 * *UInt* (64 bit)
  38 * *Object* (Basically an `SBValue`)
  39 * *Type* (Basically an `SBType`)
  40 * *Selector* (One of the predefined functions)
  41
  42 *Object* and *Type* are opaque; they can only be used as parameters of `call`.
  43
  44 Instruction set
  45 ---------------
  46
  47 Stack operations
  48 ~~~~~~~~~~~~~~~~
  49
  50 These instructions manipulate the data stack directly.
  51
  52 ========  ==========  ===========================
  53  Opcode    Mnemonic    Stack effect
  54 --------  ----------  ---------------------------
  55  0x00      `dup`       `(x -> x x)`
  56  0x01      `drop`      `(x y -> x)`
  57  0x02      `pick`      `(x ... UInt -> x ... x)`
  58  0x03      `over`      `(x y -> x y x)`
  59  0x04      `swap`      `(x y -> y x)`
  60  0x05      `rot`       `(x y z -> z x y)`
  61 ========  ==========  ===========================
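
The stack effects in the table can be mirrored in a few lines of Python. This is only an illustrative sketch, not LLDB's implementation: the helper name `apply_stack_op` is hypothetical, the data stack is modeled as a Python list whose last element is the top, and the indexing convention for `pick` (index 0 selects the top element once the index itself has been popped) is an assumption that the table does not spell out.

```python
def apply_stack_op(mnemonic, stack):
    """Apply one stack operation in place; the end of the list is the top."""
    if mnemonic == "dup":        # (x -> x x)
        stack.append(stack[-1])
    elif mnemonic == "drop":     # (x y -> x)
        stack.pop()
    elif mnemonic == "pick":     # (x ... UInt -> x ... x)
        index = stack.pop()
        # Assumption: index 0 refers to the top of the remaining stack.
        if not 0 <= index < len(stack):
            raise IndexError("pick index out of range")
        stack.append(stack[-1 - index])
    elif mnemonic == "over":     # (x y -> x y x)
        stack.append(stack[-2])
    elif mnemonic == "swap":     # (x y -> y x)
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif mnemonic == "rot":      # (x y z -> z x y)
        if len(stack) < 3:
            raise IndexError("rot needs three elements")
        z = stack.pop()
        stack.insert(len(stack) - 2, z)
    else:
        raise ValueError(f"unknown stack operation: {mnemonic}")
    return stack


stack = ["a", "b", "c"]
apply_stack_op("rot", stack)   # stack is now ["c", "a", "b"]
```

An empty or too-short stack raises `IndexError` in this sketch, which corresponds to a failed evaluation in the virtual machine.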
  62
  63 Control flow
  64 ~~~~~~~~~~~~
  65
  66 These manipulate the control stack and program counter. Both `if` and `ifelse` expect a `UInt` at the top of the data stack to represent the condition.
  67
  68 ========  ==========  ============================================================
  69  Opcode    Mnemonic    Description
  70 --------  ----------  ------------------------------------------------------------
  71  0x10       `{`        push a code block address onto the control stack
  72   --        `}`        (technically not an opcode) syntax for end of code block
  73  0x11      `if`        `(UInt -> )` pop a block from the control stack,
  74                        if the top of the data stack is nonzero, execute it
  75  0x12      `ifelse`    `(UInt -> )` pop two blocks from the control stack, if
  76                        the top of the data stack is nonzero, execute the first,
  77                        otherwise the second.
  78 ========  ==========  ============================================================
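
The interplay between the two stacks can be sketched as follows. The sketch makes several simplifying assumptions that are not part of this specification: the program is a pre-parsed Python list instead of bytecode, a nested list stands in for a `{ ... }` block (so the control stack holds blocks rather than code addresses), blocks are executed by recursion, and `IF`, `IFELSE`, and `run_tokens` are hypothetical names.

```python
IF = object()       # stands in for the `if` opcode
IFELSE = object()   # stands in for the `ifelse` opcode


def run_tokens(tokens, data):
    """Evaluate a token list; anything that is not a block or opcode is a literal."""
    control = []  # control stack: holds code blocks only, never data values
    for token in tokens:
        if isinstance(token, list):        # `{ ... }`: remember the block
            control.append(token)
        elif token is IF:                  # (UInt -> )
            block = control.pop()
            if data.pop() != 0:
                run_tokens(block, data)
        elif token is IFELSE:              # (UInt -> )
            second = control.pop()
            first = control.pop()
            run_tokens(first if data.pop() != 0 else second, data)
        else:                              # literal: push onto the data stack
            data.append(token)
    return data


# Roughly: 1u { "nonzero" } { "zero" } ifelse
result = run_tokens([1, ["nonzero"], ["zero"], IFELSE], [])
# result == ["nonzero"]
```

Because blocks live only on the control stack, a program in this model cannot compute a jump target from data, which is the property the separate control stack is meant to make easy to check.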
  79
  80 Literals for basic types
  81 ~~~~~~~~~~~~~~~~~~~~~~~~
  82
  83 ========  ===========  ============================================================
  84  Opcode    Mnemonic    Description
  85 --------  -----------  ------------------------------------------------------------
  86  0x20      `123u`      `( -> UInt)` push an unsigned 64-bit host integer
  87  0x21      `123`       `( -> Int)` push a signed 64-bit host integer
  88  0x22      `"abc"`     `( -> String)` push a UTF-8 host string
  89  0x23      `@strlen`   `( -> Selector)` push one of the predefined function
  90                        selectors. See `call`.
  91 ========  ===========  ============================================================
  92
  93 Conversion operations
  94 ~~~~~~~~~~~~~~~~~~~~~
  95
  96 ========  ===========  ================================================================
  97  Opcode    Mnemonic    Description
  98 --------  -----------  ----------------------------------------------------------------
  99  0x2a      `as_int`    `(UInt -> Int)` reinterpret a UInt as an Int
 100  0x2b      `as_uint`   `(Int -> UInt)` reinterpret an Int as a UInt
 101  0x2c      `is_null`   `(Object -> UInt)` check an object for null `(object ? 0 : 1)`
 102 ========  ===========  ================================================================
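
Since Python integers are unbounded, the 64-bit reinterpretation performed by `as_int` and `as_uint` has to be modeled explicitly. The following sketch does that with a mask. The function names mirror the mnemonics, but the code is illustrative only, and using `None` to represent a null *Object* is an assumption made for the example.

```python
MASK64 = (1 << 64) - 1


def as_int(value):
    """Reinterpret the low 64 bits of a UInt as a two's-complement Int."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def as_uint(value):
    """Reinterpret an Int as a UInt by keeping its low 64 bits."""
    return value & MASK64


def is_null(obj):
    """Mirror `(object ? 0 : 1)`: 0 for a valid object, 1 for a null one."""
    return 0 if obj is not None else 1


all_ones = as_uint(-1)         # 0xFFFFFFFFFFFFFFFF
minus_one = as_int(all_ones)   # -1
```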
 103
 104
 105 Arithmetic, logic, and comparison operations
 106 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 107
 108 All of these operations are only defined for `Int` and `UInt` and both operands need to be of the same type. The `>>` operator is an arithmetic shift if the parameters are of type `Int`, otherwise it's a logical shift to the right.
 109
 110 ========  ==========  ===========================
 111  Opcode    Mnemonic    Stack effect
 112 --------  ----------  ---------------------------
 113  0x30      `+`         `(x y -> [x+y])`
 114  0x31      `-`          etc ...
 115  0x32      `*`
 116  0x33      `/`
 117  0x34      `%`
 118  0x35      `<<`
 119  0x36      `>>`
 120  0x40      `~`
 121  0x41      `|`
 122  0x42      `^`
 123  0x50      `=`
 124  0x51      `!=`
 125  0x52      `<`
 126  0x53      `>`
 127  0x54      `=<`
 128  0x55      `>=`
 129 ========  ==========  ===========================
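
The same-type rule and the two flavors of `>>` can be illustrated with a small model. This is a sketch under stated assumptions: the `Int` and `UInt` wrapper classes and the `shift_right` helper are hypothetical, and rejecting shift counts outside 0..63 is a choice made for the example, since this document does not define that case.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Int:
    value: int   # signed 64-bit host integer


@dataclass(frozen=True)
class UInt:
    value: int   # unsigned 64-bit host integer


def shift_right(x, y):
    """Model of `>>` with stack effect (x y -> [x>>y])."""
    if type(x) is not type(y) or not isinstance(x, (Int, UInt)):
        raise TypeError("operands must both be Int or both be UInt")
    if not 0 <= y.value < 64:
        raise ValueError("shift count out of range")  # assumption, see above
    if isinstance(x, Int):
        # Python's >> on a negative int sign-extends: an arithmetic shift.
        return Int(x.value >> y.value)
    # Masking to 64 bits first makes this a logical shift.
    return UInt((x.value & MASK64) >> y.value)


arithmetic = shift_right(Int(-8), Int(1))            # Int(-4)
logical = shift_right(UInt(MASK64 - 7), UInt(1))     # UInt(2**63 - 4)
```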
 130
 131 Function calls
 132 ~~~~~~~~~~~~~~
 133
 134 For security reasons the list of functions callable with `call` is predefined. The supported functions are either existing methods on `SBValue`, or string formatting operations.
 135
 136 ========  ==========  ============================================
 137  Opcode    Mnemonic    Stack effect
 138 --------  ----------  --------------------------------------------
 139  0x60      `call`      `(Object argN ... arg0 Selector -> retval)`
 140 ========  ==========  ============================================
 141
 142 The *Selector* argument must be one of the following predefined values.
 143
 144 ====  ============================  ===================================================  ==================================
 145 Sel.  Mnemonic                      Stack Effect                                         Description
 146 ----  ----------------------------  ---------------------------------------------------  ----------------------------------
 147 0x00  `summary`                     `(Object @summary -> String)`                        `SBValue::GetSummary`
 148 0x01  `type_summary`                `(Object @type_summary -> String)`                   `SBValue::GetTypeSummary`
 149 0x10  `get_num_children`            `(Object @get_num_children -> UInt)`                 `SBValue::GetNumChildren`
 150 0x11  `get_child_at_index`          `(Object UInt @get_child_at_index -> Object)`        `SBValue::GetChildAtIndex`
 151 0x12  `get_child_with_name`         `(Object String @get_child_with_name -> Object)`     `SBValue::GetChildMemberWithName`
 152 0x13  `get_child_index`             `(Object String @get_child_index -> UInt)`           `SBValue::GetChildIndex`
 153 0x15  `get_type`                    `(Object @get_type -> Type)`                         `SBValue::GetType`
 154 0x16  `get_template_argument_type`  `(Object UInt @get_template_argument_type -> Type)`  `SBValue::GetTemplateArgumentType`
 155 0x17  `cast`                        `(Object Type @cast -> Object)`                      `SBValue::Cast`
 156 0x20  `get_value`                   `(Object @get_value -> Object)`                      `SBValue::GetValue`
 157 0x21  `get_value_as_unsigned`       `(Object @get_value_as_unsigned -> UInt)`            `SBValue::GetValueAsUnsigned`
 158 0x22  `get_value_as_signed`         `(Object @get_value_as_signed -> Int)`               `SBValue::GetValueAsSigned`
 159 0x23  `get_value_as_address`        `(Object @get_value_as_address -> UInt)`             `SBValue::GetValueAsAddress`
 160 0x40  `read_memory_byte`            `(UInt @read_memory_byte -> UInt)`                   `Target::ReadMemory`
 161 0x41  `read_memory_uint32`          `(UInt @read_memory_uint32 -> UInt)`                 `Target::ReadMemory`
 162 0x42  `read_memory_int32`           `(UInt @read_memory_int32 -> Int)`                   `Target::ReadMemory`
 163 0x43  `read_memory_uint64`          `(UInt @read_memory_uint64 -> UInt)`                 `Target::ReadMemory`
 164 0x44  `read_memory_int64`           `(UInt @read_memory_int64 -> Int)`                   `Target::ReadMemory`
 165 0x45  `read_memory_address`         `(UInt @read_memory_address -> UInt)`                `Target::ReadMemory`
 166 0x46  `read_memory`                 `(UInt Type @read_memory -> Object)`                 `Target::ReadMemory`
 167 0x50  `fmt`                         `(String arg0 ... @fmt -> String)`                   `llvm::format`
 168 0x51  `sprintf`                     `(String arg0 ... @sprintf -> String)`               `sprintf`
 169 0x52  `strlen`                      `(String @strlen -> UInt)`                           `strlen in bytes`
 170 ====  ============================  ===================================================  ==================================
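
How `call` consumes its operands can be sketched for two of the selectors above. Everything here is a simplified model rather than LLDB code: `FakeValue` is a hypothetical stand-in for an *Object*, selectors are represented by their raw numeric values on the data stack, and the function `call` only knows the two selectors shown.

```python
class FakeValue:
    """Hypothetical stand-in for an Object; real objects wrap an SBValue."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


SEL_GET_NUM_CHILDREN = 0x10
SEL_GET_CHILD_AT_INDEX = 0x11


def call(data):
    """Pop the selector, then its arguments, and push the return value."""
    selector = data.pop()
    if selector == SEL_GET_NUM_CHILDREN:       # (Object @get_num_children -> UInt)
        obj = data.pop()
        data.append(len(obj.children))
    elif selector == SEL_GET_CHILD_AT_INDEX:   # (Object UInt @get_child_at_index -> Object)
        index = data.pop()
        obj = data.pop()
        if not 0 <= index < len(obj.children):
            raise IndexError("child index out of range")
        data.append(obj.children[index])
    else:
        # Only selectors from the fixed table are callable.
        raise ValueError(f"selector {selector:#04x} is not in the predefined set")
    return data


point = FakeValue("point", [FakeValue("x"), FakeValue("y")])
count = call([point, SEL_GET_NUM_CHILDREN])          # [2]
child = call([point, 1, SEL_GET_CHILD_AT_INDEX])[0]  # the child named "y"
```

Rejecting every selector that is not in the table is what keeps the set of reachable host functions fixed.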
 171
 172 Byte Code
 173 ~~~~~~~~~
 174
 175 Most instructions are just a single-byte opcode. The only exceptions are the literals, whose opcode is followed by an encoded operand:
 176
 177 * *String*: Length in bytes encoded as ULEB128, followed by that many bytes of UTF-8 data
 178 * *Int*: LEB128
 179 * *UInt*: ULEB128
 180 * *Selector*: ULEB128
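
The literal encodings listed above can be sketched with a pair of LEB128 encoders. The helper names are hypothetical, the opcodes are taken from the literals table, and the sketch assumes that the `LEB128` used for *Int* means the signed variant (SLEB128).

```python
def encode_uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value):
    """Encode a signed integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7                   # arithmetic shift keeps the sign
        sign_bit_set = (byte & 0x40) != 0
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def push_uint(value):        # opcode 0x20
    return bytes([0x20]) + encode_uleb128(value)


def push_int(value):         # opcode 0x21
    return bytes([0x21]) + encode_sleb128(value)


def push_string(text):       # opcode 0x22
    data = text.encode("utf-8")
    return bytes([0x22]) + encode_uleb128(len(data)) + data


def push_selector(selector): # opcode 0x23
    return bytes([0x23]) + encode_uleb128(selector)


encoded = push_string("abc")   # b'\x22\x03abc'
```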
 181
 182 Embedding
 183 ~~~~~~~~~
 184
 185 Expression programs are embedded into an `.lldbformatters` section (an evolution of the Swift `.lldbsummaries` section) that is a dictionary of type names/regexes and descriptions. It consists of a list of records. Each record starts with the following header:
 186
 187 * Version number (ULEB128)
 188 * Remaining size of the record (minus the header) (ULEB128)
 189
 190 The version number is increased whenever an incompatible change is made. Adding new opcodes or selectors is not an incompatible change since consumers can unambiguously detect this and report an error.
 191
 192 Space between two records may be padded with NULL bytes.
 193
 194 In version 1, a record starts with a dictionary key, which is a type name or regex.
 195
 196 * Length of the key in bytes (ULEB128)
 197 * The key (UTF-8)
 198
 199 A regex has to start with `^`, which is part of the regular expression.
 200
 201 After this comes a flag bitfield, which is a ULEB128-encoded `lldb::TypeOptions` bitfield.
 202
 203 * Flags (ULEB128)
 204
 205
 206 This is followed by one or more dictionary values that immediately follow each other and entirely fill out the record size from the header. Each expression program has the following layout:
 207
 208 * Function signature (1 byte)
 209 * Length of the program (ULEB128)
 210 * The program bytecode
 211
 212 The possible function signatures are:
 213
 214 =========  ====================== ==========================
 215 Signature    Mnemonic             Stack Effect
 216 ---------  ---------------------- --------------------------
 217   0x00     `@summary`             `(Object -> String)`
 218   0x01     `@init`                `(Object -> Object+)`
 219   0x02     `@get_num_children`    `(Object+ -> UInt)`
 220   0x03     `@get_child_index`     `(Object+ String -> UInt)`
 221   0x04     `@get_child_at_index`  `(Object+ UInt -> Object)`
 222   0x05     `@get_value`           `(Object+ -> String)`
 223 =========  ====================== ==========================
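
The version 1 record layout described in this section can be sketched as a builder and a matching parser. This is an illustration under stated assumptions, not the reference implementation: the function names are hypothetical, the "remaining size" is taken to be the byte count of everything after the two header fields, and padding between records is not handled.

```python
def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        out.append(byte | 0x80 if value else byte)
        if not value:
            return bytes(out)


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def build_record(key, flags, programs, version=1):
    """programs is a list of (signature byte, bytecode) pairs."""
    key_bytes = key.encode("utf-8")
    body = uleb128(len(key_bytes)) + key_bytes + uleb128(flags)
    for signature, bytecode in programs:
        body += bytes([signature]) + uleb128(len(bytecode)) + bytecode
    return uleb128(version) + uleb128(len(body)) + body


def parse_record(data, pos=0):
    """Return (record dict, position just past the record)."""
    version, pos = read_uleb128(data, pos)
    size, pos = read_uleb128(data, pos)
    end = pos + size
    key_len, pos = read_uleb128(data, pos)
    key = data[pos:pos + key_len].decode("utf-8")
    pos += key_len
    flags, pos = read_uleb128(data, pos)
    programs = []
    while pos < end:   # programs fill the rest of the record
        signature = data[pos]
        pos += 1
        length, pos = read_uleb128(data, pos)
        programs.append((signature, data[pos:pos + length]))
        pos += length
    return {"version": version, "key": key, "flags": flags, "programs": programs}, end


# @summary (0x00) program for the made-up type "MyType": drop "hello"
summary_code = bytes([0x01, 0x22, 0x05]) + b"hello"
record = build_record("MyType", 0, [(0x00, summary_code)])
parsed, end = parse_record(record)
```

A consumer that encounters an unknown version number can use the size field to skip the record without understanding its contents.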
 224
 225 If not specified, the init function defaults to an empty function that just passes the Object along. Its results may be cached and allow common prep work to be done for an Object that can be reused by subsequent calls to the other methods. This way subsequent calls to `@get_child_at_index` can avoid recomputing shared information, for example.
 226
 227 While it is more efficient to store multiple programs per type key, this is not a requirement. LLDB will merge all entries. If there are conflicts, the result is undefined.
 228
 229 Execution model
 230 ~~~~~~~~~~~~~~~
 231
 232 Execution begins at the first byte in the program. The program counter of the virtual machine starts at offset 0 of the bytecode and may never move outside the range of the program as defined in the header. The data stack starts with one Object or the result of the `@init` function (`Object+` in the table above).
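
The execution model can be sketched as a small interpreter loop over a handful of opcodes. This is a deliberately incomplete model, not LLDB's virtual machine: `VMError` and `run_program` are hypothetical names, only `dup`, `drop`, the *UInt* and *String* literals, and `+` on *UInt* values are handled, and wrapping the sum to 64 bits is an assumption.

```python
MASK64 = (1 << 64) - 1


class VMError(Exception):
    """Raised for any failure; the whole program is abandoned."""


def read_uleb128(code, pc):
    """Decode a ULEB128 operand without leaving the program's byte range."""
    result = shift = 0
    while True:
        if pc >= len(code):
            raise VMError("operand runs past the end of the program")
        byte = code[pc]
        pc += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pc
        shift += 7


def run_program(code, initial_object):
    data = [initial_object]   # the data stack starts with one Object
    pc = 0                    # execution begins at offset 0

    def pop():
        if not data:
            raise VMError("data stack underflow")
        return data.pop()

    while pc < len(code):     # the program counter never leaves the program
        opcode = code[pc]
        pc += 1
        if opcode == 0x00:    # dup
            value = pop()
            data.extend([value, value])
        elif opcode == 0x01:  # drop
            pop()
        elif opcode == 0x20:  # UInt literal
            value, pc = read_uleb128(code, pc)
            data.append(value)
        elif opcode == 0x22:  # String literal
            length, pc = read_uleb128(code, pc)
            if pc + length > len(code):
                raise VMError("string literal runs past the end of the program")
            data.append(code[pc:pc + length].decode("utf-8"))
            pc += length
        elif opcode == 0x30:  # + (modeled for UInt operands only)
            y, x = pop(), pop()
            if not (isinstance(x, int) and isinstance(y, int)):
                raise VMError("+ requires integer operands")
            data.append((x + y) & MASK64)
        else:
            raise VMError(f"unsupported opcode {opcode:#04x}")
    return data


# drop 2u 3u +
result = run_program(bytes([0x01, 0x20, 0x02, 0x20, 0x03, 0x30]), "object")
# result == [5]
```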
 233
 234 Error handling
 235 ~~~~~~~~~~~~~~
 236
 237 In version 1, errors are unrecoverable: the entire expression fails if any kind of error is encountered.
 238