docs/StackMaps.rst

   1 ===================================
   2 Stack maps and patch points in LLVM
   3 ===================================
   4
   5 .. contents::
   6    :local:
   7    :depth: 2
   8
   9 Definitions
  10 ===========
  11
  12 In this document, "runtime" refers collectively to all components
  13 that serve as the LLVM client, including the LLVM IR generator,
  14 object code consumer, and code patcher.
  15
  16 A stack map records the location of ``live values`` at a particular
  17 instruction address. These ``live values`` do not refer to all the
  18 LLVM values live across the stack map. Instead, they are only the
  19 values that the runtime requires to be live at this point. For
  20 example, they may be the values the runtime will need to resume
  21 program execution at that point independent of the compiled function
  22 containing the stack map.
  23
  24 LLVM emits stack map data into the object code within a designated
  25 :ref:`stackmap-section`. This stack map data contains a record for
  26 each stack map. The record stores the stack map's instruction address
  27 and contains an entry for each mapped value. Each entry encodes a
  28 value's location as a register, stack offset, or constant.
  29
  30 A patch point is an instruction address at which space is reserved for
  31 patching a new instruction sequence at run time. To LLVM, patch
  32 points look much like calls: they take arguments that follow a
  33 calling convention and may return a value. They also imply stack map
  34 generation, which allows the runtime to locate the patchpoint and
  35 find the location of ``live values`` at that point.
  36
  37 Motivation
  38 ==========
  39
  40 This functionality is currently experimental but is potentially useful
  41 in a variety of settings, the most obvious being a runtime (JIT)
  42 compiler. Example applications of the patchpoint intrinsics are
  43 implementing an inline call cache for polymorphic method dispatch or
  44 optimizing the retrieval of properties in dynamically typed languages
  45 such as JavaScript.
  46
  47 The intrinsics documented here are currently used by the JavaScript
  48 compiler within the open source WebKit project, see the `FTL JIT
  49 <https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
  50 used whenever stack maps or code patching are needed. Because the
  51 intrinsics have experimental status, compatibility across LLVM
  52 releases is not guaranteed.
  53
  54 The stack map functionality described in this document is separate
  55 from the functionality described in
  56 :ref:`stack-map`. `GCFunctionMetadata` provides the location of
  57 pointers into a collected heap captured by the `GCRoot` intrinsic,
  58 which can also be considered a "stack map". Unlike the stack maps
  59 defined above, the `GCFunctionMetadata` stack map interface does not
  60 provide a way to associate live register values of arbitrary type with
  61 an instruction address, nor does it specify a format for the resulting
  62 stack map. The stack maps described here could potentially provide
  63 richer information to a garbage collecting runtime, but that usage
  64 will not be discussed in this document.
  65
  66 Intrinsics
  67 ==========
  68
  69 The following two kinds of intrinsics can be used to implement stack
  70 maps and patch points: ``llvm.experimental.stackmap`` and
  71 ``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
  72 stack map record, and they both allow some form of code patching. They
  73 can be used independently (i.e. ``llvm.experimental.patchpoint``
  74 implicitly generates a stack map without the need for an additional
  75 call to ``llvm.experimental.stackmap``). The choice of which to use
  76 depends on whether it is necessary to reserve space for code patching
  77 and whether any of the intrinsic arguments should be lowered according
  78 to calling conventions. ``llvm.experimental.stackmap`` does not
  79 reserve any space, nor does it expect any call arguments. If the
  80 runtime patches code at the stack map's address, it will destructively
  81 overwrite the program text. This is unlike
  82 ``llvm.experimental.patchpoint``, which reserves space for in-place
  83 patching without overwriting surrounding code. The
  84 ``llvm.experimental.patchpoint`` intrinsic also lowers a specified
  85 number of arguments according to its calling convention. This allows
  86 patched code to make in-place function calls without marshaling.
  87
  88 Each instance of one of these intrinsics generates a stack map record
  89 in the :ref:`stackmap-section`. The record includes an ID, allowing
  90 the runtime to uniquely identify the stack map, and the offset within
  91 the code from the beginning of the enclosing function.
  92
  93 '``llvm.experimental.stackmap``' Intrinsic
  94 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  95
  96 Syntax:
  97 """""""
  98
  99 ::
 100
 101       declare void
 102         @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)
 103
 104 Overview:
 105 """""""""
 106
 107 The '``llvm.experimental.stackmap``' intrinsic records the location of
 108 specified values in the stack map without generating any code.
 109
 110 Operands:
 111 """""""""
 112
 113 The first operand is an ID to be encoded within the stack map. The
 114 second operand is the number of shadow bytes following the
 115 intrinsic. The variable number of operands that follow are the ``live
 116 values`` for which locations will be recorded in the stack map.
 117
 118 To use this intrinsic as a bare-bones stack map, with no code patching
 119 support, the number of shadow bytes can be set to zero.
 120
 121 Semantics:
 122 """"""""""
 123
 124 The stack map intrinsic generates no code in place, unless nops are
 125 needed to cover its shadow (see below). However, its offset from
 126 function entry is stored in the stack map. This is the relative
 127 instruction address immediately following the instructions that
 128 precede the stack map.
 129
 130 The stack map ID allows a runtime to locate the desired stack map
 131 record. LLVM passes this ID through directly to the stack map
 132 record without checking uniqueness.
 133
 134 LLVM guarantees a shadow of instructions following the stack map's
 135 instruction offset during which neither the end of the basic block nor
 136 another call to ``llvm.experimental.stackmap`` or
 137 ``llvm.experimental.patchpoint`` may occur. This allows the runtime to
 138 patch the code at this point in response to an event triggered from
 139 outside the code. The code for instructions following the stack map
 140 may be emitted in the stack map's shadow, and these instructions may
 141 be overwritten by destructive patching. Without shadow bytes, this
 142 destructive patching could overwrite program text or data outside the
 143 current function. We disallow overlapping stack map shadows so that
 144 the runtime does not need to consider this corner case.
 145
 146 For example, a stack map with an 8-byte shadow:
 147
 148 .. code-block:: llvm
 149
 150   call void @runtime()
 151   call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
 152                                                          i64* %ptr)
 153   %val = load i64* %ptr
 154   %add = add i64 %val, 3
 155   ret i64 %add
 156
 157 May require one byte of nop-padding:
 158
 159 .. code-block:: none
 160
 161   0x00 callq _runtime
 162   0x05 nop                <--- stack map address
 163   0x06 movq (%rdi), %rax
 164   0x07 addq $3, %rax
 165   0x0a popq %rdx
 166   0x0b ret                <---- end of 8-byte shadow
 167
 168 Now, if the runtime needs to invalidate the compiled code, it may
 169 patch 8 bytes of code at the stack map's address as follows:
 170
 171 .. code-block:: none
 172
 173   0x00 callq _runtime
 174   0x05 movl  $0xffff, %eax <--- patched code at stack map address
 175   0x0a callq *%rax         <---- end of 8-byte shadow
 176
 177 This way, after the normal call to the runtime returns, the code will
 178 execute a patched call to a special entry point that can rebuild a
 179 stack frame from the values located by the stack map.
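
The patch shown in the listing above can be assembled by hand. The following Python sketch is illustrative only and is not part of LLVM. It assumes an x86-64 target and a special entry point whose address fits in a 32-bit immediate; the helper name ``build_invalidation_patch`` is hypothetical. It encodes a 32-bit immediate move into the low half of ``%rax``, then an indirect call through ``%rax``, and pads with single-byte nops so that the whole shadow is overwritten.

```python
import struct

NOP = b"\x90"  # single-byte x86 nop


def build_invalidation_patch(entry_point: int, shadow_bytes: int) -> bytes:
    """Encode 'movl $entry_point, %eax; callq *%rax', padded with nops.

    Hypothetical helper: assumes x86-64 and a 32-bit entry point address.
    """
    if not 0 <= entry_point <= 0xFFFFFFFF:
        raise ValueError("entry point must fit in a 32-bit immediate")
    mov = b"\xb8" + struct.pack("<I", entry_point)  # movl $imm32, %eax (5 bytes)
    call = b"\xff\xd0"                              # callq *%rax (2 bytes)
    code = mov + call
    if len(code) > shadow_bytes:
        raise ValueError("patch does not fit in the stack map shadow")
    return code + NOP * (shadow_bytes - len(code))


patch = build_invalidation_patch(0xFFFF, 8)
print(patch.hex())  # b8ffff0000ffd090
```

The size check reflects the rule that a destructive patch must stay inside the shadow. A runtime whose entry point lies above 4 GiB would need a longer sequence and therefore a larger shadow.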
 180
 181 '``llvm.experimental.patchpoint.*``' Intrinsic
 182 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 183
 184 Syntax:
 185 """""""
 186
 187 ::
 188
 189       declare void
 190         @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
 191                                            i8* <target>, i32 <numArgs>, ...)
 192       declare i64
 193         @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
 194                                           i8* <target>, i32 <numArgs>, ...)
 195
 196 Overview:
 197 """""""""
 198
 199 The '``llvm.experimental.patchpoint.*``' intrinsics create a function
 200 call to the specified ``<target>`` and records the location of specified
 201 values in the stack map.
 202
 203 Operands:
 204 """""""""
 205
 206 The first operand is an ID, the second operand is the number of bytes
 207 reserved for the patchable region, the third operand is the target
 208 address of a function (optionally null), and the fourth operand
 209 specifies how many of the following variable operands are considered
 210 function call arguments. The remaining variable number of operands are
 211 the ``live values`` for which locations will be recorded in the stack
 212 map.
 213
 214 Semantics:
 215 """"""""""
 216
 217 The patch point intrinsic generates a stack map. It also emits a
 218 function call to the address specified by ``<target>`` if the address
 219 is not a constant null. The function call and its arguments are
 220 lowered according to the calling convention specified at the
 221 intrinsic's callsite. Variants of the intrinsic with non-void return
 222 type also return a value according to calling convention.
 223
 224 On PowerPC, note that ``<target>`` must be the ABI function pointer for the
 225 intended target of the indirect call. Specifically, when compiling for the
 226 ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
 227 the C/C++ function-pointer representation.
 228
 229 Requesting zero patch point arguments is valid. In this case, all
 230 variable operands are handled just like
 231 ``llvm.experimental.stackmap``. The difference is that space will
 232 still be reserved for patching, a call will be emitted, and a return
 233 value is allowed.
 234
 235 The locations of the arguments are not normally recorded in the stack
 236 map because they are already fixed by the calling convention. The
 237 remaining ``live values`` will have their locations recorded, each of
 238 which could be a register, stack location, or constant. A special calling
 239 convention, ``anyregcc``, has been introduced for use with stack maps. It
 240 forces the arguments to be loaded into registers but allows
 241 those registers to be dynamically allocated. These argument registers
 242 will have their register locations recorded in the stack map in
 243 addition to the remaining ``live values``.
 244
 245 The patch point also emits nops to cover at least ``<numBytes>`` of
 246 instruction encoding space. Hence, the client must ensure that
 247 ``<numBytes>`` is enough to encode a call to the target address on the
 248 supported targets. If the call target is constant null, then there is
 249 no minimum requirement. A zero-byte null target patchpoint is
 250 valid.
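
As a rough illustration of the size requirement, the sketch below checks a requested ``<numBytes>`` against the length of a call sequence and reports how many bytes are left for nops. The 13-byte figure is an assumption read off the x86-64 listing later in this section (a 10-byte ``movabsq`` plus a 3-byte ``callq``). Other targets and other call sequences will differ, and the function name is hypothetical.

```python
# Assumed x86-64 call sequence, taken from the listing in this document:
# movabsq $target, %r11 (10 bytes) followed by callq *%r11 (3 bytes).
X86_64_CALL_SEQUENCE_BYTES = 10 + 3


def nop_padding(num_bytes: int, has_target: bool) -> int:
    """Return how many bytes of nops fill the rest of the reserved region."""
    needed = X86_64_CALL_SEQUENCE_BYTES if has_target else 0
    if num_bytes < needed:
        raise ValueError(f"numBytes={num_bytes} cannot hold a {needed}-byte call")
    return num_bytes - needed


print(nop_padding(15, has_target=True))   # 2
print(nop_padding(0, has_target=False))   # 0
```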
 251
 252 The runtime may patch the code emitted for the patch point, including
 253 the call sequence and nops. However, the runtime may not assume
 254 anything about the code LLVM emits within the reserved space. Partial
 255 patching is not allowed. The runtime must patch all reserved bytes,
 256 padding with nops if necessary.
 257
 258 This example shows a patch point reserving 15 bytes, with one argument
 259 in ``%rdi`` and a return value in ``%rax`` per the native calling convention:
 260
 261 .. code-block:: llvm
 262
 263   %target = inttoptr i64 -281474976710654 to i8*
 264   %val = call i64 (i64, i32, i8*, i32, ...)*
 265            @llvm.experimental.patchpoint.i64(i64 78, i32 15,
 266                                              i8* %target, i32 1, i64* %ptr)
 267   %add = add i64 %val, 3
 268   ret i64 %add
 269
 270 May generate:
 271
 272 .. code-block:: none
 273
 274   0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
 275   0x0a callq   *%r11
 276   0x0d nop
 277   0x0e nop                               <--- end of reserved 15 bytes
 278   0x0f addq    $0x3, %rax
 279   0x10 movq    %rax, 8(%rsp)
 280
 281 Note that no stack map locations will be recorded. If the patched code
 282 sequence does not need arguments fixed to specific calling convention
 283 registers, then the ``anyregcc`` convention may be used:
 284
 285 .. code-block:: none
 286
 287   %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
 288                                                      i8* %target, i32 1,
 289                                                      i64* %ptr)
 290
 291 The stack map now indicates the location of the %ptr argument and
 292 return value:
 293
 294 .. code-block:: none
 295
 296   Stack Map: ID=78, Loc0=%r9 Loc1=%r8
 297
 298 The patch code sequence may now use the argument that happened to be
 299 allocated in %r8 and return a value allocated in %r9:
 300
 301 .. code-block:: none
 302
 303   0x00 movslq 4(%r8), %r9             <--- patched code at patch point address
 304   0x03 nop
 305   ...
 306   0x0e nop                            <--- end of reserved 15 bytes
 307   0x0f addq    $0x3, %r9
 308   0x10 movq    %r9, 8(%rsp)
 309
 310 .. _stackmap-format:
 311
 312 Stack Map Format
 313 ================
 314
 315 The existence of a stack map or patch point intrinsic within an LLVM
 316 Module forces code emission to create a :ref:`stackmap-section`. The
 317 format of this section follows:
 318
 319 .. code-block:: none
 320
 321   Header {
 322     uint8  : Stack Map Version (current version is 3)
 323     uint8  : Reserved (expected to be 0)
 324     uint16 : Reserved (expected to be 0)
 325   }
 326   uint32 : NumFunctions
 327   uint32 : NumConstants
 328   uint32 : NumRecords
 329   StkSizeRecord[NumFunctions] {
 330     uint64 : Function Address
 331     uint64 : Stack Size
 332     uint64 : Record Count
 333   }
 334   Constants[NumConstants] {
 335     uint64 : LargeConstant
 336   }
 337   StkMapRecord[NumRecords] {
 338     uint64 : PatchPoint ID
 339     uint32 : Instruction Offset
 340     uint16 : Reserved (record flags)
 341     uint16 : NumLocations
 342     Location[NumLocations] {
 343       uint8  : Register | Direct | Indirect | Constant | ConstantIndex
 344       uint8  : Reserved (expected to be 0)
 345       uint16 : Location Size
 346       uint16 : Dwarf RegNum
 347       uint16 : Reserved (expected to be 0)
 348       int32  : Offset or SmallConstant
 349     }
 350     uint32 : Padding (only if required to align to 8 bytes)
 351     uint16 : Padding
 352     uint16 : NumLiveOuts
 353     LiveOuts[NumLiveOuts] {
 354       uint16 : Dwarf RegNum
 355       uint8  : Reserved
 356       uint8  : Size in Bytes
 357     }
 358     uint32 : Padding (only if required to align to 8 bytes)
 359   }
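
To make the layout concrete, here is a minimal Python sketch that reads the fixed-size leading part of the section: the header, the three counts, the ``StkSizeRecord`` array, and the constant pool. It assumes a little-endian target and that ``data`` starts at the first byte of the section. The function name and the dictionary keys are hypothetical and not part of any LLVM API.

```python
import struct


def parse_stackmap_prefix(data: bytes):
    """Parse the header, function records, and constants of a stack map section.

    Returns (functions, constants, num_records, offset_of_first_record).
    Assumes little-endian data and stack map version 3.
    """
    version, _reserved8, _reserved16 = struct.unpack_from("<BBH", data, 0)
    if version != 3:
        raise ValueError(f"unsupported stack map version {version}")
    num_functions, num_constants, num_records = struct.unpack_from("<III", data, 4)
    offset = 16
    functions = []
    for _ in range(num_functions):
        address, stack_size, record_count = struct.unpack_from("<QQQ", data, offset)
        functions.append({"address": address, "stack_size": stack_size,
                          "record_count": record_count})
        offset += 24
    constants = list(struct.unpack_from(f"<{num_constants}Q", data, offset))
    offset += 8 * num_constants
    return functions, constants, num_records, offset


# Synthetic section prefix: one function, one large constant, no records.
blob = (struct.pack("<BBH", 3, 0, 0)
        + struct.pack("<III", 1, 1, 0)
        + struct.pack("<QQQ", 0x1000, 24, 2)
        + struct.pack("<Q", 0x123456789))
functions, constants, num_records, first_record = parse_stackmap_prefix(blob)
print(functions, constants, num_records, first_record)
```

Because every element before the records is a multiple of 8 bytes long, the first ``StkMapRecord`` starts at an 8-byte-aligned offset.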
 360
 361 The first byte of each location encodes a type that indicates how to
 362 interpret the ``RegNum`` and ``Offset`` fields as follows:
 363
 364 ======== ========== =================== ===========================
 365 Encoding Type       Value               Description
 366 -------- ---------- ------------------- ---------------------------
 367 0x1      Register   Reg                 Value in a register
 368 0x2      Direct     Reg + Offset        Frame index value
 369 0x3      Indirect   [Reg + Offset]      Spilled value
 370 0x4      Constant   Offset              Small constant
 371 0x5      ConstIndex Constants[Offset]   Large constant
 372 ======== ========== =================== ===========================
 373
 374 In the common case, a value is available in a register, and the
 375 ``Offset`` field will be zero. Values spilled to the stack are encoded
 376 as ``Indirect`` locations. The runtime must load those values from a
 377 stack address, typically in the form ``[BP + Offset]``. If an
 378 ``alloca`` value is passed directly to a stack map intrinsic, then
 379 LLVM may fold the frame index into the stack map as an optimization to
 380 avoid allocating a register or stack slot. These frame indices will be
 381 encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
 382 also optimize constants by emitting them directly in the stack map,
 383 either in the ``Offset`` of a ``Constant`` location or in the constant
 384 pool, referred to by ``ConstantIndex`` locations.
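
The five location kinds can be interpreted as in the following Python sketch. It is an illustration under stated assumptions: the 12-byte ``Location`` entry is little-endian, ``constants`` is the already-parsed constant pool, and the function name and output strings are hypothetical.

```python
import struct

LOCATION_KINDS = {1: "Register", 2: "Direct", 3: "Indirect",
                  4: "Constant", 5: "ConstantIndex"}


def describe_location(raw: bytes, constants) -> str:
    """Describe one 12-byte Location entry in human-readable form."""
    kind, _, size, regnum, _, offset = struct.unpack("<BBHHHi", raw)
    name = LOCATION_KINDS[kind]
    if name == "Register":
        return f"{size}-byte value in DWARF register {regnum}"
    if name == "Direct":
        # The address itself is the value; nothing is loaded.
        return f"address: DWARF register {regnum} {offset:+d}"
    if name == "Indirect":
        return f"{size}-byte value loaded from [DWARF register {regnum} {offset:+d}]"
    if name == "Constant":
        return f"constant {offset}"
    return f"constant {constants[offset]}"  # ConstantIndex


# A spilled 8-byte value at [reg6 - 16]; DWARF register 6 is %rbp on x86-64.
spill = struct.pack("<BBHHHi", 3, 0, 8, 6, 0, -16)
print(describe_location(spill, []))
```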
 385
 386 At each callsite, a "liveout" register list is also recorded. These
 387 are the registers that are live across the stackmap and therefore must
 388 be saved by the runtime. This is an important optimization when the
 389 patchpoint intrinsic is used with a calling convention that by default
 390 preserves most registers as callee-save.
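
A single ``StkMapRecord``, including its liveout list and alignment padding, could be read as sketched below. The code assumes little-endian data and that offsets are measured from the 8-byte-aligned start of the section. The names ``align8`` and ``parse_record`` and the returned dictionary shape are hypothetical.

```python
import struct


def align8(offset: int) -> int:
    """Round offset up to the next multiple of 8."""
    return (offset + 7) & ~7


def parse_record(data: bytes, offset: int):
    """Parse one StkMapRecord starting at offset; return (record, next_offset)."""
    patchpoint_id, insn_offset, _flags, num_locations = struct.unpack_from(
        "<QIHH", data, offset)
    offset += 16
    locations = []
    for _ in range(num_locations):
        kind, _, size, regnum, _, value = struct.unpack_from("<BBHHHi", data, offset)
        locations.append((kind, size, regnum, value))
        offset += 12
    offset = align8(offset)  # optional uint32 padding after the locations
    _padding, num_live_outs = struct.unpack_from("<HH", data, offset)
    offset += 4
    live_outs = []
    for _ in range(num_live_outs):
        regnum, _, size = struct.unpack_from("<HBB", data, offset)
        live_outs.append((regnum, size))
        offset += 4
    offset = align8(offset)  # optional uint32 padding after the liveouts
    record = {"id": patchpoint_id, "offset": insn_offset,
              "locations": locations, "live_outs": live_outs}
    return record, offset


# One Register location (DWARF register 9) and one 8-byte liveout (DWARF register 0).
raw = (struct.pack("<QIHH", 78, 15, 0, 1)
       + struct.pack("<BBHHHi", 1, 0, 8, 9, 0, 0)
       + bytes(4)
       + struct.pack("<HH", 0, 1)
       + struct.pack("<HBB", 0, 0, 8))
record, end = parse_record(raw, 0)
print(record, end)
```

With an odd number of locations the location array ends on a 4-byte boundary, which is why the first padding word appears in this example.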
 391
 392 Each entry in the liveout register list contains a DWARF register
 393 number and size in bytes. The stackmap format deliberately omits
 394 specific subregister information. Instead the runtime must interpret
 395 this information conservatively. For example, if the stackmap reports
 396 one byte at ``%rax``, then the value may be in either ``%al`` or
 397 ``%ah``. It doesn't matter in practice, because the runtime will
 398 simply save ``%rax``. However, if the stackmap reports 16 bytes at
 399 ``%ymm0``, then the runtime can safely optimize by saving only
 400 ``%xmm0``.
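
The conservative interpretation described above amounts to choosing the narrowest save width that still covers the reported size. The sketch below illustrates that rule only. The list of widths per register is an assumption supplied by the runtime, and the function name is hypothetical.

```python
def bytes_to_save(reported_size: int, register_widths) -> int:
    """Pick the narrowest architectural width that covers the reported size."""
    candidates = [w for w in sorted(register_widths) if w >= reported_size]
    if not candidates:
        raise ValueError("reported size exceeds every known register width")
    return candidates[0]


print(bytes_to_save(1, [8]))        # %rax: save the full 8-byte register
print(bytes_to_save(16, [16, 32]))  # %ymm0: saving the 16-byte %xmm0 suffices
```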
 401
 402 The stack map format is a contract between an LLVM revision and
 403 the runtime. It is currently experimental and may change in the short
 404 term, but minimizing the need to update the runtime is
 405 important. Consequently, the stack map design is motivated by
 406 simplicity and extensibility. Compactness of the representation is
 407 secondary because the runtime is expected to parse the data
 408 immediately after compiling a module and encode the information in its
 409 own format. Since the runtime controls the allocation of sections, it
 410 can reuse the same stack map space for multiple modules.
 411
 412 Stackmap support is currently only implemented for 64-bit
 413 platforms. However, a 32-bit implementation should be able to use the
 414 same format with an insignificant amount of wasted space.
 415
 416 .. _stackmap-section:
 417
 418 Stack Map Section
 419 ^^^^^^^^^^^^^^^^^
 420
 421 A JIT compiler can easily access this section by providing its own
 422 memory manager via the LLVM C API
 423 ``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
 424 manager, the JIT provides a callback:
 425 ``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
 426 this section, it invokes the callback and passes the section name. The
 427 JIT can record the in-memory address of the section at this time and
 428 later parse it to recover the stack map data.
 429
 430 For MachO (e.g. on Darwin), the stack map section name is
 431 "__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".
 432
 433 For ELF (e.g. on Linux), the stack map section name is
 434 ".llvm_stackmaps".  The segment name is "__LLVM_STACKMAPS".
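
The bookkeeping a JIT performs in its data-section callback can be modeled as follows. This Python sketch is not the LLVM C API. The class and method names are hypothetical, and a real callback must also allocate and return the memory for each section. It only shows matching the section name and remembering where the section was placed.

```python
STACKMAP_SECTION_NAMES = {"__llvm_stackmaps", ".llvm_stackmaps"}


class SectionRecorder:
    """Hypothetical stand-in for a JIT memory manager's data-section hook."""

    def __init__(self):
        self.stackmap = None  # (address, size) once the section is seen

    def on_data_section(self, name: str, address: int, size: int) -> None:
        # Remember only the stack map section; ignore every other section.
        if name in STACKMAP_SECTION_NAMES:
            self.stackmap = (address, size)


recorder = SectionRecorder()
recorder.on_data_section(".rodata", 0x10000, 64)
recorder.on_data_section(".llvm_stackmaps", 0x20000, 40)
print(recorder.stackmap)
```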
 435
 436 Stack Map Usage
 437 ===============
 438
 439 The stack map support described in this document can be used to
 440 precisely determine the location of values at a specific position in
 441 the code. LLVM does not maintain any mapping between those values and
 442 any higher-level entity. The runtime must be able to interpret the
 443 stack map record given only the ID, offset, and the order of the
 444 locations, records, and functions, which LLVM preserves.
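
Since LLVM attaches no meaning to the recorded values, the runtime has to keep its own side table. The sketch below assumes that records appear in function order, so that each function's ``Record Count`` selects its slice of the record array, and that the runtime remembers, per ID, the names of the values in the order it passed them to the intrinsic. All names and data shapes here are hypothetical.

```python
def resolve_records(functions, records, names_by_id):
    """Attach absolute addresses and runtime-side names to parsed records."""
    resolved, index = [], 0
    for func in functions:
        for rec in records[index:index + func["record_count"]]:
            names = names_by_id.get(rec["id"], [])
            resolved.append({
                "id": rec["id"],
                # Instruction offsets are relative to the function's address.
                "address": func["address"] + rec["offset"],
                # Locations are matched to names purely by position.
                "values": dict(zip(names, rec["locations"])),
            })
        index += func["record_count"]
    return resolved


functions = [{"address": 0x1000, "record_count": 1},
             {"address": 0x2000, "record_count": 1}]
records = [{"id": 77, "offset": 0x05, "locations": ["loc0"]},
           {"id": 78, "offset": 0x00, "locations": ["loc0", "loc1"]}]
names_by_id = {77: ["ptr"], 78: ["result", "ptr"]}
for entry in resolve_records(functions, records, names_by_id):
    print(hex(entry["address"]), entry["values"])
```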
 445
 446 Note that this is quite different from the goal of debug information,
 447 which is a best-effort attempt to track the location of named
 448 variables at every instruction.
 449
 450 An important motivation for this design is to allow a runtime to
 451 commandeer a stack frame when execution reaches an instruction address
 452 associated with a stack map. The runtime must be able to rebuild a
 453 stack frame and resume program execution using the information
 454 provided by the stack map. For example, execution may resume in an
 455 interpreter or a recompiled version of the same function.
 456
 457 This usage restricts LLVM optimization. Clearly, LLVM must not move
 458 stores across a stack map. However, loads must also be handled
 459 conservatively. If the load may trigger an exception, hoisting it
 460 above a stack map could be invalid. For example, the runtime may
 461 determine that a load is safe to execute without a type check given
 462 the current state of the type system. If the type system changes while
 463 some activation of the load's function exists on the stack, the load
 464 becomes unsafe. The runtime can prevent subsequent execution of that
 465 load by immediately patching any stack map location that lies between
 466 the current call site and the load (typically, the runtime would
 467 simply patch all stack map locations to invalidate the function). If
 468 the compiler had hoisted the load above the stack map, then the
 469 program could crash before the runtime could take back control.
 470
 471 To enforce these semantics, stackmap and patchpoint intrinsics are
 472 considered to potentially read and write all memory. This may limit
 473 optimization more than some clients desire. This limitation may be
 474 avoided by marking the call site as "readonly". In the future we may
 475 also allow metadata to be added to the intrinsic call to express
 476 aliasing, thereby allowing optimizations to hoist certain loads above
 477 stack maps.
 478
 479 Direct Stack Map Entries
 480 ^^^^^^^^^^^^^^^^^^^^^^^^
 481
 482 As shown in :ref:`stackmap-format`, a Direct stack map location
 483 records the address of a frame index. This address is itself the value
 484 that the runtime requested. This differs from Indirect locations,
 485 which refer to stack locations from which the requested values must
 486 be loaded. Direct locations can communicate the address of an alloca,
 487 while Indirect locations handle register spills.
 488
 489 For example:
 490
 491 .. code-block:: none
 492
 493   entry:
 494     %a = alloca i64...
 495     llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)
 496
 497 The runtime can determine this alloca's relative location on the
 498 stack immediately after compilation, or at any time thereafter. This
 499 differs from Register and Indirect locations, because the runtime can
 500 only read the values in those locations when execution reaches the
 501 instruction address of the stack map.
 502
 503 This functionality requires LLVM to treat entry-block allocas
 504 specially when they are directly consumed by an intrinsic. (This is
 505 the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
 506 transformations must not substitute the alloca with any intervening
 507 value. This can be verified by the runtime simply by checking that the
 508 stack map's location is a Direct location type.
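
The verification step mentioned above is a one-line check on the location type. The following sketch is hypothetical, including the function name and the way the base register value is obtained. It shows the check together with the address computation for a Direct location.

```python
DIRECT = 0x2  # location type encoding for Direct entries


def alloca_address(location_kind: int, base_register_value: int, offset: int) -> int:
    """Return the alloca's address, or fail if the location is not Direct."""
    if location_kind != DIRECT:
        raise ValueError("alloca was not recorded as a Direct location")
    # For a Direct location the sum is the requested value; nothing is loaded.
    return base_register_value + offset


print(hex(alloca_address(DIRECT, 0x7FFC0000, -24)))
```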
 509
 510
 511 Supported Architectures
 512 =======================
 513
 514 Support for StackMap generation and the related intrinsics requires
 515 some code for each backend.  Today, only a subset of LLVM's backends
 516 is supported.  The currently supported architectures are X86_64,
 517 PowerPC, and AArch64.