clang/docs/HardwareAssistedAddressSanitizerDesign.rst

=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================

This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`,
but based on partial hardware assistance.


Introduction
============

:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.

AArch64 has `Address Tagging`_ (or top-byte-ignore, TBI), a hardware feature that allows
software to use the 8 most significant bits of a 64-bit pointer as
a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.

Intel's `Linear Address Masking`_ (LAM) also provides address tagging for
x86_64, though it is not widely available in hardware yet.  For x86_64, HWASAN
has a limited implementation using page aliasing instead.

Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned to `TG` bytes
  (`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
  with the pointer tag; an exception is raised on a tag mismatch.

For a more detailed discussion of this approach see https://arxiv.org/pdf/1802.09517.pdf

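The steps above can be simulated in portable C. The following is a minimal
sketch, not the HWASAN runtime: it assumes `TG` = 16 and `TS` = 8, keeps the
tag in the top byte of a 64-bit address value, and uses a small array as the
shadow. The names `SIM_BASE`, `sim_shadow`, `sim_tag_object` and `sim_check`
are hypothetical.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define TG 16u               /* tagging granularity (assumed) */
  #define SIM_BASE 0x10000ull  /* hypothetical start of the simulated memory */
  #define SIM_GRANULES 64u

  static uint8_t sim_shadow[SIM_GRANULES]; /* one tag byte per TG bytes */

  /* The pointer tag lives in bits 56-63. */
  static uint8_t ptr_tag(uint64_t p) { return (uint8_t)(p >> 56); }
  static uint64_t untag(uint64_t p) { return p & 0x00ffffffffffffffull; }

  /* Tag `size` bytes (a multiple of TG) at the untagged address `addr`
     and return the matching tagged pointer. */
  static uint64_t sim_tag_object(uint64_t addr, size_t size, uint8_t tag) {
    size_t first = (size_t)((addr - SIM_BASE) / TG);
    for (size_t i = 0; i < size / TG; i++)
      sim_shadow[first + i] = tag;
    return addr | ((uint64_t)tag << 56);
  }

  /* Return 1 if the memory tag for `p` equals its pointer tag, else 0. */
  static int sim_check(uint64_t p) {
    return sim_shadow[(untag(p) - SIM_BASE) / TG] == ptr_tag(p);
  }

An access through the tagged pointer passes anywhere inside the object, while
an access one granule past its end, or through a pointer carrying a different
tag, fails the check.
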
Short granules
--------------

A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

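The two-way check described above can be sketched in C. This is an
illustrative simulation that assumes `TG` = 16 and accesses that do not cross
a granule boundary; `sg_mem`, `sg_shadow`, `sg_make_short` and `sg_check` are
hypothetical names, and offsets into `sg_mem` stand in for untagged addresses.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define TG 16u
  #define SG_GRANULES 8u

  static uint8_t sg_mem[SG_GRANULES * TG]; /* simulated application memory */
  static uint8_t sg_shadow[SG_GRANULES];   /* one shadow byte per granule */

  /* Mark granule `g` as short: `valid` usable bytes (1..TG-1), real tag `tag`. */
  static void sg_make_short(size_t g, uint8_t valid, uint8_t tag) {
    sg_shadow[g] = valid;            /* shadow holds the size, not the tag */
    sg_mem[g * TG + TG - 1] = tag;   /* tag goes in the granule's last byte */
  }

  /* Check an access of `size` bytes at offset `off` via a pointer tagged `tag`. */
  static int sg_check(size_t off, size_t size, uint8_t tag) {
    uint8_t mem_tag = sg_shadow[off / TG];
    if (mem_tag == tag)
      return 1;                            /* full tag interpretation matches */
    if (mem_tag >= TG)
      return 0;                            /* not a short granule size */
    size_t last = (off % TG) + size - 1;   /* position of the last byte accessed */
    if (last >= mem_tag)
      return 0;                            /* access runs past the valid bytes */
    return sg_mem[off | (TG - 1)] == tag;  /* compare with the in-granule tag */
  }

With a granule holding 5 valid bytes, a 4-byte access at position 1 is in
bounds, while the same access at position 2 touches byte 5 and is rejected. A
pointer whose tag happens to equal the stored size passes under the full tag
interpretation, which is the ambiguity noted above.
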
Instrumentation
===============

Memory Accesses
---------------
In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call are reduced by using a custom calling
convention that

* preserves most registers, and
* is specialized to the register containing the address, and the type and
  size of the memory access.

Currently, the following sequence is used:

.. code-block:: none

  // int foo(int *a) { return *a; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
  [...]
  foo:
        stp     x30, x20, [sp, #-16]!
        adrp    x20, :got:__hwasan_shadow               // load shadow address from GOT into x20
        ldr     x20, [x20, :got_lo12:__hwasan_shadow]
        bl      __hwasan_check_x0_2_short_v2            // call outlined tag check
                                                        // (arguments: x0 = address, x20 = shadow base;
                                                        // "2" encodes the access type and size)
        ldr     w0, [x0]                                // inline load
        ldp     x30, x20, [sp], #16
        ret

  [...]
  __hwasan_check_x0_2_short_v2:
        sbfx    x16, x0, #4, #52                        // shadow offset
        ldrb    w16, [x20, x16]                         // load shadow tag
        cmp     x16, x0, lsr #56                        // extract address tag, compare with shadow tag
        b.ne    .Ltmp0                                  // jump to short tag handler on mismatch
  .Ltmp1:
        ret
  .Ltmp0:
        cmp     w16, #15                                // is this a short tag?
        b.hi    .Ltmp2                                  // if not, error
        and     x17, x0, #0xf                           // find the address's position in the short granule
        add     x17, x17, #3                            // adjust to the position of the last byte loaded
        cmp     w16, w17                                // check that position is in bounds
        b.ls    .Ltmp2                                  // if not, error
        orr     x16, x0, #0xf                           // compute address of last byte of granule
        ldrb    w16, [x16]                              // load tag from it
        cmp     x16, x0, lsr #56                        // compare with pointer tag
        b.eq    .Ltmp1                                  // if matches, continue
  .Ltmp2:
        stp     x0, x1, [sp, #-256]!                    // save original x0, x1 on stack (they will be overwritten)
        stp     x29, x30, [sp, #232]                    // create frame record
        mov     x1, #2                                  // set x1 to a constant indicating the type of failure
        adrp    x16, :got:__hwasan_tag_mismatch_v2      // call runtime function to save remaining registers and report error
        ldr     x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
        br      x16

Heap
----

Tagging the heap memory/pointers is done by `malloc`.
This can be based on any malloc that forces all objects to be `TG`-aligned.
`free` tags the memory with a different tag.

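A toy allocator illustrates the idea. This is a sketch under several
assumptions rather than a real `malloc`: it is a bump allocator over a
simulated arena with `TG` = 16, a counter stands in for the random tag source
so the behaviour is deterministic, short granules are ignored, and a pointer is
modelled as a 64-bit value with the tag in the top byte and the arena offset
below it. All names (`toy_malloc`, `toy_free`, `toy_check`) are hypothetical.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define TG 16u
  #define HEAP_GRANULES 32u
  #define ADDR_MASK 0x00ffffffffffffffull

  static uint8_t heap_shadow[HEAP_GRANULES]; /* one tag byte per granule */
  static size_t heap_next;                   /* next free granule index */
  static uint8_t heap_tag = 1;               /* counter standing in for a random tag */

  /* Return the next non-zero tag. */
  static uint8_t next_tag(void) {
    uint8_t t = heap_tag++;
    if (heap_tag == 0)
      heap_tag = 1;
    return t;
  }

  /* Tag all granules that cover `size` bytes starting at granule `first`. */
  static void tag_range(size_t first, size_t size, uint8_t tag) {
    size_t granules = (size + TG - 1) / TG;  /* round up to whole granules */
    for (size_t i = 0; i < granules; i++)
      heap_shadow[first + i] = tag;
  }

  /* Allocate `size` bytes; returns a tagged pointer, or 0 if the arena is full. */
  static uint64_t toy_malloc(size_t size) {
    size_t granules = (size + TG - 1) / TG;
    if (granules == 0 || heap_next + granules > HEAP_GRANULES)
      return 0;
    uint8_t tag = next_tag();
    tag_range(heap_next, size, tag);
    uint64_t p = ((uint64_t)tag << 56) | (uint64_t)(heap_next * TG);
    heap_next += granules;
    return p;
  }

  /* Retag the freed memory so that stale pointers no longer match. */
  static void toy_free(uint64_t p, size_t size) {
    tag_range((size_t)(p & ADDR_MASK) / TG, size, next_tag());
  }

  /* Return 1 if the pointer tag matches the memory tag, else 0. */
  static int toy_check(uint64_t p) {
    return heap_shadow[(size_t)(p & ADDR_MASK) / TG] == (uint8_t)(p >> 56);
  }

Because neighbouring allocations carry different tags, an access that runs
from one object into the next fails `toy_check`, and so does any access through
a pointer that was kept after `toy_free`.
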
Stack
-----

Stack frames are instrumented by aligning all non-promotable allocas
to `TG` and tagging stack memory in the function prologue and epilogue.

Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
ReTag can be as simple as exclusive-or with constant `M`.

Stack instrumentation is expected to be a major source of overhead,
but could be optional.

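As a small illustration of the exclusive-or variant mentioned above, the
following sketch derives per-alloca tags from one base tag. It assumes 8-bit
tags; the function name `retag` and the parameter `m` (the alloca's index) are
hypothetical.

.. code-block:: c

  #include <stdint.h>

  /* Derive the tag for alloca number `m` from the frame's base tag.
     XOR with distinct constants below 256 always yields distinct tags. */
  static uint8_t retag(uint8_t base_tag, unsigned m) {
    return (uint8_t)(base_tag ^ m);
  }

For a base tag of `0x2d`, allocas 0, 1 and 2 receive the tags `0x2d`, `0x2c`
and `0x2f`, so only the base tag has to be kept live in a register.
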
Globals
-------

Most globals in HWASAN-instrumented code are tagged. This is accomplished
using the following mechanisms:

  * The address of each global has a static tag associated with it. The first
    defined global in a translation unit has a pseudorandom tag associated
    with it, based on the hash of the file path. Subsequent global tags are
    incremental from the previously-assigned tag.

  * The global's tag is added to its symbol address in the object file's symbol
    table. This causes the global's address to be tagged when its address is
    taken.

  * When the address of a global is taken directly (i.e. not via the GOT), a special
    instruction sequence needs to be used to add the tag to the address,
    because the tag would otherwise take the address outside of the small code
    model (4GB on AArch64). No changes are required when the address is taken
    via the GOT because the address stored in the GOT will contain the tag.

  * An associated ``hwasan_globals`` section is emitted for each tagged global,
    which indicates the address of the global, its size and its tag.  These
    sections are concatenated by the linker into a single ``hwasan_globals``
    section that is enumerated by the runtime (via an ELF note) when a binary
    is loaded and the memory is tagged accordingly.

A complete example is given below:

.. code-block:: none

  // int x = 1; int *f() { return &x; }
  // clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c

  [...]
  f:
        adrp    x0, :pg_hi21_nc:x            // set bits 12-63 to upper bits of untagged address
        movk    x0, #:prel_g3:x+0x100000000  // set bits 48-63 to tag
        add     x0, x0, :lo12:x              // set bits 0-11 to lower bits of address
        ret

  [...]
        .data
  .Lx.hwasan:
        .word   1

        .globl  x
        .set x, .Lx.hwasan+0x2d00000000000000

  [...]
        .section        .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
  .Lhwasan.note:
        .word   8                            // namesz
        .word   8                            // descsz
        .word   3                            // NT_LLVM_HWASAN_GLOBALS
        .asciz  "LLVM\000\000\000"
        .word   __start_hwasan_globals-.Lhwasan.note
        .word   __stop_hwasan_globals-.Lhwasan.note

  [...]
        .section        hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
  .Lx.hwasan.descriptor:
        .word   .Lx.hwasan-.Lx.hwasan.descriptor
        .word   0x2d000004                   // tag = 0x2d, size = 4

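The second descriptor word and the tagged symbol value in the listing can be
decoded with a few lines of C. The layout used here (tag in the top 8 bits,
size in the low 24 bits of the word) is inferred from the comment in the
listing, and the helper names are hypothetical.

.. code-block:: c

  #include <stdint.h>

  /* Tag stored in the top 8 bits of the descriptor's second word. */
  static uint8_t desc_tag(uint32_t info) { return (uint8_t)(info >> 24); }

  /* Size of the global stored in the low 24 bits. */
  static uint32_t desc_size(uint32_t info) { return info & 0x00ffffffu; }

  /* Symbol value as set in the symbol table: the tag is added in bits 56-63. */
  static uint64_t tagged_symbol(uint64_t addr, uint8_t tag) {
    return addr + ((uint64_t)tag << 56);
  }

For the word `0x2d000004` this yields the tag `0x2d` and the size 4, matching
the `.set x, .Lx.hwasan+0x2d00000000000000` line.
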
Error reporting
---------------

Errors are generated by the `HLT` instruction and are handled by a signal handler.

Attribute
---------

HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:

  * Users may need to disable ASAN but not HWASAN, or vice versa,
    because the tools have different trade-offs and compatibility issues.
  * LLVM (ideally) does not use flags to decide which pass is being used;
    ASAN or HWASAN is applied based on the function attributes.

This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.


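At the C level, one way to opt a function out of both tools is Clang's
`no_sanitize` function attribute, which accepts the sanitizer names
`"address"` and `"hwaddress"`. The sketch below wraps it in a macro so that
code already annotated for ASAN can cover HWASAN as well; the macro name
`NO_ASAN_NO_HWASAN` and the function `raw_load` are hypothetical.

.. code-block:: c

  /* Expands to the attribute under Clang and to nothing elsewhere. */
  #if defined(__clang__)
  #  define NO_ASAN_NO_HWASAN __attribute__((no_sanitize("address", "hwaddress")))
  #else
  #  define NO_ASAN_NO_HWASAN
  #endif

  /* Under Clang, this load is not instrumented by ASAN or HWASAN. */
  static NO_ASAN_NO_HWASAN int raw_load(const int *p) {
    return *p;
  }
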
Comparison with AddressSanitizer
================================

HWASAN:
  * Is less portable than :doc:`AddressSanitizer`
    as it relies on hardware `Address Tagging`_ (AArch64).
    Address Tagging can be emulated with compiler instrumentation,
    but it will require the instrumentation to remove the tags before
    any load or store, which is infeasible in any realistic environment
    that contains non-instrumented code.
  * May have compatibility problems if the target code uses higher
    pointer bits for other purposes.
  * May require changes in the OS kernels (e.g. Linux seems to dislike
    tagged pointers passed from user space:
    https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
  * **Does not require redzones to detect buffer overflows**,
    but the buffer overflow detection is probabilistic, with roughly a
    `1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4-bit and 8-bit `TS`
    respectively).
  * **Does not require quarantine to detect heap-use-after-free,
    or stack-use-after-return**.
    The detection is similarly probabilistic.

The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.

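As a worked check of the figures above, the two ratios can be computed
directly. The function names are hypothetical, and the sketch only restates
the `1/(2**TS)` and `1/TG` formulas from the text.

.. code-block:: c

  /* Chance that a mismatching access happens to carry an equal TS-bit tag. */
  static double miss_probability(unsigned ts) {
    return 1.0 / (double)(1ull << ts);
  }

  /* Shadow memory as a fraction of application memory: one byte per TG bytes. */
  static double shadow_overhead(unsigned tg) {
    return 1.0 / (double)tg;
  }

With `TS` = 4 the miss probability is 0.0625 (6.25%), with `TS` = 8 it is
0.00390625 (about 0.39%), and with `TG` = 16 the shadow costs 1/16 = 6.25% of
the application memory.
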
Supported architectures
=======================
HWASAN relies on `Address Tagging`_ which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.

On x86_64, HWASAN utilizes page aliasing to place tags in userspace address
bits.  Currently only heap tagging is supported.  The page aliases rely on
shared memory, which will cause heap memory to be shared between processes if
the application calls ``fork()``.  Therefore x86_64 is really only safe for
applications that do not fork.

HWASAN does not currently support 32-bit architectures since they do not
support `Address Tagging`_ and the address space is too constrained to easily
implement page aliasing.


Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
  similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
  "lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*


.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html
.. _Linear Address Masking: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html