llvm/docs/ScudoHardenedAllocator.rst

   1 ========================
   2 Scudo Hardened Allocator
   3 ========================
   4
   5 .. contents::
   6    :local:
   7    :depth: 2
   8
   9 Introduction
  10 ============
  11
  12 The Scudo Hardened Allocator is a user-mode allocator, originally based on LLVM
  13 Sanitizers'
  14 `CombinedAllocator <https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_combined.h>`_.
  15 It aims to provide additional mitigation against heap-based vulnerabilities,
  16 while maintaining good performance. Scudo is currently the default allocator in
  17 `Fuchsia <https://fuchsia.dev/>`_, and in `Android <https://www.android.com/>`_
  18 since Android 11.
  19
  20 The name "Scudo" comes from the Italian word for
  21 `shield <https://www.collinsdictionary.com/dictionary/italian-english/scudo>`_
  22 (and Escudo in Spanish).
  23
  24 Design
  25 ======
  26
  27 Allocator
  28 ---------
  29 Scudo was designed with security in mind, but aims to strike a good balance
  30 between security and performance. It was designed to be highly tunable and
  31 configurable, and while we provide some default configurations, we encourage
  32 consumers to come up with the parameters that will work best for their use
  33 cases.
  34
  35 The allocator combines several components that serve distinct purposes:
  36
  37 - the Primary allocator: fast and efficient, it services smaller allocation
  38   sizes by carving reserved memory regions into blocks of identical size. There
  39   are currently two Primary allocators implemented, specific to 32-bit and
  40   64-bit architectures. It is configurable via compile-time options.
  41
  42 - the Secondary allocator: slower, it services larger allocation sizes via the
  43   memory mapping primitives of the underlying operating system. Secondary backed
  44   allocations are surrounded by Guard Pages. It is also configurable via
  45   compile-time options.
  46
  47 - the thread specific data Registry: defines how local caches operate for each
  48   thread. There are currently two models implemented: the exclusive model where
  49   each thread holds its own caches (using the ELF TLS); or the shared model
  50   where threads share a fixed size pool of caches.
  51
  52 - the Quarantine: offers a way to delay the deallocation operations, preventing
  53   blocks from being immediately available for reuse. Blocks held will be recycled
  54   once certain size criteria are reached. This is essentially a delayed freelist
  55   which can help mitigate some use-after-free situations. This feature is fairly
  56   costly in terms of performance and memory footprint, is mostly controlled by
  57   runtime options and is disabled by default.
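
The "delayed freelist" idea behind the Quarantine can be illustrated with a
minimal sketch. ``ToyQuarantine`` and its members are hypothetical names used
for illustration only; this is not Scudo's implementation, which also uses
per-thread caches and different recycling criteria. Freed blocks are parked in
FIFO order and only become reusable once the total parked size exceeds a limit.

```cpp
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical, simplified quarantine: freed blocks are parked in FIFO order
// and only handed back for reuse once the parked bytes exceed a size limit.
class ToyQuarantine {
public:
  explicit ToyQuarantine(std::size_t MaxBytes) : MaxBytes(MaxBytes) {}

  // Parks a freed block. Returns the blocks that became recyclable because
  // the quarantine grew past its limit (oldest first).
  std::vector<void *> put(void *Block, std::size_t Size) {
    Held.emplace_back(Block, Size);
    HeldBytes += Size;
    std::vector<void *> Recycled;
    while (HeldBytes > MaxBytes) {
      Recycled.push_back(Held.front().first);
      HeldBytes -= Held.front().second;
      Held.pop_front();
    }
    return Recycled;
  }

  std::size_t heldBytes() const { return HeldBytes; }

private:
  std::size_t MaxBytes;
  std::size_t HeldBytes = 0;
  std::deque<std::pair<void *, std::size_t>> Held;
};
```

A block passed to ``put`` stays unavailable until enough later frees push it
out, which is what makes an immediate use-after-free less likely to land on
reallocated memory.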
  58
  59 Allocations Header
  60 ------------------
  61 Every chunk of heap memory returned to an application by the allocator will be
  62 preceded by a header. This has two purposes:
  63
  64 - storing various information about the chunk, which can be leveraged to
  65   ensure consistency of the heap operations;
  66
  67 - detecting potential corruption. For this purpose, the header is
  68   checksummed and corruption of the header will be detected when said header is
  69   accessed (note that if the corrupted header is not accessed, the corruption
  70   will remain undetected).
  71
  72 The following information is stored in the header:
  73
  74 - the class ID for that chunk, which identifies the region where the chunk
  75   resides for Primary backed allocations, or 0 for Secondary backed allocations;
  76
  77 - the state of the chunk (available, allocated or quarantined);
  78
  79 - the allocation type (malloc, new, new[] or memalign), to detect potential
  80   mismatches in the allocation APIs used;
  81
  82 - the size (Primary) or unused bytes amount (Secondary) for that chunk, which is
  83   necessary for reallocation or sized-deallocation operations;
  84
  85 - the offset of the chunk, which is the distance in bytes from the beginning of
  86   the returned chunk to the beginning of the backend allocation (the "block");
  87
  88 - the 16-bit checksum.
  89
  90 This header fits within 8 bytes on all platforms supported, and contributes to a
  91 small overhead for each allocation.
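
As an illustration of how the fields listed above can be packed into 8 bytes,
here is a minimal sketch using bit-fields. The names ``ToyHeader``,
``ChunkState``, ``AllocType`` and ``makeHeader``, as well as the bit widths,
are assumptions made for this example; refer to the Scudo sources for the
actual definition.

```cpp
#include <cstdint>

// Illustrative only: names and bit widths are assumptions chosen to add up to
// 64 bits, not a copy of Scudo's definition.
enum class ChunkState : uint8_t { Available = 0, Allocated = 1, Quarantined = 2 };
enum class AllocType : uint8_t { Malloc = 0, New = 1, NewArray = 2, Memalign = 3 };

struct ToyHeader {
  uint64_t ClassId : 8;            // 0 for Secondary backed allocations
  uint64_t State : 2;              // a ChunkState value
  uint64_t Origin : 2;             // an AllocType value
  uint64_t SizeOrUnusedBytes : 20; // size (Primary) or unused bytes (Secondary)
  uint64_t Offset : 16;            // distance from the chunk back to its block
  uint64_t Checksum : 16;          // 16-bit checksum
};

static_assert(sizeof(ToyHeader) == 8, "header must fit within 8 bytes");

// Builds a header for a freshly allocated chunk (checksum left at zero).
inline ToyHeader makeHeader(uint8_t ClassId, AllocType Origin, uint32_t Size) {
  ToyHeader H = {};
  H.ClassId = ClassId;
  H.State = static_cast<uint64_t>(ChunkState::Allocated);
  H.Origin = static_cast<uint64_t>(Origin);
  H.SizeOrUnusedBytes = Size;
  return H;
}
```

Recording the allocation type next to the state is what allows a later
``free`` or ``delete`` to be checked against the function that allocated the
chunk.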
  92
  93 The checksum is computed using a CRC32 (made faster with hardware support)
  94 of the global secret, the chunk pointer itself, and the 8 bytes of header with
  95 the checksum field zeroed out. It is not intended to be cryptographically
  96 strong.
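
The checksum scheme described above can be sketched as follows. This is an
approximation under stated assumptions: it uses a plain software CRC32 as a
stand-in for the hardware-assisted version, assumes the checksum lives in the
top 16 bits of the packed header, and folds the 32-bit CRC into 16 bits with an
XOR. The function names are hypothetical.

```cpp
#include <cstdint>

// Bitwise CRC32 update (reflected polynomial 0xEDB88320) over the 8 bytes of
// Data. Software stand-in for a hardware CRC32 instruction.
inline uint32_t crc32Update(uint32_t Crc, uint64_t Data) {
  for (int I = 0; I < 8; ++I) {
    Crc ^= static_cast<uint8_t>(Data >> (8 * I));
    for (int Bit = 0; Bit < 8; ++Bit)
      Crc = (Crc >> 1) ^ (0xEDB88320u & (0u - (Crc & 1u)));
  }
  return Crc;
}

// Assumption of this sketch: the checksum occupies the top 16 bits.
constexpr uint64_t ChecksumMask = 0xFFFFull << 48;

// Checksums the secret, the chunk pointer, and the header with its checksum
// field zeroed out, then folds the 32-bit result into 16 bits.
inline uint16_t computeChecksum(uint32_t Secret, uintptr_t ChunkPtr,
                                uint64_t PackedHeader) {
  uint32_t Crc = crc32Update(Secret, static_cast<uint64_t>(ChunkPtr));
  Crc = crc32Update(Crc, PackedHeader & ~ChecksumMask);
  return static_cast<uint16_t>(Crc ^ (Crc >> 16));
}

// Returns the header with its checksum field filled in.
inline uint64_t sealHeader(uint32_t Secret, uintptr_t ChunkPtr,
                           uint64_t PackedHeader) {
  const uint64_t Sum = computeChecksum(Secret, ChunkPtr, PackedHeader);
  return (PackedHeader & ~ChecksumMask) | (Sum << 48);
}

// Recomputes the checksum and compares it with the stored one.
inline bool isHeaderValid(uint32_t Secret, uintptr_t ChunkPtr,
                          uint64_t PackedHeader) {
  return static_cast<uint16_t>(PackedHeader >> 48) ==
         computeChecksum(Secret, ChunkPtr, PackedHeader);
}
```

Because the chunk pointer is part of the checksummed data, a valid header
copied to a different address is unlikely to verify there.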
  97
  98 The header is atomically loaded and stored to prevent races. This is important
  99 as two consecutive chunks could belong to different threads. We work on local
 100 copies and use compare-exchange primitives to update the headers in the heap
 101 memory, and avoid any type of double-fetching.
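
The "local copy plus compare-exchange" pattern can be sketched with
``std::atomic``. The bit position of the state field and the name
``transitionState`` are assumptions made for this example, not Scudo's actual
layout or API.

```cpp
#include <atomic>
#include <cstdint>

// Assumption of this sketch: bits 8-9 of the packed header hold the state.
constexpr uint64_t StateShift = 8;
constexpr uint64_t StateMask = 0x3ull << StateShift;

// Moves a header from state From to state To. Returns false if the chunk was
// not in the expected state, or if the header in memory changed between the
// load and the update (another thread raced on it).
inline bool transitionState(std::atomic<uint64_t> &Header, uint64_t From,
                            uint64_t To) {
  // Single fetch into a local copy; all checks are done on that copy.
  uint64_t Old = Header.load(std::memory_order_relaxed);
  if (((Old & StateMask) >> StateShift) != From)
    return false;
  const uint64_t New = (Old & ~StateMask) | (To << StateShift);
  return Header.compare_exchange_strong(Old, New, std::memory_order_relaxed);
}
```

With this pattern, two threads freeing the same chunk cannot both succeed: the
second one either observes the wrong state or loses the compare-exchange.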
 102
 103 Randomness
 104 ----------
 105 Randomness is a critical factor to the additional security provided by the
 106 allocator. The allocator trusts the memory mapping primitives of the OS to
 107 provide pages at (mostly) non-predictable locations in memory, as well as the
 108 binaries to be compiled with ASLR. In the event one of those assumptions is
 109 incorrect, the security will be greatly reduced. Scudo further randomizes how
 110 blocks are allocated in the Primary, and can randomize how caches are assigned to
 111 threads.
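
One way to randomize the order in which blocks of a region are handed out is a
Fisher-Yates shuffle of the block indices, sketched below. The generator and
the function name are stand-ins chosen for this example; the document does not
specify which algorithm or source of randomness Scudo uses.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Small xorshift generator, used here only as a stand-in source of randomness.
struct XorShift32 {
  uint32_t State;
  uint32_t next() {
    State ^= State << 13;
    State ^= State >> 17;
    State ^= State << 5;
    return State;
  }
};

// Returns the indices 0..NumBlocks-1 in a shuffled order (Fisher-Yates).
// The modulo introduces a slight bias, which is acceptable for a sketch.
inline std::vector<uint32_t> shuffledBlockOrder(uint32_t NumBlocks,
                                                uint32_t Seed) {
  std::vector<uint32_t> Order(NumBlocks);
  for (uint32_t I = 0; I < NumBlocks; ++I)
    Order[I] = I;
  XorShift32 Rng{Seed != 0 ? Seed : 1u}; // xorshift must not start at zero
  for (uint32_t I = NumBlocks; I > 1; --I)
    std::swap(Order[I - 1], Order[Rng.next() % I]);
  return Order;
}
```

Handing out blocks in such an order makes the relative position of two
consecutive allocations harder to predict.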
 112
 113 Memory reclaiming
 114 -----------------
 115 Primary and Secondary allocators have different behaviors with regard to
 116 reclaiming. While Secondary mapped allocations can be unmapped on deallocation,
 117 it isn't the case for the Primary, which could lead to a steady growth of the
 118 RSS of a process. To counteract this, if the underlying OS allows it, pages
 119 that are covered by contiguous free memory blocks in the Primary can be
 120 released (this generally means they won't count towards the RSS of a process and
 121 will be zero-filled on subsequent accesses). This is done in the deallocation path,
 122 and several options exist to tune this behavior.
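
The notion of "pages covered by contiguous free blocks" can be made concrete
with a small sketch. It assumes a region made of equally sized blocks packed
back to back from offset 0, and a non-zero ``BlockSize`` and ``PageSize``; the
function name is hypothetical and the actual release logic in Scudo is more
involved.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the pages that are entirely covered by free blocks
// and could therefore be released to the OS.
inline std::vector<std::size_t>
releasablePages(const std::vector<bool> &BlockIsFree, std::size_t BlockSize,
                std::size_t PageSize) {
  const std::size_t RegionSize = BlockIsFree.size() * BlockSize;
  std::vector<std::size_t> Pages;
  for (std::size_t Page = 0; (Page + 1) * PageSize <= RegionSize; ++Page) {
    // First and last block overlapping this page.
    const std::size_t First = (Page * PageSize) / BlockSize;
    const std::size_t Last = ((Page + 1) * PageSize - 1) / BlockSize;
    bool AllFree = true;
    for (std::size_t B = First; B <= Last && AllFree; ++B)
      AllFree = BlockIsFree[B];
    if (AllFree)
      Pages.push_back(Page);
  }
  return Pages;
}
```

Note that a single in-use block keeps every page it overlaps resident, which
is why fragmentation in the Primary limits how much memory can be reclaimed.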
 123
 124 Usage
 125 =====
 126
 127 Platform
 128 --------
 129 If using Fuchsia or Android 11 or later, your memory allocations
 130 are already serviced by Scudo (note that Android Svelte configurations still use
 131 jemalloc).
 132
 133 Library
 134 -------
 135 The allocator static library can be built from the LLVM tree thanks to the
 136 ``scudo_standalone`` CMake rule. The associated tests can be exercised thanks to
 137 the ``check-scudo_standalone`` CMake rule.
 138
 139 Linking the static library to your project can require the use of the
 140 ``whole-archive`` linker flag (or equivalent), depending on your linker.
 141 Additional flags might also be necessary.
 142
 143 Your linked binary should now make use of the Scudo allocation and deallocation
 144 functions.
 145
 146 You may also build Scudo like this:
 147
 148 .. code:: console
 149
 150   cd $LLVM/compiler-rt/lib
 151   clang++ -fPIC -std=c++17 -msse4.2 -O2 -pthread -shared \
 152     -I scudo/standalone/include \
 153     scudo/standalone/*.cpp \
 154     -o $HOME/libscudo.so
 155
 156 and then use it with existing binaries as follows:
 157
 158 .. code:: console
 159
 160   LD_PRELOAD=$HOME/libscudo.so ./a.out
 161
 162 Clang
 163 -----
 164 With a recent version of Clang (post rL317337), the "old" version of the
 165 allocator can be linked with a binary at compilation using the
 166 ``-fsanitize=scudo`` command-line argument, if the target platform is supported.
 167 Currently, the only other sanitizer Scudo is compatible with is UBSan
 168 (e.g. ``-fsanitize=scudo,undefined``). Compiling with Scudo will also enforce
 169 PIE for the output binary.
 170
 171 We will transition this to the standalone Scudo version in the future.
 172
 173 Options
 174 -------
 175 Several aspects of the allocator can be configured on a per process basis
 176 through the following ways:
 177
 178 - at compile time, by defining ``SCUDO_DEFAULT_OPTIONS`` to the options string
 179   you want set by default;
 180
 181 - by defining a ``__scudo_default_options`` function in one's program that
 182   returns the options string to be parsed. Said function must have the following
 183   prototype: ``extern "C" const char* __scudo_default_options(void)``, with a
 184   default visibility. This will override the compile time define;
 185
 186 - through the environment variable ``SCUDO_OPTIONS``, containing the options string
 187   to be parsed. Options defined this way will override any definition made
 188   through ``__scudo_default_options``;
 189
 190 - via the standard ``mallopt`` `API <https://man7.org/linux/man-pages/man3/mallopt.3.html>`_,
 191   using parameters that are Scudo specific.
 192
 193 The options string follows a syntax similar to that of ASan, where
 194 distinct options can be assigned in the same string, separated by colons.
 195
 196 For example, using the environment variable:
 197
 198 .. code:: console
 199
 200   SCUDO_OPTIONS="delete_size_mismatch=false:release_to_os_interval_ms=-1" ./a.out
 201
 202 Or using the function:
 203
 204 .. code:: cpp
 205
 206   extern "C" const char *__scudo_default_options() {
 207     return "delete_size_mismatch=false:release_to_os_interval_ms=-1";
 208   }
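
To make the string syntax concrete, the following sketch splits an options
string into name/value pairs. ``parseOptions`` is a hypothetical helper written
for this example; it only handles the colon-separated ``name=value`` form shown
above and is not the parser used by Scudo.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: splits "name=value:name=value" into a map. Entries
// without a name or without '=' are skipped.
inline std::map<std::string, std::string>
parseOptions(const std::string &Options) {
  std::map<std::string, std::string> Result;
  std::size_t Pos = 0;
  while (Pos <= Options.size()) {
    std::size_t End = Options.find(':', Pos);
    if (End == std::string::npos)
      End = Options.size();
    const std::string Entry = Options.substr(Pos, End - Pos);
    const std::size_t Eq = Entry.find('=');
    if (Eq != std::string::npos && Eq > 0)
      Result[Entry.substr(0, Eq)] = Entry.substr(Eq + 1);
    Pos = End + 1;
  }
  return Result;
}
```

Applied to the string used in the examples above, this yields two entries:
``delete_size_mismatch`` mapped to ``false`` and ``release_to_os_interval_ms``
mapped to ``-1``.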
 209
 210
 211 The following "string" options are available:
 212
 213 +---------------------------------+----------------+-------------------------------------------------+
 214 | Option                          | Default        | Description                                     |
 215 +---------------------------------+----------------+-------------------------------------------------+
 216 | quarantine_size_kb              | 0              | The size (in Kb) of quarantine used to delay    |
 217 |                                 |                | the actual deallocation of chunks. Lower value  |
 218 |                                 |                | may reduce memory usage but decrease the        |
 219 |                                 |                | effectiveness of the mitigation; a negative     |
 220 |                                 |                | value will fall back to the defaults. Setting   |
 221 |                                 |                | *both* this and thread_local_quarantine_size_kb |
 222 |                                 |                | to zero will disable the quarantine entirely.   |
 223 +---------------------------------+----------------+-------------------------------------------------+
 224 | quarantine_max_chunk_size       | 0              | Size (in bytes) up to which chunks can be       |
 225 |                                 |                | quarantined.                                    |
 226 +---------------------------------+----------------+-------------------------------------------------+
 227 | thread_local_quarantine_size_kb | 0              | The size (in Kb) of per-thread cache used to    |
 228 |                                 |                | offload the global quarantine. Lower value may  |
 229 |                                 |                | reduce memory usage but might increase          |
 230 |                                 |                | contention on the global quarantine. Setting    |
 231 |                                 |                | *both* this and quarantine_size_kb to zero will |
 232 |                                 |                | disable the quarantine entirely.                |
 233 +---------------------------------+----------------+-------------------------------------------------+
 234 | dealloc_type_mismatch           | false          | Whether or not we report errors on              |
 235 |                                 |                | malloc/delete, new/free, new/delete[], etc.     |
 236 +---------------------------------+----------------+-------------------------------------------------+
 237 | delete_size_mismatch            | true           | Whether or not we report errors on mismatch     |
 238 |                                 |                | between sizes of new and delete.                |
 239 +---------------------------------+----------------+-------------------------------------------------+
 240 | zero_contents                   | false          | Whether or not we zero chunk contents on        |
 241 |                                 |                | allocation.                                     |
 242 +---------------------------------+----------------+-------------------------------------------------+
 243 | pattern_fill_contents           | false          | Whether or not we fill chunk contents with a    |
 244 |                                 |                | byte pattern on allocation.                     |
 245 +---------------------------------+----------------+-------------------------------------------------+
 246 | may_return_null                 | true           | Whether or not a non-fatal failure can return a |
 247 |                                 |                | NULL pointer (as opposed to terminating).       |
 248 +---------------------------------+----------------+-------------------------------------------------+
 249 | release_to_os_interval_ms       | 5000           | The minimum interval (in ms) at which a release |
 250 |                                 |                | can be attempted (a negative value disables     |
 251 |                                 |                | reclaiming).                                    |
 252 +---------------------------------+----------------+-------------------------------------------------+
 253 | allocation_ring_buffer_size     | 32768          | If stack trace collection is requested, how     |
 254 |                                 |                | many previous allocations to keep in the        |
 255 |                                 |                | allocation ring buffer.                         |
 256 |                                 |                |                                                 |
 257 |                                 |                | This buffer is used to provide allocation and   |
 258 |                                 |                | deallocation stack traces for MTE fault         |
 259 |                                 |                | reports. The larger the buffer, the more        |
 260 |                                 |                | unrelated allocations can happen between        |
 261 |                                 |                | (de)allocation and the fault.                   |
 262 |                                 |                | If your sync-mode MTE faults do not have        |
 263 |                                 |                | (de)allocation stack traces, try increasing the |
 264 |                                 |                | buffer size.                                    |
 265 |                                 |                |                                                 |
 266 |                                 |                | Stack trace collection can be requested using   |
 267 |                                 |                | the scudo_malloc_set_track_allocation_stacks    |
 268 |                                 |                | function.                                       |
 269 +---------------------------------+----------------+-------------------------------------------------+
 270
 271 Additional flags can be specified, for example if Scudo is compiled with
 272 `GWP-ASan <https://llvm.org/docs/GwpAsan.html>`_ support.
 273
 274 The following "mallopt" options are available (options are defined in
 275 ``include/scudo/interface.h``):
 276
 277 +---------------------------+-------------------------------------------------------+
 278 | Option                    | Description                                           |
 279 +---------------------------+-------------------------------------------------------+
 280 | M_DECAY_TIME              | Sets the release interval option to the specified     |
 281 |                           | value (Android only allows 0 or 1 to respectively set |
 282 |                           | the interval to the minimum and maximum value as      |
 283 |                           | specified at compile time).                           |
 284 +---------------------------+-------------------------------------------------------+
 285 | M_PURGE                   | Forces immediate memory reclaiming but does not       |
 286 |                           | reclaim everything. For smaller size classes, there   |
 287 |                           | is still some memory that is not reclaimed due to the |
 288 |                           | extra time it takes and the small amount of memory    |
 289 |                           | that can be reclaimed.                                |
 290 |                           | The value is ignored.                                 |
 291 +---------------------------+-------------------------------------------------------+
 292 | M_PURGE_ALL               | Same as M_PURGE but will force release all possible   |
 293 |                           | memory regardless of how long it takes.               |
 294 |                           | The value is ignored.                                 |
 295 +---------------------------+-------------------------------------------------------+
 296 | M_MEMTAG_TUNING           | Tunes the allocator's choice of memory tags to make   |
 297 |                           | it more likely that a certain class of memory errors  |
 298 |                           | will be detected. The value argument should be one of |
 299 |                           | the enumerators of ``scudo_memtag_tuning``.           |
 300 +---------------------------+-------------------------------------------------------+
 301 | M_THREAD_DISABLE_MEM_INIT | Tunes the per-thread memory initialization, 0 being   |
 302 |                           | the normal behavior, 1 disabling the automatic heap   |
 303 |                           | initialization.                                       |
 304 +---------------------------+-------------------------------------------------------+
 305 | M_CACHE_COUNT_MAX         | Sets the maximum number of entries that can be cached |
 306 |                           | in the Secondary cache.                               |
 307 +---------------------------+-------------------------------------------------------+
 308 | M_CACHE_SIZE_MAX          | Sets the maximum size of entries that can be cached   |
 309 |                           | in the Secondary cache.                               |
 310 +---------------------------+-------------------------------------------------------+
 311 | M_TSDS_COUNT_MAX          | Increases the maximum number of TSDs that can be used |
 312 |                           | up to the limit specified at compile time.            |
 313 +---------------------------+-------------------------------------------------------+
 314
 315 Error Types
 316 ===========
 317
 318 The allocator will output an error message, and potentially terminate the
 319 process, when an unexpected behavior is detected. The output usually starts with
 320 ``"Scudo ERROR:"`` followed by a short summary of the problem that occurred as
 321 well as the pointer(s) involved. Once again, Scudo is meant to be a mitigation,
 322 and might not be the most useful of tools to help you root-cause the issue;
 323 please consider `ASan <https://github.com/google/sanitizers/wiki/AddressSanitizer>`_
 324 for this purpose.
 325
 326 Here is a list of the current error messages and their potential cause:
 327
 328 - ``"corrupted chunk header"``: the checksum verification of the chunk header
 329   has failed. This is likely due to one of two things: the header was
 330   overwritten (partially or totally), or the pointer passed to the function is
 331   not a chunk at all;
 332
 333 - ``"race on chunk header"``: two different threads are attempting to manipulate
 334   the same header at the same time. This is usually symptomatic of a
 335   race-condition or general lack of locking when performing operations on that
 336   chunk;
 337
 338 - ``"invalid chunk state"``: the chunk is not in the expected state for a given
 339   operation, e.g. it is not allocated when trying to free it, or it's not
 340   quarantined when trying to recycle it, etc. A double-free is the typical
 341   reason this error would occur;
 342
 343 - ``"misaligned pointer"``: we strongly enforce basic alignment requirements, 8
 344   bytes on 32-bit platforms, 16 bytes on 64-bit platforms. If a pointer passed
 345   to our functions does not fit those, something is definitely wrong;
 346
 347 - ``"allocation type mismatch"``: when the optional deallocation type mismatch
 348   check is enabled, a deallocation function called on a chunk has to match the
 349   type of function that was called to allocate it. The security implications of
 350   such a mismatch are not necessarily obvious and are situational at best;
 351
 352 - ``"invalid sized delete"``: when the C++14 sized delete operator is used, and
 353   the optional check enabled, this indicates that the size passed when
 354   deallocating a chunk is not congruent with the one requested when allocating
 355   it. This is likely to be a `compiler issue <https://software.intel.com/en-us/forums/intel-c-compiler/topic/783942>`_,
 356   as was the case with Intel C++ Compiler, or some type confusion on the object
 357   being deallocated;
 358
 359 - ``"RSS limit exhausted"``: the maximum RSS optionally specified has been
 360   exceeded.
 361
 362 Several other error messages relate to parameter checking on the libc allocation
 363 APIs and are fairly straightforward to understand.
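
The alignment requirement behind the ``"misaligned pointer"`` error can be
sketched as a simple mask test. ``MinAlignment`` and ``isChunkAligned`` are
names chosen for this example; the values are the ones quoted in the list
above (8 bytes on 32-bit platforms, 16 bytes on 64-bit platforms).

```cpp
#include <cstddef>
#include <cstdint>

// Minimum alignment: 8 bytes with 4-byte pointers, 16 bytes otherwise.
constexpr std::size_t MinAlignment = sizeof(void *) == 4 ? 8 : 16;

// Returns true if Ptr satisfies the minimum alignment expected of a chunk.
inline bool isChunkAligned(const void *Ptr) {
  return (reinterpret_cast<uintptr_t>(Ptr) & (MinAlignment - 1)) == 0;
}
```

A pointer that fails this test cannot have been returned by the allocator, so
it can be rejected before its header is even read.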
 364