11 The constexpr interpreter aims to replace the existing tree evaluator in
12 clang, improving performance on constructs which are executed inefficiently
13 by the evaluator. The interpreter is activated using the following flags:
15 * ``-fexperimental-new-constant-interpreter`` enables the interpreter,
16 emitting an error if an unsupported feature is encountered
21 Bytecode compilation is handled in ``Compiler.h`` for statements
22 and for expressions. The compiler has two different
23 backends: one to generate bytecode for functions (``ByteCodeEmitter``) and
24 one to directly evaluate expressions as they are compiled, without
25 generating bytecode (``EvalEmitter``). All functions are compiled to
26 bytecode, while toplevel expressions used in constant contexts are directly
27 evaluated since the bytecode would never be reused. This mechanism aims to
28 pave the way towards replacing the evaluator, improving its performance on
29 functions and loops, while being just as fast on single-use toplevel
32 The interpreter relies on stack-based, strongly-typed opcodes. The glue
33 logic between the code generator, along with the enumeration and
34 description of opcodes, can be found in ``Opcodes.td``. The opcodes are
35 implemented as generic template methods in ``Interp.h`` and instantiated
36 with the relevant primitive types by the interpreter loop or by the
42 * ``PT_{U|S}int{8|16|32|64}``
44 Signed or unsigned integers of a specific bit width, implemented using
45 the ```Integral``` type.
49 Signed or unsigned integers of an arbitrary, but fixed width used to
50 implement integral types which are required by the target, but are not
51 supported by the host. Under the hood, they rely on ``APInt``. The
52 ``Integral`` specialisation for these types is required by opcodes to
53 share an implementation with fixed integrals.
57 Representation for boolean types, essentially a 1-bit unsigned
62 Arbitrary, but fixed precision floating point numbers. Could be
63 specialised in the future similarly to integers in order to improve
64 floating point performance.
68 Pointer type, defined in ``"Pointer.h"``. The most common type of
69 pointer is a "BlockPointer", which points to an ``interp::Block``.
70 But other pointer types exist, such as typeid pointers or
75 Function pointer type, can also be a null function pointer. Defined
76 in ``"FunctionPointer.h"``.
80 Member pointer type, can also be a null member pointer. Defined
81 in ``"MemberPointer.h"``
86 The interpreter distinguishes two kinds of composite types: arrays and
87 records (structs and classes). Unions are represented as records, except
88 at most a single field can be marked as active. The contents of inactive
89 fields are kept until they are reactivated and overwritten.
90 Complex numbers (``_Complex``) and vectors
91 (``__attribute((vector_size(16)))``) are treated as arrays.
97 Bytecode is executed using a stack-based interpreter. The execution
98 context consists of an ``InterpStack``, along with a chain of
99 ``InterpFrame`` objects storing the call frames. Frames are built by
100 call instructions and destroyed by return instructions. They perform
101 one allocation to reserve space for all locals in a single block.
102 These objects store all the required information to emit stack traces
103 whenever evaluation fails.
108 Memory management in the interpreter relies on 3 data structures: ``Block``
109 objects which store the data and associated inline metadata, ``Pointer``
110 objects which refer to or into blocks, and ``Descriptor`` structures which
111 describe blocks and subobjects nested inside blocks.
116 Blocks contain data interleaved with metadata. They are allocated either
117 statically in the code generator (globals, static members, dummy parameter
118 values etc.) or dynamically in the interpreter, when creating the frame
119 containing the local variables of a function. Blocks are associated with a
120 descriptor that characterises the entire allocation, along with a few
121 additional attributes:
123 * ``IsStatic`` indicates whether the block has static duration in the
124 interpreter, i.e. it is not a local in a frame.
126 * ``DeclID`` identifies each global declaration (it is set to an invalid
127 and irrelevant value for locals) in order to prevent illegal writes and
128 reads involving globals and temporaries with static storage duration.
130 Static blocks are never deallocated, but local ones might be deallocated
131 even when there are live pointers to them. Pointers are only valid as
132 long as the blocks they point to are valid, so a block with pointers to
133 it whose lifetime ends is kept alive until all pointers to it go out of
134 scope. Since the frame is destroyed on function exit, such blocks are
135 turned into a ``DeadBlock`` and copied to storage managed by the
136 interpreter itself, not the frame. Reads and writes to these blocks are
137 illegal and cause an appropriate diagnostic to be emitted. When the last
138 pointer goes out of scope, dead blocks are also deallocated.
140 The lifetime of blocks is managed through 3 methods stored in the
141 descriptor of the block:
143 * **CtorFn**: initializes the metadata which is store in the block,
144 alongside actual data. Invokes the default constructors of objects
145 which are not trivial (``Pointer``, ``RealFP``, etc.)
147 * **DtorFn**: invokes the destructors of non-trivial objects.
149 * **MoveFn**: moves a block to dead storage.
151 Non-static blocks track all the pointers into them through an intrusive
152 doubly-linked list, required to adjust and invalidate all pointers when
153 transforming a block into a dead block. If the lifetime of an object ends,
154 all pointers to it are invalidated, emitting the appropriate diagnostics when
157 The interpreter distinguishes 3 different kinds of blocks:
161 A block containing a single primitive with no additional metadata.
163 * **Arrays of primitives**
165 An array of primitives contains a pointer to an ``InitMap`` storage as its
166 first field: the initialisation map is a bit map indicating all elements of
167 the array which were initialised. If the pointer is null, no elements were
168 initialised, while a value of ``(InitMap*)-1`` indicates that the object was
169 fully initialised. When all fields are initialised, the map is deallocated
170 and replaced with that token.
172 Array elements are stored sequentially, without padding, after the pointer
175 * **Arrays of composites and records**
177 Each element in an array of composites is preceded by an ``InlineDescriptor``
178 which stores the attributes specific to the field and not the whole
179 allocation site. Descriptors and elements are stored sequentially in the
181 Records are laid out identically to arrays of composites: each field and base
182 class is preceded by an inline descriptor. The ``InlineDescriptor``
183 has the following fields:
185 * **Offset**: byte offset into the array or record, used to step back to the
186 parent array or record.
187 * **IsConst**: flag indicating if the field is const-qualified.
188 * **IsInitialized**: flag indicating whether the field or element was
189 initialized. For non-primitive fields, this is only relevant to determine
190 the dynamic type of objects during construction.
191 * **IsBase**: flag indicating whether the record is a base class. In that
192 case, the offset can be used to identify the derived class.
193 * **IsActive**: indicates if the field is the active field of a union.
194 * **IsMutable**: indicates if the field is marked as mutable.
196 Inline descriptors are filled in by the `CtorFn` of blocks, which leaves storage
197 in an uninitialised, but valid state.
202 Descriptors are generated at bytecode compilation time and contain information
203 required to determine if a particular memory access is allowed in constexpr.
204 They also carry all the information required to emit a diagnostic involving
205 a memory access, such as the declaration which originates the block.
206 Currently there is a single kind of descriptor encoding information for all
212 Pointers, implemented in ``Pointer.h`` are represented as a tagged union.
214 * **BlockPointer**: used to reference memory allocated and managed by the
215 interpreter, being the only pointer kind which allows dereferencing in the
217 * **TypeIDPointer**: tracks information for the opaque type returned by
219 * **IntegralPointer**: a pointer formed from an integer,
222 Besides the previously mentioned union, a number of other pointer-like types
225 * **FunctionPointer** tracks functions.
226 * **MemberPointer** tracks C++ object members
231 Block pointers track a ``Pointee``, the block to which they point, along
232 with a ``Base`` and an ``Offset``. The base identifies the innermost field,
233 while the offset points to an array element relative to the base (including
234 one-past-end pointers). The offset identifies the array element or field
235 which is referenced, while the base points to the outer object or array which
236 contains the field. These two fields allow all pointers to be uniquely
237 identified, disambiguated and characterised.
239 As an example, consider the following structure:
256 On the target, ``&a`` and ``&a.b.x`` are equal. So are ``&a.c[0]`` and
257 ``&a.c[0].a``. In the interpreter, all these pointers must be
258 distinguished since the are all allowed to address distinct range of
261 In the interpreter, the object would require 240 bytes of storage and
262 would have its field interleaved with metadata. The pointers which can
263 be derived to the object are illustrated in the following diagram:
267 0 16 32 40 56 64 80 96 112 120 136 144 160 176 184 200 208 224 240
268 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
269 + B | D | D | x | D | y | D | D | D | a | D | b | D | D | a | D | b | D | z |
270 +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
271 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
272 | | | | | | | &a.c[0].b | | &a.c[1].b |
273 a |&a.b.x &a.y &a.c |&a.c[0].a |&a.c[1].a |
274 &a.b &a.c[0] &a.c[1] &a.z
276 The ``Base`` offset of all pointers points to the start of a field or
277 an array and is preceded by an inline descriptor (unless ``Base`` is
278 zero, pointing to the root). All the relevant attributes can be read
279 from either the inline descriptor or the descriptor of the block.
282 Array elements are identified by the ``Offset`` field of pointers,
283 pointing to past the inline descriptors for composites and before
284 the actual data in the case of primitive arrays. The ``Offset``
285 points to the offset where primitives can be read from. As an example,
286 ``a.c + 1`` would have the same base as ``a.c`` since it is an element
287 of ``a.c``, but its offset would point to ``&a.c[1]``. The
288 array-to-pointer decay operation adjusts a pointer to an array (where
289 the offset is equal to the base) to a pointer to the first element.
294 ``TypeInfoPointer`` tracks two types: the type assigned to
295 ``std::type_info`` and the type which was passed to ``typeinfo``.
296 It is part of the taged union in ``Pointer``.