The Opaque Pointer Type
=======================

Traditionally, LLVM IR pointer types have contained a pointee type. For example,
``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
due to a lack of pointee type semantics and various issues with having pointee
types, there is a desire to remove pointee types from pointers.

The opaque pointer type project aims to replace all pointer types containing
pointee types in LLVM with an opaque pointer type. The new pointer type is
tentatively represented textually as ``ptr``.

Address spaces are still used to distinguish between different kinds of pointers
where the distinction is relevant for lowering (e.g. data vs function pointers
have different sizes on some architectures). Opaque pointers are not changing
anything related to address spaces and lowering. For more information, see
`DataLayout <LangRef.html#langref-datalayout>`_.
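
As a small illustration of the address space point, an opaque pointer still
carries its address space, and moving between address spaces still takes an
explicit cast. The value names ``%l`` and ``%g`` and the use of address space
``1`` are made up for this sketch.

.. code-block:: llvm

  ; Address space 0 is the default and is not spelled out.
  %a = load i32, ptr %l

  ; A pointer in address space 1 is still a distinct pointer type.
  %b = load i32, ptr addrspace(1) %g

  ; Changing the address space requires an explicit addrspacecast.
  %c = addrspacecast ptr addrspace(1) %g to ptr
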

Issues with explicit pointee types
==================================

LLVM IR pointers can be cast back and forth between pointers with different
pointee types. The pointee type does not necessarily represent the actual
underlying type in memory. In other words, the pointee type contains no
semantics.

Many operations do not care about the underlying type. These operations,
typically intrinsics, usually end up taking an ``i8 *``. This causes many
redundant no-op bitcasts in the IR to and from a pointer with a different
pointee type. The extra bitcasts take up space and require extra work to look
through in optimizations. More bitcasts also increase the chance of incorrect
bitcasts, especially with regard to address spaces.
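
The following sketch shows the kind of no-op bitcast described above, using the
``llvm.memset`` intrinsic as the type-agnostic operation. The value name ``%p``
and the chosen size are assumptions for illustration; the first form uses the
legacy typed-pointer syntax.

.. code-block:: llvm

  ; Typed pointers: an i32* must be bitcast to i8* before the call.
  %p.i8 = bitcast i32* %p to i8*
  call void @llvm.memset.p0i8.i64(i8* %p.i8, i8 0, i64 4, i1 false)

  ; Opaque pointers: %p is already a ptr and is passed directly.
  call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 4, i1 false)
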

Some instructions still need to know what type to treat the memory pointed to by
the pointer as. For example, a load needs to know how many bytes to load from
memory. In these cases, the instructions themselves contain a type argument. For
example, the load instruction in older versions of LLVM inferred the loaded type
from its pointer operand's pointee type, while the opaque pointer form names the
loaded type explicitly.
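
To make the load case concrete, here are the two spellings side by side. They
are alternatives, not a sequence, and ``%p`` and ``%v`` are placeholder names.

.. code-block:: llvm

  ; Legacy form: the loaded type i64 is implied by the pointee type of %p.
  %v = load i64* %p

  ; Opaque pointer form: the loaded type is an explicit operand.
  %v = load i64, ptr %p
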

A nice analogous transition that happened earlier in LLVM is integer signedness.
Initially, LLVM IR distinguished between unsigned and signed integer types.
Today there is no such distinction; instead, the integer operations themselves
specify how to treat the integer. The transition from manifesting signedness in
types to instructions happened early on in LLVM's life to the betterment of
LLVM IR.
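
A brief sketch of what signedness on the operation looks like in practice; the
operands ``%a`` and ``%b`` are hypothetical ``i32`` values, and the same bits
are interpreted differently depending on the instruction used.

.. code-block:: llvm

  %q.s  = sdiv i32 %a, %b      ; divide, treating both operands as signed
  %q.u  = udiv i32 %a, %b      ; divide, treating both operands as unsigned
  %lt.s = icmp slt i32 %a, %b  ; signed less-than
  %lt.u = icmp ult i32 %a, %b  ; unsigned less-than
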

I Still Need Pointee Types!
===========================

The frontend should already know what type each operation operates on based on
the input source code. However, some frontends like Clang may end up relying on
LLVM pointer pointee types to keep track of pointee types. The frontend needs to
keep track of frontend pointee types on its own.

For optimizations around frontend types, pointee types are not useful due to
their lack of semantics. Rather, since LLVM IR works on untyped memory, for a
frontend to tell LLVM about frontend types for the purposes of alias analysis,
extra metadata is added to the IR. For more information, see `TBAA
<LangRef.html#tbaa-metadata>`_.
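
As a sketch of what such metadata can look like, the fragment below follows the
style Clang emits for an access to a C ``int``. The metadata numbering, the
value names, and the strings naming the types are chosen by the frontend and
are only illustrative here.

.. code-block:: llvm

  ; The load itself is untyped with respect to aliasing; the tag adds that.
  %v = load i32, ptr %p, align 4, !tbaa !0

  !0 = !{!1, !1, i64 0}                  ; access tag: base type, access type, offset
  !1 = !{!"int", !2, i64 0}              ; scalar type node for the frontend's int
  !2 = !{!"omnipotent char", !3, i64 0}  ; char may alias any type
  !3 = !{!"Simple C/C++ TBAA"}           ; root of the type tree
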

Some specific operations still need to know what type a pointer points to. For
the most part, this is codegen and ABI specific. For example, `byval
<LangRef.html#parameter-attributes>`_ arguments are pointers, but backends need
to know the underlying type of the argument to properly lower it. In cases like
these, the attributes contain a type argument. For example,

.. code-block:: llvm

  call void @f(ptr byval(i32) %p)

signifies that ``%p`` as an argument should be lowered as an ``i32`` passed
by value.

If you have use cases that this sort of fix doesn't cover, please email

LLVM currently has many places that depend on pointee types. Each dependency on
pointee types needs to be resolved in some way or another. This essentially
translates to figuring out how to remove all calls to
``PointerType::getElementType()`` and ``Type::getPointerElementType()``.

Making everything use opaque pointers in one huge commit is infeasible. This
needs to be done incrementally. The following steps need to be done, in no
particular order:

* Introduce the opaque pointer type

* Remove remaining in-tree users of pointee types

  * There are many miscellaneous uses that should be cleaned up individually

  * Some of the larger use cases are mentioned below

* Various ABI attributes and instructions that rely on pointee types need to be
  modified to specify the type separately

  * This has already happened for all instructions like loads, stores, GEPs,
    and various attributes like ``byval``

  * More cases may be found as work continues

* Remove calls to and deprecate ``IRBuilder`` methods that rely on pointee types

  * For example, some of the ``IRBuilder::CreateGEP()`` methods use the pointer
    operand's pointee type to determine the GEP operand type

  * Some methods are already deprecated with ``LLVM_ATTRIBUTE_DEPRECATED``, such
    as some overloads of ``IRBuilder::CreateLoad()``

* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
  type (not to be turned on until ready)

  * To support legacy bitcode, such as legacy stores/loads, we need to track
    pointee types for all values since legacy instructions may infer the types
    from a pointer operand's pointee type

* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
  pointee types

  * This is mostly Clang, see ``clang::CodeGen::Address::getElementType()``

* Add option to internally treat all pointer types as opaque pointers and see
  what breaks, starting with LLVM tests, then run Clang over large codebases

  * We don't want to start mass-updating tests until we're fairly confident
    that opaque pointers won't cause major issues

* Replace legacy pointer types in LLVM tests with opaque pointer types
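
As a rough sketch of what the last step looks like for a single test function,
the typed-pointer version below needs a bitcast that disappears in the opaque
pointer version. The function ``@set`` and its value names are invented for
illustration and do not come from an actual LLVM test.

.. code-block:: llvm

  ; Before: legacy typed pointers.
  define void @set(i8** %slot, i32* %p) {
    %c = bitcast i32* %p to i8*
    store i8* %c, i8** %slot
    ret void
  }

  ; After: opaque pointers, no bitcast needed.
  define void @set(ptr %slot, ptr %p) {
    store ptr %p, ptr %slot
    ret void
  }
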

Frontend Migration Steps
========================

If you have your own frontend, there are a couple of things to do after opaque
pointer types fully work.

* Don't rely on LLVM pointee types to keep track of frontend pointee types

* Migrate away from LLVM IR instruction builders that rely on pointee types

  * For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
    use one where the source element type is explicitly passed in, not inferred
    from the pointer operand pointee type
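
For reference, this is the kind of IR that results when the source element type
is passed explicitly: it appears as the first operand of ``getelementptr``
instead of being read off the pointer. The names ``%p`` and ``%s`` and the
struct layout are assumptions made for this sketch.

.. code-block:: llvm

  ; Address of the element one i32 past %p.
  %q = getelementptr i32, ptr %p, i64 1

  ; Address of the second field of a { i32, i64 } located at %s.
  %f = getelementptr { i32, i64 }, ptr %s, i32 0, i32 1
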