11 ``MemorySSA`` is an analysis that allows us to cheaply reason about the
12 interactions between various memory operations. Its goal is to replace
13 ``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
14 unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
15 result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
16 have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
19 At a high level, one of the goals of ``MemorySSA`` is to provide an SSA based
20 form for memory, complete with def-use and use-def chains, which
21 enables users to quickly find may-def and may-uses of memory operations.
22 It can also be thought of as a way to cheaply give versions to the complete
23 state of heap memory, and associate memory operations with those versions.
25 This document goes over how ``MemorySSA`` is structured, and some basic
26 intuition on how ``MemorySSA`` works.
28 A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
29 found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Though, it's
30 relatively out-of-date; the paper references multiple heap partitions, but GCC
31 eventually swapped to just using one, like we now have in LLVM. Like
32 GCC's, LLVM's MemorySSA is intraprocedural.
38 MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
39 structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are
40 ``MemorySSA``'s parallel to LLVM ``Instruction``\ s.
42 Each ``MemoryAccess`` can be one of three types:
48 ``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any
49 point we have two (or more) ``MemoryDef``\ s that could flow into a
50 ``BasicBlock``, the block's top ``MemoryAccess`` will be a
51 ``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any
52 concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s
53 inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s
56 Note also that in SSA, Phi nodes merge must-reach definitions (that is,
57 definitions that *must* be new versions of variables). In MemorySSA, PHI nodes
58 merge may-reach definitions (that is, until disambiguated, the versions that
59 reach a phi node may or may not clobber a given variable).
61 ``MemoryUse``\ s are operations which use but don't modify memory. An example of
62 a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.
64 ``MemoryDef``\ s are operations which may either modify memory, or which
65 introduce some kind of ordering constraints. Examples of ``MemoryDef``\ s
66 include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher)
67 ordering, volatile operations, memory fences, etc.
69 Every function that exists has a special ``MemoryDef`` called ``liveOnEntry``.
70 It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being
71 run on, and implies that we've hit the top of the function. It's the only
72 ``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of
73 ``liveOnEntry`` implies that the memory being used is either undefined or
74 defined before the function begins.
76 An example of all of this overlaid on LLVM IR (obtained by running ``opt
77 -passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
78 viewing this example, it may be helpful to view it in terms of clobbers. The
79 operands of a given ``MemoryAccess`` are all (potential) clobbers of said
80 MemoryAccess, and the value produced by a ``MemoryAccess`` can act as a clobber
81 for other ``MemoryAccess``\ es. Another useful way of looking at it is in
82 terms of heap versions. In that view, operands of a given
83 ``MemoryAccess`` are the version of the heap before the operation, and
84 if the access produces a value, the value is the new version of the heap
94 ; 1 = MemoryDef(liveOnEntry)
99 ; 6 = MemoryPhi({%0,1},{if.end,4})
100 br i1 undef, label %if.then, label %if.else
113 ; 5 = MemoryPhi({if.then,2},{if.else,3})
115 %1 = load i8, i8* %p1
119 %2 = load i8, i8* %p3
123 The ``MemorySSA`` IR is shown in comments that precede the instructions they map
124 to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
125 is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
126 instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
127 particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
128 %p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM
129 Instruction, so the line directly below a ``MemoryPhi`` isn't special.
131 Going from the top down:
133 - ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering
134 ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This
135 ``MemoryPhi`` is referred to in the textual IR by the number ``6``.
136 - ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
137 and its reaching definition before it is ``6``, or the ``MemoryPhi`` after
138 ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_
139 sections below for why this ``MemoryDef`` isn't linked to a separate,
140 disambiguated ``MemoryPhi``.)
141 - ``3 = MemoryDef(6)`` notes that ``store i8 0, i8* %p2`` is a definition; its
142 reaching definition is also ``6``.
143 - ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
144 this block could either be ``2`` or ``3``.
145 - ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
146 it's clobbered by ``5``.
147 - ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; it's
148 reaching definition is ``5``.
149 - ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
150 and the last thing that could clobber this use is above ``while.cond`` (e.g.
151 the store to ``%p3``). In heap versioning parlance, it really only depends on
152 the heap version 1, and is unaffected by the new heap versions generated since
155 As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
156 meant to interact with LLVM IR.
161 ``MemorySSA`` is an analysis that can be built for any arbitrary function. When
162 it's built, it does a pass over the function's IR in order to build up its
163 mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things
164 like the dominance relation between ``MemoryAccess``\ es, and get the
165 ``MemoryAccess`` for any given ``Instruction`` .
167 When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
168 that you can use (see below).
174 A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
175 the walker, for short. The goal of the walker is to provide answers to clobber
176 queries beyond what's represented directly by ``MemoryAccess``\ es. For example,
185 ; 1 = MemoryDef(liveOnEntry)
191 The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
192 be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
193 for the clobber of ``MemoryAccess`` ``2``.
195 By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s
196 and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to
197 be using. Walkers were built to be flexible, though, so it's entirely reasonable
198 (and expected) to create more specialized walkers (e.g. one that specifically
199 queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc).
202 Locating clobbers yourself
203 ^^^^^^^^^^^^^^^^^^^^^^^^^^
205 If you choose to make your own walker, you can find the clobber for a
206 ``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
207 ``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple;
208 they ultimately form a linked list of every clobber that dominates the
209 ``MemoryAccess`` that you're trying to optimize. In other words, the
210 ``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
211 ``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.
214 Build-time use optimization
215 ---------------------------
217 ``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time.
218 Specifically, we optimize the operand of every ``MemoryUse`` to point to the
219 actual clobber of said ``MemoryUse``. This can be seen in the above example; the
220 second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
221 ``MemoryDef`` from the entry block. This is done to make walking,
222 value numbering, etc, faster and easier.
224 It is not possible to optimize ``MemoryDef`` in the same way, as we
225 restrict ``MemorySSA`` to one heap variable and, thus, one Phi node
229 Invalidation and updating
230 -------------------------
232 Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
233 the IR is updated. "Update", in this case, includes the addition, deletion, and
234 motion of ``Instructions``. The update API is being made on an as-needed basis.
235 If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``\ s update API.
241 ``MemorySSA`` only places ``MemoryPhi``\ s where they're actually
242 needed. That is, it is a pruned SSA form, like LLVM's SSA form. For
252 ; 1 = MemoryDef(liveOnEntry)
257 ; 3 = MemoryPhi({%0,1},{if.end,2})
258 br i1 undef, label %if.then, label %if.else
268 %1 = load i8, i8* %p1
272 %2 = load i8, i8* %p3
276 Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
277 for ``if.end`` would be pointless, so we don't place one. So, if you need to
278 place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll need to also create
279 a ``MemoryPhi`` for ``if.end``.
281 If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
282 everywhere. Because we have Walkers that are capable of optimizing above said
283 phis, doing so shouldn't prohibit optimizations.
289 ``MemorySSA`` is meant to reason about the relation between memory
290 operations, and enable quicker querying.
291 It isn't meant to be the single source of truth for all potential memory-related
292 optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
293 to reason about atomic or volatile operations, as in:
297 define i8 @foo(i8* %a) {
299 br i1 undef, label %if.then, label %if.end
302 ; 1 = MemoryDef(liveOnEntry)
303 %0 = load volatile i8, i8* %a
307 %av = phi i8 [0, %entry], [%0, %if.then]
311 Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
312 seem legal. Because it's a volatile load, though, it's not.
321 ``MemorySSA`` in LLVM deliberately trades off precision for speed.
322 Let us think about memory variables as if they were disjoint partitions of the
323 heap (that is, if you have one variable, as above, it represents the entire
324 heap, and if you have multiple variables, each one represents some
325 disjoint portion of the heap)
327 First, because alias analysis results conflict with each other, and
328 each result may be what an analysis wants (IE
329 TBAA may say no-alias, and something else may say must-alias), it is
330 not possible to partition the heap the way every optimization wants.
331 Second, some alias analysis results are not transitive (IE A noalias B,
332 and B noalias C, does not mean A noalias C), so it is not possible to
333 come up with a precise partitioning in all cases without variables to
334 represent every pair of possible aliases. Thus, partitioning
335 precisely may require introducing at least N^2 new virtual variables,
338 Each of these variables may be clobbered at multiple def sites.
340 To give an example, if you were to split up struct fields into
341 individual variables, all aliasing operations that may-def multiple struct
342 fields, will may-def more than one of them. This is pretty common (calls,
343 copies, field stores, etc).
345 Experience with SSA forms for memory in other compilers has shown that
346 it is simply not possible to do this precisely, and in fact, doing it
347 precisely is not worth it, because now all the optimizations have to
348 walk tons and tons of virtual variables and phi nodes.
350 So we partition. At the point at which you partition, again,
351 experience has shown us there is no point in partitioning to more than
352 one variable. It simply generates more IR, and optimizations still
353 have to query something to disambiguate further anyway.
355 As a result, LLVM partitions to one variable.
360 Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
361 useful guarantee - all loads are optimized to point at the thing that
362 actually clobbers them. This gives some nice properties. For example,
363 for a given store, you can find all loads actually clobbered by that
364 store by walking the immediate uses of the store.