11 ``MemorySSA`` is an analysis that allows us to cheaply reason about the
12 interactions between various memory operations. Its goal is to replace
13 ``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
14 unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
15 result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
16 have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
17 better results, too. One common use of ``MemorySSA`` is to quickly find out
18 that something definitely cannot happen (for example, reason that a hoist
19 out of a loop can't happen).
21 At a high level, one of the goals of ``MemorySSA`` is to provide an SSA based
22 form for memory, complete with def-use and use-def chains, which
23 enables users to quickly find may-def and may-uses of memory operations.
24 It can also be thought of as a way to cheaply give versions to the complete
25 state of memory, and associate memory operations with those versions.
27 This document goes over how ``MemorySSA`` is structured, and some basic
28 intuition on how ``MemorySSA`` works.
30 A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
31 found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Though, it's
32 relatively out-of-date; the paper references multiple memory partitions, but GCC
33 eventually swapped to just using one, like we now have in LLVM. Like
34 GCC's, LLVM's MemorySSA is intraprocedural.
40 MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
41 structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are
42 ``MemorySSA``'s parallel to LLVM ``Instruction``\ s.
44 Each ``MemoryAccess`` can be one of three types:
50 ``MemoryDef``\ s are operations which may either modify memory, or which
51 introduce some kind of ordering constraints. Examples of ``MemoryDef``\ s
52 include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher)
53 ordering, volatile operations, memory fences, etc. A ``MemoryDef``
54 always introduces a new version of the entire memory and is linked with a single
55 ``MemoryDef/MemoryPhi`` which is the version of memory that the new
56 version is based on. This implies that there is a *single*
57 ``Def`` chain that connects all the ``Def``\ s, either directly
58 or indirectly. For example in:
66 ``d`` is connected directly with ``c`` and indirectly with ``b``.
67 This means that ``d`` potentially clobbers (see below) ``c`` *or*
68 ``b`` *or* both. This in turn implies that without the use of `The walker`_,
69 initially every ``MemoryDef`` clobbers every other ``MemoryDef``.
71 ``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any
72 point we have two (or more) ``MemoryDef``\ s that could flow into a
73 ``BasicBlock``, the block's top ``MemoryAccess`` will be a
74 ``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any
75 concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s
76 inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s
79 Note also that in SSA, Phi nodes merge must-reach definitions (that is,
80 definitions that *must* be new versions of variables). In MemorySSA, PHI nodes
81 merge may-reach definitions (that is, until disambiguated, the versions that
82 reach a phi node may or may not clobber a given variable).
84 ``MemoryUse``\ s are operations which use but don't modify memory. An example of
85 a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.
87 Every function that exists has a special ``MemoryDef`` called ``liveOnEntry``.
88 It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being
89 run on, and implies that we've hit the top of the function. It's the only
90 ``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of
91 ``liveOnEntry`` implies that the memory being used is either undefined or
92 defined before the function begins.
94 An example of all of this overlaid on LLVM IR (obtained by running ``opt
95 -passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
96 viewing this example, it may be helpful to view it in terms of clobbers.
97 The operands of a given ``MemoryAccess`` are all (potential) clobbers of said
98 ``MemoryAccess``, and the value produced by a ``MemoryAccess`` can act as a clobber
99 for other ``MemoryAccess``\ es.
101 If a ``MemoryAccess`` is a *clobber* of another, it means that these two
102 ``MemoryAccess``\ es may access the same memory. For example, ``x = MemoryDef(y)``
103 means that ``x`` potentially modifies memory that ``y`` modifies/constrains
104 (or has modified / constrained).
105 In the same manner, ``a = MemoryPhi({BB1,b},{BB2,c})`` means that
106 anyone that uses ``a`` is accessing memory potentially modified / constrained
107 by either ``b`` or ``c`` (or both). And finally, ``MemoryUse(x)`` means
108 that this use accesses memory that ``x`` has modified / constrained
109 (as an example, think that if ``x = MemoryDef(...)``
110 and ``MemoryUse(x)`` are in the same loop, the use can't
111 be hoisted outside alone).
113 Another useful way of looking at it is in terms of memory versions.
114 In that view, operands of a given ``MemoryAccess`` are the version
115 of the entire memory before the operation, and if the access produces
116 a value (i.e. ``MemoryDef/MemoryPhi``),
117 the value is the new version of the memory after the operation.
126 ; 1 = MemoryDef(liveOnEntry)
131 ; 6 = MemoryPhi({entry,1},{if.end,4})
132 br i1 undef, label %if.then, label %if.else
145 ; 5 = MemoryPhi({if.then,2},{if.else,3})
147 %1 = load i8, i8* %p1
151 %2 = load i8, i8* %p3
155 The ``MemorySSA`` IR is shown in comments that precede the instructions they map
156 to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
157 is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
158 instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
159 particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
160 %p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM
161 Instruction, so the line directly below a ``MemoryPhi`` isn't special.
163 Going from the top down:
165 - ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering
166 ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This
167 ``MemoryPhi`` is referred to in the textual IR by the number ``6``.
168 - ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
169 and its reaching definition before it is ``6``, or the ``MemoryPhi`` after
170 ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_
171 sections below for why this ``MemoryDef`` isn't linked to a separate,
172 disambiguated ``MemoryPhi``.)
173 - ``3 = MemoryDef(6)`` notes that ``store i8 0, i8* %p2`` is a definition; its
174 reaching definition is also ``6``.
175 - ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
176 this block could either be ``2`` or ``3``.
177 - ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
178 it's clobbered by ``5``.
179 - ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its
180 reaching definition is ``5``.
181 - ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
182 and the last thing that could clobber this use is above ``while.cond`` (e.g.
183 the store to ``%p3``). In memory versioning parlance, it really only depends on
184 the memory version 1, and is unaffected by the new memory versions generated since
187 As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
188 meant to interact with LLVM IR.
193 ``MemorySSA`` is an analysis that can be built for any arbitrary function. When
194 it's built, it does a pass over the function's IR in order to build up its
195 mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things
196 like the dominance relation between ``MemoryAccess``\ es, and get the
197 ``MemoryAccess`` for any given ``Instruction`` .
199 When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
200 that you can use (see below).
206 A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
207 the walker, for short. The goal of the walker is to provide answers to clobber
208 queries beyond what's represented directly by ``MemoryAccess``\ es. For example,
217 ; 1 = MemoryDef(liveOnEntry)
223 The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
224 be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
225 for the clobber of ``MemoryAccess`` ``2``.
227 By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s
228 and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to
229 be using. Walkers were built to be flexible, though, so it's entirely reasonable
230 (and expected) to create more specialized walkers (e.g. one that specifically
231 queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc).
236 There are two main APIs used to retrieve the clobbering access using the walker:
238 - ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA);`` return the
239 clobbering memory access for ``MA``, caching all intermediate results
240 computed along the way as part of each access queried.
242 - ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA, const MemoryLocation &Loc);``
243 returns the access clobbering memory location ``Loc``, starting at ``MA``.
244 Because this API does not request the clobbering access of a specific memory
245 access, there are no results that can be cached.
247 Locating clobbers yourself
248 ^^^^^^^^^^^^^^^^^^^^^^^^^^
250 If you choose to make your own walker, you can find the clobber for a
251 ``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
252 ``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple;
253 they ultimately form a linked list of every clobber that dominates the
254 ``MemoryAccess`` that you're trying to optimize. In other words, the
255 ``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
256 ``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.
259 Build-time use optimization
260 ---------------------------
262 ``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time.
263 Specifically, we optimize the operand of every ``MemoryUse`` to point to the
264 actual clobber of said ``MemoryUse``. This can be seen in the above example; the
265 second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
266 ``MemoryDef`` from the entry block. This is done to make walking,
267 value numbering, etc, faster and easier.
269 It is not possible to optimize ``MemoryDef`` in the same way, as we
270 restrict ``MemorySSA`` to one memory variable and, thus, one Phi node
274 Invalidation and updating
275 -------------------------
277 Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
278 the IR is updated. "Update", in this case, includes the addition, deletion, and
279 motion of ``Instructions``. The update API is being made on an as-needed basis.
280 If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``\ s update API.
286 ``MemorySSA`` only places ``MemoryPhi``\ s where they're actually
287 needed. That is, it is a pruned SSA form, like LLVM's SSA form. For
297 ; 1 = MemoryDef(liveOnEntry)
302 ; 3 = MemoryPhi({%0,1},{if.end,2})
303 br i1 undef, label %if.then, label %if.else
313 %1 = load i8, i8* %p1
317 %2 = load i8, i8* %p3
321 Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
322 for ``if.end`` would be pointless, so we don't place one. So, if you need to
323 place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll need to also create
324 a ``MemoryPhi`` for ``if.end``.
326 If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
327 everywhere. Because we have Walkers that are capable of optimizing above said
328 phis, doing so shouldn't prohibit optimizations.
334 ``MemorySSA`` is meant to reason about the relation between memory
335 operations, and enable quicker querying.
336 It isn't meant to be the single source of truth for all potential memory-related
337 optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
338 to reason about atomic or volatile operations, as in:
342 define i8 @foo(i8* %a) {
344 br i1 undef, label %if.then, label %if.end
347 ; 1 = MemoryDef(liveOnEntry)
348 %0 = load volatile i8, i8* %a
352 %av = phi i8 [0, %entry], [%0, %if.then]
356 Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
357 seem legal. Because it's a volatile load, though, it's not.
366 ``MemorySSA`` in LLVM deliberately trades off precision for speed.
367 Let us think about memory variables as if they were disjoint partitions of the
368 memory (that is, if you have one variable, as above, it represents the entire
369 memory, and if you have multiple variables, each one represents some
370 disjoint portion of the memory)
372 First, because alias analysis results conflict with each other, and
373 each result may be what an analysis wants (IE
374 TBAA may say no-alias, and something else may say must-alias), it is
375 not possible to partition the memory the way every optimization wants.
376 Second, some alias analysis results are not transitive (IE A noalias B,
377 and B noalias C, does not mean A noalias C), so it is not possible to
378 come up with a precise partitioning in all cases without variables to
379 represent every pair of possible aliases. Thus, partitioning
380 precisely may require introducing at least N^2 new virtual variables,
383 Each of these variables may be clobbered at multiple def sites.
385 To give an example, if you were to split up struct fields into
386 individual variables, all aliasing operations that may-def multiple struct
387 fields, will may-def more than one of them. This is pretty common (calls,
388 copies, field stores, etc).
390 Experience with SSA forms for memory in other compilers has shown that
391 it is simply not possible to do this precisely, and in fact, doing it
392 precisely is not worth it, because now all the optimizations have to
393 walk tons and tons of virtual variables and phi nodes.
395 So we partition. At the point at which you partition, again,
396 experience has shown us there is no point in partitioning to more than
397 one variable. It simply generates more IR, and optimizations still
398 have to query something to disambiguate further anyway.
400 As a result, LLVM partitions to one variable.
405 Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
406 useful guarantee - all loads are optimized to point at the thing that
407 actually clobbers them. This gives some nice properties. For example,
408 for a given store, you can find all loads actually clobbered by that
409 store by walking the immediate uses of the store.
411 LLVM Developers Meeting presentations
412 -------------------------------------
414 - `2016 LLVM Developers' Meeting: G. Burgess - MemorySSA in Five Minutes <https://www.youtube.com/watch?v=bdxWmryoHak>`_.
415 - `2020 LLVM Developers' Meeting: S. Baziotis & S. Moll - Finding Your Way Around the LLVM Dependence Analysis Zoo <https://www.youtube.com/watch?v=1e5y6WDbXCQ>`_