1 ===================================
2 Memory Model Relaxation Annotations
3 ===================================
11 Memory Model Relaxation Annotations (MMRAs) are target-defined properties
12 on instructions that can be used to selectively relax constraints placed
13 by the memory model. For example:
15 * The use of ``VulkanMemoryModel`` in a SPIRV program allows certain
16 memory operations to be reordered across ``acquire`` or ``release``
18 * OpenCL APIs expose primitives to only fence a specific set of address
19 spaces. Carrying that information to the backend can enable the
20 use of faster synchronization instructions, rather than fencing all
21 address spaces everytime.
23 MMRAs offer an opt-in system for targets to relax the default LLVM
25 As such, they are attached to an operation using LLVM metadata which
26 can always be dropped without affecting correctness.
32 A load, a store, an atomic, or a function call that is marked as
35 synchronizing operation
36 An instruction that synchronizes memory with other threads (e.g.
37 an atomic or a fence).
40 Metadata attached to a memory or synchronizing operation
41 that represents some target-defined property regarding memory
44 An operation may have multiple tags that each represent a different
47 A tag is composed of a pair of metadata string: a *prefix* and a *suffix*.
49 In LLVM IR, the pair is represented using a metadata tuple.
50 In other cases (comments, documentation, etc.), we may use the
51 ``prefix:suffix`` notation.
55 :caption: Example: Tags in Metadata
57 !0 = !{!"scope", !"workgroup"} # scope:workgroup
58 !1 = !{!"scope", !"device"} # scope:device
59 !2 = !{!"scope", !"system"} # scope:system
63 The only semantics relevant to the optimizer is the
64 "compatibility" relation defined below. All other
65 semantics are target defined.
67 Tags can also be organised in lists to allow operations
68 to specify all of the tags they belong to. Such a list
69 is referred to as a "set of tags".
72 :caption: Example: Set of Tags in Metadata
74 !0 = !{!"scope", !"workgroup"}
75 !1 = !{!"sync-as", !"private"}
80 If an operation does not have MMRA metadata, it's treated as if
81 it has an empty list (``!{}``) of tags.
83 Note that it is not an error if a tag is not recognized by the
84 instruction it is applied to, or by the current target.
85 Such tags are simply ignored.
87 Both synchronizing operations and memory operations can have
88 zero or more tags attached to them using the ``!mmra`` syntax.
90 For the sake of readability in examples below,
91 we use a (non-functional) short syntax to represent MMMRA metadata:
94 :caption: Short Syntax Example
97 store %ptr1 !mmra !{!"foo", !"bar"}
99 These two notations can be used in this document and are strictly
100 equivalent. However, only the second version is functional.
103 Two sets of tags are said to be *compatible* iff, for every unique
104 tag prefix P present in at least one set:
106 - the other set contains no tag with prefix P, or
107 - at least one tag with prefix P is common to both sets.
109 The above definition implies that an empty set is always compatible
110 with any other set. This is an important property as it ensures that
111 if a transform drops the metadata on an operation, it can never affect
112 correctness. In other words, the memory model cannot be relaxed further
113 by deleting metadata from instructions.
117 The *happens-before* Relation
118 ==============================
120 Compatibility checks can be used to opt out of the *happens-before* relation
121 established between two instructions.
124 When two instructions' metadata are not compatible, any program order
125 between them are not in *happens-before*.
127 For example, consider two tags ``foo:bar`` and
128 ``foo:baz`` exposed by a target:
132 A: store %ptr1 # foo:bar
133 B: store %ptr2 # foo:baz
134 X: store atomic release %ptr3 # foo:bar
136 In the above figure, ``A`` is compatible with ``X``, and hence ``A``
137 happens-before ``X``. But ``B`` is not compatible with
138 ``X``, and hence it is not happens-before ``X``.
141 If an synchronizing operation has one or more tags, then whether it
142 synchronizes-with and participates in the ``seq_cst`` order with
143 other operations is target dependent.
145 Whether the following example synchronizes with another sequence depends
146 on the target-defined semantics of ``foo:bar`` and ``foo:bux``.
150 fence release # foo:bar
151 store atomic %ptr1 # foo:bux
159 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
160 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:0 vulkan:nonprivate
162 A and B are not ordered relative to each other
163 (no *happens-before*) because their sets of tags are not compatible.
165 Note that the ``sync-as`` value does not have to match the ``addrspace`` value.
166 e.g. In Example 1, a store-release to a location in ``addrspace(1)`` wants to
167 only synchronize with operations happening in ``addrspace(0)``.
172 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
173 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:1 vulkan:nonprivate
175 The ordering of A and B is unaffected because their set of tags are
178 Note that A and B may or may not be in *happens-before* due to other reasons.
183 A: store ptr addrspace(1) %ptr2 # sync-as:1 vulkan:nonprivate
184 B: store atomic release ptr addrspace(1) %ptr3 # vulkan:nonprivate
186 The ordering of A and B is unaffected because their set of tags are
192 A: store ptr addrspace(1) %ptr2 # sync-as:1
193 B: store atomic release ptr addrspace(1) %ptr3 # sync-as:2
195 A and B do not have to be ordered relative to each other
196 (no *happens-before*) because their sets of tags are not compatible.
201 SPIRV ``NonPrivatePointer``
202 ---------------------------
204 MMRAs can support the SPIRV capability
205 ``VulkanMemoryModel``, where synchronizing operations only affect
206 memory operations that specify ``NonPrivatePointer`` semantics.
208 The example below is generated from a SPIRV program using the
211 - Add ``vulkan:nonprivate`` to every synchronizing operation.
212 - Add ``vulkan:nonprivate`` to every non-atomic memory operation
213 that is marked ``NonPrivatePointer``.
214 - Add ``vulkan:private`` to tags of every non-atomic memory operation
215 that is not marked ``NonPrivatePointer``.
220 A: store %ptr1 # vulkan:nonprivate
221 B: store %ptr2 # vulkan:private
222 X: store atomic release %ptr3 # vulkan:nonprivate
225 Y: load atomic acquire %ptr3 # vulkan:nonprivate
226 C: load %ptr2 # vulkan:private
227 D: load %ptr1 # vulkan:nonprivate
229 Compatibility ensures that operation ``A`` is ordered
230 relative to ``X`` while operation ``D`` is ordered relative to ``Y``.
231 If ``X`` synchronizes with ``Y``, then ``A`` happens-before ``D``.
232 No such relation can be inferred about operations ``B`` and ``C``.
235 The `Vulkan Memory Model <https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#memory-model-non-private>`_
236 considers all atomic operation non-private.
238 Whether ``vulkan:nonprivate`` would be specified on atomic operations is
239 an implementation detail, as an atomic operation is always ``nonprivate``.
240 The implementation may choose to be explicit and emit IR with
241 ``vulkan:nonprivate`` on every atomic operation, or it could choose to
242 only emit ``vulkan::private`` and assume ``vulkan:nonprivate``
245 Operations marked with ``vulkan:private`` effectively opt out of the
246 happens-before order in a SPIRV program since they are incompatible
247 with every synchronizing operation. Note that SPIRV operations that
248 are not marked ``NonPrivatePointer`` are not entirely private to the
249 thread --- they are implicitly synchronized at the start or end of a
250 thread by the Vulkan *system-synchronizes-with* relationship. This
251 example assumes that the target-defined semantics of
252 ``vulkan:private`` correctly implements this property.
254 This scheme is general enough to express the interoperability of SPIRV
255 programs with other environments.
260 A: store %ptr1 # vulkan:nonprivate
261 X: store atomic release %ptr2 # vulkan:nonprivate
264 Y: load atomic acquire %ptr2 # foo:bar
267 In the above example, thread ``T1`` originates from a SPIRV program
268 while thread ``T2`` originates from a non-SPIRV program. Whether ``X``
269 can synchronize with ``Y`` is target defined. If ``X`` synchronizes
270 with ``Y``, then ``A`` happens before ``B`` (because A/X and
273 Implementation Example
274 ~~~~~~~~~~~~~~~~~~~~~~
276 Consider the implementation of SPIRV ``NonPrivatePointer`` on a target
277 where all memory operations are cached, and the entire cache is
278 flushed or invalidated at a ``release`` or ``acquire`` respectively. A
279 possible scheme is that when translating a SPIRV program, memory
280 operations marked ``NonPrivatePointer`` should not be cached, and the
281 cache contents should not be touched during an ``acquire`` and
282 ``release`` operation.
284 This could be implemented using the tags that share the ``vulkan:`` prefix,
287 - For memory operations:
289 - Operations with ``vulkan:nonprivate`` should bypass the cache.
290 - Operations with ``vulkan:private`` should be cached.
291 - Operations that specify neither or both should conservatively
292 bypass the cache to ensure correctness.
294 - For synchronizing operations:
296 - Operations with ``vulkan:nonprivate`` should not flush or
297 invalidate the cache.
298 - Operations with ``vulkan:private`` should flush or invalidate the cache.
299 - Operations that specify neither or both should conservatively
300 flush or invalidate the cache to ensure correctness.
303 In such an implementation, dropping the metadata on an operation, while
304 not affecting correctness, may have big performance implications.
305 e.g. an operation bypasses the cache when it shouldn't.
310 MMRAs may express the selective synchronization of
311 different memory types.
313 As an example, a target may expose an ``sync-as:<N>`` tag to
314 pass information about which address spaces are synchronized by the
315 execution of a synchronizing operation.
318 Address spaces are used here as a common example, but this concept
319 can apply for other "memory types". What "memory types" means here is
324 # let 1 = global address space
325 # let 3 = local address space
328 A: store %ptr1 # sync-as:1
329 B: store %ptr2 # sync-as:3
330 X: store atomic release ptr addrspace(0) %ptr3 # sync-as:3
333 Y: load atomic acquire ptr addrspace(0) %ptr3 # sync-as:3
334 C: load %ptr2 # sync-as:3
335 D: load %ptr1 # sync-as:1
337 In the above figure, ``X`` and ``Y`` are atomic operations on a
338 location in the ``global`` address space. If ``X`` synchronizes with
339 ``Y``, then ``B`` happens-before ``C`` in the ``local`` address
340 space. But no such statement can be made about operations ``A`` and
341 ``D``, although they are peformed on a location in the ``global``
344 Implementation Example: Adding Address Space Information to Fences
345 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
347 Languages such as OpenCL C provide fence operations such as
348 ``atomic_work_item_fence`` that can take an explicit address
351 By default, LLVM has no means to carry that information in the IR, so
352 the information is lost during lowering to LLVM IR. This means that
353 targets such as AMDGPU have to conservatively emit instructions to
354 fence all address spaces in all cases, which can have a noticeable
355 performance impact in high-performance applications.
357 MMRAs may be used to preserve that information at the IR level, all the
358 way through code generation. For example, a fence that only affects the
359 global address space ``addrspace(1)`` may be lowered as
363 fence release # sync-as:1
365 and the target may use the presence of ``sync-as:1`` to infer that it
366 must only emit instruction to fence the global address space.
368 Note that as MMRAs are opt in, a fence that does not have MMRA metadata
369 could still be lowered conservatively, so this optimization would only
370 apply if the front-end emits the MMRA metadata on the fence instructions.
377 The following sections are informational.
382 MMRAs are a way to capture optimization opportunities in the program.
383 But when an operation mentions no tags or conflicting tags,
384 the target may need to produce conservative code to ensure correctness
385 at the cost of performance. This can happen in the following situations:
387 1. When a target first introduces MMRAs, the
388 frontend might not have been updated to emit them.
389 2. An optimization may drop MMRA metadata.
390 3. An optimization may add arbitrary tags to an operation.
392 Note that targets can always choose to ignore (or even drop) MMRAs
393 and revert to the default behavior/codegen heuristics without
394 affecting correctness.
396 Consequences of the Absence of *happens-before*
397 -----------------------------------------------
399 In the :ref:`happens-before<HappensBefore>` section, we defined how an
400 *happens-before* relation between two instruction can be broken
401 by leveraging compatibility between MMRAs. When the instructions
402 are incompatible and there is no *happens-before* relation, we say
403 that the instructions "do not have to be ordered relative to each
406 "Ordering" in this context is a very broad term which covers both
407 static and runtime aspects.
409 When there is no ordering constraint, we *could* statically reorder
410 the instructions in an optimizer transform if the reordering does
411 not break other constraints as single location coherence.
412 Static reordering is one consequence of breaking *happens-before*,
413 but is not the most interesting one.
415 Run-time consequences are more interesting. When there is an
416 *happens-before* relation between instructions, the target has to emit
417 synchronization code to ensure other threads will observe the effects of
418 the instructions in the right order.
420 For instance, the target may have to wait for previous loads & stores to
421 finish before starting a fence-release, or there may be a need to flush a
422 memory cache before executing the next instruction.
423 In the absence of *happens-before*, there is no such requirement and
424 no waiting or flushing is required. This may noticeably speed up
425 execution in some cases.
430 If a pass can combine multiple memory or synchronizing operations
431 into one, it needs to be able to combine MMRAs. One possible way to
432 achieve this is by doing a prefix-wise union of the tag sets.
434 Let A and B be two tags set, and U be the prefix-wise union of A and B.
435 For every unique tag prefix P present in A or B:
437 * If either A or B has no tags with prefix P, no tags with prefix
439 * If both A and B have at least one tag with prefix P, all tags with prefix
440 P from both sets are added to U.
442 Passes should avoid aggressively combining MMRAs, as this can result
443 in significant losses of information. While this cannot affect
444 correctness, it may affect performance.
446 As a general rule of thumb, common passes such as SimplifyCFG that
447 aggressively combine/reorder operations should only combine
448 instructions that have identical sets of tags.
449 Passes that combine less frequently, or that are well aware of the cost
450 of combining the MMRAs can use the prefix-wise union described above.
456 A: store release %ptr1 # foo:x, foo:y, bar:x
457 B: store release %ptr2 # foo:x, bar:y
459 # Unique prefixes P = [foo, bar]
460 # "foo:x" is common to A and B so it's added to U.
461 # "bar:x" != "bar:y" so it's not added to U.
462 U: store release %ptr3 # foo:x
466 A: store release %ptr1 # foo:x, foo:y
467 B: store release %ptr2 # foo:x, bux:y
469 # Unique prefixes P = [foo, bux]
470 # "foo:x" is common to A and B so it's added to U.
471 # No tags have the prefix "bux" in A.
472 U: store release %ptr3 # foo:x
476 A: store release %ptr1
477 B: store release %ptr2 # foo:x, bar:y
479 # Unique prefixes P = [foo, bar]
480 # No tags with "foo" or "bar" in A, so no tags added.
481 U: store release %ptr3