Locking is well-known and the common use cases are straightforward: Any
CPU holding a given lock sees any changes previously seen or made by any
CPU before it previously released that same lock. This last sentence
is the only part of this document that most developers will need to read.

However, developers who would like to also access lock-protected shared
variables outside of their corresponding locks should continue reading.

Locking and Prior Accesses
--------------------------

The basic rule of locking is worth repeating:

	Any CPU holding a given lock sees any changes previously seen
	or made by any CPU before it previously released that same lock.

Note that this statement is a bit stronger than "Any CPU holding a
given lock sees all changes made by any CPU during the time that CPU was
previously holding this same lock". For example, consider the following
pair of code fragments:

	/* See MP+polocks.litmus. */
	void CPU0(void)
	{
		WRITE_ONCE(x, 1);
		spin_lock(&mylock);
		WRITE_ONCE(y, 1);
		spin_unlock(&mylock);
	}

	void CPU1(void)
	{
		spin_lock(&mylock);
		r0 = READ_ONCE(y);
		spin_unlock(&mylock);
		r1 = READ_ONCE(x);
	}

The basic rule guarantees that if CPU0() acquires mylock before CPU1(),
then both r0 and r1 must be set to the value 1. This also has the
consequence that if the final value of r0 is equal to 1, then the final
value of r1 must also be equal to 1. In contrast, the weaker rule would
say nothing about the final value of r1.

Locking and Subsequent Accesses
-------------------------------

The converse to the basic rule also holds: Any CPU holding a given
lock will not see any changes that will be made by any CPU after it
subsequently acquires this same lock. This converse statement is
illustrated by the following litmus test:

	/* See MP+porevlocks.litmus. */
	void CPU0(void)
	{
		r1 = READ_ONCE(y);
		spin_lock(&mylock);
		r0 = READ_ONCE(x);
		spin_unlock(&mylock);
	}

	void CPU1(void)
	{
		spin_lock(&mylock);
		WRITE_ONCE(x, 1);
		spin_unlock(&mylock);
		WRITE_ONCE(y, 1);
	}

This converse to the basic rule guarantees that if CPU0() acquires
mylock before CPU1(), then both r0 and r1 must be set to the value 0.
This also has the consequence that if the final value of r1 is equal
to 0, then the final value of r0 must also be equal to 0. In contrast,
the weaker rule would say nothing about the final value of r0.

These examples show only a single pair of CPUs, but the effects of the
locking basic rule extend across multiple acquisitions of a given lock
across multiple CPUs.

Double-Checked Locking
----------------------

It is well known that more than just a lock is required to make
double-checked locking work correctly. This litmus test illustrates
one incorrect approach:

	/* See Documentation/litmus-tests/locking/DCL-broken.litmus. */
	void CPU0(void)
	{
		r0 = READ_ONCE(flag);
		if (r0 == 0) {
			spin_lock(&lck);
			r1 = READ_ONCE(flag);
			if (r1 == 0) {
				WRITE_ONCE(data, 1);
				WRITE_ONCE(flag, 1);
			}
			spin_unlock(&lck);
		}
		r2 = READ_ONCE(data);
	}
	/* CPU1() is exactly the same as CPU0(). */

There are two problems. First, there is no ordering between the first
READ_ONCE() of "flag" and the READ_ONCE() of "data". Second, there is
no ordering between the two WRITE_ONCE() calls. It should therefore be
no surprise that "r2" can be zero, and a quick herd7 run confirms this.

One way to fix this is to use smp_load_acquire() and smp_store_release()
as shown in this corrected version:

	/* See Documentation/litmus-tests/locking/DCL-fixed.litmus. */
	void CPU0(void)
	{
		r0 = smp_load_acquire(&flag);
		if (r0 == 0) {
			spin_lock(&lck);
			r1 = READ_ONCE(flag);
			if (r1 == 0) {
				WRITE_ONCE(data, 1);
				smp_store_release(&flag, 1);
			}
			spin_unlock(&lck);
		}
		r2 = READ_ONCE(data);
	}
	/* CPU1() is exactly the same as CPU0(). */

The smp_load_acquire() guarantees that its load from "flag" will
be ordered before the READ_ONCE() from "data", thus solving the first
problem. The smp_store_release() guarantees that its store will be
ordered after the WRITE_ONCE() to "data", solving the second problem.
The smp_store_release() pairs with the smp_load_acquire(), thus ensuring
that the ordering provided by each actually takes effect. Again, a
quick herd7 run confirms this.

In short, if you access a lock-protected variable without holding the
corresponding lock, you will need to provide additional ordering, in
this case via the smp_load_acquire() and the smp_store_release().

Ordering Provided by a Lock to CPUs Not Holding That Lock
---------------------------------------------------------

It is not necessarily the case that accesses ordered by locking will be
seen as ordered by CPUs not holding that lock. Consider this example:

	/* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
	void CPU0(void)
	{
		spin_lock(&mylock);
		WRITE_ONCE(x, 1);
		WRITE_ONCE(y, 1);
		spin_unlock(&mylock);
	}

	void CPU1(void)
	{
		spin_lock(&mylock);
		r0 = READ_ONCE(y);
		WRITE_ONCE(z, 1);
		spin_unlock(&mylock);
	}

	void CPU2(void)
	{
		WRITE_ONCE(z, 2);
		smp_mb();
		r1 = READ_ONCE(x);
	}

Counter-intuitive though it might be, it is quite possible to have
the final value of r0 be 1, the final value of z be 2, and the final
value of r1 be 0. The reason for this surprising outcome is that CPU2()
never acquired the lock, and thus did not fully benefit from the lock's
ordering properties.

Ordering can be extended to CPUs not holding the lock by careful use
of smp_mb__after_spinlock():

	/* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
	void CPU0(void)
	{
		spin_lock(&mylock);
		WRITE_ONCE(x, 1);
		WRITE_ONCE(y, 1);
		spin_unlock(&mylock);
	}

	void CPU1(void)
	{
		spin_lock(&mylock);
		smp_mb__after_spinlock();
		r0 = READ_ONCE(y);
		WRITE_ONCE(z, 1);
		spin_unlock(&mylock);
	}

	void CPU2(void)
	{
		WRITE_ONCE(z, 2);
		smp_mb();
		r1 = READ_ONCE(x);
	}

This addition of smp_mb__after_spinlock() strengthens the lock
acquisition sufficiently to rule out the counter-intuitive outcome in
which the final value of r0 is 1, the final value of z is 2, and the
final value of r1 is 0.

No Roach-Motel Locking!
-----------------------

This example requires familiarity with the herd7 "filter" clause, so
please read up on that topic in litmus-tests.txt.

It is tempting to allow memory-reference instructions to be pulled
into a critical section, but this cannot be allowed in the general case.
For example, consider a spin loop preceding a lock-based critical section.
Now, herd7 does not model spin loops, but we can emulate one with two
loads, with a "filter" clause to constrain the first to return the
initial value and the second to return the updated value, as shown below:

	/* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
	void CPU0(void)
	{
		spin_lock(&lck);
		r2 = atomic_inc_return(&y);
		WRITE_ONCE(x, 1);
		spin_unlock(&lck);
	}

	void CPU1(void)
	{
		r0 = READ_ONCE(x);
		r1 = READ_ONCE(x);
		spin_lock(&lck);
		r2 = atomic_inc_return(&y);
		spin_unlock(&lck);
	}

	filter (1:r0=0 /\ 1:r1=1)
	exists (1:r2=1)

The variable "x" is the control variable for the emulated spin loop.
CPU0() sets it to "1" while holding the lock, and CPU1() emulates the
spin loop by reading it twice, first into "1:r0" (which should get the
initial value "0") and then into "1:r1" (which should get the updated
value "1").

The "filter" clause takes this into account, constraining "1:r0" to
equal "0" and "1:r1" to equal "1".

Then the "exists" clause checks to see if CPU1() acquired its lock first,
which should not happen given the filter clause because CPU0() updates
"x" while holding the lock. And herd7 confirms this.

But suppose that the compiler was permitted to reorder the spin loop
into CPU1()'s critical section, like this:

	/* See Documentation/litmus-tests/locking/RM-broken.litmus. */
	void CPU0(void)
	{
		spin_lock(&lck);
		r2 = atomic_inc_return(&y);
		WRITE_ONCE(x, 1);
		spin_unlock(&lck);
	}

	void CPU1(void)
	{
		spin_lock(&lck);
		r0 = READ_ONCE(x);
		r1 = READ_ONCE(x);
		r2 = atomic_inc_return(&y);
		spin_unlock(&lck);
	}

	filter (1:r0=0 /\ 1:r1=1)
	exists (1:r2=1)

If "1:r0" is equal to "0", "1:r1" can never equal "1" because CPU0()
cannot update "x" while CPU1() holds the lock. And herd7 confirms this,
showing zero executions matching the "filter" criteria.

And this is why Linux-kernel lock and unlock primitives must prevent
code from entering critical sections. It is not sufficient to only
prevent code from leaving them.