.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================
The kernel provides a variety of locking primitives which can be divided
into three categories: sleeping locks, CPU local locks, and spinning locks.
This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.
Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock(). Furthermore, it is also necessary to evaluate the debugging
versions of these primitives. In short, don't acquire sleeping locks from
other contexts unless there is no other option.
On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - local_lock
 - spinlock_t
 - rwlock_t
On non-PREEMPT_RT kernels, local_lock functions are wrappers around
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts is a purely CPU-local
concurrency control mechanism and is not suited for inter-CPU concurrency
control.
On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t
Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

===================  ====================================================
_bh()                Disable / enable bottom halves (soft interrupts)
_irq()               Disable / enable interrupts
_irqsave/restore()   Save and disable / restore interrupt disabled state
===================  ====================================================
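
For illustration, a minimal sketch of how the suffix variants are commonly
used; the lock, data and function names here are invented for this example::

  static DEFINE_SPINLOCK(stats_lock);    /* hypothetical lock */
  static unsigned long stats_count;      /* hypothetical shared counter */

  /* Called from task and softirq context: keep softirqs away while locked. */
  void stats_inc_bh(void)
  {
          spin_lock_bh(&stats_lock);
          stats_count++;
          spin_unlock_bh(&stats_lock);
  }

  /* Usable when the caller's interrupt state is unknown: save and restore it. */
  void stats_inc_any(void)
  {
          unsigned long flags;

          spin_lock_irqsave(&stats_lock, flags);
          stats_count++;
          spin_unlock_irqrestore(&stats_lock, flags);
  }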

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels. Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts. This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.
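
As a rough illustration of that guidance, a minimal sketch using a mutex
for serialization and a completion for waiting; all names are invented for
this example::

  static DEFINE_MUTEX(cfg_lock);            /* hypothetical serialization */
  static DECLARE_COMPLETION(cfg_loaded);    /* hypothetical wait point */
  static int cfg_value;

  void update_cfg(int value)
  {
          mutex_lock(&cfg_lock);            /* serialize updates */
          cfg_value = value;
          mutex_unlock(&cfg_lock);
          complete(&cfg_loaded);            /* wake up one waiter */
  }

  int wait_for_cfg(void)
  {
          wait_for_completion(&cfg_loaded); /* sleep until update_cfg() ran */
          return cfg_value;
  }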

semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores. After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independently of the kernel configuration.
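
A minimal sketch of plain, owner-respecting rw_semaphore usage; the lock
and data names are invented for this example::

  static DECLARE_RWSEM(table_rwsem);     /* hypothetical lock */
  static int table[16];                  /* hypothetical shared data */

  int table_read(int idx)
  {
          int val;

          down_read(&table_rwsem);       /* multiple readers may enter */
          val = table[idx];
          up_read(&table_rwsem);         /* released by the acquiring task */
          return val;
  }

  void table_write(int idx, int val)
  {
          down_write(&table_rwsem);      /* writers are exclusive */
          table[idx] = val;
          up_write(&table_rwsem);
  }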

rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers. In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.

local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.

On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:

===============================  ======================
local_lock(&llock)               preempt_disable()
local_unlock(&llock)             preempt_enable()
local_lock_irq(&llock)           local_irq_disable()
local_unlock_irq(&llock)         local_irq_enable()
local_lock_irqsave(&llock)       local_irq_save()
local_unlock_irqrestore(&llock)  local_irq_restore()
===============================  ======================

The named scope of local_lock has two advantages over the regular
primitives:

 - The lock name allows static analysis and also clearly documents the
   protection scope, while the regular primitives are scopeless and
   opaque.

 - If lockdep is enabled, the local_lock gains a lockmap which allows
   validating the correctness of the protection. This can detect cases
   where e.g. a function using preempt_disable() as protection mechanism
   is invoked from interrupt or soft-interrupt context. Apart from that,
   lockdep_assert_held(&llock) works as with any other locking primitive.
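
For orientation, a minimal sketch of how a local_lock is typically declared
and used to protect per-CPU data; the struct, variable and function names
are invented for this example::

  struct cpu_stats {
          local_lock_t lock;
          unsigned long events;
  };

  static DEFINE_PER_CPU(struct cpu_stats, cpu_stats) = {
          .lock = INIT_LOCAL_LOCK(lock),
  };

  void account_event(void)
  {
          /* Protects only this CPU's instance of cpu_stats. */
          local_lock(&cpu_stats.lock);
          this_cpu_inc(cpu_stats.events);
          local_unlock(&cpu_stats.lock);
  }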

local_lock and PREEMPT_RT
-------------------------

PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
its semantics:

 - All spinlock_t changes also apply to local_lock.

local_lock should be used in situations where disabling preemption or
interrupts is the appropriate form of concurrency control to protect
per-CPU data structures on a non-PREEMPT_RT kernel.

local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel due to the PREEMPT_RT specific spinlock_t semantics.

raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state. raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.
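
As a rough sketch of the "tiny critical section on hardware state" case;
the lock name, function and register offsets below are invented for this
example::

  static DEFINE_RAW_SPINLOCK(regs_lock);          /* hypothetical lock */

  void write_ctrl_reg(void __iomem *base, u32 val)
  {
          unsigned long flags;

          /* A few register accesses; held only for a handful of cycles. */
          raw_spin_lock_irqsave(&regs_lock, flags);
          writel(val, base + 0x10);               /* hypothetical CTRL offset */
          writel(1, base + 0x14);                 /* hypothetical LATCH offset */
          raw_spin_unlock_irqrestore(&regs_lock, flags);
  }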

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption disabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate. Non-PREEMPT_RT kernels
   avoid migration by disabling preemption. PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations. Non-PREEMPT_RT
   kernels leave task state untouched. However, PREEMPT_RT must change
   task state if the task blocks during acquisition. Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                  lock wakeup
                                    task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available. Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING. Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
           non lock wakeup
             task->saved_state = TASK_RUNNING
                                  lock wakeup
                                    task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.
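
A minimal sketch of rwlock_t usage; the lock, data and function names are
invented for this example::

  static DEFINE_RWLOCK(cache_lock);      /* hypothetical lock */
  static int cache_val;                  /* hypothetical shared data */

  int cache_lookup(void)
  {
          int val;

          read_lock(&cache_lock);        /* concurrent readers are allowed */
          val = cache_val;
          read_unlock(&cache_lock);
          return val;
  }

  void cache_update(int val)
  {
          write_lock(&cache_lock);       /* writers are exclusive */
          cache_val = val;
          write_unlock(&cache_lock);
  }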

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers. In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.

The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::

  local_lock_irq(&local_lock);
  raw_spin_lock(&lock);

and is fully equivalent to::

  raw_spin_lock_irq(&lock);

On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
is mapped to a per-CPU spinlock_t which neither disables interrupts nor
preemption. The following code sequence works correctly on both
PREEMPT_RT and non-PREEMPT_RT kernels::

  local_lock_irq(&local_lock);
  spin_lock(&lock);

Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
    lockdep_assert_irqs_disabled();
    access_protected_data();
  }

On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
    lockdep_assert_held(&local_lock);
    access_protected_data();
  }

spinlock_t and rwlock_t
-----------------------

The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications. For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

  local_irq_disable();
  spin_lock(&lock);

and is fully equivalent to::

  spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because RT-mutex requires
a fully preemptible context. Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts. In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism. Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.
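
A minimal sketch of that recommended replacement, reusing the lock from the
example above::

  /* Instead of local_irq_disable(); spin_lock(&lock); use: */
  spin_lock_irq(&lock);
  /* ... critical section ... */
  spin_unlock_irq(&lock);

  /* Or, when the caller's interrupt state is unknown: */
  unsigned long flags;

  spin_lock_irqsave(&lock, flags);
  /* ... critical section ... */
  spin_unlock_irqrestore(&lock, flags);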

A typical scenario is protection of per-CPU variables in thread context::

  struct foo *p = get_cpu_ptr(&var1);

  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);
  spin_unlock(&p->lock);
  put_cpu_ptr(&var1);

This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
not allow acquiring p->lock because get_cpu_ptr() implicitly disables
preemption. The following substitution works on both kernels::

  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);
  spin_unlock(&p->lock);
  migrate_enable();

On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable()
which makes the above code fully equivalent. On a PREEMPT_RT kernel
migrate_disable() ensures that the task is pinned on the current CPU which
in turn guarantees that the per-CPU accesses to var1 and var2 stay on the
same CPU.

The migrate_disable() substitution is not valid for the following
scenario, where the per-CPU data is protected solely by disabling
preemption::

  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  p->count += this_cpu_read(var2);
  migrate_enable();

While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because
here migrate_disable() does not protect against reentrancy from a
preempting task. A correct substitution for this case is::

  struct foo *p;

  local_lock(&foo_lock);
  p = this_cpu_ptr(&var1);
  p->count += this_cpu_read(var2);
  local_unlock(&foo_lock);

On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
underlying per-CPU spinlock.

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t; for example, it must avoid allocating memory.
Thus, on a non-PREEMPT_RT kernel the following code works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts. However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex. Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.
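
For reference, a minimal sketch of bit spinlock usage; the structure, bit
number and function names are invented for this example::

  #define NODE_LOCK_BIT  0                       /* hypothetical lock bit */

  struct node {
          unsigned long flags;                   /* bit 0 doubles as a lock */
          int val;
  };

  void node_update(struct node *n, int val)
  {
          /* Spins on the bit; behaves like a raw spinlock on PREEMPT_RT. */
          bit_spin_lock(NODE_LOCK_BIT, &n->flags);
          n->val = val;
          bit_spin_unlock(NODE_LOCK_BIT, &n->flags);
  }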

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site. In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.
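
To illustrate the usage-site pattern, a purely illustrative sketch, not
taken from any particular subsystem, showing a PREEMPT_RT-friendly variant
of the structure from the previous example::

  struct node {
          unsigned long flags;           /* bit 0 used as lock on !PREEMPT_RT */
  #ifdef CONFIG_PREEMPT_RT
          spinlock_t lock;               /* explicit lock on PREEMPT_RT */
  #endif
  };

  static inline void node_lock(struct node *n)
  {
  #ifdef CONFIG_PREEMPT_RT
          spin_lock(&n->lock);
  #else
          bit_spin_lock(NODE_LOCK_BIT, &n->flags);
  #endif
  }

  static inline void node_unlock(struct node *n)
  {
  #ifdef CONFIG_PREEMPT_RT
          spin_unlock(&n->lock);
  #else
          bit_spin_unlock(NODE_LOCK_BIT, &n->flags);
  #endif
  }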

Lock type nesting rules
=======================

The most basic rules are:

 - Lock types of the same lock category (sleeping, CPU local, spinning)
   can nest arbitrarily as long as they respect the general lock ordering
   rules to prevent deadlocks.

 - Sleeping lock types cannot nest inside CPU local and spinning lock types.

 - CPU local and spinning lock types can nest inside sleeping lock types.

 - Spinning lock types can nest inside all lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping and substitutes local_lock with a
per-CPU spinlock_t means that they cannot be acquired while holding a raw
spinlock. This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t, rwlock_t, local_lock
  3) raw_spinlock_t and bit spinlocks

Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.
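
As a closing illustration, a minimal sketch of a nesting that respects this
ordering; all lock and function names are invented for this example::

  static DEFINE_MUTEX(cfg_mutex);        /* 1) sleeping lock */
  static DEFINE_SPINLOCK(state_lock);    /* 2) spinlock_t */
  static DEFINE_RAW_SPINLOCK(hw_lock);   /* 3) raw_spinlock_t */

  void update(void)
  {
          mutex_lock(&cfg_mutex);
          spin_lock(&state_lock);
          raw_spin_lock(&hw_lock);       /* innermost: raw spinlock */
          /* ... tiny hardware update ... */
          raw_spin_unlock(&hw_lock);
          spin_unlock(&state_lock);
          mutex_unlock(&cfg_mutex);
  }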