Is your SMP system locking up unpredictably? No keyboard activity, just
a frustrating complete hard lockup? Do you want to help us debug such
lockups? If the answer to all of these is yes, then this document is
definitely for you.

On Intel SMP hardware there is a feature that enables us to generate
'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt. These get
executed even if the system is otherwise locked up hard.) This can be
used to debug hard kernel lockups. By executing periodic NMI interrupts,
the kernel can monitor whether any CPU has locked up, and print out
debugging messages if so. You can enable the NMI watchdog at boot time
with the 'nmi_watchdog=1' boot parameter. For example, the relevant
boot loader configuration line:

        append="nmi_watchdog=1"
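
To confirm that the parameter actually reached the running kernel, the
kernel command line can be inspected after boot. The following is only a
minimal sketch, not part of the watchdog itself: the helper name
nmi_watchdog_setting() is hypothetical, and letting the last occurrence
of the option win is an assumption made for illustration.

```python
def nmi_watchdog_setting(cmdline):
    """Return the value given to nmi_watchdog= on a kernel command line.

    Returns None if the option is absent. If the option appears more than
    once, the last occurrence is returned (an assumption of this sketch).
    """
    prefix = "nmi_watchdog="
    value = None
    for token in cmdline.split():
        if token.startswith(prefix):
            value = token[len(prefix):]
    return value


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""
    print("nmi_watchdog setting:", nmi_watchdog_setting(cmdline))
```

If this reports None, the boot loader entry was probably not picked up,
e.g. because the boot loader configuration was not reinstalled.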

A 'lockup' is the following scenario: if any CPU in the system does not
execute the periodic local timer interrupt for more than 5 seconds, then
the NMI handler generates an oops and kills the process. This
'controlled crash' (and the resulting kernel messages) can be used to
debug the lockup. Thus whenever the lockup happens, wait 5 seconds and
the oops will show up automatically. If the kernel produces no messages,
then the system has crashed so hard (e.g. hardware-wise) that either it
cannot even accept NMI interrupts, or the crash has made the kernel
unable to print messages.
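
The detection rule described above can be sketched as a small user-space
simulation. This is not the kernel's code: the class WatchdogCpu, the
function simulate(), and the rate of 100 NMIs per second are all
hypothetical and chosen only to make the 5 second threshold concrete.

```python
NMIS_PER_SECOND = 100   # assumed watchdog NMI rate (hypothetical value)
LOCKUP_SECONDS = 5      # threshold taken from the text


class WatchdogCpu:
    """Per-CPU bookkeeping for the simulated watchdog."""

    def __init__(self):
        self.timer_irqs = 0     # bumped by every local timer interrupt
        self.last_seen = 0      # timer_irqs value observed at the previous NMI
        self.stalled_nmis = 0   # consecutive NMIs that saw no timer progress

    def timer_interrupt(self):
        self.timer_irqs += 1

    def nmi(self):
        """Run one watchdog NMI; return True if the CPU looks locked up."""
        if self.timer_irqs != self.last_seen:
            # The timer interrupt ran since the last NMI: the CPU is alive.
            self.last_seen = self.timer_irqs
            self.stalled_nmis = 0
            return False
        self.stalled_nmis += 1
        # "More than 5 seconds" without a timer interrupt counts as a lockup.
        return self.stalled_nmis > LOCKUP_SECONDS * NMIS_PER_SECOND


def simulate(total_ticks, hang_at=None):
    """Return the tick at which a lockup is reported, or None if none is.

    One tick is one NMI period. The timer interrupt fires once per tick
    until tick hang_at, after which the simulated CPU stops servicing it.
    """
    cpu = WatchdogCpu()
    for tick in range(total_ticks):
        if hang_at is None or tick < hang_at:
            cpu.timer_interrupt()
        if cpu.nmi():
            return tick
    return None


if __name__ == "__main__":
    print("healthy CPU:", simulate(3000))
    print("CPU hanging at tick 1000:", simulate(3000, hang_at=1000))
```

The point of the sketch is that the NMI handler needs no cooperation from
the locked-up code: it only compares a counter against the value it saw
last time, which is why the check keeps working while ordinary interrupts
are blocked.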

NOTE: currently the NMI-oopser is enabled unconditionally on x86 SMP.

[ feel free to send bug reports, suggestions and patches to
  Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
  list at <linux-smp@vger.kernel.org> ]