.\" $NetBSD: crash.8,v 1.11 2008/09/24 18:19:13 reed Exp $
.\" Copyright (c) 1980, 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" from: @(#)crash.8 8.1 (Berkeley) 6/5/93
.Nd UNIX system failures
This section explains what happens when the system crashes
and (very briefly) how to analyze crash dumps.
.Pp
When the system crashes voluntarily it prints a message of the form
.Dl panic: why i gave up the ghost
on the console, takes a dump on a mass storage peripheral,
and then invokes an automatic reboot procedure as
(If auto-reboot is disabled on the front panel of the machine the system
will simply halt at this point.)
Unless some unexpected inconsistency is encountered in the state
of the file systems due to hardware or software failure, the system
will then resume multi-user operations.
.Pp
The system has a large number of internal consistency checks; if one
of these fails, then it will panic with a very short message indicating
In many instances, this will be the name of the routine which detected
the error, or a two-word description of the inconsistency.
A full understanding of most panic messages requires perusal of the
source code for the system.
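.Pp
As a rough illustration that is not part of the system, the console
message format shown above can be picked out of saved console output
with a few lines of Python.
The function name find_panics is hypothetical, and the sketch assumes
that the message starts at the beginning of a line.

```python
import re

# Matches a console line of the form "panic: <reason>".
# Assumption: the message begins at the start of a line.
PANIC_RE = re.compile(r"^panic: (.*)$", re.MULTILINE)


def find_panics(console_text):
    """Return the reason string of every panic line in console_text."""
    return [m.group(1).strip() for m in PANIC_RE.finditer(console_text)]


if __name__ == "__main__":
    # The sample reuses the placeholder message from the text above.
    print(find_panics("panic: why i gave up the ghost"))
```

The extracted reason is only a starting point, since the message is
usually just a routine name or a short description.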
.Pp
The most common cause of system failures is hardware failure, which
can reflect itself in different ways.
Here are the messages which
are most likely, with some hints as to causes.
Left unstated in all cases is the possibility that hardware or software
error produced the message in some unexpected way.
.Bl -tag -width 8n -compact
This cryptic panic message results from a failure to mount the root filesystem
during the bootstrap process.
Either the root filesystem has been corrupted,
or the system is attempting to use the wrong device as root filesystem.
Usually, an alternative copy of the system binary or an alternative root
filesystem can be used to bring up the system to investigate.
.It Can't exec /sbin/init
This is not a panic message, as reboots are likely to be futile.
Late in the bootstrap procedure, the system was unable to locate
and execute the initialization process,
The root filesystem is incorrect or has been corrupted, or the mode
or type of /sbin/init forbids execution.
.It hard IO err in swap
The system encountered an error trying to write to the paging device
or an error in reading critical information from a disk drive.
The offending disk should be fixed if it is broken or unreliable.
.It realloccg: bad optim
.It alloccgblk: cyl groups corrupted
.It ialloccg: map corrupted
.It free: freeing free block
.It free: freeing free frag
.It ifree: freeing free inode
.It alloccg: map corrupted
These panic messages are among those that may be produced
when filesystem inconsistencies are detected.
The problem generally results from a failure to repair damaged filesystems
after a crash, hardware failures, or other condition that should not
A filesystem check will normally correct the problem.
.It timeout table overflow
This really shouldn't be a panic, but until the data structure
involved is made to be extensible, running out of entries causes a crash.
If this happens, make the timeout table bigger.
These indicate either a serious bug in the system or, more often,
a glitch or failing hardware.
If SBI faults recur, check out the hardware or call
field service.
If the other faults recur, there is likely a bug somewhere
in the system, although these can be caused by a flaky processor.
Run processor microdiagnostics.
.It machine check %x: Em description
.It \0\0\0machine dependent machine-check information
Machine checks are different on each type of CPU.
Most of the internal processor registers are saved at the time of the fault
and are printed on the console.
For most processors, there is one line that summarizes the type of machine
Often, the nature of the problem is apparent from this message
and/or the contents of key registers.
The VAX Hardware Handbook should be consulted,
and, if necessary, your friendly field service people should be informed
.It trap type %d, code=%x, pc=%x
An unexpected trap has occurred within the system; the trap types are:
.Bd -literal -offset indent
0   reserved addressing fault
1   privileged instruction fault
2   reserved operand fault
3   bpt instruction fault
4   xfc instruction fault
11  compatibility mode fault
.Ed
The favorite trap types in system crashes are trap types 8 and 9,
a wild reference.
The code is the referenced address, and the pc at the
time of the fault is printed.
These problems tend to be easy to track
down if they are kernel bugs since the processor stops cold, but random
flakiness seems to cause this sometimes.
The debugger can be used to locate the instruction and subroutine
corresponding to the PC value.
If that is insufficient to suggest the nature of the problem,
more detailed examination of the system status at the time of the trap
usually can produce an explanation.
The system initialization process has exited.
This is bad news, as no new
users will then be able to log in.
Rebooting is the only fix, so the
system just does it right away.
.It out of mbufs: map full
The network has exhausted its private page map for network buffers.
This usually indicates that buffers are being lost, and rather than
allow the system to slowly degrade, it reboots immediately.
The map may be made larger if necessary.
.El
.Pp
That completes the list of panic types you are likely to see.
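.Pp
The trap message in the list above has a fixed format, so its fields
can be pulled apart mechanically before turning to the debugger.
The following Python sketch is an illustration only and is not part of
the system.
The function name parse_trap is hypothetical, and the sketch assumes the
two %x fields are printed in hexadecimal without a leading 0x.
Only the trap types named in the table above are mapped to a name.

```python
import re

# Trap names copied from the table above; other types are left unnamed.
TRAP_NAMES = {
    0: "reserved addressing fault",
    1: "privileged instruction fault",
    2: "reserved operand fault",
    3: "bpt instruction fault",
    4: "xfc instruction fault",
    11: "compatibility mode fault",
}

# Mirrors the format "trap type %d, code=%x, pc=%x": a decimal type
# followed by two hexadecimal fields (assumed to carry no 0x prefix).
TRAP_RE = re.compile(
    r"trap type ([0-9]+), code=([0-9a-fA-F]+), pc=([0-9a-fA-F]+)"
)


def parse_trap(line):
    """Return (type, name, code, pc) for a trap line, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    trap_type = int(m.group(1))
    name = TRAP_NAMES.get(trap_type, "not listed in the table")
    return trap_type, name, int(m.group(2), 16), int(m.group(3), 16)


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the parser.
    print(parse_trap("trap type 2, code=0, pc=80001234"))
```

The pc value returned here is the number to hand to the debugger when
looking up the faulting instruction and subroutine.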
.Pp
When the system crashes it writes (or at least attempts to write)
an image of memory into the back end of the dump device,
usually the same as the primary swap
area.
After the system is rebooted, the program
runs and preserves a copy of this core image and the current
system in a specified directory for later perusal.
See
To analyze a dump you should begin by running
flag on the system load image and core dump.
If the core image is the result of a panic,
the panic message is printed.
will provide a stack trace from the point of
the crash and this will provide a clue as to
.Dq Using ADB to Debug the UNIX Kernel .
.Dq VAX 11/780 System Maintenance Guide
.Dq VAX Hardware Handbook
for more information about machine checks.
.Dq Using ADB to Debug the UNIX Kernel