<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title> Postfix Debugging Howto </title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>

<hr>

<h2>Purpose of this document</h2>

<p> This document describes how to debug parts of the Postfix mail
system when things do not work according to expectation. The methods
vary from making Postfix log a lot of detail, to running some daemon
processes under control of a call tracer or debugger. </p>

<p> The text assumes that the Postfix <a href="postconf.5.html">main.cf</a> and <a href="master.5.html">master.cf</a>
configuration files are stored in directory /etc/postfix. You can
use the command "<b>postconf <a href="postconf.5.html#config_directory">config_directory</a></b>" to find out the
actual location of this directory on your machine. </p>

<p> Listed in order of increasing invasiveness, the debugging
techniques are as follows: </p>

<ul>

<li><a href="#logging">Look for obvious signs of trouble</a>

<li><a href="#trace_mail">Debugging Postfix from inside</a>

<li><a href="#no_chroot">Try turning off chroot operation in
master.cf</a>

<li><a href="#debug_peer">Verbose logging for specific SMTP
connections</a>

<li><a href="#sniffer">Record the SMTP session with a network
sniffer</a>

<li><a href="#verbose">Making Postfix daemon programs more verbose</a>

<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>

<li><a href="#auto_trace">Automatically tracing a Postfix daemon
process</a>

<li><a href="#ddd">Running daemon programs with the interactive
ddd debugger</a>

<li><a href="#screen">Running daemon programs with the interactive
gdb debugger</a>

<li><a href="#gdb">Running daemon programs under a non-interactive
debugger</a>

<li><a href="#unreasonable">Unreasonable behavior</a>

<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>

</ul>

<h2><a name="logging">Look for obvious signs of trouble</a></h2>

<p> Postfix logs all failed and successful deliveries to a logfile.
The file is usually called /var/log/maillog or /var/log/mail; the
exact pathname is defined in the /etc/syslog.conf file. </p>

<p> When Postfix does not receive or deliver mail, the first order
of business is to look for errors that prevent Postfix from working
properly: </p>

<blockquote>
<pre>
% <b>egrep '(warning|error|fatal|panic):' /some/log/file | more</b>
</pre>
</blockquote>

<p> Note: the most important message is near the BEGINNING of the
output. Error messages that come later are less useful. </p>

<p> The nature of each problem is indicated as follows: </p>

<ul>

<li> <p> "<b>panic</b>" indicates a problem in the software itself
that only a programmer can fix. Postfix cannot proceed until this
is fixed. </p>

<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
permissions, or incorrect configuration file settings that you can
fix. Postfix cannot proceed until this is fixed. </p>

<li> <p> "<b>error</b>" reports an error condition. For safety
reasons, a Postfix process will terminate when more than 13 of these
happen. </p>

<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
problems that you may not be able to fix (such as a broken DNS
server elsewhere on the network) but may also indicate local
configuration errors that could become a problem later. </p>

</ul>
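
<p> The filter above can be tried on sample data before pointing it
at a real logfile. The following is a minimal sketch under stated
assumptions: the log lines and the <b>sample_log</b> function are
made up for this example and are not real Postfix output, and
<b>grep -E</b> is used as the equivalent of <b>egrep</b>. It keeps
only the first matching line, which is the one to read first. </p>

<blockquote>
<pre>
```shell
# Hypothetical sample log lines (not real Postfix output).
sample_log() {
    printf '%s\n' \
        'Jan 1 00:00:01 host postfix/smtpd[101]: connect from unknown[192.0.2.1]' \
        'Jan 1 00:00:02 host postfix/smtpd[101]: warning: example warning text' \
        'Jan 1 00:00:03 host postfix/smtpd[101]: fatal: example fatal text' \
        'Jan 1 00:00:04 host postfix/master[100]: warning: example follow-up text'
}

# Same pattern as the egrep command; head -n 1 keeps the earliest match.
first_problem=$(sample_log | grep -E '(warning|error|fatal|panic):' | head -n 1)
echo "$first_problem"
```
</pre>
</blockquote>

<p> With a real logfile, drop the <b>sample_log</b> function and give
the logfile pathname to <b>grep</b> as its last argument. </p>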

<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>

<p> Postfix version 2.1 and later can
produce mail delivery reports for debugging purposes. These reports
not only show sender/recipient addresses after address rewriting
and alias expansion or forwarding, they also show information about
delivery to mailbox, delivery to non-Postfix command, responses
from remote SMTP servers, and so on.
</p>

<p> Postfix can produce two types of mail delivery reports for
debugging: </p>

<ul>

<li> <p> What-if: report what would happen, but do not actually
deliver mail. This mode of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -bv address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

<li> <p> What happened: deliver mail and report successes and/or
failures, including replies from remote SMTP servers. This mode
of operation is requested with: </p>

<pre>
% <b>/usr/sbin/sendmail -v address...</b>
Mail Delivery Status Report will be mailed to &lt;your login name&gt;.
</pre>

</ul>

<p> These reports contain information that is generated by Postfix
delivery agents. Since these run as daemon processes that cannot
interact with users directly, the result is sent as mail to the
sender of the test message. The format of these reports is practically
identical to that of ordinary non-delivery notifications. </p>

<p> For a detailed example of a mail delivery status report, see
the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
section at the end of the <a href="ADDRESS_REWRITING_README.html">ADDRESS_REWRITING_README</a> document. </p>

<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>

<p> A common mistake is to turn on chroot operation in the <a href="master.5.html">master.cf</a>
file without going through all the necessary steps to set up a
chroot environment. This causes Postfix daemon processes to fail
due to all kinds of missing files. </p>

<p> The example below shows an SMTP server that is configured with
chroot turned off: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    # =============================================================
    # service type  private unpriv  <b>chroot</b>  wakeup  maxproc command
    #               (yes)   (yes)   <b>(yes)</b>   (never) (100)
    # =============================================================
    smtp      inet  n       -       <b>n</b>       -       -       smtpd
</pre>
</blockquote>

<p> Inspect <a href="master.5.html">master.cf</a> for any processes that have chroot operation
not turned off. If you find any, save a copy of the <a href="master.5.html">master.cf</a> file,
and edit the entries in question. After executing the command
"<b>postfix reload</b>", see if the problem has gone away. </p>

<p> If turning off chrooted operation made the problem go away,
then congratulations. Leaving Postfix running in this way is
adequate for most sites. If you prefer chrooted operation, see
the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
BASIC_CONFIGURATION_README</a> file for information about how to
prepare Postfix for chrooted operation. </p>
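
<p> The inspection step described above, finding master.cf entries
whose chroot field is not turned off, can be sketched with
<b>awk</b>. This is only an illustration: the master.cf excerpt and
the <b>sample_master_cf</b> function are hypothetical, and the
script assumes the column layout shown in the example above, where
the fifth field is the chroot setting and "-" selects the default.
</p>

<blockquote>
<pre>
```shell
# Hypothetical master.cf excerpt; the fifth column is the chroot setting.
sample_master_cf() {
    printf '%s\n' \
        '# service type private unpriv chroot wakeup maxproc command' \
        'smtp inet n - n - - smtpd' \
        'pickup unix n - - 60 1 pickup' \
        'cleanup unix n - y - 0 cleanup'
}

# List services whose chroot field is anything other than "n".
# Comment lines and indented continuation lines are skipped.
not_off=$(sample_master_cf | awk '/^[^# \t]/ { if ($5 != "n") print $1 }')
echo "$not_off"
```
</pre>
</blockquote>

<p> For the sample data this lists the pickup and cleanup services.
Against a real file, give the master.cf pathname to <b>awk</b>
instead of piping in the sample. </p>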

<h2><a name="debug_peer">Verbose logging for specific SMTP
connections</a></h2>

<p> In /etc/postfix/<a href="postconf.5.html">main.cf</a>, list the remote site name or address
in the <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> parameter. For example, in order to make
the software log a lot of information to the syslog daemon for
connections from or to the loopback interface: </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#debug_peer_list">debug_peer_list</a> = 127.0.0.1
</pre>
</blockquote>

<p> You can specify one or more hosts, domains, addresses or
net/masks. To make the change effective immediately, execute the
command "<b>postfix reload</b>". </p>

<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>

<p> This example uses <b>tcpdump</b>. In order to record a conversation
you need to specify a large enough buffer with the "<b>-s</b>"
option or else you will miss some or all of the packet payload.
</p>

<blockquote>
<pre>
# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
</pre>
</blockquote>

<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
use "<b>-s 2000</b>" instead. </p>

<p> Run this for a while, and stop with Ctrl-C when done. To view the
data use a binary viewer, <b>ethereal</b> (since renamed
<b>wireshark</b>), or good old <b>less</b>.
</p>

<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>

<p> Append one or more "<b>-v</b>" options to selected daemon
definitions in /etc/postfix/<a href="master.5.html">master.cf</a> and type "<b>postfix reload</b>".
This will cause a lot of activity to be logged to the syslog daemon.
For example, to make the Postfix SMTP server process more verbose: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    smtp      inet  n       -       n       -       -       smtpd -v
</pre>
</blockquote>

<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
option for the <a href="cleanup.8.html">cleanup(8)</a> and/or <a href="trivial-rewrite.8.html">trivial-rewrite(8)</a> daemon, and to
diagnose problems with mail delivery specify a "<b>-v</b>"
option for the <a href="qmgr.8.html">qmgr(8)</a> or <a href="qmgr.8.html">oqmgr(8)</a> queue manager, or for the <a href="lmtp.8.html">lmtp(8)</a>,
<a href="local.8.html">local(8)</a>, <a href="pipe.8.html">pipe(8)</a>, <a href="smtp.8.html">smtp(8)</a>, or <a href="virtual.8.html">virtual(8)</a> delivery agent. </p>

<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>

<p> Many systems allow you to inspect a running process with a
system call tracer. For example: </p>

<blockquote>
<pre>
# <b>trace -p process-id</b>            (SunOS 4)
# <b>strace -p process-id</b>           (Linux and many others)
# <b>truss -p process-id</b>            (Solaris, FreeBSD)
# <b>ktrace -p process-id</b>           (generic 4.4BSD)
</pre>
</blockquote>

<p> Even more informative are traces of system library calls.
Examples: </p>

<blockquote>
<pre>
# <b>ltrace -p process-id</b>           (Linux, also ported to FreeBSD and BSD/OS)
# <b>sotruss -p process-id</b>          (Solaris)
</pre>
</blockquote>

<p> See your system documentation for details. </p>

<p> Tracing a running process can give valuable information about
what a process is attempting to do. This is as much information as
you can get without running an interactive debugger program, as
described in a later section. </p>
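
<p> The tracers above need a process ID. One place to find it is the
logfile, where each Postfix entry is tagged with the daemon name
followed by the process ID in square brackets. The sketch below
extracts that number with <b>sed</b>; the log line is a made-up
sample and the variable names are chosen for this example only. </p>

<blockquote>
<pre>
```shell
# Hypothetical log line; real entries use the same name[pid] tag form.
log_line='Jan 1 00:00:01 host postfix/smtpd[12345]: connect from unknown[192.0.2.1]'

# Print only the number between the brackets that follow "postfix/smtpd".
pid=$(printf '%s\n' "$log_line" | sed -n 's|.*postfix/smtpd\[\([0-9][0-9]*\)].*|\1|p')
echo "$pid"
```
</pre>
</blockquote>

<p> The resulting number is what the "<b>-p</b>" option of the
tracers listed above expects. </p>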

<h2><a name="auto_trace">Automatically tracing a Postfix daemon
process</a></h2>

<p> Postfix can attach a call tracer whenever a daemon process
starts. Call tracers come in several kinds. </p>

<ol>

<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
<b>strace</b>, or <b>ktrace</b>. These show the communication
between the process and the kernel. </p>

<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
These show calls of library routines, and give a better idea of
what is going on within the process. </p>

</ol>

<p> Append a <b>-D</b> option to the suspect command in
/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
so that it invokes the call tracer of your choice, for example:
</p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#debugger_command">debugger_command</a> =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $<a href="postconf.5.html#process_id">process_id</a> 2&gt;&amp;1 | logger -p mail.info) &amp; sleep 5
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" and watch the logfile. </p>

<h2><a name="ddd">Running daemon programs with the interactive
ddd debugger</a></h2>

<p> If you have X Windows installed on the Postfix machine, then
an interactive debugger such as <b>ddd</b> can be convenient.
</p>

<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
so that it invokes <b>ddd</b>: </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#debugger_command">debugger_command</a> =
         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
         ddd $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> &amp; sleep 5
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path (<b>ddd</b>
is a graphical front end that runs <b>gdb</b> underneath), and
export <b>XAUTHORITY</b> so that X access control works, for example:
</p>

<blockquote>
<pre>
% <b>setenv XAUTHORITY ~/.Xauthority</b>              (csh syntax)
$ <b>export XAUTHORITY=$HOME/.Xauthority</b>          (sh syntax)
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Stop and start the Postfix system. This is necessary so that
Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
settings. </p>

<p> Whenever the suspect daemon process is started, a debugger
window pops up and you can watch in detail what happens. </p>

<h2><a name="screen">Running daemon programs with the interactive
gdb debugger</a></h2>

<p> If you have the screen command installed on the Postfix machine, then
you can run an interactive debugger such as <b>gdb</b> as follows. </p>

<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
</p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#debugger_command">debugger_command</a> =
        PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
        export HOME; screen -e^tt -dmS $<a href="postconf.5.html#process_name">process_name</a> gdb
        $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a> $<a href="postconf.5.html#process_id">process_id</a> &amp; sleep 2
</pre>
</blockquote>

<p> Be sure that <b>gdb</b> is in the command search path. </p>

<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Execute the command "<b>postfix reload</b>" and wait until a
daemon process is started (you can see this in the maillog file).
</p>

<p> Then attach to the screen, and debug away: </p>

<blockquote>
<pre>
# HOME=/root screen -r
(gdb) continue
(gdb) where
</pre>
</blockquote>

<h2><a name="gdb">Running daemon programs under a non-interactive
debugger</a></h2>

<p> If you do not have X Windows installed on the Postfix machine,
or if you are not familiar with interactive debuggers, then you
can try to run <b>gdb</b> in non-interactive mode, and have it
print a stack trace when the process crashes. </p>

<p> Edit the <a href="postconf.5.html#debugger_command">debugger_command</a> definition in /etc/postfix/<a href="postconf.5.html">main.cf</a>
so that it invokes the <b>gdb</b> debugger: </p>

<blockquote>
<pre>
/etc/postfix/<a href="postconf.5.html">main.cf</a>:
    <a href="postconf.5.html#debugger_command">debugger_command</a> =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
        where; sleep 8640000) | gdb $<a href="postconf.5.html#daemon_directory">daemon_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>
        $<a href="postconf.5.html#process_id">process_id</a> 2&gt;&amp;1
        &gt;$<a href="postconf.5.html#config_directory">config_directory</a>/$<a href="postconf.5.html#process_name">process_name</a>.$<a href="postconf.5.html#process_id">process_id</a>.log &amp; sleep 5
</pre>
</blockquote>

<p> Append a <b>-D</b> option to the suspect daemon in
/etc/postfix/<a href="master.5.html">master.cf</a>, for example: </p>

<blockquote>
<pre>
/etc/postfix/<a href="master.5.html">master.cf</a>:
    smtp      inet  n       -       n       -       -       smtpd -D
</pre>
</blockquote>

<p> Type "<b>postfix reload</b>" to make the configuration changes
effective. </p>

<p> Whenever a suspect daemon process is started, an output file
is created, named after the daemon and process ID (for example,
smtpd.12345.log). When the process crashes, a stack trace (with
output from the "<b>where</b>" command) is written to its logfile.
</p>

<h2><a name="unreasonable">Unreasonable behavior</a></h2>

<p> Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities. </p>

<ul>

<li> <p> The compiler has erred. This rarely happens. </p>

<li> <p> The hardware has erred. Does the machine have ECC memory? </p>

</ul>

<p> In both cases, the program being executed is not the program
that was supposed to be executed, so anything could happen. </p>

<p> There is a third possibility: </p>

<ul>

<li> <p> Bugs in system software (kernel or libraries). </p>

</ul>

<p> Hardware-related failures usually do not reproduce in exactly
the same way after power cycling and rebooting the system. There's
little Postfix can do about bad hardware. Be sure to use hardware
that at the very least can detect memory errors. Otherwise, Postfix
will just be waiting to be hit by a bit error. Critical systems
deserve real hardware. </p>

<p> When a compiler makes an error, the problem can be reproduced
whenever the resulting program is run. Compiler errors are most
likely to happen in the code optimizer. If a problem is reproducible
across power cycles and system reboots, it can be worthwhile to
rebuild Postfix with optimization disabled, and to see if optimization
makes a difference. </p>

<p> In order to compile Postfix with optimizations turned off: </p>

<blockquote>
<pre>
% <b>make tidy</b>
% <b>make makefiles OPT=</b>
</pre>
</blockquote>

<p> This produces a set of Makefiles that do not request compiler
optimization. </p>

<p> Once the makefiles are set up, build the software: </p>

<blockquote>
<pre>
% <b>make</b>
% <b>su</b>
Password:
# <b>make install</b>
</pre>
</blockquote>

<p> If the problem goes away, then it is time to ask your vendor
for help. </p>

<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>

<p> The people who participate on postfix-users@postfix.org
are very helpful, especially if YOU provide them with sufficient
information. Remember, these volunteers are willing to help, but
their time is limited. </p>

<p> When reporting a problem, be sure to include the following
information. </p>

<ul>

<li> <p> A summary of the problem. Please do not just send some
logging without explanation of what YOU believe is wrong. </p>

<li> <p> Complete error messages. Please use cut-and-paste, or use
attachments, instead of reciting information from memory.
</p>

<li> <p> Postfix logging. See the text at the top of the <a href="DEBUG_README.html">DEBUG_README</a>
document to find out where logging is stored. Please do not frustrate
the helpers by word wrapping the logging. If the logging is more
than a few kbytes of text, consider posting a URL on a web or ftp
site. </p>

<li> <p> Consider using a test email address so that you don't have
to reveal email addresses or passwords of innocent people. </p>

<li> <p> If you can't use a test email address, please anonymize
email addresses and host names consistently. Replace each letter
by "A", each digit
by "D" so that the helpers can still recognize syntactical errors.
</p>

<li> <p> Output from "<b>postconf -n</b>". Please do not send your
<a href="postconf.5.html">main.cf</a> file, or 500+ lines of <b>postconf</b> output. </p>

<li> <p> Better, provide output from the <b>postfinger</b> tool.
This can be found at <a href="http://ftp.wl0.org/SOURCES/postfinger">http://ftp.wl0.org/SOURCES/postfinger</a>. </p>

<li> <p> If the problem is SASL related, consider including the
output from the <b>saslfinger</b> tool. This can be found at
<a href="http://postfix.state-of-mind.de/patrick.koetter/saslfinger/">http://postfix.state-of-mind.de/patrick.koetter/saslfinger/</a>. </p>

<li> <p> If the problem is about too much mail in the queue, consider
including output from the <b>qshape</b> tool, as described in the
<a href="QSHAPE_README.html">QSHAPE_README</a> file. </p>

<li> <p> If the problem is protocol related (connections time out,
or an SMTP server complains about syntax errors etc.) consider
recording a session with <b>tcpdump</b>, as described in the <a
href="#sniffer">DEBUG_README</a> document. </p>

</ul>
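
<p> The anonymization rule in the list above (each letter becomes
"A", each digit becomes "D") can be sketched with <b>sed</b>. The
<b>anonymize</b> function name and the sample address are
assumptions made for this illustration. Punctuation such as "@" and
"." is left alone, so the shape of the address stays recognizable.
</p>

<blockquote>
<pre>
```shell
# Replace every letter with "A" and every digit with "D".
# Letters go first so that the inserted "D" is not rewritten again.
anonymize() {
    sed -e 's/[A-Za-z]/A/g' -e 's/[0-9]/D/g'
}

# Hypothetical address used only for illustration.
masked=$(printf '%s\n' 'user42@mail.example.com' | anonymize)
echo "$masked"
```
</pre>
</blockquote>

<p> Apply this to the addresses and host names only, not to whole
log lines, otherwise keywords such as "warning" are masked as well.
</p>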

</body>

</html>