1 <!-- $PostgreSQL$ -->
3 <chapter id="source">
4 <title>PostgreSQL Coding Conventions</title>
6 <sect1 id="source-format">
7 <title>Formatting</title>
9 <para>
Source code formatting uses 4-column tab spacing, with
11 tabs preserved (i.e., tabs are not expanded to spaces).
12 Each logical indentation level is one additional tab stop.
13 </para>
15 <para>
Layout rules (brace positioning, etc.) follow BSD conventions. In
particular, curly braces for the controlled blocks of <literal>if</>,
<literal>while</>, <literal>switch</>, etc. go on their own lines.
19 </para>
21 <para>
22 Do not use C++ style comments (<literal>//</> comments). Strict ANSI C
23 compilers do not accept them. For the same reason, do not use C++
24 extensions such as declaring new variables mid-block.
25 </para>
27 <para>
28 The preferred style for multi-line comment blocks is
<programlisting>
/*
 * comment text begins here
 * and continues here
 */
</programlisting>
35 Note that comment blocks that begin in column 1 will be preserved as-is
36 by <application>pgindent</>, but it will re-flow indented comment blocks
37 as though they were plain text. If you want to preserve the line breaks
38 in an indented block, add dashes like this:
<programlisting>
    /*----------
     * comment text begins here
     * and continues here
     *----------
     */
</programlisting>
46 </para>
48 <para>
49 While submitted patches do not absolutely have to follow these formatting
50 rules, it's a good idea to do so. Your code will get run through
51 <application>pgindent</> before the next release, so there's no point in
52 making it look nice under some other set of formatting conventions.
53 </para>
55 <para>
56 The <filename>src/tools</filename> directory contains sample settings
57 files that can be used with the <productname>emacs</productname>,
58 <productname>xemacs</productname> or <productname>vim</productname>
59 editors to help ensure that they format code according to these
60 conventions.
61 </para>
63 <para>
64 The text browsing tools <application>more</application> and
65 <application>less</application> can be invoked as:
<programlisting>
more -x4
less -x4
</programlisting>
70 to make them show tabs appropriately.
71 </para>
72 </sect1>
74 <sect1 id="error-message-reporting">
75 <title>Reporting Errors Within the Server</title>
77 <indexterm>
78 <primary>ereport</primary>
79 </indexterm>
80 <indexterm>
81 <primary>elog</primary>
82 </indexterm>
84 <para>
85 Error, warning, and log messages generated within the server code
86 should be created using <function>ereport</>, or its older cousin
87 <function>elog</>. The use of this function is complex enough to
88 require some explanation.
89 </para>
91 <para>
92 There are two required elements for every message: a severity level
93 (ranging from <literal>DEBUG</> to <literal>PANIC</>) and a primary
94 message text. In addition there are optional elements, the most
95 common of which is an error identifier code that follows the SQL spec's
96 SQLSTATE conventions.
<function>ereport</> itself is just a shell function that exists
98 mainly for the syntactic convenience of making message generation
99 look like a function call in the C source code. The only parameter
100 accepted directly by <function>ereport</> is the severity level.
101 The primary message text and any optional message elements are
102 generated by calling auxiliary functions, such as <function>errmsg</>,
103 within the <function>ereport</> call.
104 </para>
106 <para>
107 A typical call to <function>ereport</> might look like this:
<programlisting>
ereport(ERROR,
        (errcode(ERRCODE_DIVISION_BY_ZERO),
         errmsg("division by zero")));
</programlisting>
113 This specifies error severity level <literal>ERROR</> (a run-of-the-mill
114 error). The <function>errcode</> call specifies the SQLSTATE error code
115 using a macro defined in <filename>src/include/utils/errcodes.h</>. The
116 <function>errmsg</> call provides the primary message text. Notice the
117 extra set of parentheses surrounding the auxiliary function calls &mdash;
118 these are annoying but syntactically necessary.
119 </para>
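<para>
As a rough illustration of why the extra parentheses are needed, the
following standalone sketch models <function>ereport</function> as a macro
that pastes its second argument directly after a function name. Every
<literal>toy_</literal> name is hypothetical, and the real server code does
considerably more. The sketch assumes only that each auxiliary call records
its data as a side effect and returns a dummy value.
</para>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy report under construction; the real server tracks far more state. */
static struct
{
    int         level;
    char        sqlstate[6];
    char        message[128];
    int         finished;
} toy_report;

/* Begin a report: record the level and install a default SQLSTATE. */
static int
toy_errstart(int level)
{
    toy_report.level = level;
    strcpy(toy_report.sqlstate, "XX000");   /* 5 characters plus NUL */
    toy_report.message[0] = '\0';
    toy_report.finished = 0;
    return 1;                   /* this toy always goes on to report */
}

/* Auxiliary calls store their data and return a dummy value. */
static int
toy_errcode(const char *sqlstate)
{
    assert(sqlstate != NULL);
    snprintf(toy_report.sqlstate, sizeof(toy_report.sqlstate), "%s", sqlstate);
    return 0;
}

static int
toy_errmsg(const char *message)
{
    assert(message != NULL);
    snprintf(toy_report.message, sizeof(toy_report.message), "%s", message);
    return 0;
}

/* The dummy values are ignored; the side effects already did the work. */
static void
toy_errfinish(int dummy, ...)
{
    (void) dummy;
    toy_report.finished = 1;
}

/*
 * "rest" must arrive as one parenthesized group, because a macro without
 * variadic support cannot otherwise accept a varying number of arguments.
 */
#define toy_ereport(level, rest) \
    (toy_errstart(level) ? (toy_errfinish rest) : (void) 0)
```

<para>
A call then reads <literal>toy_ereport(20, (toy_errcode("22012"),
toy_errmsg("division by zero")));</literal>, where 20 is an arbitrary level
number chosen for the toy. Without the inner parentheses the macro would
receive three arguments instead of two and fail to compile.
</para>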
121 <para>
122 Here is a more complex example:
<programlisting>
ereport(ERROR,
        (errcode(ERRCODE_AMBIGUOUS_FUNCTION),
         errmsg("function %s is not unique",
                func_signature_string(funcname, nargs,
                                      actual_arg_types)),
         errhint("Unable to choose a best candidate function. "
                 "You might need to add explicit typecasts.")));
</programlisting>
132 This illustrates the use of format codes to embed run-time values into
133 a message text. Also, an optional <quote>hint</> message is provided.
134 </para>
136 <para>
137 The available auxiliary routines for <function>ereport</> are:
138 <itemizedlist>
139 <listitem>
140 <para>
141 <function>errcode(sqlerrcode)</function> specifies the SQLSTATE error identifier
142 code for the condition. If this routine is not called, the error
143 identifier defaults to
144 <literal>ERRCODE_INTERNAL_ERROR</> when the error severity level is
145 <literal>ERROR</> or higher, <literal>ERRCODE_WARNING</> when the
146 error level is <literal>WARNING</>, otherwise (for <literal>NOTICE</>
147 and below) <literal>ERRCODE_SUCCESSFUL_COMPLETION</>.
While these defaults are often convenient, always consider whether they
are appropriate before omitting the <function>errcode()</> call.
150 </para>
151 </listitem>
152 <listitem>
153 <para>
154 <function>errmsg(const char *msg, ...)</function> specifies the primary error
155 message text, and possibly run-time values to insert into it. Insertions
156 are specified by <function>sprintf</>-style format codes. In addition to
157 the standard format codes accepted by <function>sprintf</>, the format
158 code <literal>%m</> can be used to insert the error message returned
159 by <function>strerror</> for the current value of <literal>errno</>.
160 <footnote>
161 <para>
162 That is, the value that was current when the <function>ereport</> call
163 was reached; changes of <literal>errno</> within the auxiliary reporting
164 routines will not affect it. That would not be true if you were to
165 write <literal>strerror(errno)</> explicitly in <function>errmsg</>'s
166 parameter list; accordingly, do not do so.
167 </para>
168 </footnote>
169 <literal>%m</> does not require any
170 corresponding entry in the parameter list for <function>errmsg</>.
171 Note that the message string will be run through <function>gettext</>
172 for possible localization before format codes are processed.
173 </para>
174 </listitem>
175 <listitem>
176 <para>
177 <function>errmsg_internal(const char *msg, ...)</function> is the same as
<function>errmsg</>, except that the message string will be neither
translated nor included in the internationalization message dictionary.
180 This should be used for <quote>cannot happen</> cases that are probably
181 not worth expending translation effort on.
182 </para>
183 </listitem>
184 <listitem>
185 <para>
186 <function>errmsg_plural(const char *fmt_singular, const char *fmt_plural,
187 unsigned long n, ...)</function> is like <function>errmsg</>, but with
188 support for various plural forms of the message.
189 <replaceable>fmt_singular</> is the English singular format,
190 <replaceable>fmt_plural</> is the English plural format,
191 <replaceable>n</> is the integer value that determines which plural
192 form is needed, and the remaining arguments are formatted according
193 to the selected format string. For more information see
194 <xref linkend="nls-guidelines">.
195 </para>
196 </listitem>
197 <listitem>
198 <para>
199 <function>errdetail(const char *msg, ...)</function> supplies an optional
200 <quote>detail</> message; this is to be used when there is additional
201 information that seems inappropriate to put in the primary message.
202 The message string is processed in just the same way as for
203 <function>errmsg</>.
204 </para>
205 </listitem>
206 <listitem>
207 <para>
208 <function>errdetail_log(const char *msg, ...)</function> is the same as
209 <function>errdetail</> except that this string goes only to the server
210 log, never to the client. If both <function>errdetail</> and
211 <function>errdetail_log</> are used then one string goes to the client
212 and the other to the log. This is useful for error details that are
213 too security-sensitive or too bulky to include in the report
214 sent to the client.
215 </para>
216 </listitem>
217 <listitem>
218 <para>
219 <function>errdetail_plural(const char *fmt_singular, const char *fmt_plural,
220 unsigned long n, ...)</function> is like <function>errdetail</>, but with
221 support for various plural forms of the message.
222 For more information see <xref linkend="nls-guidelines">.
223 </para>
224 </listitem>
225 <listitem>
226 <para>
227 <function>errhint(const char *msg, ...)</function> supplies an optional
228 <quote>hint</> message; this is to be used when offering suggestions
229 about how to fix the problem, as opposed to factual details about
230 what went wrong.
231 The message string is processed in just the same way as for
232 <function>errmsg</>.
233 </para>
234 </listitem>
235 <listitem>
236 <para>
237 <function>errcontext(const char *msg, ...)</function> is not normally called
238 directly from an <function>ereport</> message site; rather it is used
239 in <literal>error_context_stack</> callback functions to provide
240 information about the context in which an error occurred, such as the
241 current location in a PL function.
242 The message string is processed in just the same way as for
243 <function>errmsg</>. Unlike the other auxiliary functions, this can
244 be called more than once per <function>ereport</> call; the successive
245 strings thus supplied are concatenated with separating newlines.
246 </para>
247 </listitem>
248 <listitem>
249 <para>
250 <function>errposition(int cursorpos)</function> specifies the textual location
251 of an error within a query string. Currently it is only useful for
252 errors detected in the lexical and syntactic analysis phases of
253 query processing.
254 </para>
255 </listitem>
256 <listitem>
257 <para>
258 <function>errcode_for_file_access()</> is a convenience function that
259 selects an appropriate SQLSTATE error identifier for a failure in a
260 file-access-related system call. It uses the saved
261 <literal>errno</> to determine which error code to generate.
262 Usually this should be used in combination with <literal>%m</> in the
263 primary error message text.
264 </para>
265 </listitem>
266 <listitem>
267 <para>
268 <function>errcode_for_socket_access()</> is a convenience function that
269 selects an appropriate SQLSTATE error identifier for a failure in a
270 socket-related system call.
271 </para>
272 </listitem>
273 <listitem>
274 <para>
275 <function>errhidestmt(bool hide_stmt)</function> can be called to specify
276 suppression of the <literal>STATEMENT:</> portion of a message in the
277 postmaster log. Generally this is appropriate if the message text
278 includes the current statement already.
279 </para>
280 </listitem>
281 </itemizedlist>
282 </para>
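<para>
The footnote's warning about <literal>strerror(errno)</literal> can be
reproduced outside the server. In this minimal sketch, which is not server
code, <function>describe_path</function> is a hypothetical helper standing
in for any routine that disturbs <literal>errno</literal> while a message is
being assembled. Saving <literal>errno</literal> on entry, in the spirit of
what the text describes for <literal>%m</literal>, keeps the reported reason
intact.
</para>

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that clobbers errno, as an auxiliary routine might. */
static const char *
describe_path(const char *path)
{
    errno = 0;
    return path;
}

/*
 * Build a "could not open file" message from the errno value that was
 * current on entry, regardless of what later calls do to errno.
 */
static void
format_open_failure(char *buf, size_t buflen, const char *path)
{
    int         saved_errno = errno;    /* capture before anything else */

    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "could not open file \"%s\": %s",
             describe_path(path), strerror(saved_errno));
}
```

<para>
Had the function passed <literal>strerror(errno)</literal> directly, the
result would depend on whether <function>describe_path</function> happened
to run first, since C does not fix the order in which arguments are
evaluated.
</para>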
284 <para>
285 There is an older function <function>elog</> that is still heavily used.
286 An <function>elog</> call:
<programlisting>
elog(level, "format string", ...);
</programlisting>
is exactly equivalent to:
<programlisting>
ereport(level, (errmsg_internal("format string", ...)));
</programlisting>
294 Notice that the SQLSTATE error code is always defaulted, and the message
295 string is not subject to translation.
296 Therefore, <function>elog</> should be used only for internal errors and
297 low-level debug logging. Any message that is likely to be of interest to
298 ordinary users should go through <function>ereport</>. Nonetheless,
299 there are enough internal <quote>cannot happen</> error checks in the
300 system that <function>elog</> is still widely used; it is preferred for
301 those messages for its notational simplicity.
302 </para>
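<para>
Because <function>elog</function> always takes the default SQLSTATE, the
defaulting rule given for <function>errcode</function> decides what a
client sees. The sketch below restates that rule in standalone C. The
<literal>ToyLevel</literal> enumeration is a hypothetical stand-in that
lists only the levels named in this chapter, and the five-character strings
are the SQLSTATE values conventionally associated with the three macros.
</para>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical severity scale, lowest to highest; not the server's own. */
typedef enum
{
    TOY_DEBUG,
    TOY_NOTICE,
    TOY_WARNING,
    TOY_ERROR,
    TOY_PANIC
} ToyLevel;

/* Return the SQLSTATE used when no errcode() call is made. */
static const char *
toy_default_sqlstate(ToyLevel level)
{
    if (level >= TOY_ERROR)
        return "XX000";         /* ERRCODE_INTERNAL_ERROR */
    if (level == TOY_WARNING)
        return "01000";         /* ERRCODE_WARNING */
    return "00000";             /* ERRCODE_SUCCESSFUL_COMPLETION */
}

/* Self-check covering the three cases described in the text. */
static void
toy_check_defaults(void)
{
    assert(strcmp(toy_default_sqlstate(TOY_PANIC), "XX000") == 0);
    assert(strcmp(toy_default_sqlstate(TOY_ERROR), "XX000") == 0);
    assert(strcmp(toy_default_sqlstate(TOY_WARNING), "01000") == 0);
    assert(strcmp(toy_default_sqlstate(TOY_NOTICE), "00000") == 0);
    assert(strcmp(toy_default_sqlstate(TOY_DEBUG), "00000") == 0);
}
```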
304 <para>
305 Advice about writing good error messages can be found in
306 <xref linkend="error-style-guide">.
307 </para>
308 </sect1>
310 <sect1 id="error-style-guide">
311 <title>Error Message Style Guide</title>
313 <para>
314 This style guide is offered in the hope of maintaining a consistent,
315 user-friendly style throughout all the messages generated by
316 <productname>PostgreSQL</>.
317 </para>
319 <simplesect>
320 <title>What goes where</title>
322 <para>
323 The primary message should be short, factual, and avoid reference to
324 implementation details such as specific function names.
325 <quote>Short</quote> means <quote>should fit on one line under normal
326 conditions</quote>. Use a detail message if needed to keep the primary
327 message short, or if you feel a need to mention implementation details
328 such as the particular system call that failed. Both primary and detail
329 messages should be factual. Use a hint message for suggestions about what
330 to do to fix the problem, especially if the suggestion might not always be
331 applicable.
332 </para>
334 <para>
335 For example, instead of:
<programlisting>
IpcMemoryCreate: shmget(key=%d, size=%u, 0%o) failed: %m
(plus a long addendum that is basically a hint)
</programlisting>
write:
<programlisting>
Primary:    could not create shared memory segment: %m
Detail:     Failed syscall was shmget(key=%d, size=%u, 0%o).
Hint:       the addendum
</programlisting>
346 </para>
348 <para>
349 Rationale: keeping the primary message short helps keep it to the point,
350 and lets clients lay out screen space on the assumption that one line is
351 enough for error messages. Detail and hint messages can be relegated to a
352 verbose mode, or perhaps a pop-up error-details window. Also, details and
353 hints would normally be suppressed from the server log to save
354 space. Reference to implementation details is best avoided since users
355 don't know the details anyway.
356 </para>
358 </simplesect>
360 <simplesect>
361 <title>Formatting</title>
363 <para>
364 Don't put any specific assumptions about formatting into the message
365 texts. Expect clients and the server log to wrap lines to fit their own
366 needs. In long messages, newline characters (\n) can be used to indicate
367 suggested paragraph breaks. Don't end a message with a newline. Don't
368 use tabs or other formatting characters. (In error context displays,
369 newlines are automatically added to separate levels of context such as
370 function calls.)
371 </para>
373 <para>
374 Rationale: Messages are not necessarily displayed on terminal-type
375 displays. In GUI displays or browsers these formatting instructions are
376 at best ignored.
377 </para>
379 </simplesect>
381 <simplesect>
382 <title>Quotation marks</title>
384 <para>
385 English text should use double quotes when quoting is appropriate.
386 Text in other languages should consistently use one kind of quotes that is
387 consistent with publishing customs and computer output of other programs.
388 </para>
390 <para>
391 Rationale: The choice of double quotes over single quotes is somewhat
392 arbitrary, but tends to be the preferred use. Some have suggested
393 choosing the kind of quotes depending on the type of object according to
394 SQL conventions (namely, strings single quoted, identifiers double
395 quoted). But this is a language-internal technical issue that many users
396 aren't even familiar with, it won't scale to other kinds of quoted terms,
397 it doesn't translate to other languages, and it's pretty pointless, too.
398 </para>
400 </simplesect>
402 <simplesect>
403 <title>Use of quotes</title>
405 <para>
Always use quotes to delimit file names, user-supplied identifiers, and
407 other variables that might contain words. Do not use them to mark up
408 variables that will not contain words (for example, operator names).
409 </para>
411 <para>
412 There are functions in the backend that will double-quote their own output
413 at need (for example, <function>format_type_be</>()). Do not put
414 additional quotes around the output of such functions.
415 </para>
417 <para>
418 Rationale: Objects can have names that create ambiguity when embedded in a
419 message. Be consistent about denoting where a plugged-in name starts and
420 ends. But don't clutter messages with unnecessary or duplicate quote
421 marks.
422 </para>
424 </simplesect>
426 <simplesect>
427 <title>Grammar and punctuation</title>
429 <para>
430 The rules are different for primary error messages and for detail/hint
431 messages:
432 </para>
434 <para>
435 Primary error messages: Do not capitalize the first letter. Do not end a
436 message with a period. Do not even think about ending a message with an
437 exclamation point.
438 </para>
440 <para>
441 Detail and hint messages: Use complete sentences, and end each with
442 a period. Capitalize the first word of sentences. Put two spaces after
443 the period if another sentence follows (for English text; might be
444 inappropriate in other languages).
445 </para>
447 <para>
448 Rationale: Avoiding punctuation makes it easier for client applications to
449 embed the message into a variety of grammatical contexts. Often, primary
450 messages are not grammatically complete sentences anyway. (And if they're
451 long enough to be more than one sentence, they should be split into
452 primary and detail parts.) However, detail and hint messages are longer
453 and might need to include multiple sentences. For consistency, they should
454 follow complete-sentence style even when there's only one sentence.
455 </para>
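<para>
The mechanical parts of these rules for primary messages, together with the
formatting section's ban on tabs and trailing newlines, lend themselves to
a simple check. The function below is a hypothetical helper, not something
shipped with <productname>PostgreSQL</productname>, and it is only a
heuristic: it assumes that a leading word in all capitals is an SQL key
word and lets it pass.
</para>

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if a primary message passes a few mechanical style checks,
 * 0 otherwise.  Wording, tense, and quoting are not examined.
 */
static int
toy_primary_message_ok(const char *msg)
{
    size_t      len;
    char        last;

    assert(msg != NULL);
    len = strlen(msg);
    if (len == 0)
        return 0;

    /*
     * A capital followed by a lower-case letter looks like an ordinary
     * capitalized word; all-caps words are assumed to be SQL key words.
     */
    if (isupper((unsigned char) msg[0]) && islower((unsigned char) msg[1]))
        return 0;

    /* No tabs anywhere in the text. */
    if (strchr(msg, '\t') != NULL)
        return 0;

    /* No trailing period, exclamation point, or newline. */
    last = msg[len - 1];
    if (last == '.' || last == '!' || last == '\n')
        return 0;

    return 1;
}
```

<para>
A check of this kind could flag <literal>Could not open file</literal> or
<literal>division by zero.</literal> while accepting
<literal>could not open file "%s": %m</literal>.
</para>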
457 </simplesect>
459 <simplesect>
460 <title>Upper case vs. lower case</title>
462 <para>
463 Use lower case for message wording, including the first letter of a
464 primary error message. Use upper case for SQL commands and key words if
465 they appear in the message.
466 </para>
468 <para>
469 Rationale: It's easier to make everything look more consistent this
470 way, since some messages are complete sentences and some not.
471 </para>
473 </simplesect>
475 <simplesect>
476 <title>Avoid passive voice</title>
478 <para>
479 Use the active voice. Use complete sentences when there is an acting
480 subject (<quote>A could not do B</quote>). Use telegram style without
481 subject if the subject would be the program itself; do not use
482 <quote>I</quote> for the program.
483 </para>
485 <para>
486 Rationale: The program is not human. Don't pretend otherwise.
487 </para>
489 </simplesect>
491 <simplesect>
<title>Present vs. past tense</title>
494 <para>
495 Use past tense if an attempt to do something failed, but could perhaps
496 succeed next time (perhaps after fixing some problem). Use present tense
497 if the failure is certainly permanent.
498 </para>
500 <para>
501 There is a nontrivial semantic difference between sentences of the form:
<programlisting>
could not open file "%s": %m
</programlisting>
and:
<programlisting>
cannot open file "%s"
</programlisting>
509 The first one means that the attempt to open the file failed. The
510 message should give a reason, such as <quote>disk full</quote> or
511 <quote>file doesn't exist</quote>. The past tense is appropriate because
512 next time the disk might not be full anymore or the file in question might
513 exist.
514 </para>
516 <para>
517 The second form indicates that the functionality of opening the named file
518 does not exist at all in the program, or that it's conceptually
519 impossible. The present tense is appropriate because the condition will
520 persist indefinitely.
521 </para>
523 <para>
524 Rationale: Granted, the average user will not be able to draw great
525 conclusions merely from the tense of the message, but since the language
526 provides us with a grammar we should use it correctly.
527 </para>
529 </simplesect>
531 <simplesect>
532 <title>Type of the object</title>
534 <para>
535 When citing the name of an object, state what kind of object it is.
536 </para>
538 <para>
539 Rationale: Otherwise no one will know what <quote>foo.bar.baz</>
540 refers to.
541 </para>
543 </simplesect>
545 <simplesect>
546 <title>Brackets</title>
548 <para>
549 Square brackets are only to be used (1) in command synopses to denote
550 optional arguments, or (2) to denote an array subscript.
551 </para>
553 <para>
554 Rationale: Anything else does not correspond to widely-known customary
555 usage and will confuse people.
556 </para>
558 </simplesect>
560 <simplesect>
561 <title>Assembling error messages</title>
563 <para>
564 When a message includes text that is generated elsewhere, embed it in
565 this style:
<programlisting>
could not open file %s: %m
</programlisting>
569 </para>
571 <para>
572 Rationale: It would be difficult to account for all possible error codes
573 to paste this into a single smooth sentence, so some sort of punctuation
574 is needed. Putting the embedded text in parentheses has also been
575 suggested, but it's unnatural if the embedded text is likely to be the
576 most important part of the message, as is often the case.
577 </para>
579 </simplesect>
581 <simplesect>
582 <title>Reasons for errors</title>
584 <para>
585 Messages should always state the reason why an error occurred.
586 For example:
<programlisting>
BAD:    could not open file %s
BETTER: could not open file %s (I/O failure)
</programlisting>
If no reason is known, you had better fix the code.
592 </para>
594 </simplesect>
596 <simplesect>
597 <title>Function names</title>
599 <para>
600 Don't include the name of the reporting routine in the error text. We have
601 other mechanisms for finding that out when needed, and for most users it's
602 not helpful information. If the error text doesn't make as much sense
603 without the function name, reword it.
<programlisting>
BAD:    pg_atoi: error in "z": cannot parse "z"
BETTER: invalid input syntax for integer: "z"
</programlisting>
608 </para>
610 <para>
611 Avoid mentioning called function names, either; instead say what the code
612 was trying to do:
<programlisting>
BAD:    open() failed: %m
BETTER: could not open file %s: %m
</programlisting>
617 If it really seems necessary, mention the system call in the detail
618 message. (In some cases, providing the actual values passed to the
619 system call might be appropriate information for the detail message.)
620 </para>
622 <para>
623 Rationale: Users don't know what all those functions do.
624 </para>
626 </simplesect>
628 <simplesect>
629 <title>Tricky words to avoid</title>
631 <formalpara>
632 <title>Unable</title>
633 <para>
634 <quote>Unable</quote> is nearly the passive voice. Better use
635 <quote>cannot</quote> or <quote>could not</quote>, as appropriate.
636 </para>
637 </formalpara>
639 <formalpara>
640 <title>Bad</title>
641 <para>
642 Error messages like <quote>bad result</quote> are really hard to interpret
643 intelligently. It's better to write why the result is <quote>bad</quote>,
644 e.g., <quote>invalid format</quote>.
645 </para>
646 </formalpara>
648 <formalpara>
649 <title>Illegal</title>
650 <para>
<quote>Illegal</quote> stands for a violation of the law; everything else is
<quote>invalid</quote>. Better yet, say why it's invalid.
653 </para>
654 </formalpara>
656 <formalpara>
657 <title>Unknown</title>
658 <para>
659 Try to avoid <quote>unknown</quote>. Consider <quote>error: unknown
660 response</quote>. If you don't know what the response is, how do you know
661 it's erroneous? <quote>Unrecognized</quote> is often a better choice.
662 Also, be sure to include the value being complained of.
<programlisting>
BAD:    unknown node type
BETTER: unrecognized node type: 42
</programlisting>
667 </para>
668 </formalpara>
670 <formalpara>
671 <title>Find vs. Exists</title>
672 <para>
673 If the program uses a nontrivial algorithm to locate a resource (e.g., a
674 path search) and that algorithm fails, it is fair to say that the program
675 couldn't <quote>find</quote> the resource. If, on the other hand, the
expected location of the resource is known but the program cannot access
it there, then say that the resource doesn't <quote>exist</quote>. Using
678 <quote>find</quote> in this case sounds weak and confuses the issue.
679 </para>
680 </formalpara>
682 <formalpara>
683 <title>May vs. Can vs. Might</title>
684 <para>
685 <quote>May</quote> suggests permission (e.g., "You may borrow my rake."),
686 and has little use in documentation or error messages.
687 <quote>Can</quote> suggests ability (e.g., "I can lift that log."),
688 and <quote>might</quote> suggests possibility (e.g., "It might rain
689 today."). Using the proper word clarifies meaning and assists
690 translation.
691 </para>
692 </formalpara>
694 <formalpara>
695 <title>Contractions</title>
696 <para>
697 Avoid contractions, like <quote>can't</quote>; use
698 <quote>cannot</quote> instead.
699 </para>
700 </formalpara>
702 </simplesect>
704 <simplesect>
705 <title>Proper spelling</title>
707 <para>
708 Spell out words in full. For instance, avoid:
709 <itemizedlist>
710 <listitem>
711 <para>
712 spec
713 </para>
714 </listitem>
715 <listitem>
716 <para>
717 stats
718 </para>
719 </listitem>
720 <listitem>
721 <para>
722 parens
723 </para>
724 </listitem>
725 <listitem>
726 <para>
727 auth
728 </para>
729 </listitem>
730 <listitem>
731 <para>
732 xact
733 </para>
734 </listitem>
735 </itemizedlist>
736 </para>
738 <para>
739 Rationale: This will improve consistency.
740 </para>
742 </simplesect>
744 <simplesect>
745 <title>Localization</title>
747 <para>
748 Keep in mind that error message texts need to be translated into other
749 languages. Follow the guidelines in <xref linkend="nls-guidelines">
750 to avoid making life difficult for translators.
751 </para>
752 </simplesect>
754 </sect1>
756 </chapter>