Repair memory leaks in plpython.
[pgsql.git] / doc / src / sgml / bki.sgml
blob3cd5bee7ffaf40cc683781a4094d6ebcf6f5d0e2
1 <!-- doc/src/sgml/bki.sgml -->
3 <chapter id="bki">
4 <title>System Catalog Declarations and Initial Contents</title>
6 <para>
7 <productname>PostgreSQL</productname> uses many different system catalogs
8 to keep track of the existence and properties of database objects, such as
9 tables and functions. Physically there is no difference between a system
10 catalog and a plain user table, but the backend C code knows the structure
11 and properties of each catalog, and can manipulate it directly at a low
12 level. Thus, for example, it is inadvisable to attempt to alter the
13 structure of a catalog on-the-fly; that would break assumptions built into
14 the C code about how rows of the catalog are laid out. But the structure
15 of the catalogs can change between major versions.
16 </para>
18 <para>
19 The structures of the catalogs are declared in specially formatted C
20 header files in the <filename>src/include/catalog/</filename> directory of
21 the source tree. For each catalog there is a header file
22 named after the catalog (e.g., <filename>pg_class.h</filename>
23 for <structname>pg_class</structname>), which defines the set of columns
24 the catalog has, as well as some other basic properties such as its OID.
25 </para>
27 <para>
28 Many of the catalogs have initial data that must be loaded into them
29 during the <quote>bootstrap</quote> phase
30 of <application>initdb</application>, to bring the system up to a point
31 where it is capable of executing SQL commands. (For
32 example, <filename>pg_class.h</filename> must contain an entry for itself,
33 as well as one for each other system catalog and index.) This
34 initial data is kept in editable form in data files that are also stored
35 in the <filename>src/include/catalog/</filename> directory. For example,
36 <filename>pg_proc.dat</filename> describes all the initial rows that must
37 be inserted into the <structname>pg_proc</structname> catalog.
38 </para>
40 <para>
41 To create the catalog files and load this initial data into them, a
42 backend running in bootstrap mode reads a <acronym>BKI</acronym>
43 (Backend Interface) file containing commands and initial data.
44 The <filename>postgres.bki</filename> file used in this mode is prepared
45 from the aforementioned header and data files, while building
46 a <productname>PostgreSQL</productname> distribution, by a Perl script
47 named <filename>genbki.pl</filename>.
48 Although it's specific to a particular <productname>PostgreSQL</productname>
49 release, <filename>postgres.bki</filename> is platform-independent and is
50 installed in the <filename>share</filename> subdirectory of the
51 installation tree.
52 </para>
54 <para>
55 <filename>genbki.pl</filename> also produces a derived header file for
56 each catalog, for example <filename>pg_class_d.h</filename> for
57 the <structname>pg_class</structname> catalog. This file contains
58 automatically-generated macro definitions, and may contain other macros,
59 enum declarations, and so on that can be useful for client C code that
60 reads a particular catalog.
61 </para>
63 <para>
64 Most PostgreSQL developers don't need to be directly concerned with
65 the <acronym>BKI</acronym> file, but almost any nontrivial feature
66 addition in the backend will require modifying the catalog header files
67 and/or initial data files. The rest of this chapter gives some
68 information about that, and for completeness describes
69 the <acronym>BKI</acronym> file format.
70 </para>
72 <sect1 id="system-catalog-declarations">
73 <title>System Catalog Declaration Rules</title>
75 <para>
76 The key part of a catalog header file is a C structure definition
77 describing the layout of each row of the catalog. This begins with
78 a <literal>CATALOG</literal> macro, which so far as the C compiler is
79 concerned is just shorthand for <literal>typedef struct
80 FormData_<replaceable>catalogname</replaceable></literal>.
81 Each field in the struct gives rise to a catalog column.
82 Fields can be annotated using the BKI property macros described
83 in <filename>genbki.h</filename>, for example to define a default value
84 for a field or mark it as nullable or not nullable.
85 The <literal>CATALOG</literal> line can also be annotated, with some
86 other BKI property macros described in <filename>genbki.h</filename>, to
87 define other properties of the catalog as a whole, such as whether
88 it is a shared relation.
89 </para>
91 <para>
92 The system catalog cache code (and most catalog-munging code in general)
93 assumes that the fixed-length portions of all system catalog tuples are
94 in fact present, because it maps this C struct declaration onto them.
95 Thus, all variable-length fields and nullable fields must be placed at
96 the end, and they cannot be accessed as struct fields.
97 For example, if you tried to
98 set <structname>pg_type</structname>.<structfield>typrelid</structfield>
99 to be NULL, it would fail when some piece of code tried to reference
100 <literal>typetup-&gt;typrelid</literal> (or worse,
101 <literal>typetup-&gt;typelem</literal>, because that follows
102 <structfield>typrelid</structfield>). This would result in
103 random errors or even segmentation violations.
104 </para>
106 <para>
107 As a partial guard against this type of error, variable-length or
108 nullable fields should not be made directly visible to the C compiler.
109 This is accomplished by wrapping them in <literal>#ifdef
110 CATALOG_VARLEN</literal> ... <literal>#endif</literal> (where
111 <literal>CATALOG_VARLEN</literal> is a symbol that is never defined).
112 This prevents C code from carelessly trying to access fields that might
113 not be there or might be at some other offset.
114 As an independent guard against creating incorrect rows, we
115 require all columns that should be non-nullable to be marked so
116 in <structname>pg_attribute</structname>. The bootstrap code will
117 automatically mark catalog columns as <literal>NOT NULL</literal>
118 if they are fixed-width and are not preceded by any nullable or
119 variable-width column.
120 Where this rule is inadequate, you can force correct marking by using
121 <literal>BKI_FORCE_NOT_NULL</literal>
122 and <literal>BKI_FORCE_NULL</literal> annotations as needed.
123 </para>
125 <para>
126 Frontend code should not include any <filename>pg_xxx.h</filename>
127 catalog header file, as these files may contain C code that won't compile
128 outside the backend. (Typically, that happens because these files also
129 contain declarations for functions
130 in <filename>src/backend/catalog/</filename> files.)
131 Instead, frontend code may include the corresponding
132 generated <filename>pg_xxx_d.h</filename> header, which will contain
133 OID <literal>#define</literal>s and any other data that might be of use
134 on the client side. If you want macros or other code in a catalog header
135 to be visible to frontend code, write <literal>#ifdef
136 EXPOSE_TO_CLIENT_CODE</literal> ... <literal>#endif</literal> around that
137 section to instruct <filename>genbki.pl</filename> to copy that section
138 to the <filename>pg_xxx_d.h</filename> header.
139 </para>
141 <para>
142 A few of the catalogs are so fundamental that they can't even be created
143 by the <acronym>BKI</acronym> <literal>create</literal> command that's
144 used for most catalogs, because that command needs to write information
145 into these catalogs to describe the new catalog. These are
146 called <firstterm>bootstrap</firstterm> catalogs, and defining one takes
147 a lot of extra work: you have to manually prepare appropriate entries for
148 them in the pre-loaded contents of <structname>pg_class</structname>
149 and <structname>pg_type</structname>, and those entries will need to be
150 updated for subsequent changes to the catalog's structure.
151 (Bootstrap catalogs also need pre-loaded entries
152 in <structname>pg_attribute</structname>, but
153 fortunately <filename>genbki.pl</filename> handles that chore nowadays.)
154 Avoid making new catalogs be bootstrap catalogs if at all possible.
155 </para>
156 </sect1>
158 <sect1 id="system-catalog-initial-data">
159 <title>System Catalog Initial Data</title>
161 <para>
162 Each catalog that has any manually-created initial data (some do not)
163 has a corresponding <literal>.dat</literal> file that contains its
164 initial data in an editable format.
165 </para>
167 <sect2 id="system-catalog-initial-data-format">
168 <title>Data File Format</title>
170 <para>
171 Each <literal>.dat</literal> file contains Perl data structure literals
172 that are simply eval'd to produce an in-memory data structure consisting
173 of an array of hash references, one per catalog row.
174 A slightly modified excerpt from <filename>pg_database.dat</filename>
175 will demonstrate the key features:
176 </para>
178 <!-- The "slight modification" is the apostrophe in the description. -->
179 <programlisting><![CDATA[
182 # A comment could appear here.
183 { oid => '1', oid_symbol => 'Template1DbOid',
184 descr => 'database\'s default template',
185 datname => 'template1', encoding => 'ENCODING',
186 datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
187 datallowconn => 't', dathasloginevt => 'f', datconnlimit => '-1', datfrozenxid => '0',
188 datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
189 datctype => 'LC_CTYPE', datlocale => 'DATLOCALE', datacl => '_null_' },
192 ]]></programlisting>
194 <para>
195 Points to note:
196 </para>
198 <itemizedlist>
200 <listitem>
201 <para>
202 The overall file layout is: open square bracket, one or more sets of
203 curly braces each of which represents a catalog row, close square
204 bracket. Write a comma after each closing curly brace.
205 </para>
206 </listitem>
208 <listitem>
209 <para>
210 Within each catalog row, write comma-separated
211 <replaceable>key</replaceable> <literal>=&gt;</literal>
212 <replaceable>value</replaceable> pairs. The
213 allowed <replaceable>key</replaceable>s are the names of the catalog's
214 columns, plus the metadata keys <literal>oid</literal>,
215 <literal>oid_symbol</literal>,
216 <literal>array_type_oid</literal>, and <literal>descr</literal>.
217 (The use of <literal>oid</literal> and <literal>oid_symbol</literal>
218 is described in <xref linkend="system-catalog-oid-assignment"/> below,
219 while <literal>array_type_oid</literal> is described in
220 <xref linkend="system-catalog-auto-array-types"/>.
221 <literal>descr</literal> supplies a description string for the object,
222 which will be inserted into <structname>pg_description</structname>
223 or <structname>pg_shdescription</structname> as appropriate.)
224 While the metadata keys are optional, the catalog's defined columns
225 must all be provided, except when the catalog's <literal>.h</literal>
226 file specifies a default value for the column.
227 (In the example above, the <structfield>datdba</structfield> field has
228 been omitted because <filename>pg_database.h</filename> supplies a
229 suitable default value for it.)
230 </para>
231 </listitem>
233 <listitem>
234 <para>
235 All values must be single-quoted. Escape single quotes used within a
236 value with a backslash. Backslashes meant as data can, but need not,
237 be doubled; this follows Perl's rules for simple quoted literals.
238 Note that backslashes appearing as data will be treated as escapes by
239 the bootstrap scanner, according to the same rules as for escape string
240 constants (see <xref linkend="sql-syntax-strings-escape"/>); for
241 example <literal>\t</literal> converts to a tab character. If you
242 actually want a backslash in the final value, you will need to write
243 four of them: Perl strips two, leaving <literal>\\</literal> for the
244 bootstrap scanner to see.
245 </para>
246 </listitem>
248 <listitem>
249 <para>
250 Null values are represented by <literal>_null_</literal>.
251 (Note that there is no way to create a value that is just that
252 string.)
253 </para>
254 </listitem>
256 <listitem>
257 <para>
258 Comments are preceded by <literal>#</literal>, and must be on their
259 own lines.
260 </para>
261 </listitem>
263 <listitem>
264 <para>
265 Field values that are OIDs of other catalog entries should be
266 represented by symbolic names rather than actual numeric OIDs.
267 (In the example above, <structfield>dattablespace</structfield>
268 contains such a reference.)
269 This is described in <xref linkend="system-catalog-oid-references"/>
270 below.
271 </para>
272 </listitem>
274 <listitem>
275 <para>
276 Since hashes are unordered data structures, field order and line
277 layout aren't semantically significant. However, to maintain a
278 consistent appearance, we set a few rules that are applied by the
279 formatting script <filename>reformat_dat_file.pl</filename>:
281 <itemizedlist>
283 <listitem>
284 <para>
285 Within each pair of curly braces, the metadata
286 fields <literal>oid</literal>, <literal>oid_symbol</literal>,
287 <literal>array_type_oid</literal>, and <literal>descr</literal>
288 (if present) come first, in that order, then the catalog's own
289 fields appear in their defined order.
290 </para>
291 </listitem>
293 <listitem>
294 <para>
295 Newlines are inserted between fields as needed to limit line length
296 to 80 characters, if possible. A newline is also inserted between
297 the metadata fields and the regular fields.
298 </para>
299 </listitem>
301 <listitem>
302 <para>
303 If the catalog's <literal>.h</literal> file specifies a default
304 value for a column, and a data entry has that same
305 value, <filename>reformat_dat_file.pl</filename> will omit it from
306 the data file. This keeps the data representation compact.
307 </para>
308 </listitem>
310 <listitem>
311 <para>
312 <filename>reformat_dat_file.pl</filename> preserves blank lines
313 and comment lines as-is.
314 </para>
315 </listitem>
317 </itemizedlist>
319 It's recommended to run <filename>reformat_dat_file.pl</filename>
320 before submitting catalog data patches. For convenience, you can
321 simply change to <filename>src/include/catalog/</filename> and
322 run <literal>make reformat-dat-files</literal>.
323 </para>
324 </listitem>
326 <listitem>
327 <para>
328 If you want to add a new method of making the data representation
329 smaller, you must implement it
330 in <filename>reformat_dat_file.pl</filename> and also
331 teach <function>Catalog::ParseData()</function> how to expand the
332 data back into the full representation.
333 </para>
334 </listitem>
336 </itemizedlist>
337 </sect2>
339 <sect2 id="system-catalog-oid-assignment">
340 <title>OID Assignment</title>
342 <para>
343 A catalog row appearing in the initial data can be given a
344 manually-assigned OID by writing an <literal>oid
345 =&gt; <replaceable>nnnn</replaceable></literal> metadata field.
346 Furthermore, if an OID is assigned, a C macro for that OID can be
347 created by writing an <literal>oid_symbol
348 =&gt; <replaceable>name</replaceable></literal> metadata field.
349 </para>
351 <para>
352 Pre-loaded catalog rows must have preassigned OIDs if there are OID
353 references to them in other pre-loaded rows. A preassigned OID is
354 also needed if the row's OID must be referenced from C code.
355 If neither case applies, the <literal>oid</literal> metadata field can
356 be omitted, in which case the bootstrap code assigns an OID
357 automatically.
358 In practice we usually preassign OIDs for all or none of the pre-loaded
359 rows in a given catalog, even if only some of them are actually
360 cross-referenced.
361 </para>
363 <para>
364 Writing the actual numeric value of any OID in C code is considered
365 very bad form; always use a macro, instead. Direct references
366 to <structname>pg_proc</structname> OIDs are common enough that there's
367 a special mechanism to create the necessary macros automatically;
368 see <filename>src/backend/utils/Gen_fmgrtab.pl</filename>. Similarly
369 &mdash; but, for historical reasons, not done the same way &mdash;
370 there's an automatic method for creating macros
371 for <structname>pg_type</structname>
372 OIDs. <literal>oid_symbol</literal> entries are therefore not
373 necessary in those two catalogs. Likewise, macros for
374 the <structname>pg_class</structname> OIDs of system catalogs and
375 indexes are set up automatically. For all other system catalogs, you
376 have to manually specify any macros you need
377 via <literal>oid_symbol</literal> entries.
378 </para>
380 <para>
381 To find an available OID for a new pre-loaded row, run the
382 script <filename>src/include/catalog/unused_oids</filename>.
383 It prints inclusive ranges of unused OIDs (e.g., the output
384 line <literal>45-900</literal> means OIDs 45 through 900 have not been
385 allocated yet). Currently, OIDs 1&ndash;9999 are reserved for manual
386 assignment; the <filename>unused_oids</filename> script simply looks
387 through the catalog headers and <filename>.dat</filename> files
388 to see which ones do not appear. You can also use
389 the <filename>duplicate_oids</filename> script to check for mistakes.
390 (<filename>genbki.pl</filename> will assign OIDs for any rows that
391 didn't get one hand-assigned to them, and it will also detect duplicate
392 OIDs at compile time.)
393 </para>
395 <para>
396 When choosing OIDs for a patch that is not expected to be committed
397 immediately, best practice is to use a group of more-or-less
398 consecutive OIDs starting with some random choice in the range
399 8000&mdash;9999. This minimizes the risk of OID collisions with other
400 patches being developed concurrently. To keep the 8000&mdash;9999
401 range free for development purposes, after a patch has been committed
402 to the master git repository its OIDs should be renumbered into
403 available space below that range. Typically, this will be done
404 near the end of each development cycle, moving all OIDs consumed by
405 patches committed in that cycle at the same time. The script
406 <filename>renumber_oids.pl</filename> can be used for this purpose.
407 If an uncommitted patch is found to have OID conflicts with some
408 recently-committed patch, <filename>renumber_oids.pl</filename> may
409 also be useful for recovering from that situation.
410 </para>
412 <para>
413 Because of this convention of possibly renumbering OIDs assigned by
414 patches, the OIDs assigned by a patch should not be considered stable
415 until the patch has been included in an official release. We do not
416 change manually-assigned object OIDs once released, however, as that
417 would create assorted compatibility problems.
418 </para>
420 <para>
421 If <filename>genbki.pl</filename> needs to assign an OID to a catalog
422 entry that does not have a manually-assigned OID, it will use a value in
423 the range 10000&mdash;11999. The server's OID counter is set to 10000
424 at the start of a bootstrap run, so that any objects created on-the-fly
425 during bootstrap processing also receive OIDs in this range. (The
426 usual OID assignment mechanism takes care of preventing any conflicts.)
427 </para>
429 <para>
430 Objects with OIDs below <symbol>FirstUnpinnedObjectId</symbol> (12000)
431 are considered <quote>pinned</quote>, preventing them from being
432 deleted. (There are a small number of exceptions, which are
433 hard-wired into <function>IsPinnedObject()</function>.)
434 <application>initdb</application> forces the OID counter up
435 to <symbol>FirstUnpinnedObjectId</symbol> as soon as it's ready to
436 create unpinned objects. Thus objects created during the later phases
437 of <application>initdb</application>, such as objects created while
438 running the <filename>information_schema.sql</filename> script, will
439 not be pinned, while all objects known
440 to <filename>genbki.pl</filename> will be.
441 </para>
443 <para>
444 OIDs assigned during normal database operation are constrained to be
445 16384 or higher. This ensures that the range 10000&mdash;16383 is free
446 for OIDs assigned automatically by <filename>genbki.pl</filename> or
447 during <application>initdb</application>. These
448 automatically-assigned OIDs are not considered stable, and may change
449 from one installation to another.
450 </para>
451 </sect2>
453 <sect2 id="system-catalog-oid-references">
454 <title>OID Reference Lookup</title>
456 <para>
457 In principle, cross-references from one initial catalog row to another
458 could be written just by writing the preassigned OID of the referenced
459 row in the referencing field. However, that is against project
460 policy, because it is error-prone, hard to read, and subject to
461 breakage if a newly-assigned OID is renumbered. Therefore
462 <filename>genbki.pl</filename> provides mechanisms to write
463 symbolic references instead.
464 The rules are as follows:
465 </para>
467 <itemizedlist>
469 <listitem>
470 <para>
471 Use of symbolic references is enabled in a particular catalog column
472 by attaching <literal>BKI_LOOKUP(<replaceable>lookuprule</replaceable>)</literal>
473 to the column's definition, where <replaceable>lookuprule</replaceable>
474 is the name of the referenced catalog, e.g., <literal>pg_proc</literal>.
475 <literal>BKI_LOOKUP</literal> can be attached to columns of
476 type <type>Oid</type>, <type>regproc</type>, <type>oidvector</type>,
477 or <type>Oid[]</type>; in the latter two cases it implies performing a
478 lookup on each element of the array.
479 </para>
480 </listitem>
482 <listitem>
483 <para>
484 It's also permissible to attach <literal>BKI_LOOKUP(encoding)</literal>
485 to integer columns to reference character set encodings, which are
486 not currently represented as catalog OIDs, but have a set of values
487 known to <filename>genbki.pl</filename>.
488 </para>
489 </listitem>
491 <listitem>
492 <para>
493 In some catalog columns, it's allowed for entries to be zero instead
494 of a valid reference. If this is allowed, write
495 <literal>BKI_LOOKUP_OPT</literal> instead
496 of <literal>BKI_LOOKUP</literal>. Then you can
497 write <literal>0</literal> for an entry. (If the column is
498 declared <type>regproc</type>, you can optionally
499 write <literal>-</literal> instead of <literal>0</literal>.)
500 Except for this special case, all entries in
501 a <literal>BKI_LOOKUP</literal> column must be symbolic references.
502 <filename>genbki.pl</filename> will warn about unrecognized names.
503 </para>
504 </listitem>
506 <listitem>
507 <para>
508 Most kinds of catalog objects are simply referenced by their names.
509 Note that type names must exactly match the
510 referenced <structname>pg_type</structname>
511 entry's <structfield>typname</structfield>; you do not get to use
512 any aliases such as <literal>integer</literal>
513 for <literal>int4</literal>.
514 </para>
515 </listitem>
517 <listitem>
518 <para>
519 A function can be represented by
520 its <structfield>proname</structfield>, if that is unique among
521 the <filename>pg_proc.dat</filename> entries (this works like regproc
522 input). Otherwise, write it
523 as <replaceable>proname(argtypename,argtypename,...)</replaceable>,
524 like regprocedure. The argument type names must be spelled exactly as
525 they are in the <filename>pg_proc.dat</filename> entry's
526 <structfield>proargtypes</structfield> field. Do not insert any
527 spaces.
528 </para>
529 </listitem>
531 <listitem>
532 <para>
533 Operators are represented
534 by <replaceable>oprname(lefttype,righttype)</replaceable>,
535 writing the type names exactly as they appear in
536 the <filename>pg_operator.dat</filename>
537 entry's <structfield>oprleft</structfield>
538 and <structfield>oprright</structfield> fields.
539 (Write <literal>0</literal> for the omitted operand of a unary
540 operator.)
541 </para>
542 </listitem>
544 <listitem>
545 <para>
546 The names of opclasses and opfamilies are only unique within an
547 access method, so they are represented
548 by <replaceable>access_method_name</replaceable><literal>/</literal><replaceable>object_name</replaceable>.
549 </para>
550 </listitem>
552 <listitem>
553 <para>
554 In none of these cases is there any provision for
555 schema-qualification; all objects created during bootstrap are
556 expected to be in the <literal>pg_catalog</literal> schema.
557 </para>
558 </listitem>
559 </itemizedlist>
561 <para>
562 <filename>genbki.pl</filename> resolves all symbolic references while it
563 runs, and puts simple numeric OIDs into the emitted BKI file. There is
564 therefore no need for the bootstrap backend to deal with symbolic
565 references.
566 </para>
568 <para>
569 It's desirable to mark OID reference columns
570 with <literal>BKI_LOOKUP</literal> or <literal>BKI_LOOKUP_OPT</literal>
571 even if the catalog has no initial data that requires lookup. This
572 allows <filename>genbki.pl</filename> to record the foreign key
573 relationships that exist in the system catalogs. That information is
574 used in the regression tests to check for incorrect entries. See also
575 the macros <literal>DECLARE_FOREIGN_KEY</literal>,
576 <literal>DECLARE_FOREIGN_KEY_OPT</literal>,
577 <literal>DECLARE_ARRAY_FOREIGN_KEY</literal>,
578 and <literal>DECLARE_ARRAY_FOREIGN_KEY_OPT</literal>, which are
579 used to declare foreign key relationships that are too complex
580 for <literal>BKI_LOOKUP</literal> (typically, multi-column foreign
581 keys).
582 </para>
583 </sect2>
585 <sect2 id="system-catalog-auto-array-types">
586 <title>Automatic Creation of Array Types</title>
588 <para>
589 Most scalar data types should have a corresponding array type (that is,
590 a standard varlena array type whose element type is the scalar type, and
591 which is referenced by the <structfield>typarray</structfield> field of
592 the scalar type's <structname>pg_type</structname>
593 entry). <filename>genbki.pl</filename> is able to generate
594 the <structname>pg_type</structname> entry for the array type
595 automatically in most cases.
596 </para>
598 <para>
599 To use this facility, just write an <literal>array_type_oid
600 =&gt; <replaceable>nnnn</replaceable></literal> metadata field in the
601 scalar type's <structname>pg_type</structname> entry, specifying the OID
602 to use for the array type. You may then omit
603 the <structfield>typarray</structfield> field, since it will be filled
604 automatically with that OID.
605 </para>
607 <para>
608 The generated array type's name is the scalar type's name with an
609 underscore prepended. The array entry's other fields are filled from
610 <literal>BKI_ARRAY_DEFAULT(<replaceable>value</replaceable>)</literal>
611 annotations in <filename>pg_type.h</filename>, or if there isn't one,
612 copied from the scalar type. (There's also a special case
613 for <structfield>typalign</structfield>.) Then
614 the <structfield>typelem</structfield>
615 and <structfield>typarray</structfield> fields of the two entries are
616 set to cross-reference each other.
617 </para>
618 </sect2>
620 <sect2 id="system-catalog-recipes">
621 <title>Recipes for Editing Data Files</title>
623 <para>
624 Here are some suggestions about the easiest ways to perform common tasks
625 when updating catalog data files.
626 </para>
628 <formalpara>
629 <title>Add a new column with a default to a catalog:</title>
630 <para>
631 Add the column to the header file with
632 a <literal>BKI_DEFAULT(<replaceable>value</replaceable>)</literal>
633 annotation. The data file need only be adjusted by adding the field
634 in existing rows where a non-default value is needed.
635 </para>
636 </formalpara>
638 <formalpara>
639 <title>Add a default value to an existing column that doesn't have
640 one:</title>
641 <para>
642 Add a <literal>BKI_DEFAULT</literal> annotation to the header file,
643 then run <literal>make reformat-dat-files</literal> to remove
644 now-redundant field entries.
645 </para>
646 </formalpara>
648 <formalpara>
649 <title>Remove a column, whether it has a default or not:</title>
650 <para>
651 Remove the column from the header, then run <literal>make
652 reformat-dat-files</literal> to remove now-useless field entries.
653 </para>
654 </formalpara>
656 <formalpara>
657 <title>Change or remove an existing default value:</title>
658 <para>
659 You cannot simply change the header file, since that will cause the
660 current data to be interpreted incorrectly. First run <literal>make
661 expand-dat-files</literal> to rewrite the data files with all
662 default values inserted explicitly, then change or remove
663 the <literal>BKI_DEFAULT</literal> annotation, then run <literal>make
664 reformat-dat-files</literal> to remove superfluous fields again.
665 </para>
666 </formalpara>
668 <formalpara>
669 <title>Ad-hoc bulk editing:</title>
670 <para>
671 <filename>reformat_dat_file.pl</filename> can be adapted to perform
672 many kinds of bulk changes. Look for its block comments showing where
673 one-off code can be inserted. In the following example, we are going
674 to consolidate two Boolean fields in <structname>pg_proc</structname>
675 into a char field:
677 <orderedlist>
678 <listitem>
679 <para>
680 Add the new column, with a default,
681 to <filename>pg_proc.h</filename>:
682 <programlisting>
683 + /* see PROKIND_ categories below */
684 + char prokind BKI_DEFAULT(f);
685 </programlisting>
686 </para>
687 </listitem>
689 <listitem>
690 <para>
691 Create a new script based on <filename>reformat_dat_file.pl</filename>
692 to insert appropriate values on-the-fly:
693 <programlisting>
694 - # At this point we have the full row in memory as a hash
695 - # and can do any operations we want. As written, it only
696 - # removes default values, but this script can be adapted to
697 - # do one-off bulk-editing.
698 + # One-off change to migrate to prokind
699 + # Default has already been filled in by now, so change to other
700 + # values as appropriate
701 + if ($values{proisagg} eq 't')
703 + $values{prokind} = 'a';
705 + elsif ($values{proiswindow} eq 't')
707 + $values{prokind} = 'w';
709 </programlisting>
710 </para>
711 </listitem>
713 <listitem>
714 <para>
715 Run the new script:
716 <programlisting>
717 $ cd src/include/catalog
718 $ perl rewrite_dat_with_prokind.pl pg_proc.dat
719 </programlisting>
720 At this point <filename>pg_proc.dat</filename> has all three
721 columns, <structfield>prokind</structfield>,
722 <structfield>proisagg</structfield>,
723 and <structfield>proiswindow</structfield>, though they will appear
724 only in rows where they have non-default values.
725 </para>
726 </listitem>
728 <listitem>
729 <para>
730 Remove the old columns from <filename>pg_proc.h</filename>:
731 <programlisting>
732 - /* is it an aggregate? */
733 - bool proisagg BKI_DEFAULT(f);
735 - /* is it a window function? */
736 - bool proiswindow BKI_DEFAULT(f);
737 </programlisting>
738 </para>
739 </listitem>
741 <listitem>
742 <para>
743 Finally, run <literal>make reformat-dat-files</literal> to remove
744 the useless old entries from <filename>pg_proc.dat</filename>.
745 </para>
746 </listitem>
747 </orderedlist>
749 For further examples of scripts used for bulk editing, see
750 <filename>convert_oid2name.pl</filename>
751 and <filename>remove_pg_type_oid_symbols.pl</filename> attached to this
752 message:
753 <ulink url="https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com"></ulink>
754 </para>
755 </formalpara>
756 </sect2>
757 </sect1>
759 <sect1 id="bki-format">
760 <title><acronym>BKI</acronym> File Format</title>
762 <para>
763 This section describes how the <productname>PostgreSQL</productname>
764 backend interprets <acronym>BKI</acronym> files. This description
765 will be easier to understand if the <filename>postgres.bki</filename>
766 file is at hand as an example.
767 </para>
769 <para>
770 <acronym>BKI</acronym> input consists of a sequence of commands. Commands are made up
771 of a number of tokens, depending on the syntax of the command.
772 Tokens are usually separated by whitespace, but need not be if
773 there is no ambiguity. There is no special command separator; the
774 next token that syntactically cannot belong to the preceding
775 command starts a new one. (Usually you would put a new command on
776 a new line, for clarity.) Tokens can be certain key words, special
777 characters (parentheses, commas, etc.), identifiers, numbers, or
778 single-quoted strings. Everything is case sensitive.
779 </para>
781 <para>
782 Lines starting with <literal>#</literal> are ignored.
783 </para>
785 </sect1>
787 <sect1 id="bki-commands">
788 <title><acronym>BKI</acronym> Commands</title>
790 <variablelist>
791 <varlistentry>
792 <term>
793 <literal>create</literal>
794 <replaceable class="parameter">tablename</replaceable>
795 <replaceable class="parameter">tableoid</replaceable>
796 <optional><literal>bootstrap</literal></optional>
797 <optional><literal>shared_relation</literal></optional>
798 <optional><literal>rowtype_oid</literal> <replaceable>oid</replaceable></optional>
799 (<replaceable class="parameter">name1</replaceable> =
800 <replaceable class="parameter">type1</replaceable>
801 <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional> <optional>,
802 <replaceable class="parameter">name2</replaceable> =
803 <replaceable class="parameter">type2</replaceable>
804 <optional><literal>FORCE NOT NULL</literal> | <literal>FORCE NULL</literal> </optional>,
805 ...</optional>)
806 </term>
808 <listitem>
809 <para>
810 Create a table named <replaceable
811 class="parameter">tablename</replaceable>, and having the OID
812 <replaceable class="parameter">tableoid</replaceable>,
813 with the columns given in parentheses.
814 </para>
816 <para>
817 The following column types are supported directly by
818 <filename>bootstrap.c</filename>: <type>bool</type>,
819 <type>bytea</type>, <type>char</type> (1 byte),
820 <type>name</type>, <type>int2</type>,
821 <type>int4</type>, <type>regproc</type>, <type>regclass</type>,
822 <type>regtype</type>, <type>text</type>,
823 <type>oid</type>, <type>tid</type>, <type>xid</type>,
824 <type>cid</type>, <type>int2vector</type>, <type>oidvector</type>,
825 <type>_int4</type> (array), <type>_text</type> (array),
826 <type>_oid</type> (array), <type>_char</type> (array),
827 <type>_aclitem</type> (array). Although it is possible to create
828 tables containing columns of other types, this cannot be done until
829 after <structname>pg_type</structname> has been created and filled with
830 appropriate entries. (That effectively means that only these
831 column types can be used in bootstrap catalogs, but non-bootstrap
832 catalogs can contain any built-in type.)
833 </para>
835 <para>
836 When <literal>bootstrap</literal> is specified,
837 the table will only be created on disk; nothing is entered into
838 <structname>pg_class</structname>,
839 <structname>pg_attribute</structname>, etc., for it. Thus the
840 table will not be accessible by ordinary SQL operations until
841 such entries are made the hard way (with <literal>insert</literal>
842 commands). This option is used for creating
843 <structname>pg_class</structname> etc. themselves.
844 </para>
846 <para>
847 The table is created as shared if <literal>shared_relation</literal> is
848 specified.
849 The table's row type OID (<structname>pg_type</structname> OID) can optionally
850 be specified via the <literal>rowtype_oid</literal> clause; if not specified,
851 an OID is automatically generated for it. (The <literal>rowtype_oid</literal>
852 clause is useless if <literal>bootstrap</literal> is specified, but it can be
853 provided anyway for documentation.)
854 </para>
855 </listitem>
856 </varlistentry>
858 <varlistentry>
859 <term>
860 <literal>open</literal> <replaceable class="parameter">tablename</replaceable>
861 </term>
863 <listitem>
864 <para>
865 Open the table named
866 <replaceable class="parameter">tablename</replaceable>
867 for insertion of data. Any currently open table is closed.
868 </para>
869 </listitem>
870 </varlistentry>
872 <varlistentry>
873 <term>
874 <literal>close</literal> <replaceable class="parameter">tablename</replaceable>
875 </term>
877 <listitem>
878 <para>
879 Close the open table. The name of the table must be given as a
880 cross-check.
881 </para>
882 </listitem>
883 </varlistentry>
885 <varlistentry>
886 <term>
887 <literal>insert</literal> <literal>(</literal> <optional><replaceable class="parameter">oid_value</replaceable></optional> <replaceable class="parameter">value1</replaceable> <replaceable class="parameter">value2</replaceable> ... <literal>)</literal>
888 </term>
890 <listitem>
891 <para>
892 Insert a new row into the open table using <replaceable
893 class="parameter">value1</replaceable>, <replaceable
894 class="parameter">value2</replaceable>, etc., for its column
895 values.
896 </para>
898 <para>
899 NULL values can be specified using the special key word
900 <literal>_null_</literal>. Values that do not look like
901 identifiers or digit strings must be single-quoted.
902 (To include a single quote in a value, write it twice.
903 Escape-string-style backslash escapes are allowed in the string, too.)
904 </para>
905 </listitem>
906 </varlistentry>
908 <varlistentry>
909 <term>
910 <literal>declare</literal> <optional><literal>unique</literal></optional>
911 <literal>index</literal> <replaceable class="parameter">indexname</replaceable>
912 <replaceable class="parameter">indexoid</replaceable>
913 <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
914 <literal>using</literal> <replaceable class="parameter">amname</replaceable>
915 <literal>(</literal> <replaceable class="parameter">opclass1</replaceable>
916 <replaceable class="parameter">name1</replaceable>
917 <optional>, ...</optional> <literal>)</literal>
918 </term>
920 <listitem>
921 <para>
922 Create an index named <replaceable
923 class="parameter">indexname</replaceable>, having OID
924 <replaceable class="parameter">indexoid</replaceable>,
925 on the table named
926 <replaceable class="parameter">tablename</replaceable>, using the
927 <replaceable class="parameter">amname</replaceable> access
928 method. The fields to index are called <replaceable
929 class="parameter">name1</replaceable>, <replaceable
930 class="parameter">name2</replaceable> etc., and the operator
931 classes to use are <replaceable
932 class="parameter">opclass1</replaceable>, <replaceable
933 class="parameter">opclass2</replaceable> etc., respectively.
934 The index file is created and appropriate catalog entries are
935 made for it, but the index contents are not initialized by this command.
936 </para>
937 </listitem>
938 </varlistentry>
940 <varlistentry>
941 <term>
942 <literal>declare toast</literal>
943 <replaceable class="parameter">toasttableoid</replaceable>
944 <replaceable class="parameter">toastindexoid</replaceable>
945 <literal>on</literal> <replaceable class="parameter">tablename</replaceable>
946 </term>
948 <listitem>
949 <para>
950 Create a TOAST table for the table named
951 <replaceable class="parameter">tablename</replaceable>.
952 The TOAST table is assigned OID
953 <replaceable class="parameter">toasttableoid</replaceable>
954 and its index is assigned OID
955 <replaceable class="parameter">toastindexoid</replaceable>.
956 As with <literal>declare index</literal>, filling of the index
957 is postponed.
958 </para>
959 </listitem>
960 </varlistentry>
962 <varlistentry>
963 <term><literal>build indices</literal></term>
965 <listitem>
966 <para>
967 Fill in the indices that have previously been declared.
968 </para>
969 </listitem>
970 </varlistentry>
971 </variablelist>
973 </sect1>
975 <sect1 id="bki-structure">
976 <title>Structure of the Bootstrap <acronym>BKI</acronym> File</title>
978 <para>
979 The <literal>open</literal> command cannot be used until the tables it uses
980 exist and have entries for the table that is to be opened.
981 (These minimum tables are <structname>pg_class</structname>,
982 <structname>pg_attribute</structname>, <structname>pg_proc</structname>, and
983 <structname>pg_type</structname>.) To allow those tables themselves to be filled,
984 <literal>create</literal> with the <literal>bootstrap</literal> option implicitly opens
985 the created table for data insertion.
986 </para>
988 <para>
989 Also, the <literal>declare index</literal> and <literal>declare toast</literal>
990 commands cannot be used until the system catalogs they need have been
991 created and filled in.
992 </para>
994 <para>
995 Thus, the structure of the <filename>postgres.bki</filename> file has to
997 <orderedlist>
998 <listitem>
999 <para>
1000 <literal>create bootstrap</literal> one of the critical tables
1001 </para>
1002 </listitem>
1003 <listitem>
1004 <para>
1005 <literal>insert</literal> data describing at least the critical tables
1006 </para>
1007 </listitem>
1008 <listitem>
1009 <para>
1010 <literal>close</literal>
1011 </para>
1012 </listitem>
1013 <listitem>
1014 <para>
1015 Repeat for the other critical tables.
1016 </para>
1017 </listitem>
1018 <listitem>
1019 <para>
1020 <literal>create</literal> (without <literal>bootstrap</literal>) a noncritical table
1021 </para>
1022 </listitem>
1023 <listitem>
1024 <para>
1025 <literal>open</literal>
1026 </para>
1027 </listitem>
1028 <listitem>
1029 <para>
1030 <literal>insert</literal> desired data
1031 </para>
1032 </listitem>
1033 <listitem>
1034 <para>
1035 <literal>close</literal>
1036 </para>
1037 </listitem>
1038 <listitem>
1039 <para>
1040 Repeat for the other noncritical tables.
1041 </para>
1042 </listitem>
1043 <listitem>
1044 <para>
1045 Define indexes and toast tables.
1046 </para>
1047 </listitem>
1048 <listitem>
1049 <para>
1050 <literal>build indices</literal>
1051 </para>
1052 </listitem>
1053 </orderedlist>
1054 </para>
1056 <para>
1057 There are doubtless other, undocumented ordering dependencies.
1058 </para>
1059 </sect1>
1061 <sect1 id="bki-example">
1062 <title>BKI Example</title>
1064 <para>
1065 The following sequence of commands will create the table
1066 <literal>test_table</literal> with OID 420, having three columns
1067 <literal>oid</literal>, <literal>cola</literal> and <literal>colb</literal>
1068 of type <type>oid</type>, <type>int4</type> and <type>text</type>,
1069 respectively, and insert two rows into the table:
1070 <programlisting>
1071 create test_table 420 (oid = oid, cola = int4, colb = text)
1072 open test_table
1073 insert ( 421 1 'value 1' )
1074 insert ( 422 2 _null_ )
1075 close test_table
1076 </programlisting>
1077 </para>
1078 </sect1>
1079 </chapter>