Repair memory leaks in plpython.
[pgsql.git] / doc / src / sgml / spgist.sgml
blob6af93719b843921d2b2e7b3882e1145cb30afb38
1 <!-- doc/src/sgml/spgist.sgml -->
3 <sect1 id="spgist">
4 <title>SP-GiST Indexes</title>
6 <indexterm>
7 <primary>index</primary>
8 <secondary>SP-GiST</secondary>
9 </indexterm>
11 <sect2 id="spgist-intro">
12 <title>Introduction</title>
14 <para>
15 <acronym>SP-GiST</acronym> is an abbreviation for space-partitioned
16 <acronym>GiST</acronym>. <acronym>SP-GiST</acronym> supports partitioned
17 search trees, which facilitate development of a wide range of different
18 non-balanced data structures, such as quad-trees, k-d trees, and radix
19 trees (tries). The common feature of these structures is that they
20 repeatedly divide the search space into partitions that need not be
21 of equal size. Searches that are well matched to the partitioning rule
22 can be very fast.
23 </para>
25 <para>
26 These popular data structures were originally developed for in-memory
27 usage. In main memory, they are usually designed as a set of dynamically
28 allocated nodes linked by pointers. This is not suitable for direct
29 storing on disk, since these chains of pointers can be rather long which
30 would require too many disk accesses. In contrast, disk-based data
31 structures should have a high fanout to minimize I/O. The challenge
32 addressed by <acronym>SP-GiST</acronym> is to map search tree nodes to
33 disk pages in such a way that a search need access only a few disk pages,
34 even if it traverses many nodes.
35 </para>
37 <para>
38 Like <acronym>GiST</acronym>, <acronym>SP-GiST</acronym> is meant to allow
39 the development of custom data types with the appropriate access methods,
40 by an expert in the domain of the data type, rather than a database expert.
41 </para>
43 <para>
44 Some of the information here is derived from Purdue University's
45 SP-GiST Indexing Project
46 <ulink url="https://www.cs.purdue.edu/spgist/">web site</ulink>.
47 The <acronym>SP-GiST</acronym> implementation in
48 <productname>PostgreSQL</productname> is primarily maintained by Teodor
49 Sigaev and Oleg Bartunov, and there is more information on their
50 <!-- URL will be changed -->
51 <ulink url="http://www.sai.msu.su/~megera/wiki/spgist_dev">web site</ulink>.
52 </para>
54 </sect2>
56 <sect2 id="spgist-builtin-opclasses">
57 <title>Built-in Operator Classes</title>
59 <para>
60 The core <productname>PostgreSQL</productname> distribution
61 includes the <acronym>SP-GiST</acronym> operator classes shown in
62 <xref linkend="spgist-builtin-opclasses-table"/>.
63 </para>
65 <table id="spgist-builtin-opclasses-table">
66 <title>Built-in <acronym>SP-GiST</acronym> Operator Classes</title>
67 <tgroup cols="3">
68 <thead>
69 <row>
70 <entry>Name</entry>
71 <entry>Indexable Operators</entry>
72 <entry>Ordering Operators</entry>
73 </row>
74 </thead>
75 <tbody>
76 <row>
77 <entry valign="middle" morerows="11"><literal>box_ops</literal></entry>
78 <entry><literal>&lt;&lt; (box,box)</literal></entry>
79 <entry valign="middle" morerows="11"><literal>&lt;-&gt; (box,point)</literal></entry>
80 </row>
81 <row><entry><literal>&amp;&lt; (box,box)</literal></entry></row>
82 <row><entry><literal>&amp;&gt; (box,box)</literal></entry></row>
83 <row><entry><literal>&gt;&gt; (box,box)</literal></entry></row>
84 <row><entry><literal>&lt;@ (box,box)</literal></entry></row>
85 <row><entry><literal>@&gt; (box,box)</literal></entry></row>
86 <row><entry><literal>~= (box,box)</literal></entry></row>
87 <row><entry><literal>&amp;&amp; (box,box)</literal></entry></row>
88 <row><entry><literal>&lt;&lt;| (box,box)</literal></entry></row>
89 <row><entry><literal>&amp;&lt;| (box,box)</literal></entry></row>
90 <row><entry><literal>|&amp;&gt; (box,box)</literal></entry></row>
91 <row><entry><literal>|&gt;&gt; (box,box)</literal></entry></row>
93 <row>
94 <entry valign="middle" morerows="10"><literal>inet_ops</literal></entry>
95 <entry><literal>&lt;&lt; (inet,inet)</literal></entry>
96 <entry valign="middle" morerows="10"></entry>
97 </row>
98 <row><entry><literal>&lt;&lt;= (inet,inet)</literal></entry></row>
99 <row><entry><literal>&gt;&gt; (inet,inet)</literal></entry></row>
100 <row><entry><literal>&gt;&gt;= (inet,inet)</literal></entry></row>
101 <row><entry><literal>= (inet,inet)</literal></entry></row>
102 <row><entry><literal>&lt;&gt; (inet,inet)</literal></entry></row>
103 <row><entry><literal>&lt; (inet,inet)</literal></entry></row>
104 <row><entry><literal>&lt;= (inet,inet)</literal></entry></row>
105 <row><entry><literal>&gt; (inet,inet)</literal></entry></row>
106 <row><entry><literal>&gt;= (inet,inet)</literal></entry></row>
107 <row><entry><literal>&amp;&amp; (inet,inet)</literal></entry></row>
109 <row>
110 <entry valign="middle" morerows="5"><literal>kd_point_ops</literal></entry>
111 <entry><literal>|&gt;&gt; (point,point)</literal></entry>
112 <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
113 </row>
114 <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
115 <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
116 <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
117 <row><entry><literal>~= (point,point)</literal></entry></row>
118 <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
120 <row>
121 <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry>
122 <entry><literal>&lt;&lt; (polygon,polygon)</literal></entry>
123 <entry valign="middle" morerows="11"><literal>&lt;-&gt; (polygon,point)</literal></entry>
124 </row>
125 <row><entry><literal>&amp;&lt; (polygon,polygon)</literal></entry></row>
126 <row><entry><literal>&amp;&gt; (polygon,polygon)</literal></entry></row>
127 <row><entry><literal>&gt;&gt; (polygon,polygon)</literal></entry></row>
128 <row><entry><literal>&lt;@ (polygon,polygon)</literal></entry></row>
129 <row><entry><literal>@&gt; (polygon,polygon)</literal></entry></row>
130 <row><entry><literal>~= (polygon,polygon)</literal></entry></row>
131 <row><entry><literal>&amp;&amp; (polygon,polygon)</literal></entry></row>
132 <row><entry><literal>&lt;&lt;| (polygon,polygon)</literal></entry></row>
133 <row><entry><literal>&amp;&lt;| (polygon,polygon)</literal></entry></row>
134 <row><entry><literal>|&gt;&gt; (polygon,polygon)</literal></entry></row>
135 <row><entry><literal>|&amp;&gt; (polygon,polygon)</literal></entry></row>
137 <row>
138 <entry valign="middle" morerows="5"><literal>quad_point_ops</literal></entry>
139 <entry><literal>|&gt;&gt; (point,point)</literal></entry>
140 <entry valign="middle" morerows="5"><literal>&lt;-&gt; (point,point)</literal></entry>
141 </row>
142 <row><entry><literal>&lt;&lt; (point,point)</literal></entry></row>
143 <row><entry><literal>&gt;&gt; (point,point)</literal></entry></row>
144 <row><entry><literal>&lt;&lt;| (point,point)</literal></entry></row>
145 <row><entry><literal>~= (point,point)</literal></entry></row>
146 <row><entry><literal>&lt;@ (point,box)</literal></entry></row>
148 <row>
149 <entry valign="middle" morerows="9"><literal>range_ops</literal></entry>
150 <entry><literal>= (anyrange,anyrange)</literal></entry>
151 <entry valign="middle" morerows="9"></entry>
152 </row>
153 <row><entry><literal>&amp;&amp; (anyrange,anyrange)</literal></entry></row>
154 <row><entry><literal>@&gt; (anyrange,anyelement)</literal></entry></row>
155 <row><entry><literal>@&gt; (anyrange,anyrange)</literal></entry></row>
156 <row><entry><literal>&lt;@ (anyrange,anyrange)</literal></entry></row>
157 <row><entry><literal>&lt;&lt; (anyrange,anyrange)</literal></entry></row>
158 <row><entry><literal>&gt;&gt; (anyrange,anyrange)</literal></entry></row>
159 <row><entry><literal>&amp;&lt; (anyrange,anyrange)</literal></entry></row>
160 <row><entry><literal>&amp;&gt; (anyrange,anyrange)</literal></entry></row>
161 <row><entry><literal>-|- (anyrange,anyrange)</literal></entry></row>
163 <row>
164 <entry valign="middle" morerows="9"><literal>text_ops</literal></entry>
165 <entry><literal>= (text,text)</literal></entry>
166 <entry valign="middle" morerows="9"></entry>
167 </row>
168 <row><entry><literal>&lt; (text,text)</literal></entry></row>
169 <row><entry><literal>&lt;= (text,text)</literal></entry></row>
170 <row><entry><literal>&gt; (text,text)</literal></entry></row>
171 <row><entry><literal>&gt;= (text,text)</literal></entry></row>
172 <row><entry><literal>~&lt;~ (text,text)</literal></entry></row>
173 <row><entry><literal>~&lt;=~ (text,text)</literal></entry></row>
174 <row><entry><literal>~&gt;=~ (text,text)</literal></entry></row>
175 <row><entry><literal>~&gt;~ (text,text)</literal></entry></row>
176 <row><entry><literal>^@ (text,text)</literal></entry></row>
177 </tbody>
178 </tgroup>
179 </table>
181 <para>
182 Of the two operator classes for type <type>point</type>,
183 <literal>quad_point_ops</literal> is the default. <literal>kd_point_ops</literal>
184 supports the same operators but uses a different index data structure that
185 may offer better performance in some applications.
186 </para>
187 <para>
188 The <literal>quad_point_ops</literal>, <literal>kd_point_ops</literal> and
189 <literal>poly_ops</literal> operator classes support the <literal>&lt;-&gt;</literal>
190 ordering operator, which enables the k-nearest neighbor (<literal>k-NN</literal>)
191 search over indexed point or polygon data sets.
192 </para>
194 </sect2>
196 <sect2 id="spgist-extensibility">
197 <title>Extensibility</title>
199 <para>
200 <acronym>SP-GiST</acronym> offers an interface with a high level of
201 abstraction, requiring the access method developer to implement only
202 methods specific to a given data type. The <acronym>SP-GiST</acronym> core
203 is responsible for efficient disk mapping and searching the tree structure.
204 It also takes care of concurrency and logging considerations.
205 </para>
207 <para>
208 Leaf tuples of an <acronym>SP-GiST</acronym> tree usually contain values
209 of the same data type as the indexed column, although it is also possible
210 for them to contain lossy representations of the indexed column.
211 Leaf tuples stored at the root level will directly represent
212 the original indexed data value, but leaf tuples at lower
213 levels might contain only a partial value, such as a suffix.
214 In that case the operator class support functions must be able to
215 reconstruct the original value using information accumulated from the
216 inner tuples that are passed through to reach the leaf level.
217 </para>
219 <para>
220 When an <acronym>SP-GiST</acronym> index is created with
221 <literal>INCLUDE</literal> columns, the values of those columns are also
222 stored in leaf tuples. The <literal>INCLUDE</literal> columns are of no
223 concern to the <acronym>SP-GiST</acronym> operator class, so they are
224 not discussed further here.
225 </para>
227 <para>
228 Inner tuples are more complex, since they are branching points in the
229 search tree. Each inner tuple contains a set of one or more
230 <firstterm>nodes</firstterm>, which represent groups of similar leaf values.
231 A node contains a downlink that leads either to another, lower-level inner
232 tuple, or to a short list of leaf tuples that all lie on the same index page.
233 Each node normally has a <firstterm>label</firstterm> that describes it; for example,
234 in a radix tree the node label could be the next character of the string
235 value. (Alternatively, an operator class can omit the node labels, if it
236 works with a fixed set of nodes for all inner tuples;
237 see <xref linkend="spgist-null-labels"/>.)
238 Optionally, an inner tuple can have a <firstterm>prefix</firstterm> value
239 that describes all its members. In a radix tree this could be the common
240 prefix of the represented strings. The prefix value is not necessarily
241 really a prefix, but can be any data needed by the operator class;
242 for example, in a quad-tree it can store the central point that the four
243 quadrants are measured with respect to. A quad-tree inner tuple would
244 then also contain four nodes corresponding to the quadrants around this
245 central point.
246 </para>
248 <para>
249 Some tree algorithms require knowledge of level (or depth) of the current
250 tuple, so the <acronym>SP-GiST</acronym> core provides the possibility for
251 operator classes to manage level counting while descending the tree.
252 There is also support for incrementally reconstructing the represented
253 value when that is needed, and for passing down additional data (called
254 <firstterm>traverse values</firstterm>) during a tree descent.
255 </para>
257 <note>
258 <para>
259 The <acronym>SP-GiST</acronym> core code takes care of null entries.
260 Although <acronym>SP-GiST</acronym> indexes do store entries for nulls
261 in indexed columns, this is hidden from the index operator class code:
262 no null index entries or search conditions will ever be passed to the
263 operator class methods. (It is assumed that <acronym>SP-GiST</acronym>
264 operators are strict and so cannot succeed for null values.) Null values
265 are therefore not discussed further here.
266 </para>
267 </note>
269 <para>
270 There are five user-defined methods that an index operator class for
271 <acronym>SP-GiST</acronym> must provide, and two are optional. All five
272 mandatory methods follow the convention of accepting two <type>internal</type>
273 arguments, the first of which is a pointer to a C struct containing input
274 values for the support method, while the second argument is a pointer to a
275 C struct where output values must be placed. Four of the mandatory methods just
276 return <type>void</type>, since all their results appear in the output struct; but
277 <function>leaf_consistent</function> returns a <type>boolean</type> result.
278 The methods must not modify any fields of their input structs. In all
279 cases, the output struct is initialized to zeroes before calling the
280 user-defined method. The optional sixth method <function>compress</function>
281 accepts a <type>datum</type> to be indexed as the only argument and returns a value suitable
282 for physical storage in a leaf tuple. The optional seventh method
283 <function>options</function> accepts an <type>internal</type> pointer to a C struct, where
284 opclass-specific parameters should be placed, and returns <type>void</type>.
285 </para>
287 <para>
288 The five mandatory user-defined methods are:
289 </para>
291 <variablelist>
292 <varlistentry>
293 <term><function>config</function></term>
294 <listitem>
295 <para>
296 Returns static information about the index implementation, including
297 the data type OIDs of the prefix and node label data types.
298 </para>
299 <para>
300 The <acronym>SQL</acronym> declaration of the function must look like this:
301 <programlisting>
302 CREATE FUNCTION my_config(internal, internal) RETURNS void ...
303 </programlisting>
304 The first argument is a pointer to a <structname>spgConfigIn</structname>
305 C struct, containing input data for the function.
306 The second argument is a pointer to a <structname>spgConfigOut</structname>
307 C struct, which the function must fill with result data.
308 <programlisting>
309 typedef struct spgConfigIn
311 Oid attType; /* Data type to be indexed */
312 } spgConfigIn;
314 typedef struct spgConfigOut
316 Oid prefixType; /* Data type of inner-tuple prefixes */
317 Oid labelType; /* Data type of inner-tuple node labels */
318 Oid leafType; /* Data type of leaf-tuple values */
319 bool canReturnData; /* Opclass can reconstruct original data */
320 bool longValuesOK; /* Opclass can cope with values &gt; 1 page */
321 } spgConfigOut;
322 </programlisting>
324 <structfield>attType</structfield> is passed in order to support polymorphic
325 index operator classes; for ordinary fixed-data-type operator classes, it
326 will always have the same value and so can be ignored.
327 </para>
329 <para>
330 For operator classes that do not use prefixes,
331 <structfield>prefixType</structfield> can be set to <literal>VOIDOID</literal>.
332 Likewise, for operator classes that do not use node labels,
333 <structfield>labelType</structfield> can be set to <literal>VOIDOID</literal>.
334 <structfield>canReturnData</structfield> should be set true if the operator class
335 is capable of reconstructing the originally-supplied index value.
336 <structfield>longValuesOK</structfield> should be set true only when the
337 <structfield>attType</structfield> is of variable length and the operator
338 class is capable of segmenting long values by repeated suffixing
339 (see <xref linkend="spgist-limits"/>).
340 </para>
342 <para>
343 <structfield>leafType</structfield> should match the index storage type
344 defined by the operator class's <structfield>opckeytype</structfield>
345 catalog entry.
346 (Note that <structfield>opckeytype</structfield> can be zero,
347 implying the storage type is the same as the operator class's input
348 type, which is the most common situation.)
349 For reasons of backward compatibility, the <function>config</function>
350 method can set <structfield>leafType</structfield> to some other value,
351 and that value will be used; but this is deprecated since the index
352 contents are then incorrectly identified in the catalogs.
353 Also, it's permissible to
354 leave <structfield>leafType</structfield> uninitialized (zero);
355 that is interpreted as meaning the index storage type derived from
356 <structfield>opckeytype</structfield>.
357 </para>
359 <para>
360 When <structfield>attType</structfield>
361 and <structfield>leafType</structfield> are different, the optional
362 method <function>compress</function> must be provided.
363 Method <function>compress</function> is responsible
364 for transformation of datums to be indexed from <structfield>attType</structfield>
365 to <structfield>leafType</structfield>.
366 </para>
367 </listitem>
368 </varlistentry>
370 <varlistentry>
371 <term><function>choose</function></term>
372 <listitem>
373 <para>
374 Chooses a method for inserting a new value into an inner tuple.
375 </para>
377 <para>
378 The <acronym>SQL</acronym> declaration of the function must look like this:
379 <programlisting>
380 CREATE FUNCTION my_choose(internal, internal) RETURNS void ...
381 </programlisting>
382 The first argument is a pointer to a <structname>spgChooseIn</structname>
383 C struct, containing input data for the function.
384 The second argument is a pointer to a <structname>spgChooseOut</structname>
385 C struct, which the function must fill with result data.
386 <programlisting>
387 typedef struct spgChooseIn
389 Datum datum; /* original datum to be indexed */
390 Datum leafDatum; /* current datum to be stored at leaf */
391 int level; /* current level (counting from zero) */
393 /* Data from current inner tuple */
394 bool allTheSame; /* tuple is marked all-the-same? */
395 bool hasPrefix; /* tuple has a prefix? */
396 Datum prefixDatum; /* if so, the prefix value */
397 int nNodes; /* number of nodes in the inner tuple */
398 Datum *nodeLabels; /* node label values (NULL if none) */
399 } spgChooseIn;
401 typedef enum spgChooseResultType
403 spgMatchNode = 1, /* descend into existing node */
404 spgAddNode, /* add a node to the inner tuple */
405 spgSplitTuple /* split inner tuple (change its prefix) */
406 } spgChooseResultType;
408 typedef struct spgChooseOut
410 spgChooseResultType resultType; /* action code, see above */
411 union
413 struct /* results for spgMatchNode */
415 int nodeN; /* descend to this node (index from 0) */
416 int levelAdd; /* increment level by this much */
417 Datum restDatum; /* new leaf datum */
418 } matchNode;
419 struct /* results for spgAddNode */
421 Datum nodeLabel; /* new node's label */
422 int nodeN; /* where to insert it (index from 0) */
423 } addNode;
424 struct /* results for spgSplitTuple */
426 /* Info to form new upper-level inner tuple with one child tuple */
427 bool prefixHasPrefix; /* tuple should have a prefix? */
428 Datum prefixPrefixDatum; /* if so, its value */
429 int prefixNNodes; /* number of nodes */
430 Datum *prefixNodeLabels; /* their labels (or NULL for
431 * no labels) */
432 int childNodeN; /* which node gets child tuple */
434 /* Info to form new lower-level inner tuple with all old nodes */
435 bool postfixHasPrefix; /* tuple should have a prefix? */
436 Datum postfixPrefixDatum; /* if so, its value */
437 } splitTuple;
438 } result;
439 } spgChooseOut;
440 </programlisting>
442 <structfield>datum</structfield> is the original datum of
443 <structname>spgConfigIn</structname>.<structfield>attType</structfield>
444 type that was to be inserted into the index.
445 <structfield>leafDatum</structfield> is a value of
446 <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
447 type, which is initially a result of method
448 <function>compress</function> applied to <structfield>datum</structfield>
449 when method <function>compress</function> is provided, or the same value as
450 <structfield>datum</structfield> otherwise.
451 <structfield>leafDatum</structfield> can change at lower levels of the tree
452 if the <function>choose</function> or <function>picksplit</function>
453 methods change it. When the insertion search reaches a leaf page,
454 the current value of <structfield>leafDatum</structfield> is what will be stored
455 in the newly created leaf tuple.
456 <structfield>level</structfield> is the current inner tuple's level, starting at
457 zero for the root level.
458 <structfield>allTheSame</structfield> is true if the current inner tuple is
459 marked as containing multiple equivalent nodes
460 (see <xref linkend="spgist-all-the-same"/>).
461 <structfield>hasPrefix</structfield> is true if the current inner tuple contains
462 a prefix; if so,
463 <structfield>prefixDatum</structfield> is its value.
464 <structfield>nNodes</structfield> is the number of child nodes contained in the
465 inner tuple, and
466 <structfield>nodeLabels</structfield> is an array of their label values, or
467 NULL if there are no labels.
468 </para>
470 <para>
471 The <function>choose</function> function can determine either that
472 the new value matches one of the existing child nodes, or that a new
473 child node must be added, or that the new value is inconsistent with
474 the tuple prefix and so the inner tuple must be split to create a
475 less restrictive prefix.
476 </para>
478 <para>
479 If the new value matches one of the existing child nodes,
480 set <structfield>resultType</structfield> to <literal>spgMatchNode</literal>.
481 Set <structfield>nodeN</structfield> to the index (from zero) of that node in
482 the node array.
483 Set <structfield>levelAdd</structfield> to the increment in
484 <structfield>level</structfield> caused by descending through that node,
485 or leave it as zero if the operator class does not use levels.
486 Set <structfield>restDatum</structfield> to equal <structfield>leafDatum</structfield>
487 if the operator class does not modify datums from one level to the
488 next, or otherwise set it to the modified value to be used as
489 <structfield>leafDatum</structfield> at the next level.
490 </para>
492 <para>
493 If a new child node must be added,
494 set <structfield>resultType</structfield> to <literal>spgAddNode</literal>.
495 Set <structfield>nodeLabel</structfield> to the label to be used for the new
496 node, and set <structfield>nodeN</structfield> to the index (from zero) at which
497 to insert the node in the node array.
498 After the node has been added, the <function>choose</function>
499 function will be called again with the modified inner tuple;
500 that call should result in an <literal>spgMatchNode</literal> result.
501 </para>
503 <para>
504 If the new value is inconsistent with the tuple prefix,
505 set <structfield>resultType</structfield> to <literal>spgSplitTuple</literal>.
506 This action moves all the existing nodes into a new lower-level
507 inner tuple, and replaces the existing inner tuple with a tuple
508 having a single downlink pointing to the new lower-level inner tuple.
509 Set <structfield>prefixHasPrefix</structfield> to indicate whether the new
510 upper tuple should have a prefix, and if so set
511 <structfield>prefixPrefixDatum</structfield> to the prefix value. This new
512 prefix value must be sufficiently less restrictive than the original
513 to accept the new value to be indexed.
514 Set <structfield>prefixNNodes</structfield> to the number of nodes needed in the
515 new tuple, and set <structfield>prefixNodeLabels</structfield> to a palloc'd array
516 holding their labels, or to NULL if node labels are not required.
517 Note that the total size of the new upper tuple must be no more
518 than the total size of the tuple it is replacing; this constrains
519 the lengths of the new prefix and new labels.
520 Set <structfield>childNodeN</structfield> to the index (from zero) of the node
521 that will downlink to the new lower-level inner tuple.
522 Set <structfield>postfixHasPrefix</structfield> to indicate whether the new
523 lower-level inner tuple should have a prefix, and if so set
524 <structfield>postfixPrefixDatum</structfield> to the prefix value. The
525 combination of these two prefixes and the downlink node's label
526 (if any) must have the same meaning as the original prefix, because
527 there is no opportunity to alter the node labels that are moved to
528 the new lower-level tuple, nor to change any child index entries.
529 After the node has been split, the <function>choose</function>
530 function will be called again with the replacement inner tuple.
531 That call may return an <literal>spgAddNode</literal> result, if no suitable
532 node was created by the <literal>spgSplitTuple</literal> action. Eventually
533 <function>choose</function> must return <literal>spgMatchNode</literal> to
534 allow the insertion to descend to the next level.
535 </para>
536 </listitem>
537 </varlistentry>
539 <varlistentry>
540 <term><function>picksplit</function></term>
541 <listitem>
542 <para>
543 Decides how to create a new inner tuple over a set of leaf tuples.
544 </para>
546 <para>
547 The <acronym>SQL</acronym> declaration of the function must look like this:
548 <programlisting>
549 CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...
550 </programlisting>
551 The first argument is a pointer to a <structname>spgPickSplitIn</structname>
552 C struct, containing input data for the function.
553 The second argument is a pointer to a <structname>spgPickSplitOut</structname>
554 C struct, which the function must fill with result data.
555 <programlisting>
556 typedef struct spgPickSplitIn
558 int nTuples; /* number of leaf tuples */
559 Datum *datums; /* their datums (array of length nTuples) */
560 int level; /* current level (counting from zero) */
561 } spgPickSplitIn;
563 typedef struct spgPickSplitOut
565 bool hasPrefix; /* new inner tuple should have a prefix? */
566 Datum prefixDatum; /* if so, its value */
568 int nNodes; /* number of nodes for new inner tuple */
569 Datum *nodeLabels; /* their labels (or NULL for no labels) */
571 int *mapTuplesToNodes; /* node index for each leaf tuple */
572 Datum *leafTupleDatums; /* datum to store in each new leaf tuple */
573 } spgPickSplitOut;
574 </programlisting>
576 <structfield>nTuples</structfield> is the number of leaf tuples provided.
577 <structfield>datums</structfield> is an array of their datum values of
578 <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
579 type.
580 <structfield>level</structfield> is the current level that all the leaf tuples
581 share, which will become the level of the new inner tuple.
582 </para>
584 <para>
585 Set <structfield>hasPrefix</structfield> to indicate whether the new inner
586 tuple should have a prefix, and if so set
587 <structfield>prefixDatum</structfield> to the prefix value.
588 Set <structfield>nNodes</structfield> to indicate the number of nodes that
589 the new inner tuple will contain, and
590 set <structfield>nodeLabels</structfield> to an array of their label values,
591 or to NULL if node labels are not required.
592 Set <structfield>mapTuplesToNodes</structfield> to an array that gives the index
593 (from zero) of the node that each leaf tuple should be assigned to.
594 Set <structfield>leafTupleDatums</structfield> to an array of the values to
595 be stored in the new leaf tuples (these will be the same as the
596 input <structfield>datums</structfield> if the operator class does not modify
597 datums from one level to the next).
598 Note that the <function>picksplit</function> function is
599 responsible for palloc'ing the
600 <structfield>nodeLabels</structfield>, <structfield>mapTuplesToNodes</structfield> and
601 <structfield>leafTupleDatums</structfield> arrays.
602 </para>
604 <para>
605 If more than one leaf tuple is supplied, it is expected that the
606 <function>picksplit</function> function will classify them into more than
607 one node; otherwise it is not possible to split the leaf tuples
608 across multiple pages, which is the ultimate purpose of this
609 operation. Therefore, if the <function>picksplit</function> function
610 ends up placing all the leaf tuples in the same node, the core
611 SP-GiST code will override that decision and generate an inner
612 tuple in which the leaf tuples are assigned at random to several
613 identically-labeled nodes. Such a tuple is marked
614 <literal>allTheSame</literal> to signify that this has happened. The
615 <function>choose</function> and <function>inner_consistent</function> functions
616 must take suitable care with such inner tuples.
617 See <xref linkend="spgist-all-the-same"/> for more information.
618 </para>
620 <para>
621 <function>picksplit</function> can be applied to a single leaf tuple only
622 in the case that the <function>config</function> function set
623 <structfield>longValuesOK</structfield> to true and a larger-than-a-page input
624 value has been supplied. In this case the point of the operation is
625 to strip off a prefix and produce a new, shorter leaf datum value.
626 The call will be repeated until a leaf datum short enough to fit on
627 a page has been produced. See <xref linkend="spgist-limits"/> for
628 more information.
629 </para>
630 </listitem>
631 </varlistentry>
633 <varlistentry>
634 <term><function>inner_consistent</function></term>
635 <listitem>
636 <para>
637 Returns set of nodes (branches) to follow during tree search.
638 </para>
640 <para>
641 The <acronym>SQL</acronym> declaration of the function must look like this:
642 <programlisting>
643 CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...
644 </programlisting>
645 The first argument is a pointer to a <structname>spgInnerConsistentIn</structname>
646 C struct, containing input data for the function.
647 The second argument is a pointer to a <structname>spgInnerConsistentOut</structname>
648 C struct, which the function must fill with result data.
650 <programlisting>
651 typedef struct spgInnerConsistentIn
653 ScanKey scankeys; /* array of operators and comparison values */
654 ScanKey orderbys; /* array of ordering operators and comparison
655 * values */
656 int nkeys; /* length of scankeys array */
657 int norderbys; /* length of orderbys array */
659 Datum reconstructedValue; /* value reconstructed at parent */
660 void *traversalValue; /* opclass-specific traverse value */
661 MemoryContext traversalMemoryContext; /* put new traverse values here */
662 int level; /* current level (counting from zero) */
663 bool returnData; /* original data must be returned? */
665 /* Data from current inner tuple */
666 bool allTheSame; /* tuple is marked all-the-same? */
667 bool hasPrefix; /* tuple has a prefix? */
668 Datum prefixDatum; /* if so, the prefix value */
669 int nNodes; /* number of nodes in the inner tuple */
670 Datum *nodeLabels; /* node label values (NULL if none) */
671 } spgInnerConsistentIn;
673 typedef struct spgInnerConsistentOut
675 int nNodes; /* number of child nodes to be visited */
676 int *nodeNumbers; /* their indexes in the node array */
677 int *levelAdds; /* increment level by this much for each */
678 Datum *reconstructedValues; /* associated reconstructed values */
679 void **traversalValues; /* opclass-specific traverse values */
680 double **distances; /* associated distances */
681 } spgInnerConsistentOut;
682 </programlisting>
684 The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
685 describes the index search condition(s). These conditions are
686 combined with AND &mdash; only index entries that satisfy all of
687 them are interesting. (Note that <structfield>nkeys</structfield> = 0 implies
688 that all index entries satisfy the query.) Usually the consistent
689 function only cares about the <structfield>sk_strategy</structfield> and
690 <structfield>sk_argument</structfield> fields of each array entry, which
691 respectively give the indexable operator and comparison value.
692 In particular it is not necessary to check <structfield>sk_flags</structfield> to
693 see if the comparison value is NULL, because the SP-GiST core code
694 will filter out such conditions.
695 The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
696 describes ordering operators (if any) in the same manner.
697 <structfield>reconstructedValue</structfield> is the value reconstructed for the
698 parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
699 <function>inner_consistent</function> function did not provide a value at the
700 parent level.
701 <structfield>traversalValue</structfield> is a pointer to any traverse data
702 passed down from the previous call of <function>inner_consistent</function>
703 on the parent index tuple, or NULL at the root level.
704 <structfield>traversalMemoryContext</structfield> is the memory context in which
705 to store output traverse values (see below).
706 <structfield>level</structfield> is the current inner tuple's level, starting at
707 zero for the root level.
708 <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
709 required for this query; this will only be so if the
710 <function>config</function> function asserted <structfield>canReturnData</structfield>.
711 <structfield>allTheSame</structfield> is true if the current inner tuple is
712 marked <quote>all-the-same</quote>; in this case all the nodes have the
713 same label (if any) and so either all or none of them match the query
714 (see <xref linkend="spgist-all-the-same"/>).
715 <structfield>hasPrefix</structfield> is true if the current inner tuple contains
716 a prefix; if so,
717 <structfield>prefixDatum</structfield> is its value.
718 <structfield>nNodes</structfield> is the number of child nodes contained in the
719 inner tuple, and
720 <structfield>nodeLabels</structfield> is an array of their label values, or
721 NULL if the nodes do not have labels.
722 </para>
724 <para>
725 <structfield>nNodes</structfield> must be set to the number of child nodes that
726 need to be visited by the search, and
727 <structfield>nodeNumbers</structfield> must be set to an array of their indexes.
728 If the operator class keeps track of levels, set
729 <structfield>levelAdds</structfield> to an array of the level increments
730 required when descending to each node to be visited. (Often these
731 increments will be the same for all the nodes, but that's not
732 necessarily so, so an array is used.)
733 If value reconstruction is needed, set
734 <structfield>reconstructedValues</structfield> to an array of the values
735 reconstructed for each child node to be visited; otherwise, leave
736 <structfield>reconstructedValues</structfield> as NULL.
737 The reconstructed values are assumed to be of type
738 <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
739 (However, since the core system will do nothing with them except
740 possibly copy them, it is sufficient for them to have the
741 same <literal>typlen</literal> and <literal>typbyval</literal>
742 properties as <structfield>leafType</structfield>.)
743 If ordered search is performed, set <structfield>distances</structfield>
744 to an array of distance values according to <structfield>orderbys</structfield>
745 array (nodes with lowest distances will be processed first). Leave it
746 NULL otherwise.
747 If it is desired to pass down additional out-of-band information
748 (<quote>traverse values</quote>) to lower levels of the tree search,
749 set <structfield>traversalValues</structfield> to an array of the appropriate
750 traverse values, one for each child node to be visited; otherwise,
751 leave <structfield>traversalValues</structfield> as NULL.
752 Note that the <function>inner_consistent</function> function is
753 responsible for palloc'ing the
754 <structfield>nodeNumbers</structfield>, <structfield>levelAdds</structfield>,
755 <structfield>distances</structfield>,
756 <structfield>reconstructedValues</structfield>, and
757 <structfield>traversalValues</structfield> arrays in the current memory context.
758 However, any output traverse values pointed to by
759 the <structfield>traversalValues</structfield> array should be allocated
760 in <structfield>traversalMemoryContext</structfield>.
761 Each traverse value must be a single palloc'd chunk.
762 </para>
763 </listitem>
764 </varlistentry>
766 <varlistentry>
767 <term><function>leaf_consistent</function></term>
768 <listitem>
769 <para>
770 Returns true if a leaf tuple satisfies a query.
771 </para>
773 <para>
774 The <acronym>SQL</acronym> declaration of the function must look like this:
775 <programlisting>
776 CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...
777 </programlisting>
778 The first argument is a pointer to a <structname>spgLeafConsistentIn</structname>
779 C struct, containing input data for the function.
780 The second argument is a pointer to a <structname>spgLeafConsistentOut</structname>
781 C struct, which the function must fill with result data.
782 <programlisting>
783 typedef struct spgLeafConsistentIn
785 ScanKey scankeys; /* array of operators and comparison values */
786 ScanKey orderbys; /* array of ordering operators and comparison
787 * values */
788 int nkeys; /* length of scankeys array */
789 int norderbys; /* length of orderbys array */
791 Datum reconstructedValue; /* value reconstructed at parent */
792 void *traversalValue; /* opclass-specific traverse value */
793 int level; /* current level (counting from zero) */
794 bool returnData; /* original data must be returned? */
796 Datum leafDatum; /* datum in leaf tuple */
797 } spgLeafConsistentIn;
799 typedef struct spgLeafConsistentOut
801 Datum leafValue; /* reconstructed original data, if any */
802 bool recheck; /* set true if operator must be rechecked */
803 bool recheckDistances; /* set true if distances must be rechecked */
804 double *distances; /* associated distances */
805 } spgLeafConsistentOut;
806 </programlisting>
808 The array <structfield>scankeys</structfield>, of length <structfield>nkeys</structfield>,
809 describes the index search condition(s). These conditions are
810 combined with AND &mdash; only index entries that satisfy all of
811 them satisfy the query. (Note that <structfield>nkeys</structfield> = 0 implies
812 that all index entries satisfy the query.) Usually the consistent
813 function only cares about the <structfield>sk_strategy</structfield> and
814 <structfield>sk_argument</structfield> fields of each array entry, which
815 respectively give the indexable operator and comparison value.
816 In particular it is not necessary to check <structfield>sk_flags</structfield> to
817 see if the comparison value is NULL, because the SP-GiST core code
818 will filter out such conditions.
819 The array <structfield>orderbys</structfield>, of length <structfield>norderbys</structfield>,
820 describes the ordering operators in the same manner.
821 <structfield>reconstructedValue</structfield> is the value reconstructed for the
822 parent tuple; it is <literal>(Datum) 0</literal> at the root level or if the
823 <function>inner_consistent</function> function did not provide a value at the
824 parent level.
825 <structfield>traversalValue</structfield> is a pointer to any traverse data
826 passed down from the previous call of <function>inner_consistent</function>
827 on the parent index tuple, or NULL at the root level.
828 <structfield>level</structfield> is the current leaf tuple's level, starting at
829 zero for the root level.
830 <structfield>returnData</structfield> is <literal>true</literal> if reconstructed data is
831 required for this query; this will only be so if the
832 <function>config</function> function asserted <structfield>canReturnData</structfield>.
833 <structfield>leafDatum</structfield> is the key value of
834 <structname>spgConfigOut</structname>.<structfield>leafType</structfield>
835 stored in the current leaf tuple.
836 </para>
838 <para>
839 The function must return <literal>true</literal> if the leaf tuple matches the
840 query, or <literal>false</literal> if not. In the <literal>true</literal> case,
841 if <structfield>returnData</structfield> is <literal>true</literal> then
842 <structfield>leafValue</structfield> must be set to the value (of type
843 <structname>spgConfigIn</structname>.<structfield>attType</structfield>)
844 originally supplied to be indexed for this leaf tuple. Also,
845 <structfield>recheck</structfield> may be set to <literal>true</literal> if the match
846 is uncertain and so the operator(s) must be re-applied to the actual
847 heap tuple to verify the match.
848 If ordered search is performed, set <structfield>distances</structfield>
849 to an array of distance values according to <structfield>orderbys</structfield>
850 array. Leave it NULL otherwise. If at least one of returned distances
851 is not exact, set <structfield>recheckDistances</structfield> to true.
852 In this case, the executor will calculate the exact distances after
853 fetching the tuple from the heap, and will reorder the tuples if needed.
854 </para>
855 </listitem>
856 </varlistentry>
857 </variablelist>
859 <para>
860 The optional user-defined methods are:
861 </para>
863 <variablelist>
864 <varlistentry>
865 <term><function>Datum compress(Datum in)</function></term>
866 <listitem>
867 <para>
868 Converts a data item into a format suitable for physical storage in
869 a leaf tuple of the index. It accepts a value of type
870 <structname>spgConfigIn</structname>.<structfield>attType</structfield>
871 and returns a value of type
872 <structname>spgConfigOut</structname>.<structfield>leafType</structfield>.
873 The output value must not contain an out-of-line TOAST pointer.
874 </para>
876 <para>
877 Note: the <function>compress</function> method is only applied to
878 values to be stored. The consistent methods receive query
879 <structfield>scankeys</structfield> unchanged, without transformation
880 using <function>compress</function>.
881 </para>
882 </listitem>
883 </varlistentry>
885 <varlistentry>
886 <term><function>options</function></term>
887 <listitem>
888 <para>
889 Defines a set of user-visible parameters that control operator class
890 behavior.
891 </para>
893 <para>
894 The <acronym>SQL</acronym> declaration of the function must look like this:
896 <programlisting>
897 CREATE OR REPLACE FUNCTION my_options(internal)
898 RETURNS void
899 AS 'MODULE_PATHNAME'
900 LANGUAGE C STRICT;
901 </programlisting>
902 </para>
904 <para>
905 The function is passed a pointer to a <structname>local_relopts</structname>
906 struct, which needs to be filled with a set of operator class
907 specific options. The options can be accessed from other support
908 functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and
909 <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros.
910 </para>
912 <para>
913 Since the representation of the key in <acronym>SP-GiST</acronym> is
914 flexible, it may depend on user-specified parameters.
915 </para>
916 </listitem>
917 </varlistentry>
918 </variablelist>
920 <para>
921 All the SP-GiST support methods are normally called in a short-lived
922 memory context; that is, <varname>CurrentMemoryContext</varname> will be reset
923 after processing of each tuple. It is therefore not very important to
924 worry about pfree'ing everything you palloc. (The <function>config</function>
925 method is an exception: it should try to avoid leaking memory. But
926 usually the <function>config</function> method need do nothing but assign
927 constants into the passed parameter struct.)
928 </para>
930 <para>
931 If the indexed column is of a collatable data type, the index collation
932 will be passed to all the support methods, using the standard
933 <function>PG_GET_COLLATION()</function> mechanism.
934 </para>
936 </sect2>
938 <sect2 id="spgist-implementation">
939 <title>Implementation</title>
941 <para>
942 This section covers implementation details and other tricks that are
943 useful for implementers of <acronym>SP-GiST</acronym> operator classes to
944 know.
945 </para>
947 <sect3 id="spgist-limits">
948 <title>SP-GiST Limits</title>
950 <para>
951 Individual leaf tuples and inner tuples must fit on a single index page
952 (8kB by default). Therefore, when indexing values of variable-length
953 data types, long values can only be supported by methods such as radix
954 trees, in which each level of the tree includes a prefix that is short
955 enough to fit on a page, and the final leaf level includes a suffix also
956 short enough to fit on a page. The operator class should set
957 <structfield>longValuesOK</structfield> to true only if it is prepared to arrange for
958 this to happen. Otherwise, the <acronym>SP-GiST</acronym> core will
959 reject any request to index a value that is too large to fit
960 on an index page.
961 </para>
963 <para>
964 Likewise, it is the operator class's responsibility that inner tuples
965 do not grow too large to fit on an index page; this limits the number
966 of child nodes that can be used in one inner tuple, as well as the
967 maximum size of a prefix value.
968 </para>
970 <para>
971 Another limitation is that when an inner tuple's node points to a set
972 of leaf tuples, those tuples must all be in the same index page.
973 (This is a design decision to reduce seeking and save space in the
974 links that chain such tuples together.) If the set of leaf tuples
975 grows too large for a page, a split is performed and an intermediate
976 inner tuple is inserted. For this to fix the problem, the new inner
977 tuple <emphasis>must</emphasis> divide the set of leaf values into more than one
978 node group. If the operator class's <function>picksplit</function> function
979 fails to do that, the <acronym>SP-GiST</acronym> core resorts to
980 extraordinary measures described in <xref linkend="spgist-all-the-same"/>.
981 </para>
983 <para>
984 When <structfield>longValuesOK</structfield> is true, it is expected
985 that successive levels of the <acronym>SP-GiST</acronym> tree will
986 absorb more and more information into the prefixes and node labels of
987 the inner tuples, making the required leaf datum smaller and smaller,
988 so that eventually it will fit on a page.
989 To prevent bugs in operator classes from causing infinite insertion
990 loops, the <acronym>SP-GiST</acronym> core will raise an error if the
991 leaf datum does not become any smaller within ten cycles
992 of <function>choose</function> method calls.
993 </para>
994 </sect3>
996 <sect3 id="spgist-null-labels">
997 <title>SP-GiST Without Node Labels</title>
999 <para>
1000 Some tree algorithms use a fixed set of nodes for each inner tuple;
1001 for example, in a quad-tree there are always exactly four nodes
1002 corresponding to the four quadrants around the inner tuple's centroid
1003 point. In such a case the code typically works with the nodes by
1004 number, and there is no need for explicit node labels. To suppress
1005 node labels (and thereby save some space), the <function>picksplit</function>
1006 function can return NULL for the <structfield>nodeLabels</structfield> array,
1007 and likewise the <function>choose</function> function can return NULL for
1008 the <structfield>prefixNodeLabels</structfield> array during
1009 a <literal>spgSplitTuple</literal> action.
1010 This will in turn result in <structfield>nodeLabels</structfield> being NULL during
1011 subsequent calls to <function>choose</function> and <function>inner_consistent</function>.
1012 In principle, node labels could be used for some inner tuples and omitted
1013 for others in the same index.
1014 </para>
1016 <para>
1017 When working with an inner tuple having unlabeled nodes, it is an error
1018 for <function>choose</function> to return <literal>spgAddNode</literal>, since the set
1019 of nodes is supposed to be fixed in such cases.
1020 </para>
1021 </sect3>
1023 <sect3 id="spgist-all-the-same">
1024 <title><quote>All-the-Same</quote> Inner Tuples</title>
1026 <para>
1027 The <acronym>SP-GiST</acronym> core can override the results of the
1028 operator class's <function>picksplit</function> function when
1029 <function>picksplit</function> fails to divide the supplied leaf values into
1030 at least two node categories. When this happens, the new inner tuple
1031 is created with multiple nodes that each have the same label (if any)
1032 that <function>picksplit</function> gave to the one node it did use, and the
1033 leaf values are divided at random among these equivalent nodes.
1034 The <literal>allTheSame</literal> flag is set on the inner tuple to warn the
1035 <function>choose</function> and <function>inner_consistent</function> functions that the
1036 tuple does not have the node set that they might otherwise expect.
1037 </para>
1039 <para>
1040 When dealing with an <literal>allTheSame</literal> tuple, a <function>choose</function>
1041 result of <literal>spgMatchNode</literal> is interpreted to mean that the new
1042 value can be assigned to any of the equivalent nodes; the core code will
1043 ignore the supplied <structfield>nodeN</structfield> value and descend into one
1044 of the nodes at random (so as to keep the tree balanced). It is an
1045 error for <function>choose</function> to return <literal>spgAddNode</literal>, since
1046 that would make the nodes not all equivalent; the
1047 <literal>spgSplitTuple</literal> action must be used if the value to be inserted
1048 doesn't match the existing nodes.
1049 </para>
1051 <para>
1052 When dealing with an <literal>allTheSame</literal> tuple, the
1053 <function>inner_consistent</function> function should return either all or none
1054 of the nodes as targets for continuing the index search, since they are
1055 all equivalent. This may or may not require any special-case code,
1056 depending on how much the <function>inner_consistent</function> function normally
1057 assumes about the meaning of the nodes.
1058 </para>
1059 </sect3>
1061 </sect2>
1063 <sect2 id="spgist-examples">
1064 <title>Examples</title>
1066 <para>
1067 The <productname>PostgreSQL</productname> source distribution includes
1068 several examples of index operator classes for <acronym>SP-GiST</acronym>,
1069 as described in <xref linkend="spgist-builtin-opclasses-table"/>. Look
1070 into <filename>src/backend/access/spgist/</filename>
1071 and <filename>src/backend/utils/adt/</filename> to see the code.
1072 </para>
1074 </sect2>
1076 </sect1>