At update of non-LP_NORMAL TID, fail instead of corrupting page header.
[pgsql.git] / doc / src / sgml / gist.sgml
blob99098ab2522831815f9b40d805656328bede09a2
1 <!-- doc/src/sgml/gist.sgml -->
3 <sect1 id="gist">
4 <title>GiST Indexes</title>
6 <indexterm>
7 <primary>index</primary>
8 <secondary>GiST</secondary>
9 </indexterm>
11 <sect2 id="gist-intro">
12 <title>Introduction</title>
14 <para>
15 <acronym>GiST</acronym> stands for Generalized Search Tree. It is a
16 balanced, tree-structured access method, that acts as a base template in
17 which to implement arbitrary indexing schemes. B-trees, R-trees and many
18 other indexing schemes can be implemented in <acronym>GiST</acronym>.
19 </para>
21 <para>
22 One advantage of <acronym>GiST</acronym> is that it allows the development
23 of custom data types with the appropriate access methods, by
24 an expert in the domain of the data type, rather than a database expert.
25 </para>
27 <para>
28 Some of the information here is derived from the University of California
29 at Berkeley's GiST Indexing Project
30 <ulink url="http://gist.cs.berkeley.edu/">web site</ulink> and
31 Marcel Kornacker's thesis,
32 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz">
33 Access Methods for Next-Generation Database Systems</ulink>.
34 The <acronym>GiST</acronym>
35 implementation in <productname>PostgreSQL</productname> is primarily
36 maintained by Teodor Sigaev and Oleg Bartunov, and there is more
37 information on their
38 <ulink url="http://www.sai.msu.su/~megera/postgres/gist/">web site</ulink>.
39 </para>
41 </sect2>
43 <sect2 id="gist-builtin-opclasses">
44 <title>Built-in Operator Classes</title>
46 <para>
47 The core <productname>PostgreSQL</productname> distribution
48 includes the <acronym>GiST</acronym> operator classes shown in
49 <xref linkend="gist-builtin-opclasses-table"/>.
50 (Some of the optional modules described in <xref linkend="contrib"/>
51 provide additional <acronym>GiST</acronym> operator classes.)
52 </para>
54 <table id="gist-builtin-opclasses-table">
55 <title>Built-in <acronym>GiST</acronym> Operator Classes</title>
56 <tgroup cols="3">
57 <colspec colname="col1" colwidth="2*"/>
58 <colspec colname="col2" colwidth="3*"/>
59 <colspec colname="col3" colwidth="2*"/>
60 <thead>
61 <row>
62 <entry>Name</entry>
63 <entry>Indexable Operators</entry>
64 <entry>Ordering Operators</entry>
65 </row>
66 </thead>
67 <tbody>
68 <row>
69 <entry valign="middle" morerows="11"><literal>box_ops</literal></entry>
70 <entry><literal>&lt;&lt; (box, box)</literal></entry>
71 <entry valign="middle" morerows="11"><literal>&lt;-&gt; (box, point)</literal></entry>
72 </row>
73 <row><entry><literal>&amp;&lt; (box, box)</literal></entry></row>
74 <row><entry><literal>&amp;&amp; (box, box)</literal></entry></row>
75 <row><entry><literal>&amp;&gt; (box, box)</literal></entry></row>
76 <row><entry><literal>&gt;&gt; (box, box)</literal></entry></row>
77 <row><entry><literal>~= (box, box)</literal></entry></row>
78 <row><entry><literal>@&gt; (box, box)</literal></entry></row>
79 <row><entry><literal>&lt;@ (box, box)</literal></entry></row>
80 <row><entry><literal>&amp;&lt;| (box, box)</literal></entry></row>
81 <row><entry><literal>&lt;&lt;| (box, box)</literal></entry></row>
82 <row><entry><literal>|&gt;&gt; (box, box)</literal></entry></row>
83 <row><entry><literal>|&amp;&gt; (box, box)</literal></entry></row>
85 <row>
86 <entry valign="middle" morerows="11"><literal>circle_ops</literal></entry>
87 <entry><literal>&lt;&lt; (circle, circle)</literal></entry>
88 <entry valign="middle" morerows="11"><literal>&lt;-&gt; (circle, point)</literal></entry>
89 </row>
90 <row><entry><literal>&amp;&lt; (circle, circle)</literal></entry></row>
91 <row><entry><literal>&amp;&gt; (circle, circle)</literal></entry></row>
92 <row><entry><literal>&gt;&gt; (circle, circle)</literal></entry></row>
93 <row><entry><literal>&lt;@ (circle, circle)</literal></entry></row>
94 <row><entry><literal>@&gt; (circle, circle)</literal></entry></row>
95 <row><entry><literal>~= (circle, circle)</literal></entry></row>
96 <row><entry><literal>&amp;&amp; (circle, circle)</literal></entry></row>
97 <row><entry><literal>|&gt;&gt; (circle, circle)</literal></entry></row>
98 <row><entry><literal>&lt;&lt;| (circle, circle)</literal></entry></row>
99 <row><entry><literal>&amp;&lt;| (circle, circle)</literal></entry></row>
100 <row><entry><literal>|&amp;&gt; (circle, circle)</literal></entry></row>
102 <row>
103 <entry valign="middle" morerows="10"><literal>inet_ops</literal></entry>
104 <entry><literal>&lt;&lt; (inet, inet)</literal></entry>
105 <entry valign="middle" morerows="10"></entry>
106 </row>
107 <row><entry><literal>&lt;&lt;= (inet, inet)</literal></entry></row>
108 <row><entry><literal>&gt;&gt; (inet, inet)</literal></entry></row>
109 <row><entry><literal>&gt;&gt;= (inet, inet)</literal></entry></row>
110 <row><entry><literal>= (inet, inet)</literal></entry></row>
111 <row><entry><literal>&lt;&gt; (inet, inet)</literal></entry></row>
112 <row><entry><literal>&lt; (inet, inet)</literal></entry></row>
113 <row><entry><literal>&lt;= (inet, inet)</literal></entry></row>
114 <row><entry><literal>&gt; (inet, inet)</literal></entry></row>
115 <row><entry><literal>&gt;= (inet, inet)</literal></entry></row>
116 <row><entry><literal>&amp;&amp; (inet, inet)</literal></entry></row>
118 <row>
119 <entry valign="middle" morerows="17"><literal>multirange_ops</literal></entry>
120 <entry><literal>= (anymultirange, anymultirange)</literal></entry>
121 <entry valign="middle" morerows="17"></entry>
122 </row>
123 <row><entry><literal>&amp;&amp; (anymultirange, anymultirange)</literal></entry></row>
124 <row><entry><literal>&amp;&amp; (anymultirange, anyrange)</literal></entry></row>
125 <row><entry><literal>@&gt; (anymultirange, anyelement)</literal></entry></row>
126 <row><entry><literal>@&gt; (anymultirange, anymultirange)</literal></entry></row>
127 <row><entry><literal>@&gt; (anymultirange, anyrange)</literal></entry></row>
128 <row><entry><literal>&lt;@ (anymultirange, anymultirange)</literal></entry></row>
129 <row><entry><literal>&lt;@ (anymultirange, anyrange)</literal></entry></row>
130 <row><entry><literal>&lt;&lt; (anymultirange, anymultirange)</literal></entry></row>
131 <row><entry><literal>&lt;&lt; (anymultirange, anyrange)</literal></entry></row>
132 <row><entry><literal>&gt;&gt; (anymultirange, anymultirange)</literal></entry></row>
133 <row><entry><literal>&gt;&gt; (anymultirange, anyrange)</literal></entry></row>
134 <row><entry><literal>&amp;&lt; (anymultirange, anymultirange)</literal></entry></row>
135 <row><entry><literal>&amp;&lt; (anymultirange, anyrange)</literal></entry></row>
136 <row><entry><literal>&amp;&gt; (anymultirange, anymultirange)</literal></entry></row>
137 <row><entry><literal>&amp;&gt; (anymultirange, anyrange)</literal></entry></row>
138 <row><entry><literal>-|- (anymultirange, anymultirange)</literal></entry></row>
139 <row><entry><literal>-|- (anymultirange, anyrange)</literal></entry></row>
141 <row>
142 <entry valign="middle" morerows="7"><literal>point_ops</literal></entry>
143 <entry><literal>|&gt;&gt; (point, point)</literal></entry>
144 <entry valign="middle" morerows="7"><literal>&lt;-&gt; (point, point)</literal></entry>
145 </row>
146 <row><entry><literal>&lt;&lt; (point, point)</literal></entry></row>
147 <row><entry><literal>&gt;&gt; (point, point)</literal></entry></row>
148 <row><entry><literal>&lt;&lt;| (point, point)</literal></entry></row>
149 <row><entry><literal>~= (point, point)</literal></entry></row>
150 <row><entry><literal>&lt;@ (point, box)</literal></entry></row>
151 <row><entry><literal>&lt;@ (point, polygon)</literal></entry></row>
152 <row><entry><literal>&lt;@ (point, circle)</literal></entry></row>
154 <row>
155 <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry>
156 <entry><literal>&lt;&lt; (polygon, polygon)</literal></entry>
157 <entry valign="middle" morerows="11"><literal>&lt;-&gt; (polygon, point)</literal></entry>
158 </row>
159 <row><entry><literal>&amp;&lt; (polygon, polygon)</literal></entry></row>
160 <row><entry><literal>&amp;&gt; (polygon, polygon)</literal></entry></row>
161 <row><entry><literal>&gt;&gt; (polygon, polygon)</literal></entry></row>
162 <row><entry><literal>&lt;@ (polygon, polygon)</literal></entry></row>
163 <row><entry><literal>@&gt; (polygon, polygon)</literal></entry></row>
164 <row><entry><literal>~= (polygon, polygon)</literal></entry></row>
165 <row><entry><literal>&amp;&amp; (polygon, polygon)</literal></entry></row>
166 <row><entry><literal>&lt;&lt;| (polygon, polygon)</literal></entry></row>
167 <row><entry><literal>&amp;&lt;| (polygon, polygon)</literal></entry></row>
168 <row><entry><literal>|&amp;&gt; (polygon, polygon)</literal></entry></row>
169 <row><entry><literal>|&gt;&gt; (polygon, polygon)</literal></entry></row>
171 <row>
172 <entry valign="middle" morerows="17"><literal>range_ops</literal></entry>
173 <entry><literal>= (anyrange, anyrange)</literal></entry>
174 <entry valign="middle" morerows="17"></entry>
175 </row>
176 <row><entry><literal>&amp;&amp; (anyrange, anyrange)</literal></entry></row>
177 <row><entry><literal>&amp;&amp; (anyrange, anymultirange)</literal></entry></row>
178 <row><entry><literal>@&gt; (anyrange, anyelement)</literal></entry></row>
179 <row><entry><literal>@&gt; (anyrange, anyrange)</literal></entry></row>
180 <row><entry><literal>@&gt; (anyrange, anymultirange)</literal></entry></row>
181 <row><entry><literal>&lt;@ (anyrange, anyrange)</literal></entry></row>
182 <row><entry><literal>&lt;@ (anyrange, anymultirange)</literal></entry></row>
183 <row><entry><literal>&lt;&lt; (anyrange, anyrange)</literal></entry></row>
184 <row><entry><literal>&lt;&lt; (anyrange, anymultirange)</literal></entry></row>
185 <row><entry><literal>&gt;&gt; (anyrange, anyrange)</literal></entry></row>
186 <row><entry><literal>&gt;&gt; (anyrange, anymultirange)</literal></entry></row>
187 <row><entry><literal>&amp;&lt; (anyrange, anyrange)</literal></entry></row>
188 <row><entry><literal>&amp;&lt; (anyrange, anymultirange)</literal></entry></row>
189 <row><entry><literal>&amp;&gt; (anyrange, anyrange)</literal></entry></row>
190 <row><entry><literal>&amp;&gt; (anyrange, anymultirange)</literal></entry></row>
191 <row><entry><literal>-|- (anyrange, anyrange)</literal></entry></row>
192 <row><entry><literal>-|- (anyrange, anymultirange)</literal></entry></row>
194 <row>
195 <entry valign="middle" morerows="1"><literal>tsquery_ops</literal></entry>
196 <entry><literal>&lt;@ (tsquery, tsquery)</literal></entry>
197 <entry valign="middle" morerows="1"></entry>
198 </row>
199 <row><entry><literal>@&gt; (tsquery, tsquery)</literal></entry></row>
200 <row>
201 <entry valign="middle"><literal>tsvector_ops</literal></entry>
202 <entry><literal>@@ (tsvector, tsquery)</literal></entry>
203 <entry></entry>
204 </row>
205 </tbody>
206 </tgroup>
207 </table>
209 <para>
210 For historical reasons, the <literal>inet_ops</literal> operator class is
211 not the default class for types <type>inet</type> and <type>cidr</type>.
212 To use it, mention the class name in <command>CREATE INDEX</command>,
213 for example
214 <programlisting>
215 CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
216 </programlisting>
217 </para>
219 </sect2>
221 <sect2 id="gist-extensibility">
222 <title>Extensibility</title>
224 <para>
225 Traditionally, implementing a new index access method meant a lot of
226 difficult work. It was necessary to understand the inner workings of the
227 database, such as the lock manager and Write-Ahead Log. The
228 <acronym>GiST</acronym> interface has a high level of abstraction,
229 requiring the access method implementer only to implement the semantics of
230 the data type being accessed. The <acronym>GiST</acronym> layer itself
231 takes care of concurrency, logging and searching the tree structure.
232 </para>
234 <para>
235 This extensibility should not be confused with the extensibility of the
236 other standard search trees in terms of the data they can handle. For
237 example, <productname>PostgreSQL</productname> supports extensible B-trees
238 and hash indexes. That means that you can use
239 <productname>PostgreSQL</productname> to build a B-tree or hash over any
240 data type you want. But B-trees only support range predicates
241 (<literal>&lt;</literal>, <literal>=</literal>, <literal>&gt;</literal>),
242 and hash indexes only support equality queries.
243 </para>
245 <para>
246 So if you index, say, an image collection with a
247 <productname>PostgreSQL</productname> B-tree, you can only issue queries
248 such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less
249 than imagey</quote> and <quote>is imagex greater than imagey</quote>.
250 Depending on how you define <quote>equals</quote>, <quote>less than</quote>
251 and <quote>greater than</quote> in this context, this could be useful.
252 However, by using a <acronym>GiST</acronym> based index, you could create
253 ways to ask domain-specific questions, perhaps <quote>find all images of
254 horses</quote> or <quote>find all over-exposed images</quote>.
255 </para>
257 <para>
258 All it takes to get a <acronym>GiST</acronym> access method up and running
259 is to implement several user-defined methods, which define the behavior of
260 keys in the tree. Of course these methods have to be pretty fancy to
261 support fancy queries, but for all the standard queries (B-trees,
262 R-trees, etc.) they're relatively straightforward. In short,
263 <acronym>GiST</acronym> combines extensibility along with generality, code
264 reuse, and a clean interface.
265 </para>
267 <para>
268 There are five methods that an index operator class for
269 <acronym>GiST</acronym> must provide, and seven that are optional.
270 Correctness of the index is ensured
271 by proper implementation of the <function>same</function>, <function>consistent</function>
272 and <function>union</function> methods, while efficiency (size and speed) of the
273 index will depend on the <function>penalty</function> and <function>picksplit</function>
274 methods.
275 Two optional methods are <function>compress</function> and
276 <function>decompress</function>, which allow an index to have internal tree data of
277 a different type than the data it indexes. The leaves are to be of the
278 indexed data type, while the other tree nodes can be of any C struct (but
279 you still have to follow <productname>PostgreSQL</productname> data type rules here,
280 see about <literal>varlena</literal> for variable sized data). If the tree's
281 internal data type exists at the SQL level, the <literal>STORAGE</literal> option
282 of the <command>CREATE OPERATOR CLASS</command> command can be used.
283 The optional eighth method is <function>distance</function>, which is needed
284 if the operator class wishes to support ordered scans (nearest-neighbor
285 searches). The optional ninth method <function>fetch</function> is needed if the
286 operator class wishes to support index-only scans, except when the
287 <function>compress</function> method is omitted. The optional tenth method
288 <function>options</function> is needed if the operator class has
289 user-specified parameters.
290 The optional eleventh method <function>sortsupport</function> is used to
291 speed up building a <acronym>GiST</acronym> index.
292 The optional twelfth method <function>stratnum</function> is used to
293 translate compare types (from
294 <filename>src/include/nodes/primnodes.h</filename>) into strategy numbers
295 used by the operator class. This lets the core code look up operators for
296 temporal constraint indexes.
297 </para>
299 <variablelist>
300 <varlistentry>
301 <term><function>consistent</function></term>
302 <listitem>
303 <para>
304 Given an index entry <literal>p</literal> and a query value <literal>q</literal>,
305 this function determines whether the index entry is
306 <quote>consistent</quote> with the query; that is, could the predicate
307 <quote><replaceable>indexed_column</replaceable>
308 <replaceable>indexable_operator</replaceable> <literal>q</literal></quote> be true for
309 any row represented by the index entry? For a leaf index entry this is
310 equivalent to testing the indexable condition, while for an internal
311 tree node this determines whether it is necessary to scan the subtree
312 of the index represented by the tree node. When the result is
313 <literal>true</literal>, a <literal>recheck</literal> flag must also be returned.
314 This indicates whether the predicate is certainly true or only possibly
315 true. If <literal>recheck</literal> = <literal>false</literal> then the index has
316 tested the predicate condition exactly, whereas if <literal>recheck</literal>
317 = <literal>true</literal> the row is only a candidate match. In that case the
318 system will automatically evaluate the
319 <replaceable>indexable_operator</replaceable> against the actual row value to see
320 if it is really a match. This convention allows
321 <acronym>GiST</acronym> to support both lossless and lossy index
322 structures.
323 </para>
325 <para>
326 The <acronym>SQL</acronym> declaration of the function must look like this:
328 <programlisting>
329 CREATE OR REPLACE FUNCTION my_consistent(internal, data_type, smallint, oid, internal)
330 RETURNS bool
331 AS 'MODULE_PATHNAME'
332 LANGUAGE C STRICT;
333 </programlisting>
335 And the matching code in the C module could then follow this skeleton:
337 <programlisting>
338 PG_FUNCTION_INFO_V1(my_consistent);
340 Datum
341 my_consistent(PG_FUNCTION_ARGS)
343 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
344 data_type *query = PG_GETARG_DATA_TYPE_P(1);
345 StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
346 /* Oid subtype = PG_GETARG_OID(3); */
347 bool *recheck = (bool *) PG_GETARG_POINTER(4);
348 data_type *key = DatumGetDataType(entry-&gt;key);
349 bool retval;
352 * determine return value as a function of strategy, key and query.
354 * Use GIST_LEAF(entry) to know where you're called in the index tree,
355 * which comes handy when supporting the = operator for example (you could
356 * check for non empty union() in non-leaf nodes and equality in leaf
357 * nodes).
360 *recheck = true; /* or false if check is exact */
362 PG_RETURN_BOOL(retval);
364 </programlisting>
366 Here, <varname>key</varname> is an element in the index and <varname>query</varname>
367 the value being looked up in the index. The <literal>StrategyNumber</literal>
368 parameter indicates which operator of your operator class is being
369 applied &mdash; it matches one of the operator numbers in the
370 <command>CREATE OPERATOR CLASS</command> command.
371 </para>
373 <para>
374 Depending on which operators you have included in the class, the data
375 type of <varname>query</varname> could vary with the operator, since it will
376 be whatever type is on the right-hand side of the operator, which might
377 be different from the indexed data type appearing on the left-hand side.
378 (The above code skeleton assumes that only one type is possible; if
379 not, fetching the <varname>query</varname> argument value would have to depend
380 on the operator.) It is recommended that the SQL declaration of
381 the <function>consistent</function> function use the opclass's indexed data
382 type for the <varname>query</varname> argument, even though the actual type
383 might be something else depending on the operator.
384 </para>
386 </listitem>
387 </varlistentry>
389 <varlistentry>
390 <term><function>union</function></term>
391 <listitem>
392 <para>
393 This method consolidates information in the tree. Given a set of
394 entries, this function generates a new index entry that represents
395 all the given entries.
396 </para>
398 <para>
399 The <acronym>SQL</acronym> declaration of the function must look like this:
401 <programlisting>
402 CREATE OR REPLACE FUNCTION my_union(internal, internal)
403 RETURNS storage_type
404 AS 'MODULE_PATHNAME'
405 LANGUAGE C STRICT;
406 </programlisting>
408 And the matching code in the C module could then follow this skeleton:
410 <programlisting>
411 PG_FUNCTION_INFO_V1(my_union);
413 Datum
414 my_union(PG_FUNCTION_ARGS)
416 GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
417 GISTENTRY *ent = entryvec-&gt;vector;
418 data_type *out,
419 *tmp,
420 *old;
421 int numranges,
422 i = 0;
424 numranges = entryvec-&gt;n;
425 tmp = DatumGetDataType(ent[0].key);
426 out = tmp;
428 if (numranges == 1)
430 out = data_type_deep_copy(tmp);
432 PG_RETURN_DATA_TYPE_P(out);
435 for (i = 1; i &lt; numranges; i++)
437 old = out;
438 tmp = DatumGetDataType(ent[i].key);
439 out = my_union_implementation(out, tmp);
442 PG_RETURN_DATA_TYPE_P(out);
444 </programlisting>
445 </para>
447 <para>
448 As you can see, in this skeleton we're dealing with a data type
449 where <literal>union(X, Y, Z) = union(union(X, Y), Z)</literal>. It's easy
450 enough to support data types where this is not the case, by
451 implementing the proper union algorithm in this
452 <acronym>GiST</acronym> support method.
453 </para>
455 <para>
456 The result of the <function>union</function> function must be a value of the
457 index's storage type, whatever that is (it might or might not be
458 different from the indexed column's type). The <function>union</function>
459 function should return a pointer to newly <function>palloc()</function>ed
460 memory. You can't just return the input value as-is, even if there is
461 no type change.
462 </para>
464 <para>
465 As shown above, the <function>union</function> function's
466 first <type>internal</type> argument is actually
467 a <structname>GistEntryVector</structname> pointer. The second argument is a
468 pointer to an integer variable, which can be ignored. (It used to be
469 required that the <function>union</function> function store the size of its
470 result value into that variable, but this is no longer necessary.)
471 </para>
472 </listitem>
473 </varlistentry>
475 <varlistentry>
476 <term><function>compress</function></term>
477 <listitem>
478 <para>
479 Converts a data item into a format suitable for physical storage in
480 an index page.
481 If the <function>compress</function> method is omitted, data items are stored
482 in the index without modification.
483 </para>
485 <para>
486 The <acronym>SQL</acronym> declaration of the function must look like this:
488 <programlisting>
489 CREATE OR REPLACE FUNCTION my_compress(internal)
490 RETURNS internal
491 AS 'MODULE_PATHNAME'
492 LANGUAGE C STRICT;
493 </programlisting>
495 And the matching code in the C module could then follow this skeleton:
497 <programlisting>
498 PG_FUNCTION_INFO_V1(my_compress);
500 Datum
501 my_compress(PG_FUNCTION_ARGS)
503 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
504 GISTENTRY *retval;
506 if (entry-&gt;leafkey)
508 /* replace entry-&gt;key with a compressed version */
509 compressed_data_type *compressed_data = palloc(sizeof(compressed_data_type));
511 /* fill *compressed_data from entry-&gt;key ... */
513 retval = palloc(sizeof(GISTENTRY));
514 gistentryinit(*retval, PointerGetDatum(compressed_data),
515 entry-&gt;rel, entry-&gt;page, entry-&gt;offset, FALSE);
517 else
519 /* typically we needn't do anything with non-leaf entries */
520 retval = entry;
523 PG_RETURN_POINTER(retval);
525 </programlisting>
526 </para>
528 <para>
529 You have to adapt <replaceable>compressed_data_type</replaceable> to the specific
530 type you're converting to in order to compress your leaf nodes, of
531 course.
532 </para>
533 </listitem>
534 </varlistentry>
536 <varlistentry>
537 <term><function>decompress</function></term>
538 <listitem>
539 <para>
540 Converts the stored representation of a data item into a format that
541 can be manipulated by the other GiST methods in the operator class.
542 If the <function>decompress</function> method is omitted, it is assumed that
543 the other GiST methods can work directly on the stored data format.
544 (<function>decompress</function> is not necessarily the reverse of
545 the <function>compress</function> method; in particular,
546 if <function>compress</function> is lossy then it's impossible
547 for <function>decompress</function> to exactly reconstruct the original
548 data. <function>decompress</function> is not necessarily equivalent
549 to <function>fetch</function>, either, since the other GiST methods might not
550 require full reconstruction of the data.)
551 </para>
553 <para>
554 The <acronym>SQL</acronym> declaration of the function must look like this:
556 <programlisting>
557 CREATE OR REPLACE FUNCTION my_decompress(internal)
558 RETURNS internal
559 AS 'MODULE_PATHNAME'
560 LANGUAGE C STRICT;
561 </programlisting>
563 And the matching code in the C module could then follow this skeleton:
565 <programlisting>
566 PG_FUNCTION_INFO_V1(my_decompress);
568 Datum
569 my_decompress(PG_FUNCTION_ARGS)
571 PG_RETURN_POINTER(PG_GETARG_POINTER(0));
573 </programlisting>
575 The above skeleton is suitable for the case where no decompression
576 is needed. (But, of course, omitting the method altogether is even
577 easier, and is recommended in such cases.)
578 </para>
579 </listitem>
580 </varlistentry>
582 <varlistentry>
583 <term><function>penalty</function></term>
584 <listitem>
585 <para>
586 Returns a value indicating the <quote>cost</quote> of inserting the new
587 entry into a particular branch of the tree. Items will be inserted
588 down the path of least <function>penalty</function> in the tree.
589 Values returned by <function>penalty</function> should be non-negative.
590 If a negative value is returned, it will be treated as zero.
591 </para>
593 <para>
594 The <acronym>SQL</acronym> declaration of the function must look like this:
596 <programlisting>
597 CREATE OR REPLACE FUNCTION my_penalty(internal, internal, internal)
598 RETURNS internal
599 AS 'MODULE_PATHNAME'
600 LANGUAGE C STRICT; -- in some cases penalty functions need not be strict
601 </programlisting>
603 And the matching code in the C module could then follow this skeleton:
605 <programlisting>
606 PG_FUNCTION_INFO_V1(my_penalty);
608 Datum
609 my_penalty(PG_FUNCTION_ARGS)
611 GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
612 GISTENTRY *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
613 float *penalty = (float *) PG_GETARG_POINTER(2);
614 data_type *orig = DatumGetDataType(origentry-&gt;key);
615 data_type *new = DatumGetDataType(newentry-&gt;key);
617 *penalty = my_penalty_implementation(orig, new);
618 PG_RETURN_POINTER(penalty);
620 </programlisting>
622 For historical reasons, the <function>penalty</function> function doesn't
623 just return a <type>float</type> result; instead it has to store the value
624 at the location indicated by the third argument. The return
625 value per se is ignored, though it's conventional to pass back the
626 address of that argument.
627 </para>
629 <para>
630 The <function>penalty</function> function is crucial to good performance of
631 the index. It'll get used at insertion time to determine which branch
632 to follow when choosing where to add the new entry in the tree. At
633 query time, the more balanced the index, the quicker the lookup.
634 </para>
635 </listitem>
636 </varlistentry>
638 <varlistentry>
639 <term><function>picksplit</function></term>
640 <listitem>
641 <para>
642 When an index page split is necessary, this function decides which
643 entries on the page are to stay on the old page, and which are to move
644 to the new page.
645 </para>
647 <para>
648 The <acronym>SQL</acronym> declaration of the function must look like this:
650 <programlisting>
651 CREATE OR REPLACE FUNCTION my_picksplit(internal, internal)
652 RETURNS internal
653 AS 'MODULE_PATHNAME'
654 LANGUAGE C STRICT;
655 </programlisting>
657 And the matching code in the C module could then follow this skeleton:
659 <programlisting>
660 PG_FUNCTION_INFO_V1(my_picksplit);
662 Datum
663 my_picksplit(PG_FUNCTION_ARGS)
665 GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
666 GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1);
667 OffsetNumber maxoff = entryvec-&gt;n - 1;
668 GISTENTRY *ent = entryvec-&gt;vector;
669 int i,
670 nbytes;
671 OffsetNumber *left,
672 *right;
673 data_type *tmp_union;
674 data_type *unionL;
675 data_type *unionR;
676 GISTENTRY **raw_entryvec;
678 maxoff = entryvec-&gt;n - 1;
679 nbytes = (maxoff + 1) * sizeof(OffsetNumber);
681 v-&gt;spl_left = (OffsetNumber *) palloc(nbytes);
682 left = v-&gt;spl_left;
683 v-&gt;spl_nleft = 0;
685 v-&gt;spl_right = (OffsetNumber *) palloc(nbytes);
686 right = v-&gt;spl_right;
687 v-&gt;spl_nright = 0;
689 unionL = NULL;
690 unionR = NULL;
692 /* Initialize the raw entry vector. */
693 raw_entryvec = (GISTENTRY **) malloc(entryvec-&gt;n * sizeof(void *));
694 for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
695 raw_entryvec[i] = &amp;(entryvec-&gt;vector[i]);
697 for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
699 int real_index = raw_entryvec[i] - entryvec-&gt;vector;
701 tmp_union = DatumGetDataType(entryvec-&gt;vector[real_index].key);
702 Assert(tmp_union != NULL);
705 * Choose where to put the index entries and update unionL and unionR
706 * accordingly. Append the entries to either v-&gt;spl_left or
707 * v-&gt;spl_right, and care about the counters.
710 if (my_choice_is_left(unionL, curl, unionR, curr))
712 if (unionL == NULL)
713 unionL = tmp_union;
714 else
715 unionL = my_union_implementation(unionL, tmp_union);
717 *left = real_index;
718 ++left;
719 ++(v-&gt;spl_nleft);
721 else
724 * Same on the right
729 v-&gt;spl_ldatum = DataTypeGetDatum(unionL);
730 v-&gt;spl_rdatum = DataTypeGetDatum(unionR);
731 PG_RETURN_POINTER(v);
733 </programlisting>
735 Notice that the <function>picksplit</function> function's result is delivered
736 by modifying the passed-in <structname>v</structname> structure. The return
737 value per se is ignored, though it's conventional to pass back the
738 address of <structname>v</structname>.
739 </para>
741 <para>
742 Like <function>penalty</function>, the <function>picksplit</function> function
743 is crucial to good performance of the index. Designing suitable
744 <function>penalty</function> and <function>picksplit</function> implementations
745 is where the challenge of implementing well-performing
746 <acronym>GiST</acronym> indexes lies.
747 </para>
748 </listitem>
749 </varlistentry>
751 <varlistentry>
752 <term><function>same</function></term>
753 <listitem>
754 <para>
755 Returns true if two index entries are identical, false otherwise.
756 (An <quote>index entry</quote> is a value of the index's storage type,
757 not necessarily the original indexed column's type.)
758 </para>
760 <para>
761 The <acronym>SQL</acronym> declaration of the function must look like this:
763 <programlisting>
764 CREATE OR REPLACE FUNCTION my_same(storage_type, storage_type, internal)
765 RETURNS internal
766 AS 'MODULE_PATHNAME'
767 LANGUAGE C STRICT;
768 </programlisting>
770 And the matching code in the C module could then follow this skeleton:
772 <programlisting>
773 PG_FUNCTION_INFO_V1(my_same);
775 Datum
776 my_same(PG_FUNCTION_ARGS)
778 prefix_range *v1 = PG_GETARG_PREFIX_RANGE_P(0);
779 prefix_range *v2 = PG_GETARG_PREFIX_RANGE_P(1);
780 bool *result = (bool *) PG_GETARG_POINTER(2);
782 *result = my_eq(v1, v2);
783 PG_RETURN_POINTER(result);
785 </programlisting>
787 For historical reasons, the <function>same</function> function doesn't
788 just return a Boolean result; instead it has to store the flag
789 at the location indicated by the third argument. The return
790 value per se is ignored, though it's conventional to pass back the
791 address of that argument.
792 </para>
793 </listitem>
794 </varlistentry>
796 <varlistentry>
797 <term><function>distance</function></term>
798 <listitem>
799 <para>
800 Given an index entry <literal>p</literal> and a query value <literal>q</literal>,
801 this function determines the index entry's
802 <quote>distance</quote> from the query value. This function must be
803 supplied if the operator class contains any ordering operators.
804 A query using the ordering operator will be implemented by returning
805 index entries with the smallest <quote>distance</quote> values first,
806 so the results must be consistent with the operator's semantics.
807 For a leaf index entry the result just represents the distance to
808 the index entry; for an internal tree node, the result must be the
809 smallest distance that any child entry could have.
810 </para>
812 <para>
813 The <acronym>SQL</acronym> declaration of the function must look like this:
815 <programlisting>
816 CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid, internal)
817 RETURNS float8
818 AS 'MODULE_PATHNAME'
819 LANGUAGE C STRICT;
820 </programlisting>
822 And the matching code in the C module could then follow this skeleton:
824 <programlisting>
825 PG_FUNCTION_INFO_V1(my_distance);
827 Datum
828 my_distance(PG_FUNCTION_ARGS)
830 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
831 data_type *query = PG_GETARG_DATA_TYPE_P(1);
832 StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
833 /* Oid subtype = PG_GETARG_OID(3); */
834 /* bool *recheck = (bool *) PG_GETARG_POINTER(4); */
835 data_type *key = DatumGetDataType(entry-&gt;key);
836 double retval;
839 * determine return value as a function of strategy, key and query.
842 PG_RETURN_FLOAT8(retval);
844 </programlisting>
846 The arguments to the <function>distance</function> function are identical to
847 the arguments of the <function>consistent</function> function.
848 </para>
850 <para>
851 Some approximation is allowed when determining the distance, so long
852 as the result is never greater than the entry's actual distance. Thus,
853 for example, distance to a bounding box is usually sufficient in
854 geometric applications. For an internal tree node, the distance
855 returned must not be greater than the distance to any of the child
856 nodes. If the returned distance is not exact, the function must set
857 <literal>*recheck</literal> to true. (This is not necessary for internal tree
858 nodes; for them, the calculation is always assumed to be inexact.) In
859 this case the executor will calculate the accurate distance after
860 fetching the tuple from the heap, and reorder the tuples if necessary.
861 </para>
863 <para>
864 If the distance function returns <literal>*recheck = true</literal> for any
865 leaf node, the original ordering operator's return type must
866 be <type>float8</type> or <type>float4</type>, and the distance function's
867 result values must be comparable to those of the original ordering
868 operator, since the executor will sort using both distance function
869 results and recalculated ordering-operator results. Otherwise, the
870 distance function's result values can be any finite <type>float8</type>
871 values, so long as the relative order of the result values matches the
872 order returned by the ordering operator. (Infinity and minus infinity
873 are used internally to handle cases such as nulls, so it is not
874 recommended that <function>distance</function> functions return these values.)
875 </para>
877 </listitem>
878 </varlistentry>
880 <varlistentry>
881 <term><function>fetch</function></term>
882 <listitem>
883 <para>
884 Converts the compressed index representation of a data item into the
885 original data type, for index-only scans. The returned data must be an
886 exact, non-lossy copy of the originally indexed value.
887 </para>
889 <para>
890 The <acronym>SQL</acronym> declaration of the function must look like this:
892 <programlisting>
893 CREATE OR REPLACE FUNCTION my_fetch(internal)
894 RETURNS internal
895 AS 'MODULE_PATHNAME'
896 LANGUAGE C STRICT;
897 </programlisting>
899 The argument is a pointer to a <structname>GISTENTRY</structname> struct. On
900 entry, its <structfield>key</structfield> field contains a non-NULL leaf datum in
901 compressed form. The return value is another <structname>GISTENTRY</structname>
902 struct, whose <structfield>key</structfield> field contains the same datum in its
903 original, uncompressed form. If the opclass's compress function does
904 nothing for leaf entries, the <function>fetch</function> method can return the
905 argument as-is. Or, if the opclass does not have a compress function,
906 the <function>fetch</function> method can be omitted as well, since it would
907 necessarily be a no-op.
908 </para>
910 <para>
911 The matching code in the C module could then follow this skeleton:
913 <programlisting>
914 PG_FUNCTION_INFO_V1(my_fetch);
916 Datum
917 my_fetch(PG_FUNCTION_ARGS)
919 GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
920 input_data_type *in = DatumGetPointer(entry->key);
921 fetched_data_type *fetched_data;
922 GISTENTRY *retval;
924 retval = palloc(sizeof(GISTENTRY));
925 fetched_data = palloc(sizeof(fetched_data_type));
928 * Convert 'fetched_data' into the a Datum of the original datatype.
931 /* fill *retval from fetched_data. */
932 gistentryinit(*retval, PointerGetDatum(converted_datum),
933 entry->rel, entry->page, entry->offset, FALSE);
935 PG_RETURN_POINTER(retval);
937 </programlisting>
938 </para>
940 <para>
941 If the compress method is lossy for leaf entries, the operator class
942 cannot support index-only scans, and must not define
943 a <function>fetch</function> function.
944 </para>
946 </listitem>
947 </varlistentry>
949 <varlistentry>
950 <term><function>options</function></term>
951 <listitem>
952 <para>
953 Allows definition of user-visible parameters that control operator
954 class behavior.
955 </para>
957 <para>
958 The <acronym>SQL</acronym> declaration of the function must look like this:
960 <programlisting>
961 CREATE OR REPLACE FUNCTION my_options(internal)
962 RETURNS void
963 AS 'MODULE_PATHNAME'
964 LANGUAGE C STRICT;
965 </programlisting>
966 </para>
968 <para>
969 The function is passed a pointer to a <structname>local_relopts</structname>
970 struct, which needs to be filled with a set of operator class
971 specific options. The options can be accessed from other support
972 functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and
973 <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros.
974 </para>
976 <para>
977 An example implementation of my_options() and parameters use
978 from other support functions are given below:
980 <programlisting>
981 typedef enum MyEnumType
983 MY_ENUM_ON,
984 MY_ENUM_OFF,
985 MY_ENUM_AUTO
986 } MyEnumType;
988 typedef struct
990 int32 vl_len_; /* varlena header (do not touch directly!) */
991 int int_param; /* integer parameter */
992 double real_param; /* real parameter */
993 MyEnumType enum_param; /* enum parameter */
994 int str_param; /* string parameter */
995 } MyOptionsStruct;
997 /* String representation of enum values */
998 static relopt_enum_elt_def myEnumValues[] =
1000 {"on", MY_ENUM_ON},
1001 {"off", MY_ENUM_OFF},
1002 {"auto", MY_ENUM_AUTO},
1003 {(const char *) NULL} /* list terminator */
1006 static char *str_param_default = "default";
1009 * Sample validator: checks that string is not longer than 8 bytes.
1011 static void
1012 validate_my_string_relopt(const char *value)
1014 if (strlen(value) > 8)
1015 ereport(ERROR,
1016 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
1017 errmsg("str_param must be at most 8 bytes")));
1021 * Sample filler: switches characters to lower case.
1023 static Size
1024 fill_my_string_relopt(const char *value, void *ptr)
1026 char *tmp = str_tolower(value, strlen(value), DEFAULT_COLLATION_OID);
1027 int len = strlen(tmp);
1029 if (ptr)
1030 strcpy((char *) ptr, tmp);
1032 pfree(tmp);
1033 return len + 1;
1036 PG_FUNCTION_INFO_V1(my_options);
1038 Datum
1039 my_options(PG_FUNCTION_ARGS)
1041 local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);
1043 init_local_reloptions(relopts, sizeof(MyOptionsStruct));
1044 add_local_int_reloption(relopts, "int_param", "integer parameter",
1045 100, 0, 1000000,
1046 offsetof(MyOptionsStruct, int_param));
1047 add_local_real_reloption(relopts, "real_param", "real parameter",
1048 1.0, 0.0, 1000000.0,
1049 offsetof(MyOptionsStruct, real_param));
1050 add_local_enum_reloption(relopts, "enum_param", "enum parameter",
1051 myEnumValues, MY_ENUM_ON,
1052 "Valid values are: \"on\", \"off\" and \"auto\".",
1053 offsetof(MyOptionsStruct, enum_param));
1054 add_local_string_reloption(relopts, "str_param", "string parameter",
1055 str_param_default,
1056 &amp;validate_my_string_relopt,
1057 &amp;fill_my_string_relopt,
1058 offsetof(MyOptionsStruct, str_param));
1060 PG_RETURN_VOID();
1063 PG_FUNCTION_INFO_V1(my_compress);
1065 Datum
1066 my_compress(PG_FUNCTION_ARGS)
1068 int int_param = 100;
1069 double real_param = 1.0;
1070 MyEnumType enum_param = MY_ENUM_ON;
1071 char *str_param = str_param_default;
1074 * Normally, when opclass contains 'options' method, then options are always
1075 * passed to support functions. However, if you add 'options' method to
1076 * existing opclass, previously defined indexes have no options, so the
1077 * check is required.
1079 if (PG_HAS_OPCLASS_OPTIONS())
1081 MyOptionsStruct *options = (MyOptionsStruct *) PG_GET_OPCLASS_OPTIONS();
1083 int_param = options->int_param;
1084 real_param = options->real_param;
1085 enum_param = options->enum_param;
1086 str_param = GET_STRING_RELOPTION(options, str_param);
1089 /* the rest implementation of support function */
1092 </programlisting>
1093 </para>
1095 <para>
1096 Since the representation of the key in <acronym>GiST</acronym> is
1097 flexible, it may depend on user-specified parameters. For instance,
1098 the length of key signature may be specified. See
1099 <literal>gtsvector_options()</literal> for example.
1100 </para>
1101 </listitem>
1102 </varlistentry>
1104 <varlistentry>
1105 <term><function>sortsupport</function></term>
1106 <listitem>
1107 <para>
1108 Returns a comparator function to sort data in a way that preserves
1109 locality. It is used by <command>CREATE INDEX</command> and
1110 <command>REINDEX</command> commands. The quality of the created index
1111 depends on how well the sort order determined by the comparator function
1112 preserves locality of the inputs.
1113 </para>
1114 <para>
1115 The <function>sortsupport</function> method is optional. If it is not
1116 provided, <command>CREATE INDEX</command> builds the index by inserting
1117 each tuple to the tree using the <function>penalty</function> and
1118 <function>picksplit</function> functions, which is much slower.
1119 </para>
1121 <para>
1122 The <acronym>SQL</acronym> declaration of the function must look like
1123 this:
1125 <programlisting>
1126 CREATE OR REPLACE FUNCTION my_sortsupport(internal)
1127 RETURNS void
1128 AS 'MODULE_PATHNAME'
1129 LANGUAGE C STRICT;
1130 </programlisting>
1132 The argument is a pointer to a <structname>SortSupport</structname>
1133 struct. At a minimum, the function must fill in its comparator field.
1134 The comparator takes three arguments: two Datums to compare, and
1135 a pointer to the <structname>SortSupport</structname> struct. The
1136 Datums are the two indexed values in the format that they are stored
1137 in the index; that is, in the format returned by the
1138 <function>compress</function> method. The full API is defined in
1139 <filename>src/include/utils/sortsupport.h</filename>.
1140 </para>
1142 <para>
1143 The matching code in the C module could then follow this skeleton:
1145 <programlisting>
1146 PG_FUNCTION_INFO_V1(my_sortsupport);
1148 static int
1149 my_fastcmp(Datum x, Datum y, SortSupport ssup)
1151 /* establish order between x and y by computing some sorting value z */
1153 int z1 = ComputeSpatialCode(x);
1154 int z2 = ComputeSpatialCode(y);
1156 return z1 == z2 ? 0 : z1 > z2 ? 1 : -1;
1159 Datum
1160 my_sortsupport(PG_FUNCTION_ARGS)
1162 SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
1164 ssup->comparator = my_fastcmp;
1165 PG_RETURN_VOID();
1167 </programlisting>
1168 </para>
1169 </listitem>
1170 </varlistentry>
1172 <varlistentry>
1173 <term><function>stratnum</function></term>
1174 <listitem>
1175 <para>
1176 Given a <literal>CompareType</literal> value from
1177 <filename>src/include/nodes/primnodes.h</filename>, returns a strategy
1178 number used by this operator class for matching functionality. The
1179 function should return <literal>InvalidStrategy</literal> if the
1180 operator class has no matching strategy.
1181 </para>
1183 <para>
1184 This is used for temporal index constraints (i.e., <literal>PRIMARY
1185 KEY</literal> and <literal>UNIQUE</literal>). If the operator class
1186 provides this function and it returns results for
1187 <literal>COMPARE_EQ</literal>, it can be used in the
1188 non-<literal>WITHOUT OVERLAPS</literal> part(s) of an index constraint.
1189 </para>
1191 <para>
1192 The <acronym>SQL</acronym> declaration of the function must look like
1193 this:
1195 <programlisting>
1196 CREATE OR REPLACE FUNCTION my_stratnum(integer)
1197 RETURNS smallint
1198 AS 'MODULE_PATHNAME'
1199 LANGUAGE C STRICT;
1200 </programlisting>
1201 </para>
1203 <para>
1204 The matching code in the C module could then follow this skeleton:
1206 <programlisting>
1207 PG_FUNCTION_INFO_V1(my_stratnum);
1209 Datum
1210 my_stratnum(PG_FUNCTION_ARGS)
1212 CompareType cmptype = PG_GETARG_INT32(0);
1213 StrategyNumber ret = InvalidStrategy;
1215 switch (cmptype)
1217 case COMPARE_EQ:
1218 ret = BTEqualStrategyNumber;
1221 PG_RETURN_UINT16(ret);
1223 </programlisting>
1224 </para>
1226 <para>
1227 One translation function is provided by
1228 <productname>PostgreSQL</productname>:
1229 <literal>gist_stratnum_common</literal> is for operator classes that
1230 use the <literal>RT*StrategyNumber</literal> constants.
1231 The <literal>btree_gist</literal>
1232 extension defines a second translation function,
1233 <literal>gist_stratnum_btree</literal>, for operator classes that use
1234 the <literal>BT*StrategyNumber</literal> constants.
1235 </para>
1236 </listitem>
1237 </varlistentry>
1238 </variablelist>
1240 <para>
1241 All the GiST support methods are normally called in short-lived memory
1242 contexts; that is, <varname>CurrentMemoryContext</varname> will get reset after
1243 each tuple is processed. It is therefore not very important to worry about
1244 pfree'ing everything you palloc. However, in some cases it's useful for a
1245 support method to cache data across repeated calls. To do that, allocate
1246 the longer-lived data in <literal>fcinfo-&gt;flinfo-&gt;fn_mcxt</literal>, and
1247 keep a pointer to it in <literal>fcinfo-&gt;flinfo-&gt;fn_extra</literal>. Such
1248 data will survive for the life of the index operation (e.g., a single GiST
1249 index scan, index build, or index tuple insertion). Be careful to pfree
1250 the previous value when replacing a <literal>fn_extra</literal> value, or the leak
1251 will accumulate for the duration of the operation.
1252 </para>
1254 </sect2>
1256 <sect2 id="gist-implementation">
1257 <title>Implementation</title>
1259 <sect3 id="gist-buffering-build">
1260 <title>GiST Index Build Methods</title>
1262 <para>
1263 The simplest way to build a GiST index is just to insert all the entries,
1264 one by one. This tends to be slow for large indexes, because if the
1265 index tuples are scattered across the index and the index is large enough
1266 to not fit in cache, a lot of random I/O will be
1267 needed. <productname>PostgreSQL</productname> supports two alternative
1268 methods for initial build of a GiST index: <firstterm>sorted</firstterm>
1269 and <firstterm>buffered</firstterm> modes.
1270 </para>
1272 <para>
1273 The sorted method is only available if each of the opclasses used by the
1274 index provides a <function>sortsupport</function> function, as described
1275 in <xref linkend="gist-extensibility"/>. If they do, this method is
1276 usually the best, so it is used by default.
1277 </para>
1279 <para>
1280 The buffered method works by not inserting tuples directly into the index
1281 right away. It can dramatically reduce the amount of random I/O needed
1282 for non-ordered data sets. For well-ordered data sets the benefit is
1283 smaller or non-existent, because only a small number of pages receive new
1284 tuples at a time, and those pages fit in cache even if the index as a
1285 whole does not.
1286 </para>
1288 <para>
1289 The buffered method needs to call the <function>penalty</function>
1290 function more often than the simple method does, which consumes some
1291 extra CPU resources. Also, the buffers need temporary disk space, up to
1292 the size of the resulting index. Buffering can also influence the quality
1293 of the resulting index, in both positive and negative directions. That
1294 influence depends on various factors, like the distribution of the input
1295 data and the operator class implementation.
1296 </para>
1298 <para>
1299 If sorting is not possible, then by default a GiST index build switches
1300 to the buffering method when the index size reaches
1301 <xref linkend="guc-effective-cache-size"/>. Buffering can be manually
1302 forced or prevented by the <literal>buffering</literal> parameter to the
1303 CREATE INDEX command. The default behavior is good for most cases, but
1304 turning buffering off might speed up the build somewhat if the input data
1305 is ordered.
1306 </para>
1308 </sect3>
1309 </sect2>
1311 <sect2 id="gist-examples">
1312 <title>Examples</title>
1314 <para>
1315 The <productname>PostgreSQL</productname> source distribution includes
1316 several examples of index methods implemented using
1317 <acronym>GiST</acronym>. The core system currently provides text search
1318 support (indexing for <type>tsvector</type> and <type>tsquery</type>) as well as
1319 R-Tree equivalent functionality for some of the built-in geometric data types
1320 (see <filename>src/backend/access/gist/gistproc.c</filename>). The following
1321 <filename>contrib</filename> modules also contain <acronym>GiST</acronym>
1322 operator classes:
1324 <variablelist>
1325 <varlistentry>
1326 <term><filename>btree_gist</filename></term>
1327 <listitem>
1328 <para>B-tree equivalent functionality for several data types</para>
1329 </listitem>
1330 </varlistentry>
1332 <varlistentry>
1333 <term><filename>cube</filename></term>
1334 <listitem>
1335 <para>Indexing for multidimensional cubes</para>
1336 </listitem>
1337 </varlistentry>
1339 <varlistentry>
1340 <term><filename>hstore</filename></term>
1341 <listitem>
1342 <para>Module for storing (key, value) pairs</para>
1343 </listitem>
1344 </varlistentry>
1346 <varlistentry>
1347 <term><filename>intarray</filename></term>
1348 <listitem>
1349 <para>RD-Tree for one-dimensional array of int4 values</para>
1350 </listitem>
1351 </varlistentry>
1353 <varlistentry>
1354 <term><filename>ltree</filename></term>
1355 <listitem>
1356 <para>Indexing for tree-like structures</para>
1357 </listitem>
1358 </varlistentry>
1360 <varlistentry>
1361 <term><filename>pg_trgm</filename></term>
1362 <listitem>
1363 <para>Text similarity using trigram matching</para>
1364 </listitem>
1365 </varlistentry>
1367 <varlistentry>
1368 <term><filename>seg</filename></term>
1369 <listitem>
1370 <para>Indexing for <quote>float ranges</quote></para>
1371 </listitem>
1372 </varlistentry>
1373 </variablelist>
1374 </para>
1376 </sect2>
1378 </sect1>