doc/src/sgml/gist.sgml

   1 <!-- doc/src/sgml/gist.sgml -->
   2
   3 <sect1 id="gist">
   4 <title>GiST Indexes</title>
   5
   6    <indexterm>
   7     <primary>index</primary>
   8     <secondary>GiST</secondary>
   9    </indexterm>
  10
  11 <sect2 id="gist-intro">
  12  <title>Introduction</title>
  13
  14  <para>
  15    <acronym>GiST</acronym> stands for Generalized Search Tree.  It is a
  16    balanced, tree-structured access method, that acts as a base template in
  17    which to implement arbitrary indexing schemes. B-trees, R-trees and many
  18    other indexing schemes can be implemented in <acronym>GiST</acronym>.
  19  </para>
  20
  21  <para>
  22   One advantage of <acronym>GiST</acronym> is that it allows the development
  23   of custom data types with the appropriate access methods, by
  24   an expert in the domain of the data type, rather than a database expert.
  25  </para>
  26
  27   <para>
  28     Some of the information here is derived from the University of California
  29     at Berkeley's GiST Indexing Project
  30     <ulink url="http://gist.cs.berkeley.edu/">web site</ulink> and
  31     Marcel Kornacker's thesis,
  32     <ulink url="http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz">
  33     Access Methods for Next-Generation Database Systems</ulink>.
  34     The <acronym>GiST</acronym>
  35     implementation in <productname>PostgreSQL</productname> is primarily
  36     maintained by Teodor Sigaev and Oleg Bartunov, and there is more
  37     information on their
  38     <ulink url="http://www.sai.msu.su/~megera/postgres/gist/">web site</ulink>.
  39   </para>
  40
  41 </sect2>
  42
  43 <sect2 id="gist-builtin-opclasses">
  44  <title>Built-in Operator Classes</title>
  45
  46  <para>
  47   The core <productname>PostgreSQL</productname> distribution
  48   includes the <acronym>GiST</acronym> operator classes shown in
  49   <xref linkend="gist-builtin-opclasses-table"/>.
  50   (Some of the optional modules described in <xref linkend="contrib"/>
  51   provide additional <acronym>GiST</acronym> operator classes.)
  52  </para>
  53
  54   <table id="gist-builtin-opclasses-table">
  55    <title>Built-in <acronym>GiST</acronym> Operator Classes</title>
  56    <tgroup cols="3">
  57      <colspec colname="col1" colwidth="2*"/>
  58      <colspec colname="col2" colwidth="3*"/>
  59      <colspec colname="col3" colwidth="2*"/>
  60     <thead>
  61      <row>
  62       <entry>Name</entry>
  63       <entry>Indexable Operators</entry>
  64       <entry>Ordering Operators</entry>
  65      </row>
  66     </thead>
  67     <tbody>
  68      <row>
  69       <entry valign="middle" morerows="11"><literal>box_ops</literal></entry>
  70       <entry><literal>&lt;&lt; (box, box)</literal></entry>
  71       <entry valign="middle" morerows="11"><literal>&lt;-&gt; (box, point)</literal></entry>
  72      </row>
  73      <row><entry><literal>&amp;&lt; (box, box)</literal></entry></row>
  74      <row><entry><literal>&amp;&amp; (box, box)</literal></entry></row>
  75      <row><entry><literal>&amp;&gt; (box, box)</literal></entry></row>
  76      <row><entry><literal>&gt;&gt; (box, box)</literal></entry></row>
  77      <row><entry><literal>~= (box, box)</literal></entry></row>
  78      <row><entry><literal>@&gt; (box, box)</literal></entry></row>
  79      <row><entry><literal>&lt;@ (box, box)</literal></entry></row>
  80      <row><entry><literal>&amp;&lt;| (box, box)</literal></entry></row>
  81      <row><entry><literal>&lt;&lt;| (box, box)</literal></entry></row>
  82      <row><entry><literal>|&gt;&gt; (box, box)</literal></entry></row>
  83      <row><entry><literal>|&amp;&gt; (box, box)</literal></entry></row>
  84
  85      <row>
  86       <entry valign="middle" morerows="11"><literal>circle_ops</literal></entry>
  87       <entry><literal>&lt;&lt; (circle, circle)</literal></entry>
  88       <entry valign="middle" morerows="11"><literal>&lt;-&gt; (circle, point)</literal></entry>
  89      </row>
  90      <row><entry><literal>&amp;&lt; (circle, circle)</literal></entry></row>
  91      <row><entry><literal>&amp;&gt; (circle, circle)</literal></entry></row>
  92      <row><entry><literal>&gt;&gt; (circle, circle)</literal></entry></row>
  93      <row><entry><literal>&lt;@ (circle, circle)</literal></entry></row>
  94      <row><entry><literal>@&gt; (circle, circle)</literal></entry></row>
  95      <row><entry><literal>~= (circle, circle)</literal></entry></row>
  96      <row><entry><literal>&amp;&amp; (circle, circle)</literal></entry></row>
  97      <row><entry><literal>|&gt;&gt; (circle, circle)</literal></entry></row>
  98      <row><entry><literal>&lt;&lt;| (circle, circle)</literal></entry></row>
  99      <row><entry><literal>&amp;&lt;| (circle, circle)</literal></entry></row>
 100      <row><entry><literal>|&amp;&gt; (circle, circle)</literal></entry></row>
 101
 102      <row>
 103       <entry valign="middle" morerows="10"><literal>inet_ops</literal></entry>
 104       <entry><literal>&lt;&lt; (inet, inet)</literal></entry>
 105       <entry valign="middle" morerows="10"></entry>
 106      </row>
 107      <row><entry><literal>&lt;&lt;= (inet, inet)</literal></entry></row>
 108      <row><entry><literal>&gt;&gt; (inet, inet)</literal></entry></row>
 109      <row><entry><literal>&gt;&gt;= (inet, inet)</literal></entry></row>
 110      <row><entry><literal>= (inet, inet)</literal></entry></row>
 111      <row><entry><literal>&lt;&gt; (inet, inet)</literal></entry></row>
 112      <row><entry><literal>&lt; (inet, inet)</literal></entry></row>
 113      <row><entry><literal>&lt;= (inet, inet)</literal></entry></row>
 114      <row><entry><literal>&gt; (inet, inet)</literal></entry></row>
 115      <row><entry><literal>&gt;= (inet, inet)</literal></entry></row>
 116      <row><entry><literal>&amp;&amp; (inet, inet)</literal></entry></row>
 117
 118      <row>
 119       <entry valign="middle" morerows="17"><literal>multirange_ops</literal></entry>
 120       <entry><literal>= (anymultirange, anymultirange)</literal></entry>
 121       <entry valign="middle" morerows="17"></entry>
 122      </row>
 123      <row><entry><literal>&amp;&amp; (anymultirange, anymultirange)</literal></entry></row>
 124      <row><entry><literal>&amp;&amp; (anymultirange, anyrange)</literal></entry></row>
 125      <row><entry><literal>@&gt; (anymultirange, anyelement)</literal></entry></row>
 126      <row><entry><literal>@&gt; (anymultirange, anymultirange)</literal></entry></row>
 127      <row><entry><literal>@&gt; (anymultirange, anyrange)</literal></entry></row>
 128      <row><entry><literal>&lt;@ (anymultirange, anymultirange)</literal></entry></row>
 129      <row><entry><literal>&lt;@ (anymultirange, anyrange)</literal></entry></row>
 130      <row><entry><literal>&lt;&lt; (anymultirange, anymultirange)</literal></entry></row>
 131      <row><entry><literal>&lt;&lt; (anymultirange, anyrange)</literal></entry></row>
 132      <row><entry><literal>&gt;&gt; (anymultirange, anymultirange)</literal></entry></row>
 133      <row><entry><literal>&gt;&gt; (anymultirange, anyrange)</literal></entry></row>
 134      <row><entry><literal>&amp;&lt; (anymultirange, anymultirange)</literal></entry></row>
 135      <row><entry><literal>&amp;&lt; (anymultirange, anyrange)</literal></entry></row>
 136      <row><entry><literal>&amp;&gt; (anymultirange, anymultirange)</literal></entry></row>
 137      <row><entry><literal>&amp;&gt; (anymultirange, anyrange)</literal></entry></row>
 138      <row><entry><literal>-|- (anymultirange, anymultirange)</literal></entry></row>
 139      <row><entry><literal>-|- (anymultirange, anyrange)</literal></entry></row>
 140
 141      <row>
 142       <entry valign="middle" morerows="7"><literal>point_ops</literal></entry>
 143       <entry><literal>|&gt;&gt; (point, point)</literal></entry>
 144       <entry valign="middle" morerows="7"><literal>&lt;-&gt; (point, point)</literal></entry>
 145      </row>
 146      <row><entry><literal>&lt;&lt; (point, point)</literal></entry></row>
 147      <row><entry><literal>&gt;&gt; (point, point)</literal></entry></row>
 148      <row><entry><literal>&lt;&lt;| (point, point)</literal></entry></row>
 149      <row><entry><literal>~= (point, point)</literal></entry></row>
 150      <row><entry><literal>&lt;@ (point, box)</literal></entry></row>
 151      <row><entry><literal>&lt;@ (point, polygon)</literal></entry></row>
 152      <row><entry><literal>&lt;@ (point, circle)</literal></entry></row>
 153
 154      <row>
 155       <entry valign="middle" morerows="11"><literal>poly_ops</literal></entry>
 156       <entry><literal>&lt;&lt; (polygon, polygon)</literal></entry>
 157       <entry valign="middle" morerows="11"><literal>&lt;-&gt; (polygon, point)</literal></entry>
 158      </row>
 159      <row><entry><literal>&amp;&lt; (polygon, polygon)</literal></entry></row>
 160      <row><entry><literal>&amp;&gt; (polygon, polygon)</literal></entry></row>
 161      <row><entry><literal>&gt;&gt; (polygon, polygon)</literal></entry></row>
 162      <row><entry><literal>&lt;@ (polygon, polygon)</literal></entry></row>
 163      <row><entry><literal>@&gt; (polygon, polygon)</literal></entry></row>
 164      <row><entry><literal>~= (polygon, polygon)</literal></entry></row>
 165      <row><entry><literal>&amp;&amp; (polygon, polygon)</literal></entry></row>
 166      <row><entry><literal>&lt;&lt;| (polygon, polygon)</literal></entry></row>
 167      <row><entry><literal>&amp;&lt;| (polygon, polygon)</literal></entry></row>
 168      <row><entry><literal>|&amp;&gt; (polygon, polygon)</literal></entry></row>
 169      <row><entry><literal>|&gt;&gt; (polygon, polygon)</literal></entry></row>
 170
 171      <row>
 172       <entry valign="middle" morerows="17"><literal>range_ops</literal></entry>
 173       <entry><literal>= (anyrange, anyrange)</literal></entry>
 174       <entry valign="middle" morerows="17"></entry>
 175      </row>
 176      <row><entry><literal>&amp;&amp; (anyrange, anyrange)</literal></entry></row>
 177      <row><entry><literal>&amp;&amp; (anyrange, anymultirange)</literal></entry></row>
 178      <row><entry><literal>@&gt; (anyrange, anyelement)</literal></entry></row>
 179      <row><entry><literal>@&gt; (anyrange, anyrange)</literal></entry></row>
 180      <row><entry><literal>@&gt; (anyrange, anymultirange)</literal></entry></row>
 181      <row><entry><literal>&lt;@ (anyrange, anyrange)</literal></entry></row>
 182      <row><entry><literal>&lt;@ (anyrange, anymultirange)</literal></entry></row>
 183      <row><entry><literal>&lt;&lt; (anyrange, anyrange)</literal></entry></row>
 184      <row><entry><literal>&lt;&lt; (anyrange, anymultirange)</literal></entry></row>
 185      <row><entry><literal>&gt;&gt; (anyrange, anyrange)</literal></entry></row>
 186      <row><entry><literal>&gt;&gt; (anyrange, anymultirange)</literal></entry></row>
 187      <row><entry><literal>&amp;&lt; (anyrange, anyrange)</literal></entry></row>
 188      <row><entry><literal>&amp;&lt; (anyrange, anymultirange)</literal></entry></row>
 189      <row><entry><literal>&amp;&gt; (anyrange, anyrange)</literal></entry></row>
 190      <row><entry><literal>&amp;&gt; (anyrange, anymultirange)</literal></entry></row>
 191      <row><entry><literal>-|- (anyrange, anyrange)</literal></entry></row>
 192      <row><entry><literal>-|- (anyrange, anymultirange)</literal></entry></row>
 193
 194      <row>
 195       <entry valign="middle" morerows="1"><literal>tsquery_ops</literal></entry>
 196       <entry><literal>&lt;@ (tsquery, tsquery)</literal></entry>
 197       <entry valign="middle" morerows="1"></entry>
 198      </row>
 199      <row><entry><literal>@&gt; (tsquery, tsquery)</literal></entry></row>
 200      <row>
 201       <entry valign="middle"><literal>tsvector_ops</literal></entry>
 202       <entry><literal>@@ (tsvector, tsquery)</literal></entry>
 203       <entry></entry>
 204      </row>
 205     </tbody>
 206    </tgroup>
 207   </table>
 208
 209  <para>
 210   For historical reasons, the <literal>inet_ops</literal> operator class is
 211   not the default class for types <type>inet</type> and <type>cidr</type>.
 212   To use it, mention the class name in <command>CREATE INDEX</command>,
 213   for example
 214 <programlisting>
 215 CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
 216 </programlisting>
 217  </para>
 218
 219 </sect2>
 220
 221 <sect2 id="gist-extensibility">
 222  <title>Extensibility</title>
 223
 224  <para>
 225    Traditionally, implementing a new index access method meant a lot of
 226    difficult work.  It was necessary to understand the inner workings of the
 227    database, such as the lock manager and Write-Ahead Log.  The
 228    <acronym>GiST</acronym> interface has a high level of abstraction,
 229    requiring the access method implementer only to implement the semantics of
 230    the data type being accessed.  The <acronym>GiST</acronym> layer itself
 231    takes care of concurrency, logging and searching the tree structure.
 232  </para>
 233
 234  <para>
 235    This extensibility should not be confused with the extensibility of the
 236    other standard search trees in terms of the data they can handle.  For
 237    example, <productname>PostgreSQL</productname> supports extensible B-trees
 238    and hash indexes. That means that you can use
 239    <productname>PostgreSQL</productname> to build a B-tree or hash over any
 240    data type you want. But B-trees only support range predicates
 241    (<literal>&lt;</literal>, <literal>=</literal>, <literal>&gt;</literal>),
 242    and hash indexes only support equality queries.
 243  </para>
 244
 245  <para>
 246    So if you index, say, an image collection with a
 247    <productname>PostgreSQL</productname> B-tree, you can only issue queries
 248    such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less
 249    than imagey</quote> and <quote>is imagex greater than imagey</quote>.
 250    Depending on how you define <quote>equals</quote>, <quote>less than</quote>
 251    and <quote>greater than</quote> in this context, this could be useful.
 252    However, by using a <acronym>GiST</acronym> based index, you could create
 253    ways to ask domain-specific questions, perhaps <quote>find all images of
 254    horses</quote> or <quote>find all over-exposed images</quote>.
 255  </para>
 256
 257  <para>
 258    All it takes to get a <acronym>GiST</acronym> access method up and running
 259    is to implement several user-defined methods, which define the behavior of
 260    keys in the tree. Of course these methods have to be pretty fancy to
 261    support fancy queries, but for all the standard queries (B-trees,
 262    R-trees, etc.) they're relatively straightforward. In short,
 263    <acronym>GiST</acronym> combines extensibility along with generality, code
 264    reuse, and a clean interface.
 265   </para>
 266
 267  <para>
 268    There are five methods that an index operator class for
 269    <acronym>GiST</acronym> must provide, and seven that are optional.
 270    Correctness of the index is ensured
 271    by proper implementation of the <function>same</function>, <function>consistent</function>
 272    and <function>union</function> methods, while efficiency (size and speed) of the
 273    index will depend on the <function>penalty</function> and <function>picksplit</function>
 274    methods.
 275    Two optional methods are <function>compress</function> and
 276    <function>decompress</function>, which allow an index to have internal tree data of
 277    a different type than the data it indexes. The leaves are to be of the
 278    indexed data type, while the other tree nodes can be of any C struct (but
 279    you still have to follow <productname>PostgreSQL</productname> data type rules here,
 280    see about <literal>varlena</literal> for variable sized data). If the tree's
 281    internal data type exists at the SQL level, the <literal>STORAGE</literal> option
 282    of the <command>CREATE OPERATOR CLASS</command> command can be used.
 283    The optional eighth method is <function>distance</function>, which is needed
 284    if the operator class wishes to support ordered scans (nearest-neighbor
 285    searches). The optional ninth method <function>fetch</function> is needed if the
 286    operator class wishes to support index-only scans, except when the
 287    <function>compress</function> method is omitted. The optional tenth method
 288    <function>options</function> is needed if the operator class has
 289    user-specified parameters.
 290    The optional eleventh method <function>sortsupport</function> is used to
 291    speed up building a <acronym>GiST</acronym> index.
 292    The optional twelfth method <function>stratnum</function> is used to
 293    translate compare types (from
 294    <filename>src/include/nodes/primnodes.h</filename>) into strategy numbers
 295    used by the operator class.  This lets the core code look up operators for
 296    temporal constraint indexes.
 297  </para>
 298
 299  <variablelist>
 300     <varlistentry>
 301      <term><function>consistent</function></term>
 302      <listitem>
 303       <para>
 304        Given an index entry <literal>p</literal> and a query value <literal>q</literal>,
 305        this function determines whether the index entry is
 306        <quote>consistent</quote> with the query; that is, could the predicate
 307        <quote><replaceable>indexed_column</replaceable>
 308        <replaceable>indexable_operator</replaceable> <literal>q</literal></quote> be true for
 309        any row represented by the index entry?  For a leaf index entry this is
 310        equivalent to testing the indexable condition, while for an internal
 311        tree node this determines whether it is necessary to scan the subtree
 312        of the index represented by the tree node.  When the result is
 313        <literal>true</literal>, a <literal>recheck</literal> flag must also be returned.
 314        This indicates whether the predicate is certainly true or only possibly
 315        true.  If <literal>recheck</literal> = <literal>false</literal> then the index has
 316        tested the predicate condition exactly, whereas if <literal>recheck</literal>
 317        = <literal>true</literal> the row is only a candidate match.  In that case the
 318        system will automatically evaluate the
 319        <replaceable>indexable_operator</replaceable> against the actual row value to see
 320        if it is really a match.  This convention allows
 321        <acronym>GiST</acronym> to support both lossless and lossy index
 322        structures.
 323       </para>
 324
 325       <para>
 326         The <acronym>SQL</acronym> declaration of the function must look like this:
 327
 328 <programlisting>
 329 CREATE OR REPLACE FUNCTION my_consistent(internal, data_type, smallint, oid, internal)
 330 RETURNS bool
 331 AS 'MODULE_PATHNAME'
 332 LANGUAGE C STRICT;
 333 </programlisting>
 334
 335         And the matching code in the C module could then follow this skeleton:
 336
 337 <programlisting>
 338 PG_FUNCTION_INFO_V1(my_consistent);
 339
 340 Datum
 341 my_consistent(PG_FUNCTION_ARGS)
 342 {
 343     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 344     data_type  *query = PG_GETARG_DATA_TYPE_P(1);
 345     StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
 346     /* Oid subtype = PG_GETARG_OID(3); */
 347     bool       *recheck = (bool *) PG_GETARG_POINTER(4);
 348     data_type  *key = DatumGetDataType(entry-&gt;key);
 349     bool        retval;
 350
 351     /*
 352      * determine return value as a function of strategy, key and query.
 353      *
 354      * Use GIST_LEAF(entry) to know where you're called in the index tree,
 355      * which comes handy when supporting the = operator for example (you could
 356      * check for non empty union() in non-leaf nodes and equality in leaf
 357      * nodes).
 358      */
 359
 360     *recheck = true;        /* or false if check is exact */
 361
 362     PG_RETURN_BOOL(retval);
 363 }
 364 </programlisting>
 365
 366        Here, <varname>key</varname> is an element in the index and <varname>query</varname>
 367        the value being looked up in the index. The <literal>StrategyNumber</literal>
 368        parameter indicates which operator of your operator class is being
 369        applied &mdash; it matches one of the operator numbers in the
 370        <command>CREATE OPERATOR CLASS</command> command.
 371       </para>
 372
 373       <para>
 374        Depending on which operators you have included in the class, the data
 375        type of <varname>query</varname> could vary with the operator, since it will
 376        be whatever type is on the right-hand side of the operator, which might
 377        be different from the indexed data type appearing on the left-hand side.
 378        (The above code skeleton assumes that only one type is possible; if
 379        not, fetching the <varname>query</varname> argument value would have to depend
 380        on the operator.)  It is recommended that the SQL declaration of
 381        the <function>consistent</function> function use the opclass's indexed data
 382        type for the <varname>query</varname> argument, even though the actual type
 383        might be something else depending on the operator.
 384       </para>
 385
 386      </listitem>
 387     </varlistentry>
 388
 389     <varlistentry>
 390      <term><function>union</function></term>
 391      <listitem>
 392       <para>
 393        This method consolidates information in the tree.  Given a set of
 394        entries, this function generates a new index entry that represents
 395        all the given entries.
 396       </para>
 397
 398       <para>
 399         The <acronym>SQL</acronym> declaration of the function must look like this:
 400
 401 <programlisting>
 402 CREATE OR REPLACE FUNCTION my_union(internal, internal)
 403 RETURNS storage_type
 404 AS 'MODULE_PATHNAME'
 405 LANGUAGE C STRICT;
 406 </programlisting>
 407
 408         And the matching code in the C module could then follow this skeleton:
 409
 410 <programlisting>
 411 PG_FUNCTION_INFO_V1(my_union);
 412
 413 Datum
 414 my_union(PG_FUNCTION_ARGS)
 415 {
 416     GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
 417     GISTENTRY  *ent = entryvec-&gt;vector;
 418     data_type  *out,
 419                *tmp,
 420                *old;
 421     int         numranges,
 422                 i = 0;
 423
 424     numranges = entryvec-&gt;n;
 425     tmp = DatumGetDataType(ent[0].key);
 426     out = tmp;
 427
 428     if (numranges == 1)
 429     {
 430         out = data_type_deep_copy(tmp);
 431
 432         PG_RETURN_DATA_TYPE_P(out);
 433     }
 434
 435     for (i = 1; i &lt; numranges; i++)
 436     {
 437         old = out;
 438         tmp = DatumGetDataType(ent[i].key);
 439         out = my_union_implementation(out, tmp);
 440     }
 441
 442     PG_RETURN_DATA_TYPE_P(out);
 443 }
 444 </programlisting>
 445       </para>
 446
 447       <para>
 448         As you can see, in this skeleton we're dealing with a data type
 449         where <literal>union(X, Y, Z) = union(union(X, Y), Z)</literal>. It's easy
 450         enough to support data types where this is not the case, by
 451         implementing the proper union algorithm in this
 452         <acronym>GiST</acronym> support method.
 453       </para>
 454
 455       <para>
 456         The result of the <function>union</function> function must be a value of the
 457         index's storage type, whatever that is (it might or might not be
 458         different from the indexed column's type).  The <function>union</function>
 459         function should return a pointer to newly <function>palloc()</function>ed
 460         memory. You can't just return the input value as-is, even if there is
 461         no type change.
 462       </para>
 463
 464       <para>
 465        As shown above, the <function>union</function> function's
 466        first <type>internal</type> argument is actually
 467        a <structname>GistEntryVector</structname> pointer.  The second argument is a
 468        pointer to an integer variable, which can be ignored.  (It used to be
 469        required that the <function>union</function> function store the size of its
 470        result value into that variable, but this is no longer necessary.)
 471       </para>
 472      </listitem>
 473     </varlistentry>
 474
 475     <varlistentry>
 476      <term><function>compress</function></term>
 477      <listitem>
 478       <para>
 479        Converts a data item into a format suitable for physical storage in
 480        an index page.
 481        If the <function>compress</function> method is omitted, data items are stored
 482        in the index without modification.
 483       </para>
 484
 485       <para>
 486         The <acronym>SQL</acronym> declaration of the function must look like this:
 487
 488 <programlisting>
 489 CREATE OR REPLACE FUNCTION my_compress(internal)
 490 RETURNS internal
 491 AS 'MODULE_PATHNAME'
 492 LANGUAGE C STRICT;
 493 </programlisting>
 494
 495         And the matching code in the C module could then follow this skeleton:
 496
 497 <programlisting>
 498 PG_FUNCTION_INFO_V1(my_compress);
 499
 500 Datum
 501 my_compress(PG_FUNCTION_ARGS)
 502 {
 503     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 504     GISTENTRY  *retval;
 505
 506     if (entry-&gt;leafkey)
 507     {
 508         /* replace entry-&gt;key with a compressed version */
 509         compressed_data_type *compressed_data = palloc(sizeof(compressed_data_type));
 510
 511         /* fill *compressed_data from entry-&gt;key ... */
 512
 513         retval = palloc(sizeof(GISTENTRY));
 514         gistentryinit(*retval, PointerGetDatum(compressed_data),
 515                       entry-&gt;rel, entry-&gt;page, entry-&gt;offset, FALSE);
 516     }
 517     else
 518     {
 519         /* typically we needn't do anything with non-leaf entries */
 520         retval = entry;
 521     }
 522
 523     PG_RETURN_POINTER(retval);
 524 }
 525 </programlisting>
 526       </para>
 527
 528       <para>
 529        You have to adapt <replaceable>compressed_data_type</replaceable> to the specific
 530        type you're converting to in order to compress your leaf nodes, of
 531        course.
 532       </para>
 533      </listitem>
 534     </varlistentry>
 535
 536     <varlistentry>
 537      <term><function>decompress</function></term>
 538      <listitem>
 539       <para>
 540        Converts the stored representation of a data item into a format that
 541        can be manipulated by the other GiST methods in the operator class.
 542        If the <function>decompress</function> method is omitted, it is assumed that
 543        the other GiST methods can work directly on the stored data format.
 544        (<function>decompress</function> is not necessarily the reverse of
 545        the <function>compress</function> method; in particular,
 546        if <function>compress</function> is lossy then it's impossible
 547        for <function>decompress</function> to exactly reconstruct the original
 548        data.  <function>decompress</function> is not necessarily equivalent
 549        to <function>fetch</function>, either, since the other GiST methods might not
 550        require full reconstruction of the data.)
 551       </para>
 552
 553       <para>
 554         The <acronym>SQL</acronym> declaration of the function must look like this:
 555
 556 <programlisting>
 557 CREATE OR REPLACE FUNCTION my_decompress(internal)
 558 RETURNS internal
 559 AS 'MODULE_PATHNAME'
 560 LANGUAGE C STRICT;
 561 </programlisting>
 562
 563         And the matching code in the C module could then follow this skeleton:
 564
 565 <programlisting>
 566 PG_FUNCTION_INFO_V1(my_decompress);
 567
 568 Datum
 569 my_decompress(PG_FUNCTION_ARGS)
 570 {
 571     PG_RETURN_POINTER(PG_GETARG_POINTER(0));
 572 }
 573 </programlisting>
 574
 575         The above skeleton is suitable for the case where no decompression
 576         is needed.  (But, of course, omitting the method altogether is even
 577         easier, and is recommended in such cases.)
 578       </para>
 579      </listitem>
 580     </varlistentry>
 581
 582     <varlistentry>
 583      <term><function>penalty</function></term>
 584      <listitem>
 585       <para>
 586        Returns a value indicating the <quote>cost</quote> of inserting the new
 587        entry into a particular branch of the tree.  Items will be inserted
 588        down the path of least <function>penalty</function> in the tree.
 589        Values returned by <function>penalty</function> should be non-negative.
 590        If a negative value is returned, it will be treated as zero.
 591       </para>
 592
 593       <para>
 594         The <acronym>SQL</acronym> declaration of the function must look like this:
 595
 596 <programlisting>
 597 CREATE OR REPLACE FUNCTION my_penalty(internal, internal, internal)
 598 RETURNS internal
 599 AS 'MODULE_PATHNAME'
 600 LANGUAGE C STRICT;  -- in some cases penalty functions need not be strict
 601 </programlisting>
 602
 603         And the matching code in the C module could then follow this skeleton:
 604
 605 <programlisting>
 606 PG_FUNCTION_INFO_V1(my_penalty);
 607
 608 Datum
 609 my_penalty(PG_FUNCTION_ARGS)
 610 {
 611     GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
 612     GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
 613     float      *penalty = (float *) PG_GETARG_POINTER(2);
 614     data_type  *orig = DatumGetDataType(origentry-&gt;key);
 615     data_type  *new = DatumGetDataType(newentry-&gt;key);
 616
 617     *penalty = my_penalty_implementation(orig, new);
 618     PG_RETURN_POINTER(penalty);
 619 }
 620 </programlisting>
 621
 622         For historical reasons, the <function>penalty</function> function doesn't
 623         just return a <type>float</type> result; instead it has to store the value
 624         at the location indicated by the third argument.  The return
 625         value per se is ignored, though it's conventional to pass back the
 626         address of that argument.
 627       </para>
 628
 629       <para>
 630         The <function>penalty</function> function is crucial to good performance of
 631         the index. It'll get used at insertion time to determine which branch
 632         to follow when choosing where to add the new entry in the tree. At
 633         query time, the more balanced the index, the quicker the lookup.
 634       </para>
 635      </listitem>
 636     </varlistentry>
 637
 638     <varlistentry>
 639      <term><function>picksplit</function></term>
 640      <listitem>
 641       <para>
 642        When an index page split is necessary, this function decides which
 643        entries on the page are to stay on the old page, and which are to move
 644        to the new page.
 645       </para>
 646
 647       <para>
 648         The <acronym>SQL</acronym> declaration of the function must look like this:
 649
 650 <programlisting>
 651 CREATE OR REPLACE FUNCTION my_picksplit(internal, internal)
 652 RETURNS internal
 653 AS 'MODULE_PATHNAME'
 654 LANGUAGE C STRICT;
 655 </programlisting>
 656
 657         And the matching code in the C module could then follow this skeleton:
 658
 659 <programlisting>
 660 PG_FUNCTION_INFO_V1(my_picksplit);
 661
 662 Datum
 663 my_picksplit(PG_FUNCTION_ARGS)
 664 {
 665     GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
 666     GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1);
 667     OffsetNumber maxoff = entryvec-&gt;n - 1;
 668     GISTENTRY  *ent = entryvec-&gt;vector;
 669     int         i,
 670                 nbytes;
 671     OffsetNumber *left,
 672                *right;
 673     data_type  *tmp_union;
 674     data_type  *unionL;
 675     data_type  *unionR;
 676     GISTENTRY **raw_entryvec;
 677
 678     maxoff = entryvec-&gt;n - 1;
 679     nbytes = (maxoff + 1) * sizeof(OffsetNumber);
 680
 681     v-&gt;spl_left = (OffsetNumber *) palloc(nbytes);
 682     left = v-&gt;spl_left;
 683     v-&gt;spl_nleft = 0;
 684
 685     v-&gt;spl_right = (OffsetNumber *) palloc(nbytes);
 686     right = v-&gt;spl_right;
 687     v-&gt;spl_nright = 0;
 688
 689     unionL = NULL;
 690     unionR = NULL;
 691
 692     /* Initialize the raw entry vector. */
 693     raw_entryvec = (GISTENTRY **) malloc(entryvec-&gt;n * sizeof(void *));
 694     for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
 695         raw_entryvec[i] = &amp;(entryvec-&gt;vector[i]);
 696
 697     for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
 698     {
 699         int         real_index = raw_entryvec[i] - entryvec-&gt;vector;
 700
 701         tmp_union = DatumGetDataType(entryvec-&gt;vector[real_index].key);
 702         Assert(tmp_union != NULL);
 703
 704         /*
 705          * Choose where to put the index entries and update unionL and unionR
 706          * accordingly. Append the entries to either v-&gt;spl_left or
 707          * v-&gt;spl_right, and care about the counters.
 708          */
 709
 710         if (my_choice_is_left(unionL, curl, unionR, curr))
 711         {
 712             if (unionL == NULL)
 713                 unionL = tmp_union;
 714             else
 715                 unionL = my_union_implementation(unionL, tmp_union);
 716
 717             *left = real_index;
 718             ++left;
 719             ++(v-&gt;spl_nleft);
 720         }
 721         else
 722         {
 723             /*
 724              * Same on the right
 725              */
 726         }
 727     }
 728
 729     v-&gt;spl_ldatum = DataTypeGetDatum(unionL);
 730     v-&gt;spl_rdatum = DataTypeGetDatum(unionR);
 731     PG_RETURN_POINTER(v);
 732 }
 733 </programlisting>
 734
 735        Notice that the <function>picksplit</function> function's result is delivered
 736        by modifying the passed-in <structname>v</structname> structure.  The return
 737        value per se is ignored, though it's conventional to pass back the
 738        address of <structname>v</structname>.
 739       </para>
 740
 741       <para>
 742         Like <function>penalty</function>, the <function>picksplit</function> function
 743         is crucial to good performance of the index.  Designing suitable
 744         <function>penalty</function> and <function>picksplit</function> implementations
 745         is where the challenge of implementing well-performing
 746         <acronym>GiST</acronym> indexes lies.
 747       </para>
 748      </listitem>
 749     </varlistentry>
 750
 751     <varlistentry>
 752      <term><function>same</function></term>
 753      <listitem>
 754       <para>
 755        Returns true if two index entries are identical, false otherwise.
 756        (An <quote>index entry</quote> is a value of the index's storage type,
 757        not necessarily the original indexed column's type.)
 758       </para>
 759
 760       <para>
 761         The <acronym>SQL</acronym> declaration of the function must look like this:
 762
 763 <programlisting>
 764 CREATE OR REPLACE FUNCTION my_same(storage_type, storage_type, internal)
 765 RETURNS internal
 766 AS 'MODULE_PATHNAME'
 767 LANGUAGE C STRICT;
 768 </programlisting>
 769
 770         And the matching code in the C module could then follow this skeleton:
 771
 772 <programlisting>
 773 PG_FUNCTION_INFO_V1(my_same);
 774
 775 Datum
 776 my_same(PG_FUNCTION_ARGS)
 777 {
 778     prefix_range *v1 = PG_GETARG_PREFIX_RANGE_P(0);
 779     prefix_range *v2 = PG_GETARG_PREFIX_RANGE_P(1);
 780     bool       *result = (bool *) PG_GETARG_POINTER(2);
 781
 782     *result = my_eq(v1, v2);
 783     PG_RETURN_POINTER(result);
 784 }
 785 </programlisting>
 786
 787         For historical reasons, the <function>same</function> function doesn't
 788         just return a Boolean result; instead it has to store the flag
 789         at the location indicated by the third argument.  The return
 790         value per se is ignored, though it's conventional to pass back the
 791         address of that argument.
 792       </para>
 793      </listitem>
 794     </varlistentry>
 795
 796     <varlistentry>
 797      <term><function>distance</function></term>
 798      <listitem>
 799       <para>
 800        Given an index entry <literal>p</literal> and a query value <literal>q</literal>,
 801        this function determines the index entry's
 802        <quote>distance</quote> from the query value.  This function must be
 803        supplied if the operator class contains any ordering operators.
 804        A query using the ordering operator will be implemented by returning
 805        index entries with the smallest <quote>distance</quote> values first,
 806        so the results must be consistent with the operator's semantics.
 807        For a leaf index entry the result just represents the distance to
 808        the index entry; for an internal tree node, the result must be the
 809        smallest distance that any child entry could have.
 810       </para>
 811
 812       <para>
 813         The <acronym>SQL</acronym> declaration of the function must look like this:
 814
 815 <programlisting>
 816 CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid, internal)
 817 RETURNS float8
 818 AS 'MODULE_PATHNAME'
 819 LANGUAGE C STRICT;
 820 </programlisting>
 821
 822         And the matching code in the C module could then follow this skeleton:
 823
 824 <programlisting>
 825 PG_FUNCTION_INFO_V1(my_distance);
 826
 827 Datum
 828 my_distance(PG_FUNCTION_ARGS)
 829 {
 830     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 831     data_type  *query = PG_GETARG_DATA_TYPE_P(1);
 832     StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
 833     /* Oid subtype = PG_GETARG_OID(3); */
 834     /* bool *recheck = (bool *) PG_GETARG_POINTER(4); */
 835     data_type  *key = DatumGetDataType(entry-&gt;key);
 836     double      retval;
 837
 838     /*
 839      * determine return value as a function of strategy, key and query.
 840      */
 841
 842     PG_RETURN_FLOAT8(retval);
 843 }
 844 </programlisting>
 845
 846        The arguments to the <function>distance</function> function are identical to
 847        the arguments of the <function>consistent</function> function.
 848       </para>
 849
 850       <para>
 851        Some approximation is allowed when determining the distance, so long
 852        as the result is never greater than the entry's actual distance. Thus,
 853        for example, distance to a bounding box is usually sufficient in
 854        geometric applications.  For an internal tree node, the distance
 855        returned must not be greater than the distance to any of the child
 856        nodes. If the returned distance is not exact, the function must set
 857        <literal>*recheck</literal> to true. (This is not necessary for internal tree
 858        nodes; for them, the calculation is always assumed to be inexact.) In
 859        this case the executor will calculate the accurate distance after
 860        fetching the tuple from the heap, and reorder the tuples if necessary.
 861       </para>
 862
 863       <para>
 864        If the distance function returns <literal>*recheck = true</literal> for any
 865        leaf node, the original ordering operator's return type must
 866        be <type>float8</type> or <type>float4</type>, and the distance function's
 867        result values must be comparable to those of the original ordering
 868        operator, since the executor will sort using both distance function
 869        results and recalculated ordering-operator results.  Otherwise, the
 870        distance function's result values can be any finite <type>float8</type>
 871        values, so long as the relative order of the result values matches the
 872        order returned by the ordering operator.  (Infinity and minus infinity
 873        are used internally to handle cases such as nulls, so it is not
 874        recommended that <function>distance</function> functions return these values.)
 875       </para>
 876
 877      </listitem>
 878     </varlistentry>
 879
 880     <varlistentry>
 881      <term><function>fetch</function></term>
 882      <listitem>
 883       <para>
 884        Converts the compressed index representation of a data item into the
 885        original data type, for index-only scans. The returned data must be an
 886        exact, non-lossy copy of the originally indexed value.
 887       </para>
 888
 889       <para>
 890         The <acronym>SQL</acronym> declaration of the function must look like this:
 891
 892 <programlisting>
 893 CREATE OR REPLACE FUNCTION my_fetch(internal)
 894 RETURNS internal
 895 AS 'MODULE_PATHNAME'
 896 LANGUAGE C STRICT;
 897 </programlisting>
 898
 899         The argument is a pointer to a <structname>GISTENTRY</structname> struct. On
 900         entry, its <structfield>key</structfield> field contains a non-NULL leaf datum in
 901         compressed form. The return value is another <structname>GISTENTRY</structname>
 902         struct, whose <structfield>key</structfield> field contains the same datum in its
 903         original, uncompressed form. If the opclass's compress function does
 904         nothing for leaf entries, the <function>fetch</function> method can return the
 905         argument as-is.  Or, if the opclass does not have a compress function,
 906         the <function>fetch</function> method can be omitted as well, since it would
 907         necessarily be a no-op.
 908        </para>
 909
 910        <para>
 911         The matching code in the C module could then follow this skeleton:
 912
 913 <programlisting>
 914 PG_FUNCTION_INFO_V1(my_fetch);
 915
 916 Datum
 917 my_fetch(PG_FUNCTION_ARGS)
 918 {
 919     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 920     input_data_type *in = DatumGetPointer(entry->key);
 921     fetched_data_type *fetched_data;
 922     GISTENTRY  *retval;
 923
 924     retval = palloc(sizeof(GISTENTRY));
 925     fetched_data = palloc(sizeof(fetched_data_type));
 926
 927     /*
 928      * Convert 'fetched_data' into the a Datum of the original datatype.
 929      */
 930
 931     /* fill *retval from fetched_data. */
 932     gistentryinit(*retval, PointerGetDatum(converted_datum),
 933                   entry->rel, entry->page, entry->offset, FALSE);
 934
 935     PG_RETURN_POINTER(retval);
 936 }
 937 </programlisting>
 938       </para>
 939
 940       <para>
 941        If the compress method is lossy for leaf entries, the operator class
 942        cannot support index-only scans, and must not define
 943        a <function>fetch</function> function.
 944       </para>
 945
 946      </listitem>
 947     </varlistentry>
 948
 949     <varlistentry>
 950      <term><function>options</function></term>
 951      <listitem>
 952       <para>
 953        Allows definition of user-visible parameters that control operator
 954        class behavior.
 955       </para>
 956
 957       <para>
 958         The <acronym>SQL</acronym> declaration of the function must look like this:
 959
 960 <programlisting>
 961 CREATE OR REPLACE FUNCTION my_options(internal)
 962 RETURNS void
 963 AS 'MODULE_PATHNAME'
 964 LANGUAGE C STRICT;
 965 </programlisting>
 966       </para>
 967
 968       <para>
 969        The function is passed a pointer to a <structname>local_relopts</structname>
 970        struct, which needs to be filled with a set of operator class
 971        specific options.  The options can be accessed from other support
 972        functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and
 973        <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros.
 974       </para>
 975
 976        <para>
 977         An example implementation of my_options() and parameters use
 978         from other support functions are given below:
 979
 980 <programlisting>
 981 typedef enum MyEnumType
 982 {
 983     MY_ENUM_ON,
 984     MY_ENUM_OFF,
 985     MY_ENUM_AUTO
 986 } MyEnumType;
 987
 988 typedef struct
 989 {
 990     int32   vl_len_;    /* varlena header (do not touch directly!) */
 991     int     int_param;  /* integer parameter */
 992     double  real_param; /* real parameter */
 993     MyEnumType enum_param; /* enum parameter */
 994     int     str_param;  /* string parameter */
 995 } MyOptionsStruct;
 996
 997 /* String representation of enum values */
 998 static relopt_enum_elt_def myEnumValues[] =
 999 {
1000     {"on", MY_ENUM_ON},
1001     {"off", MY_ENUM_OFF},
1002     {"auto", MY_ENUM_AUTO},
1003     {(const char *) NULL}   /* list terminator */
1004 };
1005
1006 static char *str_param_default = "default";
1007
1008 /*
1009  * Sample validator: checks that string is not longer than 8 bytes.
1010  */
1011 static void
1012 validate_my_string_relopt(const char *value)
1013 {
1014     if (strlen(value) > 8)
1015         ereport(ERROR,
1016                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
1017                  errmsg("str_param must be at most 8 bytes")));
1018 }
1019
1020 /*
1021  * Sample filler: switches characters to lower case.
1022  */
1023 static Size
1024 fill_my_string_relopt(const char *value, void *ptr)
1025 {
1026     char   *tmp = str_tolower(value, strlen(value), DEFAULT_COLLATION_OID);
1027     int     len = strlen(tmp);
1028
1029     if (ptr)
1030         strcpy((char *) ptr, tmp);
1031
1032     pfree(tmp);
1033     return len + 1;
1034 }
1035
1036 PG_FUNCTION_INFO_V1(my_options);
1037
1038 Datum
1039 my_options(PG_FUNCTION_ARGS)
1040 {
1041     local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);
1042
1043     init_local_reloptions(relopts, sizeof(MyOptionsStruct));
1044     add_local_int_reloption(relopts, "int_param", "integer parameter",
1045                             100, 0, 1000000,
1046                             offsetof(MyOptionsStruct, int_param));
1047     add_local_real_reloption(relopts, "real_param", "real parameter",
1048                              1.0, 0.0, 1000000.0,
1049                              offsetof(MyOptionsStruct, real_param));
1050     add_local_enum_reloption(relopts, "enum_param", "enum parameter",
1051                              myEnumValues, MY_ENUM_ON,
1052                              "Valid values are: \"on\", \"off\" and \"auto\".",
1053                              offsetof(MyOptionsStruct, enum_param));
1054     add_local_string_reloption(relopts, "str_param", "string parameter",
1055                                str_param_default,
1056                                &amp;validate_my_string_relopt,
1057                                &amp;fill_my_string_relopt,
1058                                offsetof(MyOptionsStruct, str_param));
1059
1060     PG_RETURN_VOID();
1061 }
1062
1063 PG_FUNCTION_INFO_V1(my_compress);
1064
1065 Datum
1066 my_compress(PG_FUNCTION_ARGS)
1067 {
1068     int     int_param = 100;
1069     double  real_param = 1.0;
1070     MyEnumType enum_param = MY_ENUM_ON;
1071     char   *str_param = str_param_default;
1072
1073     /*
1074      * Normally, when opclass contains 'options' method, then options are always
1075      * passed to support functions.  However, if you add 'options' method to
1076      * existing opclass, previously defined indexes have no options, so the
1077      * check is required.
1078      */
1079     if (PG_HAS_OPCLASS_OPTIONS())
1080     {
1081         MyOptionsStruct *options = (MyOptionsStruct *) PG_GET_OPCLASS_OPTIONS();
1082
1083         int_param = options->int_param;
1084         real_param = options->real_param;
1085         enum_param = options->enum_param;
1086         str_param = GET_STRING_RELOPTION(options, str_param);
1087     }
1088
1089     /* the rest implementation of support function */
1090 }
1091
1092 </programlisting>
1093       </para>
1094
1095       <para>
1096        Since the representation of the key in <acronym>GiST</acronym> is
1097        flexible, it may depend on user-specified parameters.  For instance,
1098        the length of key signature may be specified.  See
1099        <literal>gtsvector_options()</literal> for example.
1100       </para>
1101      </listitem>
1102     </varlistentry>
1103
1104     <varlistentry>
1105      <term><function>sortsupport</function></term>
1106      <listitem>
1107       <para>
1108        Returns a comparator function to sort data in a way that preserves
1109        locality. It is used by <command>CREATE INDEX</command> and
1110        <command>REINDEX</command> commands. The quality of the created index
1111        depends on how well the sort order determined by the comparator function
1112        preserves locality of the inputs.
1113       </para>
1114       <para>
1115        The <function>sortsupport</function> method is optional. If it is not
1116        provided, <command>CREATE INDEX</command> builds the index by inserting
1117        each tuple to the tree using the <function>penalty</function> and
1118        <function>picksplit</function> functions, which is much slower.
1119       </para>
1120
1121       <para>
1122        The <acronym>SQL</acronym> declaration of the function must look like
1123        this:
1124
1125 <programlisting>
1126 CREATE OR REPLACE FUNCTION my_sortsupport(internal)
1127 RETURNS void
1128 AS 'MODULE_PATHNAME'
1129 LANGUAGE C STRICT;
1130 </programlisting>
1131
1132        The argument is a pointer to a <structname>SortSupport</structname>
1133        struct. At a minimum, the function must fill in its comparator field.
1134        The comparator takes three arguments: two Datums to compare, and
1135        a pointer to the <structname>SortSupport</structname> struct. The
1136        Datums are the two indexed values in the format that they are stored
1137        in the index; that is, in the format returned by the
1138        <function>compress</function> method. The full API is defined in
1139        <filename>src/include/utils/sortsupport.h</filename>.
1140        </para>
1141
1142        <para>
1143         The matching code in the C module could then follow this skeleton:
1144
1145 <programlisting>
1146 PG_FUNCTION_INFO_V1(my_sortsupport);
1147
1148 static int
1149 my_fastcmp(Datum x, Datum y, SortSupport ssup)
1150 {
1151   /* establish order between x and y by computing some sorting value z */
1152
1153   int z1 = ComputeSpatialCode(x);
1154   int z2 = ComputeSpatialCode(y);
1155
1156   return z1 == z2 ? 0 : z1 > z2 ? 1 : -1;
1157 }
1158
1159 Datum
1160 my_sortsupport(PG_FUNCTION_ARGS)
1161 {
1162   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
1163
1164   ssup->comparator = my_fastcmp;
1165   PG_RETURN_VOID();
1166 }
1167 </programlisting>
1168       </para>
1169      </listitem>
1170     </varlistentry>
1171
1172     <varlistentry>
1173      <term><function>stratnum</function></term>
1174      <listitem>
1175       <para>
1176        Given a <literal>CompareType</literal> value from
1177        <filename>src/include/nodes/primnodes.h</filename>, returns a strategy
1178        number used by this operator class for matching functionality.  The
1179        function should return <literal>InvalidStrategy</literal> if the
1180        operator class has no matching strategy.
1181       </para>
1182
1183       <para>
1184        This is used for temporal index constraints (i.e., <literal>PRIMARY
1185        KEY</literal> and <literal>UNIQUE</literal>).  If the operator class
1186        provides this function and it returns results for
1187        <literal>COMPARE_EQ</literal>, it can be used in the
1188        non-<literal>WITHOUT OVERLAPS</literal> part(s) of an index constraint.
1189       </para>
1190
1191       <para>
1192        The <acronym>SQL</acronym> declaration of the function must look like
1193        this:
1194
1195 <programlisting>
1196 CREATE OR REPLACE FUNCTION my_stratnum(integer)
1197 RETURNS smallint
1198 AS 'MODULE_PATHNAME'
1199 LANGUAGE C STRICT;
1200 </programlisting>
1201       </para>
1202
1203        <para>
1204         The matching code in the C module could then follow this skeleton:
1205
1206 <programlisting>
1207 PG_FUNCTION_INFO_V1(my_stratnum);
1208
1209 Datum
1210 my_stratnum(PG_FUNCTION_ARGS)
1211 {
1212     CompareType cmptype = PG_GETARG_INT32(0);
1213     StrategyNumber ret = InvalidStrategy;
1214
1215     switch (cmptype)
1216     {
1217         case COMPARE_EQ:
1218             ret = BTEqualStrategyNumber;
1219     }
1220
1221     PG_RETURN_UINT16(ret);
1222 }
1223 </programlisting>
1224       </para>
1225
1226       <para>
1227        One translation function is provided by
1228        <productname>PostgreSQL</productname>:
1229        <literal>gist_stratnum_common</literal> is for operator classes that
1230        use the <literal>RT*StrategyNumber</literal> constants.
1231        The <literal>btree_gist</literal>
1232        extension defines a second translation function,
1233        <literal>gist_stratnum_btree</literal>, for operator classes that use
1234        the <literal>BT*StrategyNumber</literal> constants.
1235       </para>
1236      </listitem>
1237     </varlistentry>
1238   </variablelist>
1239
1240   <para>
1241    All the GiST support methods are normally called in short-lived memory
1242    contexts; that is, <varname>CurrentMemoryContext</varname> will get reset after
1243    each tuple is processed.  It is therefore not very important to worry about
1244    pfree'ing everything you palloc.  However, in some cases it's useful for a
1245    support method to cache data across repeated calls.  To do that, allocate
1246    the longer-lived data in <literal>fcinfo-&gt;flinfo-&gt;fn_mcxt</literal>, and
1247    keep a pointer to it in <literal>fcinfo-&gt;flinfo-&gt;fn_extra</literal>.  Such
1248    data will survive for the life of the index operation (e.g., a single GiST
1249    index scan, index build, or index tuple insertion).  Be careful to pfree
1250    the previous value when replacing a <literal>fn_extra</literal> value, or the leak
1251    will accumulate for the duration of the operation.
1252   </para>
1253
1254 </sect2>
1255
1256 <sect2 id="gist-implementation">
1257  <title>Implementation</title>
1258
1259  <sect3 id="gist-buffering-build">
1260   <title>GiST Index Build Methods</title>
1261
1262   <para>
1263    The simplest way to build a GiST index is just to insert all the entries,
1264    one by one.  This tends to be slow for large indexes, because if the
1265    index tuples are scattered across the index and the index is large enough
1266    to not fit in cache, a lot of random I/O will be
1267    needed.  <productname>PostgreSQL</productname> supports two alternative
1268    methods for initial build of a GiST index: <firstterm>sorted</firstterm>
1269    and <firstterm>buffered</firstterm> modes.
1270   </para>
1271
1272   <para>
1273    The sorted method is only available if each of the opclasses used by the
1274    index provides a <function>sortsupport</function> function, as described
1275    in <xref linkend="gist-extensibility"/>.  If they do, this method is
1276    usually the best, so it is used by default.
1277   </para>
1278
1279   <para>
1280    The buffered method works by not inserting tuples directly into the index
1281    right away.  It can dramatically reduce the amount of random I/O needed
1282    for non-ordered data sets.  For well-ordered data sets the benefit is
1283    smaller or non-existent, because only a small number of pages receive new
1284    tuples at a time, and those pages fit in cache even if the index as a
1285    whole does not.
1286   </para>
1287
1288   <para>
1289    The buffered method needs to call the <function>penalty</function>
1290    function more often than the simple method does, which consumes some
1291    extra CPU resources. Also, the buffers need temporary disk space, up to
1292    the size of the resulting index. Buffering can also influence the quality
1293    of the resulting index, in both positive and negative directions. That
1294    influence depends on various factors, like the distribution of the input
1295    data and the operator class implementation.
1296   </para>
1297
1298   <para>
1299    If sorting is not possible, then by default a GiST index build switches
1300    to the buffering method when the index size reaches
1301    <xref linkend="guc-effective-cache-size"/>.  Buffering can be manually
1302    forced or prevented by the <literal>buffering</literal> parameter to the
1303    CREATE INDEX command.  The default behavior is good for most cases, but
1304    turning buffering off might speed up the build somewhat if the input data
1305    is ordered.
1306   </para>
1307
1308  </sect3>
1309 </sect2>
1310
1311 <sect2 id="gist-examples">
1312  <title>Examples</title>
1313
1314  <para>
1315   The <productname>PostgreSQL</productname> source distribution includes
1316   several examples of index methods implemented using
1317   <acronym>GiST</acronym>.  The core system currently provides text search
1318   support (indexing for <type>tsvector</type> and <type>tsquery</type>) as well as
1319   R-Tree equivalent functionality for some of the built-in geometric data types
1320   (see <filename>src/backend/access/gist/gistproc.c</filename>).  The following
1321   <filename>contrib</filename> modules also contain <acronym>GiST</acronym>
1322   operator classes:
1323
1324  <variablelist>
1325   <varlistentry>
1326    <term><filename>btree_gist</filename></term>
1327    <listitem>
1328     <para>B-tree equivalent functionality for several data types</para>
1329    </listitem>
1330   </varlistentry>
1331
1332   <varlistentry>
1333    <term><filename>cube</filename></term>
1334    <listitem>
1335     <para>Indexing for multidimensional cubes</para>
1336    </listitem>
1337   </varlistentry>
1338
1339   <varlistentry>
1340    <term><filename>hstore</filename></term>
1341    <listitem>
1342     <para>Module for storing (key, value) pairs</para>
1343    </listitem>
1344   </varlistentry>
1345
1346   <varlistentry>
1347    <term><filename>intarray</filename></term>
1348    <listitem>
1349     <para>RD-Tree for one-dimensional array of int4 values</para>
1350    </listitem>
1351   </varlistentry>
1352
1353   <varlistentry>
1354    <term><filename>ltree</filename></term>
1355    <listitem>
1356     <para>Indexing for tree-like structures</para>
1357    </listitem>
1358   </varlistentry>
1359
1360   <varlistentry>
1361    <term><filename>pg_trgm</filename></term>
1362    <listitem>
1363     <para>Text similarity using trigram matching</para>
1364    </listitem>
1365   </varlistentry>
1366
1367   <varlistentry>
1368    <term><filename>seg</filename></term>
1369    <listitem>
1370     <para>Indexing for <quote>float ranges</quote></para>
1371    </listitem>
1372   </varlistentry>
1373  </variablelist>
1374  </para>
1375
1376 </sect2>
1377
1378 </sect1>