doc/src/sgml/xoper.sgml

   1 <!-- doc/src/sgml/xoper.sgml -->
   2
   3  <sect1 id="xoper">
   4   <title>User-Defined Operators</title>
   5
   6   <indexterm zone="xoper">
   7    <primary>operator</primary>
   8    <secondary>user-defined</secondary>
   9   </indexterm>
  10
  11   <para>
  12    Every operator is <quote>syntactic sugar</quote> for a call to an
  13    underlying function that does the real work; so you must
  14    first create the underlying function before you can create
  15    the operator.  However, an operator is <emphasis>not merely</emphasis>
  16    syntactic sugar, because it carries additional information
  17    that helps the query planner optimize queries that use the
  18    operator.  The next section will be devoted to explaining
  19    that additional information.
  20   </para>
  21
  22   <para>
  23    <productname>PostgreSQL</productname> supports prefix
  24    and infix operators.  Operators can be
  25    overloaded;<indexterm><primary>overloading</primary><secondary>operators</secondary></indexterm>
  26    that is, the same operator name can be used for different operators
  27    that have different numbers and types of operands.  When a query is
  28    executed, the system determines the operator to call from the
  29    number and types of the provided operands.
  30   </para>
  31
  32   <para>
  33    Here is an example of creating an operator for adding two complex
  34    numbers.  We assume we've already created the definition of type
  35    <type>complex</type> (see <xref linkend="xtypes"/>).  First we need a
  36    function that does the work, then we can define the operator:
  37
  38 <programlisting>
  39 CREATE FUNCTION complex_add(complex, complex)
  40     RETURNS complex
  41     AS '<replaceable>filename</replaceable>', 'complex_add'
  42     LANGUAGE C IMMUTABLE STRICT;
  43
  44 CREATE OPERATOR + (
  45     leftarg = complex,
  46     rightarg = complex,
  47     function = complex_add,
  48     commutator = +
  49 );
  50 </programlisting>
  51   </para>
  52
  53   <para>
  54    Now we could execute a query like this:
  55
  56 <screen>
  57 SELECT (a + b) AS c FROM test_complex;
  58
  59         c
  60 -----------------
  61  (5.2,6.05)
  62  (133.42,144.95)
  63 </screen>
  64   </para>
  65
  66   <para>
  67    We've shown how to create a binary operator here.  To create a prefix
  68    operator, just omit the <literal>leftarg</literal>.
  69    The <literal>function</literal>
  70    clause and the argument clauses are the only required items in
  71    <command>CREATE OPERATOR</command>.  The <literal>commutator</literal>
  72    clause shown in the example is an optional hint to the query
  73    optimizer.  Further details about <literal>commutator</literal> and other
  74    optimizer hints appear in the next section.
  75   </para>
  76  </sect1>
  77
  78   <sect1 id="xoper-optimization">
  79    <title>Operator Optimization Information</title>
  80
  81   <indexterm zone="xoper-optimization">
  82    <primary>optimization information</primary>
  83    <secondary>for operators</secondary>
  84   </indexterm>
  85
  86    <para>
  87     A <productname>PostgreSQL</productname> operator definition can include
  88     several optional clauses that tell the system useful things about how
  89     the operator behaves.  These clauses should be provided whenever
  90     appropriate, because they can make for considerable speedups in execution
  91     of queries that use the operator.  But if you provide them, you must be
  92     sure that they are right!  Incorrect use of an optimization clause can
  93     result in slow queries, subtly wrong output, or other Bad Things.
  94     You can always leave out an optimization clause if you are not sure
  95     about it; the only consequence is that queries might run slower than
  96     they need to.
  97    </para>
  98
  99    <para>
 100     Additional optimization clauses might be added in future versions of
 101     <productname>PostgreSQL</productname>.  The ones described here are all
 102     the ones that release &version; understands.
 103    </para>
 104
 105    <para>
 106     It is also possible to attach a planner support function to the function
 107     that underlies an operator, providing another way of telling the system
 108     about the behavior of the operator.
 109     See <xref linkend="xfunc-optimization"/> for more information.
 110    </para>
 111
 112    <sect2 id="xoper-commutator">
 113     <title><literal>COMMUTATOR</literal></title>
 114
 115     <para>
 116      The <literal>COMMUTATOR</literal> clause, if provided, names an operator that is the
 117      commutator of the operator being defined.  We say that operator A is the
 118      commutator of operator B if (x A y) equals (y B x) for all possible input
 119      values x, y.  Notice that B is also the commutator of A.  For example,
 120      operators <literal>&lt;</literal> and <literal>&gt;</literal> for a particular data type are usually each others'
 121      commutators, and operator <literal>+</literal> is usually commutative with itself.
 122      But operator <literal>-</literal> is usually not commutative with anything.
 123     </para>
 124
 125     <para>
 126      The left operand type of a commutable operator is the same as the
 127      right operand type of its commutator, and vice versa.  So the name of
 128      the commutator operator is all that <productname>PostgreSQL</productname>
 129      needs to be given to look up the commutator, and that's all that needs to
 130      be provided in the <literal>COMMUTATOR</literal> clause.
 131     </para>
 132
 133     <para>
 134      It's critical to provide commutator information for operators that
 135      will be used in indexes and join clauses, because this allows the
 136      query optimizer to <quote>flip around</quote> such a clause to the forms
 137      needed for different plan types.  For example, consider a query with
 138      a WHERE clause like <literal>tab1.x = tab2.y</literal>, where <literal>tab1.x</literal>
 139      and <literal>tab2.y</literal> are of a user-defined type, and suppose that
 140      <literal>tab2.y</literal> is indexed.  The optimizer cannot generate an
 141      index scan unless it can determine how to flip the clause around to
 142      <literal>tab2.y = tab1.x</literal>, because the index-scan machinery expects
 143      to see the indexed column on the left of the operator it is given.
 144      <productname>PostgreSQL</productname> will <emphasis>not</emphasis> simply
 145      assume that this is a valid transformation &mdash; the creator of the
 146      <literal>=</literal> operator must specify that it is valid, by marking the
 147      operator with commutator information.
 148     </para>
 149    </sect2>
 150
 151    <sect2 id="xoper-negator">
 152     <title><literal>NEGATOR</literal></title>
 153
 154     <para>
 155      The <literal>NEGATOR</literal> clause, if provided, names an operator that is the
 156      negator of the operator being defined.  We say that operator A
 157      is the negator of operator B if both return Boolean results and
 158      (x A y) equals NOT (x B y) for all possible inputs x, y.
 159      Notice that B is also the negator of A.
 160      For example, <literal>&lt;</literal> and <literal>&gt;=</literal> are a negator pair for most data types.
 161      An operator can never validly be its own negator.
 162     </para>
 163
 164    <para>
 165     Unlike commutators, a pair of unary operators could validly be marked
 166     as each other's negators; that would mean (A x) equals NOT (B x)
 167     for all x.
 168    </para>
 169
 170    <para>
 171     An operator's negator must have the same left and/or right operand types
 172     as the operator to be defined, so just as with <literal>COMMUTATOR</literal>, only the operator
 173     name need be given in the <literal>NEGATOR</literal> clause.
 174    </para>
 175
 176    <para>
 177     Providing a negator is very helpful to the query optimizer since
 178     it allows expressions like <literal>NOT (x = y)</literal> to be simplified into
 179     <literal>x &lt;&gt; y</literal>.  This comes up more often than you might think, because
 180     <literal>NOT</literal> operations can be inserted as a consequence of other rearrangements.
 181    </para>
 182   </sect2>
 183
 184   <sect2 id="xoper-restrict">
 185    <title><literal>RESTRICT</literal></title>
 186
 187    <para>
 188     The <literal>RESTRICT</literal> clause, if provided, names a restriction selectivity
 189     estimation function for the operator.  (Note that this is a function
 190     name, not an operator name.)  <literal>RESTRICT</literal> clauses only make sense for
 191     binary operators that return <type>boolean</type>.  The idea behind a restriction
 192     selectivity estimator is to guess what fraction of the rows in a
 193     table will satisfy a <literal>WHERE</literal>-clause condition of the form:
 194 <programlisting>
 195 column OP constant
 196 </programlisting>
 197     for the current operator and a particular constant value.
 198     This assists the optimizer by
 199     giving it some idea of how many rows will be eliminated by <literal>WHERE</literal>
 200     clauses that have this form.  (What happens if the constant is on
 201     the left, you might be wondering?  Well, that's one of the things that
 202     <literal>COMMUTATOR</literal> is for...)
 203    </para>
 204
 205    <para>
 206     Writing new restriction selectivity estimation functions is far beyond
 207     the scope of this chapter, but fortunately you can usually just use
 208     one of the system's standard estimators for many of your own operators.
 209     These are the standard restriction estimators:
 210     <simplelist>
 211      <member><function>eqsel</function> for <literal>=</literal></member>
 212      <member><function>neqsel</function> for <literal>&lt;&gt;</literal></member>
 213      <member><function>scalarltsel</function> for <literal>&lt;</literal></member>
 214      <member><function>scalarlesel</function> for <literal>&lt;=</literal></member>
 215      <member><function>scalargtsel</function> for <literal>&gt;</literal></member>
 216      <member><function>scalargesel</function> for <literal>&gt;=</literal></member>
 217     </simplelist>
 218    </para>
 219
 220    <para>
 221     You can frequently get away with using either <function>eqsel</function> or <function>neqsel</function> for
 222     operators that have very high or very low selectivity, even if they
 223     aren't really equality or inequality.  For example, the
 224     approximate-equality geometric operators use <function>eqsel</function> on the assumption that
 225     they'll usually only match a small fraction of the entries in a table.
 226    </para>
 227
 228    <para>
 229     You can use <function>scalarltsel</function>, <function>scalarlesel</function>,
 230     <function>scalargtsel</function> and <function>scalargesel</function> for comparisons on
 231     data types that have some sensible means of being converted into numeric
 232     scalars for range comparisons.  If possible, add the data type to those
 233     understood by the function <function>convert_to_scalar()</function> in
 234     <filename>src/backend/utils/adt/selfuncs.c</filename>.
 235     (Eventually, this function should be replaced by per-data-type functions
 236     identified through a column of the <classname>pg_type</classname> system catalog; but that hasn't happened
 237     yet.)  If you do not do this, things will still work, but the optimizer's
 238     estimates won't be as good as they could be.
 239    </para>
 240
 241    <para>
 242     Another useful built-in selectivity estimation function
 243     is <function>matchingsel</function>, which will work for almost any
 244     binary operator, if standard MCV and/or histogram statistics are
 245     collected for the input data type(s).  Its default estimate is set to
 246     twice the default estimate used in <function>eqsel</function>, making
 247     it most suitable for comparison operators that are somewhat less
 248     strict than equality.  (Or you could call the
 249     underlying <function>generic_restriction_selectivity</function>
 250     function, providing a different default estimate.)
 251    </para>
 252
 253    <para>
 254     There are additional selectivity estimation functions designed for geometric
 255     operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>,
 256     and <function>contsel</function>.  At this writing these are just stubs, but you might want
 257     to use them (or even better, improve them) anyway.
 258    </para>
 259    </sect2>
 260
 261    <sect2 id="xoper-join">
 262     <title><literal>JOIN</literal></title>
 263
 264     <para>
 265      The <literal>JOIN</literal> clause, if provided, names a join selectivity
 266      estimation function for the operator.  (Note that this is a function
 267      name, not an operator name.)  <literal>JOIN</literal> clauses only make sense for
 268      binary operators that return <type>boolean</type>.  The idea behind a join
 269      selectivity estimator is to guess what fraction of the rows in a
 270      pair of tables will satisfy a <literal>WHERE</literal>-clause condition of the form:
 271 <programlisting>
 272 table1.column1 OP table2.column2
 273 </programlisting>
 274      for the current operator.  As with the <literal>RESTRICT</literal> clause, this helps
 275      the optimizer very substantially by letting it figure out which
 276      of several possible join sequences is likely to take the least work.
 277     </para>
 278
 279     <para>
 280      As before, this chapter will make no attempt to explain how to write
 281      a join selectivity estimator function, but will just suggest that
 282      you use one of the standard estimators if one is applicable:
 283      <simplelist>
 284       <member><function>eqjoinsel</function> for <literal>=</literal></member>
 285       <member><function>neqjoinsel</function> for <literal>&lt;&gt;</literal></member>
 286       <member><function>scalarltjoinsel</function> for <literal>&lt;</literal></member>
 287       <member><function>scalarlejoinsel</function> for <literal>&lt;=</literal></member>
 288       <member><function>scalargtjoinsel</function> for <literal>&gt;</literal></member>
 289       <member><function>scalargejoinsel</function> for <literal>&gt;=</literal></member>
 290       <member><function>matchingjoinsel</function> for generic matching operators</member>
 291       <member><function>areajoinsel</function> for 2D area-based comparisons</member>
 292       <member><function>positionjoinsel</function> for 2D position-based comparisons</member>
 293       <member><function>contjoinsel</function> for 2D containment-based comparisons</member>
 294      </simplelist>
 295     </para>
 296    </sect2>
 297
 298    <sect2 id="xoper-hashes">
 299     <title><literal>HASHES</literal></title>
 300
 301     <para>
 302      The <literal>HASHES</literal> clause, if present, tells the system that
 303      it is permissible to use the hash join method for a join based on this
 304      operator.  <literal>HASHES</literal> only makes sense for a binary operator that
 305      returns <literal>boolean</literal>, and in practice the operator must represent
 306      equality for some data type or pair of data types.
 307     </para>
 308
 309     <para>
 310      The assumption underlying hash join is that the join operator can
 311      only return true for pairs of left and right values that hash to the
 312      same hash code.  If two values get put in different hash buckets, the
 313      join will never compare them at all, implicitly assuming that the
 314      result of the join operator must be false.  So it never makes sense
 315      to specify <literal>HASHES</literal> for operators that do not represent
 316      some form of equality.  In most cases it is only practical to support
 317      hashing for operators that take the same data type on both sides.
 318      However, sometimes it is possible to design compatible hash functions
 319      for two or more data types; that is, functions that will generate the
 320      same hash codes for <quote>equal</quote> values, even though the values
 321      have different representations.  For example, it's fairly simple
 322      to arrange this property when hashing integers of different widths.
 323     </para>
 324
 325     <para>
 326      To be marked <literal>HASHES</literal>, the join operator must appear
 327      in a hash index operator family.  This is not enforced when you create
 328      the operator, since of course the referencing operator family couldn't
 329      exist yet.  But attempts to use the operator in hash joins will fail
 330      at run time if no such operator family exists.  The system needs the
 331      operator family to find the data-type-specific hash function(s) for the
 332      operator's input data type(s).  Of course, you must also create suitable
 333      hash functions before you can create the operator family.
 334     </para>
 335
 336     <para>
 337      Care should be exercised when preparing a hash function, because there
 338      are machine-dependent ways in which it might fail to do the right thing.
 339      For example, if your data type is a structure in which there might be
 340      uninteresting pad bits, you cannot simply pass the whole structure to
 341      <function>hash_any</function>.  (Unless you write your other operators and
 342      functions to ensure that the unused bits are always zero, which is the
 343      recommended strategy.)
 344      Another example is that on machines that meet the <acronym>IEEE</acronym>
 345      floating-point standard, negative zero and positive zero are different
 346      values (different bit patterns) but they are defined to compare equal.
 347      If a float value might contain negative zero then extra steps are needed
 348      to ensure it generates the same hash value as positive zero.
 349     </para>
 350
 351     <para>
 352      A hash-joinable operator must have a commutator (itself if the two
 353      operand data types are the same, or a related equality operator
 354      if they are different) that appears in the same operator family.
 355      If this is not the case, planner errors might occur when the operator
 356      is used.  Also, it is a good idea (but not strictly required) for
 357      a hash operator family that supports multiple data types to provide
 358      equality operators for every combination of the data types; this
 359      allows better optimization.
 360     </para>
 361
 362     <note>
 363     <para>
 364      The function underlying a hash-joinable operator must be marked
 365      immutable or stable.  If it is volatile, the system will never
 366      attempt to use the operator for a hash join.
 367     </para>
 368     </note>
 369
 370     <note>
 371     <para>
 372      If a hash-joinable operator has an underlying function that is marked
 373      strict, the
 374      function must also be complete: that is, it should return true or
 375      false, never null, for any two nonnull inputs.  If this rule is
 376      not followed, hash-optimization of <literal>IN</literal> operations might
 377      generate wrong results.  (Specifically, <literal>IN</literal> might return
 378      false where the correct answer according to the standard would be null;
 379      or it might yield an error complaining that it wasn't prepared for a
 380      null result.)
 381     </para>
 382     </note>
 383
 384    </sect2>
 385
 386    <sect2 id="xoper-merges">
 387     <title><literal>MERGES</literal></title>
 388
 389     <para>
 390      The <literal>MERGES</literal> clause, if present, tells the system that
 391      it is permissible to use the merge-join method for a join based on this
 392      operator.  <literal>MERGES</literal> only makes sense for a binary operator that
 393      returns <literal>boolean</literal>, and in practice the operator must represent
 394      equality for some data type or pair of data types.
 395     </para>
 396
 397     <para>
 398      Merge join is based on the idea of sorting the left- and right-hand tables
 399      into order and then scanning them in parallel.  So, both data types must
 400      be capable of being fully ordered, and the join operator must be one
 401      that can only succeed for pairs of values that fall at the
 402      <quote>same place</quote>
 403      in the sort order.  In practice this means that the join operator must
 404      behave like equality.  But it is possible to merge-join two
 405      distinct data types so long as they are logically compatible.  For
 406      example, the <type>smallint</type>-versus-<type>integer</type>
 407      equality operator is merge-joinable.
 408      We only need sorting operators that will bring both data types into a
 409      logically compatible sequence.
 410     </para>
 411
 412     <para>
 413      To be marked <literal>MERGES</literal>, the join operator must appear
 414      as an equality member of a <literal>btree</literal> index operator family.
 415      This is not enforced when you create
 416      the operator, since of course the referencing operator family couldn't
 417      exist yet.  But the operator will not actually be used for merge joins
 418      unless a matching operator family can be found.  The
 419      <literal>MERGES</literal> flag thus acts as a hint to the planner that
 420      it's worth looking for a matching operator family.
 421     </para>
 422
 423     <para>
 424      A merge-joinable operator must have a commutator (itself if the two
 425      operand data types are the same, or a related equality operator
 426      if they are different) that appears in the same operator family.
 427      If this is not the case, planner errors might occur when the operator
 428      is used.  Also, it is a good idea (but not strictly required) for
 429      a <literal>btree</literal> operator family that supports multiple data types to provide
 430      equality operators for every combination of the data types; this
 431      allows better optimization.
 432     </para>
 433
 434     <note>
 435     <para>
 436      The function underlying a merge-joinable operator must be marked
 437      immutable or stable.  If it is volatile, the system will never
 438      attempt to use the operator for a merge join.
 439     </para>
 440     </note>
 441    </sect2>
 442   </sect1>