doc/src/sgml/tablesample-method.sgml

   1 <!-- doc/src/sgml/tablesample-method.sgml -->
   2
   3 <chapter id="tablesample-method">
   4  <title>Writing a Table Sampling Method</title>
   5
   6  <indexterm zone="tablesample-method">
   7   <primary>table sampling method</primary>
   8  </indexterm>
   9
  10  <indexterm zone="tablesample-method">
  11   <primary><literal>TABLESAMPLE</literal> method</primary>
  12  </indexterm>
  13
  14  <para>
  15   <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal>
  16   clause supports custom table sampling methods, in addition to
  17   the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required
  18   by the SQL standard.  The sampling method determines which rows of the
  19   table will be selected when the <literal>TABLESAMPLE</literal> clause is used.
  20  </para>
  21
  22  <para>
  23   At the SQL level, a table sampling method is represented by a single SQL
  24   function, typically implemented in C, having the signature
  25 <programlisting>
  26 method_name(internal) RETURNS tsm_handler
  27 </programlisting>
  28   The name of the function is the same method name appearing in the
  29   <literal>TABLESAMPLE</literal> clause.  The <type>internal</type> argument is a dummy
  30   (always having value zero) that simply serves to prevent this function from
  31   being called directly from an SQL command.
  32   The result of the function must be a palloc'd struct of
  33   type <type>TsmRoutine</type>, which contains pointers to support functions for
  34   the sampling method.  These support functions are plain C functions and
  35   are not visible or callable at the SQL level.  The support functions are
  36   described in <xref linkend="tablesample-support-functions"/>.
  37  </para>
  38
  39  <para>
  40   In addition to function pointers, the <type>TsmRoutine</type> struct must
  41   provide these additional fields:
  42  </para>
  43
  44  <variablelist>
  45   <varlistentry>
  46    <term><literal>List *parameterTypes</literal></term>
  47    <listitem>
  48     <para>
  49      This is an OID list containing the data type OIDs of the parameter(s)
  50      that will be accepted by the <literal>TABLESAMPLE</literal> clause when this
  51      sampling method is used.  For example, for the built-in methods, this
  52      list contains a single item with value <literal>FLOAT4OID</literal>, which
  53      represents the sampling percentage.  Custom sampling methods can have
  54      more or different parameters.
  55     </para>
  56    </listitem>
  57   </varlistentry>
  58
  59   <varlistentry>
  60    <term><literal>bool repeatable_across_queries</literal></term>
  61    <listitem>
  62     <para>
  63      If <literal>true</literal>, the sampling method can deliver identical samples
  64      across successive queries, if the same parameters
  65      and <literal>REPEATABLE</literal> seed value are supplied each time and the
  66      table contents have not changed.  When this is <literal>false</literal>,
  67      the <literal>REPEATABLE</literal> clause is not accepted for use with the
  68      sampling method.
  69     </para>
  70    </listitem>
  71   </varlistentry>
  72
  73   <varlistentry>
  74    <term><literal>bool repeatable_across_scans</literal></term>
  75    <listitem>
  76     <para>
  77      If <literal>true</literal>, the sampling method can deliver identical samples
  78      across successive scans in the same query (assuming unchanging
  79      parameters, seed value, and snapshot).
  80      When this is <literal>false</literal>, the planner will not select plans that
  81      would require scanning the sampled table more than once, since that
  82      might result in inconsistent query output.
  83     </para>
  84    </listitem>
  85   </varlistentry>
  86  </variablelist>
  87
  88  <para>
  89   The <type>TsmRoutine</type> struct type is declared
  90   in <filename>src/include/access/tsmapi.h</filename>, which see for additional
  91   details.
  92  </para>
  93
  94  <para>
  95   The table sampling methods included in the standard distribution are good
  96   references when trying to write your own.  Look into
  97   the <filename>src/backend/access/tablesample</filename> subdirectory of the source
  98   tree for the built-in sampling methods, and into the <filename>contrib</filename>
  99   subdirectory for add-on methods.
 100  </para>
 101
 102  <sect1 id="tablesample-support-functions">
 103   <title>Sampling Method Support Functions</title>
 104
 105   <para>
 106    The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct
 107    containing pointers to the support functions described below.  Most of
 108    the functions are required, but some are optional, and those pointers can
 109    be NULL.
 110   </para>
 111
 112   <para>
 113 <programlisting>
 114 void
 115 SampleScanGetSampleSize (PlannerInfo *root,
 116                          RelOptInfo *baserel,
 117                          List *paramexprs,
 118                          BlockNumber *pages,
 119                          double *tuples);
 120 </programlisting>
 121
 122    This function is called during planning.  It must estimate the number of
 123    relation pages that will be read during a sample scan, and the number of
 124    tuples that will be selected by the scan.  (For example, these might be
 125    determined by estimating the sampling fraction, and then multiplying
 126    the <literal>baserel-&gt;pages</literal> and <literal>baserel-&gt;tuples</literal>
 127    numbers by that, being sure to round the results to integral values.)
 128    The <literal>paramexprs</literal> list holds the expression(s) that are
 129    parameters to the <literal>TABLESAMPLE</literal> clause.  It is recommended to
 130    use <function>estimate_expression_value()</function> to try to reduce these
 131    expressions to constants, if their values are needed for estimation
 132    purposes; but the function must provide size estimates even if they cannot
 133    be reduced, and it should not fail even if the values appear invalid
 134    (remember that they're only estimates of what the run-time values will be).
 135    The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs.
 136   </para>
 137
 138   <para>
 139 <programlisting>
 140 void
 141 InitSampleScan (SampleScanState *node,
 142                 int eflags);
 143 </programlisting>
 144
 145    Initialize for execution of a SampleScan plan node.
 146    This is called during executor startup.
 147    It should perform any initialization needed before processing can start.
 148    The <structname>SampleScanState</structname> node has already been created, but
 149    its <structfield>tsm_state</structfield> field is NULL.
 150    The <function>InitSampleScan</function> function can palloc whatever internal
 151    state data is needed by the sampling method, and store a pointer to
 152    it in <literal>node-&gt;tsm_state</literal>.
 153    Information about the table to scan is accessible through other fields
 154    of the <structname>SampleScanState</structname> node (but note that the
 155    <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
 156    up yet).
 157    <literal>eflags</literal> contains flag bits describing the executor's
 158    operating mode for this plan node.
 159   </para>
 160
 161   <para>
 162    When <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</literal> is true,
 163    the scan will not actually be performed, so this function should only do
 164    the minimum required to make the node state valid for <command>EXPLAIN</command>
 165    and <function>EndSampleScan</function>.
 166   </para>
 167
 168   <para>
 169    This function can be omitted (set the pointer to NULL), in which case
 170    <function>BeginSampleScan</function> must perform all initialization needed
 171    by the sampling method.
 172   </para>
 173
 174   <para>
 175 <programlisting>
 176 void
 177 BeginSampleScan (SampleScanState *node,
 178                  Datum *params,
 179                  int nparams,
 180                  uint32 seed);
 181 </programlisting>
 182
 183    Begin execution of a sampling scan.
 184    This is called just before the first attempt to fetch a tuple, and
 185    may be called again if the scan needs to be restarted.
 186    Information about the table to scan is accessible through fields
 187    of the <structname>SampleScanState</structname> node (but note that the
 188    <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
 189    up yet).
 190    The <literal>params</literal> array, of length <literal>nparams</literal>, contains the
 191    values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause.
 192    These will have the number and types specified in the sampling
 193    method's <literal>parameterTypes</literal> list, and have been checked
 194    to not be null.
 195    <literal>seed</literal> contains a seed to use for any random numbers generated
 196    within the sampling method; it is either a hash derived from the
 197    <literal>REPEATABLE</literal> value if one was given, or the result
 198    of <literal>random()</literal> if not.
 199   </para>
 200
 201   <para>
 202    This function may adjust the fields <literal>node-&gt;use_bulkread</literal>
 203    and <literal>node-&gt;use_pagemode</literal>.
 204    If <literal>node-&gt;use_bulkread</literal> is <literal>true</literal>, which it is by
 205    default, the scan will use a buffer access strategy that encourages
 206    recycling buffers after use.  It might be reasonable to set this
 207    to <literal>false</literal> if the scan will visit only a small fraction of the
 208    table's pages.
 209    If <literal>node-&gt;use_pagemode</literal> is <literal>true</literal>, which it is by
 210    default, the scan will perform visibility checking in a single pass for
 211    all tuples on each visited page.  It might be reasonable to set this
 212    to <literal>false</literal> if the scan will select only a small fraction of the
 213    tuples on each visited page.  That will result in fewer tuple visibility
 214    checks being performed, though each one will be more expensive because it
 215    will require more locking.
 216   </para>
 217
 218   <para>
 219    If the sampling method is
 220    marked <literal>repeatable_across_scans</literal>, it must be able to
 221    select the same set of tuples during a rescan as it did originally, that is
 222    a fresh call of <function>BeginSampleScan</function> must lead to selecting the
 223    same tuples as before (if the <literal>TABLESAMPLE</literal> parameters
 224    and seed don't change).
 225   </para>
 226
 227   <para>
 228 <programlisting>
 229 BlockNumber
 230 NextSampleBlock (SampleScanState *node, BlockNumber nblocks);
 231 </programlisting>
 232
 233    Returns the block number of the next page to be scanned, or
 234    <literal>InvalidBlockNumber</literal> if no pages remain to be scanned.
 235   </para>
 236
 237   <para>
 238    This function can be omitted (set the pointer to NULL), in which case
 239    the core code will perform a sequential scan of the entire relation.
 240    Such a scan can use synchronized scanning, so that the sampling method
 241    cannot assume that the relation pages are visited in the same order on
 242    each scan.
 243   </para>
 244
 245   <para>
 246 <programlisting>
 247 OffsetNumber
 248 NextSampleTuple (SampleScanState *node,
 249                  BlockNumber blockno,
 250                  OffsetNumber maxoffset);
 251 </programlisting>
 252
 253    Returns the offset number of the next tuple to be sampled on the
 254    specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to
 255    be sampled.  <literal>maxoffset</literal> is the largest offset number in use
 256    on the page.
 257   </para>
 258
 259   <note>
 260    <para>
 261     <function>NextSampleTuple</function> is not explicitly told which of the offset
 262     numbers in the range <literal>1 .. maxoffset</literal> actually contain valid
 263     tuples.  This is not normally a problem since the core code ignores
 264     requests to sample missing or invisible tuples; that should not result in
 265     any bias in the sample.  However, if necessary, the function can use
 266     <literal>node-&gt;donetuples</literal> to examine how many of the tuples
 267     it returned were valid and visible.
 268    </para>
 269   </note>
 270
 271   <note>
 272    <para>
 273     <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume
 274     that <literal>blockno</literal> is the same page number returned by the most
 275     recent <function>NextSampleBlock</function> call.  It was returned by some
 276     previous <function>NextSampleBlock</function> call, but the core code is allowed
 277     to call <function>NextSampleBlock</function> in advance of actually scanning
 278     pages, so as to support prefetching.  It is OK to assume that once
 279     sampling of a given page begins, successive <function>NextSampleTuple</function>
 280     calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is
 281     returned.
 282    </para>
 283   </note>
 284
 285   <para>
 286 <programlisting>
 287 void
 288 EndSampleScan (SampleScanState *node);
 289 </programlisting>
 290
 291    End the scan and release resources.  It is normally not important
 292    to release palloc'd memory, but any externally-visible resources
 293    should be cleaned up.
 294    This function can be omitted (set the pointer to NULL) in the common
 295    case where no such resources exist.
 296   </para>
 297
 298  </sect1>
 299
 300 </chapter>