doc/src/sgml/xfunc.sgml

   1 <!-- doc/src/sgml/xfunc.sgml -->
   2
   3  <sect1 id="xfunc">
   4   <title>User-Defined Functions</title>
   5
   6   <indexterm zone="xfunc">
   7    <primary>function</primary>
   8    <secondary>user-defined</secondary>
   9   </indexterm>
  10
  11   <para>
  12    <productname>PostgreSQL</productname> provides four kinds of
  13    functions:
  14
  15    <itemizedlist>
  16     <listitem>
  17      <para>
  18       query language functions (functions written in
  19       <acronym>SQL</acronym>) (<xref linkend="xfunc-sql"/>)
  20      </para>
  21     </listitem>
  22     <listitem>
  23      <para>
  24       procedural language functions (functions written in, for
  25       example, <application>PL/pgSQL</application> or <application>PL/Tcl</application>)
  26       (<xref linkend="xfunc-pl"/>)
  27      </para>
  28     </listitem>
  29     <listitem>
  30      <para>
  31       internal functions (<xref linkend="xfunc-internal"/>)
  32      </para>
  33     </listitem>
  34     <listitem>
  35      <para>
  36       C-language functions (<xref linkend="xfunc-c"/>)
  37      </para>
  38     </listitem>
  39    </itemizedlist>
  40   </para>
  41
  42   <para>
  43    Every kind
  44    of  function  can take base types, composite types, or
  45    combinations of these as arguments (parameters). In addition,
  46    every kind of function can return a base type or
  47    a composite type.  Functions can also be defined to return
  48    sets of base or composite values.
  49   </para>
  50
  51   <para>
  52    Many kinds of functions can take or return certain pseudo-types
  53    (such as polymorphic types), but the available facilities vary.
  54    Consult the description of each kind of function for more details.
  55   </para>
  56
  57   <para>
  58    It's easiest to define <acronym>SQL</acronym>
  59    functions, so we'll start by discussing those.
  60    Most of the concepts presented for <acronym>SQL</acronym> functions
  61    will carry over to the other types of functions.
  62   </para>
  63
  64   <para>
  65    Throughout this chapter, it can be useful to look at the reference
  66    page of the <link linkend="sql-createfunction"><command>CREATE
  67    FUNCTION</command></link> command to
  68    understand the examples better.  Some examples from this chapter
  69    can be found in <filename>funcs.sql</filename> and
  70    <filename>funcs.c</filename> in the <filename>src/tutorial</filename>
  71    directory in the <productname>PostgreSQL</productname> source
  72    distribution.
  73   </para>
  74   </sect1>
  75
  76   <sect1 id="xproc">
  77    <title>User-Defined Procedures</title>
  78
  79   <indexterm zone="xproc">
  80    <primary>procedure</primary>
  81    <secondary>user-defined</secondary>
  82   </indexterm>
  83
  84    <para>
  85     A procedure is a database object similar to a function.
  86     The key differences are:
  87
  88     <itemizedlist>
  89      <listitem>
  90       <para>
  91        Procedures are defined with
  92        the <link linkend="sql-createprocedure"><command>CREATE
  93        PROCEDURE</command></link> command, not <command>CREATE
  94        FUNCTION</command>.
  95       </para>
  96      </listitem>
  97      <listitem>
  98       <para>
  99        Procedures do not return a function value; hence <command>CREATE
 100        PROCEDURE</command> lacks a <literal>RETURNS</literal> clause.
 101        However, procedures can instead return data to their callers via
 102        output parameters.
 103       </para>
 104      </listitem>
 105      <listitem>
 106       <para>
 107        While a function is called as part of a query or DML command, a
 108        procedure is called in isolation using
 109        the <link linkend="sql-call"><command>CALL</command></link> command.
 110       </para>
 111      </listitem>
 112      <listitem>
 113       <para>
 114        A procedure can commit or roll back transactions during its
 115        execution (then automatically beginning a new transaction), so long
 116        as the invoking <command>CALL</command> command is not part of an
 117        explicit transaction block.  A function cannot do that.
 118       </para>
 119      </listitem>
 120      <listitem>
 121       <para>
 122        Certain function attributes, such as strictness, don't apply to
 123        procedures.  Those attributes control how the function is
 124        used in a query, which isn't relevant to procedures.
 125       </para>
 126      </listitem>
 127     </itemizedlist>
 128    </para>
 129
 130    <para>
 131     The explanations in the following sections about how to define
 132     user-defined functions apply to procedures as well, except for the
 133     points made above.
 134    </para>
 135
 136    <para>
 137     Collectively, functions and procedures are also known
 138     as <firstterm>routines</firstterm><indexterm><primary>routine</primary></indexterm>.
 139     There are commands such as <link linkend="sql-alterroutine"><command>ALTER ROUTINE</command></link>
 140     and <link linkend="sql-droproutine"><command>DROP ROUTINE</command></link> that can operate on functions and
 141     procedures without having to know which kind it is.  Note, however, that
 142     there is no <literal>CREATE ROUTINE</literal> command.
 143    </para>
 144   </sect1>
 145
 146   <sect1 id="xfunc-sql">
 147    <title>Query Language (<acronym>SQL</acronym>) Functions</title>
 148
 149    <indexterm zone="xfunc-sql">
 150     <primary>function</primary>
 151     <secondary>user-defined</secondary>
 152     <tertiary>in SQL</tertiary>
 153    </indexterm>
 154
 155    <para>
 156     SQL functions execute an arbitrary list of SQL statements, returning
 157     the result of the last query in the list.
 158     In the simple (non-set)
 159     case, the first row of the last query's result will be returned.
 160     (Bear in mind that <quote>the first row</quote> of a multirow
 161     result is not well-defined unless you use <literal>ORDER BY</literal>.)
 162     If the last query happens
 163     to return no rows at all, the null value will be returned.
 164    </para>
 165
 166    <para>
 167     Alternatively, an SQL function can be declared to return a set (that is,
 168     multiple rows) by specifying the function's return type as <literal>SETOF
 169     <replaceable>sometype</replaceable></literal>, or equivalently by declaring it as
 170     <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>.  In this case
 171     all rows of the last query's result are returned.  Further details appear
 172     below.
 173    </para>
 174
 175    <para>
 176     The body of an SQL function must be a list of SQL
 177     statements separated by semicolons.  A semicolon after the last
 178     statement is optional.  Unless the function is declared to return
 179     <type>void</type>, the last statement must be a <command>SELECT</command>,
 180     or an <command>INSERT</command>, <command>UPDATE</command>,
 181     <command>DELETE</command>, or <command>MERGE</command>
 182     that has a <literal>RETURNING</literal> clause.
 183    </para>
 184
 185     <para>
 186      Any collection of commands in the  <acronym>SQL</acronym>
 187      language can be packaged together and defined as a function.
 188      Besides <command>SELECT</command> queries, the commands can include data
 189      modification queries (<command>INSERT</command>,
 190      <command>UPDATE</command>, <command>DELETE</command>, and
 191      <command>MERGE</command>), as well as
 192      other SQL commands. (You cannot use transaction control commands, e.g.,
 193      <command>COMMIT</command>, <command>SAVEPOINT</command>, and some utility
 194      commands, e.g.,  <literal>VACUUM</literal>, in <acronym>SQL</acronym> functions.)
 195      However, the final command
 196      must be a <command>SELECT</command> or have a <literal>RETURNING</literal>
 197      clause that returns whatever is
 198      specified as the function's return type.  Alternatively, if you
 199      want to define an SQL function that performs actions but has no
 200      useful value to return, you can define it as returning <type>void</type>.
 201      For example, this function removes rows with negative salaries from
 202      the <literal>emp</literal> table:
 203
 204 <screen>
 205 CREATE FUNCTION clean_emp() RETURNS void AS '
 206     DELETE FROM emp
 207         WHERE salary &lt; 0;
 208 ' LANGUAGE SQL;
 209
 210 SELECT clean_emp();
 211
 212  clean_emp
 213 -----------
 214
 215 (1 row)
 216 </screen>
 217     </para>
 218
 219     <para>
 220      You can also write this as a procedure, thus avoiding the issue of the
 221      return type.  For example:
 222 <screen>
 223 CREATE PROCEDURE clean_emp() AS '
 224     DELETE FROM emp
 225         WHERE salary &lt; 0;
 226 ' LANGUAGE SQL;
 227
 228 CALL clean_emp();
 229 </screen>
 230      In simple cases like this, the difference between a function returning
 231      <type>void</type> and a procedure is mostly stylistic.  However,
 232      procedures offer additional functionality such as transaction control
 233      that is not available in functions.  Also, procedures are SQL standard
 234      whereas returning <type>void</type> is a PostgreSQL extension.
 235     </para>
 236
 237     <note>
 238      <para>
 239       The entire body of an SQL function is parsed before any of it is
 240       executed.  While an SQL function can contain commands that alter
 241       the system catalogs (e.g., <command>CREATE TABLE</command>), the effects
 242       of such commands will not be visible during parse analysis of
 243       later commands in the function.  Thus, for example,
 244       <literal>CREATE TABLE foo (...); INSERT INTO foo VALUES(...);</literal>
 245       will not work as desired if packaged up into a single SQL function,
 246       since <structname>foo</structname> won't exist yet when the <command>INSERT</command>
 247       command is parsed.  It's recommended to use <application>PL/pgSQL</application>
 248       instead of an SQL function in this type of situation.
 249      </para>
 250    </note>
 251
 252    <para>
 253     The syntax of the <command>CREATE FUNCTION</command> command requires
 254     the function body to be written as a string constant.  It is usually
 255     most convenient to use dollar quoting (see <xref
 256     linkend="sql-syntax-dollar-quoting"/>) for the string constant.
 257     If you choose to use regular single-quoted string constant syntax,
 258     you must double single quote marks (<literal>'</literal>) and backslashes
 259     (<literal>\</literal>) (assuming escape string syntax) in the body of
 260     the function (see <xref linkend="sql-syntax-strings"/>).
 261    </para>
 262
 263    <sect2 id="xfunc-sql-function-arguments">
 264     <title>Arguments for <acronym>SQL</acronym> Functions</title>
 265
 266    <indexterm>
 267     <primary>function</primary>
 268     <secondary>named argument</secondary>
 269    </indexterm>
 270
 271     <para>
 272      Arguments of an SQL function can be referenced in the function
 273      body using either names or numbers.  Examples of both methods appear
 274      below.
 275     </para>
 276
 277     <para>
 278      To use a name, declare the function argument as having a name, and
 279      then just write that name in the function body.  If the argument name
 280      is the same as any column name in the current SQL command within the
 281      function, the column name will take precedence.  To override this,
 282      qualify the argument name with the name of the function itself, that is
 283      <literal><replaceable>function_name</replaceable>.<replaceable>argument_name</replaceable></literal>.
 284      (If this would conflict with a qualified column name, again the column
 285      name wins.  You can avoid the ambiguity by choosing a different alias for
 286      the table within the SQL command.)
 287     </para>
 288
 289     <para>
 290      In the older numeric approach, arguments are referenced using the syntax
 291      <literal>$<replaceable>n</replaceable></literal>: <literal>$1</literal> refers to the first input
 292      argument, <literal>$2</literal> to the second, and so on.  This will work
 293      whether or not the particular argument was declared with a name.
 294     </para>
 295
 296     <para>
 297      If an argument is of a composite type, then the dot notation,
 298      e.g., <literal><replaceable>argname</replaceable>.<replaceable>fieldname</replaceable></literal> or
 299      <literal>$1.<replaceable>fieldname</replaceable></literal>, can be used to access attributes of the
 300      argument.  Again, you might need to qualify the argument's name with the
 301      function name to make the form with an argument name unambiguous.
 302     </para>
 303
 304     <para>
 305      SQL function arguments can only be used as data values,
 306      not as identifiers.  Thus for example this is reasonable:
 307 <programlisting>
 308 INSERT INTO mytable VALUES ($1);
 309 </programlisting>
 310 but this will not work:
 311 <programlisting>
 312 INSERT INTO $1 VALUES (42);
 313 </programlisting>
 314     </para>
 315
 316     <note>
 317      <para>
 318       The ability to use names to reference SQL function arguments was added
 319       in <productname>PostgreSQL</productname> 9.2.  Functions to be used in
 320       older servers must use the <literal>$<replaceable>n</replaceable></literal> notation.
 321      </para>
 322     </note>
 323    </sect2>
 324
 325    <sect2 id="xfunc-sql-base-functions">
 326     <title><acronym>SQL</acronym> Functions on Base Types</title>
 327
 328     <para>
 329      The simplest possible <acronym>SQL</acronym> function has no arguments and
 330      simply returns a base type, such as <type>integer</type>:
 331
 332 <screen>
 333 CREATE FUNCTION one() RETURNS integer AS $$
 334     SELECT 1 AS result;
 335 $$ LANGUAGE SQL;
 336
 337 -- Alternative syntax for string literal:
 338 CREATE FUNCTION one() RETURNS integer AS '
 339     SELECT 1 AS result;
 340 ' LANGUAGE SQL;
 341
 342 SELECT one();
 343
 344  one
 345 -----
 346    1
 347 </screen>
 348     </para>
 349
 350     <para>
 351      Notice that we defined a column alias within the function body for the result of the function
 352      (with  the  name <literal>result</literal>),  but this column alias is not visible
 353      outside the function.  Hence,  the  result  is labeled <literal>one</literal>
 354      instead of <literal>result</literal>.
 355     </para>
 356
 357     <para>
 358      It is almost as easy to define <acronym>SQL</acronym> functions
 359      that take base types as arguments:
 360
 361 <screen>
 362 CREATE FUNCTION add_em(x integer, y integer) RETURNS integer AS $$
 363     SELECT x + y;
 364 $$ LANGUAGE SQL;
 365
 366 SELECT add_em(1, 2) AS answer;
 367
 368  answer
 369 --------
 370       3
 371 </screen>
 372     </para>
 373
 374     <para>
 375      Alternatively, we could dispense with names for the arguments and
 376      use numbers:
 377
 378 <screen>
 379 CREATE FUNCTION add_em(integer, integer) RETURNS integer AS $$
 380     SELECT $1 + $2;
 381 $$ LANGUAGE SQL;
 382
 383 SELECT add_em(1, 2) AS answer;
 384
 385  answer
 386 --------
 387       3
 388 </screen>
 389     </para>
 390
 391     <para>
 392      Here is a more useful function, which might be used to debit a
 393      bank account:
 394
 395 <programlisting>
 396 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
 397     UPDATE bank
 398         SET balance = balance - debit
 399         WHERE accountno = tf1.accountno;
 400     SELECT 1;
 401 $$ LANGUAGE SQL;
 402 </programlisting>
 403
 404      A user could execute this function to debit account 17 by $100.00 as
 405      follows:
 406
 407 <programlisting>
 408 SELECT tf1(17, 100.0);
 409 </programlisting>
 410     </para>
 411
 412     <para>
 413      In this example, we chose the name <literal>accountno</literal> for the first
 414      argument, but this is the same as the name of a column in the
 415      <literal>bank</literal> table.  Within the <command>UPDATE</command> command,
 416      <literal>accountno</literal> refers to the column <literal>bank.accountno</literal>,
 417      so <literal>tf1.accountno</literal> must be used to refer to the argument.
 418      We could of course avoid this by using a different name for the argument.
 419     </para>
 420
 421     <para>
 422      In practice one would probably like a more useful result from the
 423      function than a constant 1, so a more likely definition
 424      is:
 425
 426 <programlisting>
 427 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
 428     UPDATE bank
 429         SET balance = balance - debit
 430         WHERE accountno = tf1.accountno;
 431     SELECT balance FROM bank WHERE accountno = tf1.accountno;
 432 $$ LANGUAGE SQL;
 433 </programlisting>
 434
 435      which adjusts the balance and returns the new balance.
 436      The same thing could be done in one command using <literal>RETURNING</literal>:
 437
 438 <programlisting>
 439 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
 440     UPDATE bank
 441         SET balance = balance - debit
 442         WHERE accountno = tf1.accountno
 443     RETURNING balance;
 444 $$ LANGUAGE SQL;
 445 </programlisting>
 446     </para>
 447
 448     <para>
 449      If the final <literal>SELECT</literal> or <literal>RETURNING</literal>
 450      clause in an <acronym>SQL</acronym> function does not return exactly
 451      the function's declared result
 452      type, <productname>PostgreSQL</productname> will automatically cast
 453      the value to the required type, if that is possible with an implicit
 454      or assignment cast.  Otherwise, you must write an explicit cast.
 455      For example, suppose we wanted the
 456      previous <function>add_em</function> function to return
 457      type <type>float8</type> instead.  It's sufficient to write
 458
 459 <programlisting>
 460 CREATE FUNCTION add_em(integer, integer) RETURNS float8 AS $$
 461     SELECT $1 + $2;
 462 $$ LANGUAGE SQL;
 463 </programlisting>
 464
 465      since the <type>integer</type> sum can be implicitly cast
 466      to <type>float8</type>.
 467      (See <xref linkend="typeconv"/> or <xref linkend="sql-createcast"/>
 468      for more about casts.)
 469     </para>
 470    </sect2>
 471
 472    <sect2 id="xfunc-sql-composite-functions">
 473     <title><acronym>SQL</acronym> Functions on Composite Types</title>
 474
 475     <para>
 476      When writing functions with arguments of composite types, we must not
 477      only specify which argument we want but also the desired attribute
 478      (field) of that argument.  For example, suppose that
 479      <type>emp</type> is a table containing employee data, and therefore
 480      also the name of the composite type of each row of the table.  Here
 481      is a function <function>double_salary</function> that computes what someone's
 482      salary would be if it were doubled:
 483
 484 <screen>
 485 CREATE TABLE emp (
 486     name        text,
 487     salary      numeric,
 488     age         integer,
 489     cubicle     point
 490 );
 491
 492 INSERT INTO emp VALUES ('Bill', 4200, 45, '(2,1)');
 493
 494 CREATE FUNCTION double_salary(emp) RETURNS numeric AS $$
 495     SELECT $1.salary * 2 AS salary;
 496 $$ LANGUAGE SQL;
 497
 498 SELECT name, double_salary(emp.*) AS dream
 499     FROM emp
 500     WHERE emp.cubicle ~= point '(2,1)';
 501
 502  name | dream
 503 ------+-------
 504  Bill |  8400
 505 </screen>
 506     </para>
 507
 508     <para>
 509      Notice the use of the syntax <literal>$1.salary</literal>
 510      to select one field of the argument row value.  Also notice
 511      how the calling <command>SELECT</command> command
 512      uses <replaceable>table_name</replaceable><literal>.*</literal> to select
 513      the entire current row of a table as a composite value.  The table
 514      row can alternatively be referenced using just the table name,
 515      like this:
 516 <screen>
 517 SELECT name, double_salary(emp) AS dream
 518     FROM emp
 519     WHERE emp.cubicle ~= point '(2,1)';
 520 </screen>
 521      but this usage is deprecated since it's easy to get confused.
 522      (See <xref linkend="rowtypes-usage"/> for details about these
 523      two notations for the composite value of a table row.)
 524     </para>
 525
 526     <para>
 527      Sometimes it is handy to construct a composite argument value
 528      on-the-fly.  This can be done with the <literal>ROW</literal> construct.
 529      For example, we could adjust the data being passed to the function:
 530 <screen>
 531 SELECT name, double_salary(ROW(name, salary*1.1, age, cubicle)) AS dream
 532     FROM emp;
 533 </screen>
 534     </para>
 535
 536     <para>
 537      It is also possible to build a function that returns a composite type.
 538      This is an example of a function
 539      that returns a single <type>emp</type> row:
 540
 541 <programlisting>
 542 CREATE FUNCTION new_emp() RETURNS emp AS $$
 543     SELECT text 'None' AS name,
 544         1000.0 AS salary,
 545         25 AS age,
 546         point '(2,2)' AS cubicle;
 547 $$ LANGUAGE SQL;
 548 </programlisting>
 549
 550      In this example we have specified each of  the  attributes
 551      with  a  constant value, but any computation
 552      could have been substituted for these constants.
 553     </para>
 554
 555     <para>
 556      Note two important things about defining the function:
 557
 558      <itemizedlist>
 559       <listitem>
 560        <para>
 561         The select list order in the query must be exactly the same as
 562         that in which the columns appear in the composite type.
 563         (Naming the columns, as we did above,
 564         is irrelevant to the system.)
 565        </para>
 566       </listitem>
 567       <listitem>
 568        <para>
 569         We must ensure each expression's type can be cast to that of
 570         the corresponding column of the composite type.
 571         Otherwise we'll get errors like this:
 572 <screen>
 573 <computeroutput>
 574 ERROR:  return type mismatch in function declared to return emp
 575 DETAIL:  Final statement returns text instead of point at column 4.
 576 </computeroutput>
 577 </screen>
 578         As with the base-type case, the system will not insert explicit
 579         casts automatically, only implicit or assignment casts.
 580        </para>
 581       </listitem>
 582      </itemizedlist>
 583     </para>
 584
 585     <para>
 586      A different way to define the same function is:
 587
 588 <programlisting>
 589 CREATE FUNCTION new_emp() RETURNS emp AS $$
 590     SELECT ROW('None', 1000.0, 25, '(2,2)')::emp;
 591 $$ LANGUAGE SQL;
 592 </programlisting>
 593
 594      Here we wrote a <command>SELECT</command> that returns just a single
 595      column of the correct composite type.  This isn't really better
 596      in this situation, but it is a handy alternative in some cases
 597      &mdash; for example, if we need to compute the result by calling
 598      another function that returns the desired composite value.
 599      Another example is that if we are trying to write a function that
 600      returns a domain over composite, rather than a plain composite type,
 601      it is always necessary to write it as returning a single column,
 602      since there is no way to cause a coercion of the whole row result.
 603     </para>
 604
 605     <para>
 606      We could call this function directly either by using it in
 607      a value expression:
 608
 609 <screen>
 610 SELECT new_emp();
 611
 612          new_emp
 613 --------------------------
 614  (None,1000.0,25,"(2,2)")
 615 </screen>
 616
 617      or by calling it as a table function:
 618
 619 <screen>
 620 SELECT * FROM new_emp();
 621
 622  name | salary | age | cubicle
 623 ------+--------+-----+---------
 624  None | 1000.0 |  25 | (2,2)
 625 </screen>
 626
 627      The second way is described more fully in <xref
 628      linkend="xfunc-sql-table-functions"/>.
 629     </para>
 630
 631     <para>
 632      When you use a function that returns a composite type,
 633      you might want only one field (attribute) from its result.
 634      You can do that with syntax like this:
 635
 636 <screen>
 637 SELECT (new_emp()).name;
 638
 639  name
 640 ------
 641  None
 642 </screen>
 643
 644      The extra parentheses are needed to keep the parser from getting
 645      confused.  If you try to do it without them, you get something like this:
 646
 647 <screen>
 648 SELECT new_emp().name;
 649 ERROR:  syntax error at or near "."
 650 LINE 1: SELECT new_emp().name;
 651                         ^
 652 </screen>
 653     </para>
 654
 655     <para>
 656      Another option is to use functional notation for extracting an attribute:
 657
 658 <screen>
 659 SELECT name(new_emp());
 660
 661  name
 662 ------
 663  None
 664 </screen>
 665
 666      As explained in <xref linkend="rowtypes-usage"/>, the field notation and
 667      functional notation are equivalent.
 668     </para>
 669
 670     <para>
 671      Another way to use a function returning a composite type is to pass the
 672      result to another function that accepts the correct row type as input:
 673
 674 <screen>
 675 CREATE FUNCTION getname(emp) RETURNS text AS $$
 676     SELECT $1.name;
 677 $$ LANGUAGE SQL;
 678
 679 SELECT getname(new_emp());
 680  getname
 681 ---------
 682  None
 683 (1 row)
 684 </screen>
 685     </para>
 686    </sect2>
 687
 688    <sect2 id="xfunc-output-parameters">
 689     <title><acronym>SQL</acronym> Functions with Output Parameters</title>
 690
 691    <indexterm>
 692     <primary>function</primary>
 693     <secondary>output parameter</secondary>
 694    </indexterm>
 695
 696     <para>
 697      An alternative way of describing a function's results is to define it
 698      with <firstterm>output parameters</firstterm>, as in this example:
 699
 700 <screen>
 701 CREATE FUNCTION add_em (IN x int, IN y int, OUT sum int)
 702 AS 'SELECT x + y'
 703 LANGUAGE SQL;
 704
 705 SELECT add_em(3,7);
 706  add_em
 707 --------
 708      10
 709 (1 row)
 710 </screen>
 711
 712      This is not essentially different from the version of <literal>add_em</literal>
 713      shown in <xref linkend="xfunc-sql-base-functions"/>.  The real value of
 714      output parameters is that they provide a convenient way of defining
 715      functions that return several columns.  For example,
 716
 717 <screen>
 718 CREATE FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int)
 719 AS 'SELECT x + y, x * y'
 720 LANGUAGE SQL;
 721
 722  SELECT * FROM sum_n_product(11,42);
 723  sum | product
 724 -----+---------
 725   53 |     462
 726 (1 row)
 727 </screen>
 728
 729      What has essentially happened here is that we have created an anonymous
 730      composite type for the result of the function.  The above example has
 731      the same end result as
 732
 733 <screen>
 734 CREATE TYPE sum_prod AS (sum int, product int);
 735
 736 CREATE FUNCTION sum_n_product (int, int) RETURNS sum_prod
 737 AS 'SELECT $1 + $2, $1 * $2'
 738 LANGUAGE SQL;
 739 </screen>
 740
 741      but not having to bother with the separate composite type definition
 742      is often handy.  Notice that the names attached to the output parameters
 743      are not just decoration, but determine the column names of the anonymous
 744      composite type.  (If you omit a name for an output parameter, the
 745      system will choose a name on its own.)
 746     </para>
 747
 748     <para>
 749      Notice that output parameters are not included in the calling argument
 750      list when invoking such a function from SQL.  This is because
 751      <productname>PostgreSQL</productname> considers only the input
 752      parameters to define the function's calling signature.  That means
 753      also that only the input parameters matter when referencing the function
 754      for purposes such as dropping it.  We could drop the above function
 755      with either of
 756
 757 <screen>
 758 DROP FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int);
 759 DROP FUNCTION sum_n_product (int, int);
 760 </screen>
 761     </para>
 762
 763     <para>
 764      Parameters can be marked as <literal>IN</literal> (the default),
 765      <literal>OUT</literal>, <literal>INOUT</literal>, or <literal>VARIADIC</literal>.
 766      An <literal>INOUT</literal>
 767      parameter serves as both an input parameter (part of the calling
 768      argument list) and an output parameter (part of the result record type).
 769      <literal>VARIADIC</literal> parameters are input parameters, but are treated
 770      specially as described below.
 771     </para>
 772    </sect2>
 773
 774    <sect2 id="xfunc-output-parameters-proc">
 775     <title><acronym>SQL</acronym> Procedures with Output Parameters</title>
 776
 777     <indexterm>
 778      <primary>procedures</primary>
 779      <secondary>output parameter</secondary>
 780     </indexterm>
 781
 782     <para>
 783      Output parameters are also supported in procedures, but they work a bit
 784      differently from functions.  In <command>CALL</command> commands,
 785      output parameters must be included in the argument list.
 786      For example, the bank account debiting routine from earlier could be
 787      written like this:
 788 <programlisting>
 789 CREATE PROCEDURE tp1 (accountno integer, debit numeric, OUT new_balance numeric) AS $$
 790     UPDATE bank
 791         SET balance = balance - debit
 792         WHERE accountno = tp1.accountno
 793     RETURNING balance;
 794 $$ LANGUAGE SQL;
 795 </programlisting>
 796      To call this procedure, an argument matching the <literal>OUT</literal>
 797      parameter must be included.  It's customary to write
 798      <literal>NULL</literal>:
 799 <programlisting>
 800 CALL tp1(17, 100.0, NULL);
 801 </programlisting>
 802      If you write something else, it must be an expression that is implicitly
 803      coercible to the declared type of the parameter, just as for input
 804      parameters.  Note however that such an expression will not be evaluated.
 805     </para>
 806
 807     <para>
 808      When calling a procedure from <application>PL/pgSQL</application>,
 809      instead of writing <literal>NULL</literal> you must write a variable
 810      that will receive the procedure's output.  See <xref
 811      linkend="plpgsql-statements-calling-procedure"/> for details.
 812     </para>
 813    </sect2>
 814
 815    <sect2 id="xfunc-sql-variadic-functions">
 816     <title><acronym>SQL</acronym> Functions with Variable Numbers of Arguments</title>
 817
 818     <indexterm>
 819      <primary>function</primary>
 820      <secondary>variadic</secondary>
 821     </indexterm>
 822
 823     <indexterm>
 824      <primary>variadic function</primary>
 825     </indexterm>
 826
 827     <para>
 828      <acronym>SQL</acronym> functions can be declared to accept
 829      variable numbers of arguments, so long as all the <quote>optional</quote>
 830      arguments are of the same data type.  The optional arguments will be
 831      passed to the function as an array.  The function is declared by
 832      marking the last parameter as <literal>VARIADIC</literal>; this parameter
 833      must be declared as being of an array type.  For example:
 834
 835 <screen>
 836 CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
 837     SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
 838 $$ LANGUAGE SQL;
 839
 840 SELECT mleast(10, -1, 5, 4.4);
 841  mleast
 842 --------
 843      -1
 844 (1 row)
 845 </screen>
 846
 847      Effectively, all the actual arguments at or beyond the
 848      <literal>VARIADIC</literal> position are gathered up into a one-dimensional
 849      array, as if you had written
 850
 851 <screen>
 852 SELECT mleast(ARRAY[10, -1, 5, 4.4]);    -- doesn't work
 853 </screen>
 854
 855      You can't actually write that, though &mdash; or at least, it will
 856      not match this function definition.  A parameter marked
 857      <literal>VARIADIC</literal> matches one or more occurrences of its element
 858      type, not of its own type.
 859     </para>
 860
 861     <para>
 862      Sometimes it is useful to be able to pass an already-constructed array
 863      to a variadic function; this is particularly handy when one variadic
 864      function wants to pass on its array parameter to another one.  Also,
 865      this is the only secure way to call a variadic function found in a schema
 866      that permits untrusted users to create objects; see
 867      <xref linkend="typeconv-func"/>.  You can do this by
 868      specifying <literal>VARIADIC</literal> in the call:
 869
 870 <screen>
 871 SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
 872 </screen>
 873
 874      This prevents expansion of the function's variadic parameter into its
 875      element type, thereby allowing the array argument value to match
 876      normally.  <literal>VARIADIC</literal> can only be attached to the last
 877      actual argument of a function call.
 878     </para>
 879
 880     <para>
 881      Specifying <literal>VARIADIC</literal> in the call is also the only way to
 882      pass an empty array to a variadic function, for example:
 883
 884 <screen>
 885 SELECT mleast(VARIADIC ARRAY[]::numeric[]);
 886 </screen>
 887
 888      Simply writing <literal>SELECT mleast()</literal> does not work because a
 889      variadic parameter must match at least one actual argument.
 890      (You could define a second function also named <literal>mleast</literal>,
 891      with no parameters, if you wanted to allow such calls.)
 892     </para>
 893
 894     <para>
 895      The array element parameters generated from a variadic parameter are
 896      treated as not having any names of their own.  This means it is not
 897      possible to call a variadic function using named arguments (<xref
 898      linkend="sql-syntax-calling-funcs"/>), except when you specify
 899      <literal>VARIADIC</literal>.  For example, this will work:
 900
 901 <screen>
 902 SELECT mleast(VARIADIC arr =&gt; ARRAY[10, -1, 5, 4.4]);
 903 </screen>
 904
 905      but not these:
 906
 907 <screen>
 908 SELECT mleast(arr =&gt; 10);
 909 SELECT mleast(arr =&gt; ARRAY[10, -1, 5, 4.4]);
 910 </screen>
 911     </para>
 912    </sect2>
 913
 914    <sect2 id="xfunc-sql-parameter-defaults">
 915     <title><acronym>SQL</acronym> Functions with Default Values for Arguments</title>
 916
 917     <indexterm>
 918      <primary>function</primary>
 919      <secondary>default values for arguments</secondary>
 920     </indexterm>
 921
 922     <para>
 923      Functions can be declared with default values for some or all input
 924      arguments.  The default values are inserted whenever the function is
 925      called with insufficiently many actual arguments.  Since arguments
 926      can only be omitted from the end of the actual argument list, all
 927      parameters after a parameter with a default value have to have
 928      default values as well.  (Although the use of named argument notation
 929      could allow this restriction to be relaxed, it's still enforced so that
 930      positional argument notation works sensibly.)  Whether or not you use it,
 931      this capability creates a need for precautions when calling functions in
 932      databases where some users mistrust other users; see
 933      <xref linkend="typeconv-func"/>.
 934     </para>
 935
 936     <para>
 937      For example:
 938 <screen>
 939 CREATE FUNCTION foo(a int, b int DEFAULT 2, c int DEFAULT 3)
 940 RETURNS int
 941 LANGUAGE SQL
 942 AS $$
 943     SELECT $1 + $2 + $3;
 944 $$;
 945
 946 SELECT foo(10, 20, 30);
 947  foo
 948 -----
 949   60
 950 (1 row)
 951
 952 SELECT foo(10, 20);
 953  foo
 954 -----
 955   33
 956 (1 row)
 957
 958 SELECT foo(10);
 959  foo
 960 -----
 961   15
 962 (1 row)
 963
 964 SELECT foo();  -- fails since there is no default for the first argument
 965 ERROR:  function foo() does not exist
 966 </screen>
 967      The <literal>=</literal> sign can also be used in place of the
 968      key word <literal>DEFAULT</literal>.
 969     </para>
 970    </sect2>
 971
 972    <sect2 id="xfunc-sql-table-functions">
 973     <title><acronym>SQL</acronym> Functions as Table Sources</title>
 974
 975     <para>
 976      All SQL functions can be used in the <literal>FROM</literal> clause of a query,
 977      but it is particularly useful for functions returning composite types.
 978      If the function is defined to return a base type, the table function
 979      produces a one-column table.  If the function is defined to return
 980      a composite type, the table function produces a column for each attribute
 981      of the composite type.
 982     </para>
 983
 984     <para>
 985      Here is an example:
 986
 987 <screen>
 988 CREATE TABLE foo (fooid int, foosubid int, fooname text);
 989 INSERT INTO foo VALUES (1, 1, 'Joe');
 990 INSERT INTO foo VALUES (1, 2, 'Ed');
 991 INSERT INTO foo VALUES (2, 1, 'Mary');
 992
 993 CREATE FUNCTION getfoo(int) RETURNS foo AS $$
 994     SELECT * FROM foo WHERE fooid = $1;
 995 $$ LANGUAGE SQL;
 996
 997 SELECT *, upper(fooname) FROM getfoo(1) AS t1;
 998
 999  fooid | foosubid | fooname | upper
1000 -------+----------+---------+-------
1001      1 |        1 | Joe     | JOE
1002 (1 row)
1003 </screen>
1004
1005      As the example shows, we can work with the columns of the function's
1006      result just the same as if they were columns of a regular table.
1007     </para>
1008
1009     <para>
1010      Note that we only got one row out of the function.  This is because
1011      we did not use <literal>SETOF</literal>.  That is described in the next section.
1012     </para>
1013    </sect2>
1014
1015    <sect2 id="xfunc-sql-functions-returning-set">
1016     <title><acronym>SQL</acronym> Functions Returning Sets</title>
1017
1018     <indexterm>
1019      <primary>function</primary>
1020      <secondary>with SETOF</secondary>
1021     </indexterm>
1022
1023     <para>
1024      When an SQL function is declared as returning <literal>SETOF
1025      <replaceable>sometype</replaceable></literal>, the function's final
1026      query is executed to completion, and each row it
1027      outputs is returned as an element of the result set.
1028     </para>
1029
1030     <para>
1031      This feature is normally used when calling the function in the <literal>FROM</literal>
1032      clause.  In this case each row returned by the function becomes
1033      a row of the table seen by the query.  For example, assume that
1034      table <literal>foo</literal> has the same contents as above, and we say:
1035
1036 <programlisting>
1037 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
1038     SELECT * FROM foo WHERE fooid = $1;
1039 $$ LANGUAGE SQL;
1040
1041 SELECT * FROM getfoo(1) AS t1;
1042 </programlisting>
1043
1044      Then we would get:
1045 <screen>
1046  fooid | foosubid | fooname
1047 -------+----------+---------
1048      1 |        1 | Joe
1049      1 |        2 | Ed
1050 (2 rows)
1051 </screen>
1052     </para>
1053
1054     <para>
1055      It is also possible to return multiple rows with the columns defined by
1056      output parameters, like this:
1057
1058 <programlisting>
1059 CREATE TABLE tab (y int, z int);
1060 INSERT INTO tab VALUES (1, 2), (3, 4), (5, 6), (7, 8);
1061
1062 CREATE FUNCTION sum_n_product_with_tab (x int, OUT sum int, OUT product int)
1063 RETURNS SETOF record
1064 AS $$
1065     SELECT $1 + tab.y, $1 * tab.y FROM tab;
1066 $$ LANGUAGE SQL;
1067
1068 SELECT * FROM sum_n_product_with_tab(10);
1069  sum | product
1070 -----+---------
1071   11 |      10
1072   13 |      30
1073   15 |      50
1074   17 |      70
1075 (4 rows)
1076 </programlisting>
1077
1078      The key point here is that you must write <literal>RETURNS SETOF record</literal>
1079      to indicate that the function returns multiple rows instead of just one.
1080      If there is only one output parameter, write that parameter's type
1081      instead of <type>record</type>.
1082     </para>
1083
1084     <para>
1085      It is frequently useful to construct a query's result by invoking a
1086      set-returning function multiple times, with the parameters for each
1087      invocation coming from successive rows of a table or subquery.  The
1088      preferred way to do this is to use the <literal>LATERAL</literal> key word,
1089      which is described in <xref linkend="queries-lateral"/>.
1090      Here is an example using a set-returning function to enumerate
1091      elements of a tree structure:
1092
1093 <screen>
1094 SELECT * FROM nodes;
1095    name    | parent
1096 -----------+--------
1097  Top       |
1098  Child1    | Top
1099  Child2    | Top
1100  Child3    | Top
1101  SubChild1 | Child1
1102  SubChild2 | Child1
1103 (6 rows)
1104
1105 CREATE FUNCTION listchildren(text) RETURNS SETOF text AS $$
1106     SELECT name FROM nodes WHERE parent = $1
1107 $$ LANGUAGE SQL STABLE;
1108
1109 SELECT * FROM listchildren('Top');
1110  listchildren
1111 --------------
1112  Child1
1113  Child2
1114  Child3
1115 (3 rows)
1116
1117 SELECT name, child FROM nodes, LATERAL listchildren(name) AS child;
1118   name  |   child
1119 --------+-----------
1120  Top    | Child1
1121  Top    | Child2
1122  Top    | Child3
1123  Child1 | SubChild1
1124  Child1 | SubChild2
1125 (5 rows)
1126 </screen>
1127
1128      This example does not do anything that we couldn't have done with a
1129      simple join, but in more complex calculations the option to put
1130      some of the work into a function can be quite convenient.
1131     </para>
1132
1133     <para>
1134      Functions returning sets can also be called in the select list
1135      of a query.  For each row that the query
1136      generates by itself, the set-returning function is invoked, and an output
1137      row is generated for each element of the function's result set.
1138      The previous example could also be done with queries like
1139      these:
1140
1141 <screen>
1142 SELECT listchildren('Top');
1143  listchildren
1144 --------------
1145  Child1
1146  Child2
1147  Child3
1148 (3 rows)
1149
1150 SELECT name, listchildren(name) FROM nodes;
1151   name  | listchildren
1152 --------+--------------
1153  Top    | Child1
1154  Top    | Child2
1155  Top    | Child3
1156  Child1 | SubChild1
1157  Child1 | SubChild2
1158 (5 rows)
1159 </screen>
1160
1161      In the last <command>SELECT</command>,
1162      notice that no output row appears for <literal>Child2</literal>, <literal>Child3</literal>, etc.
1163      This happens because <function>listchildren</function> returns an empty set
1164      for those arguments, so no result rows are generated.  This is the same
1165      behavior as we got from an inner join to the function result when using
1166      the <literal>LATERAL</literal> syntax.
1167     </para>
1168
1169     <para>
1170      <productname>PostgreSQL</productname>'s behavior for a set-returning function in a
1171      query's select list is almost exactly the same as if the set-returning
1172      function had been written in a <literal>LATERAL FROM</literal>-clause item
1173      instead.  For example,
1174 <programlisting>
1175 SELECT x, generate_series(1,5) AS g FROM tab;
1176 </programlisting>
1177      is almost equivalent to
1178 <programlisting>
1179 SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g;
1180 </programlisting>
1181      It would be exactly the same, except that in this specific example,
1182      the planner could choose to put <structname>g</structname> on the outside of the
1183      nested-loop join, since <structname>g</structname> has no actual lateral dependency
1184      on <structname>tab</structname>.  That would result in a different output row
1185      order.  Set-returning functions in the select list are always evaluated
1186      as though they are on the inside of a nested-loop join with the rest of
1187      the <literal>FROM</literal> clause, so that the function(s) are run to
1188      completion before the next row from the <literal>FROM</literal> clause is
1189      considered.
1190     </para>
1191
1192     <para>
1193      If there is more than one set-returning function in the query's select
1194      list, the behavior is similar to what you get from putting the functions
1195      into a single <literal>LATERAL ROWS FROM( ... )</literal> <literal>FROM</literal>-clause
1196      item.  For each row from the underlying query, there is an output row
1197      using the first result from each function, then an output row using the
1198      second result, and so on.  If some of the set-returning functions
1199      produce fewer outputs than others, null values are substituted for the
1200      missing data, so that the total number of rows emitted for one
1201      underlying row is the same as for the set-returning function that
1202      produced the most outputs.  Thus the set-returning functions
1203      run <quote>in lockstep</quote> until they are all exhausted, and then
1204      execution continues with the next underlying row.
1205     </para>
1206
1207     <para>
1208      Set-returning functions can be nested in a select list, although that is
1209      not allowed in <literal>FROM</literal>-clause items.  In such cases, each level
1210      of nesting is treated separately, as though it were
1211      a separate <literal>LATERAL ROWS FROM( ... )</literal> item.  For example, in
1212 <programlisting>
1213 SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab;
1214 </programlisting>
1215      the set-returning functions <function>srf2</function>, <function>srf3</function>,
1216      and <function>srf5</function> would be run in lockstep for each row
1217      of <structname>tab</structname>, and then <function>srf1</function> and <function>srf4</function>
1218      would be applied in lockstep to each row produced by the lower
1219      functions.
1220     </para>
1221
1222     <para>
1223      Set-returning functions cannot be used within conditional-evaluation
1224      constructs, such as <literal>CASE</literal> or <literal>COALESCE</literal>.  For
1225      example, consider
1226 <programlisting>
1227 SELECT x, CASE WHEN x &gt; 0 THEN generate_series(1, 5) ELSE 0 END FROM tab;
1228 </programlisting>
1229      It might seem that this should produce five repetitions of input rows
1230      that have <literal>x &gt; 0</literal>, and a single repetition of those that do
1231      not; but actually, because <function>generate_series(1, 5)</function> would be
1232      run in an implicit <literal>LATERAL FROM</literal> item before
1233      the <literal>CASE</literal> expression is ever evaluated, it would produce five
1234      repetitions of every input row.  To reduce confusion, such cases produce
1235      a parse-time error instead.
1236     </para>
1237
1238     <note>
1239      <para>
1240       If a function's last command is <command>INSERT</command>,
1241       <command>UPDATE</command>, <command>DELETE</command>, or
1242       <command>MERGE</command> with <literal>RETURNING</literal>, that command will
1243       always be executed to completion, even if the function is not declared
1244       with <literal>SETOF</literal> or the calling query does not fetch all the
1245       result rows.  Any extra rows produced by the <literal>RETURNING</literal>
1246       clause are silently dropped, but the commanded table modifications
1247       still happen (and are all completed before returning from the function).
1248      </para>
1249     </note>
1250
1251     <note>
1252      <para>
1253       Before <productname>PostgreSQL</productname> 10, putting more than one
1254       set-returning function in the same select list did not behave very
1255       sensibly unless they always produced equal numbers of rows.  Otherwise,
1256       what you got was a number of output rows equal to the least common
1257       multiple of the numbers of rows produced by the set-returning
1258       functions.  Also, nested set-returning functions did not work as
1259       described above; instead, a set-returning function could have at most
1260       one set-returning argument, and each nest of set-returning functions
1261       was run independently.  Also, conditional execution (set-returning
1262       functions inside <literal>CASE</literal> etc.) was previously allowed,
1263       complicating things even more.
1264       Use of the <literal>LATERAL</literal> syntax is recommended when writing
1265       queries that need to work in older <productname>PostgreSQL</productname> versions,
1266       because that will give consistent results across different versions.
1267       If you have a query that is relying on conditional execution of a
1268       set-returning function, you may be able to fix it by moving the
1269       conditional test into a custom set-returning function.  For example,
1270 <programlisting>
1271 SELECT x, CASE WHEN y &gt; 0 THEN generate_series(1, z) ELSE 5 END FROM tab;
1272 </programlisting>
1273       could become
1274 <programlisting>
1275 CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int)
1276   RETURNS SETOF int AS $$
1277 BEGIN
1278   IF cond THEN
1279     RETURN QUERY SELECT generate_series(start, fin);
1280   ELSE
1281     RETURN QUERY SELECT els;
1282   END IF;
1283 END$$ LANGUAGE plpgsql;
1284
1285 SELECT x, case_generate_series(y &gt; 0, 1, z, 5) FROM tab;
1286 </programlisting>
1287       This formulation will work the same in all versions
1288       of <productname>PostgreSQL</productname>.
1289      </para>
1290     </note>
1291    </sect2>
1292
1293    <sect2 id="xfunc-sql-functions-returning-table">
1294     <title><acronym>SQL</acronym> Functions Returning <literal>TABLE</literal></title>
1295
1296     <indexterm>
1297      <primary>function</primary>
1298      <secondary>RETURNS TABLE</secondary>
1299     </indexterm>
1300
1301     <para>
1302      There is another way to declare a function as returning a set,
1303      which is to use the syntax
1304      <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>.
1305      This is equivalent to using one or more <literal>OUT</literal> parameters plus
1306      marking the function as returning <literal>SETOF record</literal> (or
1307      <literal>SETOF</literal> a single output parameter's type, as appropriate).
1308      This notation is specified in recent versions of the SQL standard, and
1309      thus may be more portable than using <literal>SETOF</literal>.
1310     </para>
1311
1312     <para>
1313      For example, the preceding sum-and-product example could also be
1314      done this way:
1315
1316 <programlisting>
1317 CREATE FUNCTION sum_n_product_with_tab (x int)
1318 RETURNS TABLE(sum int, product int) AS $$
1319     SELECT $1 + tab.y, $1 * tab.y FROM tab;
1320 $$ LANGUAGE SQL;
1321 </programlisting>
1322
1323      It is not allowed to use explicit <literal>OUT</literal> or <literal>INOUT</literal>
1324      parameters with the <literal>RETURNS TABLE</literal> notation &mdash; you must
1325      put all the output columns in the <literal>TABLE</literal> list.
1326     </para>
1327    </sect2>
1328
1329    <sect2 id="xfunc-sql-polymorphic-functions">
1330     <title>Polymorphic <acronym>SQL</acronym> Functions</title>
1331
1332     <para>
1333      <acronym>SQL</acronym> functions can be declared to accept and
1334      return the polymorphic types described in <xref
1335      linkend="extend-types-polymorphic"/>.  Here is a polymorphic
1336      function <function>make_array</function> that builds up an array
1337      from two arbitrary data type elements:
1338 <screen>
1339 CREATE FUNCTION make_array(anyelement, anyelement) RETURNS anyarray AS $$
1340     SELECT ARRAY[$1, $2];
1341 $$ LANGUAGE SQL;
1342
1343 SELECT make_array(1, 2) AS intarray, make_array('a'::text, 'b') AS textarray;
1344  intarray | textarray
1345 ----------+-----------
1346  {1,2}    | {a,b}
1347 (1 row)
1348 </screen>
1349     </para>
1350
1351     <para>
1352      Notice the use of the typecast <literal>'a'::text</literal>
1353      to specify that the argument is of type <type>text</type>. This is
1354      required if the argument is just a string literal, since otherwise
1355      it would be treated as type
1356      <type>unknown</type>, and array of <type>unknown</type> is not a valid
1357      type.
1358      Without the typecast, you will get errors like this:
1359 <screen>
1360 ERROR:  could not determine polymorphic type because input has type unknown
1361 </screen>
1362     </para>
1363
1364     <para>
1365      With <function>make_array</function> declared as above, you must
1366      provide two arguments that are of exactly the same data type; the
1367      system will not attempt to resolve any type differences.  Thus for
1368      example this does not work:
1369 <screen>
1370 SELECT make_array(1, 2.5) AS numericarray;
1371 ERROR:  function make_array(integer, numeric) does not exist
1372 </screen>
1373      An alternative approach is to use the <quote>common</quote> family of
1374      polymorphic types, which allows the system to try to identify a
1375      suitable common type:
1376 <screen>
1377 CREATE FUNCTION make_array2(anycompatible, anycompatible)
1378 RETURNS anycompatiblearray AS $$
1379     SELECT ARRAY[$1, $2];
1380 $$ LANGUAGE SQL;
1381
1382 SELECT make_array2(1, 2.5) AS numericarray;
1383  numericarray
1384 --------------
1385  {1,2.5}
1386 (1 row)
1387 </screen>
1388      Because the rules for common type resolution default to choosing
1389      type <type>text</type> when all inputs are of unknown types, this
1390      also works:
1391 <screen>
1392 SELECT make_array2('a', 'b') AS textarray;
1393  textarray
1394 -----------
1395  {a,b}
1396 (1 row)
1397 </screen>
1398     </para>
1399
1400     <para>
1401      It is permitted to have polymorphic arguments with a fixed
1402      return type, but the converse is not. For example:
1403 <screen>
1404 CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$
1405     SELECT $1 &gt; $2;
1406 $$ LANGUAGE SQL;
1407
1408 SELECT is_greater(1, 2);
1409  is_greater
1410 ------------
1411  f
1412 (1 row)
1413
1414 CREATE FUNCTION invalid_func() RETURNS anyelement AS $$
1415     SELECT 1;
1416 $$ LANGUAGE SQL;
1417 ERROR:  cannot determine result data type
1418 DETAIL:  A result of type anyelement requires at least one input of type anyelement, anyarray, anynonarray, anyenum, or anyrange.
1419 </screen>
1420     </para>
1421
1422     <para>
1423      Polymorphism can be used with functions that have output arguments.
1424      For example:
1425 <screen>
1426 CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray)
1427 AS 'select $1, array[$1,$1]' LANGUAGE SQL;
1428
1429 SELECT * FROM dup(22);
1430  f2 |   f3
1431 ----+---------
1432  22 | {22,22}
1433 (1 row)
1434 </screen>
1435     </para>
1436
1437     <para>
1438      Polymorphism can also be used with variadic functions.
1439      For example:
1440 <screen>
1441 CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
1442     SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
1443 $$ LANGUAGE SQL;
1444
1445 SELECT anyleast(10, -1, 5, 4);
1446  anyleast
1447 ----------
1448        -1
1449 (1 row)
1450
1451 SELECT anyleast('abc'::text, 'def');
1452  anyleast
1453 ----------
1454  abc
1455 (1 row)
1456
1457 CREATE FUNCTION concat_values(text, VARIADIC anyarray) RETURNS text AS $$
1458     SELECT array_to_string($2, $1);
1459 $$ LANGUAGE SQL;
1460
1461 SELECT concat_values('|', 1, 4, 2);
1462  concat_values
1463 ---------------
1464  1|4|2
1465 (1 row)
1466 </screen>
1467     </para>
1468    </sect2>
1469
1470    <sect2 id="xfunc-sql-collations">
1471     <title><acronym>SQL</acronym> Functions with Collations</title>
1472
1473     <indexterm>
1474      <primary>collation</primary>
1475      <secondary>in SQL functions</secondary>
1476     </indexterm>
1477
1478     <para>
1479      When an SQL function has one or more parameters of collatable data types,
1480      a collation is identified for each function call depending on the
1481      collations assigned to the actual arguments, as described in <xref
1482      linkend="collation"/>.  If a collation is successfully identified
1483      (i.e., there are no conflicts of implicit collations among the arguments)
1484      then all the collatable parameters are treated as having that collation
1485      implicitly.  This will affect the behavior of collation-sensitive
1486      operations within the function.  For example, using the
1487      <function>anyleast</function> function described above, the result of
1488 <programlisting>
1489 SELECT anyleast('abc'::text, 'ABC');
1490 </programlisting>
1491      will depend on the database's default collation.  In <literal>C</literal> locale
1492      the result will be <literal>ABC</literal>, but in many other locales it will
1493      be <literal>abc</literal>.  The collation to use can be forced by adding
1494      a <literal>COLLATE</literal> clause to any of the arguments, for example
1495 <programlisting>
1496 SELECT anyleast('abc'::text, 'ABC' COLLATE "C");
1497 </programlisting>
1498      Alternatively, if you wish a function to operate with a particular
1499      collation regardless of what it is called with, insert
1500      <literal>COLLATE</literal> clauses as needed in the function definition.
1501      This version of <function>anyleast</function> would always use <literal>en_US</literal>
1502      locale to compare strings:
1503 <programlisting>
1504 CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
1505     SELECT min($1[i] COLLATE "en_US") FROM generate_subscripts($1, 1) g(i);
1506 $$ LANGUAGE SQL;
1507 </programlisting>
1508      But note that this will throw an error if applied to a non-collatable
1509      data type.
1510     </para>
1511
1512     <para>
1513      If no common collation can be identified among the actual arguments,
1514      then an SQL function treats its parameters as having their data types'
1515      default collation (which is usually the database's default collation,
1516      but could be different for parameters of domain types).
1517     </para>
1518
1519     <para>
1520      The behavior of collatable parameters can be thought of as a limited
1521      form of polymorphism, applicable only to textual data types.
1522     </para>
1523    </sect2>
1524   </sect1>
1525
1526   <sect1 id="xfunc-overload">
1527    <title>Function Overloading</title>
1528
1529    <indexterm zone="xfunc-overload">
1530     <primary>overloading</primary>
1531     <secondary>functions</secondary>
1532    </indexterm>
1533
1534    <para>
1535     More than one function can be defined with the same SQL name, so long
1536     as the arguments they take are different.  In other words,
1537     function names can be <firstterm>overloaded</firstterm>.  Whether or not
1538     you use it, this capability entails security precautions when calling
1539     functions in databases where some users mistrust other users; see
1540     <xref linkend="typeconv-func"/>.  When a query is executed, the server
1541     will determine which function to call from the data types and the number
1542     of the provided arguments.  Overloading can also be used to simulate
1543     functions with a variable number of arguments, up to a finite maximum
1544     number.
1545    </para>
1546
1547    <para>
1548     When creating a family of overloaded functions, one should be
1549     careful not to create ambiguities.  For instance, given the
1550     functions:
1551 <programlisting>
1552 CREATE FUNCTION test(int, real) RETURNS ...
1553 CREATE FUNCTION test(smallint, double precision) RETURNS ...
1554 </programlisting>
1555     it is not immediately clear which function would be called with
1556     some trivial input like <literal>test(1, 1.5)</literal>.  The
1557     currently implemented resolution rules are described in
1558     <xref linkend="typeconv"/>, but it is unwise to design a system that subtly
1559     relies on this behavior.
1560    </para>
1561
1562    <para>
1563     A function that takes a single argument of a composite type should
1564     generally not have the same name as any attribute (field) of that type.
1565     Recall that <literal><replaceable>attribute</replaceable>(<replaceable>table</replaceable>)</literal>
1566     is considered equivalent
1567     to <literal><replaceable>table</replaceable>.<replaceable>attribute</replaceable></literal>.
1568     In the case that there is an
1569     ambiguity between a function on a composite type and an attribute of
1570     the composite type, the attribute will always be used.  It is possible
1571     to override that choice by schema-qualifying the function name
1572     (that is, <literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>table</replaceable>)
1573     </literal>) but it's better to
1574     avoid the problem by not choosing conflicting names.
1575    </para>
1576
1577    <para>
1578     Another possible conflict is between variadic and non-variadic functions.
1579     For instance, it is possible to create both <literal>foo(numeric)</literal> and
1580     <literal>foo(VARIADIC numeric[])</literal>.  In this case it is unclear which one
1581     should be matched to a call providing a single numeric argument, such as
1582     <literal>foo(10.1)</literal>.  The rule is that the function appearing
1583     earlier in the search path is used, or if the two functions are in the
1584     same schema, the non-variadic one is preferred.
1585    </para>
1586
1587    <para>
1588     When overloading C-language functions, there is an additional
1589     constraint: The C name of each function in the family of
1590     overloaded functions must be different from the C names of all
1591     other functions, either internal or dynamically loaded.  If this
1592     rule is violated, the behavior is not portable.  You might get a
1593     run-time linker error, or one of the functions will get called
1594     (usually the internal one).  The alternative form of the
1595     <literal>AS</literal> clause for the SQL <command>CREATE
1596     FUNCTION</command> command decouples the SQL function name from
1597     the function name in the C source code.  For instance:
1598 <programlisting>
1599 CREATE FUNCTION test(int) RETURNS int
1600     AS '<replaceable>filename</replaceable>', 'test_1arg'
1601     LANGUAGE C;
1602 CREATE FUNCTION test(int, int) RETURNS int
1603     AS '<replaceable>filename</replaceable>', 'test_2arg'
1604     LANGUAGE C;
1605 </programlisting>
1606     The names of the C functions here reflect one of many possible conventions.
1607    </para>
1608   </sect1>
1609
1610   <sect1 id="xfunc-volatility">
1611    <title>Function Volatility Categories</title>
1612
1613    <indexterm zone="xfunc-volatility">
1614     <primary>volatility</primary>
1615     <secondary>functions</secondary>
1616    </indexterm>
1617    <indexterm zone="xfunc-volatility">
1618     <primary>VOLATILE</primary>
1619    </indexterm>
1620    <indexterm zone="xfunc-volatility">
1621     <primary>STABLE</primary>
1622    </indexterm>
1623    <indexterm zone="xfunc-volatility">
1624     <primary>IMMUTABLE</primary>
1625    </indexterm>
1626
1627    <para>
1628     Every function has a <firstterm>volatility</firstterm> classification, with
1629     the possibilities being <literal>VOLATILE</literal>, <literal>STABLE</literal>, or
1630     <literal>IMMUTABLE</literal>.  <literal>VOLATILE</literal> is the default if the
1631     <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link>
1632     command does not specify a category.  The volatility category is a
1633     promise to the optimizer about the behavior of the function:
1634
1635    <itemizedlist>
1636     <listitem>
1637      <para>
1638       A <literal>VOLATILE</literal> function can do anything, including modifying
1639       the database.  It can return different results on successive calls with
1640       the same arguments.  The optimizer makes no assumptions about the
1641       behavior of such functions.  A query using a volatile function will
1642       re-evaluate the function at every row where its value is needed.
1643      </para>
1644     </listitem>
1645     <listitem>
1646      <para>
1647       A <literal>STABLE</literal> function cannot modify the database and is
1648       guaranteed to return the same results given the same arguments
1649       for all rows within a single statement. This category allows the
1650       optimizer to optimize multiple calls of the function to a single
1651       call. In particular, it is safe to use an expression containing
1652       such a function in an index scan condition. (Since an index scan
1653       will evaluate the comparison value only once, not once at each
1654       row, it is not valid to use a <literal>VOLATILE</literal> function in an
1655       index scan condition.)
1656      </para>
1657     </listitem>
1658     <listitem>
1659      <para>
1660       An <literal>IMMUTABLE</literal> function cannot modify the database and is
1661       guaranteed to return the same results given the same arguments forever.
1662       This category allows the optimizer to pre-evaluate the function when
1663       a query calls it with constant arguments.  For example, a query like
1664       <literal>SELECT ... WHERE x = 2 + 2</literal> can be simplified on sight to
1665       <literal>SELECT ... WHERE x = 4</literal>, because the function underlying
1666       the integer addition operator is marked <literal>IMMUTABLE</literal>.
1667      </para>
1668     </listitem>
1669    </itemizedlist>
1670    </para>
1671
1672    <para>
1673     For best optimization results, you should label your functions with the
1674     strictest volatility category that is valid for them.
1675    </para>
1676
1677    <para>
1678     Any function with side-effects <emphasis>must</emphasis> be labeled
1679     <literal>VOLATILE</literal>, so that calls to it cannot be optimized away.
1680     Even a function with no side-effects needs to be labeled
1681     <literal>VOLATILE</literal> if its value can change within a single query;
1682     some examples are <literal>random()</literal>, <literal>currval()</literal>,
1683     <literal>timeofday()</literal>.
1684    </para>
1685
1686    <para>
1687     Another important example is that the <function>current_timestamp</function>
1688     family of functions qualify as <literal>STABLE</literal>, since their values do
1689     not change within a transaction.
1690    </para>
1691
1692    <para>
1693     There is relatively little difference between <literal>STABLE</literal> and
1694     <literal>IMMUTABLE</literal> categories when considering simple interactive
1695     queries that are planned and immediately executed: it doesn't matter
1696     a lot whether a function is executed once during planning or once during
1697     query execution startup.  But there is a big difference if the plan is
1698     saved and reused later.  Labeling a function <literal>IMMUTABLE</literal> when
1699     it really isn't might allow it to be prematurely folded to a constant during
1700     planning, resulting in a stale value being re-used during subsequent uses
1701     of the plan.  This is a hazard when using prepared statements or when
1702     using function languages that cache plans (such as
1703     <application>PL/pgSQL</application>).
1704    </para>
1705
1706    <para>
1707     For functions written in SQL or in any of the standard procedural
1708     languages, there is a second important property determined by the
1709     volatility category, namely the visibility of any data changes that have
1710     been made by the SQL command that is calling the function.  A
1711     <literal>VOLATILE</literal> function will see such changes, a <literal>STABLE</literal>
1712     or <literal>IMMUTABLE</literal> function will not.  This behavior is implemented
1713     using the snapshotting behavior of MVCC (see <xref linkend="mvcc"/>):
1714     <literal>STABLE</literal> and <literal>IMMUTABLE</literal> functions use a snapshot
1715     established as of the start of the calling query, whereas
1716     <literal>VOLATILE</literal> functions obtain a fresh snapshot at the start of
1717     each query they execute.
1718    </para>
1719
1720    <note>
1721     <para>
1722      Functions written in C can manage snapshots however they want, but it's
1723      usually a good idea to make C functions work this way too.
1724     </para>
1725    </note>
1726
1727    <para>
1728     Because of this snapshotting behavior,
1729     a function containing only <command>SELECT</command> commands can safely be
1730     marked <literal>STABLE</literal>, even if it selects from tables that might be
1731     undergoing modifications by concurrent queries.
1732     <productname>PostgreSQL</productname> will execute all commands of a
1733     <literal>STABLE</literal> function using the snapshot established for the
1734     calling query, and so it will see a fixed view of the database throughout
1735     that query.
1736    </para>
1737
1738    <para>
1739     The same snapshotting behavior is used for <command>SELECT</command> commands
1740     within <literal>IMMUTABLE</literal> functions.  It is generally unwise to select
1741     from database tables within an <literal>IMMUTABLE</literal> function at all,
1742     since the immutability will be broken if the table contents ever change.
1743     However, <productname>PostgreSQL</productname> does not enforce that you
1744     do not do that.
1745    </para>
1746
1747    <para>
1748     A common error is to label a function <literal>IMMUTABLE</literal> when its
1749     results depend on a configuration parameter.  For example, a function
1750     that manipulates timestamps might well have results that depend on the
1751     <xref linkend="guc-timezone"/> setting.  For safety, such functions should
1752     be labeled <literal>STABLE</literal> instead.
1753    </para>
1754
1755    <note>
1756     <para>
1757      <productname>PostgreSQL</productname> requires that <literal>STABLE</literal>
1758      and <literal>IMMUTABLE</literal> functions contain no SQL commands other
1759      than <command>SELECT</command> to prevent data modification.
1760      (This is not a completely bulletproof test, since such functions could
1761      still call <literal>VOLATILE</literal> functions that modify the database.
1762      If you do that, you will find that the <literal>STABLE</literal> or
1763      <literal>IMMUTABLE</literal> function does not notice the database changes
1764      applied by the called function, since they are hidden from its snapshot.)
1765     </para>
1766    </note>
1767   </sect1>
1768
1769   <sect1 id="xfunc-pl">
1770    <title>Procedural Language Functions</title>
1771
1772    <para>
1773     <productname>PostgreSQL</productname> allows user-defined functions
1774     to be written in other languages besides SQL and C.  These other
1775     languages are generically called <firstterm>procedural
1776     languages</firstterm> (<acronym>PL</acronym>s).
1777     Procedural languages aren't built into the
1778     <productname>PostgreSQL</productname> server; they are offered
1779     by loadable modules.
1780     See <xref linkend="xplang"/> and following chapters for more
1781     information.
1782    </para>
1783   </sect1>
1784
1785   <sect1 id="xfunc-internal">
1786    <title>Internal Functions</title>
1787
1788    <indexterm zone="xfunc-internal"><primary>function</primary><secondary>internal</secondary></indexterm>
1789
1790    <para>
1791     Internal functions are functions written in C that have been statically
1792     linked into the <productname>PostgreSQL</productname> server.
1793     The <quote>body</quote> of the function definition
1794     specifies the C-language name of the function, which need not be the
1795     same as the name being declared for SQL use.
1796     (For reasons of backward compatibility, an empty body
1797     is accepted as meaning that the C-language function name is the
1798     same as the SQL name.)
1799    </para>
1800
1801    <para>
1802     Normally, all internal functions present in the
1803     server are declared during the initialization of the database cluster
1804     (see <xref linkend="creating-cluster"/>),
1805     but a user could use <command>CREATE FUNCTION</command>
1806     to create additional alias names for an internal function.
1807     Internal functions are declared in <command>CREATE FUNCTION</command>
1808     with language name <literal>internal</literal>.  For instance, to
1809     create an alias for the <function>sqrt</function> function:
1810 <programlisting>
1811 CREATE FUNCTION square_root(double precision) RETURNS double precision
1812     AS 'dsqrt'
1813     LANGUAGE internal
1814     STRICT;
1815 </programlisting>
1816     (Most internal functions expect to be declared <quote>strict</quote>.)
1817    </para>
1818
1819    <note>
1820     <para>
1821      Not all <quote>predefined</quote> functions are
1822      <quote>internal</quote> in the above sense.  Some predefined
1823      functions are written in SQL.
1824     </para>
1825    </note>
1826   </sect1>
1827
1828   <sect1 id="xfunc-c">
1829    <title>C-Language Functions</title>
1830
1831    <indexterm zone="xfunc-c">
1832     <primary>function</primary>
1833     <secondary>user-defined</secondary>
1834     <tertiary>in C</tertiary>
1835    </indexterm>
1836
1837    <para>
1838     User-defined functions can be written in C (or a language that can
1839     be made compatible with C, such as C++).  Such functions are
1840     compiled into dynamically loadable objects (also called shared
1841     libraries) and are loaded by the server on demand.  The dynamic
1842     loading feature is what distinguishes <quote>C language</quote> functions
1843     from <quote>internal</quote> functions &mdash; the actual coding conventions
1844     are essentially the same for both.  (Hence, the standard internal
1845     function library is a rich source of coding examples for user-defined
1846     C functions.)
1847    </para>
1848
1849    <para>
1850     Currently only one calling convention is used for C functions
1851     (<quote>version 1</quote>). Support for that calling convention is
1852     indicated by writing a <literal>PG_FUNCTION_INFO_V1()</literal> macro
1853     call for the function, as illustrated below.
1854    </para>
1855
1856   <sect2 id="xfunc-c-dynload">
1857    <title>Dynamic Loading</title>
1858
1859    <indexterm zone="xfunc-c-dynload">
1860     <primary>dynamic loading</primary>
1861    </indexterm>
1862
1863    <para>
1864     The first time a user-defined function in a particular
1865     loadable object file is called in a session,
1866     the dynamic loader loads that object file into memory so that the
1867     function can be called.  The <command>CREATE FUNCTION</command>
1868     for a user-defined C function must therefore specify two pieces of
1869     information for the function: the name of the loadable
1870     object file, and the C name (link symbol) of the specific function to call
1871     within that object file.  If the C name is not explicitly specified then
1872     it is assumed to be the same as the SQL function name.
1873    </para>
1874
1875    <para>
1876     The following algorithm is used to locate the shared object file
1877     based on the name given in the <command>CREATE FUNCTION</command>
1878     command:
1879
1880     <orderedlist>
1881      <listitem>
1882       <para>
1883        If the name is an absolute path, the given file is loaded.
1884       </para>
1885      </listitem>
1886
1887      <listitem>
1888       <para>
1889        If the name starts with the string <literal>$libdir</literal>,
1890        that part is replaced by the <productname>PostgreSQL</productname> package
1891         library directory
1892        name, which is determined at build time.<indexterm><primary>$libdir</primary></indexterm>
1893       </para>
1894      </listitem>
1895
1896      <listitem>
1897       <para>
1898        If the name does not contain a directory part, the file is
1899        searched for in the path specified by the configuration variable
1900        <xref linkend="guc-dynamic-library-path"/>.<indexterm><primary>dynamic_library_path</primary></indexterm>
1901       </para>
1902      </listitem>
1903
1904      <listitem>
1905       <para>
1906        Otherwise (the file was not found in the path, or it contains a
1907        non-absolute directory part), the dynamic loader will try to
1908        take the name as given, which will most likely fail.  (It is
1909        unreliable to depend on the current working directory.)
1910       </para>
1911      </listitem>
1912     </orderedlist>
1913
1914     If this sequence does not work, the platform-specific shared
1915     library file name extension (often <filename>.so</filename>) is
1916     appended to the given name and this sequence is tried again.  If
1917     that fails as well, the load will fail.
1918    </para>
1919
1920    <para>
1921     It is recommended to locate shared libraries either relative to
1922     <literal>$libdir</literal> or through the dynamic library path.
1923     This simplifies version upgrades if the new installation is at a
1924     different location.  The actual directory that
1925     <literal>$libdir</literal> stands for can be found out with the
1926     command <literal>pg_config --pkglibdir</literal>.
1927    </para>
1928
1929    <para>
1930     The user ID the <productname>PostgreSQL</productname> server runs
1931     as must be able to traverse the path to the file you intend to
1932     load.  Making the file or a higher-level directory not readable
1933     and/or not executable by the <systemitem>postgres</systemitem>
1934     user is a common mistake.
1935    </para>
1936
1937    <para>
1938     In any case, the file name that is given in the
1939     <command>CREATE FUNCTION</command> command is recorded literally
1940     in the system catalogs, so if the file needs to be loaded again
1941     the same procedure is applied.
1942    </para>
1943
1944    <note>
1945     <para>
1946      <productname>PostgreSQL</productname> will not compile a C function
1947      automatically.  The object file must be compiled before it is referenced
1948      in a <command>CREATE
1949      FUNCTION</command> command.  See <xref linkend="dfunc"/> for additional
1950      information.
1951     </para>
1952    </note>
1953
1954    <indexterm zone="xfunc-c-dynload">
1955     <primary>magic block</primary>
1956    </indexterm>
1957
1958    <para>
1959     To ensure that a dynamically loaded object file is not loaded into an
1960     incompatible server, <productname>PostgreSQL</productname> checks that the
1961     file contains a <quote>magic block</quote> with the appropriate contents.
1962     This allows the server to detect obvious incompatibilities, such as code
1963     compiled for a different major version of
1964     <productname>PostgreSQL</productname>. To include a magic block,
1965     write this in one (and only one) of the module source files, after having
1966     included the header <filename>fmgr.h</filename>:
1967
1968 <programlisting>
1969 PG_MODULE_MAGIC;
1970 </programlisting>
1971    </para>
1972
1973    <para>
1974     After it is used for the first time, a dynamically loaded object
1975     file is retained in memory.  Future calls in the same session to
1976     the function(s) in that file will only incur the small overhead of
1977     a symbol table lookup.  If you need to force a reload of an object
1978     file, for example after recompiling it, begin a fresh session.
1979    </para>
1980
1981    <indexterm zone="xfunc-c-dynload">
1982     <primary>_PG_init</primary>
1983    </indexterm>
1984    <indexterm zone="xfunc-c-dynload">
1985     <primary>library initialization function</primary>
1986    </indexterm>
1987
1988    <para>
1989     Optionally, a dynamically loaded file can contain an initialization
1990     function.  If the file includes a function named
1991     <function>_PG_init</function>, that function will be called immediately after
1992     loading the file.  The function receives no parameters and should
1993     return void.  There is presently no way to unload a dynamically loaded file.
1994    </para>
1995
1996   </sect2>
1997
1998    <sect2 id="xfunc-c-basetype">
1999     <title>Base Types in C-Language Functions</title>
2000
2001     <indexterm zone="xfunc-c-basetype">
2002      <primary>data type</primary>
2003      <secondary>internal organization</secondary>
2004     </indexterm>
2005
2006     <para>
2007      To know how to write C-language functions, you need to know how
2008      <productname>PostgreSQL</productname> internally represents base
2009      data types and how they can be passed to and from functions.
2010      Internally, <productname>PostgreSQL</productname> regards a base
2011      type as a <quote>blob of memory</quote>.  The user-defined
2012      functions that you define over a type in turn define the way that
2013      <productname>PostgreSQL</productname> can operate on it.  That
2014      is, <productname>PostgreSQL</productname> will only store and
2015      retrieve the data from disk and use your user-defined functions
2016      to input, process, and output the data.
2017     </para>
2018
2019     <para>
2020      Base types can have one of three internal formats:
2021
2022      <itemizedlist>
2023       <listitem>
2024        <para>
2025         pass by value, fixed-length
2026        </para>
2027       </listitem>
2028       <listitem>
2029        <para>
2030         pass by reference, fixed-length
2031        </para>
2032       </listitem>
2033       <listitem>
2034        <para>
2035         pass by reference, variable-length
2036        </para>
2037       </listitem>
2038      </itemizedlist>
2039     </para>
2040
2041     <para>
2042      By-value  types  can  only be 1, 2, or 4 bytes in length
2043      (also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
2044      You should be careful to define your types such that they will be the
2045      same size (in bytes) on all architectures.  For example, the
2046      <literal>long</literal> type is dangerous because it is 4 bytes on some
2047      machines and 8 bytes on others, whereas <type>int</type> type is 4 bytes
2048      on most Unix machines.  A reasonable implementation of the
2049      <type>int4</type> type on Unix machines might be:
2050
2051 <programlisting>
2052 /* 4-byte integer, passed by value */
2053 typedef int int4;
2054 </programlisting>
2055
2056      (The actual PostgreSQL C code calls this type <type>int32</type>, because
2057      it is a convention in C that <type>int<replaceable>XX</replaceable></type>
2058      means <replaceable>XX</replaceable> <emphasis>bits</emphasis>.  Note
2059      therefore also that the C type <type>int8</type> is 1 byte in size.  The
2060      SQL type <type>int8</type> is called <type>int64</type> in C.  See also
2061      <xref linkend="xfunc-c-type-table"/>.)
2062     </para>
2063
2064     <para>
2065      On  the  other hand, fixed-length types of any size can
2066      be passed by-reference.  For example, here is a  sample
2067      implementation of a <productname>PostgreSQL</productname> type:
2068
2069 <programlisting>
2070 /* 16-byte structure, passed by reference */
2071 typedef struct
2072 {
2073     double  x, y;
2074 } Point;
2075 </programlisting>
2076
2077      Only  pointers  to  such types can be used when passing
2078      them in and out of <productname>PostgreSQL</productname> functions.
2079      To return a value of such a type, allocate the right amount of
2080      memory with <literal>palloc</literal>, fill in the allocated memory,
2081      and return a pointer to it.  (Also, if you just want to return the
2082      same value as one of your input arguments that's of the same data type,
2083      you can skip the extra <literal>palloc</literal> and just return the
2084      pointer to the input value.)
2085     </para>
2086
2087     <para>
2088      Finally, all variable-length types must also be  passed
2089      by  reference.   All  variable-length  types must begin
2090      with an opaque length field of exactly 4 bytes, which will be set
2091      by <symbol>SET_VARSIZE</symbol>; never set this field directly! All data to
2092      be  stored within that type must be located in the memory
2093      immediately  following  that  length  field.   The
2094      length field contains the total length of the structure,
2095      that is,  it  includes  the  size  of  the  length  field
2096      itself.
2097     </para>
2098
2099     <para>
2100      Another important point is to avoid leaving any uninitialized bits
2101      within data type values; for example, take care to zero out any
2102      alignment padding bytes that might be present in structs.  Without
2103      this, logically-equivalent constants of your data type might be
2104      seen as unequal by the planner, leading to inefficient (though not
2105      incorrect) plans.
2106     </para>
2107
2108     <warning>
2109      <para>
2110       <emphasis>Never</emphasis> modify the contents of a pass-by-reference input
2111       value.  If you do so you are likely to corrupt on-disk data, since
2112       the pointer you are given might point directly into a disk buffer.
2113       The sole exception to this rule is explained in
2114       <xref linkend="xaggr"/>.
2115      </para>
2116     </warning>
2117
2118     <para>
2119      As an example, we can define the type <type>text</type> as
2120      follows:
2121
2122 <programlisting>
2123 typedef struct {
2124     int32 length;
2125     char data[FLEXIBLE_ARRAY_MEMBER];
2126 } text;
2127 </programlisting>
2128
2129      The <literal>[FLEXIBLE_ARRAY_MEMBER]</literal> notation means that the actual
2130      length of the data part is not specified by this declaration.
2131     </para>
2132
2133     <para>
2134      When manipulating
2135      variable-length types, we must  be  careful  to  allocate
2136      the  correct amount  of memory and set the length field correctly.
2137      For example, if we wanted to  store  40  bytes  in  a <structname>text</structname>
2138      structure, we might use a code fragment like this:
2139
2140 <programlisting><![CDATA[
2141 #include "postgres.h"
2142 ...
2143 char buffer[40]; /* our source data */
2144 ...
2145 text *destination = (text *) palloc(VARHDRSZ + 40);
2146 SET_VARSIZE(destination, VARHDRSZ + 40);
2147 memcpy(destination->data, buffer, 40);
2148 ...
2149 ]]>
2150 </programlisting>
2151
2152      <literal>VARHDRSZ</literal> is the same as <literal>sizeof(int32)</literal>, but
2153      it's considered good style to use the macro <literal>VARHDRSZ</literal>
2154      to refer to the size of the overhead for a variable-length type.
2155      Also, the length field <emphasis>must</emphasis> be set using the
2156      <literal>SET_VARSIZE</literal> macro, not by simple assignment.
2157     </para>
2158
2159     <para>
2160      <xref linkend="xfunc-c-type-table"/> shows the C types
2161      corresponding to many of the built-in SQL data types
2162      of <productname>PostgreSQL</productname>.
2163      The <quote>Defined In</quote> column gives the header file that
2164      needs to be included to get the type definition.  (The actual
2165      definition might be in a different file that is included by the
2166      listed file.  It is recommended that users stick to the defined
2167      interface.)  Note that you should always include
2168      <filename>postgres.h</filename> first in any source file of server
2169      code, because it declares a number of things that you will need
2170      anyway, and because including other headers first can cause
2171      portability issues.
2172     </para>
2173
2174      <table tocentry="1" id="xfunc-c-type-table">
2175       <title>Equivalent C Types for Built-in SQL Types</title>
2176       <tgroup cols="3">
2177        <colspec colname="col1" colwidth="1*"/>
2178        <colspec colname="col2" colwidth="1*"/>
2179        <colspec colname="col3" colwidth="2*"/>
2180        <thead>
2181         <row>
2182          <entry>
2183           SQL Type
2184          </entry>
2185          <entry>
2186           C Type
2187          </entry>
2188          <entry>
2189           Defined In
2190          </entry>
2191         </row>
2192        </thead>
2193        <tbody>
2194         <row>
2195          <entry><type>boolean</type></entry>
2196          <entry><type>bool</type></entry>
2197          <entry><filename>postgres.h</filename> (maybe compiler built-in)</entry>
2198         </row>
2199         <row>
2200          <entry><type>box</type></entry>
2201          <entry><type>BOX*</type></entry>
2202          <entry><filename>utils/geo_decls.h</filename></entry>
2203         </row>
2204         <row>
2205          <entry><type>bytea</type></entry>
2206          <entry><type>bytea*</type></entry>
2207          <entry><filename>postgres.h</filename></entry>
2208         </row>
2209         <row>
2210          <entry><type>"char"</type></entry>
2211          <entry><type>char</type></entry>
2212          <entry>(compiler built-in)</entry>
2213         </row>
2214         <row>
2215          <entry><type>character</type></entry>
2216          <entry><type>BpChar*</type></entry>
2217          <entry><filename>postgres.h</filename></entry>
2218         </row>
2219         <row>
2220          <entry><type>cid</type></entry>
2221          <entry><type>CommandId</type></entry>
2222          <entry><filename>postgres.h</filename></entry>
2223         </row>
2224         <row>
2225          <entry><type>date</type></entry>
2226          <entry><type>DateADT</type></entry>
2227          <entry><filename>utils/date.h</filename></entry>
2228         </row>
2229         <row>
2230          <entry><type>float4</type> (<type>real</type>)</entry>
2231          <entry><type>float4</type></entry>
2232         <entry><filename>postgres.h</filename></entry>
2233         </row>
2234         <row>
2235          <entry><type>float8</type> (<type>double precision</type>)</entry>
2236          <entry><type>float8</type></entry>
2237          <entry><filename>postgres.h</filename></entry>
2238         </row>
2239         <row>
2240          <entry><type>int2</type> (<type>smallint</type>)</entry>
2241          <entry><type>int16</type></entry>
2242          <entry><filename>postgres.h</filename></entry>
2243         </row>
2244         <row>
2245          <entry><type>int4</type> (<type>integer</type>)</entry>
2246          <entry><type>int32</type></entry>
2247          <entry><filename>postgres.h</filename></entry>
2248         </row>
2249         <row>
2250          <entry><type>int8</type> (<type>bigint</type>)</entry>
2251          <entry><type>int64</type></entry>
2252          <entry><filename>postgres.h</filename></entry>
2253         </row>
2254         <row>
2255          <entry><type>interval</type></entry>
2256          <entry><type>Interval*</type></entry>
2257          <entry><filename>datatype/timestamp.h</filename></entry>
2258         </row>
2259         <row>
2260          <entry><type>lseg</type></entry>
2261          <entry><type>LSEG*</type></entry>
2262          <entry><filename>utils/geo_decls.h</filename></entry>
2263         </row>
2264         <row>
2265          <entry><type>name</type></entry>
2266          <entry><type>Name</type></entry>
2267          <entry><filename>postgres.h</filename></entry>
2268         </row>
2269         <row>
2270          <entry><type>numeric</type></entry>
2271          <entry><type>Numeric</type></entry>
2272          <entry><filename>utils/numeric.h</filename></entry>
2273         </row>
2274         <row>
2275          <entry><type>oid</type></entry>
2276          <entry><type>Oid</type></entry>
2277          <entry><filename>postgres.h</filename></entry>
2278         </row>
2279         <row>
2280          <entry><type>oidvector</type></entry>
2281          <entry><type>oidvector*</type></entry>
2282          <entry><filename>postgres.h</filename></entry>
2283         </row>
2284         <row>
2285          <entry><type>path</type></entry>
2286          <entry><type>PATH*</type></entry>
2287          <entry><filename>utils/geo_decls.h</filename></entry>
2288         </row>
2289         <row>
2290          <entry><type>point</type></entry>
2291          <entry><type>POINT*</type></entry>
2292          <entry><filename>utils/geo_decls.h</filename></entry>
2293         </row>
2294         <row>
2295          <entry><type>regproc</type></entry>
2296          <entry><type>RegProcedure</type></entry>
2297          <entry><filename>postgres.h</filename></entry>
2298         </row>
2299         <row>
2300          <entry><type>text</type></entry>
2301          <entry><type>text*</type></entry>
2302          <entry><filename>postgres.h</filename></entry>
2303         </row>
2304         <row>
2305          <entry><type>tid</type></entry>
2306          <entry><type>ItemPointer</type></entry>
2307          <entry><filename>storage/itemptr.h</filename></entry>
2308         </row>
2309         <row>
2310          <entry><type>time</type></entry>
2311          <entry><type>TimeADT</type></entry>
2312          <entry><filename>utils/date.h</filename></entry>
2313         </row>
2314         <row>
2315          <entry><type>time with time zone</type></entry>
2316          <entry><type>TimeTzADT</type></entry>
2317          <entry><filename>utils/date.h</filename></entry>
2318         </row>
2319         <row>
2320          <entry><type>timestamp</type></entry>
2321          <entry><type>Timestamp</type></entry>
2322          <entry><filename>datatype/timestamp.h</filename></entry>
2323         </row>
2324         <row>
2325          <entry><type>timestamp with time zone</type></entry>
2326          <entry><type>TimestampTz</type></entry>
2327          <entry><filename>datatype/timestamp.h</filename></entry>
2328         </row>
2329         <row>
2330          <entry><type>varchar</type></entry>
2331          <entry><type>VarChar*</type></entry>
2332          <entry><filename>postgres.h</filename></entry>
2333         </row>
2334         <row>
2335          <entry><type>xid</type></entry>
2336          <entry><type>TransactionId</type></entry>
2337          <entry><filename>postgres.h</filename></entry>
2338         </row>
2339        </tbody>
2340       </tgroup>
2341      </table>
2342
2343     <para>
2344      Now that we've gone over all of the possible structures
2345      for base types, we can show some examples of real functions.
2346     </para>
2347    </sect2>
2348
2349    <sect2 id="xfunc-c-v1-call-conv">
2350     <title>Version 1 Calling Conventions</title>
2351
2352     <para>
2353      The version-1 calling convention relies on macros to suppress most
2354      of the complexity of passing arguments and results.  The C declaration
2355      of a version-1 function is always:
2356 <programlisting>
2357 Datum funcname(PG_FUNCTION_ARGS)
2358 </programlisting>
2359      In addition, the macro call:
2360 <programlisting>
2361 PG_FUNCTION_INFO_V1(funcname);
2362 </programlisting>
2363      must appear in the same source file.  (Conventionally, it's
2364      written just before the function itself.)  This macro call is not
2365      needed for <literal>internal</literal>-language functions, since
2366      <productname>PostgreSQL</productname> assumes that all internal functions
2367      use the version-1 convention.  It is, however, required for
2368      dynamically-loaded functions.
2369     </para>
2370
2371     <para>
2372      In a version-1 function, each actual argument is fetched using a
2373      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2374      macro that corresponds to the argument's data type.  (In non-strict
2375      functions there needs to be a previous check about argument null-ness
2376      using <function>PG_ARGISNULL()</function>; see below.)
2377      The result is returned using a
2378      <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2379      macro for the return type.
2380      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2381      takes as its argument the number of the function argument to
2382      fetch, where the count starts at 0.
2383      <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2384      takes as its argument the actual value to return.
2385     </para>
2386
2387     <para>
2388      Here are some examples using the version-1 calling convention:
2389     </para>
2390
2391 <programlisting><![CDATA[
2392 #include "postgres.h"
2393 #include <string.h>
2394 #include "fmgr.h"
2395 #include "utils/geo_decls.h"
2396 #include "varatt.h"
2397
2398 PG_MODULE_MAGIC;
2399
2400 /* by value */
2401
2402 PG_FUNCTION_INFO_V1(add_one);
2403
2404 Datum
2405 add_one(PG_FUNCTION_ARGS)
2406 {
2407     int32   arg = PG_GETARG_INT32(0);
2408
2409     PG_RETURN_INT32(arg + 1);
2410 }
2411
2412 /* by reference, fixed length */
2413
2414 PG_FUNCTION_INFO_V1(add_one_float8);
2415
2416 Datum
2417 add_one_float8(PG_FUNCTION_ARGS)
2418 {
2419     /* The macros for FLOAT8 hide its pass-by-reference nature. */
2420     float8   arg = PG_GETARG_FLOAT8(0);
2421
2422     PG_RETURN_FLOAT8(arg + 1.0);
2423 }
2424
2425 PG_FUNCTION_INFO_V1(makepoint);
2426
2427 Datum
2428 makepoint(PG_FUNCTION_ARGS)
2429 {
2430     /* Here, the pass-by-reference nature of Point is not hidden. */
2431     Point     *pointx = PG_GETARG_POINT_P(0);
2432     Point     *pointy = PG_GETARG_POINT_P(1);
2433     Point     *new_point = (Point *) palloc(sizeof(Point));
2434
2435     new_point->x = pointx->x;
2436     new_point->y = pointy->y;
2437
2438     PG_RETURN_POINT_P(new_point);
2439 }
2440
2441 /* by reference, variable length */
2442
2443 PG_FUNCTION_INFO_V1(copytext);
2444
2445 Datum
2446 copytext(PG_FUNCTION_ARGS)
2447 {
2448     text     *t = PG_GETARG_TEXT_PP(0);
2449
2450     /*
2451      * VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus the
2452      * VARHDRSZ or VARHDRSZ_SHORT of its header.  Construct the copy with a
2453      * full-length header.
2454      */
2455     text     *new_t = (text *) palloc(VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
2456     SET_VARSIZE(new_t, VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
2457
2458     /*
2459      * VARDATA is a pointer to the data region of the new struct.  The source
2460      * could be a short datum, so retrieve its data through VARDATA_ANY.
2461      */
2462     memcpy(VARDATA(new_t),          /* destination */
2463            VARDATA_ANY(t),          /* source */
2464            VARSIZE_ANY_EXHDR(t));   /* how many bytes */
2465     PG_RETURN_TEXT_P(new_t);
2466 }
2467
2468 PG_FUNCTION_INFO_V1(concat_text);
2469
2470 Datum
2471 concat_text(PG_FUNCTION_ARGS)
2472 {
2473     text  *arg1 = PG_GETARG_TEXT_PP(0);
2474     text  *arg2 = PG_GETARG_TEXT_PP(1);
2475     int32 arg1_size = VARSIZE_ANY_EXHDR(arg1);
2476     int32 arg2_size = VARSIZE_ANY_EXHDR(arg2);
2477     int32 new_text_size = arg1_size + arg2_size + VARHDRSZ;
2478     text *new_text = (text *) palloc(new_text_size);
2479
2480     SET_VARSIZE(new_text, new_text_size);
2481     memcpy(VARDATA(new_text), VARDATA_ANY(arg1), arg1_size);
2482     memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size);
2483     PG_RETURN_TEXT_P(new_text);
2484 }
2485 ]]>
2486 </programlisting>
2487
2488     <para>
2489      Supposing that the above code has been prepared in file
2490      <filename>funcs.c</filename> and compiled into a shared object,
2491      we could define the functions to <productname>PostgreSQL</productname>
2492      with commands like this:
2493     </para>
2494
2495 <programlisting>
2496 CREATE FUNCTION add_one(integer) RETURNS integer
2497      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one'
2498      LANGUAGE C STRICT;
2499
2500 -- note overloading of SQL function name "add_one"
2501 CREATE FUNCTION add_one(double precision) RETURNS double precision
2502      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one_float8'
2503      LANGUAGE C STRICT;
2504
2505 CREATE FUNCTION makepoint(point, point) RETURNS point
2506      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'makepoint'
2507      LANGUAGE C STRICT;
2508
2509 CREATE FUNCTION copytext(text) RETURNS text
2510      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'copytext'
2511      LANGUAGE C STRICT;
2512
2513 CREATE FUNCTION concat_text(text, text) RETURNS text
2514      AS '<replaceable>DIRECTORY</replaceable>/funcs', 'concat_text'
2515      LANGUAGE C STRICT;
2516 </programlisting>
2517
2518     <para>
2519      Here, <replaceable>DIRECTORY</replaceable> stands for the
2520      directory of the shared library file (for instance the
2521      <productname>PostgreSQL</productname> tutorial directory, which
2522      contains the code for the examples used in this section).
2523      (Better style would be to use just <literal>'funcs'</literal> in the
2524      <literal>AS</literal> clause, after having added
2525      <replaceable>DIRECTORY</replaceable> to the search path.  In any
2526      case, we can omit the system-specific extension for a shared
2527      library, commonly <literal>.so</literal>.)
2528     </para>
2529
2530     <para>
2531      Notice that we have specified the functions as <quote>strict</quote>,
2532      meaning that
2533      the system should automatically assume a null result if any input
2534      value is null.  By doing this, we avoid having to check for null inputs
2535      in the function code.  Without this, we'd have to check for null values
2536      explicitly, using <function>PG_ARGISNULL()</function>.
2537     </para>
2538
2539     <para>
2540      The macro <function>PG_ARGISNULL(<replaceable>n</replaceable>)</function>
2541      allows a function to test whether each input is null.  (Of course, doing
2542      this is only necessary in functions not declared <quote>strict</quote>.)
2543      As with the
2544      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
2545      the input arguments are counted beginning at zero.  Note that one
2546      should refrain from executing
2547      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
2548      one has verified that the argument isn't null.
2549      To return a null result, execute <function>PG_RETURN_NULL()</function>;
2550      this works in both strict and nonstrict functions.
2551     </para>
2552
2553     <para>
2554      At first glance, the version-1 coding conventions might appear
2555      to be just pointless obscurantism, compared to using
2556      plain <literal>C</literal> calling conventions.  They do however allow
2557      us to deal with <literal>NULL</literal>able arguments/return values,
2558      and <quote>toasted</quote> (compressed or out-of-line) values.
2559     </para>
2560
2561     <para>
2562      Other options provided by the version-1 interface are two
2563      variants of the
2564      <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2565      macros. The first of these,
2566      <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>,
2567      guarantees to return a copy of the specified argument that is
2568      safe for writing into. (The normal macros will sometimes return a
2569      pointer to a value that is physically stored in a table, which
2570      must not be written to. Using the
2571      <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
2572      macros guarantees a writable result.)
2573     The second variant consists of the
2574     <function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function>
2575     macros which take three arguments. The first is the number of the
2576     function argument (as above). The second and third are the offset and
2577     length of the segment to be returned. Offsets are counted from
2578     zero, and a negative length requests that the remainder of the
2579     value be returned. These macros provide more efficient access to
2580     parts of large values in the case where they have storage type
2581     <quote>external</quote>. (The storage type of a column can be specified using
2582     <literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER
2583     COLUMN <replaceable>colname</replaceable> SET STORAGE
2584     <replaceable>storagetype</replaceable></literal>. <replaceable>storagetype</replaceable> is one of
2585     <literal>plain</literal>, <literal>external</literal>, <literal>extended</literal>,
2586      or <literal>main</literal>.)
2587     </para>
2588
2589     <para>
2590      Finally, the version-1 function call conventions make it possible
2591      to return set results (<xref linkend="xfunc-c-return-set"/>) and
2592      implement trigger functions (<xref linkend="triggers"/>) and
2593      procedural-language call handlers (<xref
2594      linkend="plhandler"/>).  For more details
2595      see <filename>src/backend/utils/fmgr/README</filename> in the
2596      source distribution.
2597     </para>
2598    </sect2>
2599
2600    <sect2 id="xfunc-c-code">
2601     <title>Writing Code</title>
2602
2603     <para>
2604      Before we turn to the more advanced topics, we should discuss
2605      some coding rules for <productname>PostgreSQL</productname>
2606      C-language functions.  While it might be possible to load functions
2607      written in languages other than C into
2608      <productname>PostgreSQL</productname>, this is usually difficult
2609      (when it is possible at all) because other languages, such as
2610      C++, FORTRAN, or Pascal often do not follow the same calling
2611      convention as C.  That is, other languages do not pass argument
2612      and return values between functions in the same way.  For this
2613      reason, we will assume that your C-language functions are
2614      actually written in C.
2615     </para>
2616
2617     <para>
2618      The basic rules for writing and building C functions are as follows:
2619
2620      <itemizedlist>
2621       <listitem>
2622        <para>
2623         Use <literal>pg_config
2624         --includedir-server</literal><indexterm><primary>pg_config</primary><secondary>with user-defined C functions</secondary></indexterm>
2625         to find out where the <productname>PostgreSQL</productname> server header
2626         files are installed on your system (or the system that your
2627         users will be running on).
2628        </para>
2629       </listitem>
2630
2631       <listitem>
2632        <para>
2633         Compiling and linking your code so that it can be dynamically
2634         loaded into <productname>PostgreSQL</productname> always
2635         requires special flags.  See <xref linkend="dfunc"/> for a
2636         detailed explanation of how to do it for your particular
2637         operating system.
2638        </para>
2639       </listitem>
2640
2641       <listitem>
2642        <para>
2643         Remember to define a <quote>magic block</quote> for your shared library,
2644         as described in <xref linkend="xfunc-c-dynload"/>.
2645        </para>
2646       </listitem>
2647
2648       <listitem>
2649        <para>
2650         When allocating memory, use the
2651         <productname>PostgreSQL</productname> functions
2652         <function>palloc</function><indexterm><primary>palloc</primary></indexterm> and <function>pfree</function><indexterm><primary>pfree</primary></indexterm>
2653         instead of the corresponding C library functions
2654         <function>malloc</function> and <function>free</function>.
2655         The memory allocated by <function>palloc</function> will be
2656         freed automatically at the end of each transaction, preventing
2657         memory leaks.
2658        </para>
2659       </listitem>
2660
2661       <listitem>
2662        <para>
2663         Always zero the bytes of your structures using <function>memset</function>
2664         (or allocate them with <function>palloc0</function> in the first place).
2665         Even if you assign to each field of your structure, there might be
2666         alignment padding (holes in the structure) that contain
2667         garbage values.  Without this, it's difficult to
2668         support hash indexes or hash joins, as you must pick out only
2669         the significant bits of your data structure to compute a hash.
2670         The planner also sometimes relies on comparing constants via
2671         bitwise equality, so you can get undesirable planning results if
2672         logically-equivalent values aren't bitwise equal.
2673        </para>
2674       </listitem>
2675
2676       <listitem>
2677        <para>
2678         Most of the internal <productname>PostgreSQL</productname>
2679         types are declared in <filename>postgres.h</filename>, while
2680         the function manager interfaces
2681         (<symbol>PG_FUNCTION_ARGS</symbol>, etc.)  are in
2682         <filename>fmgr.h</filename>, so you will need to include at
2683         least these two files.  For portability reasons it's best to
2684         include <filename>postgres.h</filename> <emphasis>first</emphasis>,
2685         before any other system or user header files.  Including
2686         <filename>postgres.h</filename> will also include
2687         <filename>elog.h</filename> and <filename>palloc.h</filename>
2688         for you.
2689        </para>
2690       </listitem>
2691
2692       <listitem>
2693        <para>
2694         Symbol names defined within object files must not conflict
2695         with each other or with symbols defined in the
2696         <productname>PostgreSQL</productname> server executable.  You
2697         will have to rename your functions or variables if you get
2698         error messages to this effect.
2699        </para>
2700       </listitem>
2701      </itemizedlist>
2702     </para>
2703    </sect2>
2704
2705 &dfunc;
2706
2707    <sect2 id="xfunc-api-abi-stability-guidance">
2708     <title>Server API and ABI Stability Guidance</title>
2709
2710     <para>
2711      This section contains guidance to authors of extensions and other server
2712      plugins about API and ABI stability in the
2713      <productname>PostgreSQL</productname> server.
2714     </para>
2715
2716     <sect3 id="xfunc-guidance-general">
2717      <title>General</title>
2718
2719      <para>
2720       The <productname>PostgreSQL</productname> server contains several
2721       well-demarcated APIs for server plugins, such as the function manager
2722       (<acronym>fmgr</acronym>, described in this chapter),
2723       <acronym>SPI</acronym> (<xref linkend="spi"/>), and various hooks
2724       specifically designed for extensions. These interfaces are carefully
2725       managed for long-term stability and compatibility. However, the entire
2726       set of global functions and variables in the server effectively
2727       constitutes the publicly usable API, and most of it was not designed
2728       with extensibility and long-term stability in mind.
2729      </para>
2730
2731      <para>
2732       Therefore, while taking advantage of these interfaces is valid, the
2733       further one strays from the well-trodden path, the likelier it will be
2734       that one might encounter API or ABI compatibility issues at some point.
2735       Extension authors are encouraged to provide feedback about their
2736       requirements, so that over time, as new use patterns arise, certain
2737       interfaces can be considered more stabilized or new, better-designed
2738       interfaces can be added.
2739      </para>
2740     </sect3>
2741
2742     <sect3 id="xfunc-guidance-api-compatibility">
2743      <title>API Compatibility</title>
2744      <para>
2745       The <acronym>API</acronym>, or application programming interface, is the
2746       interface used at compile time.
2747      </para>
2748
2749      <sect4 id="xfunc-guidance-api-major-versions">
2750       <title>Major Versions</title>
2751       <para>
2752        There is <emphasis>no</emphasis> promise of API compatibility between
2753        <productname>PostgreSQL</productname> major versions. Extension code
2754        therefore might require source code changes to work with multiple major
2755        versions. These can usually be managed with preprocessor conditions
2756        such as <literal>#if PG_VERSION_NUM &gt;= 160000</literal>.
2757        Sophisticated extensions that use interfaces beyond the well-demarcated
2758        ones usually require a few such changes for each major server version.
2759       </para>
2760      </sect4>
2761
2762      <sect4 id="xfunc-guidance-api-mninor-versions">
2763       <title>Minor Versions</title>
2764       <para>
2765        <productname>PostgreSQL</productname> makes an effort to avoid server
2766        API breaks in minor releases. In general, extension code that compiles
2767        and works with a minor release should also compile and work with any
2768        other minor release of the same major version, past or future.
2769       </para>
2770
2771       <para>
2772        When a change <emphasis>is</emphasis> required, it will be carefully
2773        managed, taking the requirements of extensions into account. Such
2774        changes will be communicated in the release notes (<xref
2775        linkend="release"/>).
2776       </para>
2777      </sect4>
2778     </sect3>
2779
2780     <sect3 id="xfunc-guidance-abi-compatibility">
2781      <title>ABI Compatibility</title>
2782       <para>
2783        The <acronym>ABI</acronym>, or application binary interface, is the
2784        interface used at run time.
2785       </para>
2786
2787      <sect4 id="xfunc-guidance-abi-major-versions">
2788       <title>Major Versions</title>
2789       <para>
2790        Servers of different major versions have intentionally incompatible
2791        ABIs. Extensions that use server APIs must therefore be re-compiled for
2792        each major release. The inclusion of <literal>PG_MODULE_MAGIC</literal>
2793        (see <xref linkend="xfunc-c-dynload"/>) ensures that code compiled for
2794        one major version will be rejected by other major versions.
2795       </para>
2796      </sect4>
2797
2798      <sect4 id="xfunc-guidance-abi-mninor-versions">
2799       <title>Minor Versions</title>
2800       <para>
2801        <productname>PostgreSQL</productname> makes an effort to avoid server
2802        ABI breaks in minor releases. In general, an extension compiled against
2803        any minor release should work with any other minor release of the same
2804        major version, past or future.
2805       </para>
2806
2807       <para>
2808        When a change <emphasis>is</emphasis> required,
2809        <productname>PostgreSQL</productname> will choose the least invasive
2810        change possible, for example by squeezing a new field into padding
2811        space or appending it to the end of a struct. These sorts of changes
2812        should not impact extensions unless they use very unusual code
2813        patterns.
2814       </para>
2815
2816       <para>
2817        In rare cases, however, even such non-invasive changes may be
2818        impractical or impossible. In such an event, the change will be
2819        carefully managed, taking the requirements of extensions into account.
2820        Such changes will also be documented in the release notes (<xref
2821        linkend="release"/>).
2822       </para>
2823
2824       <para>
2825        Note, however, that many parts of the server are not designed or
2826        maintained as publicly-consumable APIs (and that, in most cases, the
2827        actual boundary is also not well-defined). If urgent needs arise,
2828        changes in those parts will naturally be made with less consideration
2829        for extension code than changes in well-defined and widely used
2830        interfaces.
2831       </para>
2832
2833       <para>
2834        Also, in the absence of automated detection of such changes, this is
2835        not a guarantee, but historically such breaking changes have been
2836        extremely rare.
2837       </para>
2838
2839      </sect4>
2840     </sect3>
2841   </sect2>
2842
2843    <sect2 id="xfunc-c-composite-type-args">
2844     <title>Composite-Type Arguments</title>
2845
2846     <para>
2847      Composite types do not have a fixed layout like C structures.
2848      Instances of a composite type can contain null fields.  In
2849      addition, composite types that are part of an inheritance
2850      hierarchy can have different fields than other members of the
2851      same inheritance hierarchy.  Therefore,
2852      <productname>PostgreSQL</productname> provides a function
2853      interface for accessing fields of composite types from C.
2854     </para>
2855
2856     <para>
2857      Suppose we want to write a function to answer the query:
2858
2859 <programlisting>
2860 SELECT name, c_overpaid(emp, 1500) AS overpaid
2861     FROM emp
2862     WHERE name = 'Bill' OR name = 'Sam';
2863 </programlisting>
2864
2865      Using the version-1 calling conventions, we can define
2866      <function>c_overpaid</function> as:
2867
2868 <programlisting><![CDATA[
2869 #include "postgres.h"
2870 #include "executor/executor.h"  /* for GetAttributeByName() */
2871
2872 PG_MODULE_MAGIC;
2873
2874 PG_FUNCTION_INFO_V1(c_overpaid);
2875
2876 Datum
2877 c_overpaid(PG_FUNCTION_ARGS)
2878 {
2879     HeapTupleHeader  t = PG_GETARG_HEAPTUPLEHEADER(0);
2880     int32            limit = PG_GETARG_INT32(1);
2881     bool isnull;
2882     Datum salary;
2883
2884     salary = GetAttributeByName(t, "salary", &isnull);
2885     if (isnull)
2886         PG_RETURN_BOOL(false);
2887     /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
2888
2889     PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
2890 }
2891 ]]>
2892 </programlisting>
2893     </para>
2894
2895     <para>
2896      <function>GetAttributeByName</function> is the
2897      <productname>PostgreSQL</productname> system function that
2898      returns attributes out of the specified row.  It has
2899      three arguments: the argument of type <type>HeapTupleHeader</type> passed
2900      into
2901      the  function, the name of the desired attribute, and a
2902      return parameter that tells whether  the  attribute
2903      is  null.   <function>GetAttributeByName</function> returns a <type>Datum</type>
2904      value that you can convert to the proper data type by using the
2905      appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function>
2906      function.  Note that the return value is meaningless if the null flag is
2907      set; always check the null flag before trying to do anything with the
2908      result.
2909     </para>
2910
2911     <para>
2912      There is also <function>GetAttributeByNum</function>, which selects
2913      the target attribute by column number instead of name.
2914     </para>
2915
2916     <para>
2917      The following command declares the function
2918      <function>c_overpaid</function> in SQL:
2919
2920 <programlisting>
2921 CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
2922     AS '<replaceable>DIRECTORY</replaceable>/funcs', 'c_overpaid'
2923     LANGUAGE C STRICT;
2924 </programlisting>
2925
2926      Notice we have used <literal>STRICT</literal> so that we did not have to
2927      check whether the input arguments were NULL.
2928     </para>
2929    </sect2>
2930
2931    <sect2 id="xfunc-c-returning-rows">
2932     <title>Returning Rows (Composite Types)</title>
2933
2934     <para>
2935      To return a row or composite-type value from a C-language
2936      function, you can use a special API that provides macros and
2937      functions to hide most of the complexity of building composite
2938      data types.  To use this API, the source file must include:
2939 <programlisting>
2940 #include "funcapi.h"
2941 </programlisting>
2942     </para>
2943
2944     <para>
2945      There are two ways you can build a composite data value (henceforth
2946      a <quote>tuple</quote>): you can build it from an array of Datum values,
2947      or from an array of C strings that can be passed to the input
2948      conversion functions of the tuple's column data types.  In either
2949      case, you first need to obtain or construct a <structname>TupleDesc</structname>
2950      descriptor for the tuple structure.  When working with Datums, you
2951      pass the <structname>TupleDesc</structname> to <function>BlessTupleDesc</function>,
2952      and then call <function>heap_form_tuple</function> for each row.  When working
2953      with C strings, you pass the <structname>TupleDesc</structname> to
2954      <function>TupleDescGetAttInMetadata</function>, and then call
2955      <function>BuildTupleFromCStrings</function> for each row.  In the case of a
2956      function returning a set of tuples, the setup steps can all be done
2957      once during the first call of the function.
2958     </para>
2959
2960     <para>
2961      Several helper functions are available for setting up the needed
2962      <structname>TupleDesc</structname>.  The recommended way to do this in most
2963      functions returning composite values is to call:
2964 <programlisting>
2965 TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
2966                                    Oid *resultTypeId,
2967                                    TupleDesc *resultTupleDesc)
2968 </programlisting>
2969      passing the same <literal>fcinfo</literal> struct passed to the calling function
2970      itself.  (This of course requires that you use the version-1
2971      calling conventions.)  <varname>resultTypeId</varname> can be specified
2972      as <literal>NULL</literal> or as the address of a local variable to receive the
2973      function's result type OID.  <varname>resultTupleDesc</varname> should be the
2974      address of a local <structname>TupleDesc</structname> variable.  Check that the
2975      result is <literal>TYPEFUNC_COMPOSITE</literal>; if so,
2976      <varname>resultTupleDesc</varname> has been filled with the needed
2977      <structname>TupleDesc</structname>.  (If it is not, you can report an error along
2978      the lines of <quote>function returning record called in context that
2979      cannot accept type record</quote>.)
2980     </para>
2981
2982     <tip>
2983      <para>
2984       <function>get_call_result_type</function> can resolve the actual type of a
2985       polymorphic function result; so it is useful in functions that return
2986       scalar polymorphic results, not only functions that return composites.
2987       The <varname>resultTypeId</varname> output is primarily useful for functions
2988       returning polymorphic scalars.
2989      </para>
2990     </tip>
2991
2992     <note>
2993      <para>
2994       <function>get_call_result_type</function> has a sibling
2995       <function>get_expr_result_type</function>, which can be used to resolve the
2996       expected output type for a function call represented by an expression
2997       tree.  This can be used when trying to determine the result type from
2998       outside the function itself.  There is also
2999       <function>get_func_result_type</function>, which can be used when only the
3000       function's OID is available.  However these functions are not able
3001       to deal with functions declared to return <structname>record</structname>, and
3002       <function>get_func_result_type</function> cannot resolve polymorphic types,
3003       so you should preferentially use <function>get_call_result_type</function>.
3004      </para>
3005     </note>
3006
3007     <para>
3008      Older, now-deprecated functions for obtaining
3009      <structname>TupleDesc</structname>s are:
3010 <programlisting>
3011 TupleDesc RelationNameGetTupleDesc(const char *relname)
3012 </programlisting>
3013      to get a <structname>TupleDesc</structname> for the row type of a named relation,
3014      and:
3015 <programlisting>
3016 TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
3017 </programlisting>
3018      to get a <structname>TupleDesc</structname> based on a type OID. This can
3019      be used to get a <structname>TupleDesc</structname> for a base or
3020      composite type.  It will not work for a function that returns
3021      <structname>record</structname>, however, and it cannot resolve polymorphic
3022      types.
3023     </para>
3024
3025     <para>
3026      Once you have a <structname>TupleDesc</structname>, call:
3027 <programlisting>
3028 TupleDesc BlessTupleDesc(TupleDesc tupdesc)
3029 </programlisting>
3030      if you plan to work with Datums, or:
3031 <programlisting>
3032 AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
3033 </programlisting>
3034      if you plan to work with C strings.  If you are writing a function
3035      returning set, you can save the results of these functions in the
3036      <structname>FuncCallContext</structname> structure &mdash; use the
3037      <structfield>tuple_desc</structfield> or <structfield>attinmeta</structfield> field
3038      respectively.
3039     </para>
3040
3041     <para>
3042      When working with Datums, use:
3043 <programlisting>
3044 HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)
3045 </programlisting>
3046      to build a <structname>HeapTuple</structname> given user data in Datum form.
3047     </para>
3048
3049     <para>
3050      When working with C strings, use:
3051 <programlisting>
3052 HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
3053 </programlisting>
3054      to build a <structname>HeapTuple</structname> given user data
3055      in C string form.  <parameter>values</parameter> is an array of C strings,
3056      one for each attribute of the return row. Each C string should be in
3057      the form expected by the input function of the attribute data
3058      type. In order to return a null value for one of the attributes,
3059      the corresponding pointer in the <parameter>values</parameter> array
3060      should be set to <symbol>NULL</symbol>.  This function will need to
3061      be called again for each row you return.
3062     </para>
3063
3064     <para>
3065      Once you have built a tuple to return from your function, it
3066      must be converted into a <type>Datum</type>. Use:
3067 <programlisting>
3068 HeapTupleGetDatum(HeapTuple tuple)
3069 </programlisting>
3070      to convert a <structname>HeapTuple</structname> into a valid Datum.  This
3071      <type>Datum</type> can be returned directly if you intend to return
3072      just a single row, or it can be used as the current return value
3073      in a set-returning function.
3074     </para>
3075
3076     <para>
3077      An example appears in the next section.
3078     </para>
3079
3080    </sect2>
3081
3082    <sect2 id="xfunc-c-return-set">
3083     <title>Returning Sets</title>
3084
3085     <para>
3086      C-language functions have two options for returning sets (multiple
3087      rows).  In one method, called <firstterm>ValuePerCall</firstterm>
3088      mode, a set-returning function is called repeatedly (passing the same
3089      arguments each time) and it returns one new row on each call, until
3090      it has no more rows to return and signals that by returning NULL.
3091      The set-returning function (<acronym>SRF</acronym>) must therefore
3092      save enough state across calls to remember what it was doing and
3093      return the correct next item on each call.
3094      In the other method, called <firstterm>Materialize</firstterm> mode,
3095      an SRF fills and returns a tuplestore object containing its
3096      entire result; then only one call occurs for the whole result, and
3097      no inter-call state is needed.
3098     </para>
3099
3100     <para>
3101      When using ValuePerCall mode, it is important to remember that the
3102      query is not guaranteed to be run to completion; that is, due to
3103      options such as <literal>LIMIT</literal>, the executor might stop
3104      making calls to the set-returning function before all rows have been
3105      fetched.  This means it is not safe to perform cleanup activities in
3106      the last call, because that might not ever happen.  It's recommended
3107      to use Materialize mode for functions that need access to external
3108      resources, such as file descriptors.
3109     </para>
3110
3111     <para>
3112      The remainder of this section documents a set of helper macros that
3113      are commonly used (though not required to be used) for SRFs using
3114      ValuePerCall mode.  Additional details about Materialize mode can be
3115      found in <filename>src/backend/utils/fmgr/README</filename>.  Also,
3116      the <filename>contrib</filename> modules in
3117      the <productname>PostgreSQL</productname> source distribution contain
3118      many examples of SRFs using both ValuePerCall and Materialize mode.
3119     </para>
3120
3121     <para>
3122      To use the ValuePerCall support macros described here,
3123      include <filename>funcapi.h</filename>.  These macros work with a
3124      structure <structname>FuncCallContext</structname> that contains the
3125      state that needs to be saved across calls.  Within the calling
3126      SRF, <literal>fcinfo-&gt;flinfo-&gt;fn_extra</literal> is used to
3127      hold a pointer to <structname>FuncCallContext</structname> across
3128      calls.  The macros automatically fill that field on first use,
3129      and expect to find the same pointer there on subsequent uses.
3130 <programlisting>
3131 typedef struct FuncCallContext
3132 {
3133     /*
3134      * Number of times we've been called before
3135      *
3136      * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
3137      * incremented for you every time SRF_RETURN_NEXT() is called.
3138      */
3139     uint64 call_cntr;
3140
3141     /*
3142      * OPTIONAL maximum number of calls
3143      *
3144      * max_calls is here for convenience only and setting it is optional.
3145      * If not set, you must provide alternative means to know when the
3146      * function is done.
3147      */
3148     uint64 max_calls;
3149
3150     /*
3151      * OPTIONAL pointer to miscellaneous user-provided context information
3152      *
3153      * user_fctx is for use as a pointer to your own data to retain
3154      * arbitrary context information between calls of your function.
3155      */
3156     void *user_fctx;
3157
3158     /*
3159      * OPTIONAL pointer to struct containing attribute type input metadata
3160      *
3161      * attinmeta is for use when returning tuples (i.e., composite data types)
3162      * and is not used when returning base data types. It is only needed
3163      * if you intend to use BuildTupleFromCStrings() to create the return
3164      * tuple.
3165      */
3166     AttInMetadata *attinmeta;
3167
3168     /*
3169      * memory context used for structures that must live for multiple calls
3170      *
3171      * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
3172      * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
3173      * context for any memory that is to be reused across multiple calls
3174      * of the SRF.
3175      */
3176     MemoryContext multi_call_memory_ctx;
3177
3178     /*
3179      * OPTIONAL pointer to struct containing tuple description
3180      *
3181      * tuple_desc is for use when returning tuples (i.e., composite data types)
3182      * and is only needed if you are going to build the tuples with
3183      * heap_form_tuple() rather than with BuildTupleFromCStrings().  Note that
3184      * the TupleDesc pointer stored here should usually have been run through
3185      * BlessTupleDesc() first.
3186      */
3187     TupleDesc tuple_desc;
3188
3189 } FuncCallContext;
3190 </programlisting>
3191     </para>
3192
3193     <para>
3194      The macros to be used by an <acronym>SRF</acronym> using this
3195      infrastructure are:
3196 <programlisting>
3197 SRF_IS_FIRSTCALL()
3198 </programlisting>
3199      Use this to determine if your function is being called for the first or a
3200      subsequent time. On the first call (only), call:
3201 <programlisting>
3202 SRF_FIRSTCALL_INIT()
3203 </programlisting>
3204      to initialize the <structname>FuncCallContext</structname>. On every function call,
3205      including the first, call:
3206 <programlisting>
3207 SRF_PERCALL_SETUP()
3208 </programlisting>
3209      to set up for using the <structname>FuncCallContext</structname>.
3210     </para>
3211
3212     <para>
3213      If your function has data to return in the current call, use:
3214 <programlisting>
3215 SRF_RETURN_NEXT(funcctx, result)
3216 </programlisting>
3217      to return it to the caller.  (<literal>result</literal> must be of type
3218      <type>Datum</type>, either a single value or a tuple prepared as
3219      described above.)  Finally, when your function is finished
3220      returning data, use:
3221 <programlisting>
3222 SRF_RETURN_DONE(funcctx)
3223 </programlisting>
3224      to clean up and end the <acronym>SRF</acronym>.
3225     </para>
3226
3227     <para>
3228      The memory context that is current when the <acronym>SRF</acronym> is called is
3229      a transient context that will be cleared between calls.  This means
3230      that you do not need to call <function>pfree</function> on everything
3231      you allocated using <function>palloc</function>; it will go away anyway.  However, if you want to allocate
3232      any data structures to live across calls, you need to put them somewhere
3233      else.  The memory context referenced by
3234      <structfield>multi_call_memory_ctx</structfield> is a suitable location for any
3235      data that needs to survive until the <acronym>SRF</acronym> is finished running.  In most
3236      cases, this means that you should switch into
3237      <structfield>multi_call_memory_ctx</structfield> while doing the
3238      first-call setup.
3239      Use <literal>funcctx-&gt;user_fctx</literal> to hold a pointer to
3240      any such cross-call data structures.
3241      (Data you allocate
3242      in <structfield>multi_call_memory_ctx</structfield> will go away
3243      automatically when the query ends, so it is not necessary to free
3244      that data manually, either.)
3245     </para>
3246
3247     <warning>
3248      <para>
3249       While the actual arguments to the function remain unchanged between
3250       calls, if you detoast the argument values (which is normally done
3251       transparently by the
3252       <function>PG_GETARG_<replaceable>xxx</replaceable></function> macro)
3253       in the transient context then the detoasted copies will be freed on
3254       each cycle. Accordingly, if you keep references to such values in
3255       your <structfield>user_fctx</structfield>, you must either copy them into the
3256       <structfield>multi_call_memory_ctx</structfield> after detoasting, or ensure
3257       that you detoast the values only in that context.
3258      </para>
3259     </warning>
3260
3261     <para>
3262      A complete pseudo-code example looks like the following:
3263 <programlisting>
3264 Datum
3265 my_set_returning_function(PG_FUNCTION_ARGS)
3266 {
3267     FuncCallContext  *funcctx;
3268     Datum             result;
3269     <replaceable>further declarations as needed</replaceable>
3270
3271     if (SRF_IS_FIRSTCALL())
3272     {
3273         MemoryContext oldcontext;
3274
3275         funcctx = SRF_FIRSTCALL_INIT();
3276         oldcontext = MemoryContextSwitchTo(funcctx-&gt;multi_call_memory_ctx);
3277         /* One-time setup code appears here: */
3278         <replaceable>user code</replaceable>
3279         <replaceable>if returning composite</replaceable>
3280             <replaceable>build TupleDesc, and perhaps AttInMetadata</replaceable>
3281         <replaceable>endif returning composite</replaceable>
3282         <replaceable>user code</replaceable>
3283         MemoryContextSwitchTo(oldcontext);
3284     }
3285
3286     /* Each-time setup code appears here: */
3287     <replaceable>user code</replaceable>
3288     funcctx = SRF_PERCALL_SETUP();
3289     <replaceable>user code</replaceable>
3290
3291     /* this is just one way we might test whether we are done: */
3292     if (funcctx-&gt;call_cntr &lt; funcctx-&gt;max_calls)
3293     {
3294         /* Here we want to return another item: */
3295         <replaceable>user code</replaceable>
3296         <replaceable>obtain result Datum</replaceable>
3297         SRF_RETURN_NEXT(funcctx, result);
3298     }
3299     else
3300     {
3301         /* Here we are done returning items, so just report that fact. */
3302         /* (Resist the temptation to put cleanup code here.) */
3303         SRF_RETURN_DONE(funcctx);
3304     }
3305 }
3306 </programlisting>
3307     </para>
3308
3309     <para>
3310      A complete example of a simple <acronym>SRF</acronym> returning a composite type
3311      looks like:
3312 <programlisting><![CDATA[
3313 PG_FUNCTION_INFO_V1(retcomposite);
3314
3315 Datum
3316 retcomposite(PG_FUNCTION_ARGS)
3317 {
3318     FuncCallContext     *funcctx;
3319     int                  call_cntr;
3320     int                  max_calls;
3321     TupleDesc            tupdesc;
3322     AttInMetadata       *attinmeta;
3323
3324     /* stuff done only on the first call of the function */
3325     if (SRF_IS_FIRSTCALL())
3326     {
3327         MemoryContext   oldcontext;
3328
3329         /* create a function context for cross-call persistence */
3330         funcctx = SRF_FIRSTCALL_INIT();
3331
3332         /* switch to memory context appropriate for multiple function calls */
3333         oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
3334
3335         /* total number of tuples to be returned */
3336         funcctx->max_calls = PG_GETARG_INT32(0);
3337
3338         /* Build a tuple descriptor for our result type */
3339         if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
3340             ereport(ERROR,
3341                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
3342                      errmsg("function returning record called in context "
3343                             "that cannot accept type record")));
3344
3345         /*
3346          * generate attribute metadata needed later to produce tuples from raw
3347          * C strings
3348          */
3349         attinmeta = TupleDescGetAttInMetadata(tupdesc);
3350         funcctx->attinmeta = attinmeta;
3351
3352         MemoryContextSwitchTo(oldcontext);
3353     }
3354
3355     /* stuff done on every call of the function */
3356     funcctx = SRF_PERCALL_SETUP();
3357
3358     call_cntr = funcctx->call_cntr;
3359     max_calls = funcctx->max_calls;
3360     attinmeta = funcctx->attinmeta;
3361
3362     if (call_cntr < max_calls)    /* do when there is more left to send */
3363     {
3364         char       **values;
3365         HeapTuple    tuple;
3366         Datum        result;
3367
3368         /*
3369          * Prepare a values array for building the returned tuple.
3370          * This should be an array of C strings which will
3371          * be processed later by the type input functions.
3372          */
3373         values = (char **) palloc(3 * sizeof(char *));
3374         values[0] = (char *) palloc(16 * sizeof(char));
3375         values[1] = (char *) palloc(16 * sizeof(char));
3376         values[2] = (char *) palloc(16 * sizeof(char));
3377
3378         snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
3379         snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
3380         snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));
3381
3382         /* build a tuple */
3383         tuple = BuildTupleFromCStrings(attinmeta, values);
3384
3385         /* make the tuple into a datum */
3386         result = HeapTupleGetDatum(tuple);
3387
3388         /* clean up (this is not really necessary) */
3389         pfree(values[0]);
3390         pfree(values[1]);
3391         pfree(values[2]);
3392         pfree(values);
3393
3394         SRF_RETURN_NEXT(funcctx, result);
3395     }
3396     else    /* do when there is no more left */
3397     {
3398         SRF_RETURN_DONE(funcctx);
3399     }
3400 }
3401 ]]>
3402 </programlisting>
3403
3404      One way to declare this function in SQL is:
3405 <programlisting>
3406 CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);
3407
3408 CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
3409     RETURNS SETOF __retcomposite
3410     AS '<replaceable>filename</replaceable>', 'retcomposite'
3411     LANGUAGE C IMMUTABLE STRICT;
3412 </programlisting>
3413      A different way is to use OUT parameters:
3414 <programlisting>
3415 CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
3416     OUT f1 integer, OUT f2 integer, OUT f3 integer)
3417     RETURNS SETOF record
3418     AS '<replaceable>filename</replaceable>', 'retcomposite'
3419     LANGUAGE C IMMUTABLE STRICT;
3420 </programlisting>
3421      Notice that in this method the output type of the function is formally
3422      an anonymous <structname>record</structname> type.
3423     </para>
3424    </sect2>
3425
3426    <sect2 id="xfunc-c-polymorphic">
3427     <title>Polymorphic Arguments and Return Types</title>
3428
3429     <para>
3430      C-language functions can be declared to accept and
3431      return the polymorphic types described in <xref
3432      linkend="extend-types-polymorphic"/>.
3433      When a function's arguments or return types
3434      are defined as polymorphic types, the function author cannot know
3435      in advance what data type it will be called with, or
3436      need to return. There are two routines provided in <filename>fmgr.h</filename>
3437      to allow a version-1 C function to discover the actual data types
3438      of its arguments and the type it is expected to return. The routines are
3439      called <literal>get_fn_expr_rettype(FmgrInfo *flinfo)</literal> and
3440      <literal>get_fn_expr_argtype(FmgrInfo *flinfo, int argnum)</literal>.
3441      They return the result or argument type OID, or <symbol>InvalidOid</symbol> if the
3442      information is not available.
3443      The structure <literal>flinfo</literal> is normally accessed as
3444      <literal>fcinfo-&gt;flinfo</literal>. The parameter <literal>argnum</literal>
3445      is zero based.  <function>get_call_result_type</function> can also be used
3446      as an alternative to <function>get_fn_expr_rettype</function>.
3447      There is also <function>get_fn_expr_variadic</function>, which can be used to
3448      find out whether variadic arguments have been merged into an array.
3449      This is primarily useful for <literal>VARIADIC "any"</literal> functions,
3450      since such merging will always have occurred for variadic functions
3451      taking ordinary array types.
3452     </para>
3453
3454     <para>
3455      For example, suppose we want to write a function to accept a single
3456      element of any type, and return a one-dimensional array of that type:
3457
3458 <programlisting>
3459 PG_FUNCTION_INFO_V1(make_array);
3460 Datum
3461 make_array(PG_FUNCTION_ARGS)
3462 {
3463     ArrayType  *result;
3464     Oid         element_type = get_fn_expr_argtype(fcinfo-&gt;flinfo, 0);
3465     Datum       element;
3466     bool        isnull;
3467     int16       typlen;
3468     bool        typbyval;
3469     char        typalign;
3470     int         ndims;
3471     int         dims[MAXDIM];
3472     int         lbs[MAXDIM];
3473
3474     if (!OidIsValid(element_type))
3475         elog(ERROR, "could not determine data type of input");
3476
3477     /* get the provided element, being careful in case it's NULL */
3478     isnull = PG_ARGISNULL(0);
3479     if (isnull)
3480         element = (Datum) 0;
3481     else
3482         element = PG_GETARG_DATUM(0);
3483
3484     /* we have one dimension */
3485     ndims = 1;
3486     /* and one element */
3487     dims[0] = 1;
3488     /* and lower bound is 1 */
3489     lbs[0] = 1;
3490
3491     /* get required info about the element type */
3492     get_typlenbyvalalign(element_type, &amp;typlen, &amp;typbyval, &amp;typalign);
3493
3494     /* now build the array */
3495     result = construct_md_array(&amp;element, &amp;isnull, ndims, dims, lbs,
3496                                 element_type, typlen, typbyval, typalign);
3497
3498     PG_RETURN_ARRAYTYPE_P(result);
3499 }
3500 </programlisting>
3501     </para>
3502
3503     <para>
3504      The following command declares the function
3505      <function>make_array</function> in SQL:
3506
3507 <programlisting>
3508 CREATE FUNCTION make_array(anyelement) RETURNS anyarray
3509     AS '<replaceable>DIRECTORY</replaceable>/funcs', 'make_array'
3510     LANGUAGE C IMMUTABLE;
3511 </programlisting>
3512     </para>
3513
3514     <para>
3515      There is a variant of polymorphism that is only available to C-language
3516      functions: they can be declared to take parameters of type
3517      <literal>"any"</literal>.  (Note that this type name must be double-quoted,
3518      since it's also an SQL reserved word.)  This works like
3519      <type>anyelement</type> except that it does not constrain different
3520      <literal>"any"</literal> arguments to be the same type, nor do they help
3521      determine the function's result type.  A C-language function can also
3522      declare its final parameter to be <literal>VARIADIC "any"</literal>.  This will
3523      match one or more actual arguments of any type (not necessarily the same
3524      type).  These arguments will <emphasis>not</emphasis> be gathered into an array
3525      as happens with normal variadic functions; they will just be passed to
3526      the function separately.  The <function>PG_NARGS()</function> macro and the
3527      methods described above must be used to determine the number of actual
3528      arguments and their types when using this feature.  Also, users of such
3529      a function might wish to use the <literal>VARIADIC</literal> keyword in their
3530      function call, with the expectation that the function would treat the
3531      array elements as separate arguments.  The function itself must implement
3532      that behavior if wanted, after using <function>get_fn_expr_variadic</function> to
3533      detect that the actual argument was marked with <literal>VARIADIC</literal>.
3534     </para>
3535    </sect2>
3536
3537    <sect2 id="xfunc-shared-addin">
3538     <title>Shared Memory</title>
3539
3540     <sect3 id="xfunc-shared-addin-at-startup">
3541      <title>Requesting Shared Memory at Startup</title>
3542
3543      <para>
3544       Add-ins can reserve shared memory on server startup.  To do so, the
3545       add-in's shared library must be preloaded by specifying it in
3546       <xref linkend="guc-shared-preload-libraries"/><indexterm><primary>shared_preload_libraries</primary></indexterm>.
3547       The shared library should also register a
3548       <literal>shmem_request_hook</literal> in its
3549       <function>_PG_init</function> function.  This
3550       <literal>shmem_request_hook</literal> can reserve shared memory by
3551       calling:
3552 <programlisting>
3553 void RequestAddinShmemSpace(Size size)
3554 </programlisting>
3555       Each backend should obtain a pointer to the reserved shared memory by
3556       calling:
3557 <programlisting>
3558 void *ShmemInitStruct(const char *name, Size size, bool *foundPtr)
3559 </programlisting>
3560       If this function sets <literal>foundPtr</literal> to
3561       <literal>false</literal>, the caller should proceed to initialize the
3562       contents of the reserved shared memory.  If <literal>foundPtr</literal>
3563       is set to <literal>true</literal>, the shared memory was already
3564       initialized by another backend, and the caller need not initialize
3565       further.
3566      </para>
3567
3568      <para>
3569       To avoid race conditions, each backend should use the LWLock
3570       <function>AddinShmemInitLock</function> when initializing its allocation
3571       of shared memory, as shown here:
3572 <programlisting>
3573 static mystruct *ptr = NULL;
3574 bool        found;
3575
3576 LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
3577 ptr = ShmemInitStruct("my struct name", size, &amp;found);
3578 if (!found)
3579 {
3580     ... initialize contents of shared memory ...
3581     ptr->locks = GetNamedLWLockTranche("my tranche name");
3582 }
3583 LWLockRelease(AddinShmemInitLock);
3584 </programlisting>
3585       <literal>shmem_startup_hook</literal> provides a convenient place for the
3586       initialization code, but it is not strictly required that all such code
3587       be placed in this hook.  Each backend will execute the registered
3588       <literal>shmem_startup_hook</literal> shortly after it attaches to shared
3589       memory.  Note that add-ins should still acquire
3590       <function>AddinShmemInitLock</function> within this hook, as shown in the
3591       example above.
3592      </para>
3593
3594      <para>
3595       An example of a <literal>shmem_request_hook</literal> and
3596       <literal>shmem_startup_hook</literal> can be found in
3597       <filename>contrib/pg_stat_statements/pg_stat_statements.c</filename> in
3598       the <productname>PostgreSQL</productname> source tree.
3599      </para>
3600     </sect3>
3601
3602     <sect3 id="xfunc-shared-addin-after-startup">
3603      <title>Requesting Shared Memory After Startup</title>
3604
3605      <para>
3606       There is another, more flexible method of reserving shared memory that
3607       can be done after server startup and outside a
3608       <literal>shmem_request_hook</literal>.  To do so, each backend that will
3609       use the shared memory should obtain a pointer to it by calling:
3610 <programlisting>
3611 void *GetNamedDSMSegment(const char *name, size_t size,
3612                          void (*init_callback) (void *ptr),
3613                          bool *found)
3614 </programlisting>
3615       If a dynamic shared memory segment with the given name does not yet
3616       exist, this function will allocate it and initialize it with the provided
3617       <function>init_callback</function> callback function.  If the segment has
3618       already been allocated and initialized by another backend, this function
3619       simply attaches the existing dynamic shared memory segment to the current
3620       backend.
3621      </para>
3622
3623      <para>
3624       Unlike shared memory reserved at server startup, there is no need to
3625       acquire <function>AddinShmemInitLock</function> or otherwise take action
3626       to avoid race conditions when reserving shared memory with
3627       <function>GetNamedDSMSegment</function>.  This function ensures that only
3628       one backend allocates and initializes the segment and that all other
3629       backends receive a pointer to the fully allocated and initialized
3630       segment.
3631      </para>
3632
3633      <para>
3634       A complete usage example of <function>GetNamedDSMSegment</function> can
3635       be found in
3636       <filename>src/test/modules/test_dsm_registry/test_dsm_registry.c</filename>
3637       in the <productname>PostgreSQL</productname> source tree.
3638      </para>
3639     </sect3>
3640    </sect2>
3641
3642    <sect2 id="xfunc-addin-lwlocks">
3643     <title>LWLocks</title>
3644
3645     <sect3 id="xfunc-addin-lwlocks-at-startup">
3646      <title>Requesting LWLocks at Startup</title>
3647
3648      <para>
3649       Add-ins can reserve LWLocks on server startup.  As with shared memory
3650       reserved at server startup, the add-in's shared library must be preloaded
3651       by specifying it in
3652       <xref linkend="guc-shared-preload-libraries"/><indexterm><primary>shared_preload_libraries</primary></indexterm>,
3653       and the shared library should register a
3654       <literal>shmem_request_hook</literal> in its
3655       <function>_PG_init</function> function.  This
3656       <literal>shmem_request_hook</literal> can reserve LWLocks by calling:
3657 <programlisting>
3658 void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
3659 </programlisting>
3660       This ensures that an array of <literal>num_lwlocks</literal> LWLocks is
3661       available under the name <literal>tranche_name</literal>.  A pointer to
3662       this array can be obtained by calling:
3663 <programlisting>
3664 LWLockPadded *GetNamedLWLockTranche(const char *tranche_name)
3665 </programlisting>
3666      </para>
3667     </sect3>
3668
3669     <sect3 id="xfunc-addin-lwlocks-after-startup">
3670      <title>Requesting LWLocks After Startup</title>
3671
3672      <para>
3673       There is another, more flexible method of obtaining LWLocks that can be
3674       done after server startup and outside a
3675       <literal>shmem_request_hook</literal>.  To do so, first allocate a
3676       <literal>tranche_id</literal> by calling:
3677 <programlisting>
3678 int LWLockNewTrancheId(void)
3679 </programlisting>
3680       Next, initialize each LWLock, passing the new
3681       <literal>tranche_id</literal> as an argument:
3682 <programlisting>
3683 void LWLockInitialize(LWLock *lock, int tranche_id)
3684 </programlisting>
3685       Similar to shared memory, each backend should ensure that only one
3686       process allocates a new <literal>tranche_id</literal> and initializes
3687       each new LWLock.  One way to do this is to only call these functions in
3688       your shared memory initialization code with the
3689       <function>AddinShmemInitLock</function> held exclusively.  If using
3690       <function>GetNamedDSMSegment</function>, calling these functions in the
3691       <function>init_callback</function> callback function is sufficient to
3692       avoid race conditions.
3693      </para>
3694
3695      <para>
3696       Finally, each backend using the <literal>tranche_id</literal> should
3697       associate it with a <literal>tranche_name</literal> by calling:
3698 <programlisting>
3699 void LWLockRegisterTranche(int tranche_id, const char *tranche_name)
3700 </programlisting>
3701      </para>
3702
3703      <para>
3704       A complete usage example of <function>LWLockNewTrancheId</function>,
3705       <function>LWLockInitialize</function>, and
3706       <function>LWLockRegisterTranche</function> can be found in
3707       <filename>contrib/pg_prewarm/autoprewarm.c</filename> in the
3708       <productname>PostgreSQL</productname> source tree.
3709      </para>
3710     </sect3>
3711    </sect2>
3712
3713    <sect2 id="xfunc-addin-wait-events">
3714     <title>Custom Wait Events</title>
3715
3716     <para>
3717      Add-ins can define custom wait events under the wait event type
3718      <literal>Extension</literal> by calling:
3719 <programlisting>
3720 uint32 WaitEventExtensionNew(const char *wait_event_name)
3721 </programlisting>
3722      The wait event is associated to a user-facing custom string.
3723      An example can be found in <filename>src/test/modules/worker_spi</filename>
3724      in the PostgreSQL source tree.
3725     </para>
3726     <para>
3727      Custom wait events can be viewed in
3728      <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>:
3729 <screen>
3730 =# SELECT wait_event_type, wait_event FROM pg_stat_activity
3731      WHERE backend_type ~ 'worker_spi';
3732  wait_event_type |  wait_event
3733 -----------------+---------------
3734  Extension       | WorkerSpiMain
3735 (1 row)
3736 </screen>
3737     </para>
3738    </sect2>
3739
3740    <sect2 id="xfunc-addin-injection-points">
3741     <title>Injection Points</title>
3742
3743     <para>
3744      An injection point with a given <literal>name</literal> is declared using
3745      macro:
3746 <programlisting>
3747 INJECTION_POINT(name);
3748 </programlisting>
3749
3750      There are a few injection points already declared at strategic points
3751      within the server code. After adding a new injection point the code needs
3752      to be compiled in order for that injection point to be available in the
3753      binary. Add-ins written in C-language can declare injection points in
3754      their own code using the same macro.
3755     </para>
3756
3757     <para>
3758      Executing an injection point can require allocating a small amount of
3759      memory, which can fail. If you need to have an injection point in a
3760      critical section where dynamic allocations are not allowed, you can use
3761      a two-step approach with the following macros:
3762 <programlisting>
3763 INJECTION_POINT_LOAD(name);
3764 INJECTION_POINT_CACHED(name);
3765 </programlisting>
3766
3767      Before entering the critical section,
3768      call <function>INJECTION_POINT_LOAD</function>. It checks the shared
3769      memory state, and loads the callback into backend-private memory if it is
3770      active. Inside the critical section, use
3771      <function>INJECTION_POINT_CACHED</function> to execute the callback.
3772     </para>
3773
3774     <para>
3775      Add-ins can attach callbacks to an already-declared injection point by
3776      calling:
3777 <programlisting>
3778 extern void InjectionPointAttach(const char *name,
3779                                  const char *library,
3780                                  const char *function,
3781                                  const void *private_data,
3782                                  int private_data_size);
3783 </programlisting>
3784
3785      <literal>name</literal> is the name of the injection point, which when
3786      reached during execution will execute the <literal>function</literal>
3787      loaded from <literal>library</literal>. <literal>private_data</literal>
3788      is a private area of data of size <literal>private_data_size</literal>
3789      given as argument to the callback when executed.
3790     </para>
3791
3792     <para>
3793      Here is an example of callback for
3794      <literal>InjectionPointCallback</literal>:
3795 <programlisting>
3796 static void
3797 custom_injection_callback(const char *name, const void *private_data)
3798 {
3799     uint32 wait_event_info = WaitEventInjectionPointNew(name);
3800
3801     pgstat_report_wait_start(wait_event_info);
3802     elog(NOTICE, "%s: executed custom callback", name);
3803     pgstat_report_wait_end();
3804 }
3805 </programlisting>
3806      This callback prints a message to server error log with severity
3807      <literal>NOTICE</literal>, but callbacks may implement more complex
3808      logic.
3809     </para>
3810
3811     <para>
3812      An alternative way to define the action to take when an injection point
3813      is reached is to add the testing code alongside the normal source
3814      code. This can be useful if the action e.g. depends on local variables
3815      that are not accessible to loaded modules. The
3816      <function>IS_INJECTION_POINT_ATTACHED</function> macro can then be used
3817      to check if an injection point is attached, for example:
3818 <programlisting>
3819 #ifdef USE_INJECTION_POINTS
3820 if (IS_INJECTION_POINT_ATTACHED("before-foobar"))
3821 {
3822     /* change a local variable if injection point is attached */
3823     local_var = 123;
3824
3825     /* also execute the callback */
3826     INJECTION_POINT_CACHED("before-foobar");
3827 }
3828 #endif
3829 </programlisting>
3830      Note that the callback attached to the injection point will not be
3831      executed by the <function>IS_INJECTION_POINT_ATTACHED</function>
3832      macro. If you want to execute the callback, you must also call
3833      <function>INJECTION_POINT_CACHED</function> like in the above example.
3834     </para>
3835
3836     <para>
3837      Optionally, it is possible to detach an injection point by calling:
3838 <programlisting>
3839 extern bool InjectionPointDetach(const char *name);
3840 </programlisting>
3841      On success, <literal>true</literal> is returned, <literal>false</literal>
3842      otherwise.
3843     </para>
3844
3845     <para>
3846      A callback attached to an injection point is available across all the
3847      backends including the backends started after
3848      <literal>InjectionPointAttach</literal> is called. It remains attached
3849      while the server is running or until the injection point is detached
3850      using <literal>InjectionPointDetach</literal>.
3851     </para>
3852
3853     <para>
3854      An example can be found in
3855      <filename>src/test/modules/injection_points</filename> in the PostgreSQL
3856      source tree.
3857     </para>
3858
3859     <para>
3860      Enabling injections points requires
3861      <option>--enable-injection-points</option> with
3862      <command>configure</command> or <option>-Dinjection_points=true</option>
3863      with <application>Meson</application>.
3864     </para>
3865    </sect2>
3866
3867    <sect2 id="xfunc-addin-custom-cumulative-statistics">
3868     <title>Custom Cumulative Statistics</title>
3869
3870     <para>
3871      It is possible for add-ins written in C-language to use custom types
3872      of cumulative statistics registered in the
3873      <link linkend="monitoring-stats-setup">Cumulative Statistics System</link>.
3874     </para>
3875
3876     <para>
3877      First, define a <literal>PgStat_KindInfo</literal> that includes all
3878      the information related to the custom type registered. For example:
3879 <programlisting>
3880 static const PgStat_KindInfo custom_stats = {
3881     .name = "custom_stats",
3882     .fixed_amount = false,
3883     .shared_size = sizeof(PgStatShared_Custom),
3884     .shared_data_off = offsetof(PgStatShared_Custom, stats),
3885     .shared_data_len = sizeof(((PgStatShared_Custom *) 0)->stats),
3886     .pending_size = sizeof(PgStat_StatCustomEntry),
3887 }
3888 </programlisting>
3889
3890      Then, each backend that needs to use this custom type needs to register
3891      it with <literal>pgstat_register_kind</literal> and a unique ID used to
3892      store the entries related to this type of statistics:
3893 <programlisting>
3894 extern PgStat_Kind pgstat_register_kind(PgStat_Kind kind,
3895                                         const PgStat_KindInfo *kind_info);
3896 </programlisting>
3897      While developing a new extension, use
3898      <literal>PGSTAT_KIND_EXPERIMENTAL</literal> for
3899      <parameter>kind</parameter>. When you are ready to release the extension
3900      to users, reserve a kind ID at the
3901      <ulink url="https://wiki.postgresql.org/wiki/CustomCumulativeStats">
3902      Custom Cumulative Statistics</ulink> page.
3903     </para>
3904
3905     <para>
3906      The details of the API for <literal>PgStat_KindInfo</literal> can
3907      be found in <filename>src/include/utils/pgstat_internal.h</filename>.
3908     </para>
3909
3910     <para>
3911      The type of statistics registered is associated with a name and a unique
3912      ID shared across the server in shared memory. Each backend using a
3913      custom type of statistics maintains a local cache storing the information
3914      of each custom <literal>PgStat_KindInfo</literal>.
3915     </para>
3916
3917     <para>
3918      Place the extension module implementing the custom cumulative statistics
3919      type in <xref linkend="guc-shared-preload-libraries"/> so that it will
3920      be loaded early during <productname>PostgreSQL</productname> startup.
3921     </para>
3922
3923     <para>
3924      An example describing how to register and use custom statistics can be
3925      found in <filename>src/test/modules/injection_points</filename>.
3926     </para>
3927    </sect2>
3928
3929    <sect2 id="extend-cpp">
3930     <title>Using C++ for Extensibility</title>
3931
3932     <indexterm zone="extend-cpp">
3933      <primary>C++</primary>
3934     </indexterm>
3935
3936     <para>
3937      Although the <productname>PostgreSQL</productname> backend is written in
3938      C, it is possible to write extensions in C++ if these guidelines are
3939      followed:
3940
3941      <itemizedlist>
3942       <listitem>
3943        <para>
3944          All functions accessed by the backend must present a C interface
3945          to the backend;  these C functions can then call C++ functions.
3946          For example, <literal>extern C</literal> linkage is required for
3947          backend-accessed functions.  This is also necessary for any
3948          functions that are passed as pointers between the backend and
3949          C++ code.
3950        </para>
3951       </listitem>
3952       <listitem>
3953        <para>
3954         Free memory using the appropriate deallocation method.  For example,
3955         most backend memory is allocated using <function>palloc()</function>, so use
3956         <function>pfree()</function> to free it.  Using C++
3957         <function>delete</function> in such cases will fail.
3958        </para>
3959       </listitem>
3960       <listitem>
3961        <para>
3962         Prevent exceptions from propagating into the C code (use a catch-all
3963         block at the top level of all <literal>extern C</literal> functions).  This
3964         is necessary even if the C++ code does not explicitly throw any
3965         exceptions, because events like out-of-memory can still throw
3966         exceptions.  Any exceptions must be caught and appropriate errors
3967         passed back to the C interface.  If possible, compile C++ with
3968         <option>-fno-exceptions</option> to eliminate exceptions entirely; in such
3969         cases, you must check for failures in your C++ code, e.g.,  check for
3970         NULL returned by <function>new()</function>.
3971        </para>
3972       </listitem>
3973       <listitem>
3974        <para>
3975         If calling backend functions from C++ code, be sure that the
3976         C++ call stack contains only plain old data structures
3977         (<acronym>POD</acronym>).  This is necessary because backend errors
3978         generate a distant <function>longjmp()</function> that does not properly
3979         unroll a C++ call stack with non-POD objects.
3980        </para>
3981       </listitem>
3982      </itemizedlist>
3983     </para>
3984
3985     <para>
3986      In summary, it is best to place C++ code behind a wall of
3987      <literal>extern C</literal> functions that interface to the backend,
3988      and avoid exception, memory, and call stack leakage.
3989     </para>
3990    </sect2>
3991
3992   </sect1>
3993
3994   <sect1 id="xfunc-optimization">
3995    <title>Function Optimization Information</title>
3996
3997   <indexterm zone="xfunc-optimization">
3998    <primary>optimization information</primary>
3999    <secondary>for functions</secondary>
4000   </indexterm>
4001
4002    <para>
4003     By default, a function is just a <quote>black box</quote> that the
4004     database system knows very little about the behavior of.  However,
4005     that means that queries using the function may be executed much less
4006     efficiently than they could be.  It is possible to supply additional
4007     knowledge that helps the planner optimize function calls.
4008    </para>
4009
4010    <para>
4011     Some basic facts can be supplied by declarative annotations provided in
4012     the <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link> command.  Most important of
4013     these is the function's <link linkend="xfunc-volatility">volatility
4014     category</link> (<literal>IMMUTABLE</literal>, <literal>STABLE</literal>,
4015     or <literal>VOLATILE</literal>); one should always be careful to
4016     specify this correctly when defining a function.
4017     The parallel safety property (<literal>PARALLEL
4018     UNSAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or
4019     <literal>PARALLEL SAFE</literal>) must also be specified if you hope
4020     to use the function in parallelized queries.
4021     It can also be useful to specify the function's estimated execution
4022     cost, and/or the number of rows a set-returning function is estimated
4023     to return.  However, the declarative way of specifying those two
4024     facts only allows specifying a constant value, which is often
4025     inadequate.
4026    </para>
4027
4028    <para>
4029     It is also possible to attach a <firstterm>planner support
4030     function</firstterm> to an SQL-callable function (called
4031     its <firstterm>target function</firstterm>), and thereby provide
4032     knowledge about the target function that is too complex to be
4033     represented declaratively.  Planner support functions have to be
4034     written in C (although their target functions might not be), so this is
4035     an advanced feature that relatively few people will use.
4036    </para>
4037
4038    <para>
4039     A planner support function must have the SQL signature
4040 <programlisting>
4041 supportfn(internal) returns internal
4042 </programlisting>
4043     It is attached to its target function by specifying
4044     the <literal>SUPPORT</literal> clause when creating the target function.
4045    </para>
4046
4047    <para>
4048     The details of the API for planner support functions can be found in
4049     file <filename>src/include/nodes/supportnodes.h</filename> in the
4050     <productname>PostgreSQL</productname> source code.  Here we provide
4051     just an overview of what planner support functions can do.
4052     The set of possible requests to a support function is extensible,
4053     so more things might be possible in future versions.
4054    </para>
4055
4056    <para>
4057     Some function calls can be simplified during planning based on
4058     properties specific to the function.  For example,
4059     <literal>int4mul(n, 1)</literal> could be simplified to
4060     just <literal>n</literal>.  This type of transformation can be
4061     performed by a planner support function, by having it implement
4062     the <literal>SupportRequestSimplify</literal> request type.
4063     The support function will be called for each instance of its target
4064     function found in a query parse tree.  If it finds that the particular
4065     call can be simplified into some other form, it can build and return a
4066     parse tree representing that expression.  This will automatically work
4067     for operators based on the function, too &mdash; in the example just
4068     given, <literal>n * 1</literal> would also be simplified to
4069     <literal>n</literal>.
4070     (But note that this is just an example; this particular
4071     optimization is not actually performed by
4072     standard <productname>PostgreSQL</productname>.)
4073     We make no guarantee that <productname>PostgreSQL</productname> will
4074     never call the target function in cases that the support function could
4075     simplify.  Ensure rigorous equivalence between the simplified
4076     expression and an actual execution of the target function.
4077    </para>
4078
4079    <para>
4080     For target functions that return <type>boolean</type>, it is often useful to estimate
4081     the fraction of rows that will be selected by a <literal>WHERE</literal> clause using that
4082     function.  This can be done by a support function that implements
4083     the <literal>SupportRequestSelectivity</literal> request type.
4084    </para>
4085
4086    <para>
4087     If the target function's run time is highly dependent on its inputs,
4088     it may be useful to provide a non-constant cost estimate for it.
4089     This can be done by a support function that implements
4090     the <literal>SupportRequestCost</literal> request type.
4091    </para>
4092
4093    <para>
4094     For target functions that return sets, it is often useful to provide
4095     a non-constant estimate for the number of rows that will be returned.
4096     This can be done by a support function that implements
4097     the <literal>SupportRequestRows</literal> request type.
4098    </para>
4099
4100    <para>
4101     For target functions that return <type>boolean</type>, it may be possible to
4102     convert a function call appearing in <literal>WHERE</literal> into an indexable operator
4103     clause or clauses.  The converted clauses might be exactly equivalent
4104     to the function's condition, or they could be somewhat weaker (that is,
4105     they might accept some values that the function condition does not).
4106     In the latter case the index condition is said to
4107     be <firstterm>lossy</firstterm>; it can still be used to scan an index,
4108     but the function call will have to be executed for each row returned by
4109     the index to see if it really passes the <literal>WHERE</literal> condition or not.
4110     To create such conditions, the support function must implement
4111     the <literal>SupportRequestIndexCondition</literal> request type.
4112    </para>
4113   </sect1>