Remove old RULE privilege completely.
[pgsql.git] / doc / src / sgml / xfunc.sgml
blobaf7864a1b5b2ae97426a67682fd7b7b4630fb7f7
1 <!-- doc/src/sgml/xfunc.sgml -->
3 <sect1 id="xfunc">
4 <title>User-Defined Functions</title>
6 <indexterm zone="xfunc">
7 <primary>function</primary>
8 <secondary>user-defined</secondary>
9 </indexterm>
11 <para>
12 <productname>PostgreSQL</productname> provides four kinds of
13 functions:
15 <itemizedlist>
16 <listitem>
17 <para>
18 query language functions (functions written in
19 <acronym>SQL</acronym>) (<xref linkend="xfunc-sql"/>)
20 </para>
21 </listitem>
22 <listitem>
23 <para>
24 procedural language functions (functions written in, for
25 example, <application>PL/pgSQL</application> or <application>PL/Tcl</application>)
26 (<xref linkend="xfunc-pl"/>)
27 </para>
28 </listitem>
29 <listitem>
30 <para>
31 internal functions (<xref linkend="xfunc-internal"/>)
32 </para>
33 </listitem>
34 <listitem>
35 <para>
36 C-language functions (<xref linkend="xfunc-c"/>)
37 </para>
38 </listitem>
39 </itemizedlist>
40 </para>
42 <para>
43 Every kind
44 of function can take base types, composite types, or
45 combinations of these as arguments (parameters). In addition,
46 every kind of function can return a base type or
47 a composite type. Functions can also be defined to return
48 sets of base or composite values.
49 </para>
51 <para>
52 Many kinds of functions can take or return certain pseudo-types
53 (such as polymorphic types), but the available facilities vary.
54 Consult the description of each kind of function for more details.
55 </para>
57 <para>
58 It's easiest to define <acronym>SQL</acronym>
59 functions, so we'll start by discussing those.
60 Most of the concepts presented for <acronym>SQL</acronym> functions
61 will carry over to the other types of functions.
62 </para>
64 <para>
65 Throughout this chapter, it can be useful to look at the reference
66 page of the <link linkend="sql-createfunction"><command>CREATE
67 FUNCTION</command></link> command to
68 understand the examples better. Some examples from this chapter
69 can be found in <filename>funcs.sql</filename> and
70 <filename>funcs.c</filename> in the <filename>src/tutorial</filename>
71 directory in the <productname>PostgreSQL</productname> source
72 distribution.
73 </para>
74 </sect1>
76 <sect1 id="xproc">
77 <title>User-Defined Procedures</title>
79 <indexterm zone="xproc">
80 <primary>procedure</primary>
81 <secondary>user-defined</secondary>
82 </indexterm>
84 <para>
85 A procedure is a database object similar to a function.
86 The key differences are:
88 <itemizedlist>
89 <listitem>
90 <para>
91 Procedures are defined with
92 the <link linkend="sql-createprocedure"><command>CREATE
93 PROCEDURE</command></link> command, not <command>CREATE
94 FUNCTION</command>.
95 </para>
96 </listitem>
97 <listitem>
98 <para>
99 Procedures do not return a function value; hence <command>CREATE
100 PROCEDURE</command> lacks a <literal>RETURNS</literal> clause.
101 However, procedures can instead return data to their callers via
102 output parameters.
103 </para>
104 </listitem>
105 <listitem>
106 <para>
107 While a function is called as part of a query or DML command, a
108 procedure is called in isolation using
109 the <link linkend="sql-call"><command>CALL</command></link> command.
110 </para>
111 </listitem>
112 <listitem>
113 <para>
114 A procedure can commit or roll back transactions during its
115 execution (then automatically beginning a new transaction), so long
116 as the invoking <command>CALL</command> command is not part of an
117 explicit transaction block. A function cannot do that.
118 </para>
119 </listitem>
120 <listitem>
121 <para>
122 Certain function attributes, such as strictness, don't apply to
123 procedures. Those attributes control how the function is
124 used in a query, which isn't relevant to procedures.
125 </para>
126 </listitem>
127 </itemizedlist>
128 </para>
130 <para>
131 The explanations in the following sections about how to define
132 user-defined functions apply to procedures as well, except for the
133 points made above.
134 </para>
136 <para>
137 Collectively, functions and procedures are also known
138 as <firstterm>routines</firstterm><indexterm><primary>routine</primary></indexterm>.
139 There are commands such as <link linkend="sql-alterroutine"><command>ALTER ROUTINE</command></link>
140 and <link linkend="sql-droproutine"><command>DROP ROUTINE</command></link> that can operate on functions and
141 procedures without having to know which kind it is. Note, however, that
142 there is no <literal>CREATE ROUTINE</literal> command.
143 </para>
144 </sect1>
146 <sect1 id="xfunc-sql">
147 <title>Query Language (<acronym>SQL</acronym>) Functions</title>
149 <indexterm zone="xfunc-sql">
150 <primary>function</primary>
151 <secondary>user-defined</secondary>
152 <tertiary>in SQL</tertiary>
153 </indexterm>
155 <para>
156 SQL functions execute an arbitrary list of SQL statements, returning
157 the result of the last query in the list.
158 In the simple (non-set)
159 case, the first row of the last query's result will be returned.
160 (Bear in mind that <quote>the first row</quote> of a multirow
161 result is not well-defined unless you use <literal>ORDER BY</literal>.)
162 If the last query happens
163 to return no rows at all, the null value will be returned.
164 </para>
166 <para>
167 Alternatively, an SQL function can be declared to return a set (that is,
168 multiple rows) by specifying the function's return type as <literal>SETOF
169 <replaceable>sometype</replaceable></literal>, or equivalently by declaring it as
170 <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>. In this case
171 all rows of the last query's result are returned. Further details appear
172 below.
173 </para>
175 <para>
176 The body of an SQL function must be a list of SQL
177 statements separated by semicolons. A semicolon after the last
178 statement is optional. Unless the function is declared to return
179 <type>void</type>, the last statement must be a <command>SELECT</command>,
180 or an <command>INSERT</command>, <command>UPDATE</command>,
181 <command>DELETE</command>, or <command>MERGE</command>
182 that has a <literal>RETURNING</literal> clause.
183 </para>
185 <para>
186 Any collection of commands in the <acronym>SQL</acronym>
187 language can be packaged together and defined as a function.
188 Besides <command>SELECT</command> queries, the commands can include data
189 modification queries (<command>INSERT</command>,
190 <command>UPDATE</command>, <command>DELETE</command>, and
191 <command>MERGE</command>), as well as
192 other SQL commands. (You cannot use transaction control commands, e.g.,
193 <command>COMMIT</command>, <command>SAVEPOINT</command>, and some utility
194 commands, e.g., <literal>VACUUM</literal>, in <acronym>SQL</acronym> functions.)
195 However, the final command
196 must be a <command>SELECT</command> or have a <literal>RETURNING</literal>
197 clause that returns whatever is
198 specified as the function's return type. Alternatively, if you
199 want to define an SQL function that performs actions but has no
200 useful value to return, you can define it as returning <type>void</type>.
201 For example, this function removes rows with negative salaries from
202 the <literal>emp</literal> table:
204 <screen>
205 CREATE FUNCTION clean_emp() RETURNS void AS '
206 DELETE FROM emp
207 WHERE salary &lt; 0;
208 ' LANGUAGE SQL;
210 SELECT clean_emp();
212 clean_emp
213 -----------
215 (1 row)
216 </screen>
217 </para>
219 <para>
220 You can also write this as a procedure, thus avoiding the issue of the
221 return type. For example:
222 <screen>
223 CREATE PROCEDURE clean_emp() AS '
224 DELETE FROM emp
225 WHERE salary &lt; 0;
226 ' LANGUAGE SQL;
228 CALL clean_emp();
229 </screen>
230 In simple cases like this, the difference between a function returning
231 <type>void</type> and a procedure is mostly stylistic. However,
232 procedures offer additional functionality such as transaction control
233 that is not available in functions. Also, procedures are SQL standard
234 whereas returning <type>void</type> is a PostgreSQL extension.
235 </para>
237 <note>
238 <para>
239 The entire body of an SQL function is parsed before any of it is
240 executed. While an SQL function can contain commands that alter
241 the system catalogs (e.g., <command>CREATE TABLE</command>), the effects
242 of such commands will not be visible during parse analysis of
243 later commands in the function. Thus, for example,
244 <literal>CREATE TABLE foo (...); INSERT INTO foo VALUES(...);</literal>
245 will not work as desired if packaged up into a single SQL function,
246 since <structname>foo</structname> won't exist yet when the <command>INSERT</command>
247 command is parsed. It's recommended to use <application>PL/pgSQL</application>
248 instead of an SQL function in this type of situation.
249 </para>
250 </note>
252 <para>
253 The syntax of the <command>CREATE FUNCTION</command> command requires
254 the function body to be written as a string constant. It is usually
255 most convenient to use dollar quoting (see <xref
256 linkend="sql-syntax-dollar-quoting"/>) for the string constant.
257 If you choose to use regular single-quoted string constant syntax,
258 you must double single quote marks (<literal>'</literal>) and backslashes
259 (<literal>\</literal>) (assuming escape string syntax) in the body of
260 the function (see <xref linkend="sql-syntax-strings"/>).
261 </para>
263 <sect2 id="xfunc-sql-function-arguments">
264 <title>Arguments for <acronym>SQL</acronym> Functions</title>
266 <indexterm>
267 <primary>function</primary>
268 <secondary>named argument</secondary>
269 </indexterm>
271 <para>
272 Arguments of an SQL function can be referenced in the function
273 body using either names or numbers. Examples of both methods appear
274 below.
275 </para>
277 <para>
278 To use a name, declare the function argument as having a name, and
279 then just write that name in the function body. If the argument name
280 is the same as any column name in the current SQL command within the
281 function, the column name will take precedence. To override this,
282 qualify the argument name with the name of the function itself, that is
283 <literal><replaceable>function_name</replaceable>.<replaceable>argument_name</replaceable></literal>.
284 (If this would conflict with a qualified column name, again the column
285 name wins. You can avoid the ambiguity by choosing a different alias for
286 the table within the SQL command.)
287 </para>
289 <para>
290 In the older numeric approach, arguments are referenced using the syntax
291 <literal>$<replaceable>n</replaceable></literal>: <literal>$1</literal> refers to the first input
292 argument, <literal>$2</literal> to the second, and so on. This will work
293 whether or not the particular argument was declared with a name.
294 </para>
296 <para>
297 If an argument is of a composite type, then the dot notation,
298 e.g., <literal><replaceable>argname</replaceable>.<replaceable>fieldname</replaceable></literal> or
299 <literal>$1.<replaceable>fieldname</replaceable></literal>, can be used to access attributes of the
300 argument. Again, you might need to qualify the argument's name with the
301 function name to make the form with an argument name unambiguous.
302 </para>
304 <para>
305 SQL function arguments can only be used as data values,
306 not as identifiers. Thus for example this is reasonable:
307 <programlisting>
308 INSERT INTO mytable VALUES ($1);
309 </programlisting>
310 but this will not work:
311 <programlisting>
312 INSERT INTO $1 VALUES (42);
313 </programlisting>
314 </para>
316 <note>
317 <para>
318 The ability to use names to reference SQL function arguments was added
319 in <productname>PostgreSQL</productname> 9.2. Functions to be used in
320 older servers must use the <literal>$<replaceable>n</replaceable></literal> notation.
321 </para>
322 </note>
323 </sect2>
325 <sect2 id="xfunc-sql-base-functions">
326 <title><acronym>SQL</acronym> Functions on Base Types</title>
328 <para>
329 The simplest possible <acronym>SQL</acronym> function has no arguments and
330 simply returns a base type, such as <type>integer</type>:
332 <screen>
333 CREATE FUNCTION one() RETURNS integer AS $$
334 SELECT 1 AS result;
335 $$ LANGUAGE SQL;
337 -- Alternative syntax for string literal:
338 CREATE FUNCTION one() RETURNS integer AS '
339 SELECT 1 AS result;
340 ' LANGUAGE SQL;
342 SELECT one();
345 -----
347 </screen>
348 </para>
350 <para>
351 Notice that we defined a column alias within the function body for the result of the function
352 (with the name <literal>result</literal>), but this column alias is not visible
353 outside the function. Hence, the result is labeled <literal>one</literal>
354 instead of <literal>result</literal>.
355 </para>
357 <para>
358 It is almost as easy to define <acronym>SQL</acronym> functions
359 that take base types as arguments:
361 <screen>
362 CREATE FUNCTION add_em(x integer, y integer) RETURNS integer AS $$
363 SELECT x + y;
364 $$ LANGUAGE SQL;
366 SELECT add_em(1, 2) AS answer;
368 answer
369 --------
371 </screen>
372 </para>
374 <para>
375 Alternatively, we could dispense with names for the arguments and
376 use numbers:
378 <screen>
379 CREATE FUNCTION add_em(integer, integer) RETURNS integer AS $$
380 SELECT $1 + $2;
381 $$ LANGUAGE SQL;
383 SELECT add_em(1, 2) AS answer;
385 answer
386 --------
388 </screen>
389 </para>
391 <para>
392 Here is a more useful function, which might be used to debit a
393 bank account:
395 <programlisting>
396 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
397 UPDATE bank
398 SET balance = balance - debit
399 WHERE accountno = tf1.accountno;
400 SELECT 1;
401 $$ LANGUAGE SQL;
402 </programlisting>
404 A user could execute this function to debit account 17 by $100.00 as
405 follows:
407 <programlisting>
408 SELECT tf1(17, 100.0);
409 </programlisting>
410 </para>
412 <para>
413 In this example, we chose the name <literal>accountno</literal> for the first
414 argument, but this is the same as the name of a column in the
415 <literal>bank</literal> table. Within the <command>UPDATE</command> command,
416 <literal>accountno</literal> refers to the column <literal>bank.accountno</literal>,
417 so <literal>tf1.accountno</literal> must be used to refer to the argument.
418 We could of course avoid this by using a different name for the argument.
419 </para>
421 <para>
422 In practice one would probably like a more useful result from the
423 function than a constant 1, so a more likely definition
426 <programlisting>
427 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
428 UPDATE bank
429 SET balance = balance - debit
430 WHERE accountno = tf1.accountno;
431 SELECT balance FROM bank WHERE accountno = tf1.accountno;
432 $$ LANGUAGE SQL;
433 </programlisting>
435 which adjusts the balance and returns the new balance.
436 The same thing could be done in one command using <literal>RETURNING</literal>:
438 <programlisting>
439 CREATE FUNCTION tf1 (accountno integer, debit numeric) RETURNS numeric AS $$
440 UPDATE bank
441 SET balance = balance - debit
442 WHERE accountno = tf1.accountno
443 RETURNING balance;
444 $$ LANGUAGE SQL;
445 </programlisting>
446 </para>
448 <para>
449 If the final <literal>SELECT</literal> or <literal>RETURNING</literal>
450 clause in an <acronym>SQL</acronym> function does not return exactly
451 the function's declared result
452 type, <productname>PostgreSQL</productname> will automatically cast
453 the value to the required type, if that is possible with an implicit
454 or assignment cast. Otherwise, you must write an explicit cast.
455 For example, suppose we wanted the
456 previous <function>add_em</function> function to return
457 type <type>float8</type> instead. It's sufficient to write
459 <programlisting>
460 CREATE FUNCTION add_em(integer, integer) RETURNS float8 AS $$
461 SELECT $1 + $2;
462 $$ LANGUAGE SQL;
463 </programlisting>
465 since the <type>integer</type> sum can be implicitly cast
466 to <type>float8</type>.
467 (See <xref linkend="typeconv"/> or <xref linkend="sql-createcast"/>
468 for more about casts.)
469 </para>
470 </sect2>
472 <sect2 id="xfunc-sql-composite-functions">
473 <title><acronym>SQL</acronym> Functions on Composite Types</title>
475 <para>
476 When writing functions with arguments of composite types, we must not
477 only specify which argument we want but also the desired attribute
478 (field) of that argument. For example, suppose that
479 <type>emp</type> is a table containing employee data, and therefore
480 also the name of the composite type of each row of the table. Here
481 is a function <function>double_salary</function> that computes what someone's
482 salary would be if it were doubled:
484 <screen>
485 CREATE TABLE emp (
486 name text,
487 salary numeric,
488 age integer,
489 cubicle point
492 INSERT INTO emp VALUES ('Bill', 4200, 45, '(2,1)');
494 CREATE FUNCTION double_salary(emp) RETURNS numeric AS $$
495 SELECT $1.salary * 2 AS salary;
496 $$ LANGUAGE SQL;
498 SELECT name, double_salary(emp.*) AS dream
499 FROM emp
500 WHERE emp.cubicle ~= point '(2,1)';
502 name | dream
503 ------+-------
504 Bill | 8400
505 </screen>
506 </para>
508 <para>
509 Notice the use of the syntax <literal>$1.salary</literal>
510 to select one field of the argument row value. Also notice
511 how the calling <command>SELECT</command> command
512 uses <replaceable>table_name</replaceable><literal>.*</literal> to select
513 the entire current row of a table as a composite value. The table
514 row can alternatively be referenced using just the table name,
515 like this:
516 <screen>
517 SELECT name, double_salary(emp) AS dream
518 FROM emp
519 WHERE emp.cubicle ~= point '(2,1)';
520 </screen>
521 but this usage is deprecated since it's easy to get confused.
522 (See <xref linkend="rowtypes-usage"/> for details about these
523 two notations for the composite value of a table row.)
524 </para>
526 <para>
527 Sometimes it is handy to construct a composite argument value
528 on-the-fly. This can be done with the <literal>ROW</literal> construct.
529 For example, we could adjust the data being passed to the function:
530 <screen>
531 SELECT name, double_salary(ROW(name, salary*1.1, age, cubicle)) AS dream
532 FROM emp;
533 </screen>
534 </para>
536 <para>
537 It is also possible to build a function that returns a composite type.
538 This is an example of a function
539 that returns a single <type>emp</type> row:
541 <programlisting>
542 CREATE FUNCTION new_emp() RETURNS emp AS $$
543 SELECT text 'None' AS name,
544 1000.0 AS salary,
545 25 AS age,
546 point '(2,2)' AS cubicle;
547 $$ LANGUAGE SQL;
548 </programlisting>
550 In this example we have specified each of the attributes
551 with a constant value, but any computation
552 could have been substituted for these constants.
553 </para>
555 <para>
556 Note two important things about defining the function:
558 <itemizedlist>
559 <listitem>
560 <para>
561 The select list order in the query must be exactly the same as
562 that in which the columns appear in the composite type.
563 (Naming the columns, as we did above,
564 is irrelevant to the system.)
565 </para>
566 </listitem>
567 <listitem>
568 <para>
569 We must ensure each expression's type can be cast to that of
570 the corresponding column of the composite type.
571 Otherwise we'll get errors like this:
572 <screen>
573 <computeroutput>
574 ERROR: return type mismatch in function declared to return emp
575 DETAIL: Final statement returns text instead of point at column 4.
576 </computeroutput>
577 </screen>
578 As with the base-type case, the system will not insert explicit
579 casts automatically, only implicit or assignment casts.
580 </para>
581 </listitem>
582 </itemizedlist>
583 </para>
585 <para>
586 A different way to define the same function is:
588 <programlisting>
589 CREATE FUNCTION new_emp() RETURNS emp AS $$
590 SELECT ROW('None', 1000.0, 25, '(2,2)')::emp;
591 $$ LANGUAGE SQL;
592 </programlisting>
594 Here we wrote a <command>SELECT</command> that returns just a single
595 column of the correct composite type. This isn't really better
596 in this situation, but it is a handy alternative in some cases
597 &mdash; for example, if we need to compute the result by calling
598 another function that returns the desired composite value.
599 Another example is that if we are trying to write a function that
600 returns a domain over composite, rather than a plain composite type,
601 it is always necessary to write it as returning a single column,
602 since there is no way to cause a coercion of the whole row result.
603 </para>
605 <para>
606 We could call this function directly either by using it in
607 a value expression:
609 <screen>
610 SELECT new_emp();
612 new_emp
613 --------------------------
614 (None,1000.0,25,"(2,2)")
615 </screen>
617 or by calling it as a table function:
619 <screen>
620 SELECT * FROM new_emp();
622 name | salary | age | cubicle
623 ------+--------+-----+---------
624 None | 1000.0 | 25 | (2,2)
625 </screen>
627 The second way is described more fully in <xref
628 linkend="xfunc-sql-table-functions"/>.
629 </para>
631 <para>
632 When you use a function that returns a composite type,
633 you might want only one field (attribute) from its result.
634 You can do that with syntax like this:
636 <screen>
637 SELECT (new_emp()).name;
639 name
640 ------
641 None
642 </screen>
644 The extra parentheses are needed to keep the parser from getting
645 confused. If you try to do it without them, you get something like this:
647 <screen>
648 SELECT new_emp().name;
649 ERROR: syntax error at or near "."
650 LINE 1: SELECT new_emp().name;
652 </screen>
653 </para>
655 <para>
656 Another option is to use functional notation for extracting an attribute:
658 <screen>
659 SELECT name(new_emp());
661 name
662 ------
663 None
664 </screen>
666 As explained in <xref linkend="rowtypes-usage"/>, the field notation and
667 functional notation are equivalent.
668 </para>
670 <para>
671 Another way to use a function returning a composite type is to pass the
672 result to another function that accepts the correct row type as input:
674 <screen>
675 CREATE FUNCTION getname(emp) RETURNS text AS $$
676 SELECT $1.name;
677 $$ LANGUAGE SQL;
679 SELECT getname(new_emp());
680 getname
681 ---------
682 None
683 (1 row)
684 </screen>
685 </para>
686 </sect2>
688 <sect2 id="xfunc-output-parameters">
689 <title><acronym>SQL</acronym> Functions with Output Parameters</title>
691 <indexterm>
692 <primary>function</primary>
693 <secondary>output parameter</secondary>
694 </indexterm>
696 <para>
697 An alternative way of describing a function's results is to define it
698 with <firstterm>output parameters</firstterm>, as in this example:
700 <screen>
701 CREATE FUNCTION add_em (IN x int, IN y int, OUT sum int)
702 AS 'SELECT x + y'
703 LANGUAGE SQL;
705 SELECT add_em(3,7);
706 add_em
707 --------
709 (1 row)
710 </screen>
712 This is not essentially different from the version of <literal>add_em</literal>
713 shown in <xref linkend="xfunc-sql-base-functions"/>. The real value of
714 output parameters is that they provide a convenient way of defining
715 functions that return several columns. For example,
717 <screen>
718 CREATE FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int)
719 AS 'SELECT x + y, x * y'
720 LANGUAGE SQL;
722 SELECT * FROM sum_n_product(11,42);
723 sum | product
724 -----+---------
725 53 | 462
726 (1 row)
727 </screen>
729 What has essentially happened here is that we have created an anonymous
730 composite type for the result of the function. The above example has
731 the same end result as
733 <screen>
734 CREATE TYPE sum_prod AS (sum int, product int);
736 CREATE FUNCTION sum_n_product (int, int) RETURNS sum_prod
737 AS 'SELECT $1 + $2, $1 * $2'
738 LANGUAGE SQL;
739 </screen>
741 but not having to bother with the separate composite type definition
742 is often handy. Notice that the names attached to the output parameters
743 are not just decoration, but determine the column names of the anonymous
744 composite type. (If you omit a name for an output parameter, the
745 system will choose a name on its own.)
746 </para>
748 <para>
749 Notice that output parameters are not included in the calling argument
750 list when invoking such a function from SQL. This is because
751 <productname>PostgreSQL</productname> considers only the input
752 parameters to define the function's calling signature. That means
753 also that only the input parameters matter when referencing the function
754 for purposes such as dropping it. We could drop the above function
755 with either of
757 <screen>
758 DROP FUNCTION sum_n_product (x int, y int, OUT sum int, OUT product int);
759 DROP FUNCTION sum_n_product (int, int);
760 </screen>
761 </para>
763 <para>
764 Parameters can be marked as <literal>IN</literal> (the default),
765 <literal>OUT</literal>, <literal>INOUT</literal>, or <literal>VARIADIC</literal>.
766 An <literal>INOUT</literal>
767 parameter serves as both an input parameter (part of the calling
768 argument list) and an output parameter (part of the result record type).
769 <literal>VARIADIC</literal> parameters are input parameters, but are treated
770 specially as described below.
771 </para>
772 </sect2>
774 <sect2 id="xfunc-output-parameters-proc">
775 <title><acronym>SQL</acronym> Procedures with Output Parameters</title>
777 <indexterm>
778 <primary>procedures</primary>
779 <secondary>output parameter</secondary>
780 </indexterm>
782 <para>
783 Output parameters are also supported in procedures, but they work a bit
784 differently from functions. In <command>CALL</command> commands,
785 output parameters must be included in the argument list.
786 For example, the bank account debiting routine from earlier could be
787 written like this:
788 <programlisting>
789 CREATE PROCEDURE tp1 (accountno integer, debit numeric, OUT new_balance numeric) AS $$
790 UPDATE bank
791 SET balance = balance - debit
792 WHERE accountno = tp1.accountno
793 RETURNING balance;
794 $$ LANGUAGE SQL;
795 </programlisting>
796 To call this procedure, an argument matching the <literal>OUT</literal>
797 parameter must be included. It's customary to write
798 <literal>NULL</literal>:
799 <programlisting>
800 CALL tp1(17, 100.0, NULL);
801 </programlisting>
802 If you write something else, it must be an expression that is implicitly
803 coercible to the declared type of the parameter, just as for input
804 parameters. Note however that such an expression will not be evaluated.
805 </para>
807 <para>
808 When calling a procedure from <application>PL/pgSQL</application>,
809 instead of writing <literal>NULL</literal> you must write a variable
810 that will receive the procedure's output. See <xref
811 linkend="plpgsql-statements-calling-procedure"/> for details.
812 </para>
813 </sect2>
815 <sect2 id="xfunc-sql-variadic-functions">
816 <title><acronym>SQL</acronym> Functions with Variable Numbers of Arguments</title>
818 <indexterm>
819 <primary>function</primary>
820 <secondary>variadic</secondary>
821 </indexterm>
823 <indexterm>
824 <primary>variadic function</primary>
825 </indexterm>
827 <para>
828 <acronym>SQL</acronym> functions can be declared to accept
829 variable numbers of arguments, so long as all the <quote>optional</quote>
830 arguments are of the same data type. The optional arguments will be
831 passed to the function as an array. The function is declared by
832 marking the last parameter as <literal>VARIADIC</literal>; this parameter
833 must be declared as being of an array type. For example:
835 <screen>
836 CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
837 SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
838 $$ LANGUAGE SQL;
840 SELECT mleast(10, -1, 5, 4.4);
841 mleast
842 --------
844 (1 row)
845 </screen>
847 Effectively, all the actual arguments at or beyond the
848 <literal>VARIADIC</literal> position are gathered up into a one-dimensional
849 array, as if you had written
851 <screen>
852 SELECT mleast(ARRAY[10, -1, 5, 4.4]); -- doesn't work
853 </screen>
855 You can't actually write that, though &mdash; or at least, it will
856 not match this function definition. A parameter marked
857 <literal>VARIADIC</literal> matches one or more occurrences of its element
858 type, not of its own type.
859 </para>
861 <para>
862 Sometimes it is useful to be able to pass an already-constructed array
863 to a variadic function; this is particularly handy when one variadic
864 function wants to pass on its array parameter to another one. Also,
865 this is the only secure way to call a variadic function found in a schema
866 that permits untrusted users to create objects; see
867 <xref linkend="typeconv-func"/>. You can do this by
868 specifying <literal>VARIADIC</literal> in the call:
870 <screen>
871 SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
872 </screen>
874 This prevents expansion of the function's variadic parameter into its
875 element type, thereby allowing the array argument value to match
876 normally. <literal>VARIADIC</literal> can only be attached to the last
877 actual argument of a function call.
878 </para>
880 <para>
881 Specifying <literal>VARIADIC</literal> in the call is also the only way to
882 pass an empty array to a variadic function, for example:
884 <screen>
885 SELECT mleast(VARIADIC ARRAY[]::numeric[]);
886 </screen>
888 Simply writing <literal>SELECT mleast()</literal> does not work because a
889 variadic parameter must match at least one actual argument.
890 (You could define a second function also named <literal>mleast</literal>,
891 with no parameters, if you wanted to allow such calls.)
892 </para>
894 <para>
895 The array element parameters generated from a variadic parameter are
896 treated as not having any names of their own. This means it is not
897 possible to call a variadic function using named arguments (<xref
898 linkend="sql-syntax-calling-funcs"/>), except when you specify
899 <literal>VARIADIC</literal>. For example, this will work:
901 <screen>
902 SELECT mleast(VARIADIC arr =&gt; ARRAY[10, -1, 5, 4.4]);
903 </screen>
905 but not these:
907 <screen>
908 SELECT mleast(arr =&gt; 10);
909 SELECT mleast(arr =&gt; ARRAY[10, -1, 5, 4.4]);
910 </screen>
911 </para>
912 </sect2>
914 <sect2 id="xfunc-sql-parameter-defaults">
915 <title><acronym>SQL</acronym> Functions with Default Values for Arguments</title>
917 <indexterm>
918 <primary>function</primary>
919 <secondary>default values for arguments</secondary>
920 </indexterm>
922 <para>
923 Functions can be declared with default values for some or all input
924 arguments. The default values are inserted whenever the function is
925 called with insufficiently many actual arguments. Since arguments
926 can only be omitted from the end of the actual argument list, all
927 parameters after a parameter with a default value have to have
928 default values as well. (Although the use of named argument notation
929 could allow this restriction to be relaxed, it's still enforced so that
930 positional argument notation works sensibly.) Whether or not you use it,
931 this capability creates a need for precautions when calling functions in
932 databases where some users mistrust other users; see
933 <xref linkend="typeconv-func"/>.
934 </para>
936 <para>
937 For example:
938 <screen>
939 CREATE FUNCTION foo(a int, b int DEFAULT 2, c int DEFAULT 3)
940 RETURNS int
941 LANGUAGE SQL
942 AS $$
943 SELECT $1 + $2 + $3;
946 SELECT foo(10, 20, 30);
948 -----
950 (1 row)
952 SELECT foo(10, 20);
954 -----
956 (1 row)
958 SELECT foo(10);
960 -----
962 (1 row)
964 SELECT foo(); -- fails since there is no default for the first argument
965 ERROR: function foo() does not exist
966 </screen>
967 The <literal>=</literal> sign can also be used in place of the
968 key word <literal>DEFAULT</literal>.
969 </para>
970 </sect2>
972 <sect2 id="xfunc-sql-table-functions">
973 <title><acronym>SQL</acronym> Functions as Table Sources</title>
975 <para>
976 All SQL functions can be used in the <literal>FROM</literal> clause of a query,
977 but it is particularly useful for functions returning composite types.
978 If the function is defined to return a base type, the table function
979 produces a one-column table. If the function is defined to return
980 a composite type, the table function produces a column for each attribute
981 of the composite type.
982 </para>
984 <para>
985 Here is an example:
987 <screen>
988 CREATE TABLE foo (fooid int, foosubid int, fooname text);
989 INSERT INTO foo VALUES (1, 1, 'Joe');
990 INSERT INTO foo VALUES (1, 2, 'Ed');
991 INSERT INTO foo VALUES (2, 1, 'Mary');
993 CREATE FUNCTION getfoo(int) RETURNS foo AS $$
994 SELECT * FROM foo WHERE fooid = $1;
995 $$ LANGUAGE SQL;
997 SELECT *, upper(fooname) FROM getfoo(1) AS t1;
999 fooid | foosubid | fooname | upper
1000 -------+----------+---------+-------
1001 1 | 1 | Joe | JOE
1002 (1 row)
1003 </screen>
1005 As the example shows, we can work with the columns of the function's
1006 result just the same as if they were columns of a regular table.
1007 </para>
1009 <para>
1010 Note that we only got one row out of the function. This is because
1011 we did not use <literal>SETOF</literal>. That is described in the next section.
1012 </para>
1013 </sect2>
1015 <sect2 id="xfunc-sql-functions-returning-set">
1016 <title><acronym>SQL</acronym> Functions Returning Sets</title>
1018 <indexterm>
1019 <primary>function</primary>
1020 <secondary>with SETOF</secondary>
1021 </indexterm>
1023 <para>
1024 When an SQL function is declared as returning <literal>SETOF
1025 <replaceable>sometype</replaceable></literal>, the function's final
1026 query is executed to completion, and each row it
1027 outputs is returned as an element of the result set.
1028 </para>
1030 <para>
1031 This feature is normally used when calling the function in the <literal>FROM</literal>
1032 clause. In this case each row returned by the function becomes
1033 a row of the table seen by the query. For example, assume that
1034 table <literal>foo</literal> has the same contents as above, and we say:
1036 <programlisting>
1037 CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
1038 SELECT * FROM foo WHERE fooid = $1;
1039 $$ LANGUAGE SQL;
1041 SELECT * FROM getfoo(1) AS t1;
1042 </programlisting>
1044 Then we would get:
1045 <screen>
1046 fooid | foosubid | fooname
1047 -------+----------+---------
1048 1 | 1 | Joe
1049 1 | 2 | Ed
1050 (2 rows)
1051 </screen>
1052 </para>
1054 <para>
1055 It is also possible to return multiple rows with the columns defined by
1056 output parameters, like this:
1058 <programlisting>
1059 CREATE TABLE tab (y int, z int);
1060 INSERT INTO tab VALUES (1, 2), (3, 4), (5, 6), (7, 8);
1062 CREATE FUNCTION sum_n_product_with_tab (x int, OUT sum int, OUT product int)
1063 RETURNS SETOF record
1064 AS $$
1065 SELECT $1 + tab.y, $1 * tab.y FROM tab;
1066 $$ LANGUAGE SQL;
1068 SELECT * FROM sum_n_product_with_tab(10);
1069 sum | product
1070 -----+---------
1071 11 | 10
1072 13 | 30
1073 15 | 50
1074 17 | 70
1075 (4 rows)
1076 </programlisting>
1078 The key point here is that you must write <literal>RETURNS SETOF record</literal>
1079 to indicate that the function returns multiple rows instead of just one.
1080 If there is only one output parameter, write that parameter's type
1081 instead of <type>record</type>.
1082 </para>
1084 <para>
1085 It is frequently useful to construct a query's result by invoking a
1086 set-returning function multiple times, with the parameters for each
1087 invocation coming from successive rows of a table or subquery. The
1088 preferred way to do this is to use the <literal>LATERAL</literal> key word,
1089 which is described in <xref linkend="queries-lateral"/>.
1090 Here is an example using a set-returning function to enumerate
1091 elements of a tree structure:
1093 <screen>
1094 SELECT * FROM nodes;
1095 name | parent
1096 -----------+--------
1097 Top |
1098 Child1 | Top
1099 Child2 | Top
1100 Child3 | Top
1101 SubChild1 | Child1
1102 SubChild2 | Child1
1103 (6 rows)
1105 CREATE FUNCTION listchildren(text) RETURNS SETOF text AS $$
1106 SELECT name FROM nodes WHERE parent = $1
1107 $$ LANGUAGE SQL STABLE;
1109 SELECT * FROM listchildren('Top');
1110 listchildren
1111 --------------
1112 Child1
1113 Child2
1114 Child3
1115 (3 rows)
1117 SELECT name, child FROM nodes, LATERAL listchildren(name) AS child;
1118 name | child
1119 --------+-----------
1120 Top | Child1
1121 Top | Child2
1122 Top | Child3
1123 Child1 | SubChild1
1124 Child1 | SubChild2
1125 (5 rows)
1126 </screen>
1128 This example does not do anything that we couldn't have done with a
1129 simple join, but in more complex calculations the option to put
1130 some of the work into a function can be quite convenient.
1131 </para>
1133 <para>
1134 Functions returning sets can also be called in the select list
1135 of a query. For each row that the query
1136 generates by itself, the set-returning function is invoked, and an output
1137 row is generated for each element of the function's result set.
1138 The previous example could also be done with queries like
1139 these:
1141 <screen>
1142 SELECT listchildren('Top');
1143 listchildren
1144 --------------
1145 Child1
1146 Child2
1147 Child3
1148 (3 rows)
1150 SELECT name, listchildren(name) FROM nodes;
1151 name | listchildren
1152 --------+--------------
1153 Top | Child1
1154 Top | Child2
1155 Top | Child3
1156 Child1 | SubChild1
1157 Child1 | SubChild2
1158 (5 rows)
1159 </screen>
1161 In the last <command>SELECT</command>,
1162 notice that no output row appears for <literal>Child2</literal>, <literal>Child3</literal>, etc.
1163 This happens because <function>listchildren</function> returns an empty set
1164 for those arguments, so no result rows are generated. This is the same
1165 behavior as we got from an inner join to the function result when using
1166 the <literal>LATERAL</literal> syntax.
1167 </para>
1169 <para>
1170 <productname>PostgreSQL</productname>'s behavior for a set-returning function in a
1171 query's select list is almost exactly the same as if the set-returning
1172 function had been written in a <literal>LATERAL FROM</literal>-clause item
1173 instead. For example,
1174 <programlisting>
1175 SELECT x, generate_series(1,5) AS g FROM tab;
1176 </programlisting>
1177 is almost equivalent to
1178 <programlisting>
1179 SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g;
1180 </programlisting>
1181 It would be exactly the same, except that in this specific example,
1182 the planner could choose to put <structname>g</structname> on the outside of the
1183 nested-loop join, since <structname>g</structname> has no actual lateral dependency
1184 on <structname>tab</structname>. That would result in a different output row
1185 order. Set-returning functions in the select list are always evaluated
1186 as though they are on the inside of a nested-loop join with the rest of
1187 the <literal>FROM</literal> clause, so that the function(s) are run to
1188 completion before the next row from the <literal>FROM</literal> clause is
1189 considered.
1190 </para>
1192 <para>
1193 If there is more than one set-returning function in the query's select
1194 list, the behavior is similar to what you get from putting the functions
1195 into a single <literal>LATERAL ROWS FROM( ... )</literal> <literal>FROM</literal>-clause
1196 item. For each row from the underlying query, there is an output row
1197 using the first result from each function, then an output row using the
1198 second result, and so on. If some of the set-returning functions
1199 produce fewer outputs than others, null values are substituted for the
1200 missing data, so that the total number of rows emitted for one
1201 underlying row is the same as for the set-returning function that
1202 produced the most outputs. Thus the set-returning functions
1203 run <quote>in lockstep</quote> until they are all exhausted, and then
1204 execution continues with the next underlying row.
1205 </para>
1207 <para>
1208 Set-returning functions can be nested in a select list, although that is
1209 not allowed in <literal>FROM</literal>-clause items. In such cases, each level
1210 of nesting is treated separately, as though it were
1211 a separate <literal>LATERAL ROWS FROM( ... )</literal> item. For example, in
1212 <programlisting>
1213 SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab;
1214 </programlisting>
1215 the set-returning functions <function>srf2</function>, <function>srf3</function>,
1216 and <function>srf5</function> would be run in lockstep for each row
1217 of <structname>tab</structname>, and then <function>srf1</function> and <function>srf4</function>
1218 would be applied in lockstep to each row produced by the lower
1219 functions.
1220 </para>
1222 <para>
1223 Set-returning functions cannot be used within conditional-evaluation
1224 constructs, such as <literal>CASE</literal> or <literal>COALESCE</literal>. For
1225 example, consider
1226 <programlisting>
1227 SELECT x, CASE WHEN x &gt; 0 THEN generate_series(1, 5) ELSE 0 END FROM tab;
1228 </programlisting>
1229 It might seem that this should produce five repetitions of input rows
1230 that have <literal>x &gt; 0</literal>, and a single repetition of those that do
1231 not; but actually, because <function>generate_series(1, 5)</function> would be
1232 run in an implicit <literal>LATERAL FROM</literal> item before
1233 the <literal>CASE</literal> expression is ever evaluated, it would produce five
1234 repetitions of every input row. To reduce confusion, such cases produce
1235 a parse-time error instead.
1236 </para>
1238 <note>
1239 <para>
1240 If a function's last command is <command>INSERT</command>,
1241 <command>UPDATE</command>, <command>DELETE</command>, or
1242 <command>MERGE</command> with <literal>RETURNING</literal>, that command will
1243 always be executed to completion, even if the function is not declared
1244 with <literal>SETOF</literal> or the calling query does not fetch all the
1245 result rows. Any extra rows produced by the <literal>RETURNING</literal>
1246 clause are silently dropped, but the commanded table modifications
1247 still happen (and are all completed before returning from the function).
1248 </para>
1249 </note>
1251 <note>
1252 <para>
1253 Before <productname>PostgreSQL</productname> 10, putting more than one
1254 set-returning function in the same select list did not behave very
1255 sensibly unless they always produced equal numbers of rows. Otherwise,
1256 what you got was a number of output rows equal to the least common
1257 multiple of the numbers of rows produced by the set-returning
1258 functions. Also, nested set-returning functions did not work as
1259 described above; instead, a set-returning function could have at most
1260 one set-returning argument, and each nest of set-returning functions
1261 was run independently. Also, conditional execution (set-returning
1262 functions inside <literal>CASE</literal> etc.) was previously allowed,
1263 complicating things even more.
1264 Use of the <literal>LATERAL</literal> syntax is recommended when writing
1265 queries that need to work in older <productname>PostgreSQL</productname> versions,
1266 because that will give consistent results across different versions.
1267 If you have a query that is relying on conditional execution of a
1268 set-returning function, you may be able to fix it by moving the
1269 conditional test into a custom set-returning function. For example,
1270 <programlisting>
1271 SELECT x, CASE WHEN y &gt; 0 THEN generate_series(1, z) ELSE 5 END FROM tab;
1272 </programlisting>
1273 could become
1274 <programlisting>
1275 CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int)
1276 RETURNS SETOF int AS $$
1277 BEGIN
1278 IF cond THEN
1279 RETURN QUERY SELECT generate_series(start, fin);
1280 ELSE
1281 RETURN QUERY SELECT els;
1282 END IF;
1283 END$$ LANGUAGE plpgsql;
1285 SELECT x, case_generate_series(y &gt; 0, 1, z, 5) FROM tab;
1286 </programlisting>
1287 This formulation will work the same in all versions
1288 of <productname>PostgreSQL</productname>.
1289 </para>
1290 </note>
1291 </sect2>
1293 <sect2 id="xfunc-sql-functions-returning-table">
1294 <title><acronym>SQL</acronym> Functions Returning <literal>TABLE</literal></title>
1296 <indexterm>
1297 <primary>function</primary>
1298 <secondary>RETURNS TABLE</secondary>
1299 </indexterm>
1301 <para>
1302 There is another way to declare a function as returning a set,
1303 which is to use the syntax
1304 <literal>RETURNS TABLE(<replaceable>columns</replaceable>)</literal>.
1305 This is equivalent to using one or more <literal>OUT</literal> parameters plus
1306 marking the function as returning <literal>SETOF record</literal> (or
1307 <literal>SETOF</literal> a single output parameter's type, as appropriate).
1308 This notation is specified in recent versions of the SQL standard, and
1309 thus may be more portable than using <literal>SETOF</literal>.
1310 </para>
1312 <para>
1313 For example, the preceding sum-and-product example could also be
1314 done this way:
1316 <programlisting>
1317 CREATE FUNCTION sum_n_product_with_tab (x int)
1318 RETURNS TABLE(sum int, product int) AS $$
1319 SELECT $1 + tab.y, $1 * tab.y FROM tab;
1320 $$ LANGUAGE SQL;
1321 </programlisting>
1323 It is not allowed to use explicit <literal>OUT</literal> or <literal>INOUT</literal>
1324 parameters with the <literal>RETURNS TABLE</literal> notation &mdash; you must
1325 put all the output columns in the <literal>TABLE</literal> list.
1326 </para>
1327 </sect2>
1329 <sect2 id="xfunc-sql-polymorphic-functions">
1330 <title>Polymorphic <acronym>SQL</acronym> Functions</title>
1332 <para>
1333 <acronym>SQL</acronym> functions can be declared to accept and
1334 return the polymorphic types described in <xref
1335 linkend="extend-types-polymorphic"/>. Here is a polymorphic
1336 function <function>make_array</function> that builds up an array
1337 from two arbitrary data type elements:
1338 <screen>
1339 CREATE FUNCTION make_array(anyelement, anyelement) RETURNS anyarray AS $$
1340 SELECT ARRAY[$1, $2];
1341 $$ LANGUAGE SQL;
1343 SELECT make_array(1, 2) AS intarray, make_array('a'::text, 'b') AS textarray;
1344 intarray | textarray
1345 ----------+-----------
1346 {1,2} | {a,b}
1347 (1 row)
1348 </screen>
1349 </para>
1351 <para>
1352 Notice the use of the typecast <literal>'a'::text</literal>
1353 to specify that the argument is of type <type>text</type>. This is
1354 required if the argument is just a string literal, since otherwise
1355 it would be treated as type
1356 <type>unknown</type>, and array of <type>unknown</type> is not a valid
1357 type.
1358 Without the typecast, you will get errors like this:
1359 <screen>
1360 ERROR: could not determine polymorphic type because input has type unknown
1361 </screen>
1362 </para>
1364 <para>
1365 With <function>make_array</function> declared as above, you must
1366 provide two arguments that are of exactly the same data type; the
1367 system will not attempt to resolve any type differences. Thus for
1368 example this does not work:
1369 <screen>
1370 SELECT make_array(1, 2.5) AS numericarray;
1371 ERROR: function make_array(integer, numeric) does not exist
1372 </screen>
1373 An alternative approach is to use the <quote>common</quote> family of
1374 polymorphic types, which allows the system to try to identify a
1375 suitable common type:
1376 <screen>
1377 CREATE FUNCTION make_array2(anycompatible, anycompatible)
1378 RETURNS anycompatiblearray AS $$
1379 SELECT ARRAY[$1, $2];
1380 $$ LANGUAGE SQL;
1382 SELECT make_array2(1, 2.5) AS numericarray;
1383 numericarray
1384 --------------
1385 {1,2.5}
1386 (1 row)
1387 </screen>
1388 Because the rules for common type resolution default to choosing
1389 type <type>text</type> when all inputs are of unknown types, this
1390 also works:
1391 <screen>
1392 SELECT make_array2('a', 'b') AS textarray;
1393 textarray
1394 -----------
1395 {a,b}
1396 (1 row)
1397 </screen>
1398 </para>
1400 <para>
1401 It is permitted to have polymorphic arguments with a fixed
1402 return type, but the converse is not. For example:
1403 <screen>
1404 CREATE FUNCTION is_greater(anyelement, anyelement) RETURNS boolean AS $$
1405 SELECT $1 &gt; $2;
1406 $$ LANGUAGE SQL;
1408 SELECT is_greater(1, 2);
1409 is_greater
1410 ------------
1412 (1 row)
1414 CREATE FUNCTION invalid_func() RETURNS anyelement AS $$
1415 SELECT 1;
1416 $$ LANGUAGE SQL;
1417 ERROR: cannot determine result data type
1418 DETAIL: A result of type anyelement requires at least one input of type anyelement, anyarray, anynonarray, anyenum, or anyrange.
1419 </screen>
1420 </para>
1422 <para>
1423 Polymorphism can be used with functions that have output arguments.
1424 For example:
1425 <screen>
1426 CREATE FUNCTION dup (f1 anyelement, OUT f2 anyelement, OUT f3 anyarray)
1427 AS 'select $1, array[$1,$1]' LANGUAGE SQL;
1429 SELECT * FROM dup(22);
1430 f2 | f3
1431 ----+---------
1432 22 | {22,22}
1433 (1 row)
1434 </screen>
1435 </para>
1437 <para>
1438 Polymorphism can also be used with variadic functions.
1439 For example:
1440 <screen>
1441 CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
1442 SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
1443 $$ LANGUAGE SQL;
1445 SELECT anyleast(10, -1, 5, 4);
1446 anyleast
1447 ----------
1449 (1 row)
1451 SELECT anyleast('abc'::text, 'def');
1452 anyleast
1453 ----------
1455 (1 row)
1457 CREATE FUNCTION concat_values(text, VARIADIC anyarray) RETURNS text AS $$
1458 SELECT array_to_string($2, $1);
1459 $$ LANGUAGE SQL;
1461 SELECT concat_values('|', 1, 4, 2);
1462 concat_values
1463 ---------------
1464 1|4|2
1465 (1 row)
1466 </screen>
1467 </para>
1468 </sect2>
1470 <sect2 id="xfunc-sql-collations">
1471 <title><acronym>SQL</acronym> Functions with Collations</title>
1473 <indexterm>
1474 <primary>collation</primary>
1475 <secondary>in SQL functions</secondary>
1476 </indexterm>
1478 <para>
1479 When an SQL function has one or more parameters of collatable data types,
1480 a collation is identified for each function call depending on the
1481 collations assigned to the actual arguments, as described in <xref
1482 linkend="collation"/>. If a collation is successfully identified
1483 (i.e., there are no conflicts of implicit collations among the arguments)
1484 then all the collatable parameters are treated as having that collation
1485 implicitly. This will affect the behavior of collation-sensitive
1486 operations within the function. For example, using the
1487 <function>anyleast</function> function described above, the result of
1488 <programlisting>
1489 SELECT anyleast('abc'::text, 'ABC');
1490 </programlisting>
1491 will depend on the database's default collation. In <literal>C</literal> locale
1492 the result will be <literal>ABC</literal>, but in many other locales it will
1493 be <literal>abc</literal>. The collation to use can be forced by adding
1494 a <literal>COLLATE</literal> clause to any of the arguments, for example
1495 <programlisting>
1496 SELECT anyleast('abc'::text, 'ABC' COLLATE "C");
1497 </programlisting>
1498 Alternatively, if you wish a function to operate with a particular
1499 collation regardless of what it is called with, insert
1500 <literal>COLLATE</literal> clauses as needed in the function definition.
1501 This version of <function>anyleast</function> would always use <literal>en_US</literal>
1502 locale to compare strings:
1503 <programlisting>
1504 CREATE FUNCTION anyleast (VARIADIC anyarray) RETURNS anyelement AS $$
1505 SELECT min($1[i] COLLATE "en_US") FROM generate_subscripts($1, 1) g(i);
1506 $$ LANGUAGE SQL;
1507 </programlisting>
1508 But note that this will throw an error if applied to a non-collatable
1509 data type.
1510 </para>
1512 <para>
1513 If no common collation can be identified among the actual arguments,
1514 then an SQL function treats its parameters as having their data types'
1515 default collation (which is usually the database's default collation,
1516 but could be different for parameters of domain types).
1517 </para>
1519 <para>
1520 The behavior of collatable parameters can be thought of as a limited
1521 form of polymorphism, applicable only to textual data types.
1522 </para>
1523 </sect2>
1524 </sect1>
1526 <sect1 id="xfunc-overload">
1527 <title>Function Overloading</title>
1529 <indexterm zone="xfunc-overload">
1530 <primary>overloading</primary>
1531 <secondary>functions</secondary>
1532 </indexterm>
1534 <para>
1535 More than one function can be defined with the same SQL name, so long
1536 as the arguments they take are different. In other words,
1537 function names can be <firstterm>overloaded</firstterm>. Whether or not
1538 you use it, this capability entails security precautions when calling
1539 functions in databases where some users mistrust other users; see
1540 <xref linkend="typeconv-func"/>. When a query is executed, the server
1541 will determine which function to call from the data types and the number
1542 of the provided arguments. Overloading can also be used to simulate
1543 functions with a variable number of arguments, up to a finite maximum
1544 number.
1545 </para>
1547 <para>
1548 When creating a family of overloaded functions, one should be
1549 careful not to create ambiguities. For instance, given the
1550 functions:
1551 <programlisting>
1552 CREATE FUNCTION test(int, real) RETURNS ...
1553 CREATE FUNCTION test(smallint, double precision) RETURNS ...
1554 </programlisting>
1555 it is not immediately clear which function would be called with
1556 some trivial input like <literal>test(1, 1.5)</literal>. The
1557 currently implemented resolution rules are described in
1558 <xref linkend="typeconv"/>, but it is unwise to design a system that subtly
1559 relies on this behavior.
1560 </para>
1562 <para>
1563 A function that takes a single argument of a composite type should
1564 generally not have the same name as any attribute (field) of that type.
1565 Recall that <literal><replaceable>attribute</replaceable>(<replaceable>table</replaceable>)</literal>
1566 is considered equivalent
1567 to <literal><replaceable>table</replaceable>.<replaceable>attribute</replaceable></literal>.
1568 In the case that there is an
1569 ambiguity between a function on a composite type and an attribute of
1570 the composite type, the attribute will always be used. It is possible
1571 to override that choice by schema-qualifying the function name
1572 (that is, <literal><replaceable>schema</replaceable>.<replaceable>func</replaceable>(<replaceable>table</replaceable>)
1573 </literal>) but it's better to
1574 avoid the problem by not choosing conflicting names.
1575 </para>
1577 <para>
1578 Another possible conflict is between variadic and non-variadic functions.
1579 For instance, it is possible to create both <literal>foo(numeric)</literal> and
1580 <literal>foo(VARIADIC numeric[])</literal>. In this case it is unclear which one
1581 should be matched to a call providing a single numeric argument, such as
1582 <literal>foo(10.1)</literal>. The rule is that the function appearing
1583 earlier in the search path is used, or if the two functions are in the
1584 same schema, the non-variadic one is preferred.
1585 </para>
1587 <para>
1588 When overloading C-language functions, there is an additional
1589 constraint: The C name of each function in the family of
1590 overloaded functions must be different from the C names of all
1591 other functions, either internal or dynamically loaded. If this
1592 rule is violated, the behavior is not portable. You might get a
1593 run-time linker error, or one of the functions will get called
1594 (usually the internal one). The alternative form of the
1595 <literal>AS</literal> clause for the SQL <command>CREATE
1596 FUNCTION</command> command decouples the SQL function name from
1597 the function name in the C source code. For instance:
1598 <programlisting>
1599 CREATE FUNCTION test(int) RETURNS int
1600 AS '<replaceable>filename</replaceable>', 'test_1arg'
1601 LANGUAGE C;
1602 CREATE FUNCTION test(int, int) RETURNS int
1603 AS '<replaceable>filename</replaceable>', 'test_2arg'
1604 LANGUAGE C;
1605 </programlisting>
1606 The names of the C functions here reflect one of many possible conventions.
1607 </para>
1608 </sect1>
1610 <sect1 id="xfunc-volatility">
1611 <title>Function Volatility Categories</title>
1613 <indexterm zone="xfunc-volatility">
1614 <primary>volatility</primary>
1615 <secondary>functions</secondary>
1616 </indexterm>
1617 <indexterm zone="xfunc-volatility">
1618 <primary>VOLATILE</primary>
1619 </indexterm>
1620 <indexterm zone="xfunc-volatility">
1621 <primary>STABLE</primary>
1622 </indexterm>
1623 <indexterm zone="xfunc-volatility">
1624 <primary>IMMUTABLE</primary>
1625 </indexterm>
1627 <para>
1628 Every function has a <firstterm>volatility</firstterm> classification, with
1629 the possibilities being <literal>VOLATILE</literal>, <literal>STABLE</literal>, or
1630 <literal>IMMUTABLE</literal>. <literal>VOLATILE</literal> is the default if the
1631 <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link>
1632 command does not specify a category. The volatility category is a
1633 promise to the optimizer about the behavior of the function:
1635 <itemizedlist>
1636 <listitem>
1637 <para>
1638 A <literal>VOLATILE</literal> function can do anything, including modifying
1639 the database. It can return different results on successive calls with
1640 the same arguments. The optimizer makes no assumptions about the
1641 behavior of such functions. A query using a volatile function will
1642 re-evaluate the function at every row where its value is needed.
1643 </para>
1644 </listitem>
1645 <listitem>
1646 <para>
1647 A <literal>STABLE</literal> function cannot modify the database and is
1648 guaranteed to return the same results given the same arguments
1649 for all rows within a single statement. This category allows the
1650 optimizer to optimize multiple calls of the function to a single
1651 call. In particular, it is safe to use an expression containing
1652 such a function in an index scan condition. (Since an index scan
1653 will evaluate the comparison value only once, not once at each
1654 row, it is not valid to use a <literal>VOLATILE</literal> function in an
1655 index scan condition.)
1656 </para>
1657 </listitem>
1658 <listitem>
1659 <para>
1660 An <literal>IMMUTABLE</literal> function cannot modify the database and is
1661 guaranteed to return the same results given the same arguments forever.
1662 This category allows the optimizer to pre-evaluate the function when
1663 a query calls it with constant arguments. For example, a query like
1664 <literal>SELECT ... WHERE x = 2 + 2</literal> can be simplified on sight to
1665 <literal>SELECT ... WHERE x = 4</literal>, because the function underlying
1666 the integer addition operator is marked <literal>IMMUTABLE</literal>.
1667 </para>
1668 </listitem>
1669 </itemizedlist>
1670 </para>
1672 <para>
1673 For best optimization results, you should label your functions with the
1674 strictest volatility category that is valid for them.
1675 </para>
1677 <para>
1678 Any function with side-effects <emphasis>must</emphasis> be labeled
1679 <literal>VOLATILE</literal>, so that calls to it cannot be optimized away.
1680 Even a function with no side-effects needs to be labeled
1681 <literal>VOLATILE</literal> if its value can change within a single query;
1682 some examples are <literal>random()</literal>, <literal>currval()</literal>,
1683 <literal>timeofday()</literal>.
1684 </para>
1686 <para>
1687 Another important example is that the <function>current_timestamp</function>
1688 family of functions qualify as <literal>STABLE</literal>, since their values do
1689 not change within a transaction.
1690 </para>
1692 <para>
1693 There is relatively little difference between <literal>STABLE</literal> and
1694 <literal>IMMUTABLE</literal> categories when considering simple interactive
1695 queries that are planned and immediately executed: it doesn't matter
1696 a lot whether a function is executed once during planning or once during
1697 query execution startup. But there is a big difference if the plan is
1698 saved and reused later. Labeling a function <literal>IMMUTABLE</literal> when
1699 it really isn't might allow it to be prematurely folded to a constant during
1700 planning, resulting in a stale value being re-used during subsequent uses
1701 of the plan. This is a hazard when using prepared statements or when
1702 using function languages that cache plans (such as
1703 <application>PL/pgSQL</application>).
1704 </para>
1706 <para>
1707 For functions written in SQL or in any of the standard procedural
1708 languages, there is a second important property determined by the
1709 volatility category, namely the visibility of any data changes that have
1710 been made by the SQL command that is calling the function. A
1711 <literal>VOLATILE</literal> function will see such changes, a <literal>STABLE</literal>
1712 or <literal>IMMUTABLE</literal> function will not. This behavior is implemented
1713 using the snapshotting behavior of MVCC (see <xref linkend="mvcc"/>):
1714 <literal>STABLE</literal> and <literal>IMMUTABLE</literal> functions use a snapshot
1715 established as of the start of the calling query, whereas
1716 <literal>VOLATILE</literal> functions obtain a fresh snapshot at the start of
1717 each query they execute.
1718 </para>
1720 <note>
1721 <para>
1722 Functions written in C can manage snapshots however they want, but it's
1723 usually a good idea to make C functions work this way too.
1724 </para>
1725 </note>
1727 <para>
1728 Because of this snapshotting behavior,
1729 a function containing only <command>SELECT</command> commands can safely be
1730 marked <literal>STABLE</literal>, even if it selects from tables that might be
1731 undergoing modifications by concurrent queries.
1732 <productname>PostgreSQL</productname> will execute all commands of a
1733 <literal>STABLE</literal> function using the snapshot established for the
1734 calling query, and so it will see a fixed view of the database throughout
1735 that query.
1736 </para>
1738 <para>
1739 The same snapshotting behavior is used for <command>SELECT</command> commands
1740 within <literal>IMMUTABLE</literal> functions. It is generally unwise to select
1741 from database tables within an <literal>IMMUTABLE</literal> function at all,
1742 since the immutability will be broken if the table contents ever change.
1743 However, <productname>PostgreSQL</productname> does not enforce that you
1744 do not do that.
1745 </para>
1747 <para>
1748 A common error is to label a function <literal>IMMUTABLE</literal> when its
1749 results depend on a configuration parameter. For example, a function
1750 that manipulates timestamps might well have results that depend on the
1751 <xref linkend="guc-timezone"/> setting. For safety, such functions should
1752 be labeled <literal>STABLE</literal> instead.
1753 </para>
1755 <note>
1756 <para>
1757 <productname>PostgreSQL</productname> requires that <literal>STABLE</literal>
1758 and <literal>IMMUTABLE</literal> functions contain no SQL commands other
1759 than <command>SELECT</command> to prevent data modification.
1760 (This is not a completely bulletproof test, since such functions could
1761 still call <literal>VOLATILE</literal> functions that modify the database.
1762 If you do that, you will find that the <literal>STABLE</literal> or
1763 <literal>IMMUTABLE</literal> function does not notice the database changes
1764 applied by the called function, since they are hidden from its snapshot.)
1765 </para>
1766 </note>
1767 </sect1>
1769 <sect1 id="xfunc-pl">
1770 <title>Procedural Language Functions</title>
1772 <para>
1773 <productname>PostgreSQL</productname> allows user-defined functions
1774 to be written in other languages besides SQL and C. These other
1775 languages are generically called <firstterm>procedural
1776 languages</firstterm> (<acronym>PL</acronym>s).
1777 Procedural languages aren't built into the
1778 <productname>PostgreSQL</productname> server; they are offered
1779 by loadable modules.
1780 See <xref linkend="xplang"/> and following chapters for more
1781 information.
1782 </para>
1783 </sect1>
1785 <sect1 id="xfunc-internal">
1786 <title>Internal Functions</title>
1788 <indexterm zone="xfunc-internal"><primary>function</primary><secondary>internal</secondary></indexterm>
1790 <para>
1791 Internal functions are functions written in C that have been statically
1792 linked into the <productname>PostgreSQL</productname> server.
1793 The <quote>body</quote> of the function definition
1794 specifies the C-language name of the function, which need not be the
1795 same as the name being declared for SQL use.
1796 (For reasons of backward compatibility, an empty body
1797 is accepted as meaning that the C-language function name is the
1798 same as the SQL name.)
1799 </para>
1801 <para>
1802 Normally, all internal functions present in the
1803 server are declared during the initialization of the database cluster
1804 (see <xref linkend="creating-cluster"/>),
1805 but a user could use <command>CREATE FUNCTION</command>
1806 to create additional alias names for an internal function.
1807 Internal functions are declared in <command>CREATE FUNCTION</command>
1808 with language name <literal>internal</literal>. For instance, to
1809 create an alias for the <function>sqrt</function> function:
1810 <programlisting>
1811 CREATE FUNCTION square_root(double precision) RETURNS double precision
1812 AS 'dsqrt'
1813 LANGUAGE internal
1814 STRICT;
1815 </programlisting>
1816 (Most internal functions expect to be declared <quote>strict</quote>.)
1817 </para>
1819 <note>
1820 <para>
1821 Not all <quote>predefined</quote> functions are
1822 <quote>internal</quote> in the above sense. Some predefined
1823 functions are written in SQL.
1824 </para>
1825 </note>
1826 </sect1>
1828 <sect1 id="xfunc-c">
1829 <title>C-Language Functions</title>
1831 <indexterm zone="xfunc-c">
1832 <primary>function</primary>
1833 <secondary>user-defined</secondary>
1834 <tertiary>in C</tertiary>
1835 </indexterm>
1837 <para>
1838 User-defined functions can be written in C (or a language that can
1839 be made compatible with C, such as C++). Such functions are
1840 compiled into dynamically loadable objects (also called shared
1841 libraries) and are loaded by the server on demand. The dynamic
1842 loading feature is what distinguishes <quote>C language</quote> functions
1843 from <quote>internal</quote> functions &mdash; the actual coding conventions
1844 are essentially the same for both. (Hence, the standard internal
1845 function library is a rich source of coding examples for user-defined
1846 C functions.)
1847 </para>
1849 <para>
1850 Currently only one calling convention is used for C functions
1851 (<quote>version 1</quote>). Support for that calling convention is
1852 indicated by writing a <literal>PG_FUNCTION_INFO_V1()</literal> macro
1853 call for the function, as illustrated below.
1854 </para>
1856 <sect2 id="xfunc-c-dynload">
1857 <title>Dynamic Loading</title>
1859 <indexterm zone="xfunc-c-dynload">
1860 <primary>dynamic loading</primary>
1861 </indexterm>
1863 <para>
1864 The first time a user-defined function in a particular
1865 loadable object file is called in a session,
1866 the dynamic loader loads that object file into memory so that the
1867 function can be called. The <command>CREATE FUNCTION</command>
1868 for a user-defined C function must therefore specify two pieces of
1869 information for the function: the name of the loadable
1870 object file, and the C name (link symbol) of the specific function to call
1871 within that object file. If the C name is not explicitly specified then
1872 it is assumed to be the same as the SQL function name.
1873 </para>
1875 <para>
1876 The following algorithm is used to locate the shared object file
1877 based on the name given in the <command>CREATE FUNCTION</command>
1878 command:
1880 <orderedlist>
1881 <listitem>
1882 <para>
1883 If the name is an absolute path, the given file is loaded.
1884 </para>
1885 </listitem>
1887 <listitem>
1888 <para>
1889 If the name starts with the string <literal>$libdir</literal>,
1890 that part is replaced by the <productname>PostgreSQL</productname> package
1891 library directory
1892 name, which is determined at build time.<indexterm><primary>$libdir</primary></indexterm>
1893 </para>
1894 </listitem>
1896 <listitem>
1897 <para>
1898 If the name does not contain a directory part, the file is
1899 searched for in the path specified by the configuration variable
1900 <xref linkend="guc-dynamic-library-path"/>.<indexterm><primary>dynamic_library_path</primary></indexterm>
1901 </para>
1902 </listitem>
1904 <listitem>
1905 <para>
1906 Otherwise (the file was not found in the path, or it contains a
1907 non-absolute directory part), the dynamic loader will try to
1908 take the name as given, which will most likely fail. (It is
1909 unreliable to depend on the current working directory.)
1910 </para>
1911 </listitem>
1912 </orderedlist>
1914 If this sequence does not work, the platform-specific shared
1915 library file name extension (often <filename>.so</filename>) is
1916 appended to the given name and this sequence is tried again. If
1917 that fails as well, the load will fail.
1918 </para>
1920 <para>
1921 It is recommended to locate shared libraries either relative to
1922 <literal>$libdir</literal> or through the dynamic library path.
1923 This simplifies version upgrades if the new installation is at a
1924 different location. The actual directory that
1925 <literal>$libdir</literal> stands for can be found out with the
1926 command <literal>pg_config --pkglibdir</literal>.
1927 </para>
1929 <para>
1930 The user ID the <productname>PostgreSQL</productname> server runs
1931 as must be able to traverse the path to the file you intend to
1932 load. Making the file or a higher-level directory not readable
1933 and/or not executable by the <systemitem>postgres</systemitem>
1934 user is a common mistake.
1935 </para>
1937 <para>
1938 In any case, the file name that is given in the
1939 <command>CREATE FUNCTION</command> command is recorded literally
1940 in the system catalogs, so if the file needs to be loaded again
1941 the same procedure is applied.
1942 </para>
1944 <note>
1945 <para>
1946 <productname>PostgreSQL</productname> will not compile a C function
1947 automatically. The object file must be compiled before it is referenced
1948 in a <command>CREATE
1949 FUNCTION</command> command. See <xref linkend="dfunc"/> for additional
1950 information.
1951 </para>
1952 </note>
1954 <indexterm zone="xfunc-c-dynload">
1955 <primary>magic block</primary>
1956 </indexterm>
1958 <para>
1959 To ensure that a dynamically loaded object file is not loaded into an
1960 incompatible server, <productname>PostgreSQL</productname> checks that the
1961 file contains a <quote>magic block</quote> with the appropriate contents.
1962 This allows the server to detect obvious incompatibilities, such as code
1963 compiled for a different major version of
1964 <productname>PostgreSQL</productname>. To include a magic block,
1965 write this in one (and only one) of the module source files, after having
1966 included the header <filename>fmgr.h</filename>:
1968 <programlisting>
1969 PG_MODULE_MAGIC;
1970 </programlisting>
1971 </para>
1973 <para>
1974 After it is used for the first time, a dynamically loaded object
1975 file is retained in memory. Future calls in the same session to
1976 the function(s) in that file will only incur the small overhead of
1977 a symbol table lookup. If you need to force a reload of an object
1978 file, for example after recompiling it, begin a fresh session.
1979 </para>
1981 <indexterm zone="xfunc-c-dynload">
1982 <primary>_PG_init</primary>
1983 </indexterm>
1984 <indexterm zone="xfunc-c-dynload">
1985 <primary>library initialization function</primary>
1986 </indexterm>
1988 <para>
1989 Optionally, a dynamically loaded file can contain an initialization
1990 function. If the file includes a function named
1991 <function>_PG_init</function>, that function will be called immediately after
1992 loading the file. The function receives no parameters and should
1993 return void. There is presently no way to unload a dynamically loaded file.
1994 </para>
1996 </sect2>
1998 <sect2 id="xfunc-c-basetype">
1999 <title>Base Types in C-Language Functions</title>
2001 <indexterm zone="xfunc-c-basetype">
2002 <primary>data type</primary>
2003 <secondary>internal organization</secondary>
2004 </indexterm>
2006 <para>
2007 To know how to write C-language functions, you need to know how
2008 <productname>PostgreSQL</productname> internally represents base
2009 data types and how they can be passed to and from functions.
2010 Internally, <productname>PostgreSQL</productname> regards a base
2011 type as a <quote>blob of memory</quote>. The user-defined
2012 functions that you define over a type in turn define the way that
2013 <productname>PostgreSQL</productname> can operate on it. That
2014 is, <productname>PostgreSQL</productname> will only store and
2015 retrieve the data from disk and use your user-defined functions
2016 to input, process, and output the data.
2017 </para>
2019 <para>
2020 Base types can have one of three internal formats:
2022 <itemizedlist>
2023 <listitem>
2024 <para>
2025 pass by value, fixed-length
2026 </para>
2027 </listitem>
2028 <listitem>
2029 <para>
2030 pass by reference, fixed-length
2031 </para>
2032 </listitem>
2033 <listitem>
2034 <para>
2035 pass by reference, variable-length
2036 </para>
2037 </listitem>
2038 </itemizedlist>
2039 </para>
2041 <para>
2042 By-value types can only be 1, 2, or 4 bytes in length
2043 (also 8 bytes, if <literal>sizeof(Datum)</literal> is 8 on your machine).
2044 You should be careful to define your types such that they will be the
2045 same size (in bytes) on all architectures. For example, the
2046 <literal>long</literal> type is dangerous because it is 4 bytes on some
2047 machines and 8 bytes on others, whereas <type>int</type> type is 4 bytes
2048 on most Unix machines. A reasonable implementation of the
2049 <type>int4</type> type on Unix machines might be:
2051 <programlisting>
2052 /* 4-byte integer, passed by value */
2053 typedef int int4;
2054 </programlisting>
2056 (The actual PostgreSQL C code calls this type <type>int32</type>, because
2057 it is a convention in C that <type>int<replaceable>XX</replaceable></type>
2058 means <replaceable>XX</replaceable> <emphasis>bits</emphasis>. Note
2059 therefore also that the C type <type>int8</type> is 1 byte in size. The
2060 SQL type <type>int8</type> is called <type>int64</type> in C. See also
2061 <xref linkend="xfunc-c-type-table"/>.)
2062 </para>
2064 <para>
2065 On the other hand, fixed-length types of any size can
2066 be passed by-reference. For example, here is a sample
2067 implementation of a <productname>PostgreSQL</productname> type:
2069 <programlisting>
2070 /* 16-byte structure, passed by reference */
2071 typedef struct
2073 double x, y;
2074 } Point;
2075 </programlisting>
2077 Only pointers to such types can be used when passing
2078 them in and out of <productname>PostgreSQL</productname> functions.
2079 To return a value of such a type, allocate the right amount of
2080 memory with <literal>palloc</literal>, fill in the allocated memory,
2081 and return a pointer to it. (Also, if you just want to return the
2082 same value as one of your input arguments that's of the same data type,
2083 you can skip the extra <literal>palloc</literal> and just return the
2084 pointer to the input value.)
2085 </para>
2087 <para>
2088 Finally, all variable-length types must also be passed
2089 by reference. All variable-length types must begin
2090 with an opaque length field of exactly 4 bytes, which will be set
2091 by <symbol>SET_VARSIZE</symbol>; never set this field directly! All data to
2092 be stored within that type must be located in the memory
2093 immediately following that length field. The
2094 length field contains the total length of the structure,
2095 that is, it includes the size of the length field
2096 itself.
2097 </para>
2099 <para>
2100 Another important point is to avoid leaving any uninitialized bits
2101 within data type values; for example, take care to zero out any
2102 alignment padding bytes that might be present in structs. Without
2103 this, logically-equivalent constants of your data type might be
2104 seen as unequal by the planner, leading to inefficient (though not
2105 incorrect) plans.
2106 </para>
2108 <warning>
2109 <para>
2110 <emphasis>Never</emphasis> modify the contents of a pass-by-reference input
2111 value. If you do so you are likely to corrupt on-disk data, since
2112 the pointer you are given might point directly into a disk buffer.
2113 The sole exception to this rule is explained in
2114 <xref linkend="xaggr"/>.
2115 </para>
2116 </warning>
2118 <para>
2119 As an example, we can define the type <type>text</type> as
2120 follows:
2122 <programlisting>
2123 typedef struct {
2124 int32 length;
2125 char data[FLEXIBLE_ARRAY_MEMBER];
2126 } text;
2127 </programlisting>
2129 The <literal>[FLEXIBLE_ARRAY_MEMBER]</literal> notation means that the actual
2130 length of the data part is not specified by this declaration.
2131 </para>
2133 <para>
2134 When manipulating
2135 variable-length types, we must be careful to allocate
2136 the correct amount of memory and set the length field correctly.
2137 For example, if we wanted to store 40 bytes in a <structname>text</structname>
2138 structure, we might use a code fragment like this:
2140 <programlisting><![CDATA[
2141 #include "postgres.h"
2143 char buffer[40]; /* our source data */
2145 text *destination = (text *) palloc(VARHDRSZ + 40);
2146 SET_VARSIZE(destination, VARHDRSZ + 40);
2147 memcpy(destination->data, buffer, 40);
2150 </programlisting>
2152 <literal>VARHDRSZ</literal> is the same as <literal>sizeof(int32)</literal>, but
2153 it's considered good style to use the macro <literal>VARHDRSZ</literal>
2154 to refer to the size of the overhead for a variable-length type.
2155 Also, the length field <emphasis>must</emphasis> be set using the
2156 <literal>SET_VARSIZE</literal> macro, not by simple assignment.
2157 </para>
2159 <para>
2160 <xref linkend="xfunc-c-type-table"/> shows the C types
2161 corresponding to many of the built-in SQL data types
2162 of <productname>PostgreSQL</productname>.
2163 The <quote>Defined In</quote> column gives the header file that
2164 needs to be included to get the type definition. (The actual
2165 definition might be in a different file that is included by the
2166 listed file. It is recommended that users stick to the defined
2167 interface.) Note that you should always include
2168 <filename>postgres.h</filename> first in any source file of server
2169 code, because it declares a number of things that you will need
2170 anyway, and because including other headers first can cause
2171 portability issues.
2172 </para>
2174 <table tocentry="1" id="xfunc-c-type-table">
2175 <title>Equivalent C Types for Built-in SQL Types</title>
2176 <tgroup cols="3">
2177 <colspec colname="col1" colwidth="1*"/>
2178 <colspec colname="col2" colwidth="1*"/>
2179 <colspec colname="col3" colwidth="2*"/>
2180 <thead>
2181 <row>
2182 <entry>
2183 SQL Type
2184 </entry>
2185 <entry>
2186 C Type
2187 </entry>
2188 <entry>
2189 Defined In
2190 </entry>
2191 </row>
2192 </thead>
2193 <tbody>
2194 <row>
2195 <entry><type>boolean</type></entry>
2196 <entry><type>bool</type></entry>
2197 <entry><filename>postgres.h</filename> (maybe compiler built-in)</entry>
2198 </row>
2199 <row>
2200 <entry><type>box</type></entry>
2201 <entry><type>BOX*</type></entry>
2202 <entry><filename>utils/geo_decls.h</filename></entry>
2203 </row>
2204 <row>
2205 <entry><type>bytea</type></entry>
2206 <entry><type>bytea*</type></entry>
2207 <entry><filename>postgres.h</filename></entry>
2208 </row>
2209 <row>
2210 <entry><type>"char"</type></entry>
2211 <entry><type>char</type></entry>
2212 <entry>(compiler built-in)</entry>
2213 </row>
2214 <row>
2215 <entry><type>character</type></entry>
2216 <entry><type>BpChar*</type></entry>
2217 <entry><filename>postgres.h</filename></entry>
2218 </row>
2219 <row>
2220 <entry><type>cid</type></entry>
2221 <entry><type>CommandId</type></entry>
2222 <entry><filename>postgres.h</filename></entry>
2223 </row>
2224 <row>
2225 <entry><type>date</type></entry>
2226 <entry><type>DateADT</type></entry>
2227 <entry><filename>utils/date.h</filename></entry>
2228 </row>
2229 <row>
2230 <entry><type>float4</type> (<type>real</type>)</entry>
2231 <entry><type>float4</type></entry>
2232 <entry><filename>postgres.h</filename></entry>
2233 </row>
2234 <row>
2235 <entry><type>float8</type> (<type>double precision</type>)</entry>
2236 <entry><type>float8</type></entry>
2237 <entry><filename>postgres.h</filename></entry>
2238 </row>
2239 <row>
2240 <entry><type>int2</type> (<type>smallint</type>)</entry>
2241 <entry><type>int16</type></entry>
2242 <entry><filename>postgres.h</filename></entry>
2243 </row>
2244 <row>
2245 <entry><type>int4</type> (<type>integer</type>)</entry>
2246 <entry><type>int32</type></entry>
2247 <entry><filename>postgres.h</filename></entry>
2248 </row>
2249 <row>
2250 <entry><type>int8</type> (<type>bigint</type>)</entry>
2251 <entry><type>int64</type></entry>
2252 <entry><filename>postgres.h</filename></entry>
2253 </row>
2254 <row>
2255 <entry><type>interval</type></entry>
2256 <entry><type>Interval*</type></entry>
2257 <entry><filename>datatype/timestamp.h</filename></entry>
2258 </row>
2259 <row>
2260 <entry><type>lseg</type></entry>
2261 <entry><type>LSEG*</type></entry>
2262 <entry><filename>utils/geo_decls.h</filename></entry>
2263 </row>
2264 <row>
2265 <entry><type>name</type></entry>
2266 <entry><type>Name</type></entry>
2267 <entry><filename>postgres.h</filename></entry>
2268 </row>
2269 <row>
2270 <entry><type>numeric</type></entry>
2271 <entry><type>Numeric</type></entry>
2272 <entry><filename>utils/numeric.h</filename></entry>
2273 </row>
2274 <row>
2275 <entry><type>oid</type></entry>
2276 <entry><type>Oid</type></entry>
2277 <entry><filename>postgres.h</filename></entry>
2278 </row>
2279 <row>
2280 <entry><type>oidvector</type></entry>
2281 <entry><type>oidvector*</type></entry>
2282 <entry><filename>postgres.h</filename></entry>
2283 </row>
2284 <row>
2285 <entry><type>path</type></entry>
2286 <entry><type>PATH*</type></entry>
2287 <entry><filename>utils/geo_decls.h</filename></entry>
2288 </row>
2289 <row>
2290 <entry><type>point</type></entry>
2291 <entry><type>POINT*</type></entry>
2292 <entry><filename>utils/geo_decls.h</filename></entry>
2293 </row>
2294 <row>
2295 <entry><type>regproc</type></entry>
2296 <entry><type>RegProcedure</type></entry>
2297 <entry><filename>postgres.h</filename></entry>
2298 </row>
2299 <row>
2300 <entry><type>text</type></entry>
2301 <entry><type>text*</type></entry>
2302 <entry><filename>postgres.h</filename></entry>
2303 </row>
2304 <row>
2305 <entry><type>tid</type></entry>
2306 <entry><type>ItemPointer</type></entry>
2307 <entry><filename>storage/itemptr.h</filename></entry>
2308 </row>
2309 <row>
2310 <entry><type>time</type></entry>
2311 <entry><type>TimeADT</type></entry>
2312 <entry><filename>utils/date.h</filename></entry>
2313 </row>
2314 <row>
2315 <entry><type>time with time zone</type></entry>
2316 <entry><type>TimeTzADT</type></entry>
2317 <entry><filename>utils/date.h</filename></entry>
2318 </row>
2319 <row>
2320 <entry><type>timestamp</type></entry>
2321 <entry><type>Timestamp</type></entry>
2322 <entry><filename>datatype/timestamp.h</filename></entry>
2323 </row>
2324 <row>
2325 <entry><type>timestamp with time zone</type></entry>
2326 <entry><type>TimestampTz</type></entry>
2327 <entry><filename>datatype/timestamp.h</filename></entry>
2328 </row>
2329 <row>
2330 <entry><type>varchar</type></entry>
2331 <entry><type>VarChar*</type></entry>
2332 <entry><filename>postgres.h</filename></entry>
2333 </row>
2334 <row>
2335 <entry><type>xid</type></entry>
2336 <entry><type>TransactionId</type></entry>
2337 <entry><filename>postgres.h</filename></entry>
2338 </row>
2339 </tbody>
2340 </tgroup>
2341 </table>
2343 <para>
2344 Now that we've gone over all of the possible structures
2345 for base types, we can show some examples of real functions.
2346 </para>
2347 </sect2>
2349 <sect2 id="xfunc-c-v1-call-conv">
2350 <title>Version 1 Calling Conventions</title>
2352 <para>
2353 The version-1 calling convention relies on macros to suppress most
2354 of the complexity of passing arguments and results. The C declaration
2355 of a version-1 function is always:
2356 <programlisting>
2357 Datum funcname(PG_FUNCTION_ARGS)
2358 </programlisting>
2359 In addition, the macro call:
2360 <programlisting>
2361 PG_FUNCTION_INFO_V1(funcname);
2362 </programlisting>
2363 must appear in the same source file. (Conventionally, it's
2364 written just before the function itself.) This macro call is not
2365 needed for <literal>internal</literal>-language functions, since
2366 <productname>PostgreSQL</productname> assumes that all internal functions
2367 use the version-1 convention. It is, however, required for
2368 dynamically-loaded functions.
2369 </para>
2371 <para>
2372 In a version-1 function, each actual argument is fetched using a
2373 <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2374 macro that corresponds to the argument's data type. (In non-strict
2375 functions there needs to be a previous check about argument null-ness
2376 using <function>PG_ARGISNULL()</function>; see below.)
2377 The result is returned using a
2378 <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2379 macro for the return type.
2380 <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2381 takes as its argument the number of the function argument to
2382 fetch, where the count starts at 0.
2383 <function>PG_RETURN_<replaceable>xxx</replaceable>()</function>
2384 takes as its argument the actual value to return.
2385 </para>
2387 <para>
2388 Here are some examples using the version-1 calling convention:
2389 </para>
2391 <programlisting><![CDATA[
2392 #include "postgres.h"
2393 #include <string.h>
2394 #include "fmgr.h"
2395 #include "utils/geo_decls.h"
2396 #include "varatt.h"
2398 PG_MODULE_MAGIC;
2400 /* by value */
2402 PG_FUNCTION_INFO_V1(add_one);
2404 Datum
2405 add_one(PG_FUNCTION_ARGS)
2407 int32 arg = PG_GETARG_INT32(0);
2409 PG_RETURN_INT32(arg + 1);
2412 /* by reference, fixed length */
2414 PG_FUNCTION_INFO_V1(add_one_float8);
2416 Datum
2417 add_one_float8(PG_FUNCTION_ARGS)
2419 /* The macros for FLOAT8 hide its pass-by-reference nature. */
2420 float8 arg = PG_GETARG_FLOAT8(0);
2422 PG_RETURN_FLOAT8(arg + 1.0);
2425 PG_FUNCTION_INFO_V1(makepoint);
2427 Datum
2428 makepoint(PG_FUNCTION_ARGS)
2430 /* Here, the pass-by-reference nature of Point is not hidden. */
2431 Point *pointx = PG_GETARG_POINT_P(0);
2432 Point *pointy = PG_GETARG_POINT_P(1);
2433 Point *new_point = (Point *) palloc(sizeof(Point));
2435 new_point->x = pointx->x;
2436 new_point->y = pointy->y;
2438 PG_RETURN_POINT_P(new_point);
2441 /* by reference, variable length */
2443 PG_FUNCTION_INFO_V1(copytext);
2445 Datum
2446 copytext(PG_FUNCTION_ARGS)
2448 text *t = PG_GETARG_TEXT_PP(0);
2451 * VARSIZE_ANY_EXHDR is the size of the struct in bytes, minus the
2452 * VARHDRSZ or VARHDRSZ_SHORT of its header. Construct the copy with a
2453 * full-length header.
2455 text *new_t = (text *) palloc(VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
2456 SET_VARSIZE(new_t, VARSIZE_ANY_EXHDR(t) + VARHDRSZ);
2459 * VARDATA is a pointer to the data region of the new struct. The source
2460 * could be a short datum, so retrieve its data through VARDATA_ANY.
2462 memcpy(VARDATA(new_t), /* destination */
2463 VARDATA_ANY(t), /* source */
2464 VARSIZE_ANY_EXHDR(t)); /* how many bytes */
2465 PG_RETURN_TEXT_P(new_t);
2468 PG_FUNCTION_INFO_V1(concat_text);
2470 Datum
2471 concat_text(PG_FUNCTION_ARGS)
2473 text *arg1 = PG_GETARG_TEXT_PP(0);
2474 text *arg2 = PG_GETARG_TEXT_PP(1);
2475 int32 arg1_size = VARSIZE_ANY_EXHDR(arg1);
2476 int32 arg2_size = VARSIZE_ANY_EXHDR(arg2);
2477 int32 new_text_size = arg1_size + arg2_size + VARHDRSZ;
2478 text *new_text = (text *) palloc(new_text_size);
2480 SET_VARSIZE(new_text, new_text_size);
2481 memcpy(VARDATA(new_text), VARDATA_ANY(arg1), arg1_size);
2482 memcpy(VARDATA(new_text) + arg1_size, VARDATA_ANY(arg2), arg2_size);
2483 PG_RETURN_TEXT_P(new_text);
2486 </programlisting>
2488 <para>
2489 Supposing that the above code has been prepared in file
2490 <filename>funcs.c</filename> and compiled into a shared object,
2491 we could define the functions to <productname>PostgreSQL</productname>
2492 with commands like this:
2493 </para>
2495 <programlisting>
2496 CREATE FUNCTION add_one(integer) RETURNS integer
2497 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one'
2498 LANGUAGE C STRICT;
2500 -- note overloading of SQL function name "add_one"
2501 CREATE FUNCTION add_one(double precision) RETURNS double precision
2502 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'add_one_float8'
2503 LANGUAGE C STRICT;
2505 CREATE FUNCTION makepoint(point, point) RETURNS point
2506 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'makepoint'
2507 LANGUAGE C STRICT;
2509 CREATE FUNCTION copytext(text) RETURNS text
2510 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'copytext'
2511 LANGUAGE C STRICT;
2513 CREATE FUNCTION concat_text(text, text) RETURNS text
2514 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'concat_text'
2515 LANGUAGE C STRICT;
2516 </programlisting>
2518 <para>
2519 Here, <replaceable>DIRECTORY</replaceable> stands for the
2520 directory of the shared library file (for instance the
2521 <productname>PostgreSQL</productname> tutorial directory, which
2522 contains the code for the examples used in this section).
2523 (Better style would be to use just <literal>'funcs'</literal> in the
2524 <literal>AS</literal> clause, after having added
2525 <replaceable>DIRECTORY</replaceable> to the search path. In any
2526 case, we can omit the system-specific extension for a shared
2527 library, commonly <literal>.so</literal>.)
2528 </para>
2530 <para>
2531 Notice that we have specified the functions as <quote>strict</quote>,
2532 meaning that
2533 the system should automatically assume a null result if any input
2534 value is null. By doing this, we avoid having to check for null inputs
2535 in the function code. Without this, we'd have to check for null values
2536 explicitly, using <function>PG_ARGISNULL()</function>.
2537 </para>
2539 <para>
2540 The macro <function>PG_ARGISNULL(<replaceable>n</replaceable>)</function>
2541 allows a function to test whether each input is null. (Of course, doing
2542 this is only necessary in functions not declared <quote>strict</quote>.)
2543 As with the
2544 <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> macros,
2545 the input arguments are counted beginning at zero. Note that one
2546 should refrain from executing
2547 <function>PG_GETARG_<replaceable>xxx</replaceable>()</function> until
2548 one has verified that the argument isn't null.
2549 To return a null result, execute <function>PG_RETURN_NULL()</function>;
2550 this works in both strict and nonstrict functions.
2551 </para>
2553 <para>
2554 At first glance, the version-1 coding conventions might appear
2555 to be just pointless obscurantism, compared to using
2556 plain <literal>C</literal> calling conventions. They do however allow
2557 us to deal with <literal>NULL</literal>able arguments/return values,
2558 and <quote>toasted</quote> (compressed or out-of-line) values.
2559 </para>
2561 <para>
2562 Other options provided by the version-1 interface are two
2563 variants of the
2564 <function>PG_GETARG_<replaceable>xxx</replaceable>()</function>
2565 macros. The first of these,
2566 <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>,
2567 guarantees to return a copy of the specified argument that is
2568 safe for writing into. (The normal macros will sometimes return a
2569 pointer to a value that is physically stored in a table, which
2570 must not be written to. Using the
2571 <function>PG_GETARG_<replaceable>xxx</replaceable>_COPY()</function>
2572 macros guarantees a writable result.)
2573 The second variant consists of the
2574 <function>PG_GETARG_<replaceable>xxx</replaceable>_SLICE()</function>
2575 macros which take three arguments. The first is the number of the
2576 function argument (as above). The second and third are the offset and
2577 length of the segment to be returned. Offsets are counted from
2578 zero, and a negative length requests that the remainder of the
2579 value be returned. These macros provide more efficient access to
2580 parts of large values in the case where they have storage type
2581 <quote>external</quote>. (The storage type of a column can be specified using
2582 <literal>ALTER TABLE <replaceable>tablename</replaceable> ALTER
2583 COLUMN <replaceable>colname</replaceable> SET STORAGE
2584 <replaceable>storagetype</replaceable></literal>. <replaceable>storagetype</replaceable> is one of
2585 <literal>plain</literal>, <literal>external</literal>, <literal>extended</literal>,
2586 or <literal>main</literal>.)
2587 </para>
2589 <para>
2590 Finally, the version-1 function call conventions make it possible
2591 to return set results (<xref linkend="xfunc-c-return-set"/>) and
2592 implement trigger functions (<xref linkend="triggers"/>) and
2593 procedural-language call handlers (<xref
2594 linkend="plhandler"/>). For more details
2595 see <filename>src/backend/utils/fmgr/README</filename> in the
2596 source distribution.
2597 </para>
2598 </sect2>
2600 <sect2 id="xfunc-c-code">
2601 <title>Writing Code</title>
2603 <para>
2604 Before we turn to the more advanced topics, we should discuss
2605 some coding rules for <productname>PostgreSQL</productname>
2606 C-language functions. While it might be possible to load functions
2607 written in languages other than C into
2608 <productname>PostgreSQL</productname>, this is usually difficult
2609 (when it is possible at all) because other languages, such as
2610 C++, FORTRAN, or Pascal often do not follow the same calling
2611 convention as C. That is, other languages do not pass argument
2612 and return values between functions in the same way. For this
2613 reason, we will assume that your C-language functions are
2614 actually written in C.
2615 </para>
2617 <para>
2618 The basic rules for writing and building C functions are as follows:
2620 <itemizedlist>
2621 <listitem>
2622 <para>
2623 Use <literal>pg_config
2624 --includedir-server</literal><indexterm><primary>pg_config</primary><secondary>with user-defined C functions</secondary></indexterm>
2625 to find out where the <productname>PostgreSQL</productname> server header
2626 files are installed on your system (or the system that your
2627 users will be running on).
2628 </para>
2629 </listitem>
2631 <listitem>
2632 <para>
2633 Compiling and linking your code so that it can be dynamically
2634 loaded into <productname>PostgreSQL</productname> always
2635 requires special flags. See <xref linkend="dfunc"/> for a
2636 detailed explanation of how to do it for your particular
2637 operating system.
2638 </para>
2639 </listitem>
2641 <listitem>
2642 <para>
2643 Remember to define a <quote>magic block</quote> for your shared library,
2644 as described in <xref linkend="xfunc-c-dynload"/>.
2645 </para>
2646 </listitem>
2648 <listitem>
2649 <para>
2650 When allocating memory, use the
2651 <productname>PostgreSQL</productname> functions
2652 <function>palloc</function><indexterm><primary>palloc</primary></indexterm> and <function>pfree</function><indexterm><primary>pfree</primary></indexterm>
2653 instead of the corresponding C library functions
2654 <function>malloc</function> and <function>free</function>.
2655 The memory allocated by <function>palloc</function> will be
2656 freed automatically at the end of each transaction, preventing
2657 memory leaks.
2658 </para>
2659 </listitem>
2661 <listitem>
2662 <para>
2663 Always zero the bytes of your structures using <function>memset</function>
2664 (or allocate them with <function>palloc0</function> in the first place).
2665 Even if you assign to each field of your structure, there might be
2666 alignment padding (holes in the structure) that contain
2667 garbage values. Without this, it's difficult to
2668 support hash indexes or hash joins, as you must pick out only
2669 the significant bits of your data structure to compute a hash.
2670 The planner also sometimes relies on comparing constants via
2671 bitwise equality, so you can get undesirable planning results if
2672 logically-equivalent values aren't bitwise equal.
2673 </para>
2674 </listitem>
2676 <listitem>
2677 <para>
2678 Most of the internal <productname>PostgreSQL</productname>
2679 types are declared in <filename>postgres.h</filename>, while
2680 the function manager interfaces
2681 (<symbol>PG_FUNCTION_ARGS</symbol>, etc.) are in
2682 <filename>fmgr.h</filename>, so you will need to include at
2683 least these two files. For portability reasons it's best to
2684 include <filename>postgres.h</filename> <emphasis>first</emphasis>,
2685 before any other system or user header files. Including
2686 <filename>postgres.h</filename> will also include
2687 <filename>elog.h</filename> and <filename>palloc.h</filename>
2688 for you.
2689 </para>
2690 </listitem>
2692 <listitem>
2693 <para>
2694 Symbol names defined within object files must not conflict
2695 with each other or with symbols defined in the
2696 <productname>PostgreSQL</productname> server executable. You
2697 will have to rename your functions or variables if you get
2698 error messages to this effect.
2699 </para>
2700 </listitem>
2701 </itemizedlist>
2702 </para>
2703 </sect2>
2705 &dfunc;
2707 <sect2 id="xfunc-api-abi-stability-guidance">
2708 <title>Server API and ABI Stability Guidance</title>
2710 <para>
2711 This section contains guidance to authors of extensions and other server
2712 plugins about API and ABI stability in the
2713 <productname>PostgreSQL</productname> server.
2714 </para>
2716 <sect3 id="xfunc-guidance-general">
2717 <title>General</title>
2719 <para>
2720 The <productname>PostgreSQL</productname> server contains several
2721 well-demarcated APIs for server plugins, such as the function manager
2722 (<acronym>fmgr</acronym>, described in this chapter),
2723 <acronym>SPI</acronym> (<xref linkend="spi"/>), and various hooks
2724 specifically designed for extensions. These interfaces are carefully
2725 managed for long-term stability and compatibility. However, the entire
2726 set of global functions and variables in the server effectively
2727 constitutes the publicly usable API, and most of it was not designed
2728 with extensibility and long-term stability in mind.
2729 </para>
2731 <para>
2732 Therefore, while taking advantage of these interfaces is valid, the
2733 further one strays from the well-trodden path, the likelier it will be
2734 that one might encounter API or ABI compatibility issues at some point.
2735 Extension authors are encouraged to provide feedback about their
2736 requirements, so that over time, as new use patterns arise, certain
2737 interfaces can be considered more stabilized or new, better-designed
2738 interfaces can be added.
2739 </para>
2740 </sect3>
2742 <sect3 id="xfunc-guidance-api-compatibility">
2743 <title>API Compatibility</title>
2744 <para>
2745 The <acronym>API</acronym>, or application programming interface, is the
2746 interface used at compile time.
2747 </para>
2749 <sect4 id="xfunc-guidance-api-major-versions">
2750 <title>Major Versions</title>
2751 <para>
2752 There is <emphasis>no</emphasis> promise of API compatibility between
2753 <productname>PostgreSQL</productname> major versions. Extension code
2754 therefore might require source code changes to work with multiple major
2755 versions. These can usually be managed with preprocessor conditions
2756 such as <literal>#if PG_VERSION_NUM &gt;= 160000</literal>.
2757 Sophisticated extensions that use interfaces beyond the well-demarcated
2758 ones usually require a few such changes for each major server version.
2759 </para>
2760 </sect4>
2762 <sect4 id="xfunc-guidance-api-mninor-versions">
2763 <title>Minor Versions</title>
2764 <para>
2765 <productname>PostgreSQL</productname> makes an effort to avoid server
2766 API breaks in minor releases. In general, extension code that compiles
2767 and works with a minor release should also compile and work with any
2768 other minor release of the same major version, past or future.
2769 </para>
2771 <para>
2772 When a change <emphasis>is</emphasis> required, it will be carefully
2773 managed, taking the requirements of extensions into account. Such
2774 changes will be communicated in the release notes (<xref
2775 linkend="release"/>).
2776 </para>
2777 </sect4>
2778 </sect3>
2780 <sect3 id="xfunc-guidance-abi-compatibility">
2781 <title>ABI Compatibility</title>
2782 <para>
2783 The <acronym>ABI</acronym>, or application binary interface, is the
2784 interface used at run time.
2785 </para>
2787 <sect4 id="xfunc-guidance-abi-major-versions">
2788 <title>Major Versions</title>
2789 <para>
2790 Servers of different major versions have intentionally incompatible
2791 ABIs. Extensions that use server APIs must therefore be re-compiled for
2792 each major release. The inclusion of <literal>PG_MODULE_MAGIC</literal>
2793 (see <xref linkend="xfunc-c-dynload"/>) ensures that code compiled for
2794 one major version will be rejected by other major versions.
2795 </para>
2796 </sect4>
2798 <sect4 id="xfunc-guidance-abi-mninor-versions">
2799 <title>Minor Versions</title>
2800 <para>
2801 <productname>PostgreSQL</productname> makes an effort to avoid server
2802 ABI breaks in minor releases. In general, an extension compiled against
2803 any minor release should work with any other minor release of the same
2804 major version, past or future.
2805 </para>
2807 <para>
2808 When a change <emphasis>is</emphasis> required,
2809 <productname>PostgreSQL</productname> will choose the least invasive
2810 change possible, for example by squeezing a new field into padding
2811 space or appending it to the end of a struct. These sorts of changes
2812 should not impact extensions unless they use very unusual code
2813 patterns.
2814 </para>
2816 <para>
2817 In rare cases, however, even such non-invasive changes may be
2818 impractical or impossible. In such an event, the change will be
2819 carefully managed, taking the requirements of extensions into account.
2820 Such changes will also be documented in the release notes (<xref
2821 linkend="release"/>).
2822 </para>
2824 <para>
2825 Note, however, that many parts of the server are not designed or
2826 maintained as publicly-consumable APIs (and that, in most cases, the
2827 actual boundary is also not well-defined). If urgent needs arise,
2828 changes in those parts will naturally be made with less consideration
2829 for extension code than changes in well-defined and widely used
2830 interfaces.
2831 </para>
2833 <para>
2834 Also, in the absence of automated detection of such changes, this is
2835 not a guarantee, but historically such breaking changes have been
2836 extremely rare.
2837 </para>
2839 </sect4>
2840 </sect3>
2841 </sect2>
2843 <sect2 id="xfunc-c-composite-type-args">
2844 <title>Composite-Type Arguments</title>
2846 <para>
2847 Composite types do not have a fixed layout like C structures.
2848 Instances of a composite type can contain null fields. In
2849 addition, composite types that are part of an inheritance
2850 hierarchy can have different fields than other members of the
2851 same inheritance hierarchy. Therefore,
2852 <productname>PostgreSQL</productname> provides a function
2853 interface for accessing fields of composite types from C.
2854 </para>
2856 <para>
2857 Suppose we want to write a function to answer the query:
2859 <programlisting>
2860 SELECT name, c_overpaid(emp, 1500) AS overpaid
2861 FROM emp
2862 WHERE name = 'Bill' OR name = 'Sam';
2863 </programlisting>
2865 Using the version-1 calling conventions, we can define
2866 <function>c_overpaid</function> as:
2868 <programlisting><![CDATA[
2869 #include "postgres.h"
2870 #include "executor/executor.h" /* for GetAttributeByName() */
2872 PG_MODULE_MAGIC;
2874 PG_FUNCTION_INFO_V1(c_overpaid);
2876 Datum
2877 c_overpaid(PG_FUNCTION_ARGS)
2879 HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
2880 int32 limit = PG_GETARG_INT32(1);
2881 bool isnull;
2882 Datum salary;
2884 salary = GetAttributeByName(t, "salary", &isnull);
2885 if (isnull)
2886 PG_RETURN_BOOL(false);
2887 /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
2889 PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
2892 </programlisting>
2893 </para>
2895 <para>
2896 <function>GetAttributeByName</function> is the
2897 <productname>PostgreSQL</productname> system function that
2898 returns attributes out of the specified row. It has
2899 three arguments: the argument of type <type>HeapTupleHeader</type> passed
2900 into
2901 the function, the name of the desired attribute, and a
2902 return parameter that tells whether the attribute
2903 is null. <function>GetAttributeByName</function> returns a <type>Datum</type>
2904 value that you can convert to the proper data type by using the
2905 appropriate <function>DatumGet<replaceable>XXX</replaceable>()</function>
2906 function. Note that the return value is meaningless if the null flag is
2907 set; always check the null flag before trying to do anything with the
2908 result.
2909 </para>
2911 <para>
2912 There is also <function>GetAttributeByNum</function>, which selects
2913 the target attribute by column number instead of name.
2914 </para>
2916 <para>
2917 The following command declares the function
2918 <function>c_overpaid</function> in SQL:
2920 <programlisting>
2921 CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
2922 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'c_overpaid'
2923 LANGUAGE C STRICT;
2924 </programlisting>
2926 Notice we have used <literal>STRICT</literal> so that we did not have to
2927 check whether the input arguments were NULL.
2928 </para>
2929 </sect2>
2931 <sect2 id="xfunc-c-returning-rows">
2932 <title>Returning Rows (Composite Types)</title>
2934 <para>
2935 To return a row or composite-type value from a C-language
2936 function, you can use a special API that provides macros and
2937 functions to hide most of the complexity of building composite
2938 data types. To use this API, the source file must include:
2939 <programlisting>
2940 #include "funcapi.h"
2941 </programlisting>
2942 </para>
2944 <para>
2945 There are two ways you can build a composite data value (henceforth
2946 a <quote>tuple</quote>): you can build it from an array of Datum values,
2947 or from an array of C strings that can be passed to the input
2948 conversion functions of the tuple's column data types. In either
2949 case, you first need to obtain or construct a <structname>TupleDesc</structname>
2950 descriptor for the tuple structure. When working with Datums, you
2951 pass the <structname>TupleDesc</structname> to <function>BlessTupleDesc</function>,
2952 and then call <function>heap_form_tuple</function> for each row. When working
2953 with C strings, you pass the <structname>TupleDesc</structname> to
2954 <function>TupleDescGetAttInMetadata</function>, and then call
2955 <function>BuildTupleFromCStrings</function> for each row. In the case of a
2956 function returning a set of tuples, the setup steps can all be done
2957 once during the first call of the function.
2958 </para>
2960 <para>
2961 Several helper functions are available for setting up the needed
2962 <structname>TupleDesc</structname>. The recommended way to do this in most
2963 functions returning composite values is to call:
2964 <programlisting>
2965 TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
2966 Oid *resultTypeId,
2967 TupleDesc *resultTupleDesc)
2968 </programlisting>
2969 passing the same <literal>fcinfo</literal> struct passed to the calling function
2970 itself. (This of course requires that you use the version-1
2971 calling conventions.) <varname>resultTypeId</varname> can be specified
2972 as <literal>NULL</literal> or as the address of a local variable to receive the
2973 function's result type OID. <varname>resultTupleDesc</varname> should be the
2974 address of a local <structname>TupleDesc</structname> variable. Check that the
2975 result is <literal>TYPEFUNC_COMPOSITE</literal>; if so,
2976 <varname>resultTupleDesc</varname> has been filled with the needed
2977 <structname>TupleDesc</structname>. (If it is not, you can report an error along
2978 the lines of <quote>function returning record called in context that
2979 cannot accept type record</quote>.)
2980 </para>
2982 <tip>
2983 <para>
2984 <function>get_call_result_type</function> can resolve the actual type of a
2985 polymorphic function result; so it is useful in functions that return
2986 scalar polymorphic results, not only functions that return composites.
2987 The <varname>resultTypeId</varname> output is primarily useful for functions
2988 returning polymorphic scalars.
2989 </para>
2990 </tip>
2992 <note>
2993 <para>
2994 <function>get_call_result_type</function> has a sibling
2995 <function>get_expr_result_type</function>, which can be used to resolve the
2996 expected output type for a function call represented by an expression
2997 tree. This can be used when trying to determine the result type from
2998 outside the function itself. There is also
2999 <function>get_func_result_type</function>, which can be used when only the
3000 function's OID is available. However these functions are not able
3001 to deal with functions declared to return <structname>record</structname>, and
3002 <function>get_func_result_type</function> cannot resolve polymorphic types,
3003 so you should preferentially use <function>get_call_result_type</function>.
3004 </para>
3005 </note>
3007 <para>
3008 Older, now-deprecated functions for obtaining
3009 <structname>TupleDesc</structname>s are:
3010 <programlisting>
3011 TupleDesc RelationNameGetTupleDesc(const char *relname)
3012 </programlisting>
3013 to get a <structname>TupleDesc</structname> for the row type of a named relation,
3014 and:
3015 <programlisting>
3016 TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)
3017 </programlisting>
3018 to get a <structname>TupleDesc</structname> based on a type OID. This can
3019 be used to get a <structname>TupleDesc</structname> for a base or
3020 composite type. It will not work for a function that returns
3021 <structname>record</structname>, however, and it cannot resolve polymorphic
3022 types.
3023 </para>
3025 <para>
3026 Once you have a <structname>TupleDesc</structname>, call:
3027 <programlisting>
3028 TupleDesc BlessTupleDesc(TupleDesc tupdesc)
3029 </programlisting>
3030 if you plan to work with Datums, or:
3031 <programlisting>
3032 AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)
3033 </programlisting>
3034 if you plan to work with C strings. If you are writing a function
3035 returning set, you can save the results of these functions in the
3036 <structname>FuncCallContext</structname> structure &mdash; use the
3037 <structfield>tuple_desc</structfield> or <structfield>attinmeta</structfield> field
3038 respectively.
3039 </para>
3041 <para>
3042 When working with Datums, use:
3043 <programlisting>
3044 HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)
3045 </programlisting>
3046 to build a <structname>HeapTuple</structname> given user data in Datum form.
3047 </para>
3049 <para>
3050 When working with C strings, use:
3051 <programlisting>
3052 HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)
3053 </programlisting>
3054 to build a <structname>HeapTuple</structname> given user data
3055 in C string form. <parameter>values</parameter> is an array of C strings,
3056 one for each attribute of the return row. Each C string should be in
3057 the form expected by the input function of the attribute data
3058 type. In order to return a null value for one of the attributes,
3059 the corresponding pointer in the <parameter>values</parameter> array
3060 should be set to <symbol>NULL</symbol>. This function will need to
3061 be called again for each row you return.
3062 </para>
3064 <para>
3065 Once you have built a tuple to return from your function, it
3066 must be converted into a <type>Datum</type>. Use:
3067 <programlisting>
3068 HeapTupleGetDatum(HeapTuple tuple)
3069 </programlisting>
3070 to convert a <structname>HeapTuple</structname> into a valid Datum. This
3071 <type>Datum</type> can be returned directly if you intend to return
3072 just a single row, or it can be used as the current return value
3073 in a set-returning function.
3074 </para>
3076 <para>
3077 An example appears in the next section.
3078 </para>
3080 </sect2>
3082 <sect2 id="xfunc-c-return-set">
3083 <title>Returning Sets</title>
3085 <para>
3086 C-language functions have two options for returning sets (multiple
3087 rows). In one method, called <firstterm>ValuePerCall</firstterm>
3088 mode, a set-returning function is called repeatedly (passing the same
3089 arguments each time) and it returns one new row on each call, until
3090 it has no more rows to return and signals that by returning NULL.
3091 The set-returning function (<acronym>SRF</acronym>) must therefore
3092 save enough state across calls to remember what it was doing and
3093 return the correct next item on each call.
3094 In the other method, called <firstterm>Materialize</firstterm> mode,
3095 an SRF fills and returns a tuplestore object containing its
3096 entire result; then only one call occurs for the whole result, and
3097 no inter-call state is needed.
3098 </para>
3100 <para>
3101 When using ValuePerCall mode, it is important to remember that the
3102 query is not guaranteed to be run to completion; that is, due to
3103 options such as <literal>LIMIT</literal>, the executor might stop
3104 making calls to the set-returning function before all rows have been
3105 fetched. This means it is not safe to perform cleanup activities in
3106 the last call, because that might not ever happen. It's recommended
3107 to use Materialize mode for functions that need access to external
3108 resources, such as file descriptors.
3109 </para>
3111 <para>
3112 The remainder of this section documents a set of helper macros that
3113 are commonly used (though not required to be used) for SRFs using
3114 ValuePerCall mode. Additional details about Materialize mode can be
3115 found in <filename>src/backend/utils/fmgr/README</filename>. Also,
3116 the <filename>contrib</filename> modules in
3117 the <productname>PostgreSQL</productname> source distribution contain
3118 many examples of SRFs using both ValuePerCall and Materialize mode.
3119 </para>
3121 <para>
3122 To use the ValuePerCall support macros described here,
3123 include <filename>funcapi.h</filename>. These macros work with a
3124 structure <structname>FuncCallContext</structname> that contains the
3125 state that needs to be saved across calls. Within the calling
3126 SRF, <literal>fcinfo-&gt;flinfo-&gt;fn_extra</literal> is used to
3127 hold a pointer to <structname>FuncCallContext</structname> across
3128 calls. The macros automatically fill that field on first use,
3129 and expect to find the same pointer there on subsequent uses.
3130 <programlisting>
3131 typedef struct FuncCallContext
3134 * Number of times we've been called before
3136 * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
3137 * incremented for you every time SRF_RETURN_NEXT() is called.
3139 uint64 call_cntr;
3142 * OPTIONAL maximum number of calls
3144 * max_calls is here for convenience only and setting it is optional.
3145 * If not set, you must provide alternative means to know when the
3146 * function is done.
3148 uint64 max_calls;
3151 * OPTIONAL pointer to miscellaneous user-provided context information
3153 * user_fctx is for use as a pointer to your own data to retain
3154 * arbitrary context information between calls of your function.
3156 void *user_fctx;
3159 * OPTIONAL pointer to struct containing attribute type input metadata
3161 * attinmeta is for use when returning tuples (i.e., composite data types)
3162 * and is not used when returning base data types. It is only needed
3163 * if you intend to use BuildTupleFromCStrings() to create the return
3164 * tuple.
3166 AttInMetadata *attinmeta;
3169 * memory context used for structures that must live for multiple calls
3171 * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
3172 * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
3173 * context for any memory that is to be reused across multiple calls
3174 * of the SRF.
3176 MemoryContext multi_call_memory_ctx;
3179 * OPTIONAL pointer to struct containing tuple description
3181 * tuple_desc is for use when returning tuples (i.e., composite data types)
3182 * and is only needed if you are going to build the tuples with
3183 * heap_form_tuple() rather than with BuildTupleFromCStrings(). Note that
3184 * the TupleDesc pointer stored here should usually have been run through
3185 * BlessTupleDesc() first.
3187 TupleDesc tuple_desc;
3189 } FuncCallContext;
3190 </programlisting>
3191 </para>
3193 <para>
3194 The macros to be used by an <acronym>SRF</acronym> using this
3195 infrastructure are:
3196 <programlisting>
3197 SRF_IS_FIRSTCALL()
3198 </programlisting>
3199 Use this to determine if your function is being called for the first or a
3200 subsequent time. On the first call (only), call:
3201 <programlisting>
3202 SRF_FIRSTCALL_INIT()
3203 </programlisting>
3204 to initialize the <structname>FuncCallContext</structname>. On every function call,
3205 including the first, call:
3206 <programlisting>
3207 SRF_PERCALL_SETUP()
3208 </programlisting>
3209 to set up for using the <structname>FuncCallContext</structname>.
3210 </para>
3212 <para>
3213 If your function has data to return in the current call, use:
3214 <programlisting>
3215 SRF_RETURN_NEXT(funcctx, result)
3216 </programlisting>
3217 to return it to the caller. (<literal>result</literal> must be of type
3218 <type>Datum</type>, either a single value or a tuple prepared as
3219 described above.) Finally, when your function is finished
3220 returning data, use:
3221 <programlisting>
3222 SRF_RETURN_DONE(funcctx)
3223 </programlisting>
3224 to clean up and end the <acronym>SRF</acronym>.
3225 </para>
3227 <para>
3228 The memory context that is current when the <acronym>SRF</acronym> is called is
3229 a transient context that will be cleared between calls. This means
3230 that you do not need to call <function>pfree</function> on everything
3231 you allocated using <function>palloc</function>; it will go away anyway. However, if you want to allocate
3232 any data structures to live across calls, you need to put them somewhere
3233 else. The memory context referenced by
3234 <structfield>multi_call_memory_ctx</structfield> is a suitable location for any
3235 data that needs to survive until the <acronym>SRF</acronym> is finished running. In most
3236 cases, this means that you should switch into
3237 <structfield>multi_call_memory_ctx</structfield> while doing the
3238 first-call setup.
3239 Use <literal>funcctx-&gt;user_fctx</literal> to hold a pointer to
3240 any such cross-call data structures.
3241 (Data you allocate
3242 in <structfield>multi_call_memory_ctx</structfield> will go away
3243 automatically when the query ends, so it is not necessary to free
3244 that data manually, either.)
3245 </para>
3247 <warning>
3248 <para>
3249 While the actual arguments to the function remain unchanged between
3250 calls, if you detoast the argument values (which is normally done
3251 transparently by the
3252 <function>PG_GETARG_<replaceable>xxx</replaceable></function> macro)
3253 in the transient context then the detoasted copies will be freed on
3254 each cycle. Accordingly, if you keep references to such values in
3255 your <structfield>user_fctx</structfield>, you must either copy them into the
3256 <structfield>multi_call_memory_ctx</structfield> after detoasting, or ensure
3257 that you detoast the values only in that context.
3258 </para>
3259 </warning>
3261 <para>
3262 A complete pseudo-code example looks like the following:
3263 <programlisting>
3264 Datum
3265 my_set_returning_function(PG_FUNCTION_ARGS)
3267 FuncCallContext *funcctx;
3268 Datum result;
3269 <replaceable>further declarations as needed</replaceable>
3271 if (SRF_IS_FIRSTCALL())
3273 MemoryContext oldcontext;
3275 funcctx = SRF_FIRSTCALL_INIT();
3276 oldcontext = MemoryContextSwitchTo(funcctx-&gt;multi_call_memory_ctx);
3277 /* One-time setup code appears here: */
3278 <replaceable>user code</replaceable>
3279 <replaceable>if returning composite</replaceable>
3280 <replaceable>build TupleDesc, and perhaps AttInMetadata</replaceable>
3281 <replaceable>endif returning composite</replaceable>
3282 <replaceable>user code</replaceable>
3283 MemoryContextSwitchTo(oldcontext);
3286 /* Each-time setup code appears here: */
3287 <replaceable>user code</replaceable>
3288 funcctx = SRF_PERCALL_SETUP();
3289 <replaceable>user code</replaceable>
3291 /* this is just one way we might test whether we are done: */
3292 if (funcctx-&gt;call_cntr &lt; funcctx-&gt;max_calls)
3294 /* Here we want to return another item: */
3295 <replaceable>user code</replaceable>
3296 <replaceable>obtain result Datum</replaceable>
3297 SRF_RETURN_NEXT(funcctx, result);
3299 else
3301 /* Here we are done returning items, so just report that fact. */
3302 /* (Resist the temptation to put cleanup code here.) */
3303 SRF_RETURN_DONE(funcctx);
3306 </programlisting>
3307 </para>
3309 <para>
3310 A complete example of a simple <acronym>SRF</acronym> returning a composite type
3311 looks like:
3312 <programlisting><![CDATA[
3313 PG_FUNCTION_INFO_V1(retcomposite);
3315 Datum
3316 retcomposite(PG_FUNCTION_ARGS)
3318 FuncCallContext *funcctx;
3319 int call_cntr;
3320 int max_calls;
3321 TupleDesc tupdesc;
3322 AttInMetadata *attinmeta;
3324 /* stuff done only on the first call of the function */
3325 if (SRF_IS_FIRSTCALL())
3327 MemoryContext oldcontext;
3329 /* create a function context for cross-call persistence */
3330 funcctx = SRF_FIRSTCALL_INIT();
3332 /* switch to memory context appropriate for multiple function calls */
3333 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
3335 /* total number of tuples to be returned */
3336 funcctx->max_calls = PG_GETARG_INT32(0);
3338 /* Build a tuple descriptor for our result type */
3339 if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
3340 ereport(ERROR,
3341 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
3342 errmsg("function returning record called in context "
3343 "that cannot accept type record")));
3346 * generate attribute metadata needed later to produce tuples from raw
3347 * C strings
3349 attinmeta = TupleDescGetAttInMetadata(tupdesc);
3350 funcctx->attinmeta = attinmeta;
3352 MemoryContextSwitchTo(oldcontext);
3355 /* stuff done on every call of the function */
3356 funcctx = SRF_PERCALL_SETUP();
3358 call_cntr = funcctx->call_cntr;
3359 max_calls = funcctx->max_calls;
3360 attinmeta = funcctx->attinmeta;
3362 if (call_cntr < max_calls) /* do when there is more left to send */
3364 char **values;
3365 HeapTuple tuple;
3366 Datum result;
3369 * Prepare a values array for building the returned tuple.
3370 * This should be an array of C strings which will
3371 * be processed later by the type input functions.
3373 values = (char **) palloc(3 * sizeof(char *));
3374 values[0] = (char *) palloc(16 * sizeof(char));
3375 values[1] = (char *) palloc(16 * sizeof(char));
3376 values[2] = (char *) palloc(16 * sizeof(char));
3378 snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
3379 snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
3380 snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));
3382 /* build a tuple */
3383 tuple = BuildTupleFromCStrings(attinmeta, values);
3385 /* make the tuple into a datum */
3386 result = HeapTupleGetDatum(tuple);
3388 /* clean up (this is not really necessary) */
3389 pfree(values[0]);
3390 pfree(values[1]);
3391 pfree(values[2]);
3392 pfree(values);
3394 SRF_RETURN_NEXT(funcctx, result);
3396 else /* do when there is no more left */
3398 SRF_RETURN_DONE(funcctx);
3402 </programlisting>
3404 One way to declare this function in SQL is:
3405 <programlisting>
3406 CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);
3408 CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
3409 RETURNS SETOF __retcomposite
3410 AS '<replaceable>filename</replaceable>', 'retcomposite'
3411 LANGUAGE C IMMUTABLE STRICT;
3412 </programlisting>
3413 A different way is to use OUT parameters:
3414 <programlisting>
3415 CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
3416 OUT f1 integer, OUT f2 integer, OUT f3 integer)
3417 RETURNS SETOF record
3418 AS '<replaceable>filename</replaceable>', 'retcomposite'
3419 LANGUAGE C IMMUTABLE STRICT;
3420 </programlisting>
3421 Notice that in this method the output type of the function is formally
3422 an anonymous <structname>record</structname> type.
3423 </para>
3424 </sect2>
3426 <sect2 id="xfunc-c-polymorphic">
3427 <title>Polymorphic Arguments and Return Types</title>
3429 <para>
3430 C-language functions can be declared to accept and
3431 return the polymorphic types described in <xref
3432 linkend="extend-types-polymorphic"/>.
3433 When a function's arguments or return types
3434 are defined as polymorphic types, the function author cannot know
3435 in advance what data type it will be called with, or
3436 need to return. There are two routines provided in <filename>fmgr.h</filename>
3437 to allow a version-1 C function to discover the actual data types
3438 of its arguments and the type it is expected to return. The routines are
3439 called <literal>get_fn_expr_rettype(FmgrInfo *flinfo)</literal> and
3440 <literal>get_fn_expr_argtype(FmgrInfo *flinfo, int argnum)</literal>.
3441 They return the result or argument type OID, or <symbol>InvalidOid</symbol> if the
3442 information is not available.
3443 The structure <literal>flinfo</literal> is normally accessed as
3444 <literal>fcinfo-&gt;flinfo</literal>. The parameter <literal>argnum</literal>
3445 is zero based. <function>get_call_result_type</function> can also be used
3446 as an alternative to <function>get_fn_expr_rettype</function>.
3447 There is also <function>get_fn_expr_variadic</function>, which can be used to
3448 find out whether variadic arguments have been merged into an array.
3449 This is primarily useful for <literal>VARIADIC "any"</literal> functions,
3450 since such merging will always have occurred for variadic functions
3451 taking ordinary array types.
3452 </para>
3454 <para>
3455 For example, suppose we want to write a function to accept a single
3456 element of any type, and return a one-dimensional array of that type:
3458 <programlisting>
3459 PG_FUNCTION_INFO_V1(make_array);
3460 Datum
3461 make_array(PG_FUNCTION_ARGS)
3463 ArrayType *result;
3464 Oid element_type = get_fn_expr_argtype(fcinfo-&gt;flinfo, 0);
3465 Datum element;
3466 bool isnull;
3467 int16 typlen;
3468 bool typbyval;
3469 char typalign;
3470 int ndims;
3471 int dims[MAXDIM];
3472 int lbs[MAXDIM];
3474 if (!OidIsValid(element_type))
3475 elog(ERROR, "could not determine data type of input");
3477 /* get the provided element, being careful in case it's NULL */
3478 isnull = PG_ARGISNULL(0);
3479 if (isnull)
3480 element = (Datum) 0;
3481 else
3482 element = PG_GETARG_DATUM(0);
3484 /* we have one dimension */
3485 ndims = 1;
3486 /* and one element */
3487 dims[0] = 1;
3488 /* and lower bound is 1 */
3489 lbs[0] = 1;
3491 /* get required info about the element type */
3492 get_typlenbyvalalign(element_type, &amp;typlen, &amp;typbyval, &amp;typalign);
3494 /* now build the array */
3495 result = construct_md_array(&amp;element, &amp;isnull, ndims, dims, lbs,
3496 element_type, typlen, typbyval, typalign);
3498 PG_RETURN_ARRAYTYPE_P(result);
3500 </programlisting>
3501 </para>
3503 <para>
3504 The following command declares the function
3505 <function>make_array</function> in SQL:
3507 <programlisting>
3508 CREATE FUNCTION make_array(anyelement) RETURNS anyarray
3509 AS '<replaceable>DIRECTORY</replaceable>/funcs', 'make_array'
3510 LANGUAGE C IMMUTABLE;
3511 </programlisting>
3512 </para>
3514 <para>
3515 There is a variant of polymorphism that is only available to C-language
3516 functions: they can be declared to take parameters of type
3517 <literal>"any"</literal>. (Note that this type name must be double-quoted,
3518 since it's also an SQL reserved word.) This works like
3519 <type>anyelement</type> except that it does not constrain different
3520 <literal>"any"</literal> arguments to be the same type, nor do they help
3521 determine the function's result type. A C-language function can also
3522 declare its final parameter to be <literal>VARIADIC "any"</literal>. This will
3523 match one or more actual arguments of any type (not necessarily the same
3524 type). These arguments will <emphasis>not</emphasis> be gathered into an array
3525 as happens with normal variadic functions; they will just be passed to
3526 the function separately. The <function>PG_NARGS()</function> macro and the
3527 methods described above must be used to determine the number of actual
3528 arguments and their types when using this feature. Also, users of such
3529 a function might wish to use the <literal>VARIADIC</literal> keyword in their
3530 function call, with the expectation that the function would treat the
3531 array elements as separate arguments. The function itself must implement
3532 that behavior if wanted, after using <function>get_fn_expr_variadic</function> to
3533 detect that the actual argument was marked with <literal>VARIADIC</literal>.
3534 </para>
3535 </sect2>
3537 <sect2 id="xfunc-shared-addin">
3538 <title>Shared Memory</title>
3540 <sect3 id="xfunc-shared-addin-at-startup">
3541 <title>Requesting Shared Memory at Startup</title>
3543 <para>
3544 Add-ins can reserve shared memory on server startup. To do so, the
3545 add-in's shared library must be preloaded by specifying it in
3546 <xref linkend="guc-shared-preload-libraries"/><indexterm><primary>shared_preload_libraries</primary></indexterm>.
3547 The shared library should also register a
3548 <literal>shmem_request_hook</literal> in its
3549 <function>_PG_init</function> function. This
3550 <literal>shmem_request_hook</literal> can reserve shared memory by
3551 calling:
3552 <programlisting>
3553 void RequestAddinShmemSpace(Size size)
3554 </programlisting>
3555 Each backend should obtain a pointer to the reserved shared memory by
3556 calling:
3557 <programlisting>
3558 void *ShmemInitStruct(const char *name, Size size, bool *foundPtr)
3559 </programlisting>
3560 If this function sets <literal>foundPtr</literal> to
3561 <literal>false</literal>, the caller should proceed to initialize the
3562 contents of the reserved shared memory. If <literal>foundPtr</literal>
3563 is set to <literal>true</literal>, the shared memory was already
3564 initialized by another backend, and the caller need not initialize
3565 further.
3566 </para>
3568 <para>
3569 To avoid race conditions, each backend should use the LWLock
3570 <function>AddinShmemInitLock</function> when initializing its allocation
3571 of shared memory, as shown here:
3572 <programlisting>
3573 static mystruct *ptr = NULL;
3574 bool found;
3576 LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
3577 ptr = ShmemInitStruct("my struct name", size, &amp;found);
3578 if (!found)
3580 ... initialize contents of shared memory ...
3581 ptr->locks = GetNamedLWLockTranche("my tranche name");
3583 LWLockRelease(AddinShmemInitLock);
3584 </programlisting>
3585 <literal>shmem_startup_hook</literal> provides a convenient place for the
3586 initialization code, but it is not strictly required that all such code
3587 be placed in this hook. Each backend will execute the registered
3588 <literal>shmem_startup_hook</literal> shortly after it attaches to shared
3589 memory. Note that add-ins should still acquire
3590 <function>AddinShmemInitLock</function> within this hook, as shown in the
3591 example above.
3592 </para>
3594 <para>
3595 An example of a <literal>shmem_request_hook</literal> and
3596 <literal>shmem_startup_hook</literal> can be found in
3597 <filename>contrib/pg_stat_statements/pg_stat_statements.c</filename> in
3598 the <productname>PostgreSQL</productname> source tree.
3599 </para>
3600 </sect3>
3602 <sect3 id="xfunc-shared-addin-after-startup">
3603 <title>Requesting Shared Memory After Startup</title>
3605 <para>
3606 There is another, more flexible method of reserving shared memory that
3607 can be done after server startup and outside a
3608 <literal>shmem_request_hook</literal>. To do so, each backend that will
3609 use the shared memory should obtain a pointer to it by calling:
3610 <programlisting>
3611 void *GetNamedDSMSegment(const char *name, size_t size,
3612 void (*init_callback) (void *ptr),
3613 bool *found)
3614 </programlisting>
3615 If a dynamic shared memory segment with the given name does not yet
3616 exist, this function will allocate it and initialize it with the provided
3617 <function>init_callback</function> callback function. If the segment has
3618 already been allocated and initialized by another backend, this function
3619 simply attaches the existing dynamic shared memory segment to the current
3620 backend.
3621 </para>
3623 <para>
3624 Unlike shared memory reserved at server startup, there is no need to
3625 acquire <function>AddinShmemInitLock</function> or otherwise take action
3626 to avoid race conditions when reserving shared memory with
3627 <function>GetNamedDSMSegment</function>. This function ensures that only
3628 one backend allocates and initializes the segment and that all other
3629 backends receive a pointer to the fully allocated and initialized
3630 segment.
3631 </para>
3633 <para>
3634 A complete usage example of <function>GetNamedDSMSegment</function> can
3635 be found in
3636 <filename>src/test/modules/test_dsm_registry/test_dsm_registry.c</filename>
3637 in the <productname>PostgreSQL</productname> source tree.
3638 </para>
3639 </sect3>
3640 </sect2>
3642 <sect2 id="xfunc-addin-lwlocks">
3643 <title>LWLocks</title>
3645 <sect3 id="xfunc-addin-lwlocks-at-startup">
3646 <title>Requesting LWLocks at Startup</title>
3648 <para>
3649 Add-ins can reserve LWLocks on server startup. As with shared memory
3650 reserved at server startup, the add-in's shared library must be preloaded
3651 by specifying it in
3652 <xref linkend="guc-shared-preload-libraries"/><indexterm><primary>shared_preload_libraries</primary></indexterm>,
3653 and the shared library should register a
3654 <literal>shmem_request_hook</literal> in its
3655 <function>_PG_init</function> function. This
3656 <literal>shmem_request_hook</literal> can reserve LWLocks by calling:
3657 <programlisting>
3658 void RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
3659 </programlisting>
3660 This ensures that an array of <literal>num_lwlocks</literal> LWLocks is
3661 available under the name <literal>tranche_name</literal>. A pointer to
3662 this array can be obtained by calling:
3663 <programlisting>
3664 LWLockPadded *GetNamedLWLockTranche(const char *tranche_name)
3665 </programlisting>
3666 </para>
3667 </sect3>
3669 <sect3 id="xfunc-addin-lwlocks-after-startup">
3670 <title>Requesting LWLocks After Startup</title>
3672 <para>
3673 There is another, more flexible method of obtaining LWLocks that can be
3674 done after server startup and outside a
3675 <literal>shmem_request_hook</literal>. To do so, first allocate a
3676 <literal>tranche_id</literal> by calling:
3677 <programlisting>
3678 int LWLockNewTrancheId(void)
3679 </programlisting>
3680 Next, initialize each LWLock, passing the new
3681 <literal>tranche_id</literal> as an argument:
3682 <programlisting>
3683 void LWLockInitialize(LWLock *lock, int tranche_id)
3684 </programlisting>
3685 Similar to shared memory, each backend should ensure that only one
3686 process allocates a new <literal>tranche_id</literal> and initializes
3687 each new LWLock. One way to do this is to only call these functions in
3688 your shared memory initialization code with the
3689 <function>AddinShmemInitLock</function> held exclusively. If using
3690 <function>GetNamedDSMSegment</function>, calling these functions in the
3691 <function>init_callback</function> callback function is sufficient to
3692 avoid race conditions.
3693 </para>
3695 <para>
3696 Finally, each backend using the <literal>tranche_id</literal> should
3697 associate it with a <literal>tranche_name</literal> by calling:
3698 <programlisting>
3699 void LWLockRegisterTranche(int tranche_id, const char *tranche_name)
3700 </programlisting>
3701 </para>
3703 <para>
3704 A complete usage example of <function>LWLockNewTrancheId</function>,
3705 <function>LWLockInitialize</function>, and
3706 <function>LWLockRegisterTranche</function> can be found in
3707 <filename>contrib/pg_prewarm/autoprewarm.c</filename> in the
3708 <productname>PostgreSQL</productname> source tree.
3709 </para>
3710 </sect3>
3711 </sect2>
3713 <sect2 id="xfunc-addin-wait-events">
3714 <title>Custom Wait Events</title>
3716 <para>
3717 Add-ins can define custom wait events under the wait event type
3718 <literal>Extension</literal> by calling:
3719 <programlisting>
3720 uint32 WaitEventExtensionNew(const char *wait_event_name)
3721 </programlisting>
3722 The wait event is associated to a user-facing custom string.
3723 An example can be found in <filename>src/test/modules/worker_spi</filename>
3724 in the PostgreSQL source tree.
3725 </para>
3726 <para>
3727 Custom wait events can be viewed in
3728 <link linkend="monitoring-pg-stat-activity-view"><structname>pg_stat_activity</structname></link>:
3729 <screen>
3730 =# SELECT wait_event_type, wait_event FROM pg_stat_activity
3731 WHERE backend_type ~ 'worker_spi';
3732 wait_event_type | wait_event
3733 -----------------+---------------
3734 Extension | WorkerSpiMain
3735 (1 row)
3736 </screen>
3737 </para>
3738 </sect2>
3740 <sect2 id="xfunc-addin-injection-points">
3741 <title>Injection Points</title>
3743 <para>
3744 An injection point with a given <literal>name</literal> is declared using
3745 macro:
3746 <programlisting>
3747 INJECTION_POINT(name);
3748 </programlisting>
3750 There are a few injection points already declared at strategic points
3751 within the server code. After adding a new injection point the code needs
3752 to be compiled in order for that injection point to be available in the
3753 binary. Add-ins written in C-language can declare injection points in
3754 their own code using the same macro.
3755 </para>
3757 <para>
3758 Executing an injection point can require allocating a small amount of
3759 memory, which can fail. If you need to have an injection point in a
3760 critical section where dynamic allocations are not allowed, you can use
3761 a two-step approach with the following macros:
3762 <programlisting>
3763 INJECTION_POINT_LOAD(name);
3764 INJECTION_POINT_CACHED(name);
3765 </programlisting>
3767 Before entering the critical section,
3768 call <function>INJECTION_POINT_LOAD</function>. It checks the shared
3769 memory state, and loads the callback into backend-private memory if it is
3770 active. Inside the critical section, use
3771 <function>INJECTION_POINT_CACHED</function> to execute the callback.
3772 </para>
3774 <para>
3775 Add-ins can attach callbacks to an already-declared injection point by
3776 calling:
3777 <programlisting>
3778 extern void InjectionPointAttach(const char *name,
3779 const char *library,
3780 const char *function,
3781 const void *private_data,
3782 int private_data_size);
3783 </programlisting>
3785 <literal>name</literal> is the name of the injection point, which when
3786 reached during execution will execute the <literal>function</literal>
3787 loaded from <literal>library</literal>. <literal>private_data</literal>
3788 is a private area of data of size <literal>private_data_size</literal>
3789 given as argument to the callback when executed.
3790 </para>
3792 <para>
3793 Here is an example of callback for
3794 <literal>InjectionPointCallback</literal>:
3795 <programlisting>
3796 static void
3797 custom_injection_callback(const char *name, const void *private_data)
3799 uint32 wait_event_info = WaitEventInjectionPointNew(name);
3801 pgstat_report_wait_start(wait_event_info);
3802 elog(NOTICE, "%s: executed custom callback", name);
3803 pgstat_report_wait_end();
3805 </programlisting>
3806 This callback prints a message to server error log with severity
3807 <literal>NOTICE</literal>, but callbacks may implement more complex
3808 logic.
3809 </para>
3811 <para>
3812 An alternative way to define the action to take when an injection point
3813 is reached is to add the testing code alongside the normal source
3814 code. This can be useful if the action e.g. depends on local variables
3815 that are not accessible to loaded modules. The
3816 <function>IS_INJECTION_POINT_ATTACHED</function> macro can then be used
3817 to check if an injection point is attached, for example:
3818 <programlisting>
3819 #ifdef USE_INJECTION_POINTS
3820 if (IS_INJECTION_POINT_ATTACHED("before-foobar"))
3822 /* change a local variable if injection point is attached */
3823 local_var = 123;
3825 /* also execute the callback */
3826 INJECTION_POINT_CACHED("before-foobar");
3828 #endif
3829 </programlisting>
3830 Note that the callback attached to the injection point will not be
3831 executed by the <function>IS_INJECTION_POINT_ATTACHED</function>
3832 macro. If you want to execute the callback, you must also call
3833 <function>INJECTION_POINT_CACHED</function> like in the above example.
3834 </para>
3836 <para>
3837 Optionally, it is possible to detach an injection point by calling:
3838 <programlisting>
3839 extern bool InjectionPointDetach(const char *name);
3840 </programlisting>
3841 On success, <literal>true</literal> is returned, <literal>false</literal>
3842 otherwise.
3843 </para>
3845 <para>
3846 A callback attached to an injection point is available across all the
3847 backends including the backends started after
3848 <literal>InjectionPointAttach</literal> is called. It remains attached
3849 while the server is running or until the injection point is detached
3850 using <literal>InjectionPointDetach</literal>.
3851 </para>
3853 <para>
3854 An example can be found in
3855 <filename>src/test/modules/injection_points</filename> in the PostgreSQL
3856 source tree.
3857 </para>
3859 <para>
3860 Enabling injections points requires
3861 <option>--enable-injection-points</option> with
3862 <command>configure</command> or <option>-Dinjection_points=true</option>
3863 with <application>Meson</application>.
3864 </para>
3865 </sect2>
3867 <sect2 id="xfunc-addin-custom-cumulative-statistics">
3868 <title>Custom Cumulative Statistics</title>
3870 <para>
3871 It is possible for add-ins written in C-language to use custom types
3872 of cumulative statistics registered in the
3873 <link linkend="monitoring-stats-setup">Cumulative Statistics System</link>.
3874 </para>
3876 <para>
3877 First, define a <literal>PgStat_KindInfo</literal> that includes all
3878 the information related to the custom type registered. For example:
3879 <programlisting>
3880 static const PgStat_KindInfo custom_stats = {
3881 .name = "custom_stats",
3882 .fixed_amount = false,
3883 .shared_size = sizeof(PgStatShared_Custom),
3884 .shared_data_off = offsetof(PgStatShared_Custom, stats),
3885 .shared_data_len = sizeof(((PgStatShared_Custom *) 0)->stats),
3886 .pending_size = sizeof(PgStat_StatCustomEntry),
3888 </programlisting>
3890 Then, each backend that needs to use this custom type needs to register
3891 it with <literal>pgstat_register_kind</literal> and a unique ID used to
3892 store the entries related to this type of statistics:
3893 <programlisting>
3894 extern PgStat_Kind pgstat_register_kind(PgStat_Kind kind,
3895 const PgStat_KindInfo *kind_info);
3896 </programlisting>
3897 While developing a new extension, use
3898 <literal>PGSTAT_KIND_EXPERIMENTAL</literal> for
3899 <parameter>kind</parameter>. When you are ready to release the extension
3900 to users, reserve a kind ID at the
3901 <ulink url="https://wiki.postgresql.org/wiki/CustomCumulativeStats">
3902 Custom Cumulative Statistics</ulink> page.
3903 </para>
3905 <para>
3906 The details of the API for <literal>PgStat_KindInfo</literal> can
3907 be found in <filename>src/include/utils/pgstat_internal.h</filename>.
3908 </para>
3910 <para>
3911 The type of statistics registered is associated with a name and a unique
3912 ID shared across the server in shared memory. Each backend using a
3913 custom type of statistics maintains a local cache storing the information
3914 of each custom <literal>PgStat_KindInfo</literal>.
3915 </para>
3917 <para>
3918 Place the extension module implementing the custom cumulative statistics
3919 type in <xref linkend="guc-shared-preload-libraries"/> so that it will
3920 be loaded early during <productname>PostgreSQL</productname> startup.
3921 </para>
3923 <para>
3924 An example describing how to register and use custom statistics can be
3925 found in <filename>src/test/modules/injection_points</filename>.
3926 </para>
3927 </sect2>
3929 <sect2 id="extend-cpp">
3930 <title>Using C++ for Extensibility</title>
3932 <indexterm zone="extend-cpp">
3933 <primary>C++</primary>
3934 </indexterm>
3936 <para>
3937 Although the <productname>PostgreSQL</productname> backend is written in
3938 C, it is possible to write extensions in C++ if these guidelines are
3939 followed:
3941 <itemizedlist>
3942 <listitem>
3943 <para>
3944 All functions accessed by the backend must present a C interface
3945 to the backend; these C functions can then call C++ functions.
3946 For example, <literal>extern C</literal> linkage is required for
3947 backend-accessed functions. This is also necessary for any
3948 functions that are passed as pointers between the backend and
3949 C++ code.
3950 </para>
3951 </listitem>
3952 <listitem>
3953 <para>
3954 Free memory using the appropriate deallocation method. For example,
3955 most backend memory is allocated using <function>palloc()</function>, so use
3956 <function>pfree()</function> to free it. Using C++
3957 <function>delete</function> in such cases will fail.
3958 </para>
3959 </listitem>
3960 <listitem>
3961 <para>
3962 Prevent exceptions from propagating into the C code (use a catch-all
3963 block at the top level of all <literal>extern C</literal> functions). This
3964 is necessary even if the C++ code does not explicitly throw any
3965 exceptions, because events like out-of-memory can still throw
3966 exceptions. Any exceptions must be caught and appropriate errors
3967 passed back to the C interface. If possible, compile C++ with
3968 <option>-fno-exceptions</option> to eliminate exceptions entirely; in such
3969 cases, you must check for failures in your C++ code, e.g., check for
3970 NULL returned by <function>new()</function>.
3971 </para>
3972 </listitem>
3973 <listitem>
3974 <para>
3975 If calling backend functions from C++ code, be sure that the
3976 C++ call stack contains only plain old data structures
3977 (<acronym>POD</acronym>). This is necessary because backend errors
3978 generate a distant <function>longjmp()</function> that does not properly
3979 unroll a C++ call stack with non-POD objects.
3980 </para>
3981 </listitem>
3982 </itemizedlist>
3983 </para>
3985 <para>
3986 In summary, it is best to place C++ code behind a wall of
3987 <literal>extern C</literal> functions that interface to the backend,
3988 and avoid exception, memory, and call stack leakage.
3989 </para>
3990 </sect2>
3992 </sect1>
3994 <sect1 id="xfunc-optimization">
3995 <title>Function Optimization Information</title>
3997 <indexterm zone="xfunc-optimization">
3998 <primary>optimization information</primary>
3999 <secondary>for functions</secondary>
4000 </indexterm>
4002 <para>
4003 By default, a function is just a <quote>black box</quote> that the
4004 database system knows very little about the behavior of. However,
4005 that means that queries using the function may be executed much less
4006 efficiently than they could be. It is possible to supply additional
4007 knowledge that helps the planner optimize function calls.
4008 </para>
4010 <para>
4011 Some basic facts can be supplied by declarative annotations provided in
4012 the <link linkend="sql-createfunction"><command>CREATE FUNCTION</command></link> command. Most important of
4013 these is the function's <link linkend="xfunc-volatility">volatility
4014 category</link> (<literal>IMMUTABLE</literal>, <literal>STABLE</literal>,
4015 or <literal>VOLATILE</literal>); one should always be careful to
4016 specify this correctly when defining a function.
4017 The parallel safety property (<literal>PARALLEL
4018 UNSAFE</literal>, <literal>PARALLEL RESTRICTED</literal>, or
4019 <literal>PARALLEL SAFE</literal>) must also be specified if you hope
4020 to use the function in parallelized queries.
4021 It can also be useful to specify the function's estimated execution
4022 cost, and/or the number of rows a set-returning function is estimated
4023 to return. However, the declarative way of specifying those two
4024 facts only allows specifying a constant value, which is often
4025 inadequate.
4026 </para>
4028 <para>
4029 It is also possible to attach a <firstterm>planner support
4030 function</firstterm> to an SQL-callable function (called
4031 its <firstterm>target function</firstterm>), and thereby provide
4032 knowledge about the target function that is too complex to be
4033 represented declaratively. Planner support functions have to be
4034 written in C (although their target functions might not be), so this is
4035 an advanced feature that relatively few people will use.
4036 </para>
4038 <para>
4039 A planner support function must have the SQL signature
4040 <programlisting>
4041 supportfn(internal) returns internal
4042 </programlisting>
4043 It is attached to its target function by specifying
4044 the <literal>SUPPORT</literal> clause when creating the target function.
4045 </para>
4047 <para>
4048 The details of the API for planner support functions can be found in
4049 file <filename>src/include/nodes/supportnodes.h</filename> in the
4050 <productname>PostgreSQL</productname> source code. Here we provide
4051 just an overview of what planner support functions can do.
4052 The set of possible requests to a support function is extensible,
4053 so more things might be possible in future versions.
4054 </para>
4056 <para>
4057 Some function calls can be simplified during planning based on
4058 properties specific to the function. For example,
4059 <literal>int4mul(n, 1)</literal> could be simplified to
4060 just <literal>n</literal>. This type of transformation can be
4061 performed by a planner support function, by having it implement
4062 the <literal>SupportRequestSimplify</literal> request type.
4063 The support function will be called for each instance of its target
4064 function found in a query parse tree. If it finds that the particular
4065 call can be simplified into some other form, it can build and return a
4066 parse tree representing that expression. This will automatically work
4067 for operators based on the function, too &mdash; in the example just
4068 given, <literal>n * 1</literal> would also be simplified to
4069 <literal>n</literal>.
4070 (But note that this is just an example; this particular
4071 optimization is not actually performed by
4072 standard <productname>PostgreSQL</productname>.)
4073 We make no guarantee that <productname>PostgreSQL</productname> will
4074 never call the target function in cases that the support function could
4075 simplify. Ensure rigorous equivalence between the simplified
4076 expression and an actual execution of the target function.
4077 </para>
4079 <para>
4080 For target functions that return <type>boolean</type>, it is often useful to estimate
4081 the fraction of rows that will be selected by a <literal>WHERE</literal> clause using that
4082 function. This can be done by a support function that implements
4083 the <literal>SupportRequestSelectivity</literal> request type.
4084 </para>
4086 <para>
4087 If the target function's run time is highly dependent on its inputs,
4088 it may be useful to provide a non-constant cost estimate for it.
4089 This can be done by a support function that implements
4090 the <literal>SupportRequestCost</literal> request type.
4091 </para>
4093 <para>
4094 For target functions that return sets, it is often useful to provide
4095 a non-constant estimate for the number of rows that will be returned.
4096 This can be done by a support function that implements
4097 the <literal>SupportRequestRows</literal> request type.
4098 </para>
4100 <para>
4101 For target functions that return <type>boolean</type>, it may be possible to
4102 convert a function call appearing in <literal>WHERE</literal> into an indexable operator
4103 clause or clauses. The converted clauses might be exactly equivalent
4104 to the function's condition, or they could be somewhat weaker (that is,
4105 they might accept some values that the function condition does not).
4106 In the latter case the index condition is said to
4107 be <firstterm>lossy</firstterm>; it can still be used to scan an index,
4108 but the function call will have to be executed for each row returned by
4109 the index to see if it really passes the <literal>WHERE</literal> condition or not.
4110 To create such conditions, the support function must implement
4111 the <literal>SupportRequestIndexCondition</literal> request type.
4112 </para>
4113 </sect1>