The code to unlink dropped relations in FinishPreparedTransaction() was
[PostgreSQL.git] / doc / src / sgml / syntax.sgml
blobc1c42259366879d734d013f28debfcafdfed5a40
1 <!-- $PostgreSQL$ -->
3 <chapter id="sql-syntax">
4 <title>SQL Syntax</title>
6 <indexterm zone="sql-syntax">
7 <primary>syntax</primary>
8 <secondary>SQL</secondary>
9 </indexterm>
11 <para>
12 This chapter describes the syntax of SQL. It forms the foundation
13 for understanding the following chapters which will go into detail
14 about how SQL commands are applied to define and modify data.
15 </para>
17 <para>
18 We also advise users who are already familiar with SQL to read this
19 chapter carefully because it contains several rules and concepts that
20 are implemented inconsistently among SQL databases or that are
21 specific to <productname>PostgreSQL</productname>.
22 </para>
24 <sect1 id="sql-syntax-lexical">
25 <title>Lexical Structure</title>
27 <indexterm>
28 <primary>token</primary>
29 </indexterm>
31 <para>
32 SQL input consists of a sequence of
33 <firstterm>commands</firstterm>. A command is composed of a
34 sequence of <firstterm>tokens</firstterm>, terminated by a
35 semicolon (<quote>;</quote>). The end of the input stream also
36 terminates a command. Which tokens are valid depends on the syntax
37 of the particular command.
38 </para>
40 <para>
41 A token can be a <firstterm>key word</firstterm>, an
42 <firstterm>identifier</firstterm>, a <firstterm>quoted
43 identifier</firstterm>, a <firstterm>literal</firstterm> (or
44 constant), or a special character symbol. Tokens are normally
45 separated by whitespace (space, tab, newline), but need not be if
46 there is no ambiguity (which is generally only the case if a
47 special character is adjacent to some other token type).
48 </para>
50 <para>
51 Additionally, <firstterm>comments</firstterm> can occur in SQL
52 input. They are not tokens, they are effectively equivalent to
53 whitespace.
54 </para>
56 <para>
57 For example, the following is (syntactically) valid SQL input:
58 <programlisting>
59 SELECT * FROM MY_TABLE;
60 UPDATE MY_TABLE SET A = 5;
61 INSERT INTO MY_TABLE VALUES (3, 'hi there');
62 </programlisting>
63 This is a sequence of three commands, one per line (although this
64 is not required; more than one command can be on a line, and
65 commands can usefully be split across lines).
66 </para>
68 <para>
69 The SQL syntax is not very consistent regarding what tokens
70 identify commands and which are operands or parameters. The first
71 few tokens are generally the command name, so in the above example
72 we would usually speak of a <quote>SELECT</quote>, an
73 <quote>UPDATE</quote>, and an <quote>INSERT</quote> command. But
74 for instance the <command>UPDATE</command> command always requires
75 a <token>SET</token> token to appear in a certain position, and
76 this particular variation of <command>INSERT</command> also
77 requires a <token>VALUES</token> in order to be complete. The
78 precise syntax rules for each command are described in <xref linkend="reference">.
79 </para>
81 <sect2 id="sql-syntax-identifiers">
82 <title>Identifiers and Key Words</title>
84 <indexterm zone="sql-syntax-identifiers">
85 <primary>identifier</primary>
86 <secondary>syntax of</secondary>
87 </indexterm>
89 <indexterm zone="sql-syntax-identifiers">
90 <primary>name</primary>
91 <secondary>syntax of</secondary>
92 </indexterm>
94 <indexterm zone="sql-syntax-identifiers">
95 <primary>key word</primary>
96 <secondary>syntax of</secondary>
97 </indexterm>
99 <para>
100 Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
101 <token>VALUES</token> in the example above are examples of
102 <firstterm>key words</firstterm>, that is, words that have a fixed
103 meaning in the SQL language. The tokens <token>MY_TABLE</token>
104 and <token>A</token> are examples of
105 <firstterm>identifiers</firstterm>. They identify names of
106 tables, columns, or other database objects, depending on the
107 command they are used in. Therefore they are sometimes simply
108 called <quote>names</quote>. Key words and identifiers have the
109 same lexical structure, meaning that one cannot know whether a
110 token is an identifier or a key word without knowing the language.
111 A complete list of key words can be found in <xref
112 linkend="sql-keywords-appendix">.
113 </para>
115 <para>
116 SQL identifiers and key words must begin with a letter
117 (<literal>a</literal>-<literal>z</literal>, but also letters with
118 diacritical marks and non-Latin letters) or an underscore
119 (<literal>_</literal>). Subsequent characters in an identifier or
120 key word can be letters, underscores, digits
121 (<literal>0</literal>-<literal>9</literal>), or dollar signs
122 (<literal>$</>). Note that dollar signs are not allowed in identifiers
123 according to the letter of the SQL standard, so their use might render
124 applications less portable.
125 The SQL standard will not define a key word that contains
126 digits or starts or ends with an underscore, so identifiers of this
127 form are safe against possible conflict with future extensions of the
128 standard.
129 </para>
131 <para>
132 <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
133 The system uses no more than <symbol>NAMEDATALEN</symbol>-1
134 bytes of an identifier; longer names can be written in
135 commands, but they will be truncated. By default,
136 <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
137 length is 63 bytes. If this limit is problematic, it can be raised by
138 changing the <symbol>NAMEDATALEN</symbol> constant in
139 <filename>src/include/pg_config_manual.h</filename>.
140 </para>
142 <para>
143 <indexterm>
144 <primary>case sensitivity</primary>
145 <secondary>of SQL commands</secondary>
146 </indexterm>
147 Identifier and key word names are case insensitive. Therefore:
148 <programlisting>
149 UPDATE MY_TABLE SET A = 5;
150 </programlisting>
151 can equivalently be written as:
152 <programlisting>
153 uPDaTE my_TabLE SeT a = 5;
154 </programlisting>
155 A convention often used is to write key words in upper
156 case and names in lower case, e.g.:
157 <programlisting>
158 UPDATE my_table SET a = 5;
159 </programlisting>
160 </para>
162 <para>
163 <indexterm>
164 <primary>quotation marks</primary>
165 <secondary>and identifiers</secondary>
166 </indexterm>
167 There is a second kind of identifier: the <firstterm>delimited
168 identifier</firstterm> or <firstterm>quoted
169 identifier</firstterm>. It is formed by enclosing an arbitrary
170 sequence of characters in double-quotes
171 (<literal>"</literal>). <!-- " font-lock mania --> A delimited
172 identifier is always an identifier, never a key word. So
173 <literal>"select"</literal> could be used to refer to a column or
174 table named <quote>select</quote>, whereas an unquoted
175 <literal>select</literal> would be taken as a key word and
176 would therefore provoke a parse error when used where a table or
177 column name is expected. The example can be written with quoted
178 identifiers like this:
179 <programlisting>
180 UPDATE "my_table" SET "a" = 5;
181 </programlisting>
182 </para>
184 <para>
185 Quoted identifiers can contain any character, except the character
186 with code zero. (To include a double quote, write two double quotes.)
187 This allows constructing table or column names that would
188 otherwise not be possible, such as ones containing spaces or
189 ampersands. The length limitation still applies.
190 </para>
192 <para>
193 <indexterm><primary>Unicode escape</primary><secondary>in
194 identifiers</secondary></indexterm> A variant of quoted
195 identifiers allows including escaped Unicode characters identified
196 by their code points. This variant starts
197 with <literal>U&amp;</literal> (upper or lower case U followed by
198 ampersand) immediately before the opening double quote, without
199 any spaces in between, for example <literal>U&amp;"foo"</literal>.
200 (Note that this creates an ambiguity with the
201 operator <literal>&amp;</literal>. Use spaces around the operator to
202 avoid this problem.) Inside the quotes, Unicode characters can be
203 specified in escaped form by writing a backslash followed by the
204 four-digit hexadecimal code point number or alternatively a
205 backslash followed by a plus sign followed by a six-digit
206 hexadecimal code point number. For example, the
207 identifier <literal>"data"</literal> could be written as
208 <programlisting>
209 U&amp;"d\0061t\+000061"
210 </programlisting>
211 The following less trivial example writes the Russian
212 word <quote>slon</quote> (elephant) in Cyrillic letters:
213 <programlisting>
214 U&amp;"\0441\043B\043E\043D"
215 </programlisting>
216 </para>
218 <para>
219 If a different escape character than backslash is desired, it can
220 be specified using
221 the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
222 clause after the string, for example:
223 <programlisting>
224 U&amp;"d!0061t!+000061" UESCAPE '!'
225 </programlisting>
226 The escape character can be any single character other than a
227 hexadecimal digit, the plus sign, a single quote, a double quote,
228 or a whitespace character. Note that the escape character is
229 written in single quotes, not double quotes.
230 </para>
232 <para>
233 To include the escape character in the identifier literally, write
234 it twice.
235 </para>
237 <para>
238 The Unicode escape syntax works only when the server encoding is
239 UTF8. When other server encodings are used, only code points in
240 the ASCII range (up to <literal>\007F</literal>) can be specified.
241 </para>
243 <para>
244 Quoting an identifier also makes it case-sensitive, whereas
245 unquoted names are always folded to lower case. For example, the
246 identifiers <literal>FOO</literal>, <literal>foo</literal>, and
247 <literal>"foo"</literal> are considered the same by
248 <productname>PostgreSQL</productname>, but
249 <literal>"Foo"</literal> and <literal>"FOO"</literal> are
250 different from these three and each other. (The folding of
251 unquoted names to lower case in <productname>PostgreSQL</> is
252 incompatible with the SQL standard, which says that unquoted names
253 should be folded to upper case. Thus, <literal>foo</literal>
254 should be equivalent to <literal>"FOO"</literal> not
255 <literal>"foo"</literal> according to the standard. If you want
256 to write portable applications you are advised to always quote a
257 particular name or never quote it.)
258 </para>
259 </sect2>
262 <sect2 id="sql-syntax-constants">
263 <title>Constants</title>
265 <indexterm zone="sql-syntax-constants">
266 <primary>constant</primary>
267 </indexterm>
269 <para>
270 There are three kinds of <firstterm>implicitly-typed
271 constants</firstterm> in <productname>PostgreSQL</productname>:
272 strings, bit strings, and numbers.
273 Constants can also be specified with explicit types, which can
274 enable more accurate representation and more efficient handling by
275 the system. These alternatives are discussed in the following
276 subsections.
277 </para>
279 <sect3 id="sql-syntax-strings">
280 <title>String Constants</title>
282 <indexterm zone="sql-syntax-strings">
283 <primary>character string</primary>
284 <secondary>constant</secondary>
285 </indexterm>
287 <para>
288 <indexterm>
289 <primary>quotation marks</primary>
290 <secondary>escaping</secondary>
291 </indexterm>
292 A string constant in SQL is an arbitrary sequence of characters
293 bounded by single quotes (<literal>'</literal>), for example
294 <literal>'This is a string'</literal>. To include
295 a single-quote character within a string constant,
296 write two adjacent single quotes, e.g.,
297 <literal>'Dianne''s horse'</literal>.
298 Note that this is <emphasis>not</> the same as a double-quote
299 character (<literal>"</>). <!-- font-lock sanity: " -->
300 </para>
302 <para>
303 Two string constants that are only separated by whitespace
304 <emphasis>with at least one newline</emphasis> are concatenated
305 and effectively treated as if the string had been written as one
306 constant. For example:
307 <programlisting>
308 SELECT 'foo'
309 'bar';
310 </programlisting>
311 is equivalent to:
312 <programlisting>
313 SELECT 'foobar';
314 </programlisting>
315 but:
316 <programlisting>
317 SELECT 'foo' 'bar';
318 </programlisting>
319 is not valid syntax. (This slightly bizarre behavior is specified
320 by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
321 following the standard.)
322 </para>
323 </sect3>
325 <sect3 id="sql-syntax-strings-escape">
326 <title>String Constants with C-Style Escapes</title>
328 <indexterm zone="sql-syntax-strings-escape">
329 <primary>escape string syntax</primary>
330 </indexterm>
331 <indexterm zone="sql-syntax-strings-escape">
332 <primary>backslash escapes</primary>
333 </indexterm>
335 <para>
336 <productname>PostgreSQL</productname> also accepts <quote>escape</>
337 string constants, which are an extension to the SQL standard.
338 An escape string constant is specified by writing the letter
339 <literal>E</literal> (upper or lower case) just before the opening single
340 quote, e.g., <literal>E'foo'</>. (When continuing an escape string
341 constant across lines, write <literal>E</> only before the first opening
342 quote.)
343 Within an escape string, a backslash character (<literal>\</>) begins a
344 C-like <firstterm>backslash escape</> sequence, in which the combination
345 of backslash and following character(s) represent a special byte
346 value, as shown in <xref linkend="sql-backslash-table">.
347 </para>
349 <table id="sql-backslash-table">
350 <title>Backslash Escape Sequences</title>
351 <tgroup cols="2">
352 <thead>
353 <row>
354 <entry>Backslash Escape Sequence</>
355 <entry>Interpretation</entry>
356 </row>
357 </thead>
359 <tbody>
360 <row>
361 <entry><literal>\b</literal></entry>
362 <entry>backspace</entry>
363 </row>
364 <row>
365 <entry><literal>\f</literal></entry>
366 <entry>form feed</entry>
367 </row>
368 <row>
369 <entry><literal>\n</literal></entry>
370 <entry>newline</entry>
371 </row>
372 <row>
373 <entry><literal>\r</literal></entry>
374 <entry>carriage return</entry>
375 </row>
376 <row>
377 <entry><literal>\t</literal></entry>
378 <entry>tab</entry>
379 </row>
380 <row>
381 <entry>
382 <literal>\<replaceable>o</replaceable></literal>,
383 <literal>\<replaceable>oo</replaceable></literal>,
384 <literal>\<replaceable>ooo</replaceable></literal>
385 (<replaceable>o</replaceable> = 0 - 7)
386 </entry>
387 <entry>octal byte value</entry>
388 </row>
389 <row>
390 <entry>
391 <literal>\x<replaceable>h</replaceable></literal>,
392 <literal>\x<replaceable>hh</replaceable></literal>
393 (<replaceable>h</replaceable> = 0 - 9, A - F)
394 </entry>
395 <entry>hexadecimal byte value</entry>
396 </row>
397 </tbody>
398 </tgroup>
399 </table>
401 <para>
402 Any other
403 character following a backslash is taken literally. Thus, to
404 include a backslash character, write two backslashes (<literal>\\</>).
405 Also, a single quote can be included in an escape string by writing
406 <literal>\'</literal>, in addition to the normal way of <literal>''</>.
407 </para>
409 <para>
410 It is your responsibility that the byte sequences you create are
411 valid characters in the server character set encoding. When the
412 server encoding is UTF-8, then the alternative Unicode escape
413 syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
414 should be used instead. (The alternative would be doing the
415 UTF-8 encoding by hand and writing out the bytes, which would be
416 very cumbersome.)
417 </para>
419 <caution>
420 <para>
421 If the configuration parameter
422 <xref linkend="guc-standard-conforming-strings"> is <literal>off</>,
423 then <productname>PostgreSQL</productname> recognizes backslash escapes
424 in both regular and escape string constants. This is for backward
425 compatibility with the historical behavior, where backslash escapes
426 were always recognized.
427 Although <varname>standard_conforming_strings</> currently defaults to
428 <literal>off</>, the default will change to <literal>on</> in a future
429 release for improved standards compliance. Applications are therefore
430 encouraged to migrate away from using backslash escapes. If you need
431 to use a backslash escape to represent a special character, write the
432 string constant with an <literal>E</> to be sure it will be handled the same
433 way in future releases.
434 </para>
436 <para>
437 In addition to <varname>standard_conforming_strings</>, the configuration
438 parameters <xref linkend="guc-escape-string-warning"> and
439 <xref linkend="guc-backslash-quote"> govern treatment of backslashes
440 in string constants.
441 </para>
442 </caution>
444 <para>
445 The character with the code zero cannot be in a string constant.
446 </para>
447 </sect3>
449 <sect3 id="sql-syntax-strings-uescape">
450 <title>String Constants with Unicode Escapes</title>
452 <indexterm zone="sql-syntax-strings-uescape">
453 <primary>Unicode escape</primary>
454 <secondary>in string constants</secondary>
455 </indexterm>
457 <para>
458 <productname>PostgreSQL</productname> also supports another type
459 of escape syntax for strings that allows specifying arbitrary
460 Unicode characters by code point. A Unicode escape string
461 constant starts with <literal>U&amp;</literal> (upper or lower case
462 letter U followed by ampersand) immediately before the opening
463 quote, without any spaces in between, for
464 example <literal>U&amp;'foo'</literal>. (Note that this creates an
465 ambiguity with the operator <literal>&amp;</literal>. Use spaces
466 around the operator to avoid this problem.) Inside the quotes,
467 Unicode characters can be specified in escaped form by writing a
468 backslash followed by the four-digit hexadecimal code point
469 number or alternatively a backslash followed by a plus sign
470 followed by a six-digit hexadecimal code point number. For
471 example, the string <literal>'data'</literal> could be written as
472 <programlisting>
473 U&amp;'d\0061t\+000061'
474 </programlisting>
475 The following less trivial example writes the Russian
476 word <quote>slon</quote> (elephant) in Cyrillic letters:
477 <programlisting>
478 U&amp;'\0441\043B\043E\043D'
479 </programlisting>
480 </para>
482 <para>
483 If a different escape character than backslash is desired, it can
484 be specified using
485 the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
486 clause after the string, for example:
487 <programlisting>
488 U&amp;'d!0061t!+000061' UESCAPE '!'
489 </programlisting>
490 The escape character can be any single character other than a
491 hexadecimal digit, the plus sign, a single quote, a double quote,
492 or a whitespace character.
493 </para>
495 <para>
496 The Unicode escape syntax works only when the server encoding is
497 UTF8. When other server encodings are used, only code points in
498 the ASCII range (up to <literal>\007F</literal>) can be
499 specified.
500 </para>
502 <para>
503 Also, the Unicode escape syntax for string constants only works
504 when the configuration
505 parameter <xref linkend="guc-standard-conforming-strings"> is
506 turned on. This is because otherwise this syntax could confuse
507 clients that parse the SQL statements to the point that it could
508 lead to SQL injections and similar security issues. If the
509 parameter is set to off, this syntax will be rejected with an
510 error message.
511 </para>
513 <para>
514 To include the escape character in the string literally, write it
515 twice.
516 </para>
517 </sect3>
519 <sect3 id="sql-syntax-dollar-quoting">
520 <title>Dollar-Quoted String Constants</title>
522 <indexterm>
523 <primary>dollar quoting</primary>
524 </indexterm>
526 <para>
527 While the standard syntax for specifying string constants is usually
528 convenient, it can be difficult to understand when the desired string
529 contains many single quotes or backslashes, since each of those must
530 be doubled. To allow more readable queries in such situations,
531 <productname>PostgreSQL</productname> provides another way, called
532 <quote>dollar quoting</quote>, to write string constants.
533 A dollar-quoted string constant
534 consists of a dollar sign (<literal>$</literal>), an optional
535 <quote>tag</quote> of zero or more characters, another dollar
536 sign, an arbitrary sequence of characters that makes up the
537 string content, a dollar sign, the same tag that began this
538 dollar quote, and a dollar sign. For example, here are two
539 different ways to specify the string <quote>Dianne's horse</>
540 using dollar quoting:
541 <programlisting>
542 $$Dianne's horse$$
543 $SomeTag$Dianne's horse$SomeTag$
544 </programlisting>
545 Notice that inside the dollar-quoted string, single quotes can be
546 used without needing to be escaped. Indeed, no characters inside
547 a dollar-quoted string are ever escaped: the string content is always
548 written literally. Backslashes are not special, and neither are
549 dollar signs, unless they are part of a sequence matching the opening
550 tag.
551 </para>
553 <para>
554 It is possible to nest dollar-quoted string constants by choosing
555 different tags at each nesting level. This is most commonly used in
556 writing function definitions. For example:
557 <programlisting>
558 $function$
559 BEGIN
560 RETURN ($1 ~ $q$[\t\r\n\v\\]$q$);
561 END;
562 $function$
563 </programlisting>
564 Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</> represents a
565 dollar-quoted literal string <literal>[\t\r\n\v\\]</>, which will
566 be recognized when the function body is executed by
567 <productname>PostgreSQL</>. But since the sequence does not match
568 the outer dollar quoting delimiter <literal>$function$</>, it is
569 just some more characters within the constant so far as the outer
570 string is concerned.
571 </para>
573 <para>
574 The tag, if any, of a dollar-quoted string follows the same rules
575 as an unquoted identifier, except that it cannot contain a dollar sign.
576 Tags are case sensitive, so <literal>$tag$String content$tag$</literal>
577 is correct, but <literal>$TAG$String content$tag$</literal> is not.
578 </para>
580 <para>
581 A dollar-quoted string that follows a keyword or identifier must
582 be separated from it by whitespace; otherwise the dollar quoting
583 delimiter would be taken as part of the preceding identifier.
584 </para>
586 <para>
587 Dollar quoting is not part of the SQL standard, but it is often a more
588 convenient way to write complicated string literals than the
589 standard-compliant single quote syntax. It is particularly useful when
590 representing string constants inside other constants, as is often needed
591 in procedural function definitions. With single-quote syntax, each
592 backslash in the above example would have to be written as four
593 backslashes, which would be reduced to two backslashes in parsing the
594 original string constant, and then to one when the inner string constant
595 is re-parsed during function execution.
596 </para>
597 </sect3>
599 <sect3 id="sql-syntax-bit-strings">
600 <title>Bit-String Constants</title>
602 <indexterm zone="sql-syntax-bit-strings">
603 <primary>bit string</primary>
604 <secondary>constant</secondary>
605 </indexterm>
607 <para>
608 Bit-string constants look like regular string constants with a
609 <literal>B</literal> (upper or lower case) immediately before the
610 opening quote (no intervening whitespace), e.g.,
611 <literal>B'1001'</literal>. The only characters allowed within
612 bit-string constants are <literal>0</literal> and
613 <literal>1</literal>.
614 </para>
616 <para>
617 Alternatively, bit-string constants can be specified in hexadecimal
618 notation, using a leading <literal>X</literal> (upper or lower case),
619 e.g., <literal>X'1FF'</literal>. This notation is equivalent to
620 a bit-string constant with four binary digits for each hexadecimal digit.
621 </para>
623 <para>
624 Both forms of bit-string constant can be continued
625 across lines in the same way as regular string constants.
626 Dollar quoting cannot be used in a bit-string constant.
627 </para>
628 </sect3>
630 <sect3>
631 <title>Numeric Constants</title>
633 <indexterm>
634 <primary>number</primary>
635 <secondary>constant</secondary>
636 </indexterm>
638 <para>
639 Numeric constants are accepted in these general forms:
640 <synopsis>
641 <replaceable>digits</replaceable>
642 <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
643 <optional><replaceable>digits</replaceable></optional>.<replaceable>digits</replaceable><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
644 <replaceable>digits</replaceable>e<optional>+-</optional><replaceable>digits</replaceable>
645 </synopsis>
646 where <replaceable>digits</replaceable> is one or more decimal
647 digits (0 through 9). At least one digit must be before or after the
648 decimal point, if one is used. At least one digit must follow the
649 exponent marker (<literal>e</literal>), if one is present.
650 There cannot be any spaces or other characters embedded in the
651 constant. Note that any leading plus or minus sign is not actually
652 considered part of the constant; it is an operator applied to the
653 constant.
654 </para>
656 <para>
657 These are some examples of valid numeric constants:
658 <literallayout>
662 .001
664 1.925e-3
665 </literallayout>
666 </para>
668 <para>
669 <indexterm><primary>integer</primary></indexterm>
670 <indexterm><primary>bigint</primary></indexterm>
671 <indexterm><primary>numeric</primary></indexterm>
672 A numeric constant that contains neither a decimal point nor an
673 exponent is initially presumed to be type <type>integer</> if its
674 value fits in type <type>integer</> (32 bits); otherwise it is
675 presumed to be type <type>bigint</> if its
676 value fits in type <type>bigint</> (64 bits); otherwise it is
677 taken to be type <type>numeric</>. Constants that contain decimal
678 points and/or exponents are always initially presumed to be type
679 <type>numeric</>.
680 </para>
682 <para>
683 The initially assigned data type of a numeric constant is just a
684 starting point for the type resolution algorithms. In most cases
685 the constant will be automatically coerced to the most
686 appropriate type depending on context. When necessary, you can
687 force a numeric value to be interpreted as a specific data type
688 by casting it.<indexterm><primary>type cast</primary></indexterm>
689 For example, you can force a numeric value to be treated as type
690 <type>real</> (<type>float4</>) by writing:
692 <programlisting>
693 REAL '1.23' -- string style
694 1.23::REAL -- PostgreSQL (historical) style
695 </programlisting>
697 These are actually just special cases of the general casting
698 notations discussed next.
699 </para>
700 </sect3>
702 <sect3 id="sql-syntax-constants-generic">
703 <title>Constants of Other Types</title>
705 <indexterm>
706 <primary>data type</primary>
707 <secondary>constant</secondary>
708 </indexterm>
710 <para>
711 A constant of an <emphasis>arbitrary</emphasis> type can be
712 entered using any one of the following notations:
713 <synopsis>
714 <replaceable>type</replaceable> '<replaceable>string</replaceable>'
715 '<replaceable>string</replaceable>'::<replaceable>type</replaceable>
716 CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
717 </synopsis>
718 The string constant's text is passed to the input conversion
719 routine for the type called <replaceable>type</replaceable>. The
720 result is a constant of the indicated type. The explicit type
721 cast can be omitted if there is no ambiguity as to the type the
722 constant must be (for example, when it is assigned directly to a
723 table column), in which case it is automatically coerced.
724 </para>
726 <para>
727 The string constant can be written using either regular SQL
728 notation or dollar-quoting.
729 </para>
731 <para>
732 It is also possible to specify a type coercion using a function-like
733 syntax:
734 <synopsis>
735 <replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
736 </synopsis>
737 but not all type names can be used in this way; see <xref
738 linkend="sql-syntax-type-casts"> for details.
739 </para>
741 <para>
742 The <literal>::</literal>, <literal>CAST()</literal>, and
743 function-call syntaxes can also be used to specify run-time type
744 conversions of arbitrary expressions, as discussed in <xref
745 linkend="sql-syntax-type-casts">. To avoid syntactic ambiguity, the
746 <literal><replaceable>type</> '<replaceable>string</>'</literal>
747 syntax can only be used to specify the type of a simple literal constant.
748 Another restriction on the
749 <literal><replaceable>type</> '<replaceable>string</>'</literal>
750 syntax is that it does not work for array types; use <literal>::</literal>
751 or <literal>CAST()</literal> to specify the type of an array constant.
752 </para>
754 <para>
755 The <literal>CAST()</> syntax conforms to SQL. The
756 <literal><replaceable>type</> '<replaceable>string</>'</literal>
757 syntax is a generalization of the standard: SQL specifies this syntax only
758 for a few data types, but <productname>PostgreSQL</productname> allows it
759 for all types. The syntax with
760 <literal>::</literal> is historical <productname>PostgreSQL</productname>
761 usage, as is the function-call syntax.
762 </para>
763 </sect3>
764 </sect2>
766 <sect2 id="sql-syntax-operators">
767 <title>Operators</title>
769 <indexterm zone="sql-syntax-operators">
770 <primary>operator</primary>
771 <secondary>syntax</secondary>
772 </indexterm>
774 <para>
775 An operator name is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
776 (63 by default) characters from the following list:
777 <literallayout>
778 + - * / &lt; &gt; = ~ ! @ # % ^ &amp; | ` ?
779 </literallayout>
781 There are a few restrictions on operator names, however:
782 <itemizedlist>
783 <listitem>
784 <para>
785 <literal>--</literal> and <literal>/*</literal> cannot appear
786 anywhere in an operator name, since they will be taken as the
787 start of a comment.
788 </para>
789 </listitem>
791 <listitem>
792 <para>
793 A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
794 unless the name also contains at least one of these characters:
795 <literallayout>
796 ~ ! @ # % ^ &amp; | ` ?
797 </literallayout>
798 For example, <literal>@-</literal> is an allowed operator name,
799 but <literal>*-</literal> is not. This restriction allows
800 <productname>PostgreSQL</productname> to parse SQL-compliant
801 queries without requiring spaces between tokens.
802 </para>
803 </listitem>
804 </itemizedlist>
805 </para>
807 <para>
808 When working with non-SQL-standard operator names, you will usually
809 need to separate adjacent operators with spaces to avoid ambiguity.
810 For example, if you have defined a left unary operator named <literal>@</literal>,
811 you cannot write <literal>X*@Y</literal>; you must write
812 <literal>X* @Y</literal> to ensure that
813 <productname>PostgreSQL</productname> reads it as two operator names
814 not one.
815 </para>
816 </sect2>
818 <sect2>
819 <title>Special Characters</title>
821 <para>
822 Some characters that are not alphanumeric have a special meaning
823 that is different from being an operator. Details on the usage can
824 be found at the location where the respective syntax element is
825 described. This section only exists to advise the existence and
826 summarize the purposes of these characters.
828 <itemizedlist>
829 <listitem>
830 <para>
831 A dollar sign (<literal>$</literal>) followed by digits is used
832 to represent a positional parameter in the body of a function
833 definition or a prepared statement. In other contexts the
834 dollar sign can be part of an identifier or a dollar-quoted string
835 constant.
836 </para>
837 </listitem>
839 <listitem>
840 <para>
841 Parentheses (<literal>()</literal>) have their usual meaning to
842 group expressions and enforce precedence. In some cases
843 parentheses are required as part of the fixed syntax of a
844 particular SQL command.
845 </para>
846 </listitem>
848 <listitem>
849 <para>
850 Brackets (<literal>[]</literal>) are used to select the elements
851 of an array. See <xref linkend="arrays"> for more information
852 on arrays.
853 </para>
854 </listitem>
856 <listitem>
857 <para>
858 Commas (<literal>,</literal>) are used in some syntactical
859 constructs to separate the elements of a list.
860 </para>
861 </listitem>
863 <listitem>
864 <para>
865 The semicolon (<literal>;</literal>) terminates an SQL command.
866 It cannot appear anywhere within a command, except within a
867 string constant or quoted identifier.
868 </para>
869 </listitem>
871 <listitem>
872 <para>
873 The colon (<literal>:</literal>) is used to select
874 <quote>slices</quote> from arrays. (See <xref
875 linkend="arrays">.) In certain SQL dialects (such as Embedded
876 SQL), the colon is used to prefix variable names.
877 </para>
878 </listitem>
880 <listitem>
881 <para>
882 The asterisk (<literal>*</literal>) is used in some contexts to denote
883 all the fields of a table row or composite value. It also
884 has a special meaning when used as the argument of an
885 aggregate function, namely that the aggregate does not require
886 any explicit parameter.
887 </para>
888 </listitem>
890 <listitem>
891 <para>
892 The period (<literal>.</literal>) is used in numeric
893 constants, and to separate schema, table, and column names.
894 </para>
895 </listitem>
896 </itemizedlist>
898 </para>
899 </sect2>
901 <sect2 id="sql-syntax-comments">
902 <title>Comments</title>
904 <indexterm zone="sql-syntax-comments">
905 <primary>comment</primary>
906 <secondary sortas="SQL">in SQL</secondary>
907 </indexterm>
909 <para>
910 A comment is a sequence of characters beginning with
911 double dashes and extending to the end of the line, e.g.:
912 <programlisting>
913 -- This is a standard SQL comment
914 </programlisting>
915 </para>
917 <para>
918 Alternatively, C-style block comments can be used:
919 <programlisting>
920 /* multiline comment
921 * with nesting: /* nested block comment */
923 </programlisting>
924 where the comment begins with <literal>/*</literal> and extends to
925 the matching occurrence of <literal>*/</literal>. These block
926 comments nest, as specified in the SQL standard but unlike C, so that one can
927 comment out larger blocks of code that might contain existing block
928 comments.
929 </para>
931 <para>
932 A comment is removed from the input stream before further syntax
933 analysis and is effectively replaced by whitespace.
934 </para>
935 </sect2>
937 <sect2 id="sql-precedence">
938 <title>Lexical Precedence</title>
940 <indexterm zone="sql-precedence">
941 <primary>operator</primary>
942 <secondary>precedence</secondary>
943 </indexterm>
945 <para>
946 <xref linkend="sql-precedence-table"> shows the precedence and
947 associativity of the operators in <productname>PostgreSQL</>.
948 Most operators have the same precedence and are left-associative.
949 The precedence and associativity of the operators is hard-wired
950 into the parser. This can lead to non-intuitive behavior; for
951 example the Boolean operators <literal>&lt;</> and
952 <literal>&gt;</> have a different precedence than the Boolean
953 operators <literal>&lt;=</> and <literal>&gt;=</>. Also, you will
954 sometimes need to add parentheses when using combinations of
955 binary and unary operators. For instance:
956 <programlisting>
957 SELECT 5 ! - 6;
958 </programlisting>
959 will be parsed as:
960 <programlisting>
961 SELECT 5 ! (- 6);
962 </programlisting>
963 because the parser has no idea &mdash; until it is too late
964 &mdash; that <token>!</token> is defined as a postfix operator,
965 not an infix one. To get the desired behavior in this case, you
966 must write:
967 <programlisting>
968 SELECT (5 !) - 6;
969 </programlisting>
970 This is the price one pays for extensibility.
971 </para>
973 <table id="sql-precedence-table">
974 <title>Operator Precedence (decreasing)</title>
976 <tgroup cols="3">
977 <thead>
978 <row>
979 <entry>Operator/Element</entry>
980 <entry>Associativity</entry>
981 <entry>Description</entry>
982 </row>
983 </thead>
985 <tbody>
986 <row>
987 <entry><token>.</token></entry>
988 <entry>left</entry>
989 <entry>table/column name separator</entry>
990 </row>
992 <row>
993 <entry><token>::</token></entry>
994 <entry>left</entry>
995 <entry><productname>PostgreSQL</productname>-style typecast</entry>
996 </row>
998 <row>
999 <entry><token>[</token> <token>]</token></entry>
1000 <entry>left</entry>
1001 <entry>array element selection</entry>
1002 </row>
1004 <row>
1005 <entry><token>-</token></entry>
1006 <entry>right</entry>
1007 <entry>unary minus</entry>
1008 </row>
1010 <row>
1011 <entry><token>^</token></entry>
1012 <entry>left</entry>
1013 <entry>exponentiation</entry>
1014 </row>
1016 <row>
1017 <entry><token>*</token> <token>/</token> <token>%</token></entry>
1018 <entry>left</entry>
1019 <entry>multiplication, division, modulo</entry>
1020 </row>
1022 <row>
1023 <entry><token>+</token> <token>-</token></entry>
1024 <entry>left</entry>
1025 <entry>addition, subtraction</entry>
1026 </row>
1028 <row>
1029 <entry><token>IS</token></entry>
1030 <entry></entry>
1031 <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS UNKNOWN</>, <literal>IS NULL</></entry>
1032 </row>
1034 <row>
1035 <entry><token>ISNULL</token></entry>
1036 <entry></entry>
1037 <entry>test for null</entry>
1038 </row>
1040 <row>
1041 <entry><token>NOTNULL</token></entry>
1042 <entry></entry>
1043 <entry>test for not null</entry>
1044 </row>
1046 <row>
1047 <entry>(any other)</entry>
1048 <entry>left</entry>
1049 <entry>all other native and user-defined operators</entry>
1050 </row>
1052 <row>
1053 <entry><token>IN</token></entry>
1054 <entry></entry>
1055 <entry>set membership</entry>
1056 </row>
1058 <row>
1059 <entry><token>BETWEEN</token></entry>
1060 <entry></entry>
1061 <entry>range containment</entry>
1062 </row>
1064 <row>
1065 <entry><token>OVERLAPS</token></entry>
1066 <entry></entry>
1067 <entry>time interval overlap</entry>
1068 </row>
1070 <row>
1071 <entry><token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
1072 <entry></entry>
1073 <entry>string pattern matching</entry>
1074 </row>
1076 <row>
1077 <entry><token>&lt;</token> <token>&gt;</token></entry>
1078 <entry></entry>
1079 <entry>less than, greater than</entry>
1080 </row>
1082 <row>
1083 <entry><token>=</token></entry>
1084 <entry>right</entry>
1085 <entry>equality, assignment</entry>
1086 </row>
1088 <row>
1089 <entry><token>NOT</token></entry>
1090 <entry>right</entry>
1091 <entry>logical negation</entry>
1092 </row>
1094 <row>
1095 <entry><token>AND</token></entry>
1096 <entry>left</entry>
1097 <entry>logical conjunction</entry>
1098 </row>
1100 <row>
1101 <entry><token>OR</token></entry>
1102 <entry>left</entry>
1103 <entry>logical disjunction</entry>
1104 </row>
1105 </tbody>
1106 </tgroup>
1107 </table>
1109 <para>
1110 Note that the operator precedence rules also apply to user-defined
1111 operators that have the same names as the built-in operators
1112 mentioned above. For example, if you define a
1113 <quote>+</quote> operator for some custom data type it will have
1114 the same precedence as the built-in <quote>+</quote> operator, no
1115 matter what yours does.
1116 </para>
1118 <para>
1119 When a schema-qualified operator name is used in the
1120 <literal>OPERATOR</> syntax, as for example in:
1121 <programlisting>
1122 SELECT 3 OPERATOR(pg_catalog.+) 4;
1123 </programlisting>
1124 the <literal>OPERATOR</> construct is taken to have the default precedence
1125 shown in <xref linkend="sql-precedence-table"> for <quote>any other</> operator. This is true no matter
1126 which specific operator appears inside <literal>OPERATOR()</>.
1127 </para>
1128 </sect2>
1129 </sect1>
1131 <sect1 id="sql-expressions">
1132 <title>Value Expressions</title>
1134 <indexterm zone="sql-expressions">
1135 <primary>expression</primary>
1136 <secondary>syntax</secondary>
1137 </indexterm>
1139 <indexterm zone="sql-expressions">
1140 <primary>value expression</primary>
1141 </indexterm>
1143 <indexterm>
1144 <primary>scalar</primary>
1145 <see>expression</see>
1146 </indexterm>
1148 <para>
1149 Value expressions are used in a variety of contexts, such
1150 as in the target list of the <command>SELECT</command> command, as
1151 new column values in <command>INSERT</command> or
1152 <command>UPDATE</command>, or in search conditions in a number of
1153 commands. The result of a value expression is sometimes called a
1154 <firstterm>scalar</firstterm>, to distinguish it from the result of
1155 a table expression (which is a table). Value expressions are
1156 therefore also called <firstterm>scalar expressions</firstterm> (or
1157 even simply <firstterm>expressions</firstterm>). The expression
1158 syntax allows the calculation of values from primitive parts using
1159 arithmetic, logical, set, and other operations.
1160 </para>
1162 <para>
1163 A value expression is one of the following:
1165 <itemizedlist>
1166 <listitem>
1167 <para>
1168 A constant or literal value
1169 </para>
1170 </listitem>
1172 <listitem>
1173 <para>
1174 A column reference
1175 </para>
1176 </listitem>
1178 <listitem>
1179 <para>
1180 A positional parameter reference, in the body of a function definition
1181 or prepared statement
1182 </para>
1183 </listitem>
1185 <listitem>
1186 <para>
1187 A subscripted expression
1188 </para>
1189 </listitem>
1191 <listitem>
1192 <para>
1193 A field selection expression
1194 </para>
1195 </listitem>
1197 <listitem>
1198 <para>
1199 An operator invocation
1200 </para>
1201 </listitem>
1203 <listitem>
1204 <para>
1205 A function call
1206 </para>
1207 </listitem>
1209 <listitem>
1210 <para>
1211 An aggregate expression
1212 </para>
1213 </listitem>
1215 <listitem>
1216 <para>
1217 A window function call
1218 </para>
1219 </listitem>
1221 <listitem>
1222 <para>
1223 A type cast
1224 </para>
1225 </listitem>
1227 <listitem>
1228 <para>
1229 A scalar subquery
1230 </para>
1231 </listitem>
1233 <listitem>
1234 <para>
1235 An array constructor
1236 </para>
1237 </listitem>
1239 <listitem>
1240 <para>
1241 A row constructor
1242 </para>
1243 </listitem>
1245 <listitem>
1246 <para>
1247 Another value expression in parentheses (used to group
1248 subexpressions and override
1249 precedence<indexterm><primary>parenthesis</></>)
1250 </para>
1251 </listitem>
1252 </itemizedlist>
1253 </para>
1255 <para>
1256 In addition to this list, there are a number of constructs that can
1257 be classified as an expression but do not follow any general syntax
1258 rules. These generally have the semantics of a function or
1259 operator and are explained in the appropriate location in <xref
1260 linkend="functions">. An example is the <literal>IS NULL</literal>
1261 clause.
1262 </para>
1264 <para>
1265 We have already discussed constants in <xref
1266 linkend="sql-syntax-constants">. The following sections discuss
1267 the remaining options.
1268 </para>
1270 <sect2>
1271 <title>Column References</title>
1273 <indexterm>
1274 <primary>column reference</primary>
1275 </indexterm>
1277 <para>
1278 A column can be referenced in the form:
1279 <synopsis>
1280 <replaceable>correlation</replaceable>.<replaceable>columnname</replaceable>
1281 </synopsis>
1282 </para>
1284 <para>
1285 <replaceable>correlation</replaceable> is the name of a
1286 table (possibly qualified with a schema name), or an alias for a table
1287 defined by means of a <literal>FROM</literal> clause, or one of
1288 the key words <literal>NEW</literal> or <literal>OLD</literal>.
1289 (<literal>NEW</literal> and <literal>OLD</literal> can only appear in rewrite rules,
1290 while other correlation names can be used in any SQL statement.)
1291 The correlation name and separating dot can be omitted if the column name
1292 is unique across all the tables being used in the current query. (See also <xref linkend="queries">.)
1293 </para>
1294 </sect2>
1296 <sect2>
1297 <title>Positional Parameters</title>
1299 <indexterm>
1300 <primary>parameter</primary>
1301 <secondary>syntax</secondary>
1302 </indexterm>
1304 <indexterm>
1305 <primary>$</primary>
1306 </indexterm>
1308 <para>
1309 A positional parameter reference is used to indicate a value
1310 that is supplied externally to an SQL statement. Parameters are
1311 used in SQL function definitions and in prepared queries. Some
1312 client libraries also support specifying data values separately
1313 from the SQL command string, in which case parameters are used to
1314 refer to the out-of-line data values.
1315 The form of a parameter reference is:
1316 <synopsis>
1317 $<replaceable>number</replaceable>
1318 </synopsis>
1319 </para>
1321 <para>
1322 For example, consider the definition of a function,
1323 <function>dept</function>, as:
1325 <programlisting>
1326 CREATE FUNCTION dept(text) RETURNS dept
1327 AS $$ SELECT * FROM dept WHERE name = $1 $$
1328 LANGUAGE SQL;
1329 </programlisting>
1331 Here the <literal>$1</literal> references the value of the first
1332 function argument whenever the function is invoked.
1333 </para>
1334 </sect2>
1336 <sect2>
1337 <title>Subscripts</title>
1339 <indexterm>
1340 <primary>subscript</primary>
1341 </indexterm>
1343 <para>
1344 If an expression yields a value of an array type, then a specific
1345 element of the array value can be extracted by writing
1346 <synopsis>
1347 <replaceable>expression</replaceable>[<replaceable>subscript</replaceable>]
1348 </synopsis>
1349 or multiple adjacent elements (an <quote>array slice</>) can be extracted
1350 by writing
1351 <synopsis>
1352 <replaceable>expression</replaceable>[<replaceable>lower_subscript</replaceable>:<replaceable>upper_subscript</replaceable>]
1353 </synopsis>
1354 (Here, the brackets <literal>[ ]</literal> are meant to appear literally.)
1355 Each <replaceable>subscript</replaceable> is itself an expression,
1356 which must yield an integer value.
1357 </para>
1359 <para>
1360 In general the array <replaceable>expression</replaceable> must be
1361 parenthesized, but the parentheses can be omitted when the expression
1362 to be subscripted is just a column reference or positional parameter.
1363 Also, multiple subscripts can be concatenated when the original array
1364 is multidimensional.
1365 For example:
1367 <programlisting>
1368 mytable.arraycolumn[4]
1369 mytable.two_d_column[17][34]
1370 $1[10:42]
1371 (arrayfunction(a,b))[42]
1372 </programlisting>
1374 The parentheses in the last example are required.
1375 See <xref linkend="arrays"> for more about arrays.
1376 </para>
1377 </sect2>
1379 <sect2>
1380 <title>Field Selection</title>
1382 <indexterm>
1383 <primary>field selection</primary>
1384 </indexterm>
1386 <para>
1387 If an expression yields a value of a composite type (row type), then a
1388 specific field of the row can be extracted by writing
1389 <synopsis>
1390 <replaceable>expression</replaceable>.<replaceable>fieldname</replaceable>
1391 </synopsis>
1392 </para>
1394 <para>
1395 In general the row <replaceable>expression</replaceable> must be
1396 parenthesized, but the parentheses can be omitted when the expression
1397 to be selected from is just a table reference or positional parameter.
1398 For example:
1400 <programlisting>
1401 mytable.mycolumn
1402 $1.somecolumn
1403 (rowfunction(a,b)).col3
1404 </programlisting>
1406 (Thus, a qualified column reference is actually just a special case
1407 of the field selection syntax.) An important special case is
1408 extracting a field from a table column that is of a composite type:
1410 <programlisting>
1411 (compositecol).somefield
1412 (mytable.compositecol).somefield
1413 </programlisting>
1415 The parentheses are required here to show that
1416 <structfield>compositecol</> is a column name not a table name,
1417 or that <structname>mytable</> is a table name not a schema name
1418 in the second case.
1419 </para>
1420 </sect2>
1422 <sect2>
1423 <title>Operator Invocations</title>
1425 <indexterm>
1426 <primary>operator</primary>
1427 <secondary>invocation</secondary>
1428 </indexterm>
1430 <para>
1431 There are three possible syntaxes for an operator invocation:
1432 <simplelist>
1433 <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> <replaceable>expression</replaceable> (binary infix operator)</member>
1434 <member><replaceable>operator</replaceable> <replaceable>expression</replaceable> (unary prefix operator)</member>
1435 <member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
1436 </simplelist>
1437 where the <replaceable>operator</replaceable> token follows the syntax
1438 rules of <xref linkend="sql-syntax-operators">, or is one of the
1439 key words <token>AND</token>, <token>OR</token>, and
1440 <token>NOT</token>, or is a qualified operator name in the form:
1441 <synopsis>
1442 <literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
1443 </synopsis>
1444 Which particular operators exist and whether
1445 they are unary or binary depends on what operators have been
1446 defined by the system or the user. <xref linkend="functions">
1447 describes the built-in operators.
1448 </para>
1449 </sect2>
1451 <sect2>
1452 <title>Function Calls</title>
1454 <indexterm>
1455 <primary>function</primary>
1456 <secondary>invocation</secondary>
1457 </indexterm>
1459 <para>
1460 The syntax for a function call is the name of a function
1461 (possibly qualified with a schema name), followed by its argument list
1462 enclosed in parentheses:
1464 <synopsis>
1465 <replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
1466 </synopsis>
1467 </para>
1469 <para>
1470 For example, the following computes the square root of 2:
1471 <programlisting>
1472 sqrt(2)
1473 </programlisting>
1474 </para>
1476 <para>
1477 The list of built-in functions is in <xref linkend="functions">.
1478 Other functions can be added by the user.
1479 </para>
1480 </sect2>
1482 <sect2 id="syntax-aggregates">
1483 <title>Aggregate Expressions</title>
1485 <indexterm zone="syntax-aggregates">
1486 <primary>aggregate function</primary>
1487 <secondary>invocation</secondary>
1488 </indexterm>
1490 <para>
1491 An <firstterm>aggregate expression</firstterm> represents the
1492 application of an aggregate function across the rows selected by a
1493 query. An aggregate function reduces multiple inputs to a single
1494 output value, such as the sum or average of the inputs. The
1495 syntax of an aggregate expression is one of the following:
1497 <synopsis>
1498 <replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable> [ , ... ] )
1499 <replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable> [ , ... ] )
1500 <replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)
1501 <replaceable>aggregate_name</replaceable> ( * )
1502 </synopsis>
1504 where <replaceable>aggregate_name</replaceable> is a previously
1505 defined aggregate (possibly qualified with a schema name), and
1506 <replaceable>expression</replaceable> is
1507 any value expression that does not itself contain an aggregate
1508 expression or a window function call.
1509 </para>
1511 <para>
1512 The first form of aggregate expression invokes the aggregate
1513 across all input rows for which the given expression(s) yield
1514 non-null values. (Actually, it is up to the aggregate function
1515 whether to ignore null values or not &mdash; but all the standard ones do.)
1516 The second form is the same as the first, since
1517 <literal>ALL</literal> is the default. The third form invokes the
1518 aggregate for all distinct non-null values of the expressions found
1519 in the input rows. The last form invokes the aggregate once for
1520 each input row regardless of null or non-null values; since no
1521 particular input value is specified, it is generally only useful
1522 for the <function>count(*)</function> aggregate function.
1523 </para>
1525 <para>
1526 For example, <literal>count(*)</literal> yields the total number
1527 of input rows; <literal>count(f1)</literal> yields the number of
1528 input rows in which <literal>f1</literal> is non-null;
1529 <literal>count(distinct f1)</literal> yields the number of
1530 distinct non-null values of <literal>f1</literal>.
1531 </para>
1533 <para>
1534 The predefined aggregate functions are described in <xref
1535 linkend="functions-aggregate">. Other aggregate functions can be added
1536 by the user.
1537 </para>
1539 <para>
1540 An aggregate expression can only appear in the result list or
1541 <literal>HAVING</> clause of a <command>SELECT</> command.
1542 It is forbidden in other clauses, such as <literal>WHERE</>,
1543 because those clauses are logically evaluated before the results
1544 of aggregates are formed.
1545 </para>
1547 <para>
1548 When an aggregate expression appears in a subquery (see
1549 <xref linkend="sql-syntax-scalar-subqueries"> and
1550 <xref linkend="functions-subquery">), the aggregate is normally
1551 evaluated over the rows of the subquery. But an exception occurs
1552 if the aggregate's arguments contain only outer-level variables:
1553 the aggregate then belongs to the nearest such outer level, and is
1554 evaluated over the rows of that query. The aggregate expression
1555 as a whole is then an outer reference for the subquery it appears in,
1556 and acts as a constant over any one evaluation of that subquery.
1557 The restriction about
1558 appearing only in the result list or <literal>HAVING</> clause
1559 applies with respect to the query level that the aggregate belongs to.
1560 </para>
1562 <note>
1563 <para>
1564 <productname>PostgreSQL</productname> currently does not support
1565 <literal>DISTINCT</> with more than one input expression.
1566 </para>
1567 </note>
1568 </sect2>
1570 <sect2 id="syntax-window-functions">
1571 <title>Window Function Calls</title>
1573 <indexterm zone="syntax-window-functions">
1574 <primary>window function</primary>
1575 <secondary>invocation</secondary>
1576 </indexterm>
1578 <indexterm zone="syntax-window-functions">
1579 <primary>OVER clause</primary>
1580 </indexterm>
1582 <para>
1583 A <firstterm>window function call</firstterm> represents the application
1584 of an aggregate-like function over some portion of the rows selected
1585 by a query. Unlike regular aggregate function calls, this is not tied
1586 to grouping of the selected rows into a single output row &mdash; each
1587 row remains separate in the query output. However the window function
1588 is able to scan all the rows that would be part of the current row's
1589 group according to the grouping specification (<literal>PARTITION BY</>
1590 list) of the window function call.
1591 The syntax of a window function call is one of the following:
1593 <synopsis>
1594 <replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER ( <replaceable class="parameter">window_definition</replaceable> )
1595 <replaceable>function_name</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional>) OVER <replaceable>window_name</replaceable>
1596 <replaceable>function_name</replaceable> ( * ) OVER ( <replaceable class="parameter">window_definition</replaceable> )
1597 <replaceable>function_name</replaceable> ( * ) OVER <replaceable>window_name</replaceable>
1598 </synopsis>
1599 where <replaceable class="parameter">window_definition</replaceable>
1600 has the syntax
1601 <synopsis>
1602 [ <replaceable class="parameter">existing_window_name</replaceable> ]
1603 [ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
1604 [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ]
1605 [ <replaceable class="parameter">frame_clause</replaceable> ]
1606 </synopsis>
1607 and the optional <replaceable class="parameter">frame_clause</replaceable>
1608 can be one of
1609 <synopsis>
1610 RANGE UNBOUNDED PRECEDING
1611 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
1612 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
1613 ROWS UNBOUNDED PRECEDING
1614 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
1615 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
1616 </synopsis>
1618 Here, <replaceable>expression</replaceable> represents any value
1619 expression that does not itself contain window function calls.
1620 The <literal>PARTITION BY</> and <literal>ORDER BY</> lists have
1621 essentially the same syntax and semantics as <literal>GROUP BY</>
1622 and <literal>ORDER BY</> clauses of the whole query.
1623 <replaceable>window_name</replaceable> is a reference to a named window
1624 specification defined in the query's <literal>WINDOW</literal> clause.
1625 Named window specifications are usually referenced with just
1626 <literal>OVER</> <replaceable>window_name</replaceable>, but it is
1627 also possible to write a window name inside the parentheses and then
1628 optionally supply an ordering clause and/or frame clause (the referenced
1629 window must lack these clauses, if they are supplied here).
1630 This latter syntax follows the same rules as modifying an existing
1631 window name within the <literal>WINDOW</literal> clause; see the
1632 <xref linkend="sql-select" endterm="sql-select-title"> reference
1633 page for details.
1634 </para>
1636 <para>
1637 The <replaceable class="parameter">frame_clause</replaceable> specifies
1638 the set of rows constituting the <firstterm>window frame</>, for those
1639 window functions that act on the frame instead of the whole partition.
1640 The default framing option is <literal>RANGE UNBOUNDED PRECEDING</>,
1641 which is the same as <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND
1642 CURRENT ROW</>; it selects rows up through the current row's last
1643 peer in the <literal>ORDER BY</> ordering (which means all rows if
1644 there is no <literal>ORDER BY</>). The options
1645 <literal>RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</> and
1646 <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</>
1647 are also equivalent: they always select all rows in the partition.
1648 Lastly, <literal>ROWS UNBOUNDED PRECEDING</> or its verbose equivalent
1649 <literal>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</> select
1650 all rows up through the current row (regardless of duplicates).
1651 Beware that this option can produce implementation-dependent results
1652 if the <literal>ORDER BY</> ordering does not order the rows uniquely.
1653 </para>
1655 <para>
1656 The built-in window functions are described in <xref
1657 linkend="functions-window-table">. Other window functions can be added by
1658 the user. Also, any built-in or user-defined aggregate function can be
1659 used as a window function.
1660 </para>
1662 <para>
1663 The syntaxes using <literal>*</> are used for calling parameter-less
1664 aggregate functions as window functions, for example
1665 <literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
1666 <literal>*</> is customarily not used for non-aggregate window functions.
1667 Aggregate window functions, unlike normal aggregate functions, do not
1668 allow <literal>DISTINCT</> to be used within the function argument list.
1669 </para>
1671 <para>
1672 Window function calls are permitted only in the <literal>SELECT</literal>
1673 list and the <literal>ORDER BY</> clause of the query.
1674 </para>
1676 <para>
1677 More information about window functions can be found in
1678 <xref linkend="tutorial-window"> and
1679 <xref linkend="queries-window">.
1680 </para>
1681 </sect2>
1683 <sect2 id="sql-syntax-type-casts">
1684 <title>Type Casts</title>
1686 <indexterm>
1687 <primary>data type</primary>
1688 <secondary>type cast</secondary>
1689 </indexterm>
1691 <indexterm>
1692 <primary>type cast</primary>
1693 </indexterm>
1695 <para>
1696 A type cast specifies a conversion from one data type to another.
1697 <productname>PostgreSQL</productname> accepts two equivalent syntaxes
1698 for type casts:
1699 <synopsis>
1700 CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable> )
1701 <replaceable>expression</replaceable>::<replaceable>type</replaceable>
1702 </synopsis>
1703 The <literal>CAST</> syntax conforms to SQL; the syntax with
1704 <literal>::</literal> is historical <productname>PostgreSQL</productname>
1705 usage.
1706 </para>
1708 <para>
1709 When a cast is applied to a value expression of a known type, it
1710 represents a run-time type conversion. The cast will succeed only
1711 if a suitable type conversion operation has been defined. Notice that this
1712 is subtly different from the use of casts with constants, as shown in
1713 <xref linkend="sql-syntax-constants-generic">. A cast applied to an
1714 unadorned string literal represents the initial assignment of a type
1715 to a literal constant value, and so it will succeed for any type
1716 (if the contents of the string literal are acceptable input syntax for the
1717 data type).
1718 </para>
1720 <para>
1721 An explicit type cast can usually be omitted if there is no ambiguity as
1722 to the type that a value expression must produce (for example, when it is
1723 assigned to a table column); the system will automatically apply a
1724 type cast in such cases. However, automatic casting is only done for
1725 casts that are marked <quote>OK to apply implicitly</>
1726 in the system catalogs. Other casts must be invoked with
1727 explicit casting syntax. This restriction is intended to prevent
1728 surprising conversions from being applied silently.
1729 </para>
1731 <para>
1732 It is also possible to specify a type cast using a function-like
1733 syntax:
1734 <synopsis>
1735 <replaceable>typename</replaceable> ( <replaceable>expression</replaceable> )
1736 </synopsis>
1737 However, this only works for types whose names are also valid as
1738 function names. For example, <literal>double precision</literal>
1739 cannot be used this way, but the equivalent <literal>float8</literal>
1740 can. Also, the names <literal>interval</>, <literal>time</>, and
1741 <literal>timestamp</> can only be used in this fashion if they are
1742 double-quoted, because of syntactic conflicts. Therefore, the use of
1743 the function-like cast syntax leads to inconsistencies and should
1744 probably be avoided.
1745 </para>
1747 <note>
1748 <para>
1749 The function-like syntax is in fact just a function call. When
1750 one of the two standard cast syntaxes is used to do a run-time
1751 conversion, it will internally invoke a registered function to
1752 perform the conversion. By convention, these conversion functions
1753 have the same name as their output type, and thus the <quote>function-like
1754 syntax</> is nothing more than a direct invocation of the underlying
1755 conversion function. Obviously, this is not something that a portable
1756 application should rely on. For further details see
1757 <xref linkend="sql-createcast" endterm="sql-createcast-title">.
1758 </para>
1759 </note>
1760 </sect2>
1762 <sect2 id="sql-syntax-scalar-subqueries">
1763 <title>Scalar Subqueries</title>
1765 <indexterm>
1766 <primary>subquery</primary>
1767 </indexterm>
1769 <para>
1770 A scalar subquery is an ordinary
1771 <command>SELECT</command> query in parentheses that returns exactly one
1772 row with one column. (See <xref linkend="queries"> for information about writing queries.)
1773 The <command>SELECT</command> query is executed
1774 and the single returned value is used in the surrounding value expression.
1775 It is an error to use a query that
1776 returns more than one row or more than one column as a scalar subquery.
1777 (But if, during a particular execution, the subquery returns no rows,
1778 there is no error; the scalar result is taken to be null.)
1779 The subquery can refer to variables from the surrounding query,
1780 which will act as constants during any one evaluation of the subquery.
1781 See also <xref linkend="functions-subquery"> for other expressions involving subqueries.
1782 </para>
1784 <para>
1785 For example, the following finds the largest city population in each
1786 state:
1787 <programlisting>
1788 SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
1789 FROM states;
1790 </programlisting>
1791 </para>
1792 </sect2>
1794 <sect2 id="sql-syntax-array-constructors">
1795 <title>Array Constructors</title>
1797 <indexterm>
1798 <primary>array</primary>
1799 <secondary>constructor</secondary>
1800 </indexterm>
1802 <indexterm>
1803 <primary>ARRAY</primary>
1804 </indexterm>
1806 <para>
1807 An array constructor is an expression that builds an
1808 array value using values for its member elements. A simple array
1809 constructor
1810 consists of the key word <literal>ARRAY</literal>, a left square bracket
1811 <literal>[</>, a list of expressions (separated by commas) for the
1812 array element values, and finally a right square bracket <literal>]</>.
1813 For example:
1814 <programlisting>
1815 SELECT ARRAY[1,2,3+4];
1816 array
1817 ---------
1818 {1,2,7}
1819 (1 row)
1820 </programlisting>
1821 By default,
1822 the array element type is the common type of the member expressions,
1823 determined using the same rules as for <literal>UNION</> or
1824 <literal>CASE</> constructs (see <xref linkend="typeconv-union-case">).
1825 You can override this by explicitly casting the array constructor to the
1826 desired type, for example:
1827 <programlisting>
1828 SELECT ARRAY[1,2,22.7]::integer[];
1829 array
1830 ----------
1831 {1,2,23}
1832 (1 row)
1833 </programlisting>
1834 This has the same effect as casting each expression to the array
1835 element type individually.
1836 For more on casting, see <xref linkend="sql-syntax-type-casts">.
1837 </para>
1839 <para>
1840 Multidimensional array values can be built by nesting array
1841 constructors.
1842 In the inner constructors, the key word <literal>ARRAY</literal> can
1843 be omitted. For example, these produce the same result:
1845 <programlisting>
1846 SELECT ARRAY[ARRAY[1,2], ARRAY[3,4]];
1847 array
1848 ---------------
1849 {{1,2},{3,4}}
1850 (1 row)
1852 SELECT ARRAY[[1,2],[3,4]];
1853 array
1854 ---------------
1855 {{1,2},{3,4}}
1856 (1 row)
1857 </programlisting>
1859 Since multidimensional arrays must be rectangular, inner constructors
1860 at the same level must produce sub-arrays of identical dimensions.
1861 Any cast applied to the outer <literal>ARRAY</> constructor propagates
1862 automatically to all the inner constructors.
1863 </para>
1865 <para>
1866 Multidimensional array constructor elements can be anything yielding
1867 an array of the proper kind, not only a sub-<literal>ARRAY</> construct.
1868 For example:
1869 <programlisting>
1870 CREATE TABLE arr(f1 int[], f2 int[]);
1872 INSERT INTO arr VALUES (ARRAY[[1,2],[3,4]], ARRAY[[5,6],[7,8]]);
1874 SELECT ARRAY[f1, f2, '{{9,10},{11,12}}'::int[]] FROM arr;
1875 array
1876 ------------------------------------------------
1877 {{{1,2},{3,4}},{{5,6},{7,8}},{{9,10},{11,12}}}
1878 (1 row)
1879 </programlisting>
1880 </para>
1882 <para>
1883 You can construct an empty array, but since it's impossible to have an
1884 array with no type, you must explicitly cast your empty array to the
1885 desired type. For example:
1886 <programlisting>
1887 SELECT ARRAY[]::integer[];
1888 array
1889 -------
1891 (1 row)
1892 </programlisting>
1893 </para>
1895 <para>
1896 It is also possible to construct an array from the results of a
1897 subquery. In this form, the array constructor is written with the
1898 key word <literal>ARRAY</literal> followed by a parenthesized (not
1899 bracketed) subquery. For example:
1900 <programlisting>
1901 SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
1902 ?column?
1903 -------------------------------------------------------------
1904 {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31}
1905 (1 row)
1906 </programlisting>
1907 The subquery must return a single column. The resulting
1908 one-dimensional array will have an element for each row in the
1909 subquery result, with an element type matching that of the
1910 subquery's output column.
1911 </para>
1913 <para>
1914 The subscripts of an array value built with <literal>ARRAY</literal>
1915 always begin with one. For more information about arrays, see
1916 <xref linkend="arrays">.
1917 </para>
1919 </sect2>
1921 <sect2 id="sql-syntax-row-constructors">
1922 <title>Row Constructors</title>
1924 <indexterm>
1925 <primary>composite type</primary>
1926 <secondary>constructor</secondary>
1927 </indexterm>
1929 <indexterm>
1930 <primary>row type</primary>
1931 <secondary>constructor</secondary>
1932 </indexterm>
1934 <indexterm>
1935 <primary>ROW</primary>
1936 </indexterm>
1938 <para>
1939 A row constructor is an expression that builds a row value (also
1940 called a composite value) using values
1941 for its member fields. A row constructor consists of the key word
1942 <literal>ROW</literal>, a left parenthesis, zero or more
1943 expressions (separated by commas) for the row field values, and finally
1944 a right parenthesis. For example:
1945 <programlisting>
1946 SELECT ROW(1,2.5,'this is a test');
1947 </programlisting>
1948 The key word <literal>ROW</> is optional when there is more than one
1949 expression in the list.
1950 </para>
1952 <para>
1953 A row constructor can include the syntax
1954 <replaceable>rowvalue</replaceable><literal>.*</literal>,
1955 which will be expanded to a list of the elements of the row value,
1956 just as occurs when the <literal>.*</> syntax is used at the top level
1957 of a <command>SELECT</> list. For example, if table <literal>t</> has
1958 columns <literal>f1</> and <literal>f2</>, these are the same:
1959 <programlisting>
1960 SELECT ROW(t.*, 42) FROM t;
1961 SELECT ROW(t.f1, t.f2, 42) FROM t;
1962 </programlisting>
1963 </para>
1965 <note>
1966 <para>
1967 Before <productname>PostgreSQL</productname> 8.2, the
1968 <literal>.*</literal> syntax was not expanded, so that writing
1969 <literal>ROW(t.*, 42)</> created a two-field row whose first field
1970 was another row value. The new behavior is usually more useful.
1971 If you need the old behavior of nested row values, write the inner
1972 row value without <literal>.*</literal>, for instance
1973 <literal>ROW(t, 42)</>.
1974 </para>
1975 </note>
1977 <para>
1978 By default, the value created by a <literal>ROW</> expression is of
1979 an anonymous record type. If necessary, it can be cast to a named
1980 composite type &mdash; either the row type of a table, or a composite type
1981 created with <command>CREATE TYPE AS</>. An explicit cast might be needed
1982 to avoid ambiguity. For example:
1983 <programlisting>
1984 CREATE TABLE mytable(f1 int, f2 float, f3 text);
1986 CREATE FUNCTION getf1(mytable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
1988 -- No cast needed since only one getf1() exists
1989 SELECT getf1(ROW(1,2.5,'this is a test'));
1990 getf1
1991 -------
1993 (1 row)
1995 CREATE TYPE myrowtype AS (f1 int, f2 text, f3 numeric);
1997 CREATE FUNCTION getf1(myrowtype) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
1999 -- Now we need a cast to indicate which function to call:
2000 SELECT getf1(ROW(1,2.5,'this is a test'));
2001 ERROR: function getf1(record) is not unique
2003 SELECT getf1(ROW(1,2.5,'this is a test')::mytable);
2004 getf1
2005 -------
2007 (1 row)
2009 SELECT getf1(CAST(ROW(11,'this is a test',2.5) AS myrowtype));
2010 getf1
2011 -------
2013 (1 row)
2014 </programlisting>
2015 </para>
2017 <para>
2018 Row constructors can be used to build composite values to be stored
2019 in a composite-type table column, or to be passed to a function that
2020 accepts a composite parameter. Also,
2021 it is possible to compare two row values or test a row with
2022 <literal>IS NULL</> or <literal>IS NOT NULL</>, for example:
2023 <programlisting>
2024 SELECT ROW(1,2.5,'this is a test') = ROW(1, 3, 'not the same');
2026 SELECT ROW(table.*) IS NULL FROM table; -- detect all-null rows
2027 </programlisting>
2028 For more detail see <xref linkend="functions-comparisons">.
2029 Row constructors can also be used in connection with subqueries,
2030 as discussed in <xref linkend="functions-subquery">.
2031 </para>
2033 </sect2>
2035 <sect2 id="syntax-express-eval">
2036 <title>Expression Evaluation Rules</title>
2038 <indexterm>
2039 <primary>expression</primary>
2040 <secondary>order of evaluation</secondary>
2041 </indexterm>
2043 <para>
2044 The order of evaluation of subexpressions is not defined. In
2045 particular, the inputs of an operator or function are not necessarily
2046 evaluated left-to-right or in any other fixed order.
2047 </para>
2049 <para>
2050 Furthermore, if the result of an expression can be determined by
2051 evaluating only some parts of it, then other subexpressions
2052 might not be evaluated at all. For instance, if one wrote:
2053 <programlisting>
2054 SELECT true OR somefunc();
2055 </programlisting>
2056 then <literal>somefunc()</literal> would (probably) not be called
2057 at all. The same would be the case if one wrote:
2058 <programlisting>
2059 SELECT somefunc() OR true;
2060 </programlisting>
2061 Note that this is not the same as the left-to-right
2062 <quote>short-circuiting</quote> of Boolean operators that is found
2063 in some programming languages.
2064 </para>
2066 <para>
2067 As a consequence, it is unwise to use functions with side effects
2068 as part of complex expressions. It is particularly dangerous to
2069 rely on side effects or evaluation order in <literal>WHERE</> and <literal>HAVING</> clauses,
2070 since those clauses are extensively reprocessed as part of
2071 developing an execution plan. Boolean
2072 expressions (<literal>AND</>/<literal>OR</>/<literal>NOT</> combinations) in those clauses can be reorganized
2073 in any manner allowed by the laws of Boolean algebra.
2074 </para>
2076 <para>
2077 When it is essential to force evaluation order, a <literal>CASE</>
2078 construct (see <xref linkend="functions-conditional">) can be
2079 used. For example, this is an untrustworthy way of trying to
2080 avoid division by zero in a <literal>WHERE</> clause:
2081 <programlisting>
2082 SELECT ... WHERE x &gt; 0 AND y/x &gt; 1.5;
2083 </programlisting>
2084 But this is safe:
2085 <programlisting>
2086 SELECT ... WHERE CASE WHEN x &gt; 0 THEN y/x &gt; 1.5 ELSE false END;
2087 </programlisting>
2088 A <literal>CASE</> construct used in this fashion will defeat optimization
2089 attempts, so it should only be done when necessary. (In this particular
2090 example, it would be better to sidestep the problem by writing
2091 <literal>y &gt; 1.5*x</> instead.)
2092 </para>
2093 </sect2>
2094 </sect1>
2096 </chapter>