1 <!-- doc/src/sgml/advanced.sgml -->
3 <chapter id=
"tutorial-advanced">
4 <title>Advanced Features
</title>
6 <sect1 id=
"tutorial-advanced-intro">
7 <title>Introduction
</title>
10 In the previous chapter we have covered the basics of using
11 <acronym>SQL
</acronym> to store and access your data in
12 <productname>PostgreSQL
</productname>. We will now discuss some
13 more advanced features of
<acronym>SQL
</acronym> that simplify
14 management and prevent loss or corruption of your data. Finally,
15 we will look at some
<productname>PostgreSQL
</productname>
20 This chapter will on occasion refer to examples found in
<xref
21 linkend=
"tutorial-sql"/> to change or improve them, so it will be
22 useful to have read that chapter. Some examples from
23 this chapter can also be found in
24 <filename>advanced.sql
</filename> in the tutorial directory. This
25 file also contains some sample data to load, which is not
26 repeated here. (Refer to
<xref linkend=
"tutorial-sql-intro"/> for
32 <sect1 id=
"tutorial-views">
35 <indexterm zone=
"tutorial-views">
36 <primary>view
</primary>
40 Refer back to the queries in
<xref linkend=
"tutorial-join"/>.
41 Suppose the combined listing of weather records and city location
42 is of particular interest to your application, but you do not want
43 to type the query each time you need it. You can create a
44 <firstterm>view
</firstterm> over the query, which gives a name to
45 the query that you can refer to like an ordinary table:
49 SELECT name, temp_lo, temp_hi, prcp, date, location
58 Making liberal use of views is a key aspect of good SQL database
59 design. Views allow you to encapsulate the details of the
60 structure of your tables, which might change as your application
61 evolves, behind consistent interfaces.
65 Views can be used in almost any place a real table can be used.
66 Building views upon other views is not uncommon.
71 <sect1 id=
"tutorial-fk">
72 <title>Foreign Keys
</title>
74 <indexterm zone=
"tutorial-fk">
75 <primary>foreign key
</primary>
78 <indexterm zone=
"tutorial-fk">
79 <primary>referential integrity
</primary>
83 Recall the
<classname>weather
</classname> and
84 <classname>cities
</classname> tables from
<xref
85 linkend=
"tutorial-sql"/>. Consider the following problem: You
86 want to make sure that no one can insert rows in the
87 <classname>weather
</classname> table that do not have a matching
88 entry in the
<classname>cities
</classname> table. This is called
89 maintaining the
<firstterm>referential integrity
</firstterm> of
90 your data. In simplistic database systems this would be
91 implemented (if at all) by first looking at the
92 <classname>cities
</classname> table to check if a matching record
93 exists, and then inserting or rejecting the new
94 <classname>weather
</classname> records. This approach has a
95 number of problems and is very inconvenient, so
96 <productname>PostgreSQL
</productname> can do this for you.
100 The new declaration of the tables would look like this:
103 CREATE TABLE cities (
104 name varchar(
80) primary key,
108 CREATE TABLE weather (
109 city varchar(
80) references cities(name),
117 Now try inserting an invalid record:
120 INSERT INTO weather VALUES ('Berkeley',
45,
53,
0.0, '
1994-
11-
28');
124 ERROR: insert or update on table
"weather" violates foreign key constraint
"weather_city_fkey"
125 DETAIL: Key (city)=(Berkeley) is not present in table
"cities".
130 The behavior of foreign keys can be finely tuned to your
131 application. We will not go beyond this simple example in this
132 tutorial, but just refer you to
<xref linkend=
"ddl"/>
133 for more information. Making correct use of
134 foreign keys will definitely improve the quality of your database
135 applications, so you are strongly encouraged to learn about them.
140 <sect1 id=
"tutorial-transactions">
141 <title>Transactions
</title>
143 <indexterm zone=
"tutorial-transactions">
144 <primary>transaction
</primary>
148 <firstterm>Transactions
</firstterm> are a fundamental concept of all database
149 systems. The essential point of a transaction is that it bundles
150 multiple steps into a single, all-or-nothing operation. The intermediate
151 states between the steps are not visible to other concurrent transactions,
152 and if some failure occurs that prevents the transaction from completing,
153 then none of the steps affect the database at all.
157 For example, consider a bank database that contains balances for various
158 customer accounts, as well as total deposit balances for branches.
159 Suppose that we want to record a payment of $
100.00 from Alice's account
160 to Bob's account. Simplifying outrageously, the SQL commands for this
164 UPDATE accounts SET balance = balance -
100.00
165 WHERE name = 'Alice';
166 UPDATE branches SET balance = balance -
100.00
167 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
168 UPDATE accounts SET balance = balance +
100.00
170 UPDATE branches SET balance = balance +
100.00
171 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
176 The details of these commands are not important here; the important
177 point is that there are several separate updates involved to accomplish
178 this rather simple operation. Our bank's officers will want to be
179 assured that either all these updates happen, or none of them happen.
180 It would certainly not do for a system failure to result in Bob
181 receiving $
100.00 that was not debited from Alice. Nor would Alice long
182 remain a happy customer if she was debited without Bob being credited.
183 We need a guarantee that if something goes wrong partway through the
184 operation, none of the steps executed so far will take effect. Grouping
185 the updates into a
<firstterm>transaction
</firstterm> gives us this guarantee.
186 A transaction is said to be
<firstterm>atomic
</firstterm>: from the point of
187 view of other transactions, it either happens completely or not at all.
192 guarantee that once a transaction is completed and acknowledged by
193 the database system, it has indeed been permanently recorded
194 and won't be lost even if a crash ensues shortly thereafter.
195 For example, if we are recording a cash withdrawal by Bob,
196 we do not want any chance that the debit to his account will
197 disappear in a crash just after he walks out the bank door.
198 A transactional database guarantees that all the updates made by
199 a transaction are logged in permanent storage (i.e., on disk) before
200 the transaction is reported complete.
204 Another important property of transactional databases is closely
205 related to the notion of atomic updates: when multiple transactions
206 are running concurrently, each one should not be able to see the
207 incomplete changes made by others. For example, if one transaction
208 is busy totalling all the branch balances, it would not do for it
209 to include the debit from Alice's branch but not the credit to
210 Bob's branch, nor vice versa. So transactions must be all-or-nothing
211 not only in terms of their permanent effect on the database, but
212 also in terms of their visibility as they happen. The updates made
213 so far by an open transaction are invisible to other transactions
214 until the transaction completes, whereupon all the updates become
215 visible simultaneously.
219 In
<productname>PostgreSQL
</productname>, a transaction is set up by surrounding
220 the SQL commands of the transaction with
221 <command>BEGIN
</command> and
<command>COMMIT
</command> commands. So our banking
222 transaction would actually look like:
226 UPDATE accounts SET balance = balance -
100.00
227 WHERE name = 'Alice';
234 If, partway through the transaction, we decide we do not want to
235 commit (perhaps we just noticed that Alice's balance went negative),
236 we can issue the command
<command>ROLLBACK
</command> instead of
237 <command>COMMIT
</command>, and all our updates so far will be canceled.
241 <productname>PostgreSQL
</productname> actually treats every SQL statement as being
242 executed within a transaction. If you do not issue a
<command>BEGIN
</command>
244 then each individual statement has an implicit
<command>BEGIN
</command> and
245 (if successful)
<command>COMMIT
</command> wrapped around it. A group of
246 statements surrounded by
<command>BEGIN
</command> and
<command>COMMIT
</command>
247 is sometimes called a
<firstterm>transaction block
</firstterm>.
252 Some client libraries issue
<command>BEGIN
</command> and
<command>COMMIT
</command>
253 commands automatically, so that you might get the effect of transaction
254 blocks without asking. Check the documentation for the interface
260 It's possible to control the statements in a transaction in a more
261 granular fashion through the use of
<firstterm>savepoints
</firstterm>. Savepoints
262 allow you to selectively discard parts of the transaction, while
263 committing the rest. After defining a savepoint with
264 <command>SAVEPOINT
</command>, you can if needed roll back to the savepoint
265 with
<command>ROLLBACK TO
</command>. All the transaction's database changes
266 between defining the savepoint and rolling back to it are discarded, but
267 changes earlier than the savepoint are kept.
271 After rolling back to a savepoint, it continues to be defined, so you can
272 roll back to it several times. Conversely, if you are sure you won't need
273 to roll back to a particular savepoint again, it can be released, so the
274 system can free some resources. Keep in mind that either releasing or
275 rolling back to a savepoint
276 will automatically release all savepoints that were defined after it.
280 All this is happening within the transaction block, so none of it
281 is visible to other database sessions. When and if you commit the
282 transaction block, the committed actions become visible as a unit
283 to other sessions, while the rolled-back actions never become visible
288 Remembering the bank database, suppose we debit $
100.00 from Alice's
289 account, and credit Bob's account, only to find later that we should
290 have credited Wally's account. We could do it using savepoints like
295 UPDATE accounts SET balance = balance -
100.00
296 WHERE name = 'Alice';
297 SAVEPOINT my_savepoint;
298 UPDATE accounts SET balance = balance +
100.00
300 -- oops ... forget that and use Wally's account
301 ROLLBACK TO my_savepoint;
302 UPDATE accounts SET balance = balance +
100.00
303 WHERE name = 'Wally';
309 This example is, of course, oversimplified, but there's a lot of control
310 possible in a transaction block through the use of savepoints.
311 Moreover,
<command>ROLLBACK TO
</command> is the only way to regain control of a
312 transaction block that was put in aborted state by the
313 system due to an error, short of rolling it back completely and starting
320 <sect1 id=
"tutorial-window">
321 <title>Window Functions
</title>
323 <indexterm zone=
"tutorial-window">
324 <primary>window function
</primary>
328 A
<firstterm>window function
</firstterm> performs a calculation across a set of
329 table rows that are somehow related to the current row. This is comparable
330 to the type of calculation that can be done with an aggregate function.
331 However, window functions do not cause rows to become grouped into a single
332 output row like non-window aggregate calls would. Instead, the
333 rows retain their separate identities. Behind the scenes, the window
334 function is able to access more than just the current row of the query
339 Here is an example that shows how to compare each employee's salary
340 with the average salary in his or her department:
343 SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
347 depname | empno | salary | avg
348 -----------+-------+--------+-----------------------
349 develop |
11 |
5200 |
5020.0000000000000000
350 develop |
7 |
4200 |
5020.0000000000000000
351 develop |
9 |
4500 |
5020.0000000000000000
352 develop |
8 |
6000 |
5020.0000000000000000
353 develop |
10 |
5200 |
5020.0000000000000000
354 personnel |
5 |
3500 |
3700.0000000000000000
355 personnel |
2 |
3900 |
3700.0000000000000000
356 sales |
3 |
4800 |
4866.6666666666666667
357 sales |
1 |
5000 |
4866.6666666666666667
358 sales |
4 |
4800 |
4866.6666666666666667
362 The first three output columns come directly from the table
363 <structname>empsalary
</structname>, and there is one output row for each row in the
364 table. The fourth column represents an average taken across all the table
365 rows that have the same
<structfield>depname
</structfield> value as the current row.
366 (This actually is the same function as the non-window
<function>avg
</function>
367 aggregate, but the
<literal>OVER
</literal> clause causes it to be
368 treated as a window function and computed across the window frame.)
372 A window function call always contains an
<literal>OVER
</literal> clause
373 directly following the window function's name and argument(s). This is what
374 syntactically distinguishes it from a normal function or non-window
375 aggregate. The
<literal>OVER
</literal> clause determines exactly how the
376 rows of the query are split up for processing by the window function.
377 The
<literal>PARTITION BY
</literal> clause within
<literal>OVER
</literal>
378 divides the rows into groups, or partitions, that share the same
379 values of the
<literal>PARTITION BY
</literal> expression(s). For each row,
380 the window function is computed across the rows that fall into the
381 same partition as the current row.
385 You can also control the order in which rows are processed by
386 window functions using
<literal>ORDER BY
</literal> within
<literal>OVER
</literal>.
387 (The window
<literal>ORDER BY
</literal> does not even have to match the
388 order in which the rows are output.) Here is an example:
391 SELECT depname, empno, salary,
392 rank() OVER (PARTITION BY depname ORDER BY salary DESC)
397 depname | empno | salary | rank
398 -----------+-------+--------+------
399 develop |
8 |
6000 |
1
400 develop |
10 |
5200 |
2
401 develop |
11 |
5200 |
2
402 develop |
9 |
4500 |
4
403 develop |
7 |
4200 |
5
404 personnel |
2 |
3900 |
1
405 personnel |
5 |
3500 |
2
412 As shown here, the
<function>rank
</function> function produces a numerical rank
413 for each distinct
<literal>ORDER BY
</literal> value in the current row's
414 partition, using the order defined by the
<literal>ORDER BY
</literal> clause.
415 <function>rank
</function> needs no explicit parameter, because its behavior
416 is entirely determined by the
<literal>OVER
</literal> clause.
420 The rows considered by a window function are those of the
<quote>virtual
421 table
</quote> produced by the query's
<literal>FROM
</literal> clause as filtered by its
422 <literal>WHERE
</literal>,
<literal>GROUP BY
</literal>, and
<literal>HAVING
</literal> clauses
423 if any. For example, a row removed because it does not meet the
424 <literal>WHERE
</literal> condition is not seen by any window function.
425 A query can contain multiple window functions that slice up the data
426 in different ways using different
<literal>OVER
</literal> clauses, but
427 they all act on the same collection of rows defined by this virtual table.
431 We already saw that
<literal>ORDER BY
</literal> can be omitted if the ordering
432 of rows is not important. It is also possible to omit
<literal>PARTITION
433 BY
</literal>, in which case there is a single partition containing all rows.
437 There is another important concept associated with window functions:
438 for each row, there is a set of rows within its partition called its
439 <firstterm>window frame
</firstterm>. Some window functions act only
440 on the rows of the window frame, rather than of the whole partition.
441 By default, if
<literal>ORDER BY
</literal> is supplied then the frame consists of
442 all rows from the start of the partition up through the current row, plus
443 any following rows that are equal to the current row according to the
444 <literal>ORDER BY
</literal> clause. When
<literal>ORDER BY
</literal> is omitted the
445 default frame consists of all rows in the partition.
448 There are options to define the window frame in other ways, but
449 this tutorial does not cover them. See
450 <xref linkend=
"syntax-window-functions"/> for details.
453 Here is an example using
<function>sum
</function>:
457 SELECT salary, sum(salary) OVER () FROM empsalary;
477 Above, since there is no
<literal>ORDER BY
</literal> in the
<literal>OVER
</literal>
478 clause, the window frame is the same as the partition, which for lack of
479 <literal>PARTITION BY
</literal> is the whole table; in other words each sum is
480 taken over the whole table and so we get the same result for each output
481 row. But if we add an
<literal>ORDER BY
</literal> clause, we get very different
486 SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
506 Here the sum is taken from the first (lowest) salary up through the
507 current one, including any duplicates of the current one (notice the
508 results for the duplicated salaries).
512 Window functions are permitted only in the
<literal>SELECT
</literal> list
513 and the
<literal>ORDER BY
</literal> clause of the query. They are forbidden
514 elsewhere, such as in
<literal>GROUP BY
</literal>,
<literal>HAVING
</literal>
515 and
<literal>WHERE
</literal> clauses. This is because they logically
516 execute after the processing of those clauses. Also, window functions
517 execute after non-window aggregate functions. This means it is valid to
518 include an aggregate function call in the arguments of a window function,
523 If there is a need to filter or group rows after the window calculations
524 are performed, you can use a sub-select. For example:
527 SELECT depname, empno, salary, enroll_date
529 (SELECT depname, empno, salary, enroll_date,
530 rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
536 The above query only shows the rows from the inner query having
537 <literal>rank
</literal> less than
3.
541 When a query involves multiple window functions, it is possible to write
542 out each one with a separate
<literal>OVER
</literal> clause, but this is
543 duplicative and error-prone if the same windowing behavior is wanted
544 for several functions. Instead, each windowing behavior can be named
545 in a
<literal>WINDOW
</literal> clause and then referenced in
<literal>OVER
</literal>.
549 SELECT sum(salary) OVER w, avg(salary) OVER w
551 WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
556 More details about window functions can be found in
557 <xref linkend=
"syntax-window-functions"/>,
558 <xref linkend=
"functions-window"/>,
559 <xref linkend=
"queries-window"/>, and the
560 <xref linkend=
"sql-select"/> reference page.
565 <sect1 id=
"tutorial-inheritance">
566 <title>Inheritance
</title>
568 <indexterm zone=
"tutorial-inheritance">
569 <primary>inheritance
</primary>
573 Inheritance is a concept from object-oriented databases. It opens
574 up interesting new possibilities of database design.
578 Let's create two tables: A table
<classname>cities
</classname>
579 and a table
<classname>capitals
</classname>. Naturally, capitals
580 are also cities, so you want some way to show the capitals
581 implicitly when you list all cities. If you're really clever you
582 might invent some scheme like this:
585 CREATE TABLE capitals (
588 elevation int, -- (in ft)
592 CREATE TABLE non_capitals (
595 elevation int -- (in ft)
598 CREATE VIEW cities AS
599 SELECT name, population, elevation FROM capitals
601 SELECT name, population, elevation FROM non_capitals;
604 This works OK as far as querying goes, but it gets ugly when you
605 need to update several rows, for one thing.
609 A better solution is this:
612 CREATE TABLE cities (
615 elevation int -- (in ft)
618 CREATE TABLE capitals (
619 state char(
2) UNIQUE NOT NULL
625 In this case, a row of
<classname>capitals
</classname>
626 <firstterm>inherits
</firstterm> all columns (
<structfield>name
</structfield>,
627 <structfield>population
</structfield>, and
<structfield>elevation
</structfield>) from its
628 <firstterm>parent
</firstterm>,
<classname>cities
</classname>. The
629 type of the column
<structfield>name
</structfield> is
630 <type>text
</type>, a native
<productname>PostgreSQL
</productname>
631 type for variable length character strings. The
632 <classname>capitals
</classname> table has
633 an additional column,
<structfield>state
</structfield>, which shows its
634 state abbreviation. In
635 <productname>PostgreSQL
</productname>, a table can inherit from
636 zero or more other tables.
640 For example, the following query finds the names of all cities,
641 including state capitals, that are located at an elevation
645 SELECT name, elevation
647 WHERE elevation
> 500;
654 -----------+-----------
663 On the other hand, the following query finds
664 all the cities that are not state capitals and
665 are situated at an elevation over
500 feet:
668 SELECT name, elevation
670 WHERE elevation
> 500;
675 -----------+-----------
683 Here the
<literal>ONLY
</literal> before
<literal>cities
</literal>
684 indicates that the query should be run over only the
685 <classname>cities
</classname> table, and not tables below
686 <classname>cities
</classname> in the inheritance hierarchy. Many
687 of the commands that we have already discussed
—
688 <command>SELECT
</command>,
<command>UPDATE
</command>, and
689 <command>DELETE
</command> — support this
<literal>ONLY
</literal>
695 Although inheritance is frequently useful, it has not been integrated
696 with unique constraints or foreign keys, which limits its usefulness.
697 See
<xref linkend=
"ddl-inherit"/> for more detail.
703 <sect1 id=
"tutorial-conclusion">
704 <title>Conclusion
</title>
707 <productname>PostgreSQL
</productname> has many features not
708 touched upon in this tutorial introduction, which has been
709 oriented toward newer users of
<acronym>SQL
</acronym>. These
710 features are discussed in more detail in the remainder of this
715 If you feel you need more introductory material, please visit the PostgreSQL
716 <ulink url=
"https://www.postgresql.org">web site
</ulink>
717 for links to more resources.