At update of non-LP_NORMAL TID, fail instead of corrupting page header.
[pgsql.git] / doc / src / sgml / advanced.sgml
blobe15a3323dfbfdd34803dc5715c043979cc50eb8f
1 <!-- doc/src/sgml/advanced.sgml -->
3 <chapter id="tutorial-advanced">
4 <title>Advanced Features</title>
6 <sect1 id="tutorial-advanced-intro">
7 <title>Introduction</title>
9 <para>
10 In the previous chapter we have covered the basics of using
11 <acronym>SQL</acronym> to store and access your data in
12 <productname>PostgreSQL</productname>. We will now discuss some
13 more advanced features of <acronym>SQL</acronym> that simplify
14 management and prevent loss or corruption of your data. Finally,
15 we will look at some <productname>PostgreSQL</productname>
16 extensions.
17 </para>
19 <para>
20 This chapter will on occasion refer to examples found in <xref
21 linkend="tutorial-sql"/> to change or improve them, so it will be
22 useful to have read that chapter. Some examples from
23 this chapter can also be found in
24 <filename>advanced.sql</filename> in the tutorial directory. This
25 file also contains some sample data to load, which is not
26 repeated here. (Refer to <xref linkend="tutorial-sql-intro"/> for
27 how to use the file.)
28 </para>
29 </sect1>
32 <sect1 id="tutorial-views">
33 <title>Views</title>
35 <indexterm zone="tutorial-views">
36 <primary>view</primary>
37 </indexterm>
39 <para>
40 Refer back to the queries in <xref linkend="tutorial-join"/>.
41 Suppose the combined listing of weather records and city location
42 is of particular interest to your application, but you do not want
43 to type the query each time you need it. You can create a
44 <firstterm>view</firstterm> over the query, which gives a name to
45 the query that you can refer to like an ordinary table:
47 <programlisting>
48 CREATE VIEW myview AS
49 SELECT name, temp_lo, temp_hi, prcp, date, location
50 FROM weather, cities
51 WHERE city = name;
53 SELECT * FROM myview;
54 </programlisting>
55 </para>
57 <para>
58 Making liberal use of views is a key aspect of good SQL database
59 design. Views allow you to encapsulate the details of the
60 structure of your tables, which might change as your application
61 evolves, behind consistent interfaces.
62 </para>
64 <para>
65 Views can be used in almost any place a real table can be used.
66 Building views upon other views is not uncommon.
67 </para>
68 </sect1>
71 <sect1 id="tutorial-fk">
72 <title>Foreign Keys</title>
74 <indexterm zone="tutorial-fk">
75 <primary>foreign key</primary>
76 </indexterm>
78 <indexterm zone="tutorial-fk">
79 <primary>referential integrity</primary>
80 </indexterm>
82 <para>
83 Recall the <classname>weather</classname> and
84 <classname>cities</classname> tables from <xref
85 linkend="tutorial-sql"/>. Consider the following problem: You
86 want to make sure that no one can insert rows in the
87 <classname>weather</classname> table that do not have a matching
88 entry in the <classname>cities</classname> table. This is called
89 maintaining the <firstterm>referential integrity</firstterm> of
90 your data. In simplistic database systems this would be
91 implemented (if at all) by first looking at the
92 <classname>cities</classname> table to check if a matching record
93 exists, and then inserting or rejecting the new
94 <classname>weather</classname> records. This approach has a
95 number of problems and is very inconvenient, so
96 <productname>PostgreSQL</productname> can do this for you.
97 </para>
99 <para>
100 The new declaration of the tables would look like this:
102 <programlisting>
103 CREATE TABLE cities (
104 name varchar(80) primary key,
105 location point
108 CREATE TABLE weather (
109 city varchar(80) references cities(name),
110 temp_lo int,
111 temp_hi int,
112 prcp real,
113 date date
115 </programlisting>
117 Now try inserting an invalid record:
119 <programlisting>
120 INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
121 </programlisting>
123 <screen>
124 ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
125 DETAIL: Key (city)=(Berkeley) is not present in table "cities".
126 </screen>
127 </para>
129 <para>
130 The behavior of foreign keys can be finely tuned to your
131 application. We will not go beyond this simple example in this
132 tutorial, but just refer you to <xref linkend="ddl"/>
133 for more information. Making correct use of
134 foreign keys will definitely improve the quality of your database
135 applications, so you are strongly encouraged to learn about them.
136 </para>
137 </sect1>
140 <sect1 id="tutorial-transactions">
141 <title>Transactions</title>
143 <indexterm zone="tutorial-transactions">
144 <primary>transaction</primary>
145 </indexterm>
147 <para>
148 <firstterm>Transactions</firstterm> are a fundamental concept of all database
149 systems. The essential point of a transaction is that it bundles
150 multiple steps into a single, all-or-nothing operation. The intermediate
151 states between the steps are not visible to other concurrent transactions,
152 and if some failure occurs that prevents the transaction from completing,
153 then none of the steps affect the database at all.
154 </para>
156 <para>
157 For example, consider a bank database that contains balances for various
158 customer accounts, as well as total deposit balances for branches.
159 Suppose that we want to record a payment of $100.00 from Alice's account
160 to Bob's account. Simplifying outrageously, the SQL commands for this
161 might look like:
163 <programlisting>
164 UPDATE accounts SET balance = balance - 100.00
165 WHERE name = 'Alice';
166 UPDATE branches SET balance = balance - 100.00
167 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
168 UPDATE accounts SET balance = balance + 100.00
169 WHERE name = 'Bob';
170 UPDATE branches SET balance = balance + 100.00
171 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
172 </programlisting>
173 </para>
175 <para>
176 The details of these commands are not important here; the important
177 point is that there are several separate updates involved to accomplish
178 this rather simple operation. Our bank's officers will want to be
179 assured that either all these updates happen, or none of them happen.
180 It would certainly not do for a system failure to result in Bob
181 receiving $100.00 that was not debited from Alice. Nor would Alice long
182 remain a happy customer if she was debited without Bob being credited.
183 We need a guarantee that if something goes wrong partway through the
184 operation, none of the steps executed so far will take effect. Grouping
185 the updates into a <firstterm>transaction</firstterm> gives us this guarantee.
186 A transaction is said to be <firstterm>atomic</firstterm>: from the point of
187 view of other transactions, it either happens completely or not at all.
188 </para>
190 <para>
191 We also want a
192 guarantee that once a transaction is completed and acknowledged by
193 the database system, it has indeed been permanently recorded
194 and won't be lost even if a crash ensues shortly thereafter.
195 For example, if we are recording a cash withdrawal by Bob,
196 we do not want any chance that the debit to his account will
197 disappear in a crash just after he walks out the bank door.
198 A transactional database guarantees that all the updates made by
199 a transaction are logged in permanent storage (i.e., on disk) before
200 the transaction is reported complete.
201 </para>
203 <para>
204 Another important property of transactional databases is closely
205 related to the notion of atomic updates: when multiple transactions
206 are running concurrently, each one should not be able to see the
207 incomplete changes made by others. For example, if one transaction
208 is busy totalling all the branch balances, it would not do for it
209 to include the debit from Alice's branch but not the credit to
210 Bob's branch, nor vice versa. So transactions must be all-or-nothing
211 not only in terms of their permanent effect on the database, but
212 also in terms of their visibility as they happen. The updates made
213 so far by an open transaction are invisible to other transactions
214 until the transaction completes, whereupon all the updates become
215 visible simultaneously.
216 </para>
218 <para>
219 In <productname>PostgreSQL</productname>, a transaction is set up by surrounding
220 the SQL commands of the transaction with
221 <command>BEGIN</command> and <command>COMMIT</command> commands. So our banking
222 transaction would actually look like:
224 <programlisting>
225 BEGIN;
226 UPDATE accounts SET balance = balance - 100.00
227 WHERE name = 'Alice';
228 -- etc etc
229 COMMIT;
230 </programlisting>
231 </para>
233 <para>
234 If, partway through the transaction, we decide we do not want to
235 commit (perhaps we just noticed that Alice's balance went negative),
236 we can issue the command <command>ROLLBACK</command> instead of
237 <command>COMMIT</command>, and all our updates so far will be canceled.
238 </para>
240 <para>
241 <productname>PostgreSQL</productname> actually treats every SQL statement as being
242 executed within a transaction. If you do not issue a <command>BEGIN</command>
243 command,
244 then each individual statement has an implicit <command>BEGIN</command> and
245 (if successful) <command>COMMIT</command> wrapped around it. A group of
246 statements surrounded by <command>BEGIN</command> and <command>COMMIT</command>
247 is sometimes called a <firstterm>transaction block</firstterm>.
248 </para>
250 <note>
251 <para>
252 Some client libraries issue <command>BEGIN</command> and <command>COMMIT</command>
253 commands automatically, so that you might get the effect of transaction
254 blocks without asking. Check the documentation for the interface
255 you are using.
256 </para>
257 </note>
259 <para>
260 It's possible to control the statements in a transaction in a more
261 granular fashion through the use of <firstterm>savepoints</firstterm>. Savepoints
262 allow you to selectively discard parts of the transaction, while
263 committing the rest. After defining a savepoint with
264 <command>SAVEPOINT</command>, you can if needed roll back to the savepoint
265 with <command>ROLLBACK TO</command>. All the transaction's database changes
266 between defining the savepoint and rolling back to it are discarded, but
267 changes earlier than the savepoint are kept.
268 </para>
270 <para>
271 After rolling back to a savepoint, it continues to be defined, so you can
272 roll back to it several times. Conversely, if you are sure you won't need
273 to roll back to a particular savepoint again, it can be released, so the
274 system can free some resources. Keep in mind that either releasing or
275 rolling back to a savepoint
276 will automatically release all savepoints that were defined after it.
277 </para>
279 <para>
280 All this is happening within the transaction block, so none of it
281 is visible to other database sessions. When and if you commit the
282 transaction block, the committed actions become visible as a unit
283 to other sessions, while the rolled-back actions never become visible
284 at all.
285 </para>
287 <para>
288 Remembering the bank database, suppose we debit $100.00 from Alice's
289 account, and credit Bob's account, only to find later that we should
290 have credited Wally's account. We could do it using savepoints like
291 this:
293 <programlisting>
294 BEGIN;
295 UPDATE accounts SET balance = balance - 100.00
296 WHERE name = 'Alice';
297 SAVEPOINT my_savepoint;
298 UPDATE accounts SET balance = balance + 100.00
299 WHERE name = 'Bob';
300 -- oops ... forget that and use Wally's account
301 ROLLBACK TO my_savepoint;
302 UPDATE accounts SET balance = balance + 100.00
303 WHERE name = 'Wally';
304 COMMIT;
305 </programlisting>
306 </para>
308 <para>
309 This example is, of course, oversimplified, but there's a lot of control
310 possible in a transaction block through the use of savepoints.
311 Moreover, <command>ROLLBACK TO</command> is the only way to regain control of a
312 transaction block that was put in aborted state by the
313 system due to an error, short of rolling it back completely and starting
314 again.
315 </para>
317 </sect1>
320 <sect1 id="tutorial-window">
321 <title>Window Functions</title>
323 <indexterm zone="tutorial-window">
324 <primary>window function</primary>
325 </indexterm>
327 <para>
328 A <firstterm>window function</firstterm> performs a calculation across a set of
329 table rows that are somehow related to the current row. This is comparable
330 to the type of calculation that can be done with an aggregate function.
331 However, window functions do not cause rows to become grouped into a single
332 output row like non-window aggregate calls would. Instead, the
333 rows retain their separate identities. Behind the scenes, the window
334 function is able to access more than just the current row of the query
335 result.
336 </para>
338 <para>
339 Here is an example that shows how to compare each employee's salary
340 with the average salary in his or her department:
342 <programlisting>
343 SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
344 </programlisting>
346 <screen>
347 depname | empno | salary | avg
348 -----------+-------+--------+-----------------------
349 develop | 11 | 5200 | 5020.0000000000000000
350 develop | 7 | 4200 | 5020.0000000000000000
351 develop | 9 | 4500 | 5020.0000000000000000
352 develop | 8 | 6000 | 5020.0000000000000000
353 develop | 10 | 5200 | 5020.0000000000000000
354 personnel | 5 | 3500 | 3700.0000000000000000
355 personnel | 2 | 3900 | 3700.0000000000000000
356 sales | 3 | 4800 | 4866.6666666666666667
357 sales | 1 | 5000 | 4866.6666666666666667
358 sales | 4 | 4800 | 4866.6666666666666667
359 (10 rows)
360 </screen>
362 The first three output columns come directly from the table
363 <structname>empsalary</structname>, and there is one output row for each row in the
364 table. The fourth column represents an average taken across all the table
365 rows that have the same <structfield>depname</structfield> value as the current row.
366 (This actually is the same function as the non-window <function>avg</function>
367 aggregate, but the <literal>OVER</literal> clause causes it to be
368 treated as a window function and computed across the window frame.)
369 </para>
371 <para>
372 A window function call always contains an <literal>OVER</literal> clause
373 directly following the window function's name and argument(s). This is what
374 syntactically distinguishes it from a normal function or non-window
375 aggregate. The <literal>OVER</literal> clause determines exactly how the
376 rows of the query are split up for processing by the window function.
377 The <literal>PARTITION BY</literal> clause within <literal>OVER</literal>
378 divides the rows into groups, or partitions, that share the same
379 values of the <literal>PARTITION BY</literal> expression(s). For each row,
380 the window function is computed across the rows that fall into the
381 same partition as the current row.
382 </para>
384 <para>
385 You can also control the order in which rows are processed by
386 window functions using <literal>ORDER BY</literal> within <literal>OVER</literal>.
387 (The window <literal>ORDER BY</literal> does not even have to match the
388 order in which the rows are output.) Here is an example:
390 <programlisting>
391 SELECT depname, empno, salary,
392 row_number() OVER (PARTITION BY depname ORDER BY salary DESC)
393 FROM empsalary;
394 </programlisting>
396 <screen>
397 depname | empno | salary | row_number
398 -----------+-------+--------+------------
399 develop | 8 | 6000 | 1
400 develop | 10 | 5200 | 2
401 develop | 11 | 5200 | 3
402 develop | 9 | 4500 | 4
403 develop | 7 | 4200 | 5
404 personnel | 2 | 3900 | 1
405 personnel | 5 | 3500 | 2
406 sales | 1 | 5000 | 1
407 sales | 4 | 4800 | 2
408 sales | 3 | 4800 | 3
409 (10 rows)
410 </screen>
412 As shown here, the <function>row_number</function> window function
413 assigns sequential numbers to the rows within each partition,
414 in the order defined by the <literal>ORDER BY</literal> clause
415 (with tied rows numbered in an unspecified order).
416 <function>row_number</function> needs no explicit parameter,
417 because its behavior
418 is entirely determined by the <literal>OVER</literal> clause.
419 </para>
421 <para>
422 The rows considered by a window function are those of the <quote>virtual
423 table</quote> produced by the query's <literal>FROM</literal> clause as filtered by its
424 <literal>WHERE</literal>, <literal>GROUP BY</literal>, and <literal>HAVING</literal> clauses
425 if any. For example, a row removed because it does not meet the
426 <literal>WHERE</literal> condition is not seen by any window function.
427 A query can contain multiple window functions that slice up the data
428 in different ways using different <literal>OVER</literal> clauses, but
429 they all act on the same collection of rows defined by this virtual table.
430 </para>
432 <para>
433 We already saw that <literal>ORDER BY</literal> can be omitted if the ordering
434 of rows is not important. It is also possible to omit <literal>PARTITION
435 BY</literal>, in which case there is a single partition containing all rows.
436 </para>
438 <para>
439 There is another important concept associated with window functions:
440 for each row, there is a set of rows within its partition called its
441 <firstterm>window frame</firstterm>. Some window functions act only
442 on the rows of the window frame, rather than of the whole partition.
443 By default, if <literal>ORDER BY</literal> is supplied then the frame consists of
444 all rows from the start of the partition up through the current row, plus
445 any following rows that are equal to the current row according to the
446 <literal>ORDER BY</literal> clause. When <literal>ORDER BY</literal> is omitted the
447 default frame consists of all rows in the partition.
448 <footnote>
449 <para>
450 There are options to define the window frame in other ways, but
451 this tutorial does not cover them. See
452 <xref linkend="syntax-window-functions"/> for details.
453 </para>
454 </footnote>
455 Here is an example using <function>sum</function>:
456 </para>
458 <programlisting>
459 SELECT salary, sum(salary) OVER () FROM empsalary;
460 </programlisting>
462 <screen>
463 salary | sum
464 --------+-------
465 5200 | 47100
466 5000 | 47100
467 3500 | 47100
468 4800 | 47100
469 3900 | 47100
470 4200 | 47100
471 4500 | 47100
472 4800 | 47100
473 6000 | 47100
474 5200 | 47100
475 (10 rows)
476 </screen>
478 <para>
479 Above, since there is no <literal>ORDER BY</literal> in the <literal>OVER</literal>
480 clause, the window frame is the same as the partition, which for lack of
481 <literal>PARTITION BY</literal> is the whole table; in other words each sum is
482 taken over the whole table and so we get the same result for each output
483 row. But if we add an <literal>ORDER BY</literal> clause, we get very different
484 results:
485 </para>
487 <programlisting>
488 SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
489 </programlisting>
491 <screen>
492 salary | sum
493 --------+-------
494 3500 | 3500
495 3900 | 7400
496 4200 | 11600
497 4500 | 16100
498 4800 | 25700
499 4800 | 25700
500 5000 | 30700
501 5200 | 41100
502 5200 | 41100
503 6000 | 47100
504 (10 rows)
505 </screen>
507 <para>
508 Here the sum is taken from the first (lowest) salary up through the
509 current one, including any duplicates of the current one (notice the
510 results for the duplicated salaries).
511 </para>
513 <para>
514 Window functions are permitted only in the <literal>SELECT</literal> list
515 and the <literal>ORDER BY</literal> clause of the query. They are forbidden
516 elsewhere, such as in <literal>GROUP BY</literal>, <literal>HAVING</literal>
517 and <literal>WHERE</literal> clauses. This is because they logically
518 execute after the processing of those clauses. Also, window functions
519 execute after non-window aggregate functions. This means it is valid to
520 include an aggregate function call in the arguments of a window function,
521 but not vice versa.
522 </para>
524 <para>
525 If there is a need to filter or group rows after the window calculations
526 are performed, you can use a sub-select. For example:
528 <programlisting>
529 SELECT depname, empno, salary, enroll_date
530 FROM
531 (SELECT depname, empno, salary, enroll_date,
532 row_number() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
533 FROM empsalary
534 ) AS ss
535 WHERE pos &lt; 3;
536 </programlisting>
538 The above query only shows the rows from the inner query having
539 <literal>row_number</literal> less than 3 (that is, the first
540 two rows for each department).
541 </para>
543 <para>
544 When a query involves multiple window functions, it is possible to write
545 out each one with a separate <literal>OVER</literal> clause, but this is
546 duplicative and error-prone if the same windowing behavior is wanted
547 for several functions. Instead, each windowing behavior can be named
548 in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
549 For example:
551 <programlisting>
552 SELECT sum(salary) OVER w, avg(salary) OVER w
553 FROM empsalary
554 WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
555 </programlisting>
556 </para>
558 <para>
559 More details about window functions can be found in
560 <xref linkend="syntax-window-functions"/>,
561 <xref linkend="functions-window"/>,
562 <xref linkend="queries-window"/>, and the
563 <xref linkend="sql-select"/> reference page.
564 </para>
565 </sect1>
568 <sect1 id="tutorial-inheritance">
569 <title>Inheritance</title>
571 <indexterm zone="tutorial-inheritance">
572 <primary>inheritance</primary>
573 </indexterm>
575 <para>
576 Inheritance is a concept from object-oriented databases. It opens
577 up interesting new possibilities of database design.
578 </para>
580 <para>
581 Let's create two tables: A table <classname>cities</classname>
582 and a table <classname>capitals</classname>. Naturally, capitals
583 are also cities, so you want some way to show the capitals
584 implicitly when you list all cities. If you're really clever you
585 might invent some scheme like this:
587 <programlisting>
588 CREATE TABLE capitals (
589 name text,
590 population real,
591 elevation int, -- (in ft)
592 state char(2)
595 CREATE TABLE non_capitals (
596 name text,
597 population real,
598 elevation int -- (in ft)
601 CREATE VIEW cities AS
602 SELECT name, population, elevation FROM capitals
603 UNION
604 SELECT name, population, elevation FROM non_capitals;
605 </programlisting>
607 This works OK as far as querying goes, but it gets ugly when you
608 need to update several rows, for one thing.
609 </para>
611 <para>
612 A better solution is this:
614 <programlisting>
615 CREATE TABLE cities (
616 name text,
617 population real,
618 elevation int -- (in ft)
621 CREATE TABLE capitals (
622 state char(2) UNIQUE NOT NULL
623 ) INHERITS (cities);
624 </programlisting>
625 </para>
627 <para>
628 In this case, a row of <classname>capitals</classname>
629 <firstterm>inherits</firstterm> all columns (<structfield>name</structfield>,
630 <structfield>population</structfield>, and <structfield>elevation</structfield>) from its
631 <firstterm>parent</firstterm>, <classname>cities</classname>. The
632 type of the column <structfield>name</structfield> is
633 <type>text</type>, a native <productname>PostgreSQL</productname>
634 type for variable length character strings. The
635 <classname>capitals</classname> table has
636 an additional column, <structfield>state</structfield>, which shows its
637 state abbreviation. In
638 <productname>PostgreSQL</productname>, a table can inherit from
639 zero or more other tables.
640 </para>
642 <para>
643 For example, the following query finds the names of all cities,
644 including state capitals, that are located at an elevation
645 over 500 feet:
647 <programlisting>
648 SELECT name, elevation
649 FROM cities
650 WHERE elevation &gt; 500;
651 </programlisting>
653 which returns:
655 <screen>
656 name | elevation
657 -----------+-----------
658 Las Vegas | 2174
659 Mariposa | 1953
660 Madison | 845
661 (3 rows)
662 </screen>
663 </para>
665 <para>
666 On the other hand, the following query finds
667 all the cities that are not state capitals and
668 are situated at an elevation over 500 feet:
670 <programlisting>
671 SELECT name, elevation
672 FROM ONLY cities
673 WHERE elevation &gt; 500;
674 </programlisting>
676 <screen>
677 name | elevation
678 -----------+-----------
679 Las Vegas | 2174
680 Mariposa | 1953
681 (2 rows)
682 </screen>
683 </para>
685 <para>
686 Here the <literal>ONLY</literal> before <literal>cities</literal>
687 indicates that the query should be run over only the
688 <classname>cities</classname> table, and not tables below
689 <classname>cities</classname> in the inheritance hierarchy. Many
690 of the commands that we have already discussed &mdash;
691 <command>SELECT</command>, <command>UPDATE</command>, and
692 <command>DELETE</command> &mdash; support this <literal>ONLY</literal>
693 notation.
694 </para>
696 <note>
697 <para>
698 Although inheritance is frequently useful, it has not been integrated
699 with unique constraints or foreign keys, which limits its usefulness.
700 See <xref linkend="ddl-inherit"/> for more detail.
701 </para>
702 </note>
703 </sect1>
706 <sect1 id="tutorial-conclusion">
707 <title>Conclusion</title>
709 <para>
710 <productname>PostgreSQL</productname> has many features not
711 touched upon in this tutorial introduction, which has been
712 oriented toward newer users of <acronym>SQL</acronym>. These
713 features are discussed in more detail in the remainder of this
714 book.
715 </para>
717 <para>
718 If you feel you need more introductory material, please visit the PostgreSQL
719 <ulink url="https://www.postgresql.org">web site</ulink>
720 for links to more resources.
721 </para>
722 </sect1>
723 </chapter>