Fix xslt_process() to ensure that it inserts a NULL terminator after the
[PostgreSQL.git] / doc / src / sgml / advanced.sgml
blob3073d5fd24f79a3f9763540088e4cb6296b260cc
1 <!-- $PostgreSQL$ -->
3 <chapter id="tutorial-advanced">
4 <title>Advanced Features</title>
6 <sect1 id="tutorial-advanced-intro">
7 <title>Introduction</title>
9 <para>
10 In the previous chapter we have covered the basics of using
11 <acronym>SQL</acronym> to store and access your data in
12 <productname>PostgreSQL</productname>. We will now discuss some
13 more advanced features of <acronym>SQL</acronym> that simplify
14 management and prevent loss or corruption of your data. Finally,
15 we will look at some <productname>PostgreSQL</productname>
16 extensions.
17 </para>
19 <para>
20 This chapter will on occasion refer to examples found in <xref
21 linkend="tutorial-sql"> to change or improve them, so it will be
22 useful to have read that chapter. Some examples from
23 this chapter can also be found in
24 <filename>advanced.sql</filename> in the tutorial directory. This
25 file also contains some sample data to load, which is not
26 repeated here. (Refer to <xref linkend="tutorial-sql-intro"> for
27 how to use the file.)
28 </para>
29 </sect1>
32 <sect1 id="tutorial-views">
33 <title>Views</title>
35 <indexterm zone="tutorial-views">
36 <primary>view</primary>
37 </indexterm>
39 <para>
40 Refer back to the queries in <xref linkend="tutorial-join">.
41 Suppose the combined listing of weather records and city location
42 is of particular interest to your application, but you do not want
43 to type the query each time you need it. You can create a
44 <firstterm>view</firstterm> over the query, which gives a name to
45 the query that you can refer to like an ordinary table:
47 <programlisting>
48 CREATE VIEW myview AS
49 SELECT city, temp_lo, temp_hi, prcp, date, location
50 FROM weather, cities
51 WHERE city = name;
53 SELECT * FROM myview;
54 </programlisting>
55 </para>
57 <para>
58 Making liberal use of views is a key aspect of good SQL database
59 design. Views allow you to encapsulate the details of the
60 structure of your tables, which might change as your application
61 evolves, behind consistent interfaces.
62 </para>
64 <para>
65 Views can be used in almost any place a real table can be used.
66 Building views upon other views is not uncommon.
67 </para>
68 </sect1>
71 <sect1 id="tutorial-fk">
72 <title>Foreign Keys</title>
74 <indexterm zone="tutorial-fk">
75 <primary>foreign key</primary>
76 </indexterm>
78 <indexterm zone="tutorial-fk">
79 <primary>referential integrity</primary>
80 </indexterm>
82 <para>
83 Recall the <classname>weather</classname> and
84 <classname>cities</classname> tables from <xref
85 linkend="tutorial-sql">. Consider the following problem: You
86 want to make sure that no one can insert rows in the
87 <classname>weather</classname> table that do not have a matching
88 entry in the <classname>cities</classname> table. This is called
89 maintaining the <firstterm>referential integrity</firstterm> of
90 your data. In simplistic database systems this would be
91 implemented (if at all) by first looking at the
92 <classname>cities</classname> table to check if a matching record
93 exists, and then inserting or rejecting the new
94 <classname>weather</classname> records. This approach has a
95 number of problems and is very inconvenient, so
96 <productname>PostgreSQL</productname> can do this for you.
97 </para>
99 <para>
100 The new declaration of the tables would look like this:
102 <programlisting>
103 CREATE TABLE cities (
104 city varchar(80) primary key,
105 location point
108 CREATE TABLE weather (
109 city varchar(80) references cities(city),
110 temp_lo int,
111 temp_hi int,
112 prcp real,
113 date date
115 </programlisting>
117 Now try inserting an invalid record:
119 <programlisting>
120 INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');
121 </programlisting>
123 <screen>
124 ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey"
125 DETAIL: Key (city)=(Berkeley) is not present in table "cities".
126 </screen>
127 </para>
129 <para>
130 The behavior of foreign keys can be finely tuned to your
131 application. We will not go beyond this simple example in this
132 tutorial, but just refer you to <xref linkend="ddl">
133 for more information. Making correct use of
134 foreign keys will definitely improve the quality of your database
135 applications, so you are strongly encouraged to learn about them.
136 </para>
137 </sect1>
140 <sect1 id="tutorial-transactions">
141 <title>Transactions</title>
143 <indexterm zone="tutorial-transactions">
144 <primary>transaction</primary>
145 </indexterm>
147 <para>
148 <firstterm>Transactions</> are a fundamental concept of all database
149 systems. The essential point of a transaction is that it bundles
150 multiple steps into a single, all-or-nothing operation. The intermediate
151 states between the steps are not visible to other concurrent transactions,
152 and if some failure occurs that prevents the transaction from completing,
153 then none of the steps affect the database at all.
154 </para>
156 <para>
157 For example, consider a bank database that contains balances for various
158 customer accounts, as well as total deposit balances for branches.
159 Suppose that we want to record a payment of $100.00 from Alice's account
160 to Bob's account. Simplifying outrageously, the SQL commands for this
161 might look like:
163 <programlisting>
164 UPDATE accounts SET balance = balance - 100.00
165 WHERE name = 'Alice';
166 UPDATE branches SET balance = balance - 100.00
167 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
168 UPDATE accounts SET balance = balance + 100.00
169 WHERE name = 'Bob';
170 UPDATE branches SET balance = balance + 100.00
171 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
172 </programlisting>
173 </para>
175 <para>
176 The details of these commands are not important here; the important
177 point is that there are several separate updates involved to accomplish
178 this rather simple operation. Our bank's officers will want to be
179 assured that either all these updates happen, or none of them happen.
180 It would certainly not do for a system failure to result in Bob
181 receiving $100.00 that was not debited from Alice. Nor would Alice long
182 remain a happy customer if she was debited without Bob being credited.
183 We need a guarantee that if something goes wrong partway through the
184 operation, none of the steps executed so far will take effect. Grouping
185 the updates into a <firstterm>transaction</> gives us this guarantee.
186 A transaction is said to be <firstterm>atomic</>: from the point of
187 view of other transactions, it either happens completely or not at all.
188 </para>
190 <para>
191 We also want a
192 guarantee that once a transaction is completed and acknowledged by
193 the database system, it has indeed been permanently recorded
194 and won't be lost even if a crash ensues shortly thereafter.
195 For example, if we are recording a cash withdrawal by Bob,
196 we do not want any chance that the debit to his account will
197 disappear in a crash just after he walks out the bank door.
198 A transactional database guarantees that all the updates made by
199 a transaction are logged in permanent storage (i.e., on disk) before
200 the transaction is reported complete.
201 </para>
203 <para>
204 Another important property of transactional databases is closely
205 related to the notion of atomic updates: when multiple transactions
206 are running concurrently, each one should not be able to see the
207 incomplete changes made by others. For example, if one transaction
208 is busy totalling all the branch balances, it would not do for it
209 to include the debit from Alice's branch but not the credit to
210 Bob's branch, nor vice versa. So transactions must be all-or-nothing
211 not only in terms of their permanent effect on the database, but
212 also in terms of their visibility as they happen. The updates made
213 so far by an open transaction are invisible to other transactions
214 until the transaction completes, whereupon all the updates become
215 visible simultaneously.
216 </para>
218 <para>
219 In <productname>PostgreSQL</>, a transaction is set up by surrounding
220 the SQL commands of the transaction with
221 <command>BEGIN</> and <command>COMMIT</> commands. So our banking
222 transaction would actually look like:
224 <programlisting>
225 BEGIN;
226 UPDATE accounts SET balance = balance - 100.00
227 WHERE name = 'Alice';
228 -- etc etc
229 COMMIT;
230 </programlisting>
231 </para>
233 <para>
234 If, partway through the transaction, we decide we do not want to
235 commit (perhaps we just noticed that Alice's balance went negative),
236 we can issue the command <command>ROLLBACK</> instead of
237 <command>COMMIT</>, and all our updates so far will be canceled.
238 </para>
240 <para>
241 <productname>PostgreSQL</> actually treats every SQL statement as being
242 executed within a transaction. If you do not issue a <command>BEGIN</>
243 command,
244 then each individual statement has an implicit <command>BEGIN</> and
245 (if successful) <command>COMMIT</> wrapped around it. A group of
246 statements surrounded by <command>BEGIN</> and <command>COMMIT</>
247 is sometimes called a <firstterm>transaction block</>.
248 </para>
250 <note>
251 <para>
252 Some client libraries issue <command>BEGIN</> and <command>COMMIT</>
253 commands automatically, so that you might get the effect of transaction
254 blocks without asking. Check the documentation for the interface
255 you are using.
256 </para>
257 </note>
259 <para>
260 It's possible to control the statements in a transaction in a more
261 granular fashion through the use of <firstterm>savepoints</>. Savepoints
262 allow you to selectively discard parts of the transaction, while
263 committing the rest. After defining a savepoint with
264 <command>SAVEPOINT</>, you can if needed roll back to the savepoint
265 with <command>ROLLBACK TO</>. All the transaction's database changes
266 between defining the savepoint and rolling back to it are discarded, but
267 changes earlier than the savepoint are kept.
268 </para>
270 <para>
271 After rolling back to a savepoint, it continues to be defined, so you can
272 roll back to it several times. Conversely, if you are sure you won't need
273 to roll back to a particular savepoint again, it can be released, so the
274 system can free some resources. Keep in mind that either releasing or
275 rolling back to a savepoint
276 will automatically release all savepoints that were defined after it.
277 </para>
279 <para>
280 All this is happening within the transaction block, so none of it
281 is visible to other database sessions. When and if you commit the
282 transaction block, the committed actions become visible as a unit
283 to other sessions, while the rolled-back actions never become visible
284 at all.
285 </para>
287 <para>
288 Remembering the bank database, suppose we debit $100.00 from Alice's
289 account, and credit Bob's account, only to find later that we should
290 have credited Wally's account. We could do it using savepoints like
291 this:
293 <programlisting>
294 BEGIN;
295 UPDATE accounts SET balance = balance - 100.00
296 WHERE name = 'Alice';
297 SAVEPOINT my_savepoint;
298 UPDATE accounts SET balance = balance + 100.00
299 WHERE name = 'Bob';
300 -- oops ... forget that and use Wally's account
301 ROLLBACK TO my_savepoint;
302 UPDATE accounts SET balance = balance + 100.00
303 WHERE name = 'Wally';
304 COMMIT;
305 </programlisting>
306 </para>
308 <para>
309 This example is, of course, oversimplified, but there's a lot of control
310 possible in a transaction block through the use of savepoints.
311 Moreover, <command>ROLLBACK TO</> is the only way to regain control of a
312 transaction block that was put in aborted state by the
313 system due to an error, short of rolling it back completely and starting
314 again.
315 </para>
317 </sect1>
320 <sect1 id="tutorial-window">
321 <title id="tutorial-window-title">Window Functions</title>
323 <indexterm zone="tutorial-window">
324 <primary>window function</primary>
325 </indexterm>
327 <para>
328 A <firstterm>window function</> performs a calculation across a set of
329 table rows that are somehow related to the current row. This is comparable
330 to the type of calculation that can be done with an aggregate function.
331 But unlike regular aggregate functions, use of a window function does not
332 cause rows to become grouped into a single output row &mdash; the
333 rows retain their separate identities. Behind the scenes, the window
334 function is able to access more than just the current row of the query
335 result.
336 </para>
338 <para>
339 Here is an example that shows how to compare each employee's salary
340 with the average salary in his or her department:
342 <programlisting>
343 SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
344 </programlisting>
346 <screen>
347 depname | empno | salary | avg
348 -----------+-------+--------+-----------------------
349 develop | 11 | 5200 | 5020.0000000000000000
350 develop | 7 | 4200 | 5020.0000000000000000
351 develop | 9 | 4500 | 5020.0000000000000000
352 develop | 8 | 6000 | 5020.0000000000000000
353 develop | 10 | 5200 | 5020.0000000000000000
354 personnel | 5 | 3500 | 3700.0000000000000000
355 personnel | 2 | 3900 | 3700.0000000000000000
356 sales | 3 | 4800 | 4866.6666666666666667
357 sales | 1 | 5000 | 4866.6666666666666667
358 sales | 4 | 4800 | 4866.6666666666666667
359 (10 rows)
360 </screen>
362 The first three output columns come directly from the table
363 <structname>empsalary</>, and there is one output row for each row in the
364 table. The fourth column represents an average taken across all the table
365 rows that have the same <structfield>depname</> value as the current row.
366 (This actually is the same function as the regular <function>avg</>
367 aggregate function, but the <literal>OVER</> clause causes it to be
368 treated as a window function and computed across an appropriate set of
369 rows.)
370 </para>
372 <para>
373 A window function call always contains an <literal>OVER</> clause
374 following the window function's name and argument(s). This is what
375 syntactically distinguishes it from a regular function or aggregate
376 function. The <literal>OVER</> clause determines exactly how the
377 rows of the query are split up for processing by the window function.
378 The <literal>PARTITION BY</> list within <literal>OVER</> specifies
379 dividing the rows into groups, or partitions, that share the same
380 values of the <literal>PARTITION BY</> expression(s). For each row,
381 the window function is computed across the rows that fall into the
382 same partition as the current row.
383 </para>
385 <para>
386 Although <function>avg</> will produce the same result no matter
387 what order it processes the partition's rows in, this is not true of all
388 window functions. When needed, you can control that order using
389 <literal>ORDER BY</> within <literal>OVER</>. Here is an example:
391 <programlisting>
392 SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;
393 </programlisting>
395 <screen>
396 depname | empno | salary | rank
397 -----------+-------+--------+------
398 develop | 8 | 6000 | 1
399 develop | 10 | 5200 | 2
400 develop | 11 | 5200 | 2
401 develop | 9 | 4500 | 4
402 develop | 7 | 4200 | 5
403 personnel | 2 | 3900 | 1
404 personnel | 5 | 3500 | 2
405 sales | 1 | 5000 | 1
406 sales | 4 | 4800 | 2
407 sales | 3 | 4800 | 2
408 (10 rows)
409 </screen>
411 As shown here, the <function>rank</> function produces a numerical rank
412 within the current row's partition for each distinct <literal>ORDER BY</>
413 value, in the order defined by the <literal>ORDER BY</> clause.
414 <function>rank</> needs no explicit parameter, because its behavior
415 is entirely determined by the <literal>OVER</> clause.
416 </para>
418 <para>
419 The rows considered by a window function are those of the <quote>virtual
420 table</> produced by the query's <literal>FROM</> clause as filtered by its
421 <literal>WHERE</>, <literal>GROUP BY</>, and <literal>HAVING</> clauses
422 if any. For example, a row removed because it does not meet the
423 <literal>WHERE</> condition is not seen by any window function.
424 A query can contain multiple window functions that slice up the data
425 in different ways by means of different <literal>OVER</> clauses, but
426 they all act on the same collection of rows defined by this virtual table.
427 </para>
429 <para>
430 We already saw that <literal>ORDER BY</> can be omitted if the ordering
431 of rows is not important. It is also possible to omit <literal>PARTITION
432 BY</>, in which case there is just one partition containing all the rows.
433 </para>
435 <para>
436 There is another important concept associated with window functions:
437 for each row, there is a set of rows within its partition called its
438 <firstterm>window frame</>. Many (but not all) window functions act only
439 on the rows of the window frame, rather than of the whole partition.
440 By default, if <literal>ORDER BY</> is supplied then the frame consists of
441 all rows from the start of the partition up through the current row, plus
442 any following rows that are equal to the current row according to the
443 <literal>ORDER BY</> clause. When <literal>ORDER BY</> is omitted the
444 default frame consists of all rows in the partition.
445 <footnote>
446 <para>
447 There are options to define the window frame in other ways, but
448 this tutorial does not cover them. See
449 <xref linkend="syntax-window-functions"> for details.
450 </para>
451 </footnote>
452 Here is an example using <function>sum</>:
453 </para>
455 <programlisting>
456 SELECT salary, sum(salary) OVER () FROM empsalary;
457 </programlisting>
459 <screen>
460 salary | sum
461 --------+-------
462 5200 | 47100
463 5000 | 47100
464 3500 | 47100
465 4800 | 47100
466 3900 | 47100
467 4200 | 47100
468 4500 | 47100
469 4800 | 47100
470 6000 | 47100
471 5200 | 47100
472 (10 rows)
473 </screen>
475 <para>
476 Above, since there is no <literal>ORDER BY</> in the <literal>OVER</>
477 clause, the window frame is the same as the partition, which for lack of
478 <literal>PARTITION BY</> is the whole table; in other words each sum is
479 taken over the whole table and so we get the same result for each output
480 row. But if we add an <literal>ORDER BY</> clause, we get very different
481 results:
482 </para>
484 <programlisting>
485 SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
486 </programlisting>
488 <screen>
489 salary | sum
490 --------+-------
491 3500 | 3500
492 3900 | 7400
493 4200 | 11600
494 4500 | 16100
495 4800 | 25700
496 4800 | 25700
497 5000 | 30700
498 5200 | 41100
499 5200 | 41100
500 6000 | 47100
501 (10 rows)
502 </screen>
504 <para>
505 Here the sum is taken from the first (lowest) salary up through the
506 current one, including any duplicates of the current one (notice the
507 results for the duplicated salaries).
508 </para>
510 <para>
511 Window functions are permitted only in the <literal>SELECT</literal> list
512 and the <literal>ORDER BY</> clause of the query. They are forbidden
513 elsewhere, such as in <literal>GROUP BY</>, <literal>HAVING</>
514 and <literal>WHERE</literal> clauses. This is because they logically
515 execute after the processing of those clauses. Also, window functions
516 execute after regular aggregate functions. This means it is valid to
517 include an aggregate function call in the arguments of a window function,
518 but not vice versa.
519 </para>
521 <para>
522 If there is a need to filter or group rows after the window calculations
523 are performed, you can use a sub-select. For example:
525 <programlisting>
526 SELECT depname, empno, salary, enroll_date
527 FROM
528 (SELECT depname, empno, salary, enroll_date,
529 rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos
530 FROM empsalary
531 ) AS ss
532 WHERE pos &lt; 3;
533 </programlisting>
535 The above query only shows the rows from the inner query having
536 <literal>rank</> less than <literal>3</>.
537 </para>
539 <para>
540 When a query involves multiple window functions, it is possible to write
541 out each one with a separate <literal>OVER</> clause, but this is
542 duplicative and error-prone if the same windowing behavior is wanted
543 for several functions. Instead, each windowing behavior can be named
544 in a <literal>WINDOW</> clause and then referenced in <literal>OVER</>.
545 For example:
547 <programlisting>
548 SELECT sum(salary) OVER w, avg(salary) OVER w
549 FROM empsalary
550 WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
551 </programlisting>
552 </para>
554 <para>
555 More details about window functions can be found in
556 <xref linkend="syntax-window-functions">,
557 <xref linkend="queries-window">, and the
558 <xref linkend="sql-select" endterm="sql-select-title"> reference page.
559 </para>
560 </sect1>
563 <sect1 id="tutorial-inheritance">
564 <title>Inheritance</title>
566 <indexterm zone="tutorial-inheritance">
567 <primary>inheritance</primary>
568 </indexterm>
570 <para>
571 Inheritance is a concept from object-oriented databases. It opens
572 up interesting new possibilities of database design.
573 </para>
575 <para>
576 Let's create two tables: A table <classname>cities</classname>
577 and a table <classname>capitals</classname>. Naturally, capitals
578 are also cities, so you want some way to show the capitals
579 implicitly when you list all cities. If you're really clever you
580 might invent some scheme like this:
582 <programlisting>
583 CREATE TABLE capitals (
584 name text,
585 population real,
586 altitude int, -- (in ft)
587 state char(2)
590 CREATE TABLE non_capitals (
591 name text,
592 population real,
593 altitude int -- (in ft)
596 CREATE VIEW cities AS
597 SELECT name, population, altitude FROM capitals
598 UNION
599 SELECT name, population, altitude FROM non_capitals;
600 </programlisting>
602 This works OK as far as querying goes, but it gets ugly when you
603 need to update several rows, for one thing.
604 </para>
606 <para>
607 A better solution is this:
609 <programlisting>
610 CREATE TABLE cities (
611 name text,
612 population real,
613 altitude int -- (in ft)
616 CREATE TABLE capitals (
617 state char(2)
618 ) INHERITS (cities);
619 </programlisting>
620 </para>
622 <para>
623 In this case, a row of <classname>capitals</classname>
624 <firstterm>inherits</firstterm> all columns (<structfield>name</>,
625 <structfield>population</>, and <structfield>altitude</>) from its
626 <firstterm>parent</firstterm>, <classname>cities</classname>. The
627 type of the column <structfield>name</structfield> is
628 <type>text</type>, a native <productname>PostgreSQL</productname>
629 type for variable length character strings. State capitals have
630 an extra column, <structfield>state</>, that shows their state. In
631 <productname>PostgreSQL</productname>, a table can inherit from
632 zero or more other tables.
633 </para>
635 <para>
636 For example, the following query finds the names of all cities,
637 including state capitals, that are located at an altitude
638 over 500 feet:
640 <programlisting>
641 SELECT name, altitude
642 FROM cities
643 WHERE altitude &gt; 500;
644 </programlisting>
646 which returns:
648 <screen>
649 name | altitude
650 -----------+----------
651 Las Vegas | 2174
652 Mariposa | 1953
653 Madison | 845
654 (3 rows)
655 </screen>
656 </para>
658 <para>
659 On the other hand, the following query finds
660 all the cities that are not state capitals and
661 are situated at an altitude of 500 feet or higher:
663 <programlisting>
664 SELECT name, altitude
665 FROM ONLY cities
666 WHERE altitude &gt; 500;
667 </programlisting>
669 <screen>
670 name | altitude
671 -----------+----------
672 Las Vegas | 2174
673 Mariposa | 1953
674 (2 rows)
675 </screen>
676 </para>
678 <para>
679 Here the <literal>ONLY</literal> before <literal>cities</literal>
680 indicates that the query should be run over only the
681 <classname>cities</classname> table, and not tables below
682 <classname>cities</classname> in the inheritance hierarchy. Many
683 of the commands that we have already discussed &mdash;
684 <command>SELECT</command>, <command>UPDATE</command>, and
685 <command>DELETE</command> &mdash; support this <literal>ONLY</literal>
686 notation.
687 </para>
689 <note>
690 <para>
691 Although inheritance is frequently useful, it has not been integrated
692 with unique constraints or foreign keys, which limits its usefulness.
693 See <xref linkend="ddl-inherit"> for more detail.
694 </para>
695 </note>
696 </sect1>
699 <sect1 id="tutorial-conclusion">
700 <title>Conclusion</title>
702 <para>
703 <productname>PostgreSQL</productname> has many features not
704 touched upon in this tutorial introduction, which has been
705 oriented toward newer users of <acronym>SQL</acronym>. These
706 features are discussed in more detail in the remainder of this
707 book.
708 </para>
710 <para>
711 If you feel you need more introductory material, please visit the PostgreSQL
712 <ulink url="http://www.postgresql.org">web site</ulink>
713 for links to more resources.
714 </para>
715 </sect1>
716 </chapter>