Repair memory leaks in plpython.
[pgsql.git] / doc / src / sgml / query.sgml
blob59962d6e8563352e713888146d30168cd2f7d83f
1 <!-- doc/src/sgml/query.sgml -->
3 <chapter id="tutorial-sql">
4 <title>The <acronym>SQL</acronym> Language</title>
6 <sect1 id="tutorial-sql-intro">
7 <title>Introduction</title>
9 <para>
10 This chapter provides an overview of how to use
11 <acronym>SQL</acronym> to perform simple operations. This
12 tutorial is only intended to give you an introduction and is in no
13 way a complete tutorial on <acronym>SQL</acronym>. Numerous books
14 have been written on <acronym>SQL</acronym>, including <xref
15 linkend="melt93"/> and <xref linkend="date97"/>.
16 You should be aware that some <productname>PostgreSQL</productname>
17 language features are extensions to the standard.
18 </para>
20 <para>
21 In the examples that follow, we assume that you have created a
22 database named <literal>mydb</literal>, as described in the previous
23 chapter, and have been able to start <application>psql</application>.
24 </para>
26 <para>
27 Examples in this manual can also be found in the
28 <productname>PostgreSQL</productname> source distribution
29 in the directory <filename>src/tutorial/</filename>. (Binary
30 distributions of <productname>PostgreSQL</productname> might not
31 provide those files.) To use those
32 files, first change to that directory and run <application>make</application>:
34 <screen>
35 <prompt>$</prompt> <userinput>cd <replaceable>...</replaceable>/src/tutorial</userinput>
36 <prompt>$</prompt> <userinput>make</userinput>
37 </screen>
39 This creates the scripts and compiles the C files containing user-defined
40 functions and types. Then, to start the tutorial, do the following:
42 <screen>
43 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
44 <computeroutput>
45 ...
46 </computeroutput>
47 <prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
48 </screen>
50 The <literal>\i</literal> command reads in commands from the
51 specified file. <command>psql</command>'s <literal>-s</literal> option puts you in
52 single step mode which pauses before sending each statement to the
53 server. The commands used in this section are in the file
54 <filename>basics.sql</filename>.
55 </para>
56 </sect1>
59 <sect1 id="tutorial-concepts">
60 <title>Concepts</title>
62 <para>
63 <indexterm><primary>relational database</primary></indexterm>
64 <indexterm><primary>hierarchical database</primary></indexterm>
65 <indexterm><primary>object-oriented database</primary></indexterm>
66 <indexterm><primary>relation</primary></indexterm>
67 <indexterm><primary>table</primary></indexterm>
69 <productname>PostgreSQL</productname> is a <firstterm>relational
70 database management system</firstterm> (<acronym>RDBMS</acronym>).
71 That means it is a system for managing data stored in
72 <firstterm>relations</firstterm>. Relation is essentially a
73 mathematical term for <firstterm>table</firstterm>. The notion of
74 storing data in tables is so commonplace today that it might
75 seem inherently obvious, but there are a number of other ways of
76 organizing databases. Files and directories on Unix-like
77 operating systems form an example of a hierarchical database. A
78 more modern development is the object-oriented database.
79 </para>
81 <para>
82 <indexterm><primary>row</primary></indexterm>
83 <indexterm><primary>column</primary></indexterm>
85 Each table is a named collection of <firstterm>rows</firstterm>.
86 Each row of a given table has the same set of named
87 <firstterm>columns</firstterm>,
88 and each column is of a specific data type. Whereas columns have
89 a fixed order in each row, it is important to remember that SQL
90 does not guarantee the order of the rows within the table in any
91 way (although they can be explicitly sorted for display).
92 </para>
94 <para>
95 <indexterm><primary>database cluster</primary></indexterm>
96 <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
98 Tables are grouped into databases, and a collection of databases
99 managed by a single <productname>PostgreSQL</productname> server
100 instance constitutes a database <firstterm>cluster</firstterm>.
101 </para>
102 </sect1>
105 <sect1 id="tutorial-table">
106 <title>Creating a New Table</title>
108 <indexterm zone="tutorial-table">
109 <primary>CREATE TABLE</primary>
110 </indexterm>
112 <para>
113 You can create a new table by specifying the table
114 name, along with all column names and their types:
116 <programlisting>
117 CREATE TABLE weather (
118 city varchar(80),
119 temp_lo int, -- low temperature
120 temp_hi int, -- high temperature
121 prcp real, -- precipitation
122 date date
124 </programlisting>
126 You can enter this into <command>psql</command> with the line
127 breaks. <command>psql</command> will recognize that the command
128 is not terminated until the semicolon.
129 </para>
131 <para>
132 White space (i.e., spaces, tabs, and newlines) can be used freely
133 in SQL commands. That means you can type the command aligned
134 differently than above, or even all on one line. Two dashes
135 (<quote><literal>--</literal></quote>) introduce comments.
136 Whatever follows them is ignored up to the end of the line. SQL
137 is case-insensitive about key words and identifiers, except
138 when identifiers are double-quoted to preserve the case (not done
139 above).
140 </para>
142 <para>
143 <type>varchar(80)</type> specifies a data type that can store
144 arbitrary character strings up to 80 characters in length.
145 <type>int</type> is the normal integer type. <type>real</type> is
146 a type for storing single precision floating-point numbers.
147 <type>date</type> should be self-explanatory. (Yes, the column of
148 type <type>date</type> is also named <structfield>date</structfield>.
149 This might be convenient or confusing &mdash; you choose.)
150 </para>
152 <para>
153 <productname>PostgreSQL</productname> supports the standard
154 <acronym>SQL</acronym> types <type>int</type>,
155 <type>smallint</type>, <type>real</type>, <type>double
156 precision</type>, <type>char(<replaceable>N</replaceable>)</type>,
157 <type>varchar(<replaceable>N</replaceable>)</type>, <type>date</type>,
158 <type>time</type>, <type>timestamp</type>, and
159 <type>interval</type>, as well as other types of general utility
160 and a rich set of geometric types.
161 <productname>PostgreSQL</productname> can be customized with an
162 arbitrary number of user-defined data types. Consequently, type
163 names are not key words in the syntax, except where required to
164 support special cases in the <acronym>SQL</acronym> standard.
165 </para>
167 <para>
168 The second example will store cities and their associated
169 geographical location:
170 <programlisting>
171 CREATE TABLE cities (
172 name varchar(80),
173 location point
175 </programlisting>
176 The <type>point</type> type is an example of a
177 <productname>PostgreSQL</productname>-specific data type.
178 </para>
180 <para>
181 <indexterm>
182 <primary>DROP TABLE</primary>
183 </indexterm>
185 Finally, it should be mentioned that if you don't need a table any
186 longer or want to recreate it differently you can remove it using
187 the following command:
188 <synopsis>
189 DROP TABLE <replaceable>tablename</replaceable>;
190 </synopsis>
191 </para>
192 </sect1>
195 <sect1 id="tutorial-populate">
196 <title>Populating a Table With Rows</title>
198 <indexterm zone="tutorial-populate">
199 <primary>INSERT</primary>
200 </indexterm>
202 <para>
203 The <command>INSERT</command> statement is used to populate a table with
204 rows:
206 <programlisting>
207 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
208 </programlisting>
210 Note that all data types use rather obvious input formats.
211 Constants that are not simple numeric values usually must be
212 surrounded by single quotes (<literal>'</literal>), as in the example.
214 <type>date</type> type is actually quite flexible in what it
215 accepts, but for this tutorial we will stick to the unambiguous
216 format shown here.
217 </para>
219 <para>
220 The <type>point</type> type requires a coordinate pair as input,
221 as shown here:
222 <programlisting>
223 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
224 </programlisting>
225 </para>
227 <para>
228 The syntax used so far requires you to remember the order of the
229 columns. An alternative syntax allows you to list the columns
230 explicitly:
231 <programlisting>
232 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
233 VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
234 </programlisting>
235 You can list the columns in a different order if you wish or
236 even omit some columns, e.g., if the precipitation is unknown:
237 <programlisting>
238 INSERT INTO weather (date, city, temp_hi, temp_lo)
239 VALUES ('1994-11-29', 'Hayward', 54, 37);
240 </programlisting>
241 Many developers consider explicitly listing the columns better
242 style than relying on the order implicitly.
243 </para>
245 <para>
246 Please enter all the commands shown above so you have some data to
247 work with in the following sections.
248 </para>
250 <para>
251 <indexterm>
252 <primary>COPY</primary>
253 </indexterm>
255 You could also have used <command>COPY</command> to load large
256 amounts of data from flat-text files. This is usually faster
257 because the <command>COPY</command> command is optimized for this
258 application while allowing less flexibility than
259 <command>INSERT</command>. An example would be:
261 <programlisting>
262 COPY weather FROM '/home/user/weather.txt';
263 </programlisting>
265 where the file name for the source file must be available on the
266 machine running the backend process, not the client, since the backend process
267 reads the file directly. You can read more about the
268 <command>COPY</command> command in <xref linkend="sql-copy"/>.
269 </para>
270 </sect1>
273 <sect1 id="tutorial-select">
274 <title>Querying a Table</title>
276 <para>
277 <indexterm><primary>query</primary></indexterm>
278 <indexterm><primary>SELECT</primary></indexterm>
280 To retrieve data from a table, the table is
281 <firstterm>queried</firstterm>. An <acronym>SQL</acronym>
282 <command>SELECT</command> statement is used to do this. The
283 statement is divided into a select list (the part that lists the
284 columns to be returned), a table list (the part that lists the
285 tables from which to retrieve the data), and an optional
286 qualification (the part that specifies any restrictions). For
287 example, to retrieve all the rows of table
288 <structname>weather</structname>, type:
289 <programlisting>
290 SELECT * FROM weather;
291 </programlisting>
292 Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
293 <footnote>
294 <para>
295 While <literal>SELECT *</literal> is useful for off-the-cuff
296 queries, it is widely considered bad style in production code,
297 since adding a column to the table would change the results.
298 </para>
299 </footnote>
300 So the same result would be had with:
301 <programlisting>
302 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
303 </programlisting>
305 The output should be:
307 <screen>
308 city | temp_lo | temp_hi | prcp | date
309 ---------------+---------+---------+------+------------
310 San Francisco | 46 | 50 | 0.25 | 1994-11-27
311 San Francisco | 43 | 57 | 0 | 1994-11-29
312 Hayward | 37 | 54 | | 1994-11-29
313 (3 rows)
314 </screen>
315 </para>
317 <para>
318 You can write expressions, not just simple column references, in the
319 select list. For example, you can do:
320 <programlisting>
321 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
322 </programlisting>
323 This should give:
324 <screen>
325 city | temp_avg | date
326 ---------------+----------+------------
327 San Francisco | 48 | 1994-11-27
328 San Francisco | 50 | 1994-11-29
329 Hayward | 45 | 1994-11-29
330 (3 rows)
331 </screen>
332 Notice how the <literal>AS</literal> clause is used to relabel the
333 output column. (The <literal>AS</literal> clause is optional.)
334 </para>
336 <para>
337 A query can be <quote>qualified</quote> by adding a <literal>WHERE</literal>
338 clause that specifies which rows are wanted. The <literal>WHERE</literal>
339 clause contains a Boolean (truth value) expression, and only rows for
340 which the Boolean expression is true are returned. The usual
341 Boolean operators (<literal>AND</literal>,
342 <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
343 the qualification. For example, the following
344 retrieves the weather of San Francisco on rainy days:
346 <programlisting>
347 SELECT * FROM weather
348 WHERE city = 'San Francisco' AND prcp &gt; 0.0;
349 </programlisting>
350 Result:
351 <screen>
352 city | temp_lo | temp_hi | prcp | date
353 ---------------+---------+---------+------+------------
354 San Francisco | 46 | 50 | 0.25 | 1994-11-27
355 (1 row)
356 </screen>
357 </para>
359 <para>
360 <indexterm><primary>ORDER BY</primary></indexterm>
362 You can request that the results of a query
363 be returned in sorted order:
365 <programlisting>
366 SELECT * FROM weather
367 ORDER BY city;
368 </programlisting>
370 <screen>
371 city | temp_lo | temp_hi | prcp | date
372 ---------------+---------+---------+------+------------
373 Hayward | 37 | 54 | | 1994-11-29
374 San Francisco | 43 | 57 | 0 | 1994-11-29
375 San Francisco | 46 | 50 | 0.25 | 1994-11-27
376 </screen>
378 In this example, the sort order isn't fully specified, and so you
379 might get the San Francisco rows in either order. But you'd always
380 get the results shown above if you do:
382 <programlisting>
383 SELECT * FROM weather
384 ORDER BY city, temp_lo;
385 </programlisting>
386 </para>
388 <para>
389 <indexterm><primary>DISTINCT</primary></indexterm>
390 <indexterm><primary>duplicate</primary></indexterm>
392 You can request that duplicate rows be removed from the result of
393 a query:
395 <programlisting>
396 SELECT DISTINCT city
397 FROM weather;
398 </programlisting>
400 <screen>
401 city
402 ---------------
403 Hayward
404 San Francisco
405 (2 rows)
406 </screen>
408 Here again, the result row ordering might vary.
409 You can ensure consistent results by using <literal>DISTINCT</literal> and
410 <literal>ORDER BY</literal> together:
411 <footnote>
412 <para>
413 In some database systems, including older versions of
414 <productname>PostgreSQL</productname>, the implementation of
415 <literal>DISTINCT</literal> automatically orders the rows and
416 so <literal>ORDER BY</literal> is unnecessary. But this is not
417 required by the SQL standard, and current
418 <productname>PostgreSQL</productname> does not guarantee that
419 <literal>DISTINCT</literal> causes the rows to be ordered.
420 </para>
421 </footnote>
423 <programlisting>
424 SELECT DISTINCT city
425 FROM weather
426 ORDER BY city;
427 </programlisting>
428 </para>
429 </sect1>
432 <sect1 id="tutorial-join">
433 <title>Joins Between Tables</title>
435 <indexterm zone="tutorial-join">
436 <primary>join</primary>
437 </indexterm>
439 <para>
440 Thus far, our queries have only accessed one table at a time.
441 Queries can access multiple tables at once, or access the same
442 table in such a way that multiple rows of the table are being
443 processed at the same time. Queries that access multiple tables
444 (or multiple instances of the same table) at one time are called
445 <firstterm>join</firstterm> queries. They combine rows from one table
446 with rows from a second table, with an expression specifying which rows
447 are to be paired. For example, to return all the weather records together
448 with the location of the associated city, the database needs to compare
449 the <structfield>city</structfield>
450 column of each row of the <structname>weather</structname> table with the
451 <structfield>name</structfield> column of all rows in the <structname>cities</structname>
452 table, and select the pairs of rows where these values match.<footnote>
453 <para>
454 This is only a conceptual model. The join is usually performed
455 in a more efficient manner than actually comparing each possible
456 pair of rows, but this is invisible to the user.
457 </para>
458 </footnote>
459 This would be accomplished by the following query:
461 <programlisting>
462 SELECT * FROM weather JOIN cities ON city = name;
463 </programlisting>
465 <screen>
466 city | temp_lo | temp_hi | prcp | date | name | location
467 ---------------+---------+---------+------+------------+---------------+-----------
468 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
469 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
470 (2 rows)
471 </screen>
473 </para>
475 <para>
476 Observe two things about the result set:
477 <itemizedlist>
478 <listitem>
479 <para>
480 There is no result row for the city of Hayward. This is
481 because there is no matching entry in the
482 <structname>cities</structname> table for Hayward, so the join
483 ignores the unmatched rows in the <structname>weather</structname> table. We will see
484 shortly how this can be fixed.
485 </para>
486 </listitem>
488 <listitem>
489 <para>
490 There are two columns containing the city name. This is
491 correct because the lists of columns from the
492 <structname>weather</structname> and
493 <structname>cities</structname> tables are concatenated. In
494 practice this is undesirable, though, so you will probably want
495 to list the output columns explicitly rather than using
496 <literal>*</literal>:
497 <programlisting>
498 SELECT city, temp_lo, temp_hi, prcp, date, location
499 FROM weather JOIN cities ON city = name;
500 </programlisting>
501 </para>
502 </listitem>
503 </itemizedlist>
504 </para>
506 <para>
507 Since the columns all had different names, the parser
508 automatically found which table they belong to. If there
509 were duplicate column names in the two tables you'd need to
510 <firstterm>qualify</firstterm> the column names to show which one you
511 meant, as in:
513 <programlisting>
514 SELECT weather.city, weather.temp_lo, weather.temp_hi,
515 weather.prcp, weather.date, cities.location
516 FROM weather JOIN cities ON weather.city = cities.name;
517 </programlisting>
519 It is widely considered good style to qualify all column names
520 in a join query, so that the query won't fail if a duplicate
521 column name is later added to one of the tables.
522 </para>
524 <para>
525 Join queries of the kind seen thus far can also be written in this
526 form:
528 <programlisting>
529 SELECT *
530 FROM weather, cities
531 WHERE city = name;
532 </programlisting>
534 This syntax pre-dates the <literal>JOIN</literal>/<literal>ON</literal>
535 syntax, which was introduced in SQL-92. The tables are simply listed in
536 the <literal>FROM</literal> clause, and the comparison expression is added
537 to the <literal>WHERE</literal> clause. The results from this older
538 implicit syntax and the newer explicit
539 <literal>JOIN</literal>/<literal>ON</literal> syntax are identical. But
540 for a reader of the query, the explicit syntax makes its meaning easier to
541 understand: The join condition is introduced by its own key word whereas
542 previously the condition was mixed into the <literal>WHERE</literal>
543 clause together with other conditions.
544 </para>
546 <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
548 <para>
549 Now we will figure out how we can get the Hayward records back in.
550 What we want the query to do is to scan the
551 <structname>weather</structname> table and for each row to find the
552 matching <structname>cities</structname> row(s). If no matching row is
553 found we want some <quote>empty values</quote> to be substituted
554 for the <structname>cities</structname> table's columns. This kind
555 of query is called an <firstterm>outer join</firstterm>. (The
556 joins we have seen so far are <firstterm>inner joins</firstterm>.)
557 The command looks like this:
559 <programlisting>
560 SELECT *
561 FROM weather LEFT OUTER JOIN cities ON weather.city = cities.name;
562 </programlisting>
564 <screen>
565 city | temp_lo | temp_hi | prcp | date | name | location
566 ---------------+---------+---------+------+------------+---------------+-----------
567 Hayward | 37 | 54 | | 1994-11-29 | |
568 San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
569 San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
570 (3 rows)
571 </screen>
573 This query is called a <firstterm>left outer
574 join</firstterm> because the table mentioned on the left of the
575 join operator will have each of its rows in the output at least
576 once, whereas the table on the right will only have those rows
577 output that match some row of the left table. When outputting a
578 left-table row for which there is no right-table match, empty (null)
579 values are substituted for the right-table columns.
580 </para>
582 <formalpara>
583 <title>Exercise:</title>
585 <para>
586 There are also right outer joins and full outer joins. Try to
587 find out what those do.
588 </para>
589 </formalpara>
591 <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
592 <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
593 <para>
594 We can also join a table against itself. This is called a
595 <firstterm>self join</firstterm>. As an example, suppose we wish
596 to find all the weather records that are in the temperature range
597 of other weather records. So we need to compare the
598 <structfield>temp_lo</structfield> and <structfield>temp_hi</structfield> columns of
599 each <structname>weather</structname> row to the
600 <structfield>temp_lo</structfield> and
601 <structfield>temp_hi</structfield> columns of all other
602 <structname>weather</structname> rows. We can do this with the
603 following query:
605 <programlisting>
606 SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
607 w2.city, w2.temp_lo AS low, w2.temp_hi AS high
608 FROM weather w1 JOIN weather w2
609 ON w1.temp_lo &lt; w2.temp_lo AND w1.temp_hi &gt; w2.temp_hi;
610 </programlisting>
612 <screen>
613 city | low | high | city | low | high
614 ---------------+-----+------+---------------+-----+------
615 San Francisco | 43 | 57 | San Francisco | 46 | 50
616 Hayward | 37 | 54 | San Francisco | 46 | 50
617 (2 rows)
618 </screen>
620 Here we have relabeled the weather table as <literal>w1</literal> and
621 <literal>w2</literal> to be able to distinguish the left and right side
622 of the join. You can also use these kinds of aliases in other
623 queries to save some typing, e.g.:
624 <programlisting>
625 SELECT *
626 FROM weather w JOIN cities c ON w.city = c.name;
627 </programlisting>
628 You will encounter this style of abbreviating quite frequently.
629 </para>
630 </sect1>
633 <sect1 id="tutorial-agg">
634 <title>Aggregate Functions</title>
636 <indexterm zone="tutorial-agg">
637 <primary>aggregate function</primary>
638 </indexterm>
640 <para>
641 Like most other relational database products,
642 <productname>PostgreSQL</productname> supports
643 <firstterm>aggregate functions</firstterm>.
644 An aggregate function computes a single result from multiple input rows.
645 For example, there are aggregates to compute the
646 <function>count</function>, <function>sum</function>,
647 <function>avg</function> (average), <function>max</function> (maximum) and
648 <function>min</function> (minimum) over a set of rows.
649 </para>
651 <para>
652 As an example, we can find the highest low-temperature reading anywhere
653 with:
655 <programlisting>
656 SELECT max(temp_lo) FROM weather;
657 </programlisting>
659 <screen>
661 -----
663 (1 row)
664 </screen>
665 </para>
667 <para>
668 <indexterm><primary>subquery</primary></indexterm>
670 If we wanted to know what city (or cities) that reading occurred in,
671 we might try:
673 <programlisting>
674 SELECT city FROM weather WHERE temp_lo = max(temp_lo); <lineannotation>WRONG</lineannotation>
675 </programlisting>
677 but this will not work since the aggregate
678 <function>max</function> cannot be used in the
679 <literal>WHERE</literal> clause. (This restriction exists because
680 the <literal>WHERE</literal> clause determines which rows will be
681 included in the aggregate calculation; so obviously it has to be evaluated
682 before aggregate functions are computed.)
683 However, as is often the case
684 the query can be restated to accomplish the desired result, here
685 by using a <firstterm>subquery</firstterm>:
687 <programlisting>
688 SELECT city FROM weather
689 WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
690 </programlisting>
692 <screen>
693 city
694 ---------------
695 San Francisco
696 (1 row)
697 </screen>
699 This is OK because the subquery is an independent computation
700 that computes its own aggregate separately from what is happening
701 in the outer query.
702 </para>
704 <para>
705 <indexterm><primary>GROUP BY</primary></indexterm>
706 <indexterm><primary>HAVING</primary></indexterm>
708 Aggregates are also very useful in combination with <literal>GROUP
709 BY</literal> clauses. For example, we can get the number of readings
710 and the maximum low temperature observed in each city with:
712 <programlisting>
713 SELECT city, count(*), max(temp_lo)
714 FROM weather
715 GROUP BY city;
716 </programlisting>
718 <screen>
719 city | count | max
720 ---------------+-------+-----
721 Hayward | 1 | 37
722 San Francisco | 2 | 46
723 (2 rows)
724 </screen>
726 which gives us one output row per city. Each aggregate result is
727 computed over the table rows matching that city.
728 We can filter these grouped
729 rows using <literal>HAVING</literal>:
731 <programlisting>
732 SELECT city, count(*), max(temp_lo)
733 FROM weather
734 GROUP BY city
735 HAVING max(temp_lo) &lt; 40;
736 </programlisting>
738 <screen>
739 city | count | max
740 ---------+-------+-----
741 Hayward | 1 | 37
742 (1 row)
743 </screen>
745 which gives us the same results for only the cities that have all
746 <structfield>temp_lo</structfield> values below 40. Finally, if we only care about
747 cities whose
748 names begin with <quote><literal>S</literal></quote>, we might do:
750 <programlisting>
751 SELECT city, count(*), max(temp_lo)
752 FROM weather
753 WHERE city LIKE 'S%' -- <co id="co.tutorial-agg-like"/>
754 GROUP BY city;
755 </programlisting>
757 <screen>
758 city | count | max
759 ---------------+-------+-----
760 San Francisco | 2 | 46
761 (1 row)
762 </screen>
763 <calloutlist>
764 <callout arearefs="co.tutorial-agg-like">
765 <para>
766 The <literal>LIKE</literal> operator does pattern matching and
767 is explained in <xref linkend="functions-matching"/>.
768 </para>
769 </callout>
770 </calloutlist>
771 </para>
773 <para>
774 It is important to understand the interaction between aggregates and
775 <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
776 The fundamental difference between <literal>WHERE</literal> and
777 <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
778 input rows before groups and aggregates are computed (thus, it controls
779 which rows go into the aggregate computation), whereas
780 <literal>HAVING</literal> selects group rows after groups and
781 aggregates are computed. Thus, the
782 <literal>WHERE</literal> clause must not contain aggregate functions;
783 it makes no sense to try to use an aggregate to determine which rows
784 will be inputs to the aggregates. On the other hand, the
785 <literal>HAVING</literal> clause always contains aggregate functions.
786 (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
787 clause that doesn't use aggregates, but it's seldom useful. The same
788 condition could be used more efficiently at the <literal>WHERE</literal>
789 stage.)
790 </para>
792 <para>
793 In the previous example, we can apply the city name restriction in
794 <literal>WHERE</literal>, since it needs no aggregate. This is
795 more efficient than adding the restriction to <literal>HAVING</literal>,
796 because we avoid doing the grouping and aggregate calculations
797 for all rows that fail the <literal>WHERE</literal> check.
798 </para>
800 <para>
801 Another way to select the rows that go into an aggregate
802 computation is to use <literal>FILTER</literal>, which is a
803 per-aggregate option:
805 <programlisting>
806 SELECT city, count(*) FILTER (WHERE temp_lo &lt; 45), max(temp_lo)
807 FROM weather
808 GROUP BY city;
809 </programlisting>
811 <screen>
812 city | count | max
813 ---------------+-------+-----
814 Hayward | 1 | 37
815 San Francisco | 1 | 46
816 (2 rows)
817 </screen>
819 <literal>FILTER</literal> is much like <literal>WHERE</literal>,
820 except that it removes rows only from the input of the particular
821 aggregate function that it is attached to.
822 Here, the <literal>count</literal> aggregate counts only
823 rows with <literal>temp_lo</literal> below 45; but the
824 <literal>max</literal> aggregate is still applied to all rows,
825 so it still finds the reading of 46.
826 </para>
827 </sect1>
830 <sect1 id="tutorial-update">
831 <title>Updates</title>
833 <indexterm zone="tutorial-update">
834 <primary>UPDATE</primary>
835 </indexterm>
837 <para>
838 You can update existing rows using the
839 <command>UPDATE</command> command.
840 Suppose you discover the temperature readings are
841 all off by 2 degrees after November 28. You can correct the
842 data as follows:
844 <programlisting>
845 UPDATE weather
846 SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
847 WHERE date &gt; '1994-11-28';
848 </programlisting>
849 </para>
851 <para>
852 Look at the new state of the data:
853 <programlisting>
854 SELECT * FROM weather;
856 city | temp_lo | temp_hi | prcp | date
857 ---------------+---------+---------+------+------------
858 San Francisco | 46 | 50 | 0.25 | 1994-11-27
859 San Francisco | 41 | 55 | 0 | 1994-11-29
860 Hayward | 35 | 52 | | 1994-11-29
861 (3 rows)
862 </programlisting>
863 </para>
864 </sect1>
866 <sect1 id="tutorial-delete">
867 <title>Deletions</title>
869 <indexterm zone="tutorial-delete">
870 <primary>DELETE</primary>
871 </indexterm>
873 <para>
874 Rows can be removed from a table using the <command>DELETE</command>
875 command.
876 Suppose you are no longer interested in the weather of Hayward.
877 Then you can do the following to delete those rows from the table:
878 <programlisting>
879 DELETE FROM weather WHERE city = 'Hayward';
880 </programlisting>
882 All weather records belonging to Hayward are removed.
884 <programlisting>
885 SELECT * FROM weather;
886 </programlisting>
888 <screen>
889 city | temp_lo | temp_hi | prcp | date
890 ---------------+---------+---------+------+------------
891 San Francisco | 46 | 50 | 0.25 | 1994-11-27
892 San Francisco | 41 | 55 | 0 | 1994-11-29
893 (2 rows)
894 </screen>
895 </para>
897 <para>
898 One should be wary of statements of the form
899 <synopsis>
900 DELETE FROM <replaceable>tablename</replaceable>;
901 </synopsis>
903 Without a qualification, <command>DELETE</command> will
904 remove <emphasis>all</emphasis> rows from the given table, leaving it
905 empty. The system will not request confirmation before
906 doing this!
907 </para>
908 </sect1>
910 </chapter>