doc/src/sgml/query.sgml

   1 <!-- $PostgreSQL$ -->
   2
   3  <chapter id="tutorial-sql">
   4   <title>The <acronym>SQL</acronym> Language</title>
   5
   6   <sect1 id="tutorial-sql-intro">
   7    <title>Introduction</title>
   8
   9    <para>
  10     This chapter provides an overview of how to use
  11     <acronym>SQL</acronym> to perform simple operations.  This
  12     tutorial is only intended to give you an introduction and is in no
  13     way a complete tutorial on <acronym>SQL</acronym>.  Numerous books
  14     have been written on <acronym>SQL</acronym>, including <xref
  15     linkend="MELT93"> and <xref linkend="DATE97">.
  16     You should be aware that some <productname>PostgreSQL</productname>
  17     language features are extensions to the standard.
  18    </para>
  19
  20    <para>
  21     In the examples that follow, we assume that you have created a
  22     database named <literal>mydb</literal>, as described in the previous
  23     chapter, and have been able to start <application>psql</application>.
  24    </para>
  25
  26    <para>
  27     Examples in this manual can also be found in the
  28     <productname>PostgreSQL</productname> source distribution
  29     in the directory <filename>src/tutorial/</filename>.  To use those
  30     files, first change to that directory and run <application>make</>:
  31
  32 <screen>
  33 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/src/tutorial</userinput>
  34 <prompt>$</prompt> <userinput>make</userinput>
  35 </screen>
  36
  37     This creates the scripts and compiles the C files containing user-defined
  38     functions and types.  (If you installed a pre-packaged version of
  39     <productname>PostgreSQL</productname> rather than building from source,
  40     look for a directory named <filename>tutorial</> within the
  41     <productname>PostgreSQL</productname> distribution.  The <quote>make</>
  42     part should already have been done for you.)
  43     Then, to start the tutorial, do the following:
  44
  45 <screen>
  46 <prompt>$</prompt> <userinput>cd <replaceable>....</replaceable>/tutorial</userinput>
  47 <prompt>$</prompt> <userinput>psql -s mydb</userinput>
  48 <computeroutput>
  49 ...
  50 </computeroutput>
  51
  52 <prompt>mydb=&gt;</prompt> <userinput>\i basics.sql</userinput>
  53 </screen>
  54
  55     The <literal>\i</literal> command reads in commands from the
  56     specified file. <command>psql</command>'s <literal>-s</> option puts you in
  57     single step mode which pauses before sending each statement to the
  58     server.  The commands used in this section are in the file
  59     <filename>basics.sql</filename>.
  60    </para>
  61   </sect1>
  62
  63
  64   <sect1 id="tutorial-concepts">
  65    <title>Concepts</title>
  66
  67    <para>
  68     <indexterm><primary>relational database</primary></indexterm>
  69     <indexterm><primary>hierarchical database</primary></indexterm>
  70     <indexterm><primary>object-oriented database</primary></indexterm>
  71     <indexterm><primary>relation</primary></indexterm>
  72     <indexterm><primary>table</primary></indexterm>
  73
  74     <productname>PostgreSQL</productname> is a <firstterm>relational
  75     database management system</firstterm> (<acronym>RDBMS</acronym>).
  76     That means it is a system for managing data stored in
  77     <firstterm>relations</firstterm>.  Relation is essentially a
  78     mathematical term for <firstterm>table</firstterm>.  The notion of
  79     storing data in tables is so commonplace today that it might
  80     seem inherently obvious, but there are a number of other ways of
  81     organizing databases.  Files and directories on Unix-like
  82     operating systems form an example of a hierarchical database.  A
  83     more modern development is the object-oriented database.
  84    </para>
  85
  86    <para>
  87     <indexterm><primary>row</primary></indexterm>
  88     <indexterm><primary>column</primary></indexterm>
  89
  90     Each table is a named collection of <firstterm>rows</firstterm>.
  91     Each row of a given table has the same set of named
  92     <firstterm>columns</firstterm>,
  93     and each column is of a specific data type.  Whereas columns have
  94     a fixed order in each row, it is important to remember that SQL
  95     does not guarantee the order of the rows within the table in any
  96     way (although they can be explicitly sorted for display).
  97    </para>
  98
  99    <para>
 100     <indexterm><primary>database cluster</primary></indexterm>
 101     <indexterm><primary>cluster</primary><secondary>of databases</secondary><see>database cluster</see></indexterm>
 102
 103     Tables are grouped into databases, and a collection of databases
 104     managed by a single <productname>PostgreSQL</productname> server
 105     instance constitutes a database <firstterm>cluster</firstterm>.
 106    </para>
 107   </sect1>
 108
 109
 110   <sect1 id="tutorial-table">
 111    <title>Creating a New Table</title>
 112
 113    <indexterm zone="tutorial-table">
 114     <primary>CREATE TABLE</primary>
 115    </indexterm>
 116
 117    <para>
 118     You  can  create  a  new  table by specifying the table
 119     name, along with all column names and their types:
 120
 121 <programlisting>
 122 CREATE TABLE weather (
 123     city            varchar(80),
 124     temp_lo         int,           -- low temperature
 125     temp_hi         int,           -- high temperature
 126     prcp            real,          -- precipitation
 127     date            date
 128 );
 129 </programlisting>
 130
 131     You can enter this into <command>psql</command> with the line
 132     breaks.  <command>psql</command> will recognize that the command
 133     is not terminated until the semicolon.
 134    </para>
 135
 136    <para>
 137     White space (i.e., spaces, tabs, and newlines) can be used freely
 138     in SQL commands.  That means you can type the command aligned
 139     differently than above, or even all on one line.  Two dashes
 140     (<quote><literal>--</literal></quote>) introduce comments.
 141     Whatever follows them is ignored up to the end of the line.  SQL
 142     is case insensitive about key words and identifiers, except
 143     when identifiers are double-quoted to preserve the case (not done
 144     above).
 145    </para>
 146
 147    <para>
 148     <type>varchar(80)</type> specifies a data type that can store
 149     arbitrary character strings up to 80 characters in length.
 150     <type>int</type> is the normal integer type.  <type>real</type> is
 151     a type for storing single precision floating-point numbers.
 152     <type>date</type> should be self-explanatory.  (Yes, the column of
 153     type <type>date</type> is also named <structfield>date</structfield>.
 154     This might be convenient or confusing &mdash; you choose.)
 155    </para>
 156
 157    <para>
 158     <productname>PostgreSQL</productname> supports the standard
 159     <acronym>SQL</acronym> types <type>int</type>,
 160     <type>smallint</type>, <type>real</type>, <type>double
 161     precision</type>, <type>char(<replaceable>N</>)</type>,
 162     <type>varchar(<replaceable>N</>)</type>, <type>date</type>,
 163     <type>time</type>, <type>timestamp</type>, and
 164     <type>interval</type>, as well as other types of general utility
 165     and a rich set of geometric types.
 166     <productname>PostgreSQL</productname> can be customized with an
 167     arbitrary number of user-defined data types.  Consequently, type
 168     names are not key words in the syntax, except where required to
 169     support special cases in the <acronym>SQL</acronym> standard.
 170    </para>
 171
 172    <para>
 173     The second example will store cities and their associated
 174     geographical location:
 175 <programlisting>
 176 CREATE TABLE cities (
 177     name            varchar(80),
 178     location        point
 179 );
 180 </programlisting>
 181     The <type>point</type> type is an example of a
 182     <productname>PostgreSQL</productname>-specific data type.
 183    </para>
 184
 185    <para>
 186     <indexterm>
 187      <primary>DROP TABLE</primary>
 188     </indexterm>
 189
 190     Finally, it should be mentioned that if you don't need a table any
 191     longer or want to recreate it differently you can remove it using
 192     the following command:
 193 <synopsis>
 194 DROP TABLE <replaceable>tablename</replaceable>;
 195 </synopsis>
 196    </para>
 197   </sect1>
 198
 199
 200   <sect1 id="tutorial-populate">
 201    <title>Populating a Table With Rows</title>
 202
 203    <indexterm zone="tutorial-populate">
 204     <primary>INSERT</primary>
 205    </indexterm>
 206
 207    <para>
 208     The <command>INSERT</command> statement is used to populate a table  with
 209     rows:
 210
 211 <programlisting>
 212 INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
 213 </programlisting>
 214
 215     Note that all data types use rather obvious input formats.
 216     Constants that are not simple numeric values usually must be
 217     surrounded by single quotes (<literal>'</>), as in the example.
 218     The
 219     <type>date</type> type is actually quite flexible in what it
 220     accepts, but for this tutorial we will stick to the unambiguous
 221     format shown here.
 222    </para>
 223
 224    <para>
 225     The <type>point</type> type requires a coordinate pair as input,
 226     as shown here:
 227 <programlisting>
 228 INSERT INTO cities VALUES ('San Francisco', '(-194.0, 53.0)');
 229 </programlisting>
 230    </para>
 231
 232    <para>
 233     The syntax used so far requires you to remember the order of the
 234     columns.  An alternative syntax allows you to list the columns
 235     explicitly:
 236 <programlisting>
 237 INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
 238     VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
 239 </programlisting>
 240     You can list the columns in a different order if you wish or
 241     even omit some columns, e.g., if the precipitation is unknown:
 242 <programlisting>
 243 INSERT INTO weather (date, city, temp_hi, temp_lo)
 244     VALUES ('1994-11-29', 'Hayward', 54, 37);
 245 </programlisting>
 246     Many developers consider explicitly listing the columns better
 247     style than relying on the order implicitly.
 248    </para>
 249
 250    <para>
 251     Please enter all the commands shown above so you have some data to
 252     work with in the following sections.
 253    </para>
 254
 255    <para>
 256     <indexterm>
 257      <primary>COPY</primary>
 258     </indexterm>
 259
 260     You could also have used <command>COPY</command> to load large
 261     amounts of data from flat-text files.  This is usually faster
 262     because the <command>COPY</command> command is optimized for this
 263     application while allowing less flexibility than
 264     <command>INSERT</command>.  An example would be:
 265
 266 <programlisting>
 267 COPY weather FROM '/home/user/weather.txt';
 268 </programlisting>
 269
 270     where the file name for the source file must be available to the
 271     backend server machine, not the client, since the backend server
 272     reads the file directly.  You can read more about the
 273     <command>COPY</command> command in <xref linkend="sql-copy"
 274     endterm="sql-copy-title">.
 275    </para>
 276   </sect1>
 277
 278
 279   <sect1 id="tutorial-select">
 280    <title>Querying a Table</title>
 281
 282    <para>
 283     <indexterm><primary>query</primary></indexterm>
 284     <indexterm><primary>SELECT</primary></indexterm>
 285
 286     To retrieve data from a table, the table is
 287     <firstterm>queried</firstterm>.  An <acronym>SQL</acronym>
 288     <command>SELECT</command> statement is used to do this.  The
 289     statement is divided into a select list (the part that lists the
 290     columns to be returned), a table list (the part that lists the
 291     tables from which to retrieve the data), and an optional
 292     qualification (the part that specifies any restrictions).  For
 293     example, to retrieve all the rows of table
 294     <structname>weather</structname>, type:
 295 <programlisting>
 296 SELECT * FROM weather;
 297 </programlisting>
 298     Here <literal>*</literal> is a shorthand for <quote>all columns</quote>.
 299      <footnote>
 300       <para>
 301        While <literal>SELECT *</literal> is useful for off-the-cuff
 302        queries, it is widely considered bad style in production code,
 303        since adding a column to the table would change the results.
 304       </para>
 305      </footnote>
 306     So the same result would be had with:
 307 <programlisting>
 308 SELECT city, temp_lo, temp_hi, prcp, date FROM weather;
 309 </programlisting>
 310
 311     The output should be:
 312
 313 <screen>
 314      city      | temp_lo | temp_hi | prcp |    date
 315 ---------------+---------+---------+------+------------
 316  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 317  San Francisco |      43 |      57 |    0 | 1994-11-29
 318  Hayward       |      37 |      54 |      | 1994-11-29
 319 (3 rows)
 320 </screen>
 321    </para>
 322
 323    <para>
 324     You can write expressions, not just simple column references, in the
 325     select list.  For example, you can do:
 326 <programlisting>
 327 SELECT city, (temp_hi+temp_lo)/2 AS temp_avg, date FROM weather;
 328 </programlisting>
 329     This should give:
 330 <screen>
 331      city      | temp_avg |    date
 332 ---------------+----------+------------
 333  San Francisco |       48 | 1994-11-27
 334  San Francisco |       50 | 1994-11-29
 335  Hayward       |       45 | 1994-11-29
 336 (3 rows)
 337 </screen>
 338     Notice how the <literal>AS</literal> clause is used to relabel the
 339     output column.  (The <literal>AS</literal> clause is optional.)
 340    </para>
 341
 342    <para>
 343     A query can be <quote>qualified</> by adding a <literal>WHERE</>
 344     clause that specifies which rows are wanted.  The <literal>WHERE</>
 345     clause contains a Boolean (truth value) expression, and only rows for
 346     which the Boolean expression is true are returned.  The usual
 347     Boolean operators (<literal>AND</literal>,
 348     <literal>OR</literal>, and <literal>NOT</literal>) are allowed in
 349     the qualification.  For example, the following
 350     retrieves the weather of San Francisco on rainy days:
 351
 352 <programlisting>
 353 SELECT * FROM weather
 354     WHERE city = 'San Francisco' AND prcp &gt; 0.0;
 355 </programlisting>
 356     Result:
 357 <screen>
 358      city      | temp_lo | temp_hi | prcp |    date
 359 ---------------+---------+---------+------+------------
 360  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 361 (1 row)
 362 </screen>
 363    </para>
 364
 365    <para>
 366     <indexterm><primary>ORDER BY</primary></indexterm>
 367
 368     You can request that the results of a query
 369     be returned in sorted order:
 370
 371 <programlisting>
 372 SELECT * FROM weather
 373     ORDER BY city;
 374 </programlisting>
 375
 376 <screen>
 377      city      | temp_lo | temp_hi | prcp |    date
 378 ---------------+---------+---------+------+------------
 379  Hayward       |      37 |      54 |      | 1994-11-29
 380  San Francisco |      43 |      57 |    0 | 1994-11-29
 381  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 382 </screen>
 383
 384     In this example, the sort order isn't fully specified, and so you
 385     might get the San Francisco rows in either order.  But you'd always
 386     get the results shown above if you do:
 387
 388 <programlisting>
 389 SELECT * FROM weather
 390     ORDER BY city, temp_lo;
 391 </programlisting>
 392    </para>
 393
 394    <para>
 395     <indexterm><primary>DISTINCT</primary></indexterm>
 396     <indexterm><primary>duplicate</primary></indexterm>
 397
 398     You can request that duplicate rows be removed from the result of
 399     a query:
 400
 401 <programlisting>
 402 SELECT DISTINCT city
 403     FROM weather;
 404 </programlisting>
 405
 406 <screen>
 407      city
 408 ---------------
 409  Hayward
 410  San Francisco
 411 (2 rows)
 412 </screen>
 413
 414     Here again, the result row ordering might vary.
 415     You can ensure consistent results by using <literal>DISTINCT</literal> and
 416     <literal>ORDER BY</literal> together:
 417      <footnote>
 418       <para>
 419        In some database systems, including older versions of
 420        <productname>PostgreSQL</productname>, the implementation of
 421        <literal>DISTINCT</literal> automatically orders the rows and
 422        so <literal>ORDER BY</literal> is unnecessary.  But this is not
 423        required by the SQL standard, and current
 424        <productname>PostgreSQL</productname> does not guarantee that
 425        <literal>DISTINCT</literal> causes the rows to be ordered.
 426       </para>
 427      </footnote>
 428
 429 <programlisting>
 430 SELECT DISTINCT city
 431     FROM weather
 432     ORDER BY city;
 433 </programlisting>
 434    </para>
 435   </sect1>
 436
 437
 438   <sect1 id="tutorial-join">
 439    <title>Joins Between Tables</title>
 440
 441    <indexterm zone="tutorial-join">
 442     <primary>join</primary>
 443    </indexterm>
 444
 445    <para>
 446     Thus far, our queries have only accessed one table at a time.
 447     Queries can access multiple tables at once, or access the same
 448     table in such a way that multiple rows of the table are being
 449     processed at the same time.  A query that accesses multiple rows
 450     of the same or different tables at one time is called a
 451     <firstterm>join</firstterm> query.  As an example, say you wish to
 452     list all the weather records together with the location of the
 453     associated city.  To do that, we need to compare the <structfield>city</>
 454     column of each row of the <structname>weather</> table with the
 455     <structfield>name</> column of all rows in the <structname>cities</>
 456     table, and select the pairs of rows where these values match.
 457     <note>
 458      <para>
 459       This  is only a conceptual model.  The join is usually performed
 460       in a more efficient manner than actually comparing each possible
 461       pair of rows, but this is invisible to the user.
 462      </para>
 463     </note>
 464     This would be accomplished by the following query:
 465
 466 <programlisting>
 467 SELECT *
 468     FROM weather, cities
 469     WHERE city = name;
 470 </programlisting>
 471
 472 <screen>
 473      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 474 ---------------+---------+---------+------+------------+---------------+-----------
 475  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 476  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 477 (2 rows)
 478 </screen>
 479
 480    </para>
 481
 482    <para>
 483     Observe two things about the result set:
 484     <itemizedlist>
 485      <listitem>
 486       <para>
 487        There is no result row for the city of Hayward.  This is
 488        because there is no matching entry in the
 489        <structname>cities</structname> table for Hayward, so the join
 490        ignores the unmatched rows in the <structname>weather</> table.  We will see
 491        shortly how this can be fixed.
 492       </para>
 493      </listitem>
 494
 495      <listitem>
 496       <para>
 497        There are two columns containing the city name.  This is
 498        correct because the lists of columns from the
 499        <structname>weather</structname> and
 500        <structname>cities</structname> tables are concatenated.  In
 501        practice this is undesirable, though, so you will probably want
 502        to list the output columns explicitly rather than using
 503        <literal>*</literal>:
 504 <programlisting>
 505 SELECT city, temp_lo, temp_hi, prcp, date, location
 506     FROM weather, cities
 507     WHERE city = name;
 508 </programlisting>
 509       </para>
 510      </listitem>
 511     </itemizedlist>
 512    </para>
 513
 514    <formalpara>
 515     <title>Exercise:</title>
 516
 517     <para>
 518      Attempt to determine the semantics of this query when the
 519      <literal>WHERE</literal> clause is omitted.
 520     </para>
 521    </formalpara>
 522
 523    <para>
 524     Since the columns all had different names, the parser
 525     automatically found which table they belong to.  If there
 526     were duplicate column names in the two tables you'd need to
 527     <firstterm>qualify</> the column names to show which one you
 528     meant, as in:
 529
 530 <programlisting>
 531 SELECT weather.city, weather.temp_lo, weather.temp_hi,
 532        weather.prcp, weather.date, cities.location
 533     FROM weather, cities
 534     WHERE cities.name = weather.city;
 535 </programlisting>
 536
 537     It is widely considered good style to qualify all column names
 538     in a join query, so that the query won't fail if a duplicate
 539     column name is later added to one of the tables.
 540    </para>
 541
 542    <para>
 543     Join queries of the kind seen thus far can also be written in this
 544     alternative form:
 545
 546 <programlisting>
 547 SELECT *
 548     FROM weather INNER JOIN cities ON (weather.city = cities.name);
 549 </programlisting>
 550
 551     This syntax is not as commonly used as the one above, but we show
 552     it here to help you understand the following topics.
 553    </para>
 554
 555    <para>
 556     <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
 557
 558     Now we will figure out how we can get the Hayward records back in.
 559     What we want the query to do is to scan the
 560     <structname>weather</structname> table and for each row to find the
 561     matching <structname>cities</structname> row(s).  If no matching row is
 562     found we want some <quote>empty values</quote> to be substituted
 563     for the <structname>cities</structname> table's columns.  This kind
 564     of query is called an <firstterm>outer join</firstterm>.  (The
 565     joins we have seen so far are inner joins.)  The command looks
 566     like this:
 567
 568 <programlisting>
 569 SELECT *
 570     FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
 571
 572      city      | temp_lo | temp_hi | prcp |    date    |     name      | location
 573 ---------------+---------+---------+------+------------+---------------+-----------
 574  Hayward       |      37 |      54 |      | 1994-11-29 |               |
 575  San Francisco |      46 |      50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
 576  San Francisco |      43 |      57 |    0 | 1994-11-29 | San Francisco | (-194,53)
 577 (3 rows)
 578 </programlisting>
 579
 580     This query is called a <firstterm>left outer
 581     join</firstterm> because the table mentioned on the left of the
 582     join operator will have each of its rows in the output at least
 583     once, whereas the table on the right will only have those rows
 584     output that match some row of the left table.  When outputting a
 585     left-table row for which there is no right-table match, empty (null)
 586     values are substituted for the right-table columns.
 587    </para>
 588
 589    <formalpara>
 590     <title>Exercise:</title>
 591
 592     <para>
 593      There are also right outer joins and full outer joins.  Try to
 594      find out what those do.
 595     </para>
 596    </formalpara>
 597
 598    <para>
 599     <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
 600     <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
 601
 602     We can also join a table against itself.  This is called a
 603     <firstterm>self join</firstterm>.  As an example, suppose we wish
 604     to find all the weather records that are in the temperature range
 605     of other weather records.  So we need to compare the
 606     <structfield>temp_lo</> and <structfield>temp_hi</> columns of
 607     each <structname>weather</structname> row to the
 608     <structfield>temp_lo</structfield> and
 609     <structfield>temp_hi</structfield> columns of all other
 610     <structname>weather</structname> rows.  We can do this with the
 611     following query:
 612
 613 <programlisting>
 614 SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
 615     W2.city, W2.temp_lo AS low, W2.temp_hi AS high
 616     FROM weather W1, weather W2
 617     WHERE W1.temp_lo &lt; W2.temp_lo
 618     AND W1.temp_hi &gt; W2.temp_hi;
 619
 620      city      | low | high |     city      | low | high
 621 ---------------+-----+------+---------------+-----+------
 622  San Francisco |  43 |   57 | San Francisco |  46 |   50
 623  Hayward       |  37 |   54 | San Francisco |  46 |   50
 624 (2 rows)
 625 </programlisting>
 626
 627     Here we have relabeled the weather table as <literal>W1</> and
 628     <literal>W2</> to be able to distinguish the left and right side
 629     of the join.  You can also use these kinds of aliases in other
 630     queries to save some typing, e.g.:
 631 <programlisting>
 632 SELECT *
 633     FROM weather w, cities c
 634     WHERE w.city = c.name;
 635 </programlisting>
 636     You will encounter this style of abbreviating quite frequently.
 637    </para>
 638   </sect1>
 639
 640
 641   <sect1 id="tutorial-agg">
 642    <title>Aggregate Functions</title>
 643
 644    <indexterm zone="tutorial-agg">
 645     <primary>aggregate function</primary>
 646    </indexterm>
 647
 648    <para>
 649     <indexterm><primary>average</primary></indexterm>
 650     <indexterm><primary>count</primary></indexterm>
 651     <indexterm><primary>max</primary></indexterm>
 652     <indexterm><primary>min</primary></indexterm>
 653     <indexterm><primary>sum</primary></indexterm>
 654
 655     Like  most  other relational database products,
 656     <productname>PostgreSQL</productname> supports
 657     <firstterm>aggregate functions</>.
 658     An aggregate function computes a single result from multiple input rows.
 659     For example, there are aggregates to compute the
 660     <function>count</function>, <function>sum</function>,
 661     <function>avg</function> (average), <function>max</function> (maximum) and
 662     <function>min</function> (minimum) over a set of rows.
 663    </para>
 664
 665    <para>
 666     As an example, we can find the highest low-temperature reading anywhere
 667     with:
 668
 669 <programlisting>
 670 SELECT max(temp_lo) FROM weather;
 671 </programlisting>
 672
 673 <screen>
 674  max
 675 -----
 676   46
 677 (1 row)
 678 </screen>
 679    </para>
 680
 681    <para>
 682     <indexterm><primary>subquery</primary></indexterm>
 683
 684     If we wanted to know what city (or cities) that reading occurred in,
 685     we might try:
 686
 687 <programlisting>
 688 SELECT city FROM weather WHERE temp_lo = max(temp_lo);     <lineannotation>WRONG</lineannotation>
 689 </programlisting>
 690
 691     but this will not work since the aggregate
 692     <function>max</function> cannot be used in the
 693     <literal>WHERE</literal> clause.  (This restriction exists because
 694     the <literal>WHERE</literal> clause determines which rows will be
 695     included in the aggregate calculation; so obviously it has to be evaluated
 696     before aggregate functions are computed.)
 697     However, as is often the case
 698     the query can be restated to accomplish the desired result, here
 699     by using a <firstterm>subquery</firstterm>:
 700
 701 <programlisting>
 702 SELECT city FROM weather
 703     WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
 704 </programlisting>
 705
 706 <screen>
 707      city
 708 ---------------
 709  San Francisco
 710 (1 row)
 711 </screen>
 712
 713     This is OK because the subquery is an independent computation
 714     that computes its own aggregate separately from what is happening
 715     in the outer query.
 716    </para>
 717
 718    <para>
 719     <indexterm><primary>GROUP BY</primary></indexterm>
 720     <indexterm><primary>HAVING</primary></indexterm>
 721
 722     Aggregates are also very useful in combination with <literal>GROUP
 723     BY</literal> clauses.  For example, we can get the maximum low
 724     temperature observed in each city with:
 725
 726 <programlisting>
 727 SELECT city, max(temp_lo)
 728     FROM weather
 729     GROUP BY city;
 730 </programlisting>
 731
 732 <screen>
 733      city      | max
 734 ---------------+-----
 735  Hayward       |  37
 736  San Francisco |  46
 737 (2 rows)
 738 </screen>
 739
 740     which gives us one output row per city.  Each aggregate result is
 741     computed over the table rows matching that city.
 742     We can filter these grouped
 743     rows using <literal>HAVING</literal>:
 744
 745 <programlisting>
 746 SELECT city, max(temp_lo)
 747     FROM weather
 748     GROUP BY city
 749     HAVING max(temp_lo) &lt; 40;
 750 </programlisting>
 751
 752 <screen>
 753   city   | max
 754 ---------+-----
 755  Hayward |  37
 756 (1 row)
 757 </screen>
 758
 759     which gives us the same results for only the cities that have all
 760     <structfield>temp_lo</> values below 40.  Finally, if we only care about
 761     cities whose
 762     names begin with <quote><literal>S</literal></quote>, we might do:
 763
 764 <programlisting>
 765 SELECT city, max(temp_lo)
 766     FROM weather
 767     WHERE city LIKE 'S%'<co id="co.tutorial-agg-like">
 768     GROUP BY city
 769     HAVING max(temp_lo) &lt; 40;
 770 </programlisting>
 771    <calloutlist>
 772     <callout arearefs="co.tutorial-agg-like">
 773      <para>
 774       The <literal>LIKE</literal> operator does pattern matching and
 775       is explained in <xref linkend="functions-matching">.
 776      </para>
 777     </callout>
 778    </calloutlist>
 779    </para>
 780
 781    <para>
 782     It is important to understand the interaction between aggregates and
 783     <acronym>SQL</acronym>'s <literal>WHERE</literal> and <literal>HAVING</literal> clauses.
 784     The fundamental difference between <literal>WHERE</literal> and
 785     <literal>HAVING</literal> is this: <literal>WHERE</literal> selects
 786     input rows before groups and aggregates are computed (thus, it controls
 787     which rows go into the aggregate computation), whereas
 788     <literal>HAVING</literal> selects group rows after groups and
 789     aggregates are computed.  Thus, the
 790     <literal>WHERE</literal> clause must not contain aggregate functions;
 791     it makes no sense to try to use an aggregate to determine which rows
 792     will be inputs to the aggregates.  On the other hand, the
 793     <literal>HAVING</literal> clause always contains aggregate functions.
 794     (Strictly speaking, you are allowed to write a <literal>HAVING</literal>
 795     clause that doesn't use aggregates, but it's seldom useful. The same
 796     condition could be used more efficiently at the <literal>WHERE</literal>
 797     stage.)
 798    </para>
 799
 800    <para>
 801     In the previous example, we can apply the city name restriction in
 802     <literal>WHERE</literal>, since it needs no aggregate.  This is
 803     more efficient than adding the restriction to <literal>HAVING</literal>,
 804     because we avoid doing the grouping and aggregate calculations
 805     for all rows that fail the <literal>WHERE</literal> check.
 806    </para>
 807   </sect1>
 808
 809
 810   <sect1 id="tutorial-update">
 811    <title>Updates</title>
 812
 813    <indexterm zone="tutorial-update">
 814     <primary>UPDATE</primary>
 815    </indexterm>
 816
 817    <para>
 818     You can update existing rows using the
 819     <command>UPDATE</command> command.
 820     Suppose you discover the temperature readings are
 821     all off by 2 degrees after November 28.  You can correct the
 822     data as follows:
 823
 824 <programlisting>
 825 UPDATE weather
 826     SET temp_hi = temp_hi - 2,  temp_lo = temp_lo - 2
 827     WHERE date &gt; '1994-11-28';
 828 </programlisting>
 829    </para>
 830
 831    <para>
 832     Look at the new state of the data:
 833 <programlisting>
 834 SELECT * FROM weather;
 835
 836      city      | temp_lo | temp_hi | prcp |    date
 837 ---------------+---------+---------+------+------------
 838  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 839  San Francisco |      41 |      55 |    0 | 1994-11-29
 840  Hayward       |      35 |      52 |      | 1994-11-29
 841 (3 rows)
 842 </programlisting>
 843    </para>
 844   </sect1>
 845
 846   <sect1 id="tutorial-delete">
 847    <title>Deletions</title>
 848
 849    <indexterm zone="tutorial-delete">
 850     <primary>DELETE</primary>
 851    </indexterm>
 852
 853    <para>
 854     Rows can be removed from a table using the <command>DELETE</command>
 855     command.
 856     Suppose you are no longer interested in the weather of Hayward.
 857     Then you can do the following to delete those rows from the table:
 858 <programlisting>
 859 DELETE FROM weather WHERE city = 'Hayward';
 860 </programlisting>
 861
 862     All weather records belonging to Hayward are removed.
 863
 864 <programlisting>
 865 SELECT * FROM weather;
 866 </programlisting>
 867
 868 <screen>
 869      city      | temp_lo | temp_hi | prcp |    date
 870 ---------------+---------+---------+------+------------
 871  San Francisco |      46 |      50 | 0.25 | 1994-11-27
 872  San Francisco |      41 |      55 |    0 | 1994-11-29
 873 (2 rows)
 874 </screen>
 875    </para>
 876
 877    <para>
 878     One should be wary of statements of the form
 879 <synopsis>
 880 DELETE FROM <replaceable>tablename</replaceable>;
 881 </synopsis>
 882
 883     Without a qualification, <command>DELETE</command> will
 884     remove  <emphasis>all</>  rows from the given table, leaving it
 885     empty.  The system will not request confirmation before
 886     doing this!
 887    </para>
 888   </sect1>
 889
 890  </chapter>