doc/src/sgml/mvcc.sgml

   1 <!-- $PostgreSQL$ -->
   2
   3  <chapter id="mvcc">
   4   <title>Concurrency Control</title>
   5
   6   <indexterm>
   7    <primary>concurrency</primary>
   8   </indexterm>
   9
  10   <para>
  11    This chapter describes the behavior of the
  12    <productname>PostgreSQL</productname> database system when two or
  13    more sessions try to access the same data at the same time.  The
  14    goals in that situation are to allow efficient access for all
  15    sessions while maintaining strict data integrity.  Every developer
  16    of database applications should be familiar with the topics covered
  17    in this chapter.
  18   </para>
  19
  20   <sect1 id="mvcc-intro">
  21    <title>Introduction</title>
  22
  23    <indexterm>
  24     <primary>MVCC</primary>
  25    </indexterm>
  26
  27    <para>
  28     <productname>PostgreSQL</productname> provides a rich set of tools
  29     for developers to manage concurrent access to data.  Internally,
  30     data consistency is maintained by using a multiversion
  31     model (Multiversion Concurrency Control, <acronym>MVCC</acronym>).
  32     This means that while querying a database each transaction sees
  33     a snapshot of data (a <firstterm>database version</firstterm>)
  34     as it was some
  35     time ago, regardless of the current state of the underlying data.
  36     This protects the transaction from viewing inconsistent data that
  37     could be caused by (other) concurrent transaction updates on the same
  38     data rows, providing <firstterm>transaction isolation</firstterm>
  39     for each database session.  <acronym>MVCC</acronym>, by eschewing
  40     explicit locking methodologies of traditional database systems,
  41     minimizes lock contention in order to allow for reasonable
  42     performance in multiuser environments.
  43    </para>
  44
  45    <para>
  46     The main advantage of using the <acronym>MVCC</acronym> model of
  47     concurrency control rather than locking is that in
  48     <acronym>MVCC</acronym> locks acquired for querying (reading) data
  49     do not conflict with locks acquired for writing data, and so
  50     reading never blocks writing and writing never blocks reading.
  51    </para>
  52
  53    <para>
  54     Table- and row-level locking facilities are also available in
  55     <productname>PostgreSQL</productname> for applications that cannot
  56     adapt easily to <acronym>MVCC</acronym> behavior.  However, proper
  57     use of <acronym>MVCC</acronym> will generally provide better
  58     performance than locks.  In addition, application-defined advisory
  59     locks provide a mechanism for acquiring locks that are not tied
  60     to a single transaction.
  61    </para>
  62   </sect1>
  63
  64   <sect1 id="transaction-iso">
  65    <title>Transaction Isolation</title>
  66
  67    <indexterm>
  68     <primary>transaction isolation</primary>
  69    </indexterm>
  70
  71    <para>
  72     The <acronym>SQL</acronym> standard defines four levels of
  73     transaction isolation in terms of three phenomena that must be
  74     prevented between concurrent transactions.  These undesirable
  75     phenomena are:
  76
  77     <variablelist>
  78      <varlistentry>
  79       <term>
  80        dirty read
  81        <indexterm><primary>dirty read</primary></indexterm>
  82       </term>
  83      <listitem>
  84       <para>
  85         A transaction reads data written by a concurrent uncommitted transaction.
  86        </para>
  87       </listitem>
  88      </varlistentry>
  89
  90      <varlistentry>
  91       <term>
  92        nonrepeatable read
  93        <indexterm><primary>nonrepeatable read</primary></indexterm>
  94       </term>
  95      <listitem>
  96       <para>
  97         A transaction re-reads data it has previously read and finds that data
  98         has been modified by another transaction (that committed since the
  99         initial read).
 100        </para>
 101       </listitem>
 102      </varlistentry>
 103
 104      <varlistentry>
 105       <term>
 106        phantom read
 107        <indexterm><primary>phantom read</primary></indexterm>
 108       </term>
 109      <listitem>
 110       <para>
 111         A transaction re-executes a query returning a set of rows that satisfy a
 112         search condition and finds that the set of rows satisfying the condition
 113         has changed due to another recently-committed transaction.
 114        </para>
 115       </listitem>
 116      </varlistentry>
 117     </variablelist>
 118    </para>
 119
 120    <para>
 121     <indexterm>
 122      <primary>transaction isolation level</primary>
 123     </indexterm>
 124     The four transaction isolation levels and the corresponding
 125     behaviors are described in <xref linkend="mvcc-isolevel-table">.
 126    </para>
 127
 128     <table tocentry="1" id="mvcc-isolevel-table">
 129      <title><acronym>SQL</acronym> Transaction Isolation Levels</title>
 130      <tgroup cols="4">
 131       <thead>
 132        <row>
 133         <entry>
 134          Isolation Level
 135         </entry>
 136         <entry>
 137          Dirty Read
 138         </entry>
 139         <entry>
 140          Nonrepeatable Read
 141         </entry>
 142         <entry>
 143          Phantom Read
 144         </entry>
 145        </row>
 146       </thead>
 147       <tbody>
 148        <row>
 149         <entry>
 150          Read uncommitted
 151         </entry>
 152         <entry>
 153          Possible
 154         </entry>
 155         <entry>
 156          Possible
 157         </entry>
 158         <entry>
 159          Possible
 160         </entry>
 161        </row>
 162
 163        <row>
 164         <entry>
 165          Read committed
 166         </entry>
 167         <entry>
 168          Not possible
 169         </entry>
 170         <entry>
 171          Possible
 172         </entry>
 173         <entry>
 174          Possible
 175         </entry>
 176        </row>
 177
 178        <row>
 179         <entry>
 180          Repeatable read
 181         </entry>
 182         <entry>
 183          Not possible
 184         </entry>
 185         <entry>
 186          Not possible
 187         </entry>
 188         <entry>
 189          Possible
 190         </entry>
 191        </row>
 192
 193        <row>
 194         <entry>
 195          Serializable
 196         </entry>
 197         <entry>
 198          Not possible
 199         </entry>
 200         <entry>
 201          Not possible
 202         </entry>
 203         <entry>
 204          Not possible
 205         </entry>
 206        </row>
 207       </tbody>
 208      </tgroup>
 209     </table>
 210
 211    <para>
 212     In <productname>PostgreSQL</productname>, you can request any of the
 213     four standard transaction isolation levels.  But internally, there are
 214     only two distinct isolation levels, which correspond to the levels Read
 215     Committed and Serializable.  When you select the level Read
 216     Uncommitted you really get Read Committed, and when you select
 217     Repeatable Read you really get Serializable, so the actual
 218     isolation level might be stricter than what you select.  This is
 219     permitted by the SQL standard: the four isolation levels only
 220     define which phenomena must not happen, they do not define which
 221     phenomena must happen.  The reason that <productname>PostgreSQL</>
 222     only provides two isolation levels is that this is the only
 223     sensible way to map the standard isolation levels to the multiversion
 224     concurrency control architecture.  The behavior of the available
 225     isolation levels is detailed in the following subsections.
 226    </para>
 227
 228    <para>
 229     To set the transaction isolation level of a transaction, use the
 230     command <xref linkend="sql-set-transaction" endterm="sql-set-transaction-title">.
 231    </para>
 232
 233   <sect2 id="xact-read-committed">
 234    <title>Read Committed Isolation Level</title>
 235
 236    <indexterm>
 237     <primary>transaction isolation level</primary>
 238     <secondary>read committed</secondary>
 239    </indexterm>
 240
 241    <para>
 242     <firstterm>Read Committed</firstterm> is the default isolation
 243     level in <productname>PostgreSQL</productname>.  When a transaction
 244     uses this isolation level, a <command>SELECT</command> query
 245     (without a <literal>FOR UPDATE/SHARE</> clause) sees only data
 246     committed before the query began; it never sees either uncommitted
 247     data or changes committed during query execution by concurrent
 248     transactions.  In effect, a <command>SELECT</command> query sees
 249     a snapshot of the database as of the instant the query begins to
 250     run.   However, <command>SELECT</command> does see the effects
 251     of previous updates executed within its own transaction, even
 252     though they are not yet committed.  Also note that two successive
 253     <command>SELECT</command> commands can see different data, even
 254     though they are within a single transaction, if other transactions
 255     commit changes during execution of the first <command>SELECT</command>.
 256    </para>
 257
 258    <para>
 259     <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT
 260     FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands
 261     behave the same as <command>SELECT</command>
 262     in terms of searching for target rows: they will only find target rows
 263     that were committed as of the command start time.  However, such a target
 264     row might have already been updated (or deleted or locked) by
 265     another concurrent transaction by the time it is found.  In this case, the
 266     would-be updater will wait for the first updating transaction to commit or
 267     roll back (if it is still in progress).  If the first updater rolls back,
 268     then its effects are negated and the second updater can proceed with
 269     updating the originally found row.  If the first updater commits, the
 270     second updater will ignore the row if the first updater deleted it,
 271     otherwise it will attempt to apply its operation to the updated version of
 272     the row.  The search condition of the command (the <literal>WHERE</> clause) is
 273     re-evaluated to see if the updated version of the row still matches the
 274     search condition.  If so, the second updater proceeds with its operation
 275     using the updated version of the row.  In the case of
 276     <command>SELECT FOR UPDATE</command> and <command>SELECT FOR
 277     SHARE</command>, this means it is the updated version of the row that is
 278     locked and returned to the client.
 279    </para>
 280
 281    <para>
 282     Because of the above rule, it is possible for an updating command to see an
 283     inconsistent snapshot: it can see the effects of concurrent updating
 284     commands on the same rows it is trying to update, but it
 285     does not see effects of those commands on other rows in the database.
 286     This behavior makes Read Committed mode unsuitable for commands that
 287     involve complex search conditions; however, it is just right for simpler
 288     cases.  For example, consider updating bank balances with transactions
 289     like:
 290
 291 <screen>
 292 BEGIN;
 293 UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
 294 UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534;
 295 COMMIT;
 296 </screen>
 297
 298     If two such transactions concurrently try to change the balance of account
 299     12345, we clearly want the second transaction to start with the updated
 300     version of the account's row.  Because each command is affecting only a
 301     predetermined row, letting it see the updated version of the row does
 302     not create any troublesome inconsistency.
 303    </para>
 304
 305    <para>
 306     More complex usage can produce undesirable results in Read Committed
 307     mode.  For example, consider a <command>DELETE</command> command
 308     operating on data that is being both added and removed from its
 309     restriction criteria by another command, e.g., assume
 310     <literal>website</literal> is a two-row table with
 311     <literal>website.hits</literal> equaling <literal>9</literal> and
 312     <literal>10</literal>:
 313
 314 <screen>
 315 BEGIN;
 316 UPDATE website SET hits = hits + 1;
 317 -- run from another session:  DELETE FROM website WHERE hits = 10;
 318 COMMIT;
 319 </screen>
 320
 321     The <command>DELETE</command> will have no effect even though
 322     there is a <literal>website.hits = 10</literal> row before and
 323     after the <command>UPDATE</command>. This occurs because the
 324     pre-update row value <literal>9</> is skipped, and when the
 325     <command>UPDATE</command> completes and <command>DELETE</command>
 326     obtains a lock, the new row value is no longer <literal>10</> but
 327     <literal>11</>, which no longer matches the criteria.
 328    </para>
 329
 330    <para>
 331     Because Read Committed mode starts each command with a new snapshot
 332     that includes all transactions committed up to that instant,
 333     subsequent commands in the same transaction will see the effects
 334     of the committed concurrent transaction in any case.  The point
 335     at issue above is whether or not a <emphasis>single</> command
 336     sees an absolutely consistent view of the database.
 337    </para>
 338
 339    <para>
 340     The partial transaction isolation provided by Read Committed mode
 341     is adequate for many applications, and this mode is fast and simple
 342     to use;  however, it is not sufficient for all cases.  Applications
 343     that do complex queries and updates might require a more rigorously
 344     consistent view of the database than Read Committed mode provides.
 345    </para>
 346   </sect2>
 347
 348   <sect2 id="xact-serializable">
 349    <title>Serializable Isolation Level</title>
 350
 351    <indexterm>
 352     <primary>transaction isolation level</primary>
 353     <secondary>serializable</secondary>
 354    </indexterm>
 355
 356    <para>
 357     The <firstterm>Serializable</firstterm> isolation level provides the strictest transaction
 358     isolation.  This level emulates serial transaction execution,
 359     as if transactions had been executed one after another, serially,
 360     rather than concurrently.  However, applications using this level must
 361     be prepared to retry transactions due to serialization failures.
 362    </para>
 363
 364    <para>
 365     When a transaction is using the serializable level,
 366     a <command>SELECT</command> query only sees data committed before the
 367     transaction began; it never sees either uncommitted data or changes
 368     committed
 369     during transaction execution by concurrent transactions.  (However,
 370     the query does see the effects of previous updates
 371     executed within its own transaction, even though they are not yet
 372     committed.)  This is different from Read Committed in that
 373     a query in a serializable transaction
 374     sees a snapshot as of the start of the <emphasis>transaction</>,
 375     not as of the start
 376     of the current query within the transaction.  Thus, successive
 377     <command>SELECT</command> commands within a <emphasis>single</>
 378     transaction see the same data, i.e., they do not see changes made by
 379     other transactions that committed after their own transaction started.
 380     (This behavior can be ideal for reporting applications.)
 381    </para>
 382
 383    <para>
 384     <command>UPDATE</command>, <command>DELETE</command>, <command>SELECT
 385     FOR UPDATE</command>, and <command>SELECT FOR SHARE</command> commands
 386     behave the same as <command>SELECT</command>
 387     in terms of searching for target rows: they will only find target rows
 388     that were committed as of the transaction start time.  However, such a
 389     target
 390     row might have already been updated (or deleted or locked) by
 391     another concurrent transaction by the time it is found.  In this case, the
 392     serializable transaction will wait for the first updating transaction to commit or
 393     roll back (if it is still in progress).  If the first updater rolls back,
 394     then its effects are negated and the serializable transaction can proceed
 395     with updating the originally found row.  But if the first updater commits
 396     (and actually updated or deleted the row, not just locked it)
 397     then the serializable transaction will be rolled back with the message
 398
 399 <screen>
 400 ERROR:  could not serialize access due to concurrent update
 401 </screen>
 402
 403     because a serializable transaction cannot modify or lock rows changed by
 404     other transactions after the serializable transaction began.
 405    </para>
 406
 407    <para>
 408     When an application receives this error message, it should abort
 409     the current transaction and retry the whole transaction from
 410     the beginning.  The second time through, the transaction will see the
 411     previously-committed change as part of its initial view of the database,
 412     so there is no logical conflict in using the new version of the row
 413     as the starting point for the new transaction's update.
 414    </para>
 415
 416    <para>
 417     Note that only updating transactions might need to be retried; read-only
 418     transactions will never have serialization conflicts.
 419    </para>
 420
 421    <para>
 422     The Serializable mode provides a rigorous guarantee that each
 423     transaction sees a wholly consistent view of the database.  However,
 424     the application has to be prepared to retry transactions when concurrent
 425     updates make it impossible to sustain the illusion of serial execution.
 426     Since the cost of redoing complex transactions can be significant,
 427     serializable mode is recommended only when updating transactions contain logic
 428     sufficiently complex that they might give wrong answers in Read
 429     Committed mode.  Most commonly, Serializable mode is necessary when
 430     a transaction executes several successive commands that must see
 431     identical views of the database.
 432    </para>
 433
 434    <sect3 id="mvcc-serializability">
 435     <title>Serializable Isolation versus True Serializability</title>
 436
 437    <indexterm>
 438     <primary>serializability</primary>
 439    </indexterm>
 440
 441    <indexterm>
 442     <primary>predicate locking</primary>
 443    </indexterm>
 444
 445    <para>
 446     The intuitive meaning (and mathematical definition) of
 447     <quote>serializable</> execution is that any two successfully committed
 448     concurrent transactions will appear to have executed strictly serially,
 449     one after the other &mdash; although which one appeared to occur first might
 450     not be predictable in advance.  It is important to realize that forbidding
 451     the undesirable behaviors listed in <xref linkend="mvcc-isolevel-table">
 452     is not sufficient to guarantee true serializability, and in fact
 453     <productname>PostgreSQL</productname>'s Serializable mode <emphasis>does
 454     not guarantee serializable execution in this sense</>.  As an example,
 455     consider a table <structname>mytab</>, initially containing:
 456 <screen>
 457  class | value
 458 -------+-------
 459      1 |    10
 460      1 |    20
 461      2 |   100
 462      2 |   200
 463 </screen>
 464     Suppose that serializable transaction A computes:
 465 <screen>
 466 SELECT SUM(value) FROM mytab WHERE class = 1;
 467 </screen>
 468     and then inserts the result (30) as the <structfield>value</> in a
 469     new row with <structfield>class</><literal> = 2</>.  Concurrently, serializable
 470     transaction B computes:
 471 <screen>
 472 SELECT SUM(value) FROM mytab WHERE class = 2;
 473 </screen>
 474     and obtains the result 300, which it inserts in a new row with
 475     <structfield>class</><literal> = 1</>.  Then both transactions commit.  None of
 476     the listed undesirable behaviors have occurred, yet we have a result
 477     that could not have occurred in either order serially.  If A had
 478     executed before B, B would have computed the sum 330, not 300, and
 479     similarly the other order would have resulted in a different sum
 480     computed by A.
 481    </para>
 482
 483    <para>
 484     To guarantee true mathematical serializability, it is necessary for
 485     a database system to enforce <firstterm>predicate locking</>, which
 486     means that a transaction cannot insert or modify a row that would
 487     have matched the <literal>WHERE</> condition of a query in another concurrent
 488     transaction.  For example, once transaction A has executed the query
 489     <literal>SELECT ... WHERE class = 1</>, a predicate-locking system
 490     would forbid transaction B from inserting any new row with class 1
 491     until A has committed.
 492      <footnote>
 493       <para>
 494        Essentially, a predicate-locking system prevents phantom reads
 495        by restricting what is written, whereas MVCC prevents them by
 496        restricting what is read.
 497       </para>
 498      </footnote>
 499     Such a locking system is complex to
 500     implement and extremely expensive in execution, since every session must
 501     be aware of the details of every query executed by every concurrent
 502     transaction.  And this large expense is mostly wasted, since in
 503     practice most applications do not do the sorts of things that could
 504     result in problems.  (Certainly the example above is rather contrived
 505     and unlikely to represent real software.)  For these reasons,
 506     <productname>PostgreSQL</productname> does not implement predicate
 507     locking.
 508    </para>
 509
 510    <para>
 511     In cases where the possibility of non-serializable execution
 512     is a real hazard, problems can be prevented by appropriate use of
 513     explicit locking.  Further discussion appears in the following
 514     sections.
 515    </para>
 516   </sect3>
 517   </sect2>
 518  </sect1>
 519
 520   <sect1 id="explicit-locking">
 521    <title>Explicit Locking</title>
 522
 523    <indexterm>
 524     <primary>lock</primary>
 525    </indexterm>
 526
 527    <para>
 528     <productname>PostgreSQL</productname> provides various lock modes
 529     to control concurrent access to data in tables.  These modes can
 530     be used for application-controlled locking in situations where
 531     <acronym>MVCC</acronym> does not give the desired behavior.  Also,
 532     most <productname>PostgreSQL</productname> commands automatically
 533     acquire locks of appropriate modes to ensure that referenced
 534     tables are not dropped or modified in incompatible ways while the
 535     command executes.  (For example, <command>ALTER TABLE</> cannot safely be
 536     executed concurrently with other operations on the same table, so it
 537     obtains an exclusive lock on the table to enforce that.)
 538    </para>
 539
 540    <para>
 541     To examine a list of the currently outstanding locks in a database
 542     server, use the
 543     <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
 544     system view. For more information on monitoring the status of the lock
 545     manager subsystem, refer to <xref linkend="monitoring">.
 546    </para>
 547
 548   <sect2 id="locking-tables">
 549    <title>Table-Level Locks</title>
 550
 551    <indexterm zone="locking-tables">
 552     <primary>LOCK</primary>
 553    </indexterm>
 554
 555    <para>
 556     The list below shows the available lock modes and the contexts in
 557     which they are used automatically by
 558     <productname>PostgreSQL</productname>.  You can also acquire any
 559     of these locks explicitly with the command <xref
 560     linkend="sql-lock" endterm="sql-lock-title">.
 561     Remember that all of these lock modes are table-level locks,
 562     even if the name contains the word
 563     <quote>row</quote>; the names of the lock modes are historical.
 564     To some extent the names reflect the typical usage of each lock
 565     mode &mdash; but the semantics are all the same.  The only real difference
 566     between one lock mode and another is the set of lock modes with
 567     which each conflicts (see <xref linkend="table-lock-compatibility">).
 568  .  Two transactions cannot hold locks of conflicting
 569     modes on the same table at the same time.  (However, a transaction
 570     never conflicts with itself.  For example, it might acquire
 571     <literal>ACCESS EXCLUSIVE</literal> lock and later acquire
 572     <literal>ACCESS SHARE</literal> lock on the same table.)  Non-conflicting
 573     lock modes can be held concurrently by many transactions.  Notice in
 574     particular that some lock modes are self-conflicting (for example,
 575     an <literal>ACCESS EXCLUSIVE</literal> lock cannot be held by more than one
 576     transaction at a time) while others are not self-conflicting (for example,
 577     an <literal>ACCESS SHARE</literal> lock can be held by multiple transactions).
 578    </para>
 579
 580      <variablelist>
 581       <title>Table-level lock modes</title>
 582       <varlistentry>
 583        <term>
 584         <literal>ACCESS SHARE</literal>
 585        </term>
 586        <listitem>
 587         <para>
 588          Conflicts with the <literal>ACCESS EXCLUSIVE</literal> lock
 589          mode only.
 590         </para>
 591
 592         <para>
 593          The <command>SELECT</command> command acquires a lock of this mode on
 594          referenced tables.  In general, any query that only <emphasis>reads</> a table
 595          and does not modify it will acquire this lock mode.
 596         </para>
 597        </listitem>
 598       </varlistentry>
 599
 600       <varlistentry>
 601        <term>
 602         <literal>ROW SHARE</literal>
 603        </term>
 604        <listitem>
 605         <para>
 606          Conflicts with the <literal>EXCLUSIVE</literal> and
 607          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 608         </para>
 609
 610         <para>
 611          The <command>SELECT FOR UPDATE</command> and
 612          <command>SELECT FOR SHARE</command> commands acquire a
 613          lock of this mode on the target table(s) (in addition to
 614          <literal>ACCESS SHARE</literal> locks on any other tables
 615          that are referenced but not selected
 616          <option>FOR UPDATE/FOR SHARE</option>).
 617         </para>
 618        </listitem>
 619       </varlistentry>
 620
 621       <varlistentry>
 622        <term>
 623         <literal>ROW EXCLUSIVE</literal>
 624        </term>
 625        <listitem>
 626         <para>
 627          Conflicts with the <literal>SHARE</literal>, <literal>SHARE ROW
 628          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 629          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 630         </para>
 631
 632         <para>
 633          The commands <command>UPDATE</command>,
 634          <command>DELETE</command>, and <command>INSERT</command>
 635          acquire this lock mode on the target table (in addition to
 636          <literal>ACCESS SHARE</literal> locks on any other referenced
 637          tables).  In general, this lock mode will be acquired by any
 638          command that <emphasis>modifies data</> in a table.
 639         </para>
 640        </listitem>
 641       </varlistentry>
 642
 643       <varlistentry>
 644        <term>
 645         <literal>SHARE UPDATE EXCLUSIVE</literal>
 646        </term>
 647        <listitem>
 648         <para>
 649          Conflicts with the <literal>SHARE UPDATE EXCLUSIVE</literal>,
 650          <literal>SHARE</literal>, <literal>SHARE ROW
 651          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 652          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 653          This mode protects a table against
 654          concurrent schema changes and <command>VACUUM</> runs.
 655         </para>
 656
 657         <para>
 658          Acquired by <command>VACUUM</command> (without <option>FULL</option>),
 659          <command>ANALYZE</>, and <command>CREATE INDEX CONCURRENTLY</>.
 660         </para>
 661        </listitem>
 662       </varlistentry>
 663
 664       <varlistentry>
 665        <term>
 666         <literal>SHARE</literal>
 667        </term>
 668        <listitem>
 669         <para>
 670          Conflicts with the <literal>ROW EXCLUSIVE</literal>,
 671          <literal>SHARE UPDATE EXCLUSIVE</literal>, <literal>SHARE ROW
 672          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 673          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 674          This mode protects a table against concurrent data changes.
 675         </para>
 676
 677         <para>
 678          Acquired by <command>CREATE INDEX</command>
 679          (without <option>CONCURRENTLY</option>).
 680         </para>
 681        </listitem>
 682       </varlistentry>
 683
 684       <varlistentry>
 685        <term>
 686         <literal>SHARE ROW EXCLUSIVE</literal>
 687        </term>
 688        <listitem>
 689         <para>
 690          Conflicts with the <literal>ROW EXCLUSIVE</literal>,
 691          <literal>SHARE UPDATE EXCLUSIVE</literal>,
 692          <literal>SHARE</literal>, <literal>SHARE ROW
 693          EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 694          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 695         </para>
 696
 697         <para>
 698          This lock mode is not automatically acquired by any
 699          <productname>PostgreSQL</productname> command.
 700         </para>
 701        </listitem>
 702       </varlistentry>
 703
 704       <varlistentry>
 705        <term>
 706         <literal>EXCLUSIVE</literal>
 707        </term>
 708        <listitem>
 709         <para>
 710          Conflicts with the <literal>ROW SHARE</literal>, <literal>ROW
 711          EXCLUSIVE</literal>, <literal>SHARE UPDATE
 712          EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
 713          ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 714          <literal>ACCESS EXCLUSIVE</literal> lock modes.
 715          This mode allows only concurrent <literal>ACCESS SHARE</literal> locks,
 716          i.e., only reads from the table can proceed in parallel with a
 717          transaction holding this lock mode.
 718         </para>
 719
 720         <para>
 721          This lock mode is not automatically acquired on user tables by any
 722          <productname>PostgreSQL</productname> command.  However it is
 723          acquired on certain system catalogs in some operations.
 724         </para>
 725        </listitem>
 726       </varlistentry>
 727
 728       <varlistentry>
 729        <term>
 730         <literal>ACCESS EXCLUSIVE</literal>
 731        </term>
 732        <listitem>
 733         <para>
 734          Conflicts with locks of all modes (<literal>ACCESS
 735          SHARE</literal>, <literal>ROW SHARE</literal>, <literal>ROW
 736          EXCLUSIVE</literal>, <literal>SHARE UPDATE
 737          EXCLUSIVE</literal>, <literal>SHARE</literal>, <literal>SHARE
 738          ROW EXCLUSIVE</literal>, <literal>EXCLUSIVE</literal>, and
 739          <literal>ACCESS EXCLUSIVE</literal>).
 740          This mode guarantees that the
 741          holder is the only transaction accessing the table in any way.
 742         </para>
 743
 744         <para>
 745          Acquired by the <command>ALTER TABLE</command>, <command>DROP
 746          TABLE</command>, <command>TRUNCATE</command>, <command>REINDEX</command>,
 747          <command>CLUSTER</command>, and <command>VACUUM FULL</command>
 748          commands.  This is also the default lock mode for <command>LOCK
 749          TABLE</command> statements that do not specify a mode explicitly.
 750         </para>
 751        </listitem>
 752       </varlistentry>
 753      </variablelist>
 754
 755      <tip>
 756       <para>
 757        Only an <literal>ACCESS EXCLUSIVE</literal> lock blocks a
 758        <command>SELECT</command> (without <option>FOR UPDATE/SHARE</option>)
 759        statement.
 760       </para>
 761      </tip>
 762
 763    <para>
 764     Once acquired, a lock is normally held till end of transaction.  But if a
 765     lock is acquired after establishing a savepoint, the lock is released
 766     immediately if the savepoint is rolled back to.  This is consistent with
 767     the principle that <command>ROLLBACK</> cancels all effects of the
 768     commands since the savepoint.  The same holds for locks acquired within a
 769     <application>PL/pgSQL</> exception block: an error escape from the block
 770     releases locks acquired within it.
 771    </para>
 772
 773
 774
 775     <table tocentry="1" id="table-lock-compatibility">
 776      <title> Conflicting lock modes</title>
 777      <tgroup cols="9">
 778       <colspec colnum="2" colname="lockst">
 779       <colspec colnum="9" colname="lockend">
 780       <spanspec namest="lockst" nameend="lockend" spanname="lockreq">
 781       <thead>
 782        <row>
 783         <entry morerows="1">Requested Lock Mode</entry>
 784         <entry spanname="lockreq">Current Lock Mode</entry>
 785        </row>
 786        <row>
 787         <entry>ACCESS SHARE</entry>
 788         <entry>ROW SHARE</entry>
 789         <entry>ROW EXCLUSIVE</entry>
 790         <entry>SHARE UPDATE EXCLUSIVE</entry>
 791         <entry>SHARE</entry>
 792         <entry>SHARE ROW EXCLUSIVE</entry>
 793         <entry>EXCLUSIVE</entry>
 794         <entry>ACCESS EXCLUSIVE</entry>
 795        </row>
 796       </thead>
 797       <tbody>
 798        <row>
 799         <entry>ACCESS SHARE</entry>
 800         <entry align="center"></entry>
 801         <entry align="center"></entry>
 802         <entry align="center"></entry>
 803         <entry align="center"></entry>
 804         <entry align="center"></entry>
 805         <entry align="center"></entry>
 806         <entry align="center"></entry>
 807         <entry align="center">X</entry>
 808        </row>
 809        <row>
 810         <entry>ROW SHARE</entry>
 811         <entry align="center"></entry>
 812         <entry align="center"></entry>
 813         <entry align="center"></entry>
 814         <entry align="center"></entry>
 815         <entry align="center"></entry>
 816         <entry align="center"></entry>
 817         <entry align="center">X</entry>
 818         <entry align="center">X</entry>
 819        </row>
 820        <row>
 821         <entry>ROW EXCLUSIVE</entry>
 822         <entry align="center"></entry>
 823         <entry align="center"></entry>
 824         <entry align="center"></entry>
 825         <entry align="center"></entry>
 826         <entry align="center">X</entry>
 827         <entry align="center">X</entry>
 828         <entry align="center">X</entry>
 829         <entry align="center">X</entry>
 830        </row>
 831        <row>
 832         <entry>SHARE UPDATE EXCLUSIVE</entry>
 833         <entry align="center"></entry>
 834         <entry align="center"></entry>
 835         <entry align="center"></entry>
 836         <entry align="center">X</entry>
 837         <entry align="center">X</entry>
 838         <entry align="center">X</entry>
 839         <entry align="center">X</entry>
 840         <entry align="center">X</entry>
 841        </row>
 842        <row>
 843         <entry>SHARE</entry>
 844         <entry align="center"></entry>
 845         <entry align="center"></entry>
 846         <entry align="center">X</entry>
 847         <entry align="center">X</entry>
 848         <entry align="center"></entry>
 849         <entry align="center">X</entry>
 850         <entry align="center">X</entry>
 851         <entry align="center">X</entry>
 852        </row>
 853        <row>
 854         <entry>SHARE ROW EXCLUSIVE</entry>
 855         <entry align="center"></entry>
 856         <entry align="center"></entry>
 857         <entry align="center">X</entry>
 858         <entry align="center">X</entry>
 859         <entry align="center">X</entry>
 860         <entry align="center">X</entry>
 861         <entry align="center">X</entry>
 862         <entry align="center">X</entry>
 863        </row>
 864        <row>
 865         <entry>EXCLUSIVE</entry>
 866         <entry align="center"></entry>
 867         <entry align="center">X</entry>
 868         <entry align="center">X</entry>
 869         <entry align="center">X</entry>
 870         <entry align="center">X</entry>
 871         <entry align="center">X</entry>
 872         <entry align="center">X</entry>
 873         <entry align="center">X</entry>
 874        </row>
 875        <row>
 876         <entry>ACCESS EXCLUSIVE</entry>
 877         <entry align="center">X</entry>
 878         <entry align="center">X</entry>
 879         <entry align="center">X</entry>
 880         <entry align="center">X</entry>
 881         <entry align="center">X</entry>
 882         <entry align="center">X</entry>
 883         <entry align="center">X</entry>
 884         <entry align="center">X</entry>
 885        </row>
 886       </tbody>
 887      </tgroup>
 888     </table>
 889    </sect2>
 890
 891    <sect2 id="locking-rows">
 892     <title>Row-Level Locks</title>
 893
 894     <para>
 895      In addition to table-level locks, there are row-level locks, which
 896      can be exclusive or shared locks.  An exclusive row-level lock on a
 897      specific row is automatically acquired when the row is updated or
 898      deleted.  The lock is held until the transaction commits or rolls
 899      back, just like table-level locks.  Row-level locks do
 900      not affect data querying; they block only <emphasis>writers to the same
 901      row</emphasis>.
 902     </para>
 903
 904     <para>
 905      To acquire an exclusive row-level lock on a row without actually
 906      modifying the row, select the row with <command>SELECT FOR
 907      UPDATE</command>.  Note that once the row-level lock is acquired,
 908      the transaction can update the row multiple times without
 909      fear of conflicts.
 910     </para>
 911
 912     <para>
 913      To acquire a shared row-level lock on a row, select the row with
 914      <command>SELECT FOR SHARE</command>.  A shared lock does not prevent
 915      other transactions from acquiring the same shared lock.  However,
 916      no transaction is allowed to update, delete, or exclusively lock a
 917      row on which any other transaction holds a shared lock.  Any attempt
 918      to do so will block until the shared lock(s) have been released.
 919     </para>
 920
 921     <para>
 922      <productname>PostgreSQL</productname> doesn't remember any
 923      information about modified rows in memory, so there is no limit on
 924      the number of rows locked at one time.  However, locking a row
 925      might cause a disk write, e.g., <command>SELECT FOR
 926      UPDATE</command> modifies selected rows to mark them locked, and so
 927      will result in disk writes.
 928     </para>
 929
 930     <para>
 931      In addition to table and row locks, page-level share/exclusive locks are
 932      used to control read/write access to table pages in the shared buffer
 933      pool.  These locks are released immediately after a row is fetched or
 934      updated.  Application developers normally need not be concerned with
 935      page-level locks, but they are mentioned here for completeness.
 936     </para>
 937
 938    </sect2>
 939
 940    <sect2 id="locking-deadlocks">
 941     <title>Deadlocks</title>
 942
 943     <indexterm zone="locking-deadlocks">
 944      <primary>deadlock</primary>
 945     </indexterm>
 946
 947     <para>
 948      The use of explicit locking can increase the likelihood of
 949      <firstterm>deadlocks</>, wherein two (or more) transactions each
 950      hold locks that the other wants.  For example, if transaction 1
 951      acquires an exclusive lock on table A and then tries to acquire
 952      an exclusive lock on table B, while transaction 2 has already
 953      exclusive-locked table B and now wants an exclusive lock on table
 954      A, then neither one can proceed.
 955      <productname>PostgreSQL</productname> automatically detects
 956      deadlock situations and resolves them by aborting one of the
 957      transactions involved, allowing the other(s) to complete.
 958      (Exactly which transaction will be aborted is difficult to
 959      predict and should not be relied upon.)
 960     </para>
 961
 962     <para>
 963      Note that deadlocks can also occur as the result of row-level
 964      locks (and thus, they can occur even if explicit locking is not
 965      used). Consider the case in which two concurrent
 966      transactions modify a table. The first transaction executes:
 967
 968 <screen>
 969 UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 11111;
 970 </screen>
 971
 972      This acquires a row-level lock on the row with the specified
 973      account number. Then, the second transaction executes:
 974
 975 <screen>
 976 UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 22222;
 977 UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 11111;
 978 </screen>
 979
 980      The first <command>UPDATE</command> statement successfully
 981      acquires a row-level lock on the specified row, so it succeeds in
 982      updating that row. However, the second <command>UPDATE</command>
 983      statement finds that the row it is attempting to update has
 984      already been locked, so it waits for the transaction that
 985      acquired the lock to complete. Transaction two is now waiting on
 986      transaction one to complete before it continues execution. Now,
 987      transaction one executes:
 988
 989 <screen>
 990 UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 22222;
 991 </screen>
 992
 993      Transaction one attempts to acquire a row-level lock on the
 994      specified row, but it cannot: transaction two already holds such
 995      a lock. So it waits for transaction two to complete. Thus,
 996      transaction one is blocked on transaction two, and transaction
 997      two is blocked on transaction one: a deadlock
 998      condition. <productname>PostgreSQL</productname> will detect this
 999      situation and abort one of the transactions.
1000     </para>
1001
1002     <para>
1003      The best defense against deadlocks is generally to avoid them by
1004      being certain that all applications using a database acquire
1005      locks on multiple objects in a consistent order. In the example
1006      above, if both transactions
1007      had updated the rows in the same order, no deadlock would have
1008      occurred. One should also ensure that the first lock acquired on
1009      an object in a transaction is the most restrictive mode that will be
1010      needed for that object.  If it is not feasible to verify this in
1011      advance, then deadlocks can be handled on-the-fly by retrying
1012      transactions that abort due to deadlocks.
1013     </para>
1014
1015     <para>
1016      So long as no deadlock situation is detected, a transaction seeking
1017      either a table-level or row-level lock will wait indefinitely for
1018      conflicting locks to be released.  This means it is a bad idea for
1019      applications to hold transactions open for long periods of time
1020      (e.g., while waiting for user input).
1021     </para>
1022    </sect2>
1023
1024    <sect2 id="advisory-locks">
1025     <title>Advisory Locks</title>
1026
1027     <indexterm zone="advisory-locks">
1028      <primary>lock</primary>
1029      <secondary>advisory</secondary>
1030     </indexterm>
1031
1032     <para>
1033      <productname>PostgreSQL</productname> provides a means for
1034      creating locks that have application-defined meanings.  These are
1035      called <firstterm>advisory locks</>, because the system does not
1036      enforce their use &mdash; it is up to the application to use them
1037      correctly.  Advisory locks can be useful for locking strategies
1038      that are an awkward fit for the MVCC model.  Once acquired, an
1039      advisory lock is held until explicitly released or the session ends.
1040      Unlike standard locks, advisory locks do not
1041      honor transaction semantics: a lock acquired during a
1042      transaction that is later rolled back will still be held following the
1043      rollback, and likewise an unlock is effective even if the calling
1044      transaction fails later.  The same lock can be acquired multiple times by
1045      its owning process: for each lock request there must be a corresponding
1046      unlock request before the lock is actually released.  (If a session
1047      already holds a given lock, additional requests will always succeed, even
1048      if other sessions are awaiting the lock.)  Like all locks in
1049      <productname>PostgreSQL</productname>, a complete list of advisory
1050      locks currently held by any session can be found in the
1051      <link linkend="view-pg-locks"><structname>pg_locks</structname></link>
1052      system view.
1053     </para>
1054
1055     <para>
1056      Advisory locks are allocated out of a shared memory pool whose size
1057      is defined by the configuration variables
1058      <xref linkend="guc-max-locks-per-transaction"> and
1059      <xref linkend="guc-max-connections">.
1060      Care must be taken not to exhaust this
1061      memory or the server will be unable to grant any locks at all.
1062      This imposes an upper limit on the number of advisory locks
1063      grantable by the server, typically in the tens to hundreds of thousands
1064      depending on how the server is configured.
1065     </para>
1066
1067     <para>
1068      A common use of advisory locks is to emulate pessimistic locking
1069      strategies typical of so called <quote>flat file</> data management
1070      systems.
1071      While a flag stored in a table could be used for the same purpose,
1072      advisory locks are faster, avoid MVCC bloat, and are automatically
1073      cleaned up by the server at the end of the session.
1074      In certain cases using this advisory locking method, especially in queries
1075      involving explicit ordering and <literal>LIMIT</> clauses, care must be
1076      taken to control the locks acquired because of the order in which SQL
1077      expressions are evaluated.  For example:
1078 <screen>
1079 SELECT pg_advisory_lock(id) FROM foo WHERE id = 12345; -- ok
1080 SELECT pg_advisory_lock(id) FROM foo WHERE id &gt; 12345 LIMIT 100; -- danger!
1081 SELECT pg_advisory_lock(q.id) FROM
1082 (
1083   SELECT id FROM foo WHERE id &gt; 12345 LIMIT 100;
1084 ) q; -- ok
1085 </screen>
1086      In the above queries, the second form is dangerous because the
1087      <literal>LIMIT</> is not guaranteed to be applied before the locking
1088      function is executed.  This might cause some locks to be acquired
1089      that the application was not expecting, and hence would fail to release
1090      (until it ends the session).
1091      From the point of view of the application, such locks
1092      would be dangling, although still viewable in
1093      <structname>pg_locks</structname>.
1094     </para>
1095
1096     <para>
1097      The functions provided to manipulate advisory locks are described in
1098      <xref linkend="functions-advisory-locks">.
1099     </para>
1100    </sect2>
1101
1102   </sect1>
1103
1104   <sect1 id="applevel-consistency">
1105    <title>Data Consistency Checks at the Application Level</title>
1106
1107    <para>
1108     Because readers in <productname>PostgreSQL</productname>
1109     do not lock data, regardless of
1110     transaction isolation level, data read by one transaction can be
1111     overwritten by another concurrent transaction. In other words,
1112     if a row is returned by <command>SELECT</command> it doesn't mean that
1113     the row is still current at the instant it is returned (i.e., sometime
1114     after the current query began).  The row might have been modified or
1115     deleted by an already-committed transaction that committed after
1116     the <command>SELECT</command> started.
1117     Even if the row is still valid <quote>now</>, it could be changed or
1118     deleted
1119     before the current transaction does a commit or rollback.
1120    </para>
1121
1122    <para>
1123     Another way to think about it is that each
1124     transaction sees a snapshot of the database contents, and concurrently
1125     executing transactions might very well see different snapshots.  So the
1126     whole concept of <quote>now</quote> is somewhat ill-defined anyway.
1127     This is not normally
1128     a big problem if the client applications are isolated from each other,
1129     but if the clients can communicate via channels outside the database
1130     then serious confusion might ensue.
1131    </para>
1132
1133    <para>
1134     To ensure the current validity of a row and protect it against
1135     concurrent updates one must use <command>SELECT FOR UPDATE</command>,
1136     <command>SELECT FOR SHARE</command>, or an appropriate <command>LOCK
1137     TABLE</command> statement.  (<command>SELECT FOR UPDATE</command>
1138     and <command>SELECT FOR SHARE</command> lock just the
1139     returned rows against concurrent updates, while <command>LOCK
1140     TABLE</command> locks the whole table.)  This should be taken into
1141     account when porting applications to
1142     <productname>PostgreSQL</productname> from other environments.
1143    </para>
1144
1145    <para>
1146     Global validity checks require extra thought under <acronym>MVCC</acronym>.
1147     For example, a banking application might wish to check that the sum of
1148     all credits in one table equals the sum of debits in another table,
1149     when both tables are being actively updated.  Comparing the results of two
1150     successive <literal>SELECT sum(...)</literal> commands will not work reliably in
1151     Read Committed mode, since the second query will likely include the results
1152     of transactions not counted by the first.  Doing the two sums in a
1153     single serializable transaction will give an accurate picture of only the
1154     effects of transactions that committed before the serializable transaction
1155     started &mdash; but one might legitimately wonder whether the answer is still
1156     relevant by the time it is delivered.  If the serializable transaction
1157     itself applied some changes before trying to make the consistency check,
1158     the usefulness of the check becomes even more debatable, since now it
1159     includes some but not all post-transaction-start changes.  In such cases
1160     a careful person might wish to lock all tables needed for the check,
1161     in order to get an indisputable picture of current reality.  A
1162     <literal>SHARE</> mode (or higher) lock guarantees that there are no
1163     uncommitted changes in the locked table, other than those of the current
1164     transaction.
1165    </para>
1166
1167    <para>
1168     Note also that if one is relying on explicit locking to prevent concurrent
1169     changes, one should either use Read Committed mode, or in Serializable
1170     mode be careful to obtain
1171     locks before performing queries.  A lock obtained by a
1172     serializable transaction guarantees that no other transactions modifying
1173     the table are still running, but if the snapshot seen by the
1174     transaction predates obtaining the lock, it might predate some now-committed
1175     changes in the table.  A serializable transaction's snapshot is actually
1176     frozen at the start of its first query or data-modification command
1177     (<literal>SELECT</>, <literal>INSERT</>,
1178     <literal>UPDATE</>, or <literal>DELETE</>), so
1179     it is possible to obtain locks explicitly before the snapshot is
1180     frozen.
1181    </para>
1182   </sect1>
1183
1184   <sect1 id="locking-indexes">
1185    <title>Locking and Indexes</title>
1186
1187    <indexterm zone="locking-indexes">
1188     <primary>index</primary>
1189     <secondary>locks</secondary>
1190    </indexterm>
1191
1192    <para>
1193     Though <productname>PostgreSQL</productname>
1194     provides nonblocking read/write access to table
1195     data, nonblocking read/write access is not currently offered for every
1196     index access method implemented
1197     in <productname>PostgreSQL</productname>.
1198     The various index types are handled as follows:
1199
1200     <variablelist>
1201      <varlistentry>
1202       <term>
1203        B-tree and <acronym>GiST</acronym> indexes
1204       </term>
1205       <listitem>
1206        <para>
1207         Short-term share/exclusive page-level locks are used for
1208         read/write access. Locks are released immediately after each
1209         index row is fetched or inserted.  These index types provide
1210         the highest concurrency without deadlock conditions.
1211        </para>
1212       </listitem>
1213      </varlistentry>
1214
1215      <varlistentry>
1216       <term>
1217        Hash indexes
1218       </term>
1219       <listitem>
1220        <para>
1221         Share/exclusive hash-bucket-level locks are used for read/write
1222         access.  Locks are released after the whole bucket is processed.
1223         Bucket-level locks provide better concurrency than index-level
1224         ones, but deadlock is possible since the locks are held longer
1225         than one index operation.
1226        </para>
1227       </listitem>
1228      </varlistentry>
1229
1230      <varlistentry>
1231       <term>
1232        <acronym>GIN</acronym> indexes
1233       </term>
1234       <listitem>
1235        <para>
1236         Short-term share/exclusive page-level locks are used for
1237         read/write access. Locks are released immediately after each
1238         index row is fetched or inserted. But note that insertion of a
1239         GIN-indexed value usually produces several index key insertions
1240         per row, so GIN might do substantial work for a single value's
1241         insertion.
1242        </para>
1243       </listitem>
1244      </varlistentry>
1245     </variablelist>
1246    </para>
1247
1248    <para>
1249     Currently, B-tree indexes offer the best performance for concurrent
1250     applications; since they also have more features than hash
1251     indexes, they are the recommended index type for concurrent
1252     applications that need to index scalar data. When dealing with
1253     non-scalar data, B-trees are not useful, and GiST or GIN indexes should
1254     be used instead.
1255    </para>
1256   </sect1>
1257  </chapter>