1 <!-- doc/src/sgml/high-availability.sgml -->
3 <chapter id=
"high-availability">
4 <title>High Availability, Load Balancing, and Replication
</title>
6 <indexterm><primary>high availability
</primary></indexterm>
7 <indexterm><primary>failover
</primary></indexterm>
8 <indexterm><primary>replication
</primary></indexterm>
9 <indexterm><primary>load balancing
</primary></indexterm>
10 <indexterm><primary>clustering
</primary></indexterm>
11 <indexterm><primary>data partitioning
</primary></indexterm>
14 Database servers can work together to allow a second server to
15 take over quickly if the primary server fails (high
16 availability), or to allow several computers to serve the same
17 data (load balancing). Ideally, database servers could work
18 together seamlessly. Web servers serving static web pages can
19 be combined quite easily by merely load-balancing web requests
20 to multiple machines. In fact, read-only database servers can
21 be combined relatively easily too. Unfortunately, most database
22 servers have a read/write mix of requests, and read/write servers
23 are much harder to combine. This is because though read-only
24 data needs to be placed on each server only once, a write to any
25 server has to be propagated to all servers so that future read
26 requests to those servers return consistent results.
30 This synchronization problem is the fundamental difficulty for
31 servers working together. Because there is no single solution
32 that eliminates the impact of the sync problem for all use cases,
33 there are multiple solutions. Each solution addresses this
34 problem in a different way, and minimizes its impact for a specific
39 Some solutions deal with synchronization by allowing only one
40 server to modify the data. Servers that can modify data are
41 called read/write,
<firstterm>master
</firstterm> or
<firstterm>primary
</firstterm> servers.
42 Servers that track changes in the primary are called
<firstterm>standby
</firstterm>
43 or
<firstterm>secondary
</firstterm> servers. A standby server that cannot be connected
44 to until it is promoted to a primary server is called a
<firstterm>warm
45 standby
</firstterm> server, and one that can accept connections and serves read-only
46 queries is called a
<firstterm>hot standby
</firstterm> server.
50 Some solutions are synchronous,
51 meaning that a data-modifying transaction is not considered
52 committed until all servers have committed the transaction. This
53 guarantees that a failover will not lose any data and that all
54 load-balanced servers will return consistent results no matter
55 which server is queried. In contrast, asynchronous solutions allow some
56 delay between the time of a commit and its propagation to the other servers,
57 opening the possibility that some transactions might be lost in
58 the switch to a backup server, and that load balanced servers
59 might return slightly stale results. Asynchronous communication
60 is used when synchronous would be too slow.
64 Solutions can also be categorized by their granularity. Some solutions
65 can deal only with an entire database server, while others allow control
66 at the per-table or per-database level.
70 Performance must be considered in any choice. There is usually a
71 trade-off between functionality and
72 performance. For example, a fully synchronous solution over a slow
73 network might cut performance by more than half, while an asynchronous
74 one might have a minimal performance impact.
78 The remainder of this section outlines various failover, replication,
79 and load balancing solutions.
82 <sect1 id=
"different-replication-solutions">
83 <title>Comparison of Different Solutions
</title>
88 <term>Shared Disk Failover
</term>
92 Shared disk failover avoids synchronization overhead by having only one
93 copy of the database. It uses a single disk array that is shared by
94 multiple servers. If the main database server fails, the standby server
95 is able to mount and start the database as though it were recovering from
96 a database crash. This allows rapid failover with no data loss.
100 Shared hardware functionality is common in network storage devices.
101 Using a network file system is also possible, though care must be
102 taken that the file system has full
<acronym>POSIX
</acronym> behavior (see
<xref
103 linkend=
"creating-cluster-nfs"/>). One significant limitation of this
104 method is that if the shared disk array fails or becomes corrupt, the
105 primary and standby servers are both nonfunctional. Another issue is
106 that the standby server should never access the shared storage while
107 the primary server is running.
114 <term>File System (Block Device) Replication
</term>
118 A modified version of shared hardware functionality is file system
119 replication, where all changes to a file system are mirrored to a file
120 system residing on another computer. The only restriction is that
121 the mirroring must be done in a way that ensures the standby server
122 has a consistent copy of the file system
— specifically, writes
123 to the standby must be done in the same order as those on the primary.
124 <productname>DRBD
</productname> is a popular file system replication solution
129 https://forge.continuent.org/pipermail/sequoia/2006-November/004070.html
131 Oracle RAC is a shared disk approach and just send cache invalidations
132 to other nodes but not actual data. As the disk is shared, data is
133 only committed once to disk and there is a distributed locking
134 protocol to make nodes agree on a serializable transactional order.
141 <term>Write-Ahead Log Shipping
</term>
145 Warm and hot standby servers can be kept current by reading a
146 stream of write-ahead log (
<acronym>WAL
</acronym>)
147 records. If the main server fails, the standby contains
148 almost all of the data of the main server, and can be quickly
149 made the new primary database server. This can be synchronous or
150 asynchronous and can only be done for the entire database server.
153 A standby server can be implemented using file-based log shipping
154 (
<xref linkend=
"warm-standby"/>) or streaming replication (see
155 <xref linkend=
"streaming-replication"/>), or a combination of both. For
156 information on hot standby, see
<xref linkend=
"hot-standby"/>.
162 <term>Logical Replication
</term>
165 Logical replication allows a database server to send a stream of data
166 modifications to another server.
<productname>PostgreSQL
</productname>
167 logical replication constructs a stream of logical data modifications
168 from the WAL. Logical replication allows replication of data changes on
169 a per-table basis. In addition, a server that is publishing its own
170 changes can also subscribe to changes from another server, allowing data
171 to flow in multiple directions. For more information on logical
172 replication, see
<xref linkend=
"logical-replication"/>. Through the
173 logical decoding interface (
<xref linkend=
"logicaldecoding"/>),
174 third-party extensions can also provide similar functionality.
180 <term>Trigger-Based Primary-Standby Replication
</term>
184 A trigger-based replication setup typically funnels data modification
185 queries to a designated primary server. Operating on a per-table basis,
186 the primary server sends data changes (typically) asynchronously to the
187 standby servers. Standby servers can answer queries while the primary is
188 running, and may allow some local data changes or write activity. This
189 form of replication is often used for offloading large analytical or data
194 <productname>Slony-I
</productname> is an example of this type of
195 replication, with per-table granularity, and support for multiple standby
196 servers. Because it updates the standby server asynchronously (in
197 batches), there is possible data loss during fail over.
203 <term>SQL-Based Replication Middleware
</term>
207 With SQL-based replication middleware, a program intercepts
208 every SQL query and sends it to one or all servers. Each server
209 operates independently. Read-write queries must be sent to all servers,
210 so that every server receives any changes. But read-only queries can be
211 sent to just one server, allowing the read workload to be distributed
216 If queries are simply broadcast unmodified, functions like
217 <function>random()
</function>,
<function>CURRENT_TIMESTAMP
</function>, and
218 sequences can have different values on different servers.
219 This is because each server operates independently, and because
220 SQL queries are broadcast rather than actual data changes. If
221 this is unacceptable, either the middleware or the application
222 must determine such values from a single source and then use those
223 values in write queries. Care must also be taken that all
224 transactions either commit or abort on all servers, perhaps
225 using two-phase commit (
<xref linkend=
"sql-prepare-transaction"/>
226 and
<xref linkend=
"sql-commit-prepared"/>).
227 <productname>Pgpool-II
</productname> and
<productname>Continuent Tungsten
</productname>
228 are examples of this type of replication.
234 <term>Asynchronous Multimaster Replication
</term>
238 For servers that are not regularly connected or have slow
239 communication links, like laptops or
240 remote servers, keeping data consistent among servers is a
241 challenge. Using asynchronous multimaster replication, each
242 server works independently, and periodically communicates with
243 the other servers to identify conflicting transactions. The
244 conflicts can be resolved by users or conflict resolution rules.
245 Bucardo is an example of this type of replication.
251 <term>Synchronous Multimaster Replication
</term>
255 In synchronous multimaster replication, each server can accept
256 write requests, and modified data is transmitted from the
257 original server to every other server before each transaction
258 commits. Heavy write activity can cause excessive locking and
259 commit delays, leading to poor performance. Read requests can
260 be sent to any server. Some implementations use shared disk
261 to reduce the communication overhead. Synchronous multimaster
262 replication is best for mostly read workloads, though its big
263 advantage is that any server can accept write requests
—
264 there is no need to partition workloads between primary and
265 standby servers, and because the data changes are sent from one
266 server to another, there is no problem with non-deterministic
267 functions like
<function>random()
</function>.
271 <productname>PostgreSQL
</productname> does not offer this type of replication,
272 though
<productname>PostgreSQL
</productname> two-phase commit (
<xref
273 linkend=
"sql-prepare-transaction"/> and
<xref
274 linkend=
"sql-commit-prepared"/>)
275 can be used to implement this in application code or middleware.
283 <xref linkend=
"high-availability-matrix"/> summarizes
284 the capabilities of the various solutions listed above.
287 <table id=
"high-availability-matrix">
288 <title>High Availability, Load Balancing, and Replication Feature Matrix
</title>
290 <colspec colname=
"col1" colwidth=
"1.1*"/>
291 <colspec colname=
"col2" colwidth=
"1*"/>
292 <colspec colname=
"col3" colwidth=
"1*"/>
293 <colspec colname=
"col4" colwidth=
"1*"/>
294 <colspec colname=
"col5" colwidth=
"1*"/>
295 <colspec colname=
"col6" colwidth=
"1*"/>
296 <colspec colname=
"col7" colwidth=
"1*"/>
297 <colspec colname=
"col8" colwidth=
"1*"/>
298 <colspec colname=
"col9" colwidth=
"1*"/>
301 <entry>Feature
</entry>
302 <entry>Shared Disk
</entry>
303 <entry>File System Repl.
</entry>
304 <entry>Write-Ahead Log Shipping
</entry>
305 <entry>Logical Repl.
</entry>
306 <entry>Trigger-
&zwsp;Based Repl.
</entry>
307 <entry>SQL Repl. Middle-ware
</entry>
308 <entry>Async. MM Repl.
</entry>
309 <entry>Sync. MM Repl.
</entry>
316 <entry>Popular examples
</entry>
317 <entry align=
"center">NAS
</entry>
318 <entry align=
"center">DRBD
</entry>
319 <entry align=
"center">built-in streaming repl.
</entry>
320 <entry align=
"center">built-in logical repl., pglogical
</entry>
321 <entry align=
"center">Londiste, Slony
</entry>
322 <entry align=
"center">pgpool-II
</entry>
323 <entry align=
"center">Bucardo
</entry>
324 <entry align=
"center"></entry>
328 <entry>Comm. method
</entry>
329 <entry align=
"center">shared disk
</entry>
330 <entry align=
"center">disk blocks
</entry>
331 <entry align=
"center">WAL
</entry>
332 <entry align=
"center">logical decoding
</entry>
333 <entry align=
"center">table rows
</entry>
334 <entry align=
"center">SQL
</entry>
335 <entry align=
"center">table rows
</entry>
336 <entry align=
"center">table rows and row locks
</entry>
340 <entry>No special hardware required
</entry>
341 <entry align=
"center"></entry>
342 <entry align=
"center">•</entry>
343 <entry align=
"center">•</entry>
344 <entry align=
"center">•</entry>
345 <entry align=
"center">•</entry>
346 <entry align=
"center">•</entry>
347 <entry align=
"center">•</entry>
348 <entry align=
"center">•</entry>
352 <entry>Allows multiple primary servers
</entry>
353 <entry align=
"center"></entry>
354 <entry align=
"center"></entry>
355 <entry align=
"center"></entry>
356 <entry align=
"center">•</entry>
357 <entry align=
"center"></entry>
358 <entry align=
"center">•</entry>
359 <entry align=
"center">•</entry>
360 <entry align=
"center">•</entry>
364 <entry>No overhead on primary
</entry>
365 <entry align=
"center">•</entry>
366 <entry align=
"center"></entry>
367 <entry align=
"center">•</entry>
368 <entry align=
"center">•</entry>
369 <entry align=
"center"></entry>
370 <entry align=
"center">•</entry>
371 <entry align=
"center"></entry>
372 <entry align=
"center"></entry>
376 <entry>No waiting for multiple servers
</entry>
377 <entry align=
"center">•</entry>
378 <entry align=
"center"></entry>
379 <entry align=
"center">with sync off
</entry>
380 <entry align=
"center">with sync off
</entry>
381 <entry align=
"center">•</entry>
382 <entry align=
"center"></entry>
383 <entry align=
"center">•</entry>
384 <entry align=
"center"></entry>
388 <entry>Primary failure will never lose data
</entry>
389 <entry align=
"center">•</entry>
390 <entry align=
"center">•</entry>
391 <entry align=
"center">with sync on
</entry>
392 <entry align=
"center">with sync on
</entry>
393 <entry align=
"center"></entry>
394 <entry align=
"center">•</entry>
395 <entry align=
"center"></entry>
396 <entry align=
"center">•</entry>
400 <entry>Replicas accept read-only queries
</entry>
401 <entry align=
"center"></entry>
402 <entry align=
"center"></entry>
403 <entry align=
"center">with hot standby
</entry>
404 <entry align=
"center">•</entry>
405 <entry align=
"center">•</entry>
406 <entry align=
"center">•</entry>
407 <entry align=
"center">•</entry>
408 <entry align=
"center">•</entry>
412 <entry>Per-table granularity
</entry>
413 <entry align=
"center"></entry>
414 <entry align=
"center"></entry>
415 <entry align=
"center"></entry>
416 <entry align=
"center">•</entry>
417 <entry align=
"center">•</entry>
418 <entry align=
"center"></entry>
419 <entry align=
"center">•</entry>
420 <entry align=
"center">•</entry>
424 <entry>No conflict resolution necessary
</entry>
425 <entry align=
"center">•</entry>
426 <entry align=
"center">•</entry>
427 <entry align=
"center">•</entry>
428 <entry align=
"center"></entry>
429 <entry align=
"center">•</entry>
430 <entry align=
"center">•</entry>
431 <entry align=
"center"></entry>
432 <entry align=
"center">•</entry>
440 There are a few solutions that do not fit into the above categories:
446 <term>Data Partitioning
</term>
450 Data partitioning splits tables into data sets. Each set can
451 be modified by only one server. For example, data can be
452 partitioned by offices, e.g., London and Paris, with a server
453 in each office. If queries combining London and Paris data
454 are necessary, an application can query both servers, or
455 primary/standby replication can be used to keep a read-only copy
456 of the other office's data on each server.
462 <term>Multiple-Server Parallel Query Execution
</term>
466 Many of the above solutions allow multiple servers to handle multiple
467 queries, but none allow a single query to use multiple servers to
468 complete faster. This solution allows multiple servers to work
469 concurrently on a single query. It is usually accomplished by
470 splitting the data among servers and having each server execute its
471 part of the query and return results to a central server where they
472 are combined and returned to the user. This can be implemented using the
473 <productname>PL/Proxy
</productname> tool set.
482 It should also be noted that because
<productname>PostgreSQL
</productname>
483 is open source and easily extended, a number of companies have
484 taken
<productname>PostgreSQL
</productname> and created commercial
485 closed-source solutions with unique failover, replication, and load
486 balancing capabilities. These are not discussed here.
492 <sect1 id=
"warm-standby">
493 <title>Log-Shipping Standby Servers
</title>
497 Continuous archiving can be used to create a
<firstterm>high
498 availability
</firstterm> (HA) cluster configuration with one or more
499 <firstterm>standby servers
</firstterm> ready to take over operations if the
500 primary server fails. This capability is widely referred to as
501 <firstterm>warm standby
</firstterm> or
<firstterm>log shipping
</firstterm>.
505 The primary and standby server work together to provide this capability,
506 though the servers are only loosely coupled. The primary server operates
507 in continuous archiving mode, while each standby server operates in
508 continuous recovery mode, reading the WAL files from the primary. No
509 changes to the database tables are required to enable this capability,
510 so it offers low administration overhead compared to some other
511 replication solutions. This configuration also has relatively low
512 performance impact on the primary server.
516 Directly moving WAL records from one database server to another
517 is typically described as log shipping.
<productname>PostgreSQL
</productname>
518 implements file-based log shipping by transferring WAL records
519 one file (WAL segment) at a time. WAL files (
16MB) can be
520 shipped easily and cheaply over any distance, whether it be to an
521 adjacent system, another system at the same site, or another system on
522 the far side of the globe. The bandwidth required for this technique
523 varies according to the transaction rate of the primary server.
524 Record-based log shipping is more granular and streams WAL changes
525 incrementally over a network connection (see
<xref
526 linkend=
"streaming-replication"/>).
530 It should be noted that log shipping is asynchronous, i.e., the WAL
531 records are shipped after transaction commit. As a result, there is a
532 window for data loss should the primary server suffer a catastrophic
533 failure; transactions not yet shipped will be lost. The size of the
534 data loss window in file-based log shipping can be limited by use of the
535 <varname>archive_timeout
</varname> parameter, which can be set as low
536 as a few seconds. However such a low setting will
537 substantially increase the bandwidth required for file shipping.
538 Streaming replication (see
<xref linkend=
"streaming-replication"/>)
539 allows a much smaller window of data loss.
543 Recovery performance is sufficiently good that the standby will
544 typically be only moments away from full
545 availability once it has been activated. As a result, this is called
546 a warm standby configuration which offers high
547 availability. Restoring a server from an archived base backup and
548 rollforward will take considerably longer, so that technique only
549 offers a solution for disaster recovery, not high availability.
550 A standby server can also be used for read-only queries, in which case
551 it is called a
<firstterm>hot standby
</firstterm> server. See
552 <xref linkend=
"hot-standby"/> for more information.
555 <indexterm zone=
"high-availability">
556 <primary>warm standby
</primary>
559 <indexterm zone=
"high-availability">
560 <primary>PITR standby
</primary>
563 <indexterm zone=
"high-availability">
564 <primary>standby server
</primary>
567 <indexterm zone=
"high-availability">
568 <primary>log shipping
</primary>
571 <indexterm zone=
"high-availability">
572 <primary>witness server
</primary>
575 <indexterm zone=
"high-availability">
576 <primary>STONITH
</primary>
579 <sect2 id=
"standby-planning">
580 <title>Planning
</title>
583 It is usually wise to create the primary and standby servers
584 so that they are as similar as possible, at least from the
585 perspective of the database server. In particular, the path names
586 associated with tablespaces will be passed across unmodified, so both
587 primary and standby servers must have the same mount paths for
588 tablespaces if that feature is used. Keep in mind that if
589 <xref linkend=
"sql-createtablespace"/>
590 is executed on the primary, any new mount point needed for it must
591 be created on the primary and all standby servers before the command
592 is executed. Hardware need not be exactly the same, but experience shows
593 that maintaining two identical systems is easier than maintaining two
594 dissimilar ones over the lifetime of the application and system.
595 In any case the hardware architecture must be the same
— shipping
596 from, say, a
32-bit to a
64-bit system will not work.
600 In general, log shipping between servers running different major
601 <productname>PostgreSQL
</productname> release
602 levels is not possible. It is the policy of the PostgreSQL Global
603 Development Group not to make changes to disk formats during minor release
604 upgrades, so it is likely that running different minor release levels
605 on primary and standby servers will work successfully. However, no
606 formal support for that is offered and you are advised to keep primary
607 and standby servers at the same release level as much as possible.
608 When updating to a new minor release, the safest policy is to update
609 the standby servers first
— a new minor release is more likely
610 to be able to read WAL files from a previous minor release than vice
616 <sect2 id=
"standby-server-operation" xreflabel=
"Standby Server Operation">
617 <title>Standby Server Operation
</title>
620 A server enters standby mode if a
621 <anchor id=
"file-standby-signal" xreflabel=
"standby.signal"/>
622 <filename>standby.signal
</filename>
623 <indexterm><primary><filename>standby.signal
</filename></primary></indexterm>
624 file exists in the data directory when the server is started.
628 In standby mode, the server continuously applies WAL received from the
629 primary server. The standby server can read WAL from a WAL archive
630 (see
<xref linkend=
"guc-restore-command"/>) or directly from the primary
631 over a TCP connection (streaming replication). The standby server will
632 also attempt to restore any WAL found in the standby cluster's
633 <filename>pg_wal
</filename> directory. That typically happens after a server
634 restart, when the standby replays again WAL that was streamed from the
635 primary before the restart, but you can also manually copy files to
636 <filename>pg_wal
</filename> at any time to have them replayed.
640 At startup, the standby begins by restoring all WAL available in the
641 archive location, calling
<varname>restore_command
</varname>. Once it
642 reaches the end of WAL available there and
<varname>restore_command
</varname>
643 fails, it tries to restore any WAL available in the
<filename>pg_wal
</filename> directory.
644 If that fails, and streaming replication has been configured, the
645 standby tries to connect to the primary server and start streaming WAL
646 from the last valid record found in archive or
<filename>pg_wal
</filename>. If that fails
647 or streaming replication is not configured, or if the connection is
648 later disconnected, the standby goes back to step
1 and tries to
649 restore the file from the archive again. This loop of retries from the
650 archive,
<filename>pg_wal
</filename>, and via streaming replication goes on until the server
651 is stopped or is promoted.
655 Standby mode is exited and the server switches to normal operation
656 when
<command>pg_ctl promote
</command> is run, or
657 <function>pg_promote()
</function> is called. Before failover,
658 any WAL immediately available in the archive or in
<filename>pg_wal
</filename>
659 will be restored, but no attempt is made to connect to the primary.
663 <sect2 id=
"preparing-primary-for-standby">
664 <title>Preparing the Primary for Standby Servers
</title>
667 Set up continuous archiving on the primary to an archive directory
668 accessible from the standby, as described
669 in
<xref linkend=
"continuous-archiving"/>. The archive location should be
670 accessible from the standby even when the primary is down, i.e., it should
671 reside on the standby server itself or another trusted server, not on
676 If you want to use streaming replication, set up authentication on the
677 primary server to allow replication connections from the standby
678 server(s); that is, create a role and provide a suitable entry or
679 entries in
<filename>pg_hba.conf
</filename> with the database field set to
680 <literal>replication
</literal>. Also ensure
<varname>max_wal_senders
</varname> is set
681 to a sufficiently large value in the configuration file of the primary
682 server. If replication slots will be used,
683 ensure that
<varname>max_replication_slots
</varname> is set sufficiently
688 Take a base backup as described in
<xref linkend=
"backup-base-backup"/>
689 to bootstrap the standby server.
693 <sect2 id=
"standby-server-setup">
694 <title>Setting Up a Standby Server
</title>
697 To set up the standby server, restore the base backup taken from primary
698 server (see
<xref linkend=
"backup-pitr-recovery"/>). Create a file
699 <link linkend=
"file-standby-signal"><filename>standby.signal
</filename></link><indexterm><primary>standby.signal
</primary></indexterm>
700 in the standby's cluster data
701 directory. Set
<xref linkend=
"guc-restore-command"/> to a simple command to copy files from
702 the WAL archive. If you plan to have multiple standby servers for high
703 availability purposes, make sure that
<varname>recovery_target_timeline
</varname> is set to
704 <literal>latest
</literal> (the default), to make the standby server follow the timeline change
705 that occurs at failover to another standby.
710 <xref linkend=
"guc-restore-command"/> should return immediately
711 if the file does not exist; the server will retry the command again if
717 If you want to use streaming replication, fill in
718 <xref linkend=
"guc-primary-conninfo"/> with a libpq connection string, including
719 the host name (or IP address) and any additional details needed to
720 connect to the primary server. If the primary needs a password for
721 authentication, the password needs to be specified in
722 <xref linkend=
"guc-primary-conninfo"/> as well.
726 If you're setting up the standby server for high availability purposes,
727 set up WAL archiving, connections and authentication like the primary
728 server, because the standby server will work as a primary server after
733 If you're using a WAL archive, its size can be minimized using the
<xref
734 linkend=
"guc-archive-cleanup-command"/> parameter to remove files that are no
735 longer required by the standby server.
736 The
<application>pg_archivecleanup
</application> utility is designed specifically to
737 be used with
<varname>archive_cleanup_command
</varname> in typical single-standby
738 configurations, see
<xref linkend=
"pgarchivecleanup"/>.
739 Note however, that if you're using the archive for backup purposes, you
740 need to retain files needed to recover from at least the latest base
741 backup, even if they're no longer needed by the standby.
745 A simple example of configuration is:
747 primary_conninfo = 'host=
192.168.1.50 port=
5432 user=foo password=foopass options=''-c wal_sender_timeout=
5000'''
748 restore_command = 'cp /path/to/archive/%f %p'
749 archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
754 You can have any number of standby servers, but if you use streaming
755 replication, make sure you set
<varname>max_wal_senders
</varname> high enough in
756 the primary to allow them to be connected simultaneously.
761 <sect2 id=
"streaming-replication">
762 <title>Streaming Replication
</title>
764 <indexterm zone=
"high-availability">
765 <primary>Streaming Replication
</primary>
769 Streaming replication allows a standby server to stay more up-to-date
770 than is possible with file-based log shipping. The standby connects
771 to the primary, which streams WAL records to the standby as they're
772 generated, without waiting for the WAL file to be filled.
776 Streaming replication is asynchronous by default
777 (see
<xref linkend=
"synchronous-replication"/>), in which case there is
778 a small delay between committing a transaction in the primary and the
779 changes becoming visible in the standby. This delay is however much
780 smaller than with file-based log shipping, typically under one second
781 assuming the standby is powerful enough to keep up with the load. With
782 streaming replication,
<varname>archive_timeout
</varname> is not required to
783 reduce the data loss window.
787 If you use streaming replication without file-based continuous
788 archiving, the server might recycle old WAL segments before the standby
789 has received them. If this occurs, the standby will need to be
790 reinitialized from a new base backup. You can avoid this by setting
791 <varname>wal_keep_size
</varname> to a value large enough to ensure that
792 WAL segments are not recycled too early, or by configuring a replication
793 slot for the standby. If you set up a WAL archive that's accessible from
794 the standby, these solutions are not required, since the standby can
795 always use the archive to catch up provided it retains enough segments.
799 To use streaming replication, set up a file-based log-shipping standby
800 server as described in
<xref linkend=
"warm-standby"/>. The step that
801 turns a file-based log-shipping standby into streaming replication
802 standby is setting the
<varname>primary_conninfo
</varname> setting
803 to point to the primary server. Set
804 <xref linkend=
"guc-listen-addresses"/> and authentication options
805 (see
<filename>pg_hba.conf
</filename>) on the primary so that the standby server
806 can connect to the
<literal>replication
</literal> pseudo-database on the primary
807 server (see
<xref linkend=
"streaming-replication-authentication"/>).
811 On systems that support the keepalive socket option, setting
812 <xref linkend=
"guc-tcp-keepalives-idle"/>,
813 <xref linkend=
"guc-tcp-keepalives-interval"/> and
814 <xref linkend=
"guc-tcp-keepalives-count"/> helps the primary promptly
815 notice a broken connection.
819 Set the maximum number of concurrent connections from the standby servers
820 (see
<xref linkend=
"guc-max-wal-senders"/> for details).
824 When the standby is started and
<varname>primary_conninfo
</varname> is set
825 correctly, the standby will connect to the primary after replaying all
826 WAL files available in the archive. If the connection is established
827 successfully, you will see a
<literal>walreceiver
</literal> in the standby, and
828 a corresponding
<literal>walsender
</literal> process in the primary.
831 <sect3 id=
"streaming-replication-authentication">
832 <title>Authentication
</title>
834 It is very important that the access privileges for replication be set up
835 so that only trusted users can read the WAL stream, because it is
836 easy to extract privileged information from it. Standby servers must
837 authenticate to the primary as an account that has the
838 <literal>REPLICATION
</literal> privilege or a superuser. It is
839 recommended to create a dedicated user account with
840 <literal>REPLICATION
</literal> and
<literal>LOGIN
</literal>
841 privileges for replication. While
<literal>REPLICATION
</literal>
842 privilege gives very high permissions, it does not allow the user to
843 modify any data on the primary system, which the
844 <literal>SUPERUSER
</literal> privilege does.
848 Client authentication for replication is controlled by a
849 <filename>pg_hba.conf
</filename> record specifying
<literal>replication
</literal> in the
850 <replaceable>database
</replaceable> field. For example, if the standby is running on
851 host IP
<literal>192.168.1.100</literal> and the account name for replication
852 is
<literal>foo
</literal>, the administrator can add the following line to the
853 <filename>pg_hba.conf
</filename> file on the primary:
856 # Allow the user
"foo" from host
192.168.1.100 to connect to the primary
857 # as a replication standby if the user's password is correctly supplied.
859 # TYPE DATABASE USER ADDRESS METHOD
860 host replication foo
192.168.1.100/
32 md5
864 The host name and port number of the primary, connection user name,
865 and password are specified in the
<xref linkend=
"guc-primary-conninfo"/>.
866 The password can also be set in the
<filename>~/.pgpass
</filename> file on the
867 standby (specify
<literal>replication
</literal> in the
<replaceable>database
</replaceable>
869 For example, if the primary is running on host IP
<literal>192.168.1.50</literal>,
870 port
<literal>5432</literal>, the account name for replication is
871 <literal>foo
</literal>, and the password is
<literal>foopass
</literal>, the administrator
872 can add the following line to the
<filename>postgresql.conf
</filename> file on the
876 # The standby connects to the primary that is running on host
192.168.1.50
877 # and port
5432 as the user
"foo" whose password is
"foopass".
878 primary_conninfo = 'host=
192.168.1.50 port=
5432 user=foo password=foopass'
883 <sect3 id=
"streaming-replication-monitoring">
884 <title>Monitoring
</title>
886 An important health indicator of streaming replication is the amount
887 of WAL records generated in the primary, but not yet applied in the
888 standby. You can calculate this lag by comparing the current WAL write
889 location on the primary with the last WAL location received by the
890 standby. These locations can be retrieved using
891 <function>pg_current_wal_lsn
</function> on the primary and
892 <function>pg_last_wal_receive_lsn
</function> on the standby,
893 respectively (see
<xref linkend=
"functions-admin-backup-table"/> and
894 <xref linkend=
"functions-recovery-info-table"/> for details).
895 The last WAL receive location in the standby is also displayed in the
896 process status of the WAL receiver process, displayed using the
897 <command>ps
</command> command (see
<xref linkend=
"monitoring-ps"/> for details).
900 You can retrieve a list of WAL sender processes via the
901 <link linkend=
"monitoring-pg-stat-replication-view"><structname>
902 pg_stat_replication
</structname></link> view. Large differences between
903 <function>pg_current_wal_lsn
</function> and the view's
<literal>sent_lsn
</literal> field
904 might indicate that the primary server is under heavy load, while
905 differences between
<literal>sent_lsn
</literal> and
906 <function>pg_last_wal_receive_lsn
</function> on the standby might indicate
907 network delay, or that the standby is under heavy load.
910 On a hot standby, the status of the WAL receiver process can be retrieved
911 via the
<link linkend=
"monitoring-pg-stat-wal-receiver-view">
912 <structname>pg_stat_wal_receiver
</structname></link> view. A large
913 difference between
<function>pg_last_wal_replay_lsn
</function> and the
914 view's
<literal>flushed_lsn
</literal> indicates that WAL is being
915 received faster than it can be replayed.
920 <sect2 id=
"streaming-replication-slots">
921 <title>Replication Slots
</title>
923 <primary>replication slot
</primary>
924 <secondary>streaming replication
</secondary>
927 Replication slots provide an automated way to ensure that the
929 not remove WAL segments until they have been received by all standbys,
930 and that the primary does not remove rows which could cause a
931 <link linkend=
"hot-standby-conflict">recovery conflict
</link> even when the
932 standby is disconnected.
935 In lieu of using replication slots, it is possible to prevent the removal
936 of old WAL segments using
<xref linkend=
"guc-wal-keep-size"/>, or by
937 storing the segments in an archive using
938 <xref linkend=
"guc-archive-command"/> or
<xref linkend=
"guc-archive-library"/>.
939 A disadvantage of these methods is that they
940 often result in retaining more WAL segments than
941 required, whereas replication slots retain only the number of segments
945 Similarly,
<xref linkend=
"guc-hot-standby-feedback"/> on its own, without
946 also using a replication slot, provides protection against relevant rows
947 being removed by vacuum, but provides no protection during any time period
948 when the standby is not connected.
953 Beware that replication slots can cause the server to retain so
954 many WAL segments that they fill up the space allocated for
955 <literal>pg_wal
</literal>.
956 <xref linkend=
"guc-max-slot-wal-keep-size"/> can be used to limit the size
957 of WAL files retained by replication slots.
961 <sect3 id=
"streaming-replication-slots-manipulation">
962 <title>Querying and Manipulating Replication Slots
</title>
964 Each replication slot has a name, which can contain lower-case letters,
965 numbers, and the underscore character.
968 Existing replication slots and their state can be seen in the
969 <link linkend=
"view-pg-replication-slots"><structname>pg_replication_slots
</structname></link>
973 Slots can be created and dropped either via the streaming replication
974 protocol (see
<xref linkend=
"protocol-replication"/>) or via SQL
975 functions (see
<xref linkend=
"functions-replication"/>).
978 <sect3 id=
"streaming-replication-slots-config">
979 <title>Configuration Example
</title>
981 You can create a replication slot like this:
983 postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
988 postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
989 slot_name | slot_type | active
990 -------------+-----------+--------
991 node_a_slot | physical | f
994 To configure the standby to use this slot,
<varname>primary_slot_name
</varname>
995 should be configured on the standby. Here is a simple example:
997 primary_conninfo = 'host=
192.168.1.50 port=
5432 user=foo password=foopass'
998 primary_slot_name = 'node_a_slot'
1004 <sect2 id=
"cascading-replication">
1005 <title>Cascading Replication
</title>
1007 <indexterm zone=
"high-availability">
1008 <primary>Cascading Replication
</primary>
1012 The cascading replication feature allows a standby server to accept replication
1013 connections and stream WAL records to other standbys, acting as a relay.
1014 This can be used to reduce the number of direct connections to the primary
1015 and also to minimize inter-site bandwidth overheads.
1019 A standby acting as both a receiver and a sender is known as a cascading
1020 standby. Standbys that are more directly connected to the primary are known
1021 as upstream servers, while those standby servers further away are downstream
1022 servers. Cascading replication does not place limits on the number or
1023 arrangement of downstream servers, though each standby connects to only
1024 one upstream server which eventually links to a single primary server.
1028 A cascading standby sends not only WAL records received from the
1029 primary but also those restored from the archive. So even if the replication
1030 connection in some upstream connection is terminated, streaming replication
1031 continues downstream for as long as new WAL records are available.
1035 Cascading replication is currently asynchronous. Synchronous replication
1036 (see
<xref linkend=
"synchronous-replication"/>) settings have no effect on
1037 cascading replication at present.
1041 Hot standby feedback propagates upstream, whatever the cascaded arrangement.
1045 If an upstream standby server is promoted to become the new primary, downstream
1046 servers will continue to stream from the new primary if
1047 <varname>recovery_target_timeline
</varname> is set to
<literal>'latest'
</literal> (the default).
1051 To use cascading replication, set up the cascading standby so that it can
1052 accept replication connections (that is, set
1053 <xref linkend=
"guc-max-wal-senders"/> and
<xref linkend=
"guc-hot-standby"/>,
1055 <link linkend=
"auth-pg-hba-conf">host-based authentication
</link>).
1056 You will also need to set
<varname>primary_conninfo
</varname> in the downstream
1057 standby to point to the cascading standby.
1061 <sect2 id=
"synchronous-replication">
1062 <title>Synchronous Replication
</title>
1064 <indexterm zone=
"high-availability">
1065 <primary>Synchronous Replication
</primary>
1069 <productname>PostgreSQL
</productname> streaming replication is asynchronous by
1070 default. If the primary server
1071 crashes then some transactions that were committed may not have been
1072 replicated to the standby server, causing data loss. The amount
1073 of data loss is proportional to the replication delay at the time of
1078 Synchronous replication offers the ability to confirm that all changes
1079 made by a transaction have been transferred to one or more synchronous
1080 standby servers. This extends that standard level of durability
1081 offered by a transaction commit. This level of protection is referred
1082 to as
2-safe replication in computer science theory, and group-
1-safe
1083 (group-safe and
1-safe) when
<varname>synchronous_commit
</varname> is set to
1084 <literal>remote_write
</literal>.
1088 When requesting synchronous replication, each commit of a
1089 write transaction will wait until confirmation is
1090 received that the commit has been written to the write-ahead log on disk
1091 of both the primary and standby server. The only possibility that data
1092 can be lost is if both the primary and the standby suffer crashes at the
1093 same time. This can provide a much higher level of durability, though only
1094 if the sysadmin is cautious about the placement and management of the two
1095 servers. Waiting for confirmation increases the user's confidence that the
1096 changes will not be lost in the event of server crashes but it also
1097 necessarily increases the response time for the requesting transaction.
1098 The minimum wait time is the round-trip time between primary and standby.
1102 Read-only transactions and transaction rollbacks need not wait for
1103 replies from standby servers. Subtransaction commits do not wait for
1104 responses from standby servers, only top-level commits. Long
1105 running actions such as data loading or index building do not wait
1106 until the very final commit message. All two-phase commit actions
1107 require commit waits, including both prepare and commit.
1111 A synchronous standby can be a physical replication standby or a logical
1112 replication subscriber. It can also be any other physical or logical WAL
1113 replication stream consumer that knows how to send the appropriate
1114 feedback messages. Besides the built-in physical and logical replication
1115 systems, this includes special programs such
1116 as
<command>pg_receivewal
</command> and
<command>pg_recvlogical
</command>
1117 as well as some third-party replication systems and custom programs.
1118 Check the respective documentation for details on synchronous replication
1122 <sect3 id=
"synchronous-replication-config">
1123 <title>Basic Configuration
</title>
1126 Once streaming replication has been configured, configuring synchronous
1127 replication requires only one additional configuration step:
1128 <xref linkend=
"guc-synchronous-standby-names"/> must be set to
1129 a non-empty value.
<varname>synchronous_commit
</varname> must also be set to
1130 <literal>on
</literal>, but since this is the default value, typically no change is
1131 required. (See
<xref linkend=
"runtime-config-wal-settings"/> and
1132 <xref linkend=
"runtime-config-replication-primary"/>.)
1133 This configuration will cause each commit to wait for
1134 confirmation that the standby has written the commit record to durable
1136 <varname>synchronous_commit
</varname> can be set by individual
1137 users, so it can be configured in the configuration file, for particular
1138 users or databases, or dynamically by applications, in order to control
1139 the durability guarantee on a per-transaction basis.
1143 After a commit record has been written to disk on the primary, the
1144 WAL record is then sent to the standby. The standby sends reply
1145 messages each time a new batch of WAL data is written to disk, unless
1146 <varname>wal_receiver_status_interval
</varname> is set to zero on the standby.
1147 In the case that
<varname>synchronous_commit
</varname> is set to
1148 <literal>remote_apply
</literal>, the standby sends reply messages when the commit
1149 record is replayed, making the transaction visible.
1150 If the standby is chosen as a synchronous standby, according to the setting
1151 of
<varname>synchronous_standby_names
</varname> on the primary, the reply
1152 messages from that standby will be considered along with those from other
1153 synchronous standbys to decide when to release transactions waiting for
1154 confirmation that the commit record has been received. These parameters
1155 allow the administrator to specify which standby servers should be
1156 synchronous standbys. Note that the configuration of synchronous
1157 replication is mainly on the primary. Named standbys must be directly
1158 connected to the primary; the primary knows nothing about downstream
1159 standby servers using cascaded replication.
1163 Setting
<varname>synchronous_commit
</varname> to
<literal>remote_write
</literal> will
1164 cause each commit to wait for confirmation that the standby has received
1165 the commit record and written it out to its own operating system, but not
1166 for the data to be flushed to disk on the standby. This
1167 setting provides a weaker guarantee of durability than
<literal>on
</literal>
1168 does: the standby could lose the data in the event of an operating system
1169 crash, though not a
<productname>PostgreSQL
</productname> crash.
1170 However, it's a useful setting in practice
1171 because it can decrease the response time for the transaction.
1172 Data loss could only occur if both the primary and the standby crash and
1173 the database of the primary gets corrupted at the same time.
1177 Setting
<varname>synchronous_commit
</varname> to
<literal>remote_apply
</literal> will
1178 cause each commit to wait until the current synchronous standbys report
1179 that they have replayed the transaction, making it visible to user
1180 queries. In simple cases, this allows for load balancing with causal
1185 Users will stop waiting if a fast shutdown is requested. However, as
1186 when using asynchronous replication, the server will not fully
1187 shutdown until all outstanding WAL records are transferred to the currently
1188 connected standby servers.
1193 <sect3 id=
"synchronous-replication-multiple-standbys">
1194 <title>Multiple Synchronous Standbys
</title>
1197 Synchronous replication supports one or more synchronous standby servers;
1198 transactions will wait until all the standby servers which are considered
1199 as synchronous confirm receipt of their data. The number of synchronous
1200 standbys that transactions must wait for replies from is specified in
1201 <varname>synchronous_standby_names
</varname>. This parameter also specifies
1202 a list of standby names and the method (
<literal>FIRST
</literal> and
1203 <literal>ANY
</literal>) to choose synchronous standbys from the listed ones.
1206 The method
<literal>FIRST
</literal> specifies a priority-based synchronous
1207 replication and makes transaction commits wait until their WAL records are
1208 replicated to the requested number of synchronous standbys chosen based on
1209 their priorities. The standbys whose names appear earlier in the list are
1210 given higher priority and will be considered as synchronous. Other standby
1211 servers appearing later in this list represent potential synchronous
1212 standbys. If any of the current synchronous standbys disconnects for
1213 whatever reason, it will be replaced immediately with the
1214 next-highest-priority standby.
1217 An example of
<varname>synchronous_standby_names
</varname> for
1218 a priority-based multiple synchronous standbys is:
1220 synchronous_standby_names = 'FIRST
2 (s1, s2, s3)'
1222 In this example, if four standby servers
<literal>s1
</literal>,
<literal>s2
</literal>,
1223 <literal>s3
</literal> and
<literal>s4
</literal> are running, the two standbys
1224 <literal>s1
</literal> and
<literal>s2
</literal> will be chosen as synchronous standbys
1225 because their names appear early in the list of standby names.
1226 <literal>s3
</literal> is a potential synchronous standby and will take over
1227 the role of synchronous standby when either of
<literal>s1
</literal> or
1228 <literal>s2
</literal> fails.
<literal>s4
</literal> is an asynchronous standby since
1229 its name is not in the list.
1232 The method
<literal>ANY
</literal> specifies a quorum-based synchronous
1233 replication and makes transaction commits wait until their WAL records
1234 are replicated to
<emphasis>at least
</emphasis> the requested number of
1235 synchronous standbys in the list.
1238 An example of
<varname>synchronous_standby_names
</varname> for
1239 a quorum-based multiple synchronous standbys is:
1241 synchronous_standby_names = 'ANY
2 (s1, s2, s3)'
1243 In this example, if four standby servers
<literal>s1
</literal>,
<literal>s2
</literal>,
1244 <literal>s3
</literal> and
<literal>s4
</literal> are running, transaction commits will
1245 wait for replies from at least any two standbys of
<literal>s1
</literal>,
1246 <literal>s2
</literal> and
<literal>s3
</literal>.
<literal>s4
</literal> is an asynchronous
1247 standby since its name is not in the list.
1250 The synchronous states of standby servers can be viewed using
1251 the
<structname>pg_stat_replication
</structname> view.
1255 <sect3 id=
"synchronous-replication-performance">
1256 <title>Planning for Performance
</title>
1259 Synchronous replication usually requires carefully planned and placed
1260 standby servers to ensure applications perform acceptably. Waiting
1261 doesn't utilize system resources, but transaction locks continue to be
1262 held until the transfer is confirmed. As a result, incautious use of
1263 synchronous replication will reduce performance for database
1264 applications because of increased response times and higher contention.
1268 <productname>PostgreSQL
</productname> allows the application developer
1269 to specify the durability level required via replication. This can be
1270 specified for the system overall, though it can also be specified for
1271 specific users or connections, or even individual transactions.
1275 For example, an application workload might consist of:
1276 10% of changes are important customer details, while
1277 90% of changes are less important data that the business can more
1278 easily survive if it is lost, such as chat messages between users.
1282 With synchronous replication options specified at the application level
1283 (on the primary) we can offer synchronous replication for the most
1284 important changes, without slowing down the bulk of the total workload.
1285 Application level options are an important and practical tool for allowing
1286 the benefits of synchronous replication for high performance applications.
1290 You should consider that the network bandwidth must be higher than
1291 the rate of generation of WAL data.
1296 <sect3 id=
"synchronous-replication-ha">
1297 <title>Planning for High Availability
</title>
1300 <varname>synchronous_standby_names
</varname> specifies the number and
1301 names of synchronous standbys that transaction commits made when
1302 <varname>synchronous_commit
</varname> is set to
<literal>on
</literal>,
1303 <literal>remote_apply
</literal> or
<literal>remote_write
</literal> will wait for
1304 responses from. Such transaction commits may never be completed
1305 if any one of the synchronous standbys should crash.
1309 The best solution for high availability is to ensure you keep as many
1310 synchronous standbys as requested. This can be achieved by naming multiple
1311 potential synchronous standbys using
<varname>synchronous_standby_names
</varname>.
1315 In a priority-based synchronous replication, the standbys whose names
1316 appear earlier in the list will be used as synchronous standbys.
1317 Standbys listed after these will take over the role of synchronous standby
1318 if one of current ones should fail.
1322 In a quorum-based synchronous replication, all the standbys appearing
1323 in the list will be used as candidates for synchronous standbys.
1324 Even if one of them should fail, the other standbys will keep performing
1325 the role of candidates of synchronous standby.
1329 When a standby first attaches to the primary, it will not yet be properly
1330 synchronized. This is described as
<literal>catchup
</literal> mode. Once
1331 the lag between standby and primary reaches zero for the first time
1332 we move to real-time
<literal>streaming
</literal> state.
1333 The catch-up duration may be long immediately after the standby has
1334 been created. If the standby is shut down, then the catch-up period
1335 will increase according to the length of time the standby has been down.
1336 The standby is only able to become a synchronous standby
1337 once it has reached
<literal>streaming
</literal> state.
1338 This state can be viewed using
1339 the
<structname>pg_stat_replication
</structname> view.
1343 If primary restarts while commits are waiting for acknowledgment, those
1344 waiting transactions will be marked fully committed once the primary
1346 There is no way to be certain that all standbys have received all
1347 outstanding WAL data at time of the crash of the primary. Some
1348 transactions may not show as committed on the standby, even though
1349 they show as committed on the primary. The guarantee we offer is that
1350 the application will not receive explicit acknowledgment of the
1351 successful commit of a transaction until the WAL data is known to be
1352 safely received by all the synchronous standbys.
1356 If you really cannot keep as many synchronous standbys as requested
1357 then you should decrease the number of synchronous standbys that
1358 transaction commits must wait for responses from
1359 in
<varname>synchronous_standby_names
</varname> (or disable it) and
1360 reload the configuration file on the primary server.
1364 If the primary is isolated from remaining standby servers you should
1365 fail over to the best candidate of those other remaining standby servers.
1369 If you need to re-create a standby server while transactions are
1370 waiting, make sure that the functions
<function>pg_backup_start()
</function>
1371 and
<function>pg_backup_stop()
</function> are run in a session with
1372 <varname>synchronous_commit
</varname> =
<literal>off
</literal>, otherwise those
1373 requests will wait forever for the standby to appear.
1379 <sect2 id=
"continuous-archiving-in-standby">
1380 <title>Continuous Archiving in Standby
</title>
1383 <primary>continuous archiving
</primary>
1384 <secondary>in standby
</secondary>
1388 When continuous WAL archiving is used in a standby, there are two
1389 different scenarios: the WAL archive can be shared between the primary
1390 and the standby, or the standby can have its own WAL archive. When
1391 the standby has its own WAL archive, set
<varname>archive_mode
</varname>
1392 to
<literal>always
</literal>, and the standby will call the archive
1393 command for every WAL segment it receives, whether it's by restoring
1394 from the archive or by streaming replication. The shared archive can
1395 be handled similarly, but the
<varname>archive_command
</varname> or
<varname>archive_library
</varname> must
1396 test if the file being archived exists already, and if the existing file
1397 has identical contents. This requires more care in the
1398 <varname>archive_command
</varname> or
<varname>archive_library
</varname>, as it must
1399 be careful to not overwrite an existing file with different contents,
1400 but return success if the exactly same file is archived twice. And
1401 all that must be done free of race conditions, if two servers attempt
1402 to archive the same file at the same time.
1406 If
<varname>archive_mode
</varname> is set to
<literal>on
</literal>, the
1407 archiver is not enabled during recovery or standby mode. If the standby
1408 server is promoted, it will start archiving after the promotion, but
1409 will not archive any WAL or timeline history files that
1410 it did not generate itself. To get a complete
1411 series of WAL files in the archive, you must ensure that all WAL is
1412 archived, before it reaches the standby. This is inherently true with
1413 file-based log shipping, as the standby can only restore files that
1414 are found in the archive, but not if streaming replication is enabled.
1415 When a server is not in recovery mode, there is no difference between
1416 <literal>on
</literal> and
<literal>always
</literal> modes.
1421 <sect1 id=
"warm-standby-failover">
1422 <title>Failover
</title>
1425 If the primary server fails then the standby server should begin
1426 failover procedures.
1430 If the standby server fails then no failover need take place. If the
1431 standby server can be restarted, even some time later, then the recovery
1432 process can also be restarted immediately, taking advantage of
1433 restartable recovery. If the standby server cannot be restarted, then a
1434 full new standby server instance should be created.
1438 If the primary server fails and the standby server becomes the
1439 new primary, and then the old primary restarts, you must have
1440 a mechanism for informing the old primary that it is no longer the primary. This is
1441 sometimes known as
<acronym>STONITH
</acronym> (Shoot The Other Node In The Head), which is
1442 necessary to avoid situations where both systems think they are the
1443 primary, which will lead to confusion and ultimately data loss.
1447 Many failover systems use just two systems, the primary and the standby,
1448 connected by some kind of heartbeat mechanism to continually verify the
1449 connectivity between the two and the viability of the primary. It is
1450 also possible to use a third system (called a witness server) to prevent
1451 some cases of inappropriate failover, but the additional complexity
1452 might not be worthwhile unless it is set up with sufficient care and
1457 <productname>PostgreSQL
</productname> does not provide the system
1458 software required to identify a failure on the primary and notify
1459 the standby database server. Many such tools exist and are well
1460 integrated with the operating system facilities required for
1461 successful failover, such as IP address migration.
1465 Once failover to the standby occurs, there is only a
1466 single server in operation. This is known as a degenerate state.
1467 The former standby is now the primary, but the former primary is down
1468 and might stay down. To return to normal operation, a standby server
1470 either on the former primary system when it comes up, or on a third,
1471 possibly new, system. The
<xref linkend=
"app-pgrewind"/> utility can be
1472 used to speed up this process on large clusters.
1473 Once complete, the primary and standby can be
1474 considered to have switched roles. Some people choose to use a third
1475 server to provide backup for the new primary until the new standby
1476 server is recreated,
1477 though clearly this complicates the system configuration and
1478 operational processes.
1482 So, switching from primary to standby server can be fast but requires
1483 some time to re-prepare the failover cluster. Regular switching from
1484 primary to standby is useful, since it allows regular downtime on
1485 each system for maintenance. This also serves as a test of the
1486 failover mechanism to ensure that it will really work when you need it.
1487 Written administration procedures are advised.
1491 If you have opted for logical replication slot synchronization (see
1492 <xref linkend=
"logicaldecoding-replication-slots-synchronization"/>),
1493 then before switching to the standby server, it is recommended to check
1494 if the logical slots synchronized on the standby server are ready
1495 for failover. This can be done by following the steps described in
1496 <xref linkend=
"logical-replication-failover"/>.
1500 To trigger failover of a log-shipping standby server, run
1501 <command>pg_ctl promote
</command> or call
<function>pg_promote()
</function>.
1502 If you're setting up reporting servers that are only used to offload
1503 read-only queries from the primary, not for high availability purposes,
1504 you don't need to promote.
1508 <sect1 id=
"hot-standby">
1509 <title>Hot Standby
</title>
1511 <indexterm zone=
"high-availability">
1512 <primary>hot standby
</primary>
1516 Hot standby is the term used to describe the ability to connect to
1517 the server and run read-only queries while the server is in archive
1518 recovery or standby mode. This
1519 is useful both for replication purposes and for restoring a backup
1520 to a desired state with great precision.
1521 The term hot standby also refers to the ability of the server to move
1522 from recovery through to normal operation while users continue running
1523 queries and/or keep their connections open.
1527 Running queries in hot standby mode is similar to normal query operation,
1528 though there are several usage and administrative differences
1532 <sect2 id=
"hot-standby-users">
1533 <title>User's Overview
</title>
1536 When the
<xref linkend=
"guc-hot-standby"/> parameter is set to true on a
1537 standby server, it will begin accepting connections once the recovery has
1538 brought the system to a consistent state. All such connections are
1539 strictly read-only; not even temporary tables may be written.
1543 The data on the standby takes some time to arrive from the primary server
1544 so there will be a measurable delay between primary and standby. Running the
1545 same query nearly simultaneously on both primary and standby might therefore
1546 return differing results. We say that data on the standby is
1547 <firstterm>eventually consistent
</firstterm> with the primary. Once the
1548 commit record for a transaction is replayed on the standby, the changes
1549 made by that transaction will be visible to any new snapshots taken on
1550 the standby. Snapshots may be taken at the start of each query or at the
1551 start of each transaction, depending on the current transaction isolation
1552 level. For more details, see
<xref linkend=
"transaction-iso"/>.
1556 Transactions started during hot standby may issue the following commands:
1561 Query access:
<command>SELECT
</command>,
<command>COPY TO
</command>
1566 Cursor commands:
<command>DECLARE
</command>,
<command>FETCH
</command>,
<command>CLOSE
</command>
1571 Settings:
<command>SHOW
</command>,
<command>SET
</command>,
<command>RESET
</command>
1576 Transaction management commands:
1580 <command>BEGIN
</command>,
<command>END
</command>,
<command>ABORT
</command>,
<command>START TRANSACTION
</command>
1585 <command>SAVEPOINT
</command>,
<command>RELEASE
</command>,
<command>ROLLBACK TO SAVEPOINT
</command>
1590 <command>EXCEPTION
</command> blocks and other internal subtransactions
1598 <command>LOCK TABLE
</command>, though only when explicitly in one of these modes:
1599 <literal>ACCESS SHARE
</literal>,
<literal>ROW SHARE
</literal> or
<literal>ROW EXCLUSIVE
</literal>.
1604 Plans and resources:
<command>PREPARE
</command>,
<command>EXECUTE
</command>,
1605 <command>DEALLOCATE
</command>,
<command>DISCARD
</command>
1610 Plugins and extensions:
<command>LOAD
</command>
1615 <command>UNLISTEN
</command>
1622 Transactions started during hot standby will never be assigned a
1623 transaction ID and cannot write to the system write-ahead log.
1624 Therefore, the following actions will produce error messages:
1629 Data Manipulation Language (DML):
<command>INSERT
</command>,
1630 <command>UPDATE
</command>,
<command>DELETE
</command>,
1631 <command>MERGE
</command>,
<command>COPY FROM
</command>,
1632 <command>TRUNCATE
</command>.
1633 Note that there are no allowed actions that result in a trigger
1634 being executed during recovery. This restriction applies even to
1635 temporary tables, because table rows cannot be read or written without
1636 assigning a transaction ID, which is currently not possible in a
1637 hot standby environment.
1642 Data Definition Language (DDL):
<command>CREATE
</command>,
1643 <command>DROP
</command>,
<command>ALTER
</command>,
<command>COMMENT
</command>.
1644 This restriction applies even to temporary tables, because carrying
1645 out these operations would require updating the system catalog tables.
1650 <command>SELECT ... FOR SHARE | UPDATE
</command>, because row locks cannot be
1651 taken without updating the underlying data files.
1656 Rules on
<command>SELECT
</command> statements that generate DML commands.
1661 <command>LOCK
</command> that explicitly requests a mode higher than
<literal>ROW EXCLUSIVE MODE
</literal>.
1666 <command>LOCK
</command> in short default form, since it requests
<literal>ACCESS EXCLUSIVE MODE
</literal>.
1671 Transaction management commands that explicitly set non-read-only state:
1675 <command>BEGIN READ WRITE
</command>,
1676 <command>START TRANSACTION READ WRITE
</command>
1681 <command>SET TRANSACTION READ WRITE
</command>,
1682 <command>SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
</command>
1687 <command>SET transaction_read_only = off
</command>
1695 Two-phase commit commands:
<command>PREPARE TRANSACTION
</command>,
1696 <command>COMMIT PREPARED
</command>,
<command>ROLLBACK PREPARED
</command>
1697 because even read-only transactions need to write WAL in the
1698 prepare phase (the first phase of two phase commit).
1703 Sequence updates:
<function>nextval()
</function>,
<function>setval()
</function>
1708 <command>LISTEN
</command>,
<command>NOTIFY
</command>
1715 In normal operation,
<quote>read-only
</quote> transactions are allowed to
1716 use
<command>LISTEN
</command> and
<command>NOTIFY
</command>,
1717 so hot standby sessions operate under slightly tighter
1718 restrictions than ordinary read-only sessions. It is possible that some
1719 of these restrictions might be loosened in a future release.
1723 During hot standby, the parameter
<varname>transaction_read_only
</varname> is always
1724 true and may not be changed. But as long as no attempt is made to modify
1725 the database, connections during hot standby will act much like any other
1726 database connection. If failover or switchover occurs, the database will
1727 switch to normal processing mode. Sessions will remain connected while the
1728 server changes mode. Once hot standby finishes, it will be possible to
1729 initiate read-write transactions (even from a session begun during
1734 Users can determine whether hot standby is currently active for their
1735 session by issuing
<command>SHOW in_hot_standby
</command>.
1736 (In server versions before
14, the
<varname>in_hot_standby
</varname>
1737 parameter did not exist; a workable substitute method for older servers
1738 is
<command>SHOW transaction_read_only
</command>.) In addition, a set of
1739 functions (
<xref linkend=
"functions-recovery-info-table"/>) allow users to
1740 access information about the standby server. These allow you to write
1741 programs that are aware of the current state of the database. These
1742 can be used to monitor the progress of recovery, or to allow you to
1743 write complex programs that restore the database to particular states.
1747 <sect2 id=
"hot-standby-conflict">
1748 <title>Handling Query Conflicts
</title>
1751 The primary and standby servers are in many ways loosely connected. Actions
1752 on the primary will have an effect on the standby. As a result, there is
1753 potential for negative interactions or conflicts between them. The easiest
1754 conflict to understand is performance: if a huge data load is taking place
1755 on the primary then this will generate a similar stream of WAL records on the
1756 standby, so standby queries may contend for system resources, such as I/O.
1760 There are also additional types of conflict that can occur with hot standby.
1761 These conflicts are
<emphasis>hard conflicts
</emphasis> in the sense that queries
1762 might need to be canceled and, in some cases, sessions disconnected to resolve them.
1763 The user is provided with several ways to handle these
1764 conflicts. Conflict cases include:
1769 Access Exclusive locks taken on the primary server, including both
1770 explicit
<command>LOCK
</command> commands and various
<acronym>DDL
</acronym>
1771 actions, conflict with table accesses in standby queries.
1776 Dropping a tablespace on the primary conflicts with standby queries
1777 using that tablespace for temporary work files.
1782 Dropping a database on the primary conflicts with sessions connected
1783 to that database on the standby.
1788 Application of a vacuum cleanup record from WAL conflicts with
1789 standby transactions whose snapshots can still
<quote>see
</quote> any of
1790 the rows to be removed.
1795 Application of a vacuum cleanup record from WAL conflicts with
1796 queries accessing the target page on the standby, whether or not
1797 the data to be removed is visible.
1804 On the primary server, these cases simply result in waiting; and the
1805 user might choose to cancel either of the conflicting actions. However,
1806 on the standby there is no choice: the WAL-logged action already occurred
1807 on the primary so the standby must not fail to apply it. Furthermore,
1808 allowing WAL application to wait indefinitely may be very undesirable,
1809 because the standby's state will become increasingly far behind the
1810 primary's. Therefore, a mechanism is provided to forcibly cancel standby
1811 queries that conflict with to-be-applied WAL records.
1815 An example of the problem situation is an administrator on the primary
1816 server running
<command>DROP TABLE
</command> on a table that is currently being
1817 queried on the standby server. Clearly the standby query cannot continue
1818 if the
<command>DROP TABLE
</command> is applied on the standby. If this situation
1819 occurred on the primary, the
<command>DROP TABLE
</command> would wait until the
1820 other query had finished. But when
<command>DROP TABLE
</command> is run on the
1821 primary, the primary doesn't have information about what queries are
1822 running on the standby, so it will not wait for any such standby
1823 queries. The WAL change records come through to the standby while the
1824 standby query is still running, causing a conflict. The standby server
1825 must either delay application of the WAL records (and everything after
1826 them, too) or else cancel the conflicting query so that the
<command>DROP
1827 TABLE
</command> can be applied.
1831 When a conflicting query is short, it's typically desirable to allow it to
1832 complete by delaying WAL application for a little bit; but a long delay in
1833 WAL application is usually not desirable. So the cancel mechanism has
1834 parameters,
<xref linkend=
"guc-max-standby-archive-delay"/> and
<xref
1835 linkend=
"guc-max-standby-streaming-delay"/>, that define the maximum
1836 allowed delay in WAL application. Conflicting queries will be canceled
1837 once it has taken longer than the relevant delay setting to apply any
1838 newly-received WAL data. There are two parameters so that different delay
1839 values can be specified for the case of reading WAL data from an archive
1840 (i.e., initial recovery from a base backup or
<quote>catching up
</quote> a
1841 standby server that has fallen far behind) versus reading WAL data via
1842 streaming replication.
1846 In a standby server that exists primarily for high availability, it's
1847 best to set the delay parameters relatively short, so that the server
1848 cannot fall far behind the primary due to delays caused by standby
1849 queries. However, if the standby server is meant for executing
1850 long-running queries, then a high or even infinite delay value may be
1851 preferable. Keep in mind however that a long-running query could
1852 cause other sessions on the standby server to not see recent changes
1853 on the primary, if it delays application of WAL records.
1857 Once the delay specified by
<varname>max_standby_archive_delay
</varname> or
1858 <varname>max_standby_streaming_delay
</varname> has been exceeded, conflicting
1859 queries will be canceled. This usually results just in a cancellation
1860 error, although in the case of replaying a
<command>DROP DATABASE
</command>
1861 the entire conflicting session will be terminated. Also, if the conflict
1862 is over a lock held by an idle transaction, the conflicting session is
1863 terminated (this behavior might change in the future).
1867 Canceled queries may be retried immediately (after beginning a new
1868 transaction, of course). Since query cancellation depends on
1869 the nature of the WAL records being replayed, a query that was
1870 canceled may well succeed if it is executed again.
1874 Keep in mind that the delay parameters are compared to the elapsed time
1875 since the WAL data was received by the standby server. Thus, the grace
1876 period allowed to any one query on the standby is never more than the
1877 delay parameter, and could be considerably less if the standby has already
1878 fallen behind as a result of waiting for previous queries to complete, or
1879 as a result of being unable to keep up with a heavy update load.
1883 The most common reason for conflict between standby queries and WAL replay
1884 is
<quote>early cleanup
</quote>. Normally,
<productname>PostgreSQL
</productname> allows
1885 cleanup of old row versions when there are no transactions that need to
1886 see them to ensure correct visibility of data according to MVCC rules.
1887 However, this rule can only be applied for transactions executing on the
1888 primary. So it is possible that cleanup on the primary will remove row
1889 versions that are still visible to a transaction on the standby.
1893 Row version cleanup isn't the only potential cause of conflicts with
1894 standby queries. All index-only scans (including those that run on
1895 standbys) must use an
<acronym>MVCC
</acronym> snapshot that
1896 <quote>agrees
</quote> with the visibility map. Conflicts are therefore
1897 required whenever
<command>VACUUM
</command> <link
1898 linkend=
"vacuum-for-visibility-map">sets a page as all-visible in the
1899 visibility map
</link> containing one or more rows
1900 <emphasis>not
</emphasis> visible to all standby queries. So even running
1901 <command>VACUUM
</command> against a table with no updated or deleted rows
1902 requiring cleanup might lead to conflicts.
1906 Users should be clear that tables that are regularly and heavily updated
1907 on the primary server will quickly cause cancellation of longer running
1908 queries on the standby. In such cases the setting of a finite value for
1909 <varname>max_standby_archive_delay
</varname> or
1910 <varname>max_standby_streaming_delay
</varname> can be considered similar to
1911 setting
<varname>statement_timeout
</varname>.
1915 Remedial possibilities exist if the number of standby-query cancellations
1916 is found to be unacceptable. The first option is to set the parameter
1917 <varname>hot_standby_feedback
</varname>, which prevents
<command>VACUUM
</command> from
1918 removing recently-dead rows and so cleanup conflicts do not occur.
1920 should note that this will delay cleanup of dead rows on the primary,
1921 which may result in undesirable table bloat. However, the cleanup
1922 situation will be no worse than if the standby queries were running
1923 directly on the primary server, and you are still getting the benefit of
1924 off-loading execution onto the standby.
1925 If standby servers connect and disconnect frequently, you
1926 might want to make adjustments to handle the period when
1927 <varname>hot_standby_feedback
</varname> feedback is not being provided.
1928 For example, consider increasing
<varname>max_standby_archive_delay
</varname>
1929 so that queries are not rapidly canceled by conflicts in WAL archive
1930 files during disconnected periods. You should also consider increasing
1931 <varname>max_standby_streaming_delay
</varname> to avoid rapid cancellations
1932 by newly-arrived streaming WAL entries after reconnection.
1936 The number of query cancels and the reason for them can be viewed using
1937 the
<structname>pg_stat_database_conflicts
</structname> system view on the standby
1938 server. The
<structname>pg_stat_database
</structname> system view also contains
1939 summary information.
1943 Users can control whether a log message is produced when WAL replay is waiting
1944 longer than
<varname>deadlock_timeout
</varname> for conflicts. This
1945 is controlled by the
<xref linkend=
"guc-log-recovery-conflict-waits"/> parameter.
1949 <sect2 id=
"hot-standby-admin">
1950 <title>Administrator's Overview
</title>
1953 If
<varname>hot_standby
</varname> is
<literal>on
</literal> in
<filename>postgresql.conf
</filename>
1954 (the default value) and there is a
1955 <link linkend=
"file-standby-signal"><filename>standby.signal
</filename></link><indexterm><primary>standby.signal
</primary><secondary>for hot standby
</secondary></indexterm>
1956 file present, the server will run in hot standby mode.
1957 However, it may take some time for hot standby connections to be allowed,
1958 because the server will not accept connections until it has completed
1959 sufficient recovery to provide a consistent state against which queries
1960 can run. During this period,
1961 clients that attempt to connect will be refused with an error message.
1962 To confirm the server has come up, either loop trying to connect from
1963 the application, or look for these messages in the server logs:
1966 LOG: entering standby mode
1968 ... then some time later ...
1970 LOG: consistent recovery state reached
1971 LOG: database system is ready to accept read-only connections
1974 Consistency information is recorded once per checkpoint on the primary.
1975 It is not possible to enable hot standby when reading WAL
1976 written during a period when
<varname>wal_level
</varname> was not set to
1977 <literal>replica
</literal> or
<literal>logical
</literal> on the primary. Reaching
1978 a consistent state can also be delayed in the presence of both of these
1984 A write transaction has more than
64 subtransactions
1989 Very long-lived write transactions
1994 If you are running file-based log shipping (
"warm standby"), you might need
1995 to wait until the next WAL file arrives, which could be as long as the
1996 <varname>archive_timeout
</varname> setting on the primary.
2000 The settings of some parameters determine the size of shared memory for
2001 tracking transaction IDs, locks, and prepared transactions. These shared
2002 memory structures must be no smaller on a standby than on the primary in
2003 order to ensure that the standby does not run out of shared memory during
2004 recovery. For example, if the primary had used a prepared transaction but
2005 the standby had not allocated any shared memory for tracking prepared
2006 transactions, then recovery could not continue until the standby's
2007 configuration is changed. The parameters affected are:
2012 <varname>max_connections
</varname>
2017 <varname>max_prepared_transactions
</varname>
2022 <varname>max_locks_per_transaction
</varname>
2027 <varname>max_wal_senders
</varname>
2032 <varname>max_worker_processes
</varname>
2037 The easiest way to ensure this does not become a problem is to have these
2038 parameters set on the standbys to values equal to or greater than on the
2039 primary. Therefore, if you want to increase these values, you should do
2040 so on all standby servers first, before applying the changes to the
2041 primary server. Conversely, if you want to decrease these values, you
2042 should do so on the primary server first, before applying the changes to
2043 all standby servers. Keep in mind that when a standby is promoted, it
2044 becomes the new reference for the required parameter settings for the
2045 standbys that follow it. Therefore, to avoid this becoming a problem
2046 during a switchover or failover, it is recommended to keep these settings
2047 the same on all standby servers.
2051 The WAL tracks changes to these parameters on the
2052 primary. If a hot standby processes WAL that indicates that the current
2053 value on the primary is higher than its own value, it will log a warning
2054 and pause recovery, for example:
2056 WARNING: hot standby is not possible because of insufficient parameter settings
2057 DETAIL: max_connections =
80 is a lower setting than on the primary server, where its value was
100.
2058 LOG: recovery has paused
2059 DETAIL: If recovery is unpaused, the server will shut down.
2060 HINT: You can then restart the server after making the necessary configuration changes.
2062 At that point, the settings on the standby need to be updated and the
2063 instance restarted before recovery can continue. If the standby is not a
2064 hot standby, then when it encounters the incompatible parameter change, it
2065 will shut down immediately without pausing, since there is then no value
2070 It is important that the administrator select appropriate settings for
2071 <xref linkend=
"guc-max-standby-archive-delay"/> and
<xref
2072 linkend=
"guc-max-standby-streaming-delay"/>. The best choices vary
2073 depending on business priorities. For example if the server is primarily
2074 tasked as a High Availability server, then you will want low delay
2075 settings, perhaps even zero, though that is a very aggressive setting. If
2076 the standby server is tasked as an additional server for decision support
2077 queries then it might be acceptable to set the maximum delay values to
2078 many hours, or even -
1 which means wait forever for queries to complete.
2082 Transaction status
"hint bits" written on the primary are not WAL-logged,
2083 so data on the standby will likely re-write the hints again on the standby.
2084 Thus, the standby server will still perform disk writes even though
2085 all users are read-only; no changes occur to the data values
2086 themselves. Users will still write large sort temporary files and
2087 re-generate relcache info files, so no part of the database
2088 is truly read-only during hot standby mode.
2089 Note also that writes to remote databases using
2090 <application>dblink
</application> module, and other operations outside the
2091 database using PL functions will still be possible, even though the
2092 transaction is read-only locally.
2096 The following types of administration commands are not accepted
2097 during recovery mode:
2102 Data Definition Language (DDL): e.g.,
<command>CREATE INDEX
</command>
2107 Privilege and Ownership:
<command>GRANT
</command>,
<command>REVOKE
</command>,
2108 <command>REASSIGN
</command>
2113 Maintenance commands:
<command>ANALYZE
</command>,
<command>VACUUM
</command>,
2114 <command>CLUSTER
</command>,
<command>REINDEX
</command>
2121 Again, note that some of these commands are actually allowed during
2122 "read only" mode transactions on the primary.
2126 As a result, you cannot create additional indexes that exist solely
2127 on the standby, nor statistics that exist solely on the standby.
2128 If these administration commands are needed, they should be executed
2129 on the primary, and eventually those changes will propagate to the
2134 <function>pg_cancel_backend()
</function>
2135 and
<function>pg_terminate_backend()
</function> will work on user backends,
2136 but not the startup process, which performs
2137 recovery.
<structname>pg_stat_activity
</structname> does not show
2138 recovering transactions as active. As a result,
2139 <structname>pg_prepared_xacts
</structname> is always empty during
2140 recovery. If you wish to resolve in-doubt prepared transactions, view
2141 <literal>pg_prepared_xacts
</literal> on the primary and issue commands to
2142 resolve transactions there or resolve them after the end of recovery.
2146 <structname>pg_locks
</structname> will show locks held by backends,
2147 as normal.
<structname>pg_locks
</structname> also shows
2148 a virtual transaction managed by the startup process that owns all
2149 <literal>AccessExclusiveLocks
</literal> held by transactions being replayed by recovery.
2150 Note that the startup process does not acquire locks to
2151 make database changes, and thus locks other than
<literal>AccessExclusiveLocks
</literal>
2152 do not show in
<structname>pg_locks
</structname> for the Startup
2153 process; they are just presumed to exist.
2157 The
<productname>Nagios
</productname> plugin
<productname>check_pgsql
</productname> will
2158 work, because the simple information it checks for exists.
2159 The
<productname>check_postgres
</productname> monitoring script will also work,
2160 though some reported values could give different or confusing results.
2161 For example, last vacuum time will not be maintained, since no
2162 vacuum occurs on the standby. Vacuums running on the primary
2163 do still send their changes to the standby.
2167 WAL file control commands will not work during recovery,
2168 e.g.,
<function>pg_backup_start
</function>,
<function>pg_switch_wal
</function> etc.
2172 Dynamically loadable modules work, including
<structname>pg_stat_statements
</structname>.
2176 Advisory locks work normally in recovery, including deadlock detection.
2177 Note that advisory locks are never WAL logged, so it is impossible for
2178 an advisory lock on either the primary or the standby to conflict with WAL
2179 replay. Nor is it possible to acquire an advisory lock on the primary
2180 and have it initiate a similar advisory lock on the standby. Advisory
2181 locks relate only to the server on which they are acquired.
2185 Trigger-based replication systems such as
<productname>Slony
</productname>,
2186 <productname>Londiste
</productname> and
<productname>Bucardo
</productname> won't run on the
2187 standby at all, though they will run happily on the primary server as
2188 long as the changes are not sent to standby servers to be applied.
2189 WAL replay is not trigger-based so you cannot relay from the
2190 standby to any system that requires additional database writes or
2191 relies on the use of triggers.
2195 New OIDs cannot be assigned, though some
<acronym>UUID
</acronym> generators may still
2196 work as long as they do not rely on writing new status to the database.
2200 Currently, temporary table creation is not allowed during read-only
2201 transactions, so in some cases existing scripts will not run correctly.
2202 This restriction might be relaxed in a later release. This is
2203 both an SQL standard compliance issue and a technical issue.
2207 <command>DROP TABLESPACE
</command> can only succeed if the tablespace is empty.
2208 Some standby users may be actively using the tablespace via their
2209 <varname>temp_tablespaces
</varname> parameter. If there are temporary files in the
2210 tablespace, all active queries are canceled to ensure that temporary
2211 files are removed, so the tablespace can be removed and WAL replay
2216 Running
<command>DROP DATABASE
</command> or
<command>ALTER DATABASE ... SET
2217 TABLESPACE
</command> on the primary
2218 will generate a WAL entry that will cause all users connected to that
2219 database on the standby to be forcibly disconnected. This action occurs
2220 immediately, whatever the setting of
2221 <varname>max_standby_streaming_delay
</varname>. Note that
2222 <command>ALTER DATABASE ... RENAME
</command> does not disconnect users, which
2223 in most cases will go unnoticed, though might in some cases cause a
2224 program confusion if it depends in some way upon database name.
2228 In normal (non-recovery) mode, if you issue
<command>DROP USER
</command> or
<command>DROP ROLE
</command>
2229 for a role with login capability while that user is still connected then
2230 nothing happens to the connected user
— they remain connected. The user cannot
2231 reconnect however. This behavior applies in recovery also, so a
2232 <command>DROP USER
</command> on the primary does not disconnect that user on the standby.
2236 The cumulative statistics system is active during recovery. All scans,
2237 reads, blocks, index usage, etc., will be recorded normally on the
2238 standby. However, WAL replay will not increment relation and database
2239 specific counters. I.e. replay will not increment
2240 <structname>pg_stat_all_tables
</structname> columns (like
<structfield>n_tup_ins
</structfield>),
2241 nor will reads or writes performed by the startup process be tracked in the
2242 <structname>pg_statio_
</structname> views, nor will associated
2243 <structname>pg_stat_database
</structname> columns be incremented.
2247 Autovacuum is not active during recovery. It will start normally at the
2252 The checkpointer process and the background writer process are active during
2253 recovery. The checkpointer process will perform restartpoints (similar to
2254 checkpoints on the primary) and the background writer process will perform
2255 normal block cleaning activities. This can include updates of the hint bit
2256 information stored on the standby server.
2257 The
<command>CHECKPOINT
</command> command is accepted during recovery,
2258 though it performs a restartpoint rather than a new checkpoint.
2262 <sect2 id=
"hot-standby-parameters">
2263 <title>Hot Standby Parameter Reference
</title>
2266 Various parameters have been mentioned above in
2267 <xref linkend=
"hot-standby-conflict"/> and
2268 <xref linkend=
"hot-standby-admin"/>.
2272 On the primary, the
<xref linkend=
"guc-wal-level"/> parameter can be used.
2273 <xref linkend=
"guc-max-standby-archive-delay"/> and
2274 <xref linkend=
"guc-max-standby-streaming-delay"/> have no effect if set on
2279 On the standby, parameters
<xref linkend=
"guc-hot-standby"/>,
2280 <xref linkend=
"guc-max-standby-archive-delay"/> and
2281 <xref linkend=
"guc-max-standby-streaming-delay"/> can be used.
2285 <sect2 id=
"hot-standby-caveats">
2286 <title>Caveats
</title>
2289 There are several limitations of hot standby.
2290 These can and probably will be fixed in future releases:
2295 Full knowledge of running transactions is required before snapshots
2296 can be taken. Transactions that use large numbers of subtransactions
2297 (currently greater than
64) will delay the start of read-only
2298 connections until the completion of the longest running write transaction.
2299 If this situation occurs, explanatory messages will be sent to the server log.
2304 Valid starting points for standby queries are generated at each
2305 checkpoint on the primary. If the standby is shut down while the primary
2306 is in a shutdown state, it might not be possible to re-enter hot standby
2307 until the primary is started up, so that it generates further starting
2308 points in the WAL logs. This situation isn't a problem in the most
2309 common situations where it might happen. Generally, if the primary is
2310 shut down and not available anymore, that's likely due to a serious
2311 failure that requires the standby being converted to operate as
2312 the new primary anyway. And in situations where the primary is
2313 being intentionally taken down, coordinating to make sure the standby
2314 becomes the new primary smoothly is also standard procedure.
2319 At the end of recovery,
<literal>AccessExclusiveLocks
</literal> held by prepared transactions
2320 will require twice the normal number of lock table entries. If you plan
2321 on running either a large number of concurrent prepared transactions
2322 that normally take
<literal>AccessExclusiveLocks
</literal>, or you plan on having one
2323 large transaction that takes many
<literal>AccessExclusiveLocks
</literal>, you are
2324 advised to select a larger value of
<varname>max_locks_per_transaction
</varname>,
2325 perhaps as much as twice the value of the parameter on
2326 the primary server. You need not consider this at all if
2327 your setting of
<varname>max_prepared_transactions
</varname> is
0.
2332 The Serializable transaction isolation level is not yet available in hot
2333 standby. (See
<xref linkend=
"xact-serializable"/> and
2334 <xref linkend=
"serializable-consistency"/> for details.)
2335 An attempt to set a transaction to the serializable isolation level in
2336 hot standby mode will generate an error.