<!-- doc/src/sgml/replication-origins.sgml -->
<chapter id="replication-origins">
 <title>Replication Progress Tracking</title>

 <indexterm zone="replication-origins">
  <primary>Replication Progress Tracking</primary>
 </indexterm>
 <indexterm zone="replication-origins">
  <primary>Replication Origins</primary>
 </indexterm>

 <para>
  Replication origins are intended to make it easier to implement
  logical replication solutions on top
  of <link linkend="logicaldecoding">logical decoding</link>.
  They provide a solution to two common problems:
  <itemizedlist>
   <listitem>
    <para>How to safely keep track of replication progress</para>
   </listitem>
   <listitem>
    <para>How to change replication behavior based on the
     origin of a row; for example, to prevent loops in bi-directional
     replication setups</para>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  Replication origins have just two properties, a name and an ID. The name,
  which is what should be used to refer to the origin across systems, is
  free-form <type>text</type>. It should be chosen in a way that makes conflicts
  between replication origins created by different replication solutions
  unlikely; e.g., by prefixing the replication solution's name to it.
  The ID is used only to avoid having to store the name, which can be long,
  in situations where space efficiency is important. It should never be shared
  across systems.
 </para>

 <para>
  Replication origins can be created using the function
  <link linkend="pg-replication-origin-create"><function>pg_replication_origin_create()</function></link>;
  dropped using
  <link linkend="pg-replication-origin-drop"><function>pg_replication_origin_drop()</function></link>;
  and seen in the
  <link linkend="catalog-pg-replication-origin"><structname>pg_replication_origin</structname></link>
  system catalog.
 </para>
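
 <para>
  As a minimal sketch of the functions above, the following session creates
  an origin, looks it up in the catalog, and drops it again. The origin name
  <literal>myrepl_node_a</literal> is a hypothetical example that follows
  the prefixing advice given earlier. The ID assigned on creation is local
  to the system and will differ between installations.
 </para>
<programlisting>
-- Create an origin (hypothetical name); the function returns the
-- locally assigned ID.
SELECT pg_replication_origin_create('myrepl_node_a');

-- Look up the name and the local ID in the system catalog.
SELECT roident, roname
  FROM pg_replication_origin
 WHERE roname = 'myrepl_node_a';

-- Drop the origin, along with any replay progress recorded for it.
SELECT pg_replication_origin_drop('myrepl_node_a');
</programlisting>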

 <para>
  One nontrivial part of building a replication solution is to keep track of
  replay progress in a safe manner. When the applying process, or the whole
  cluster, dies, it needs to be possible to find out up to where data has
  successfully been replicated. Naive solutions to this, such as updating a
  row in a table for every replayed transaction, have problems like run-time
  overhead and database bloat.
 </para>

 <para>
  Using the replication origin infrastructure, a session can be
  marked as replaying from a remote node (using the
  <link linkend="pg-replication-origin-session-setup"><function>pg_replication_origin_session_setup()</function></link>
  function). Additionally, the <acronym>LSN</acronym> and commit
  time stamp of every source transaction can be configured on a
  per-transaction basis using
  <link linkend="pg-replication-origin-xact-setup"><function>pg_replication_origin_xact_setup()</function></link>.
  If that is done, replication progress is persisted in a crash-safe
  manner. Replay progress for all replication origins can be seen in the
  <link linkend="view-pg-replication-origin-status"><structname>pg_replication_origin_status</structname></link>
  view. An individual origin's progress, e.g., when resuming
  replication, can be acquired using
  <link linkend="pg-replication-origin-progress"><function>pg_replication_origin_progress()</function></link>
  for any origin or
  <link linkend="pg-replication-origin-session-progress"><function>pg_replication_origin_session_progress()</function></link>
  for the origin configured in the current session.
 </para>
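
 <para>
  A rough sketch of how an apply process might use these functions is shown
  below. The origin name, the table <literal>replicated_tbl</literal>,
  and the LSN and time stamp values are placeholders. A real apply process
  would take the LSN and commit time stamp from the source transaction it
  is replaying.
 </para>
<programlisting>
-- One-time setup: create the origin (hypothetical name).
SELECT pg_replication_origin_create('myrepl_node_a');

-- Mark this session as replaying changes from that origin.
SELECT pg_replication_origin_session_setup('myrepl_node_a');

-- Replay one source transaction, recording its source LSN and
-- commit time stamp (both placeholder values here).
BEGIN;
SELECT pg_replication_origin_xact_setup('0/16B3748', '2024-01-01 12:00:00+00');
INSERT INTO replicated_tbl VALUES (1, 'replayed row');  -- placeholder change
COMMIT;

-- Progress of all origins.
SELECT local_id, external_id, remote_lsn, local_lsn
  FROM pg_replication_origin_status;

-- Progress of a single origin. Passing true asks that the corresponding
-- local transaction be guaranteed to have been flushed to disk.
SELECT pg_replication_origin_progress('myrepl_node_a', true);
SELECT pg_replication_origin_session_progress(true);

-- Detach the origin from the session when done.
SELECT pg_replication_origin_session_reset();
</programlisting>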

 <para>
  In replication topologies more complex than replication from exactly one
  system to one other system, another problem can be that it is hard to avoid
  replicating replayed rows again. That can lead both to cycles in the
  replication and to inefficiencies. Replication origins provide an optional
  mechanism to recognize and prevent that. When a session has been set up using
  <function>pg_replication_origin_session_setup()</function>, every change and
  transaction that it generates and that is passed to
  output plugin callbacks (see <xref linkend="logicaldecoding-output-plugin"/>)
  is tagged with the session's replication origin.
  This allows treating them differently in the output
  plugin, e.g., ignoring all but locally-originating rows.  Additionally,
  the <link linkend="logicaldecoding-output-plugin-filter-origin"><function>filter_by_origin_cb</function></link>
  callback can be used
  to filter the logical decoding change stream based on the
  source. While less flexible, filtering via that callback is
  considerably more efficient than doing it in the plugin's other callbacks.
 </para>
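
 <para>
  The <filename>test_decoding</filename> example plugin can be used to
  observe this filtering. The sketch below assumes that
  <varname>wal_level</varname> is set to <literal>logical</literal>, that
  <filename>test_decoding</filename> is installed, and that the hypothetical
  slot name <literal>demo_slot</literal> is free. The plugin's
  <literal>only-local</literal> option makes it filter out changes that
  carry a replication origin.
 </para>
<programlisting>
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- ... replay some transactions in a session configured with
-- pg_replication_origin_session_setup(), and make some local changes ...

-- Decode only changes that did not come from a replication origin.
SELECT data
  FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL,
                                   'only-local', '1');

SELECT pg_drop_replication_slot('demo_slot');
</programlisting>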
</chapter>