Skip to content

Commit 447aae1

Browse files
committed
Implement WAIT FOR command
WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdsjtZLVzxjGT8rJHCYbM0D5dwkO+BBjcirozJ6nYbOW8Q@mail.gmail.com Discussion: https://www.postgresql.org/message-id/flat/CABPTF7UNft368x-RgOXkfj475OwEbp%2BVVO-wEXz7StgjD_%3D6sw%40mail.gmail.com Author: Kartyshov Ivan <[email protected]> Author: Alexander Korotkov <[email protected]> Author: Xuneng Zhou <[email protected]> Reviewed-by: Michael Paquier <[email protected]> Reviewed-by: Peter Eisentraut <[email protected]> Reviewed-by: Dilip Kumar <[email protected]> Reviewed-by: Amit Kapila <[email protected]> Reviewed-by: Alexander Lakhin <[email protected]> Reviewed-by: Bharath Rupireddy <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Reviewed-by: Kyotaro Horiguchi <[email protected]> Reviewed-by: jian he <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Reviewed-by: Xuneng Zhou <[email protected]>
1 parent 3b4e53a commit 447aae1

File tree

21 files changed

+931
-11
lines changed

21 files changed

+931
-11
lines changed

doc/src/sgml/high-availability.sgml

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1376,6 +1376,60 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
13761376
</sect3>
13771377
</sect2>
13781378

1379+
<sect2 id="read-your-writes-consistency">
1380+
<title>Read-Your-Writes Consistency</title>
1381+
1382+
<para>
1383+
In asynchronous replication, there is always a short window where changes
1384+
on the primary may not yet be visible on the standby due to replication
1385+
lag. This can lead to inconsistencies when an application writes data on
1386+
the primary and then immediately issues a read query on the standby.
1387+
However, it is possible to address this without switching to synchronous
1388+
replication.
1389+
</para>
1390+
1391+
<para>
1392+
To address this, PostgreSQL offers a mechanism for read-your-writes
1393+
consistency. The key idea is to ensure that a client sees its own writes
1394+
by synchronizing the WAL replay on the standby with the known point of
1395+
change on the primary.
1396+
</para>
1397+
1398+
<para>
1399+
This is achieved by the following steps. After performing write
1400+
operations, the application retrieves the current WAL location using a
1401+
function call like this.
1402+
1403+
<programlisting>
1404+
postgres=# SELECT pg_current_wal_insert_lsn();
1405+
pg_current_wal_insert_lsn
1406+
--------------------
1407+
0/306EE20
1408+
(1 row)
1409+
</programlisting>
1410+
</para>
1411+
1412+
<para>
1413+
The <acronym>LSN</acronym> obtained from the primary is then communicated
1414+
to the standby server. This can be managed at the application level or
1415+
via the connection pooler. On the standby, the application issues the
1416+
<xref linkend="sql-wait-for"/> command to block further processing until
1417+
the standby's WAL replay process reaches (or exceeds) the specified
1418+
<acronym>LSN</acronym>.
1419+
1420+
<programlisting>
1421+
postgres=# WAIT FOR LSN '0/306EE20';
1422+
status
1423+
--------
1424+
success
1425+
(1 row)
1426+
</programlisting>
1427+
Once the command returns a status of success, it guarantees that all
1428+
changes up to the provided <acronym>LSN</acronym> have been applied,
1429+
ensuring that subsequent read queries will reflect the latest updates.
1430+
</para>
1431+
</sect2>
1432+
13791433
<sect2 id="continuous-archiving-in-standby">
13801434
<title>Continuous Archiving in Standby</title>
13811435

doc/src/sgml/ref/allfiles.sgml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -188,6 +188,7 @@ Complete list of usable sgml source files in this directory.
188188
<!ENTITY update SYSTEM "update.sgml">
189189
<!ENTITY vacuum SYSTEM "vacuum.sgml">
190190
<!ENTITY values SYSTEM "values.sgml">
191+
<!ENTITY waitFor SYSTEM "wait_for.sgml">
191192

192193
<!-- applications and utilities -->
193194
<!ENTITY clusterdb SYSTEM "clusterdb.sgml">

doc/src/sgml/ref/wait_for.sgml

Lines changed: 234 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,234 @@
1+
<!--
2+
doc/src/sgml/ref/wait_for.sgml
3+
PostgreSQL documentation
4+
-->
5+
6+
<refentry id="sql-wait-for">
7+
<indexterm zone="sql-wait-for">
8+
<primary>WAIT FOR</primary>
9+
</indexterm>
10+
11+
<refmeta>
12+
<refentrytitle>WAIT FOR</refentrytitle>
13+
<manvolnum>7</manvolnum>
14+
<refmiscinfo>SQL - Language Statements</refmiscinfo>
15+
</refmeta>
16+
17+
<refnamediv>
18+
<refname>WAIT FOR</refname>
19+
<refpurpose>wait for target <acronym>LSN</acronym> to be replayed, optionally with a timeout</refpurpose>
20+
</refnamediv>
21+
22+
<refsynopsisdiv>
23+
<synopsis>
24+
WAIT FOR LSN '<replaceable class="parameter">lsn</replaceable>' [ WITH ( <replaceable class="parameter">option</replaceable> [, ...] ) ]
25+
26+
<phrase>where <replaceable class="parameter">option</replaceable> can be:</phrase>
27+
28+
TIMEOUT '<replaceable class="parameter">timeout</replaceable>'
29+
NO_THROW
30+
</synopsis>
31+
</refsynopsisdiv>
32+
33+
<refsect1>
34+
<title>Description</title>
35+
36+
<para>
37+
Waits until recovery replays <parameter>lsn</parameter>.
38+
If no <parameter>timeout</parameter> is specified or it is set to
39+
zero, this command waits indefinitely for the
40+
<parameter>lsn</parameter>.
41+
On timeout, or if the server is promoted before
42+
<parameter>lsn</parameter> is reached, an error is emitted,
43+
unless <literal>NO_THROW</literal> is specified in the WITH clause.
44+
If <parameter>NO_THROW</parameter> is specified, then the command
45+
doesn't throw errors.
46+
</para>
47+
48+
<para>
49+
The possible return values are <literal>success</literal>,
50+
<literal>timeout</literal>, and <literal>not in recovery</literal>.
51+
</para>
52+
</refsect1>
53+
54+
<refsect1>
55+
<title>Parameters</title>
56+
57+
<variablelist>
58+
<varlistentry>
59+
<term><replaceable class="parameter">lsn</replaceable></term>
60+
<listitem>
61+
<para>
62+
Specifies the target <acronym>LSN</acronym> to wait for.
63+
</para>
64+
</listitem>
65+
</varlistentry>
66+
67+
<varlistentry>
68+
<term><literal>WITH ( <replaceable class="parameter">option</replaceable> [, ...] )</literal></term>
69+
<listitem>
70+
<para>
71+
This clause specifies optional parameters for the wait operation.
72+
The following parameters are supported:
73+
74+
<variablelist>
75+
<varlistentry>
76+
<term><literal>TIMEOUT</literal> '<replaceable class="parameter">timeout</replaceable>'</term>
77+
<listitem>
78+
<para>
79+
When specified and <parameter>timeout</parameter> is greater than zero,
80+
the command waits until <parameter>lsn</parameter> is reached or
81+
the specified <parameter>timeout</parameter> has elapsed.
82+
</para>
83+
<para>
84+
The <parameter>timeout</parameter> might be given as integer number of
85+
milliseconds. Also it might be given as string literal with
86+
integer number of milliseconds or a number with unit
87+
(see <xref linkend="config-setting-names-values"/>).
88+
</para>
89+
</listitem>
90+
</varlistentry>
91+
92+
<varlistentry>
93+
<term><literal>NO_THROW</literal></term>
94+
<listitem>
95+
<para>
96+
Specify to not throw an error in the case of timeout or
97+
running on the primary. In this case the result status can be get from
98+
the return value.
99+
</para>
100+
</listitem>
101+
</varlistentry>
102+
</variablelist>
103+
</para>
104+
</listitem>
105+
</varlistentry>
106+
</variablelist>
107+
</refsect1>
108+
109+
<refsect1>
110+
<title>Outputs</title>
111+
112+
<variablelist>
113+
<varlistentry>
114+
<term><literal>success</literal></term>
115+
<listitem>
116+
<para>
117+
This return value denotes that we have successfully reached
118+
the target <parameter>lsn</parameter>.
119+
</para>
120+
</listitem>
121+
</varlistentry>
122+
123+
<varlistentry>
124+
<term><literal>timeout</literal></term>
125+
<listitem>
126+
<para>
127+
This return value denotes that the timeout happened before reaching
128+
the target <parameter>lsn</parameter>.
129+
</para>
130+
</listitem>
131+
</varlistentry>
132+
133+
<varlistentry>
134+
<term><literal>not in recovery</literal></term>
135+
<listitem>
136+
<para>
137+
This return value denotes that the database server is not in a recovery
138+
state. This might mean either the database server was not in recovery
139+
at the moment of receiving the command, or it was promoted before
140+
reaching the target <parameter>lsn</parameter>.
141+
</para>
142+
</listitem>
143+
</varlistentry>
144+
</variablelist>
145+
</refsect1>
146+
147+
<refsect1>
148+
<title>Notes</title>
149+
150+
<para>
151+
<command>WAIT FOR</command> command waits till
152+
<parameter>lsn</parameter> to be replayed on standby.
153+
That is, after this command execution, the value returned by
154+
<function>pg_last_wal_replay_lsn</function> should be greater or equal
155+
to the <parameter>lsn</parameter> value. This is useful to achieve
156+
read-your-writes-consistency, while using async replica for reads and
157+
primary for writes. In that case, the <acronym>lsn</acronym> of the last
158+
modification should be stored on the client application side or the
159+
connection pooler side.
160+
</para>
161+
162+
<para>
163+
<command>WAIT FOR</command> command should be called on standby.
164+
If a user runs <command>WAIT FOR</command> on primary, it
165+
will error out unless <parameter>NO_THROW</parameter> is specified in the WITH clause.
166+
However, if <command>WAIT FOR</command> is
167+
called on primary promoted from standby and <literal>lsn</literal>
168+
was already replayed, then the <command>WAIT FOR</command> command just
169+
exits immediately.
170+
</para>
171+
172+
</refsect1>
173+
174+
<refsect1>
175+
<title>Examples</title>
176+
177+
<para>
178+
You can use <command>WAIT FOR</command> command to wait for
179+
the <type>pg_lsn</type> value. For example, an application could update
180+
the <literal>movie</literal> table and get the <acronym>lsn</acronym> after
181+
changes just made. This example uses <function>pg_current_wal_insert_lsn</function>
182+
on primary server to get the <acronym>lsn</acronym> given that
183+
<varname>synchronous_commit</varname> could be set to
184+
<literal>off</literal>.
185+
186+
<programlisting>
187+
postgres=# UPDATE movie SET genre = 'Dramatic' WHERE genre = 'Drama';
188+
UPDATE 100
189+
postgres=# SELECT pg_current_wal_insert_lsn();
190+
pg_current_wal_insert_lsn
191+
--------------------
192+
0/306EE20
193+
(1 row)
194+
</programlisting>
195+
196+
Then an application could run <command>WAIT FOR</command>
197+
with the <parameter>lsn</parameter> obtained from primary. After that the
198+
changes made on primary should be guaranteed to be visible on replica.
199+
200+
<programlisting>
201+
postgres=# WAIT FOR LSN '0/306EE20';
202+
status
203+
--------
204+
success
205+
(1 row)
206+
postgres=# SELECT * FROM movie WHERE genre = 'Drama';
207+
genre
208+
-------
209+
(0 rows)
210+
</programlisting>
211+
</para>
212+
213+
<para>
214+
If the target LSN is not reached before the timeout, the error is thrown.
215+
216+
<programlisting>
217+
postgres=# WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '0.1s');
218+
ERROR: timed out while waiting for target LSN 0/306EE20 to be replayed; current replay LSN 0/306EA60
219+
</programlisting>
220+
</para>
221+
222+
<para>
223+
The same example uses <command>WAIT FOR</command> with
224+
<parameter>NO_THROW</parameter> option.
225+
<programlisting>
226+
postgres=# WAIT FOR LSN '0/306EE20' WITH (TIMEOUT '100ms', NO_THROW);
227+
status
228+
--------
229+
timeout
230+
(1 row)
231+
</programlisting>
232+
</para>
233+
</refsect1>
234+
</refentry>

doc/src/sgml/reference.sgml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -216,6 +216,7 @@
216216
&update;
217217
&vacuum;
218218
&values;
219+
&waitFor;
219220

220221
</reference>
221222

src/backend/access/transam/xact.c

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
#include "access/xloginsert.h"
3232
#include "access/xlogrecovery.h"
3333
#include "access/xlogutils.h"
34+
#include "access/xlogwait.h"
3435
#include "catalog/index.h"
3536
#include "catalog/namespace.h"
3637
#include "catalog/pg_enum.h"
@@ -2843,6 +2844,11 @@ AbortTransaction(void)
28432844
*/
28442845
LWLockReleaseAll();
28452846

2847+
/*
2848+
* Cleanup waiting for LSN if any.
2849+
*/
2850+
WaitLSNCleanup();
2851+
28462852
/* Clear wait information and command progress indicator */
28472853
pgstat_report_wait_end();
28482854
pgstat_progress_end_command();

src/backend/access/transam/xlog.c

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,7 @@
6262
#include "access/xlogreader.h"
6363
#include "access/xlogrecovery.h"
6464
#include "access/xlogutils.h"
65+
#include "access/xlogwait.h"
6566
#include "backup/basebackup.h"
6667
#include "catalog/catversion.h"
6768
#include "catalog/pg_control.h"
@@ -6227,6 +6228,12 @@ StartupXLOG(void)
62276228
UpdateControlFile();
62286229
LWLockRelease(ControlFileLock);
62296230

6231+
/*
6232+
* Wake up all waiters for replay LSN. They need to report an error that
6233+
* recovery was ended before reaching the target LSN.
6234+
*/
6235+
WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr);
6236+
62306237
/*
62316238
* Shutdown the recovery environment. This must occur after
62326239
* RecoverPreparedTransactions() (see notes in lock_twophase_recover())

src/backend/access/transam/xlogrecovery.c

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,7 @@
4040
#include "access/xlogreader.h"
4141
#include "access/xlogrecovery.h"
4242
#include "access/xlogutils.h"
43+
#include "access/xlogwait.h"
4344
#include "backup/basebackup.h"
4445
#include "catalog/pg_control.h"
4546
#include "commands/tablespace.h"
@@ -1838,6 +1839,16 @@ PerformWalRecovery(void)
18381839
break;
18391840
}
18401841

1842+
/*
1843+
* If we replayed an LSN that someone was waiting for then walk
1844+
* over the shared memory array and set latches to notify the
1845+
* waiters.
1846+
*/
1847+
if (waitLSNState &&
1848+
(XLogRecoveryCtl->lastReplayedEndRecPtr >=
1849+
pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_REPLAY])))
1850+
WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, XLogRecoveryCtl->lastReplayedEndRecPtr);
1851+
18411852
/* Else, try to fetch the next WAL record */
18421853
record = ReadRecord(xlogprefetcher, LOG, false, replayTLI);
18431854
} while (record != NULL);

src/backend/commands/Makefile

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ OBJS = \
6464
vacuum.o \
6565
vacuumparallel.o \
6666
variable.o \
67-
view.o
67+
view.o \
68+
wait.o
6869

6970
include $(top_srcdir)/src/backend/common.mk

src/backend/commands/meson.build

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,4 +53,5 @@ backend_sources += files(
5353
'vacuumparallel.c',
5454
'variable.c',
5555
'view.c',
56+
'wait.c',
5657
)

0 commit comments

Comments
 (0)