Commit 1e417c3

Commitfest Bot committed
[CF 5323] v20251105 - Support enabling checksums online
This branch was automatically generated by a robot using patches from an email thread registered at:
https://commitfest.postgresql.org/patch/5323

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/[email protected]
Author(s): Magnus Hagander, Daniel Gustafsson
2 parents a4fd971 + aeb9712 commit 1e417c3


70 files changed: 4841 lines added, 51 lines removed

doc/src/sgml/func/func-admin.sgml

Lines changed: 71 additions & 0 deletions
@@ -2979,4 +2979,75 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
 </sect2>
 
+<sect2 id="functions-admin-checksum">
+<title>Data Checksum Functions</title>
+
+<para>
+The functions shown in <xref linkend="functions-checksums-table" /> can
+be used to enable or disable data checksums in a running cluster.
+See <xref linkend="checksums" /> for details.
+</para>
+
+<table id="functions-checksums-table">
+<title>Data Checksum Functions</title>
+<tgroup cols="1">
+<thead>
+<row>
+<entry role="func_table_entry"><para role="func_signature">
+Function
+</para>
+<para>
+Description
+</para></entry>
+</row>
+</thead>
+
+<tbody>
+<row>
+<entry role="func_table_entry"><para role="func_signature">
+<indexterm>
+<primary>pg_enable_data_checksums</primary>
+</indexterm>
+<function>pg_enable_data_checksums</function> ( <optional><parameter>cost_delay</parameter> <type>int</type>, <parameter>cost_limit</parameter> <type>int</type></optional> )
+<returnvalue>void</returnvalue>
+</para>
+<para>
+Initiates the process of enabling data checksums for the cluster. This
+switches the data checksums mode to <literal>inprogress-on</literal> and
+starts a background worker that processes all data pages in the cluster
+and sets checksums on them. Once checksums have been set on all data
+pages, the cluster automatically switches the data checksums mode to
+<literal>on</literal>.
+</para>
+<para>
+If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
+specified, the processing is throttled using the same principles as
+<link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
+</para></entry>
+</row>
+
+<row>
+<entry role="func_table_entry"><para role="func_signature">
+<indexterm>
+<primary>pg_disable_data_checksums</primary>
+</indexterm>
+<function>pg_disable_data_checksums</function> ()
+<returnvalue>void</returnvalue>
+</para>
+<para>
+Disables data checksum validation and calculation for the cluster. This
+switches the data checksums mode to <literal>inprogress-off</literal>
+while data checksums are being disabled. Once all active backends have
+stopped validating data checksums, the data checksums mode is changed to
+<literal>off</literal>. At that point the data pages still contain the
+checksums written earlier, but these are no longer updated when pages are
+modified.
+</para></entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+
+</sect2>
+
 </sect1>
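The mode transitions described for `pg_enable_data_checksums` and `pg_disable_data_checksums`, together with the cost-based throttling borrowed from Cost-based Vacuum Delay, can be illustrated with a small model. This is a hypothetical Python sketch, not PostgreSQL code: the class `ChecksumModel`, the cost of 1 unit per page, treating `cost_delay` as milliseconds, and the injected `sleep` function are all assumptions. It only follows the sequence the documentation states (`off`, `inprogress-on`, `on` when enabling; `on`, `inprogress-off`, `off` when disabling) and the vacuum-delay principle of sleeping once the accumulated cost reaches the limit.

```python
from typing import Callable, List, Optional


class ChecksumModel:
    """Toy model of the data checksums modes (hypothetical, not PostgreSQL code)."""

    def __init__(self, sleep: Callable[[float], None]) -> None:
        self.mode = "off"
        self.history = [self.mode]
        self.sleep = sleep  # injected so the throttling can be observed

    def _set_mode(self, mode: str) -> None:
        self.mode = mode
        self.history.append(mode)

    def enable(self, pages: List[int], cost_delay: Optional[int] = None,
               cost_limit: Optional[int] = None) -> List[int]:
        """Go off -> inprogress-on, checksum every page, then switch to on."""
        if self.mode != "off":
            raise ValueError(f"sketch only enables from 'off', not {self.mode!r}")
        self._set_mode("inprogress-on")
        balance = 0
        done = []
        for page in pages:
            done.append(page)  # stand-in for computing and writing a checksum
            if cost_delay is None or cost_limit is None:
                continue  # assumption: unthrottled unless both are given
            balance += 1  # assumption: every page costs 1 unit
            if balance >= cost_limit:
                # Cost-based delay principle: once the accumulated cost
                # reaches the limit, sleep cost_delay ms and reset the counter.
                self.sleep(cost_delay / 1000.0)
                balance = 0
        self._set_mode("on")
        return done

    def disable(self, validating_backends: List[str]) -> None:
        """Go on -> inprogress-off, wait for backends, then switch to off."""
        if self.mode != "on":
            raise ValueError(f"sketch only disables from 'on', not {self.mode!r}")
        self._set_mode("inprogress-off")
        while validating_backends:
            # Stand-in for waiting until each backend stops validating.
            validating_backends.pop()
        self._set_mode("off")


if __name__ == "__main__":
    sleeps: List[float] = []
    model = ChecksumModel(sleep=sleeps.append)
    model.enable(list(range(10)), cost_delay=20, cost_limit=4)
    model.disable(["backend-a", "backend-b"])
    print(sleeps)         # [0.02, 0.02]
    print(model.history)  # ['off', 'inprogress-on', 'on', 'inprogress-off', 'off']
```

With ten pages and a limit of 4, the model sleeps twice (after the fourth and eighth page); the two remaining pages never reach the limit. Injecting `sleep` instead of calling `time.sleep` directly keeps the sketch deterministic.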

doc/src/sgml/glossary.sgml

Lines changed: 23 additions & 0 deletions
@@ -184,6 +184,8 @@
 (but not the autovacuum workers),
 the <glossterm linkend="glossary-background-writer">background writer</glossterm>,
 the <glossterm linkend="glossary-checkpointer">checkpointer</glossterm>,
+the <glossterm linkend="glossary-data-checksums-worker">data checksums worker</glossterm>,
+the <glossterm linkend="glossary-data-checksums-worker-launcher">data checksums worker launcher</glossterm>,
 the <glossterm linkend="glossary-logger">logger</glossterm>,
 the <glossterm linkend="glossary-startup-process">startup process</glossterm>,
 the <glossterm linkend="glossary-wal-archiver">WAL archiver</glossterm>,
@@ -573,6 +575,27 @@
 </glossdef>
 </glossentry>
 
+<glossentry id="glossary-data-checksums-worker">
+<glossterm>Data Checksums Worker</glossterm>
+<glossdef>
+<para>
+An <glossterm linkend="glossary-auxiliary-proc">auxiliary process</glossterm>
+which enables or disables data checksums in a specific database.
+</para>
+</glossdef>
+</glossentry>
+
+<glossentry id="glossary-data-checksums-worker-launcher">
+<glossterm>Data Checksums Worker Launcher</glossterm>
+<glossdef>
+<para>
+An <glossterm linkend="glossary-auxiliary-proc">auxiliary process</glossterm>
+which starts a <glossterm linkend="glossary-data-checksums-worker">data checksums worker</glossterm>
+process for each database.
+</para>
+</glossdef>
+</glossentry>
+
 <glossentry id="glossary-db-cluster">
 <glossterm>Database cluster</glossterm>
 <glossdef>

doc/src/sgml/monitoring.sgml

Lines changed: 204 additions & 4 deletions
@@ -3551,8 +3551,9 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 </para>
 <para>
 Number of data page checksum failures detected in this
-database (or on a shared object), or NULL if data checksums are
-disabled.
+database (or on a shared object).
+Detected failures are reported regardless of the
+<xref linkend="guc-data-checksums"/> setting.
 </para></entry>
 </row>
 
@@ -3562,8 +3563,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 </para>
 <para>
 Time at which the last data page checksum failure was detected in
-this database (or on a shared object), or NULL if data checksums are
-disabled.
+this database (or on a shared object). The last failure is reported
+regardless of the <xref linkend="guc-data-checksums"/> setting.
 </para></entry>
 </row>
 
@@ -6946,6 +6947,205 @@ FROM pg_stat_get_backend_idset() AS backendid;
 
 </sect2>
 
+<sect2 id="data-checksum-progress-reporting">
+<title>Data Checksum Progress Reporting</title>
+
+<indexterm>
+<primary>pg_stat_progress_data_checksums</primary>
+</indexterm>
+
+<para>
+When data checksums are being enabled on a running cluster, the
+<structname>pg_stat_progress_data_checksums</structname> view will contain
+one row for the launcher process, and one row for each worker process that
+is currently calculating checksums for the data pages in one database.
+</para>
+
+<table id="pg-stat-progress-data-checksums-view" xreflabel="pg_stat_progress_data_checksums">
+<title><structname>pg_stat_progress_data_checksums</structname> View</title>
+<tgroup cols="1">
+<thead>
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+Column Type
+</para>
+<para>
+Description
+</para>
+</entry>
+</row>
+</thead>
+
+<tbody>
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>pid</structfield> <type>integer</type>
+</para>
+<para>
+Process ID of the data checksums launcher or worker process.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>datid</structfield> <type>oid</type>
+</para>
+<para>
+OID of the database being processed, or 0 for the launcher
+process.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry"><para role="column_definition">
+<structfield>datname</structfield> <type>name</type>
+</para>
+<para>
+Name of the database being processed, or <literal>NULL</literal> for the
+launcher process.
+</para></entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>phase</structfield> <type>text</type>
+</para>
+<para>
+Current processing phase. See <xref linkend="datachecksum-phases"/>
+for a description of the phases.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>databases_total</structfield> <type>integer</type>
+</para>
+<para>
+The total number of databases that will be processed. Only the
+launcher process sets this value; for the worker processes it is
+<literal>NULL</literal>.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>databases_done</structfield> <type>integer</type>
+</para>
+<para>
+The number of databases that have been processed. Only the
+launcher process sets this value; for the worker processes it is
+<literal>NULL</literal>.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>relations_total</structfield> <type>integer</type>
+</para>
+<para>
+The total number of relations that will be processed, or
+<literal>NULL</literal> if the data checksums worker process hasn't
+calculated the number of relations yet. This is
+<literal>NULL</literal> for the launcher process.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>relations_done</structfield> <type>integer</type>
+</para>
+<para>
+The number of relations that have been processed. This is
+<literal>NULL</literal> for the launcher process.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>blocks_total</structfield> <type>integer</type>
+</para>
+<para>
+The number of blocks in the current relation that will be processed,
+or <literal>NULL</literal> if the data checksums worker process hasn't
+calculated the number of blocks yet. This is
+<literal>NULL</literal> for the launcher process.
+</para>
+</entry>
+</row>
+
+<row>
+<entry role="catalog_table_entry">
+<para role="column_definition">
+<structfield>blocks_done</structfield> <type>integer</type>
+</para>
+<para>
+The number of blocks in the current relation that have been processed.
+This is <literal>NULL</literal> for the launcher process.
+</para>
+</entry>
+</row>
+
+</tbody>
+</tgroup>
+</table>
+
+<table id="datachecksum-phases">
+<title>Data Checksum Phases</title>
+<tgroup cols="2">
+<colspec colname="col1" colwidth="1*"/>
+<colspec colname="col2" colwidth="2*"/>
+<thead>
+<row>
+<entry>Phase</entry>
+<entry>Description</entry>
+</row>
+</thead>
+<tbody>
+<row>
+<entry><literal>enabling</literal></entry>
+<entry>
+The command is currently enabling data checksums on the cluster.
+</entry>
+</row>
+<row>
+<entry><literal>disabling</literal></entry>
+<entry>
+The command is currently disabling data checksums on the cluster.
+</entry>
+</row>
+<row>
+<entry><literal>waiting on temporary tables</literal></entry>
+<entry>
+The command is currently waiting for all temporary tables that existed
+when the command was started to be removed.
+</entry>
+</row>
+<row>
+<entry><literal>waiting on checkpoint</literal></entry>
+<entry>
+The command is currently waiting for a checkpoint to update the checksum
+state before finishing.
+</entry>
+</row>
+</tbody>
+</tgroup>
+</table>
+</sect2>
+
 </sect1>
 
 <sect1 id="dynamic-trace">
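Because the launcher row carries only `databases_total` and `databases_done`, while each worker row carries the relation and block counters (any of which may still be `NULL`), a monitoring script has to treat the two kinds of rows differently. The Python sketch below assumes the rows of `pg_stat_progress_data_checksums` have already been fetched with some client and converted to dictionaries keyed by the documented column names, with SQL `NULL` mapped to `None`. The helpers `fmt` and `summarize_progress`, the output format, and the sample PIDs and OIDs are hypothetical; the launcher is identified by `datid = 0`, as the `datid` description states.

```python
from typing import Any, Dict, List, Optional

# One row of pg_stat_progress_data_checksums as a dict; SQL NULL -> None.
Row = Dict[str, Any]


def fmt(done: Optional[int], total: Optional[int]) -> str:
    """Format done/total as a percentage, or 'n/a' while a counter is NULL."""
    if done is None or total is None or total == 0:
        return "n/a"
    return f"{100.0 * done / total:.1f}%"


def summarize_progress(rows: List[Row]) -> List[str]:
    """Build one status line per launcher or worker row (hypothetical helper)."""
    lines = []
    for row in rows:
        if row["datid"] == 0:
            # Launcher row: only the database counters are populated.
            lines.append(
                f"launcher pid={row['pid']} phase={row['phase']} "
                f"databases={fmt(row['databases_done'], row['databases_total'])}"
            )
        else:
            # Worker row: relation and block counters for one database.
            lines.append(
                f"worker pid={row['pid']} db={row['datname']} phase={row['phase']} "
                f"relations={fmt(row['relations_done'], row['relations_total'])} "
                f"blocks={fmt(row['blocks_done'], row['blocks_total'])}"
            )
    return lines


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"pid": 4242, "datid": 0, "datname": None, "phase": "enabling",
         "databases_total": 3, "databases_done": 1,
         "relations_total": None, "relations_done": None,
         "blocks_total": None, "blocks_done": None},
        {"pid": 4243, "datid": 16384, "datname": "app", "phase": "enabling",
         "databases_total": None, "databases_done": None,
         "relations_total": 8, "relations_done": 2,
         "blocks_total": 1000, "blocks_done": 250},
    ]
    for line in summarize_progress(sample):
        print(line)
    # launcher pid=4242 phase=enabling databases=33.3%
    # worker pid=4243 db=app phase=enabling relations=25.0% blocks=25.0%
```

Returning `n/a` rather than raising keeps the summary usable during the window in which a worker has not yet counted its relations or blocks.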

doc/src/sgml/ref/pg_checksums.sgml

Lines changed: 6 additions & 0 deletions
@@ -45,6 +45,12 @@ PostgreSQL documentation
 exit status is nonzero if the operation failed.
 </para>
 
+<para>
+When enabling checksums, if checksums were in the process of being enabled
+online when the cluster was shut down, <application>pg_checksums</application>
+still processes all relations, regardless of how far the online processing had progressed.
+</para>
+
 <para>
 When verifying checksums, every file in the cluster is scanned. When
 enabling checksums, each relation file block with a changed checksum is

doc/src/sgml/regress.sgml

Lines changed: 12 additions & 0 deletions
@@ -263,6 +263,18 @@ make check-world PG_TEST_EXTRA='kerberos ldap ssl load_balance libpq_encryption'
 </programlisting>
 The following values are currently supported:
 <variablelist>
+<varlistentry>
+<term><literal>checksum_extended</literal></term>
+<listitem>
+<para>
+Runs additional tests for enabling data checksums that inject delays
+and retries into the processing, as well as tests that run pgbench
+concurrently while randomly restarting the cluster. Some of these test
+suites require injection points to be enabled in the installation.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry>
 <term><literal>kerberos</literal></term>
 <listitem>
