Skip to content

Commit 7caf3d4

Browse files
Álvaro HerreraCommitfest Bot
authored andcommitted
Add REPACK command
REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single command. Because this functionality is completely different from regular VACUUM, having it separate from VACUUM makes it easier for users to understand; as for CLUSTER, the term is heavily overloaded in the IT world and even in Postgres itself, so it's good that we can avoid it. This also adds pg_repackdb, a new utility that can invoke the new commands. This is heavily based on vacuumdb. Author: Antonin Houska <[email protected]> Co-authored-by: Álvaro Herrera <[email protected]> Reviewed-by: Mihail Nikalayeu <[email protected]> Reviewed-by: Robert Treat <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Matheus Alcantara <[email protected]> Reviewed-by: Junwang Zhao <[email protected]> Reviewed-by: jian he <[email protected]> Discussion: https://postgr.es/m/82651.1720540558@antos Discussion: https://postgr.es/m/[email protected]
1 parent 35e53b6 commit 7caf3d4

File tree

33 files changed

+2481
-543
lines changed

33 files changed

+2481
-543
lines changed

doc/src/sgml/monitoring.sgml

Lines changed: 222 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -405,6 +405,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
405405
</entry>
406406
</row>
407407

408+
<row>
409+
<entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
410+
<entry>One row for each backend running
411+
<command>REPACK</command>, showing current progress. See
412+
<xref linkend="repack-progress-reporting"/>.
413+
</entry>
414+
</row>
415+
408416
<row>
409417
<entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
410418
<entry>One row for each WAL sender process streaming a base backup,
@@ -5575,7 +5583,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
55755583
certain commands during command execution. Currently, the only commands
55765584
which support progress reporting are <command>ANALYZE</command>,
55775585
<command>CLUSTER</command>,
5578-
<command>CREATE INDEX</command>, <command>VACUUM</command>,
5586+
<command>CREATE INDEX</command>, <command>REPACK</command>,
5587+
<command>VACUUM</command>,
55795588
<command>COPY</command>,
55805589
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
55815590
command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -6034,6 +6043,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
60346043
</table>
60356044
</sect2>
60366045

6046+
<sect2 id="repack-progress-reporting">
6047+
<title>REPACK Progress Reporting</title>
6048+
6049+
<indexterm>
6050+
<primary>pg_stat_progress_repack</primary>
6051+
</indexterm>
6052+
6053+
<para>
6054+
Whenever <command>REPACK</command> is running,
6055+
the <structname>pg_stat_progress_repack</structname> view will contain a
6056+
row for each backend that is currently running the command. The tables
6057+
below describe the information that will be reported and provide
6058+
information about how to interpret it.
6059+
</para>
6060+
6061+
<table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
6062+
<title><structname>pg_stat_progress_repack</structname> View</title>
6063+
<tgroup cols="1">
6064+
<thead>
6065+
<row>
6066+
<entry role="catalog_table_entry"><para role="column_definition">
6067+
Column Type
6068+
</para>
6069+
<para>
6070+
Description
6071+
</para></entry>
6072+
</row>
6073+
</thead>
6074+
6075+
<tbody>
6076+
<row>
6077+
<entry role="catalog_table_entry"><para role="column_definition">
6078+
<structfield>pid</structfield> <type>integer</type>
6079+
</para>
6080+
<para>
6081+
Process ID of backend.
6082+
</para></entry>
6083+
</row>
6084+
6085+
<row>
6086+
<entry role="catalog_table_entry"><para role="column_definition">
6087+
<structfield>datid</structfield> <type>oid</type>
6088+
</para>
6089+
<para>
6090+
OID of the database to which this backend is connected.
6091+
</para></entry>
6092+
</row>
6093+
6094+
<row>
6095+
<entry role="catalog_table_entry"><para role="column_definition">
6096+
<structfield>datname</structfield> <type>name</type>
6097+
</para>
6098+
<para>
6099+
Name of the database to which this backend is connected.
6100+
</para></entry>
6101+
</row>
6102+
6103+
<row>
6104+
<entry role="catalog_table_entry"><para role="column_definition">
6105+
<structfield>relid</structfield> <type>oid</type>
6106+
</para>
6107+
<para>
6108+
OID of the table being repacked.
6109+
</para></entry>
6110+
</row>
6111+
6112+
<row>
6113+
<entry role="catalog_table_entry"><para role="column_definition">
6114+
<structfield>phase</structfield> <type>text</type>
6115+
</para>
6116+
<para>
6117+
Current processing phase. See <xref linkend="repack-phases"/>.
6118+
</para></entry>
6119+
</row>
6120+
6121+
<row>
6122+
<entry role="catalog_table_entry"><para role="column_definition">
6123+
<structfield>repack_index_relid</structfield> <type>oid</type>
6124+
</para>
6125+
<para>
6126+
If the table is being scanned using an index, this is the OID of the
6127+
index being used; otherwise, it is zero.
6128+
</para></entry>
6129+
</row>
6130+
6131+
<row>
6132+
<entry role="catalog_table_entry"><para role="column_definition">
6133+
<structfield>heap_tuples_scanned</structfield> <type>bigint</type>
6134+
</para>
6135+
<para>
6136+
Number of heap tuples scanned.
6137+
This counter only advances when the phase is
6138+
<literal>seq scanning heap</literal>,
6139+
<literal>index scanning heap</literal>
6140+
or <literal>writing new heap</literal>.
6141+
</para></entry>
6142+
</row>
6143+
6144+
<row>
6145+
<entry role="catalog_table_entry"><para role="column_definition">
6146+
<structfield>heap_tuples_written</structfield> <type>bigint</type>
6147+
</para>
6148+
<para>
6149+
Number of heap tuples written.
6150+
This counter only advances when the phase is
6151+
<literal>seq scanning heap</literal>,
6152+
<literal>index scanning heap</literal>
6153+
or <literal>writing new heap</literal>.
6154+
</para></entry>
6155+
</row>
6156+
6157+
<row>
6158+
<entry role="catalog_table_entry"><para role="column_definition">
6159+
<structfield>heap_blks_total</structfield> <type>bigint</type>
6160+
</para>
6161+
<para>
6162+
Total number of heap blocks in the table. This number is reported
6163+
as of the beginning of <literal>seq scanning heap</literal>.
6164+
</para></entry>
6165+
</row>
6166+
6167+
<row>
6168+
<entry role="catalog_table_entry"><para role="column_definition">
6169+
<structfield>heap_blks_scanned</structfield> <type>bigint</type>
6170+
</para>
6171+
<para>
6172+
Number of heap blocks scanned. This counter only advances when the
6173+
phase is <literal>seq scanning heap</literal>.
6174+
</para></entry>
6175+
</row>
6176+
6177+
<row>
6178+
<entry role="catalog_table_entry"><para role="column_definition">
6179+
<structfield>index_rebuild_count</structfield> <type>bigint</type>
6180+
</para>
6181+
<para>
6182+
Number of indexes rebuilt. This counter only advances when the phase
6183+
is <literal>rebuilding index</literal>.
6184+
</para></entry>
6185+
</row>
6186+
</tbody>
6187+
</tgroup>
6188+
</table>
6189+
6190+
<table id="repack-phases">
6191+
<title>REPACK Phases</title>
6192+
<tgroup cols="2">
6193+
<colspec colname="col1" colwidth="1*"/>
6194+
<colspec colname="col2" colwidth="2*"/>
6195+
<thead>
6196+
<row>
6197+
<entry>Phase</entry>
6198+
<entry>Description</entry>
6199+
</row>
6200+
</thead>
6201+
6202+
<tbody>
6203+
<row>
6204+
<entry><literal>initializing</literal></entry>
6205+
<entry>
6206+
The command is preparing to begin scanning the heap. This phase is
6207+
expected to be very brief.
6208+
</entry>
6209+
</row>
6210+
<row>
6211+
<entry><literal>seq scanning heap</literal></entry>
6212+
<entry>
6213+
The command is currently scanning the table using a sequential scan.
6214+
</entry>
6215+
</row>
6216+
<row>
6217+
<entry><literal>index scanning heap</literal></entry>
6218+
<entry>
6219+
<command>REPACK</command> is currently scanning the table using an index scan.
6220+
</entry>
6221+
</row>
6222+
<row>
6223+
<entry><literal>sorting tuples</literal></entry>
6224+
<entry>
6225+
<command>REPACK</command> is currently sorting tuples.
6226+
</entry>
6227+
</row>
6228+
<row>
6229+
<entry><literal>writing new heap</literal></entry>
6230+
<entry>
6231+
<command>REPACK</command> is currently writing the new heap.
6232+
</entry>
6233+
</row>
6234+
<row>
6235+
<entry><literal>swapping relation files</literal></entry>
6236+
<entry>
6237+
The command is currently swapping newly-built files into place.
6238+
</entry>
6239+
</row>
6240+
<row>
6241+
<entry><literal>rebuilding index</literal></entry>
6242+
<entry>
6243+
The command is currently rebuilding an index.
6244+
</entry>
6245+
</row>
6246+
<row>
6247+
<entry><literal>performing final cleanup</literal></entry>
6248+
<entry>
6249+
The command is performing final cleanup. When this phase is
6250+
completed, <command>REPACK</command> will end.
6251+
</entry>
6252+
</row>
6253+
</tbody>
6254+
</tgroup>
6255+
</table>
6256+
</sect2>
6257+
60376258
<sect2 id="copy-progress-reporting">
60386259
<title>COPY Progress Reporting</title>
60396260

doc/src/sgml/ref/allfiles.sgml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
167167
<!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
168168
<!ENTITY reindex SYSTEM "reindex.sgml">
169169
<!ENTITY releaseSavepoint SYSTEM "release_savepoint.sgml">
170+
<!ENTITY repack SYSTEM "repack.sgml">
170171
<!ENTITY reset SYSTEM "reset.sgml">
171172
<!ENTITY revoke SYSTEM "revoke.sgml">
172173
<!ENTITY rollback SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
212213
<!ENTITY pgIsready SYSTEM "pg_isready.sgml">
213214
<!ENTITY pgReceivewal SYSTEM "pg_receivewal.sgml">
214215
<!ENTITY pgRecvlogical SYSTEM "pg_recvlogical.sgml">
216+
<!ENTITY pgRepackdb SYSTEM "pg_repackdb.sgml">
215217
<!ENTITY pgResetwal SYSTEM "pg_resetwal.sgml">
216218
<!ENTITY pgRestore SYSTEM "pg_restore.sgml">
217219
<!ENTITY pgRewind SYSTEM "pg_rewind.sgml">

doc/src/sgml/ref/cluster.sgml

Lines changed: 4 additions & 93 deletions
Original file line numberDiff line numberDiff line change
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
3333
<title>Description</title>
3434

3535
<para>
36-
<command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
37-
to cluster the table specified
38-
by <replaceable class="parameter">table_name</replaceable>
39-
based on the index specified by
40-
<replaceable class="parameter">index_name</replaceable>. The index must
41-
already have been defined on
42-
<replaceable class="parameter">table_name</replaceable>.
36+
The <command>CLUSTER</command> command is equivalent to
37+
<xref linkend="sql-repack"/> with an <literal>USING INDEX</literal>
38+
clause. See there for more details.
4339
</para>
4440

45-
<para>
46-
When a table is clustered, it is physically reordered
47-
based on the index information. Clustering is a one-time operation:
48-
when the table is subsequently updated, the changes are
49-
not clustered. That is, no attempt is made to store new or
50-
updated rows according to their index order. (If one wishes, one can
51-
periodically recluster by issuing the command again. Also, setting
52-
the table's <literal>fillfactor</literal> storage parameter to less than
53-
100% can aid in preserving cluster ordering during updates, since updated
54-
rows are kept on the same page if enough space is available there.)
55-
</para>
56-
57-
<para>
58-
When a table is clustered, <productname>PostgreSQL</productname>
59-
remembers which index it was clustered by. The form
60-
<command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
61-
reclusters the table using the same index as before. You can also
62-
use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
63-
forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
64-
future cluster operations, or to clear any previous setting.
65-
</para>
66-
67-
<para>
68-
<command>CLUSTER</command> without a
69-
<replaceable class="parameter">table_name</replaceable> reclusters all the
70-
previously-clustered tables in the current database that the calling user
71-
has privileges for. This form of <command>CLUSTER</command> cannot be
72-
executed inside a transaction block.
73-
</para>
41+
<!-- Do we need to describe exactly which options map to what? They seem obvious to me. -->
7442

75-
<para>
76-
When a table is being clustered, an <literal>ACCESS
77-
EXCLUSIVE</literal> lock is acquired on it. This prevents any other
78-
database operations (both reads and writes) from operating on the
79-
table until the <command>CLUSTER</command> is finished.
80-
</para>
8143
</refsect1>
8244

8345
<refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
13698
on the table.
13799
</para>
138100

139-
<para>
140-
In cases where you are accessing single rows randomly
141-
within a table, the actual order of the data in the
142-
table is unimportant. However, if you tend to access some
143-
data more than others, and there is an index that groups
144-
them together, you will benefit from using <command>CLUSTER</command>.
145-
If you are requesting a range of indexed values from a table, or a
146-
single indexed value that has multiple rows that match,
147-
<command>CLUSTER</command> will help because once the index identifies the
148-
table page for the first row that matches, all other rows
149-
that match are probably already on the same table page,
150-
and so you save disk accesses and speed up the query.
151-
</para>
152-
153-
<para>
154-
<command>CLUSTER</command> can re-sort the table using either an index scan
155-
on the specified index, or (if the index is a b-tree) a sequential
156-
scan followed by sorting. It will attempt to choose the method that
157-
will be faster, based on planner cost parameters and available statistical
158-
information.
159-
</para>
160-
161101
<para>
162102
While <command>CLUSTER</command> is running, the <xref
163103
linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
164104
pg_temp</literal>.
165105
</para>
166106

167-
<para>
168-
When an index scan is used, a temporary copy of the table is created that
169-
contains the table data in the index order. Temporary copies of each
170-
index on the table are created as well. Therefore, you need free space on
171-
disk at least equal to the sum of the table size and the index sizes.
172-
</para>
173-
174-
<para>
175-
When a sequential scan and sort is used, a temporary sort file is
176-
also created, so that the peak temporary space requirement is as much
177-
as double the table size, plus the index sizes. This method is often
178-
faster than the index scan method, but if the disk space requirement is
179-
intolerable, you can disable this choice by temporarily setting <xref
180-
linkend="guc-enable-sort"/> to <literal>off</literal>.
181-
</para>
182-
183-
<para>
184-
It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
185-
a reasonably large value (but not more than the amount of RAM you can
186-
dedicate to the <command>CLUSTER</command> operation) before clustering.
187-
</para>
188-
189-
<para>
190-
Because the planner records statistics about the ordering of
191-
tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
192-
on the newly clustered table.
193-
Otherwise, the planner might make poor choices of query plans.
194-
</para>
195-
196107
<para>
197108
Because <command>CLUSTER</command> remembers which indexes are clustered,
198109
one can cluster the tables one wants clustered manually the first time,

doc/src/sgml/ref/clusterdb.sgml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,11 @@ PostgreSQL documentation
6464
this utility and via other methods for accessing the server.
6565
</para>
6666

67+
<para>
68+
<application>clusterdb</application> has been superceded by
69+
<application>pg_repackdb</application>.
70+
</para>
71+
6772
</refsect1>
6873

6974

0 commit comments

Comments
 (0)