Waiting for PostgreSQL 18 – Non text modes for pg_dumpall, correspondingly change pg_restore

On 4th of April 2025, Andrew Dunstan committed patch:

Non text modes for pg_dumpall, correspondingly change pg_restore
 
pg_dumpall acquires a new -F/--format option, with the same meanings as
pg_dump. The default is p, meaning plain text. For any other value, a
directory is created containing two files, globals.data and map.dat. The
first contains SQL for restoring the global data, and the second
contains a map from oids to database names. It will also contain a
subdirectory called databases, inside which it will create archives in
the specified format, named using the database oids.
 
In these casess the -f argument is required.
 
If pg_restore encounters a directory containing globals.dat, and no
toc.dat, it restores the global settings and then restores each
database.
 
pg_restore acquires two new options: -g/--globals-only which suppresses
restoration of any databases, and --exclude-database which inhibits
restoration of particualr database(s) in the same way the same option
works in pg_dumpall.
 
Author: Mahendra Singh Thalor <[email protected]>
Co-authored-by:  Andrew Dunstan <[email protected]>
Reviewed-by: jian he <[email protected]>
Reviewed-by: Srinath Reddy <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/[email protected]

OMG! While it doesn't look like it given the delay between commit and me writing about it, I really can't contain my enthusiasm.

Up to this moment I really disliked pg_dumpall. The reasons are (were?) it's slowness and lack of ability to pick what I want to restore. This is (was?) due to the fact that dumpall simply ran, normal pg_dump (well, kinda) with plain text format, sequentially.

But now, in one HUGE step, it becomes actually amazing.

Let's get some comparison data. On my test system I have couple of DBs:

=$ select datname, pg_size_pretty( pg_database_size(oid) ) from pg_database order by pg_database_size(oid) desc;
    datname     │ pg_size_pretty
────────────────┼────────────────
 depesz_explain │ 7325 MB
 depesz         │ 238 MB
 pgdoc          │ 22 MB
 depesz_paste   │ 18 MB
 inst           │ 16 MB
 template1      │ 7774 kB
 pgdba          │ 7694 kB
 template0      │ 7694 kB
 postgres       │ 7694 kB
(9 rows)

One of them is significantly larger than others. Normal, old-style pg_dumpall:

=$ time pg_dumpall -f the.dump.base.sql
 
real    0m58.893s
user    0m2.892s
sys     0m28.562s
 
=$ du -h the.dump.base.sql
32G     the.dump.base.sql
 
=$ time gzip -vk the.dump.base.sql
the.dump.base.sql:       88.4% -- created the.dump.base.sql.gz
 
real    6m31.129s
user    6m8.112s
sys     0m21.083s
 
=$ du -h the.dump.base.sql.gz
3.7G    the.dump.base.sql.gz

So, dump all all DBs, to single file, took around a minute, but the file is HUGE – 32GB. Compressing took 6.5 minutes, and reduced the size to 3.7GB.

For completion sake, I also got old style dump, but with on-the-fly compression:

=$ time pg_dumpall | gzip - > the.dump.compress.pipe.sql.gz
 
real    6m27.469s
user    6m18.340s
sys     0m31.272s
 
=$ du -h the.dump.compress.pipe.sql.gz
3.7G    the.dump.compress.pipe.sql.gz

OK. All makes sense.

Now. Let's see what else can we do:

=$ time pg_dumpall -Fc -f the.dump.Fc.dump
 
real    5m55.864s
user    5m38.110s
sys     0m11.658s
 
=$ du -sh the.dump.Fc.dump
3.7G    the.dump.Fc.dump

Dumping with -Fc allowed me to reduce time a bit.

What's more, while I can't (easily) limit which objects to load from which DB, I can use pg_restore to load it, and use it's new options:

=$ pg_restore --globals-only -f globals.sql the.dump.Fc.dump/

to restore just globals (roles, group membership, tablespaces).

Or, I can even restore some of the DBs. There is option: –exclude-database=…. But how do I know what DBs are there in the dump. Let's see what is in such dump:

=$ find the.dump.Fc.dump/ -ls
 94784725      4 drwx------   3 depesz   depesz       4096 Apr 15 11:39 the.dump.Fc.dump/
 94784726      4 -rw-rw-r--   1 depesz   depesz       1751 Apr 15 11:45 the.dump.Fc.dump/global.dat
 94784728      4 -rw-rw-r--   1 depesz   depesz        116 Apr 15 11:45 the.dump.Fc.dump/map.dat
 94784727      4 drwx------   2 depesz   depesz       4096 Apr 15 11:45 the.dump.Fc.dump/databases
 94784747   2348 -rw-rw-r--   1 depesz   depesz    2403266 Apr 15 11:45 the.dump.Fc.dump/databases/574053.dmp
 94784744  24644 -rw-rw-r--   1 depesz   depesz   25233590 Apr 15 11:39 the.dump.Fc.dump/databases/571913.dmp
 94784758    704 -rw-rw-r--   1 depesz   depesz     718709 Apr 15 11:45 the.dump.Fc.dump/databases/571987.dmp
 94784745 3807448 -rw-rw-r--   1 depesz   depesz   3898822633 Apr 15 11:45 the.dump.Fc.dump/databases/16395.dmp
 94784759       4 -rw-rw-r--   1 depesz   depesz         1056 Apr 15 11:45 the.dump.Fc.dump/databases/5.dmp
 94784746    2224 -rw-rw-r--   1 depesz   depesz      2276688 Apr 15 11:45 the.dump.Fc.dump/databases/572133.dmp
 94784753       4 -rw-rw-r--   1 depesz   depesz          832 Apr 15 11:45 the.dump.Fc.dump/databases/574187.dmp
 94784742       4 -rw-rw-r--   1 depesz   depesz         1561 Apr 15 11:39 the.dump.Fc.dump/databases/1.dmp

So, names of files are not really helpful. But, there is a file that kinda looks promising:

=$ cat the.dump.Fc.dump/map.dat
1 template1
571913 depesz
16395 depesz_explain
572133 depesz_paste
574053 inst
574187 pgdba
571987 pgdoc
5 postgres

Given the names I can decide that I just want db pgdba by filtering out the rest:

=$ pg_restore -C -f just.one.db.sql --exclude-database="depesz*|template*|inst|pgdoc" the.dump.Fc.dump/

While this works, I can't, for example, reverse selections – just restore one DB, or don't restore globals. But I'd assume this can be fixed.

If I'd like to restore single DB, I can just as well, just restore ond of the .dmp files from databases/ directory, which seem to be normal pg_dump -Fc files:

=$ file the.dump.Fc.dump/databases/574187.dmp
the.dump.Fc.dump/databases/574187.dmp: PostgreSQL custom database dump - v1.16-0
 
=$ pg_restore -l the.dump.Fc.dump/databases/574187.dmp
;
; Archive created at 2025-04-15 11:45:33 CEST
;     dbname: pgdba
;     TOC Entries: 4
;     Compression: gzip
;     Dump Version: 1.16-0
;     Format: CUSTOM
…

pg_dumpall now supports also directory format, which I love in pg_dump, but in pg_dumpall, even with -Fd, we can't use parallelization, as it doesn't have –jobs option. I'd say that until it will allow parallelization of dumps, -Fd is mostly useless (in pg_dumpall!), and if you want dumps to be made fast, pg_dump calls per database, with -Fd and appropritate -j N are the best way.

But, even without -j, this change is great. Thanks a lot to everyone involved.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.