On 4th of April 2025, Andrew Dunstan committed patch:
Non text modes for pg_dumpall, correspondingly change pg_restore pg_dumpall acquires a new -F/--format option, with the same meanings as pg_dump. The default is p, meaning plain text. For any other value, a directory is created containing two files, globals.data and map.dat. The first contains SQL for restoring the global data, and the second contains a map from oids to database names. It will also contain a subdirectory called databases, inside which it will create archives in the specified format, named using the database oids. In these casess the -f argument is required. If pg_restore encounters a directory containing globals.dat, and no toc.dat, it restores the global settings and then restores each database. pg_restore acquires two new options: -g/--globals-only which suppresses restoration of any databases, and --exclude-database which inhibits restoration of particualr database(s) in the same way the same option works in pg_dumpall. Author: Mahendra Singh Thalor <[email protected]> Co-authored-by: Andrew Dunstan <[email protected]> Reviewed-by: jian he <[email protected]> Reviewed-by: Srinath Reddy <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/[email protected]
OMG! While it doesn't look like it given the delay between commit and me writing about it, I really can't contain my enthusiasm.
Up to this moment I really disliked pg_dumpall. The reasons are (were?) it's slowness and lack of ability to pick what I want to restore. This is (was?) due to the fact that dumpall simply ran, normal pg_dump (well, kinda) with plain text format, sequentially.
But now, in one HUGE step, it becomes actually amazing.
Let's get some comparison data. On my test system I have couple of DBs:
=$ select datname, pg_size_pretty( pg_database_size(oid) ) from pg_database order by pg_database_size(oid) desc; datname │ pg_size_pretty ────────────────┼──────────────── depesz_explain │ 7325 MB depesz │ 238 MB pgdoc │ 22 MB depesz_paste │ 18 MB inst │ 16 MB template1 │ 7774 kB pgdba │ 7694 kB template0 │ 7694 kB postgres │ 7694 kB (9 rows)
One of them is significantly larger than others. Normal, old-style pg_dumpall:
=$ time pg_dumpall -f the.dump.base.sql real 0m58.893s user 0m2.892s sys 0m28.562s =$ du -h the.dump.base.sql 32G the.dump.base.sql =$ time gzip -vk the.dump.base.sql the.dump.base.sql: 88.4% -- created the.dump.base.sql.gz real 6m31.129s user 6m8.112s sys 0m21.083s =$ du -h the.dump.base.sql.gz 3.7G the.dump.base.sql.gz
So, dump all all DBs, to single file, took around a minute, but the file is HUGE – 32GB. Compressing took 6.5 minutes, and reduced the size to 3.7GB.
For completion sake, I also got old style dump, but with on-the-fly compression:
=$ time pg_dumpall | gzip - > the.dump.compress.pipe.sql.gz real 6m27.469s user 6m18.340s sys 0m31.272s =$ du -h the.dump.compress.pipe.sql.gz 3.7G the.dump.compress.pipe.sql.gz
OK. All makes sense.
Now. Let's see what else can we do:
=$ time pg_dumpall -Fc -f the.dump.Fc.dump real 5m55.864s user 5m38.110s sys 0m11.658s =$ du -sh the.dump.Fc.dump 3.7G the.dump.Fc.dump
Dumping with -Fc allowed me to reduce time a bit.
What's more, while I can't (easily) limit which objects to load from which DB, I can use pg_restore to load it, and use it's new options:
=$ pg_restore --globals-only -f globals.sql the.dump.Fc.dump/
to restore just globals (roles, group membership, tablespaces).
Or, I can even restore some of the DBs. There is option: –exclude-database=…. But how do I know what DBs are there in the dump. Let's see what is in such dump:
=$ find the.dump.Fc.dump/ -ls 94784725 4 drwx------ 3 depesz depesz 4096 Apr 15 11:39 the.dump.Fc.dump/ 94784726 4 -rw-rw-r-- 1 depesz depesz 1751 Apr 15 11:45 the.dump.Fc.dump/global.dat 94784728 4 -rw-rw-r-- 1 depesz depesz 116 Apr 15 11:45 the.dump.Fc.dump/map.dat 94784727 4 drwx------ 2 depesz depesz 4096 Apr 15 11:45 the.dump.Fc.dump/databases 94784747 2348 -rw-rw-r-- 1 depesz depesz 2403266 Apr 15 11:45 the.dump.Fc.dump/databases/574053.dmp 94784744 24644 -rw-rw-r-- 1 depesz depesz 25233590 Apr 15 11:39 the.dump.Fc.dump/databases/571913.dmp 94784758 704 -rw-rw-r-- 1 depesz depesz 718709 Apr 15 11:45 the.dump.Fc.dump/databases/571987.dmp 94784745 3807448 -rw-rw-r-- 1 depesz depesz 3898822633 Apr 15 11:45 the.dump.Fc.dump/databases/16395.dmp 94784759 4 -rw-rw-r-- 1 depesz depesz 1056 Apr 15 11:45 the.dump.Fc.dump/databases/5.dmp 94784746 2224 -rw-rw-r-- 1 depesz depesz 2276688 Apr 15 11:45 the.dump.Fc.dump/databases/572133.dmp 94784753 4 -rw-rw-r-- 1 depesz depesz 832 Apr 15 11:45 the.dump.Fc.dump/databases/574187.dmp 94784742 4 -rw-rw-r-- 1 depesz depesz 1561 Apr 15 11:39 the.dump.Fc.dump/databases/1.dmp
So, names of files are not really helpful. But, there is a file that kinda looks promising:
=$ cat the.dump.Fc.dump/map.dat 1 template1 571913 depesz 16395 depesz_explain 572133 depesz_paste 574053 inst 574187 pgdba 571987 pgdoc 5 postgres
Given the names I can decide that I just want db pgdba by filtering out the rest:
=$ pg_restore -C -f just.one.db.sql --exclude-database="depesz*|template*|inst|pgdoc" the.dump.Fc.dump/
While this works, I can't, for example, reverse selections – just restore one DB, or don't restore globals. But I'd assume this can be fixed.
If I'd like to restore single DB, I can just as well, just restore ond of the .dmp files from databases/ directory, which seem to be normal pg_dump -Fc files:
=$ file the.dump.Fc.dump/databases/574187.dmp the.dump.Fc.dump/databases/574187.dmp: PostgreSQL custom database dump - v1.16-0 =$ pg_restore -l the.dump.Fc.dump/databases/574187.dmp ; ; Archive created at 2025-04-15 11:45:33 CEST ; dbname: pgdba ; TOC Entries: 4 ; Compression: gzip ; Dump Version: 1.16-0 ; Format: CUSTOM …
pg_dumpall now supports also directory format, which I love in pg_dump, but in pg_dumpall, even with -Fd, we can't use parallelization, as it doesn't have –jobs option. I'd say that until it will allow parallelization of dumps, -Fd is mostly useless (in pg_dumpall!), and if you want dumps to be made fast, pg_dump calls per database, with -Fd and appropritate -j N are the best way.
But, even without -j, this change is great. Thanks a lot to everyone involved.