From 4fcfe3bc75114404b11ddd8e5c2cf373365e47d3 Mon Sep 17 00:00:00 2001
From: Dimitrios Apostolou
Date: Sat, 29 Mar 2025 01:16:07 +0100
Subject: [PATCH] parallel pg_restore: avoid disk seeks when moving short
 distance forward

Improve the performance of parallel pg_restore (-j) from a custom format
pg_dump archive that does not include data offsets, which typically
happens when pg_dump has written the archive to stdout instead of a
file. Also speeds up restoration of specific tables (-t tablename).

In these cases, before the actual data restoration starts, pg_restore
workers are stuck in a loop of small reads (4KB) and short forward
seeks (around 10KB for a compressed archive, or even just a few bytes
for an uncompressed one):

read(4, "..."..., 4096) = 4096
lseek(4, 55544369152, SEEK_SET) = 55544369152
read(4, "..."..., 4096) = 4096
lseek(4, 55544381440, SEEK_SET) = 55544381440
read(4, "..."..., 4096) = 4096
lseek(4, 55544397824, SEEK_SET) = 55544397824
read(4, "..."..., 4096) = 4096
lseek(4, 55544414208, SEEK_SET) = 55544414208
read(4, "..."..., 4096) = 4096
lseek(4, 55544426496, SEEK_SET) = 55544426496

This happens because each worker has to scan the whole file until it
finds the entry it wants, skipping forward one block at a time.
Combined with the small block size of the custom format dump, this
causes many seeks and low performance.

Fix by avoiding forward seeks for jumps of 32KB or less and doing
sequential reads instead.

The performance gain can be significant, depending on the size of the
dump and the I/O subsystem. On my local NVMe drive, read speed for that
phase of pg_restore increased from 150MB/s to 3GB/s.
---
 src/bin/pg_dump/pg_backup_custom.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/pg_backup_custom.c b/src/bin/pg_dump/pg_backup_custom.c
index f7c3af56304c..087b1c81e7fb 100644
--- a/src/bin/pg_dump/pg_backup_custom.c
+++ b/src/bin/pg_dump/pg_backup_custom.c
@@ -629,7 +629,11 @@ _skipData(ArchiveHandle *AH)
 	blkLen = ReadInt(AH);
 	while (blkLen != 0)
 	{
-		if (ctx->hasSeek)
+		/*
+		 * Sequential access is usually faster, so avoid seeking if the jump
+		 * forward is 32KB or less.
+		 */
+		if (ctx->hasSeek && blkLen > 32 * 1024)
 		{
 			if (fseeko(AH->FH, blkLen, SEEK_CUR) != 0)
 				pg_fatal("error during file seek: %m");
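
For readers outside the PostgreSQL tree, below is a minimal standalone
sketch of the technique the patch applies inside _skipData(): walk a chain
of length-prefixed blocks and consume short blocks with plain sequential
reads, reserving fseeko() for jumps larger than 32KB. The block format
(a 4-byte little-endian length prefix, terminated by a zero length), the
names skip_block_chain/read_block_len and the can_seek flag are assumptions
of this sketch, not pg_restore's archive format or internal API.

/*
 * Standalone sketch (not pg_restore code): skip a chain of length-prefixed
 * blocks, reading and discarding short blocks instead of seeking past them.
 * The 4-byte little-endian length prefix and the function names are
 * inventions of this sketch.
 */
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/types.h>

#define SEEK_THRESHOLD	(32 * 1024)	/* same cutoff as the patch */

/* Read a 4-byte little-endian block length; 0 terminates the chain. */
static uint32_t
read_block_len(FILE *fp)
{
	unsigned char b[4];

	if (fread(b, 1, 4, fp) != 4)
	{
		fprintf(stderr, "could not read block length\n");
		exit(1);
	}
	return (uint32_t) b[0] | ((uint32_t) b[1] << 8) |
		((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
}

/*
 * Skip a block chain.  Short blocks are consumed with sequential reads so
 * the kernel's read-ahead stays effective; only jumps larger than
 * SEEK_THRESHOLD pay for an actual seek.
 */
static void
skip_block_chain(FILE *fp, int can_seek)
{
	char		buf[SEEK_THRESHOLD];
	uint32_t	blkLen = read_block_len(fp);

	while (blkLen != 0)
	{
		if (can_seek && blkLen > SEEK_THRESHOLD)
		{
			if (fseeko(fp, (off_t) blkLen, SEEK_CUR) != 0)
			{
				perror("fseeko");
				exit(1);
			}
		}
		else
		{
			/* read and discard the block in buffer-sized chunks */
			uint32_t	remaining = blkLen;

			while (remaining > 0)
			{
				size_t	chunk = remaining < sizeof(buf) ? remaining : sizeof(buf);

				if (fread(buf, 1, chunk, fp) != chunk)
				{
					fprintf(stderr, "could not read block data\n");
					exit(1);
				}
				remaining -= (uint32_t) chunk;
			}
		}
		blkLen = read_block_len(fp);
	}
}

int
main(int argc, char **argv)
{
	FILE	   *fp;

	if (argc != 2)
	{
		fprintf(stderr, "usage: %s file\n", argv[0]);
		return 1;
	}
	fp = fopen(argv[1], "rb");
	if (fp == NULL)
	{
		perror("fopen");
		return 1;
	}
	skip_block_chain(fp, 1);
	printf("stopped at offset %lld\n", (long long) ftello(fp));
	fclose(fp);
	return 0;
}

The 32KB cutoff trades a small amount of extra copying for keeping the
access pattern sequential: on most storage, reading an extra few tens of
kilobytes costs far less than breaking the kernel's read-ahead with a seek.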