## How to Avoid Failures

Here we describe the actions worth taking to reduce the risk of failures when using `pg_repack`.

1. `pg_repack` doesn't have comprehensive signal handling (e.g., for `SIGINT`, which is issued by `Ctrl+C`) and doesn't perform cleanup when interrupted. In such cases, `pg_repack` may leave behind three types of objects that need manual cleanup:
   a. Triggers on the table that was being processed
   b. Indexes in the `indisready=false` or `indisvalid=false` state (this may or may not occur); see https://www.postgresql.org/docs/current/catalog-pg-index.html
   c. The table-level autovacuum setting – before processing a table, `pg_repack` disables autovacuum for it, and this setting needs to be restored
2. The `-T` (`--wait-timeout=SECS`, default: 60) and `-D` (`--no-kill-backend`) options apply only to DDL queries, not to DML. `-T` won't help cancel blocking DML queries; these must be cancelled manually using `pg_cancel_backend()`.
   - See the [source code](https://github.com/reorg/pg_repack/blob/693bce67ba19f8e1fe4d18c266e6c34a3092c777/bin/pg_repack.c#L137)
   - Also see this issue: https://github.com/reorg/pg_repack/issues/456
3. Don't run multiple `pg_repack` processes simultaneously. Two `pg_repack` commands executing at the same time can cause a deadlock.
4. `pg_repack` cannot reorganize temporary tables.
5. `pg_repack` cannot cluster tables by GiST indexes.
6. While `pg_repack` is running, DDL commands cannot be executed on the target tables, except for `VACUUM` and `ANALYZE`. To enforce this restriction, `pg_repack` holds an `ACCESS SHARE` lock on the target table during repacking.
7. For the user running `pg_repack` (usually `postgres`), increase the open files limit (`ulimit -n`) to 65535. This prevents `pg_repack` from crashing when it reaches this limit.
8. Long-running transactions with an isolation level higher than `READ COMMITTED` will block `pg_repack`. Therefore, you need a way to stop queries that would prevent `pg_repack` from working.
9. Before running `pg_repack` on large tables, verify that you have sufficient free disk space: each processed table is rebuilt as a full copy, so you need roughly the size of the table (plus its indexes) in free space.
10. We recommend setting `idle_in_transaction_session_timeout = 0` for the user running `pg_repack`. If this parameter is set to, for example, 10 minutes, then when a table larger than 100 GB is being processed, the connection may be reset. `pg_repack` uses multiple connections, and some of them may be in the `idle in transaction` state (this is expected behavior).

## Diagnostics and Troubleshooting

1. How to find corrupted indexes (left behind by `pg_repack` after an unsuccessful attempt):
   ```sql
   select
     c_r.relname as tbl,
     c_i.relname as idx,
     i.indisvalid, -- if true, the index can be used in queries; "false" means the index might be incomplete: it is still updated by INSERT/UPDATE, but it's not safe to use in queries
     i.indisready  -- if true, the index is ready to accept data; "false" means the index is ignored by INSERT/UPDATE operations
   from pg_index i
   join pg_class c_r on i.indrelid = c_r.oid and c_r.relkind = 'r'
   join pg_class c_i on i.indexrelid = c_i.oid and c_i.relkind = 'i'
   where
     not i.indisvalid
     or not i.indisready;
   ```
2. 
Blocking diagnostics:
   ```sql
   select
     blocking_locks.pid as blocker_pid,
     blocking_activity.usename as blocker_user,
     substring(blocking_activity.query from 1 for 150) as blocker_statement,
     blocked_locks.pid as blocked_pid,
     blocked_activity.usename as blocked_user,
     substring(blocked_activity.query from 1 for 150) as blocked_statement
   from pg_catalog.pg_locks blocked_locks
   join pg_catalog.pg_stat_activity blocked_activity on blocked_activity.pid = blocked_locks.pid
   join pg_catalog.pg_locks blocking_locks on
     blocking_locks.locktype = blocked_locks.locktype
     and blocking_locks.database is not distinct from blocked_locks.database
     and blocking_locks.relation is not distinct from blocked_locks.relation
     and blocking_locks.page is not distinct from blocked_locks.page
     and blocking_locks.tuple is not distinct from blocked_locks.tuple
     and blocking_locks.virtualxid is not distinct from blocked_locks.virtualxid
     and blocking_locks.transactionid is not distinct from blocked_locks.transactionid
     and blocking_locks.classid is not distinct from blocked_locks.classid
     and blocking_locks.objid is not distinct from blocked_locks.objid
     and blocking_locks.objsubid is not distinct from blocked_locks.objsubid
     and blocking_locks.pid <> blocked_locks.pid
   join pg_catalog.pg_stat_activity blocking_activity on blocking_activity.pid = blocking_locks.pid
   where not blocked_locks.granted;
   ```
3. Cleanup after a `pg_repack` failure:
   ```sql
   drop extension pg_repack cascade;
   create extension pg_repack;

   -- check whether any indexes left behind by pg_repack are in the INVALID state, and drop them:
   do $$
   declare
     index_name name;
   begin
     for index_name in
       select c.relname
       from pg_index i
       join pg_class c on c.oid = i.indexrelid
       where
         not i.indisvalid
         and c.relname ~ '^index_\d+$'
     loop
       execute format('drop index %I', index_name);
     end loop;
   end;
   $$ language plpgsql;
   ```
4. 
Search for tables with autovacuum disabled:
   ```sql
   select *
   from pg_class
   where
     reloptions @> '{autovacuum_enabled=false}'::text[]
     and relkind = 'r';
   ```

## Useful Links

- https://reorg.github.io/pg_repack/
- https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/6181#note_141114178
- https://github.com/reorg/pg_repack/issues/456

Additional materials on database internals in the context of concurrent transaction processing:

- https://momjian.us/main/presentations/internals.html
- http://www.interdb.jp/pg/pgsql05.html
- http://www.interdb.jp/pg/pgsql06.html
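
Several of the precautions above (the open-files limit, and the `-T`/`-D` options for handling blockers) can be combined into a single cautious invocation. Below is a minimal shell sketch; the wrapper itself, its defaults (`mydb`, `public.my_table`), and the dry-run `echo` at the end are illustrative assumptions, not part of `pg_repack`:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around pg_repack; DB and TABLE defaults are placeholders.
set -euo pipefail

DB="${1:-mydb}"
TABLE="${2:-public.my_table}"

# Raise the open-files limit before launching pg_repack (see item 7 above);
# ignore failure in case the hard limit is lower than 65535.
ulimit -n 65535 2>/dev/null || true

# --wait-timeout applies to conflicting DDL only (see item 2 above); with
# --no-kill-backend, pg_repack gives up instead of terminating other backends.
CMD=(pg_repack --dbname="$DB" --table="$TABLE" --wait-timeout=60 --no-kill-backend)

# Dry run: print the command instead of executing it.
echo "${CMD[@]}"
```

Remember that even with these flags, blocking DML must still be dealt with manually via `pg_cancel_backend()`.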