|
| 1 | +Originally from: [tweet](https://twitter.com/samokhvalov/status/1730979356363882889), [LinkedIn post](). |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +# Autovacuum "queue" and progress |
| 6 | + |
| 7 | +> I post a new PostgreSQL "howto" article every day. Join me in this |
| 8 | +> journey – [subscribe](https://twitter.com/samokhvalov/), provide feedback, share! |
| 9 | +
|
| 10 | +We know that in some cases, `autovacuum` settings (especially if they are default) need to be adjusted to keep up |
| 11 | +with the updates. One of the ways to understand that the existing settings are "not enough" is to compare |
| 12 | +[autovacuum_max_workers](https://postgresqlco.nf/doc/en/param/autovacuum_max_workers/) and the number of workers |
| 13 | +actually used: |
| 14 | + |
| 15 | +```sql |
| 16 | +show autovacuum_max_workers; |
| 17 | + |
| 18 | +select |
| 19 | + state, |
| 20 | + count(*), |
| 21 | + array_agg(left(query, 25) order by xact_start) |
| 22 | +from pg_stat_activity |
| 23 | +where backend_type = 'autovacuum worker' |
| 24 | +group by state; |
| 25 | +``` |
| 26 | + |
| 27 | +👉 If most of the time, we see that the number of workers currently acting reaches `autovacuum_max_workers`, this |
| 28 | +is a strong signal that it's time to consider increasing the number of workers (requires a restart) and/or make them |
| 29 | +move faster – via [adjusting quotas](https://www.postgresql.org/docs/current/runtime-config-autovacuum.html) |
| 30 | +(<code><b>[auto]vacuum_vacuum_cost_limit</b></code>/<code>[auto]vacuum_vacuum_cost_delay</code>). |
| 31 | + |
| 32 | +However, we might have a question: how many tables are currently in the "queue" to be processed by `autovacuum`? The |
| 33 | +analysis of this "queue" can give an idea how much work the workers need to do and if the current settings are "enough". |
| 34 | +The size of the queue compared to the number workers, potentially, can give a metric similar to "load average" for CPU |
| 35 | +load. |
| 36 | + |
| 37 | +Below is the report ([source](https://gitlab.com/-/snippets/1889668)) that answers it, by looking at: |
| 38 | + |
| 39 | +- current global `autovacuum` settings, |
| 40 | +- per-table individual settings for vacuuming, |
| 41 | +- numbers of dead tuples for each table. |
| 42 | + |
| 43 | +It compares all this and builds the list of tables that need vacuuming. |
| 44 | + |
| 45 | +Additionally, it inspects `pg_stat_progress_vacuum` to analyze what's being processed right now. |
| 46 | + |
| 47 | +The report has been derived from [here](https://github.com/avito-tech/dba-utils/blob/master/munin/vacuum_queue). |
| 48 | + |
| 49 | +The further development of this query could include: |
| 50 | +_analysis of tables "in need of being auto-analyzed"_. |
| 51 | + |
| 52 | +```sql |
| 53 | +with table_opts as ( |
| 54 | + select |
| 55 | + pg_class.oid, |
| 56 | + relname, |
| 57 | + nspname, |
| 58 | + array_to_string(reloptions, '') as relopts |
| 59 | + from pg_class |
| 60 | + join pg_namespace ns on relnamespace = ns.oid |
| 61 | +), vacuum_settings as ( |
| 62 | + select |
| 63 | + oid, |
| 64 | + relname, |
| 65 | + nspname, |
| 66 | + case |
| 67 | + when relopts like '%autovacuum_vacuum_threshold%' then |
| 68 | + regexp_replace(relopts, '.*autovacuum_vacuum_threshold=([0-9.]+).*', e'\\1')::int8 |
| 69 | + else current_setting('autovacuum_vacuum_threshold')::int8 |
| 70 | + end as autovacuum_vacuum_threshold, |
| 71 | + case |
| 72 | + when relopts like '%autovacuum_vacuum_scale_factor%' |
| 73 | + then regexp_replace(relopts, '.*autovacuum_vacuum_scale_factor=([0-9.]+).*', e'\\1')::numeric |
| 74 | + else current_setting('autovacuum_vacuum_scale_factor')::numeric |
| 75 | + end as autovacuum_vacuum_scale_factor, |
| 76 | + case |
| 77 | + when relopts ~ 'autovacuum_enabled=(false|off)' then false |
| 78 | + else true |
| 79 | + end as autovacuum_enabled |
| 80 | + from table_opts |
| 81 | +), p as ( |
| 82 | + select * |
| 83 | + from pg_stat_progress_vacuum |
| 84 | +) |
| 85 | +select |
| 86 | + coalesce( |
| 87 | + coalesce(nullif(vacuum_settings.nspname, 'public') || '.', '') || vacuum_settings.relname, -- current DB |
| 88 | + format('[something in "%I"]', p.datname) -- another DB |
| 89 | + ) as relation, |
| 90 | + round((100 * psat.n_dead_tup::numeric / nullif(pg_class.reltuples, 0))::numeric, 2) as dead_tup_pct, |
| 91 | + pg_class.reltuples::numeric, |
| 92 | + psat.n_dead_tup, |
| 93 | + format ( |
| 94 | + 'vt: %s, vsf: %s, %s', -- 'vt' – vacuum_threshold, 'vsf' - vacuum_scale_factor |
| 95 | + vacuum_settings.autovacuum_vacuum_threshold, |
| 96 | + vacuum_settings.autovacuum_vacuum_scale_factor, |
| 97 | + (case when autovacuum_enabled then 'DISABLED' else 'enabled' end) |
| 98 | + ) as effective_settings, |
| 99 | + case |
| 100 | + when last_autovacuum > coalesce(last_vacuum, '0001-01-01') then left(last_autovacuum::text, 19) || ' (auto)' |
| 101 | + when last_vacuum is not null then left(last_vacuum::text, 19) || ' (manual)' |
| 102 | + else null |
| 103 | + end as last_vacuumed, |
| 104 | + coalesce(p.phase, '~~~ in queue ~~~') as status, |
| 105 | + p.pid as pid, |
| 106 | + case |
| 107 | + when a.query ~ '^autovacuum.*to prevent wraparound' then 'wraparound' |
| 108 | + when a.query ~ '^vacuum' then 'user' |
| 109 | + when a.pid is null then null |
| 110 | + else 'regular' |
| 111 | + end as mode, |
| 112 | + case |
| 113 | + when a.pid is null then null |
| 114 | + else coalesce(wait_event_type || '.' || wait_event, 'f') |
| 115 | + end as waiting, |
| 116 | + round(100.0 * p.heap_blks_scanned / nullif(p.heap_blks_total, 0), 1) as scanned_pct, |
| 117 | + round(100.0 * p.heap_blks_vacuumed / nullif(p.heap_blks_total, 0), 1) as vacuumed_pct, |
| 118 | + p.index_vacuum_count, |
| 119 | + case |
| 120 | + when psat.relid is not null and p.relid is not null then |
| 121 | + (select count(*) from pg_index where indrelid = psat.relid) |
| 122 | + else null |
| 123 | + end as index_count |
| 124 | +from pg_stat_all_tables psat |
| 125 | + join pg_class on psat.relid = pg_class.oid |
| 126 | + left join vacuum_settings on pg_class.oid = vacuum_settings.oid |
| 127 | + full outer join p on p.relid = psat.relid and p.datname = current_database() |
| 128 | + left join pg_stat_activity a using (pid) |
| 129 | +where |
| 130 | + psat.relid is null |
| 131 | + or p.phase is not null |
| 132 | + or ( |
| 133 | + autovacuum_vacuum_threshold |
| 134 | + + (autovacuum_vacuum_scale_factor::numeric * pg_class.reltuples) |
| 135 | + < psat.n_dead_tup |
| 136 | + ) |
| 137 | +order by status, relation; |
| 138 | +``` |
| 139 | + |
| 140 | +Example of the output (running it in psql with `\gx` instead of `;` in the end): |
| 141 | + |
| 142 | + |
0 commit comments