From eb5f188ea61e7cb4e284327093a2f2f537a7b52b Mon Sep 17 00:00:00 2001 From: Sadeq Dousti <3616518+msdousti@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:10:29 +0200 Subject: [PATCH] Add how-to for outbox pattern, part 1 --- 0095_how_to_implement_outbox_part_1.md | 225 +++++++++++++++++++++++++ README.md | 3 +- 2 files changed, 227 insertions(+), 1 deletion(-) create mode 100644 0095_how_to_implement_outbox_part_1.md diff --git a/0095_how_to_implement_outbox_part_1.md b/0095_how_to_implement_outbox_part_1.md new file mode 100644 index 0000000..c66e6b1 --- /dev/null +++ b/0095_how_to_implement_outbox_part_1.md @@ -0,0 +1,225 @@ +# How to Implement the Outbox Pattern in PostgreSQL, Part 1 + +> The outbox pattern is a design pattern for reliable message delivery in distributed systems. Messages are first +> written into a database table (`outbox`) before being sent to a message broker. This ensures that database changes and +> message publishing remain consistent, even in the presence of failures. +> +> In PostgreSQL, a naive implementation of the outbox table can cause performance degradation, ordering errors, and +> operational issues. This guide explains how to avoid common pitfalls and how to implement a **revamped outbox pattern** +> using partitioned tables for efficiency and reliability. + +--- + +## 1) Avoid sorting by `created_at` + +Do not use timestamps for ordering, especially if the source is not Postgres. Clocks on different sources may not align, +and even small differences can break global ordering. + +❌ Bad: + +```sql +select * +from outbox +where published_at is null +order by created_at +limit 10; +``` + +✅ Good: + +Use a sequence (`id`) generated by the database (`generated always as identity`), and then sort by `id`: + +```sql +select * +from outbox +where published_at is null +order by id +limit 10; +``` + +--- + +## 2) Use the right data types + +- Use `int8` (`bigint`) for the primary key `id`, as opposed to 32-bit `int`. +- Use `timestamptz` for timestamps, as opposed to `timestamp` (without timezone). +- Use `json` or `jsonb` for payloads (avoid `jsonb` if messages may contain `\x00`). + +✅ Good table definition: + +```sql +create table outbox +( + id int8 generated always as identity primary key, + payload json not null, + created_at timestamptz not null default now(), + published_at timestamptz +); +``` + +The table may include other columns, like the Kafka topic to which the message should be published. These extra columns +are not mentioned for brevity. + +--- + +## 3) Index correctly + +Always index on `id` with a partial index for unpublished messages. + +✅ Good: + +```sql +create index idx1 on outbox (id) + where published_at is null; +``` + +❌ Bad alternatives: + +- Using the primary key index directly (inefficient filter step). +- Indexing on `published_at` (index filled with nulls, not useful); + +```sql +create index idx1 on outbox (published_at) + where published_at is null; +``` + +--- + +## 4) Do not pick too many messages at once + +Large fetch sizes cause long transactions and block autovacuum. They also increase rollback costs if publishing fails. + +✅ Recommendation: + +- Fetch around **100 messages** per iteration. +- Do not fetch more than **1000 messages**. +- Send messages in batches (e.g., grouped by user ID). + +--- + +## 5) Handle poison pill messages + +A poison pill is a message that cannot be processed. It will cause repeated failures. + +✅ Mitigation: + +- Process messages in independent groups. +- Catch exceptions per group. +- Set up monitoring and alerting for poison pill detection. + +--- + +## 6) Fix inefficient index and visibility checks + +Frequent inserts and updates cause index bloat and slow visibility checks. + +### Solution: Use partitioned tables + +Partition the outbox by `published_at`: + +- `outbox_unpublished`: messages where `published_at is null`. +- `outbox_published`: messages already published. + +✅ Setup: + +```sql +create table outbox +( + id int8 generated always as identity, + payload json not null, + created_at timestamptz not null default now(), + published_at timestamptz +) partition by list (published_at); + +create table outbox_unpublished + partition of outbox for values in (null); + +create table outbox_published + partition of outbox default; + +create index idx1 on outbox_unpublished (id); +``` + +--- + +### Illustrating partitioning + +``` + +-------------------+ + | outbox | + |-------------------| + | id | + | payload | + | created_at | + | published_at | + +---------+---------+ + | + --------------------------------------- + | | ++-------------------+ +-------------------+ +| outbox_unpublished| | outbox_published | +|-------------------| |-------------------| +| published_at = | | published_at != | +| NULL | | NULL (any value) | ++-------------------+ +-------------------+ + (indexed) (no index) +``` + +- **Unpublished partition** is indexed and queried often. +- **Published partition** can be truncated periodically. + +--- + +### Updating correctly + +❌ Bad (scans both partitions): + +```sql +update outbox +set published_at = now() +where id = 1; +``` + +✅ Good (restricts to unpublished partition): + +```sql +update outbox +set published_at = now() +where id = 1 + and published_at is null; +``` + +--- + +## Quick Reference Summary + +| Topic | Do ✅ | Don't ❌ | +|---------------------------|--------------------------------------------------|-----------------------------| +| Ordering | Use `id` | Use `created_at` | +| Primary key type | Use `int8` | Use `int4` | +| Timestamps | Use `timestamptz` | Use `timestamp` without TZ | +| Payload | Use `json/jsonb` (watch out for `\x00`) | Use inappropriate types | +| Indexing | Partial index on `id where published_at is null` | Index on `published_at` | +| Batch size | ~100, max 1000 | Huge fetches per iteration | +| Poison pills | Group + monitor | Let them block the pipeline | +| Index/visibility overhead | Use partitioned table | Rely on single large table | + +--- + +## Conclusion + +The outbox pattern is essential for reliable message delivery but can introduce operational issues if not implemented +carefully. +By following the practices above—especially **using sequence IDs, proper data types, partial indexes, bounded batch +sizes, poison pill handling, and partitioned tables**—you can achieve a robust and efficient outbox implementation in +PostgreSQL. + +--- + +## What's Next (Part 2) + +In the next part, we will cover: + +- How to **convert an existing outbox table** into a partitioned one. +- How to **remove index bloat** efficiently. +- How to fine-tune PostgreSQL's **`autovacuum` settings** to sustain performance under high throughput. diff --git a/README.md b/README.md index 48158be..8ffa2c8 100644 --- a/README.md +++ b/README.md @@ -124,7 +124,8 @@ As an example, first 2 rows: - 0091 [How to format text output in psql scripts](./0091_how_to_format_text_output_in_psql_scripts.md) - 0092 [How to tune work_mem](./0092_how_to_tune_work_mem.md) - 0093 [How to troubleshoot streaming replication lag](./0093_how_to_troubleshoot_streaming_replication_lag.md) -- 0094 [How to run ANALYZE (to collect statistics)](./0094_how_to_run_analyze.md.md) +- 0094 [How to run ANALYZE (to collect statistics)](./0094_how_to_run_analyze.md) +- 0095 [How to Implement the Outbox Pattern in PostgreSQL, Part 1](./0095_how_to_implement_outbox_part_1.md) - ... ## Contributors (thanks!!) -- GitLab