From eb5f188ea61e7cb4e284327093a2f2f537a7b52b Mon Sep 17 00:00:00 2001
From: Sadeq Dousti <3616518+msdousti@users.noreply.github.com>
Date: Sun, 31 Aug 2025 21:10:29 +0200
Subject: [PATCH] Add how-to for outbox pattern, part 1

---
 0095_how_to_implement_outbox_part_1.md | 225 +++++++++++++++++++++++++
 README.md                              |   3 +-
 2 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 0095_how_to_implement_outbox_part_1.md

diff --git a/0095_how_to_implement_outbox_part_1.md b/0095_how_to_implement_outbox_part_1.md
new file mode 100644
index 0000000..c66e6b1
--- /dev/null
+++ b/0095_how_to_implement_outbox_part_1.md
@@ -0,0 +1,225 @@
+# How to Implement the Outbox Pattern in PostgreSQL, Part 1
+
+> The outbox pattern is a design pattern for reliable message delivery in distributed systems. Messages are first
+> written into a database table (`outbox`) before being sent to a message broker. This ensures that database changes and
+> message publishing remain consistent, even in the presence of failures.
+>
+> In PostgreSQL, a naive implementation of the outbox table can cause performance degradation, ordering errors, and
+> operational issues. This guide explains how to avoid common pitfalls and how to implement a **revamped outbox pattern**
+> using partitioned tables for efficiency and reliability.
+
+---
+
+## 1) Avoid sorting by `created_at`
+
+Do not use timestamps for ordering, especially if the source is not Postgres. Clocks on different sources may not align,
+and even small differences can break global ordering.
+
+❌ Bad:
+
+```sql
+select *
+from outbox
+where published_at is null
+order by created_at
+limit 10;
+```
+
+✅ Good:
+
+Use a sequence (`id`) generated by the database (`generated always as identity`), and then sort by `id`:
+
+```sql
+select *
+from outbox
+where published_at is null
+order by id
+limit 10;
+```
+
+---
+
+## 2) Use the right data types
+
+- Use `int8` (`bigint`) for the primary key `id`, as opposed to 32-bit `int`.
+- Use `timestamptz` for timestamps, as opposed to `timestamp` (without timezone).
+- Use `json` or `jsonb` for payloads (avoid `jsonb` if messages may contain `\x00`).
+
+✅ Good table definition:
+
+```sql
+create table outbox
+(
+    id           int8 generated always as identity primary key,
+    payload      json        not null,
+    created_at   timestamptz not null default now(),
+    published_at timestamptz
+);
+```
+
+The table may include other columns, like the Kafka topic to which the message should be published. These extra columns
+are not mentioned for brevity.
+
+---
+
+## 3) Index correctly
+
+Always index on `id` with a partial index for unpublished messages.
+
+✅ Good:
+
+```sql
+create index idx1 on outbox (id)
+    where published_at is null;
+```
+
+❌ Bad alternatives:
+
+- Using the primary key index directly (inefficient filter step).
+- Indexing on `published_at` (index filled with nulls, not useful);
+
+```sql
+create index idx1 on outbox (published_at)
+    where published_at is null;
+``` 
+
+---
+
+## 4) Do not pick too many messages at once
+
+Large fetch sizes cause long transactions and block autovacuum. They also increase rollback costs if publishing fails.
+
+✅ Recommendation:
+
+- Fetch around **100 messages** per iteration.
+- Do not fetch more than **1000 messages**.
+- Send messages in batches (e.g., grouped by user ID).
+
+---
+
+## 5) Handle poison pill messages
+
+A poison pill is a message that cannot be processed. It will cause repeated failures.
+
+✅ Mitigation:
+
+- Process messages in independent groups.
+- Catch exceptions per group.
+- Set up monitoring and alerting for poison pill detection.
+
+---
+
+## 6) Fix inefficient index and visibility checks
+
+Frequent inserts and updates cause index bloat and slow visibility checks.
+
+### Solution: Use partitioned tables
+
+Partition the outbox by `published_at`:
+
+- `outbox_unpublished`: messages where `published_at is null`.
+- `outbox_published`: messages already published.
+
+✅ Setup:
+
+```sql
+create table outbox
+(
+    id           int8 generated always as identity,
+    payload      json        not null,
+    created_at   timestamptz not null default now(),
+    published_at timestamptz
+) partition by list (published_at);
+
+create table outbox_unpublished
+    partition of outbox for values in (null);
+
+create table outbox_published
+    partition of outbox default;
+
+create index idx1 on outbox_unpublished (id);
+```
+
+---
+
+### Illustrating partitioning
+
+```
+                +-------------------+
+                |      outbox       |
+                |-------------------|
+                | id                |
+                | payload           |
+                | created_at        |
+                | published_at      |
+                +---------+---------+
+                          |
+        ---------------------------------------
+        |                                     |
++-------------------+             +-------------------+
+| outbox_unpublished|             | outbox_published  |
+|-------------------|             |-------------------|
+| published_at =    |             | published_at !=   |
+| NULL              |             | NULL (any value)  |
++-------------------+             +-------------------+
+       (indexed)                          (no index)
+```
+
+- **Unpublished partition** is indexed and queried often.
+- **Published partition** can be truncated periodically.
+
+---
+
+### Updating correctly
+
+❌ Bad (scans both partitions):
+
+```sql
+update outbox
+set published_at = now()
+where id = 1;
+```
+
+✅ Good (restricts to unpublished partition):
+
+```sql
+update outbox
+set published_at = now()
+where id = 1
+  and published_at is null;
+```
+
+---
+
+## Quick Reference Summary
+
+| Topic                     | Do ✅                                             | Don't ❌                     |
+|---------------------------|--------------------------------------------------|-----------------------------|
+| Ordering                  | Use `id`                                         | Use `created_at`            |
+| Primary key type          | Use `int8`                                       | Use `int4`                  |
+| Timestamps                | Use `timestamptz`                                | Use `timestamp` without TZ  |
+| Payload                   | Use `json/jsonb` (watch out for `\x00`)          | Use inappropriate types     |
+| Indexing                  | Partial index on `id where published_at is null` | Index on `published_at`     |
+| Batch size                | ~100, max 1000                                   | Huge fetches per iteration  |
+| Poison pills              | Group + monitor                                  | Let them block the pipeline |
+| Index/visibility overhead | Use partitioned table                            | Rely on single large table  |
+
+---
+
+## Conclusion
+
+The outbox pattern is essential for reliable message delivery but can introduce operational issues if not implemented
+carefully.  
+By following the practices above—especially **using sequence IDs, proper data types, partial indexes, bounded batch
+sizes, poison pill handling, and partitioned tables**—you can achieve a robust and efficient outbox implementation in
+PostgreSQL.
+
+---
+
+## What's Next (Part 2)
+
+In the next part, we will cover:
+
+- How to **convert an existing outbox table** into a partitioned one.
+- How to **remove index bloat** efficiently.
+- How to fine-tune PostgreSQL's **`autovacuum` settings** to sustain performance under high throughput.  
diff --git a/README.md b/README.md
index 48158be..8ffa2c8 100644
--- a/README.md
+++ b/README.md
@@ -124,7 +124,8 @@ As an example, first 2 rows:
 - 0091 [How to format text output in psql scripts](./0091_how_to_format_text_output_in_psql_scripts.md)
 - 0092 [How to tune work_mem](./0092_how_to_tune_work_mem.md)
 - 0093 [How to troubleshoot streaming replication lag](./0093_how_to_troubleshoot_streaming_replication_lag.md)
-- 0094 [How to run ANALYZE (to collect statistics)](./0094_how_to_run_analyze.md.md)
+- 0094 [How to run ANALYZE (to collect statistics)](./0094_how_to_run_analyze.md)
+- 0095 [How to Implement the Outbox Pattern in PostgreSQL, Part 1](./0095_how_to_implement_outbox_part_1.md)
 - ...
 
 ## Contributors (thanks!!)
-- 
GitLab