The following bug has been logged on the website:
Bug reference:      17590
Logged by:          Guillaume FOUET
Email address:      [email protected]
PostgreSQL version: 14.5
Operating system:   Windows 10 - 64x
Description:
Tested with Postgres 13.5, 13.8 and 14.5 (I updated to see if that was a
bug).
I have four tables: ADDRESS, COMPANY, CONTACT, TRIP
ADDRESS has ~130000 rows
COMPANY has a FK toward ADDRESS and ~16000 rows
CONTACT has a FK toward ADDRESS and ~12000 rows
TRIP has a FK toward ADDRESS and ~137500 rows
We wanted to purge the address table from old, unused addresses:
DELETE FROM address WHERE address_id NOT IN (
    SELECT DISTINCT address_id FROM company
    UNION
    SELECT DISTINCT address_id FROM contact
    UNION
    SELECT DISTINCT address_id FROM trip
);
This query above never ends (I waited 15 minutes and it was still going, HDD
doing nothing, one core CPU used).
The EXPLAIN says it materializes the address_ids aggregates then scans
ADDRESS for deletion.
After many tries, I made this query instead:
CREATE TEMPORARY TABLE used_address_id AS (
  SELECT DISTINCT address_id FROM company
  UNION
  SELECT DISTINCT address_id FROM contact
  UNION
  SELECT DISTINCT address_id FROM trip
);
CREATE UNIQUE INDEX ON used_address_id (address_id);
DELETE FROM used_address_id WHERE address_id IS NULL;
DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM
used_address_id);
This was resolved in 500ms (basically instantly).
I have the feeling there's an O(n²) somewhere in the first query.
Thanks.