0

I have two tables in a postgres database, posts and users. posts has a user_id foreign key that references the users.id primary key column. Both tables are very large.

I have just deleted a random set of users (about 80% of the total users) and I want to delete all posts that reference deleted users, effectively an anti-join and delete. What is the most efficient way of doing this?

Currently I have this:

DELETE FROM posts l
WHERE NOT EXISTS
  (
     SELECT NULL
     FROM users r
     WHERE r.id = l.user_id
  )

Is there a more efficient way of doing this?

James Stonehill
  • 1,125
  • 10
  • 22
  • If you are deleting a large number of users, then you might want to create a new table with the posts that are still active. – Gordon Linoff Nov 07 '18 at 17:06
  • 1
    Surely the best option is to delete the posts at the same time as deleting the users. Your DBMS may even automate this for use using some sort of referential integrity system... – JeffUK Nov 07 '18 at 17:12
  • @GordonLinoff yeah that's a good point. Say for arguments sake I'm deleting 80% of the users (I've updated the question to reflect this premises). – James Stonehill Nov 07 '18 at 18:35
  • @JeffUK cascading deletes are not an option. – James Stonehill Nov 07 '18 at 18:35

2 Answers2

2

If you want to delete 80% of users, then the fastest way is probably:

create table temp_posts as 
    select p.*
    from posts p
    where exists (select 1 from users u where u.id = p.user_id);

truncate table posts;

insert into posts
    select *
    from temp_posts;

Batch inserts are much less work than updating most of the rows in the table. Of course, you should test this carefully. Truncating the table is a fast way to remove all the rows from it.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Someone at this link did some testing of not-exists vs not-in vs left-join-is-null. Postgre can tell that non-exists and left-join is null are anti-joins, and so proceeds accordingly. So your method should be the most efficient. You could possibly restructure to a left-join-is-null approach, but it probably won't buy you anything.

Prevention may be better. A foreign key constraint is the better option, with cascade deleting. You mention in the comments to your question that this is not an option. Is it not an option in your particular circumstance, because generally, it is:

REFERENCES someTable(someCol) ON DELETE CASCADE ON UPDATE CASCADE
pwilcox
  • 5,542
  • 1
  • 19
  • 31