I have Post and User models. Every post belongs to a user. However, during database import, some erroneous user_ids have been entered on some posts. What would be the query to get posts whose user_ids don't refer to any user? Thanks.
7 Answers
As @user2553863 already mentioned, Rails 5 added support for left_outer_joins
, meaning you can now do this in an efficient manner and without writing any SQL like this:
Post.left_outer_joins(:user).where(users: {id: nil}).delete_all
This will find any orphaned posts (those without users) and delete them. Here, user
is the association name and users
is the name of the joined table. You do not have to fire an additional SELECT
to query all the user ids, which might break when you have many users.

- 4,857
- 5
- 37
- 42
Sanjay,
all the solutions proposed works fine for small tables, but depending on the size of the tables involved, and the amount of memory and processing power available, for performance reasons, you might want to use a LEFT OUTER JOIN, like this:
Post.joins("LEFT OUTER JOIN users ON posts.user_id = user.id")
.where("user.id IS NULL")
In Rails 5, there is support for LEFT OUTER JOIN in ActiveRecord.
Regards

- 682
- 1
- 8
- 17
-
Thank you. This would take care of the memory problem likely to be caused by array. – Sanjay Singh Mar 31 '17 at 14:42
-
Thanks! Great tip! – Blue Smith Jun 22 '17 at 04:08
RAILS 6.1+
You can make use of the 'missing' method to fetch orphaned records. eg
Class User
end
Class Post
belongs_to :user
end
Here is when missing method is used
Post.where.missing(:user)
This will fetch the all Post records which have a user_id but the corresponding user being deleted.

- 295
- 1
- 3
- 11
I would do the following, which will result in one SELECT
and one DELETE
statement (so 2 queries in total)
Post.where('`posts`.`user_id` NOT IN (?)', User.pluck(:id)).delete_all

- 10,170
- 1
- 29
- 52
-
Wouldn't it be TWO selects? One for pluck, and one for the "where" statement? – user2553863 Mar 31 '17 at 12:26
-
Found an interesting caveat to this one- this works only if there exists some non-orphan records. If they are ALL orphans, ActiveRecord ends up doing a query with the clause `posts.user_id NOT IN (NULL)`, which actually returns nothing even if records exist. The join recommendation @user2553863 states below has been the most reliable. – abegosum Sep 10 '19 at 21:34
Thanks to both of you. My solution was similar to Manuel's
all_user_ids = User.all.pluck(:id)
unwanted_posts = Post.where.not(:user_id => all_user_ids)
Then I could destroy all unwanted_posts. Of course, the other solution would work as well.

- 367
- 1
- 11
Note: answers below are valid for Rails 5.0
Many of these answers work fine for a few records or on small tables, but don't scale well at all to having a large amount of orphaned records or when dealing with large tables.
For instance, dealing with two larger tables where ModelOne
has 707,891 orphaned records:
irb(main):032:0> ModelOne.count
=> 2,265,216
irb(main):033:0> ModelTwo.count
=> 5,109,186
Trying to execute a query with NOT IN
will fail as it's too large:
irb(main):029:0> ModelOne.where.not(model_two_id: ModelTwo.pluck(:id))
ActiveRecord::StatementInvalid (Mysql2::Error: MySQL server has gone away: SELECT `model_ones`.* FROM `model_ones` WHERE (`model_ones`.`model_two_id` NOT IN (12068663, 12076647, 12076648, 12082392, 12082393, 12082394, <repeat for the other 5 million ModelTwo records>))
Additionally, trying to call .delete_all
on a query that uses left_outer_joins
doesn't quite work as expected.
This is the SQL rails generates for ModelOne.left_outer_joins(:model_two).where(model_twos: {id: nil})
:
SELECT `model_ones`.* FROM `model_ones`
LEFT OUTER JOIN `model_twos` ON `model_twos`.`id` = `model_ones`.`model_two_id`
WHERE `model_twos`.`id` IS NULL
But chaining .delete_all
to the end (ModelOne.left_outer_joins(:model_two).where(model_twos: {id: nil}).delete_all
) generates:
DELETE FROM `model_ones` WHERE `model_twos`.`id` IS NULL
Which will throw an error.
The most performant way I've found to delete orphaned records comes from this answer and uses SQL EXISTS
and a nested query to efficiently find and delete orphaned records.
ModelOne.where.not(
ModelTwo.where('model_twos.id = model_ones.model_two_id').exists
)
Which generates:
SELECT `model_ones`.* FROM `model_ones`
WHERE (
NOT (
EXISTS (
SELECT `model_twos`.* FROM `model_twos` WHERE (model_twos.id = model_ones.model_two_id)
)
)
)
Using this query to load the 707,891 orphaned records takes just under a minute:
irb(main):040:0> Benchmark.measure { ModelOne.where.not(ModelTwo.where('model_twos.id = model_ones.model_two_id').exists).load }
=> #<Benchmark::Tms:0x0000563cfa227580 @label="", @real=59.61208474007435, @cstime=0.0, @cutime=0.0, @stime=0.23068100000000014, @utime=49.025859000000025, @total=49.25654000000002>
Chaining .delete_all
to this query will work as expected and delete all the orphaned records
ModelOne.where.not(ModelTwo.where('model_twos.id = model_ones.model_two_id').exists).delete_all
generates the SQL:
DELETE FROM `model_ones` WHERE (NOT (EXISTS (SELECT `model_twos`.* FROM `model_twos` WHERE (model_twos.id = model_ones.model_two_id))))

- 31
- 2
I don't think you can do this with straight AR, but it is fairly easy to fix with a little Ruby:
Post.find_each { |p| p.delete if p.user.nil? }
EDIT: forgot .all
doesn't return an ActiveRecord::Relation

- 1,589
- 10
- 20