1

I am trying to find a complement using ActiveRecord and/or SQL.

I have a collection of 'annotations' that each have two relevant fields:

  • session_datum_id which corresponds to the user who performed the annotation. Null means it has not yet been done.
  • post_id which represents the post the annotation is 'about'. Cannot be null.

There are potentially multiple annotations per post_id.

I would like to efficiently find an annotation that satisfies two constraints:

  1. session_datum_id is null. This means this particular annotation hasn't already been performed.
  2. the session_datum passed in as an arg has not already performed another annotation with the same post_id.

Here is a very naive version which does a join outside the DB. It finds all annotations this user has already performed and removes those post_ids from the exhaustive list of annotations that still need to be performed. It then picks at random from the resulting list:

def self.random_empty_unseen(session_datum)
  mine = where('session_datum_id = ?', session_datum)
  elligible = where('session_datum_id IS NULL')
  mine.each do |i|
    elligible.each do |j|
      if (i.post_id == j.post_id)
        elligible.delete(j)
      end
    end
  end
  elligible[rand(elligible.count)]
end

As the list of annotations gets large, this will bog down terribly. I can imagine a probabilistic algorithm where we select an elligible annotation at random and then checks if the user has already performed it (retrying if so) but there are degenerate cases where that won't work. (Large number of annotations and the user has performed all but one of them.)

Is there a closed form query for this, perhaps using NOT EXISTS?

  • Would it make your life easier to add a uniqueness constraint? If you only allow one annotation to exist pre user/post pairing, then you wouldn't have to worry about your second constraint. You'd have to find a different way to indicate completeness, though... – Xavier Holt Oct 03 '11 at 17:33
  • Unfortunately the problem domain requires this. Think of annotations as mechanical turk style tasks. We want each task done by multiple people but no person to do the same task more than once. Going to one person per task breaks the constraints of the problem. – Jordan Graf Oct 07 '11 at 20:29

1 Answers1

1
SELECT a1.*
FROM annotations AS a1
JOIN annotations AS a2
ON a1.post_id=a2.post_id
WHERE a2.session_datum_id=session_datum AND a1.session_datum_id IS NULL
nicholaides
  • 19,211
  • 12
  • 66
  • 82