1

I have a set of records of which some, but not all, have a 'path' field, and all have a 'value' field. I wish to select only those which either do not have a path, or have the largest value of all the records with a particular path.

That is, given these records:

Name:  Path:   Value:
A      foo     5
B      foo     6
C      NULL    2
D      bar     2
E      NULL    4

I want to return B, C, D, and E, but not A (because A has a path and it's path is the same as B, but A has a lower value).

How can I accomplish this, using ActiveRecord, ARel and Postgres? Ideally, I would like a solution which functions as a scope.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
MrTheWalrus
  • 9,670
  • 2
  • 42
  • 66
  • 3
    If not for ActiveRecord, you would have 3 working answers in SQL in 5 minutes. – mvp Feb 28 '13 at 05:33
  • Agreed, this is a really good time to fall back to real SQL. – Craig Ringer Feb 28 '13 at 14:51
  • How important is it that you have an AR scope solution? ActiveRecord's understanding of databases and SQL is rather primitive and limited so it has problems with anything more complicated than `select * from t where ...` and you need something a bit more advanced than that to get reliable results. Are there any constraints on your data? We might be able to come up with something scope-ish if the data is constrained. – mu is too short Feb 28 '13 at 21:46
  • @muistooshort At the moment I'm simply loading all records and rejecting the lower value ones in Ruby. This offends my sense of efficiency. I'm not opposed to a SQL solution, I would simply like to be able to treat this as a chainable scope (`User.things.highest_path_values.where()` etc.), use AR eager loading in conjunction with it, and so on. I'm not sure what sort of data constraints would be useful - Values are floats, paths are strings (or null), and each record has assorted other data that isn't relevant to the filtering process. – MrTheWalrus Feb 28 '13 at 22:13
  • The "do it all in Ruby because we don't understand SQL" is pretty typical of Rails, that's one of the reasons that database people scoff at ActiveRecord so much; you're right (IMO) to be offended by the expense of that approach. Rails is fine when you can do things its way but it falls apart quickly when you can't. The straight forward solution involves a window function, a derived table, and a UNION and ActiveRecord doesn't like any of those things very much. By constraints I'm thinking uniqueness constraints. Let me think about this a bit more. – mu is too short Feb 28 '13 at 23:17
  • Ah. Well, a path-value pair is going to be unique, though that's only enforced via Rails validation rather than database constraint at the moment. I think that's the only relevant one. – MrTheWalrus Feb 28 '13 at 23:47

3 Answers3

2

You could use something like this by using 2 subqueries (will do only one SQL query which has subqueries). Did not test, but should get you in the right direction. This is for Postgres.

scope :null_ids, -> { where(path: nil).select('id') }
scope :non_null_ids, -> { where('path IS NOT NULL').select('DISTINCT ON (path) id').order('path, value desc, id') }
scope :stuff, -> {
  subquery = [null_ids, non_null_ids].map{|q| "(#{q.to_sql})"}.join(' UNION ')
  where("#{table_name}.id IN (#{subquery})")
}

If you are using a different DB you might need to use group/order instead of distinct on for the non_nulls scope. If the query is running slow put an index on path and value.

You get only 1 query and it's a chainable scope.

mrbrdo
  • 7,968
  • 4
  • 32
  • 36
  • I had to wrap both subqueries in parenthesis, and have edited to reflect that, but otherwise this works like a charm. Many thanks. – MrTheWalrus Mar 04 '13 at 01:24
  • Great! I'm glad it worked for you. I actually never thought of doing this, but now that I did, I might use this myself if I get into a situation like this. – mrbrdo Mar 04 '13 at 17:23
1

A straightforward transliteration of your description to SQL would look like this:

select name, path, value
from (
    select name, path, value,
           row_number() over (partition by path order by value desc) as r
    from your_table
    where path is not null
) as dt
where r = 1
union all
select name, path, value
from your_table
where path is null

You could wrap that in a find_by_sql and get your objects out the other side.

That query works like this:

  1. The row_number window function allows us to group the rows by path, order each group by value, and then number the rows in each group. Play around with the SQL a bit inside psql and you'll see how this works, there are other window functions available that will allow you to do all sorts of wonderful things.
  2. You're treating NULL path values separately from non-NULL paths, hence the path is not null in the inner query.
  3. We can peel off the first row in each of the path groups by selecting those rows from the derived table that have a row number of one (i.e. where r = 1).
  4. The treatment of path is null rows is easily handled by the section query.
  5. The UNION is used to join the result sets of the queries together.

I can't think of any way to construct such a query using ActiveRecord nor can I think of any way to integrate such a query with ActiveRecord's scope mechanism. If you could easily access just the WHERE component of an ActiveRecord::Relation then you could augment the where path is not null and where path is null components of that query with the WHERE components of a scope chain. I don't know how to do that though.

In truth, I tend to abandon ActiveRecord at the drop of a hat. I find ActiveRecord to be rather cumbersome for most of the complicated things I do and not nearly as expressive as SQL. This applies to every ORM I've ever used so the problem isn't specific to ActiveRecord.

mu is too short
  • 426,620
  • 70
  • 833
  • 800
  • +1: While this does not, strictly speaking, solve my problem, it *is* interesting, and I've placed window functions on my list of 'things to understand _real soon now_'. – MrTheWalrus Mar 04 '13 at 01:26
0

I have no experience with ActiveRecord, but here's a sample with SQLAlchemy to silent the just-use-SQL crowd ;)

q1 = Session.query(Record).filter(Record.path != None)
q1 = q1.distinct(Record.path).order_by(Record.path, Record.value.desc())

q2 = Session.query(Record).filter(Record.path == None)

query = q1.from_self().union(q2)
# Further chaining, e.g. query = query.filter(Record.value > 3) to return B, E

for record in query:
    print record.name
sayap
  • 6,169
  • 2
  • 36
  • 40