1

I've got a model, Prospect. I want to sort prospects by some value (e.g. updated_at ASC), however I also want to bisect the sorted list so that all prospects where lead_id IS NOT NULL are shown first, sorted by updated_at ASC, then all prospects where lead_id IS NULL are shown last sorted by updated_at ASC. I don't care what lead_id is, only that any records where lead_id IS NOT NULL are bumped to the beginning of the list. I need to be able to paginate the dataset and the query needs to be relatively efficient (turnaround time should be well under 500ms, ideally under 100ms).

First Approach

I first tried to accomplish this using the following query, however this doesn't work (for my use case) because prospects are sorted by lead_id (as you would expect from this query), which is unique, therefore making a secondary sort effectively useless.

Prospect.order("lead_id ASC nulls last, updated_at ASC")

Second Approach

I tried a (slightly modified) approach suggested by Andrey Deineko. This returns the entire dataset in the correct order, but there is no way to merge the two separate relations into a single relation that can be paginated. As a result, in order to paginate the dataset, I need to instantiate every row from the table into memory. That would be acceptable for a few dozen records at most, but certainly not 20k+.

# prospect.rb
scope :with_leads,    -> { where.not(lead_id: nil) }
scope :without_leads, -> { where(lead_id: nil) }
scope :ordered,       -> { order(:total_clicks_count) } 

[Prospect.with_leads.ordered, Prospect.without_leads.ordered].flatten

Third Approach

I realized that I can get a sorted list of prospect ids (both with and without a lead_id) and use that to get the full dataset ordered by id. This accomplishes what I need and works fine for a few dozen or hundred records, but isn't viable for 20k+ records.

lead_ids = Prospect.where.not(lead_id: nil).pluck(:id)
prospect_ids = Prospect.where(lead_id: nil).pluck(:id)
prospects = Prospect.order_by_ids([lead_ids, prospect_ids].flatten)

Here is the source for order_by_ids:

class ApplicationRecord < ActiveRecord::Base
    self.abstract_class = true

    def self.order_by_ids(ids)
        # https://stackoverflow.com/a/26777669/4139179
        order_by = ["CASE"]
        ids.each_with_index do |id, index|
            order_by << "WHEN #{self.name.underscore}s.id='#{id}' THEN #{index}"
        end

        order_by << "END"
        order(order_by.join(" "))
    end
end

The problem with this second approach is that it takes up to 1 second per 1000 records to build the list or ordered ids for each subset (prospects with and without a lead_id) and then use that to fetch the entire dataset in order.

Is there a better approach that will return the entire dataset (in a way that can be paginated) ordered by some attribute (e.g. updated_at ASC) where prospects with a lead_id are at the top of the list and those without are at the bottom?

Daniel Bonnell
  • 4,817
  • 9
  • 48
  • 88

1 Answers1

1

When using ORDER BY "column" in PostgreSQL, NULL values will come last by default. So

Prospect.order(:lead_id, :updated_at)

should do the trick.

Your actual need:

The effect in practice would be that when a user views a list of their sales prospects, those that have been converted to leads will show up first in the list, even if the list is sorted by other attributes.

# prospect.rb
scope :with_leads,    -> { where.not(lead_id: nil) }
scope :without_leads, -> { where(lead_id: nil) }
scope :ordered,       -> { order(:total_clicks_count) } 

And then use these scopes to present to the user:

Prospect.with_leads.ordered.each do
  #...
end

Prospect.without_leads.ordered.each do
  #...
end
Andrey Deineko
  • 51,333
  • 10
  • 112
  • 145
  • This does not seem to be the case as far as I can tell. I've added two examples to my question to illustrate the issue I'm seeing. – Daniel Bonnell Sep 18 '18 at 21:13
  • @ACIDSTEALTH your queries are returning correct results. It is sorted by `lead_id` – Andrey Deineko Sep 18 '18 at 21:20
  • @ACIDSTEALTH it will first sort everything by `lead_id ASC` and *then* if there is anything to reorder based on the `total_clicks_count DESC` - it will reorder. In your case there is nothing to reorder – Andrey Deineko Sep 18 '18 at 21:22
  • @ACIDSTEALTH if you had a result set of `[[35, 1], [36, 1], [37, 0], [37, 1]]` from the `lead_id ASC` order *then* it would apply `total_clicks_count DESC` and reorder it to `[[35, 1], [36, 1], [37, 1], [37, 0]]`. Do you see what I mean? – Andrey Deineko Sep 18 '18 at 21:24
  • @ACIDSTEALTH to have it sorted by total clicks first and then by lead id, swap the arguments to `order`: `Prospect.orded(:total_clicks_count, :lead_id)` – Andrey Deineko Sep 18 '18 at 21:34
  • I just realized that my question isn't really clear, hence the confusion. What I'm really looking for here is to push all the prospects that don't have a lead_id to the end of the list. I don't care what the lead_id is. only whether it has one or not. The effect in practice would be that when a user views a list of their sales prospects, those that have been converted to leads will show up first in the list, even if the list is sorted by other attributes. Currently this isn't the case. The prospects with a lead_id show up 1st but because lead_id is unique, it effectively prevents sub-sorting. – Daniel Bonnell Sep 18 '18 at 21:47
  • @ACIDSTEALTH edited the answer. you should ask your question with precise need, because otherwise you waste your time and the time of someone who answers your question and then finds out that the issue is actually different – Andrey Deineko Sep 19 '18 at 05:01
  • Sort of! I was inspired by your idea to get each subset separately and then pull an ordered list of ids. From there I created a custom query to order the entire set by id. This allows for easy pagination and secondary sorting. Problem is this is extremely inefficient (it takes about 1s per 1000 records) when dealing with 10k+ records and therefore not a viable solution. For now I've convinced my boss to give this idea up for now because I'm not seeing any viable/performant solution that would allow secondary sorting and still work with pagination. I'm open to other ideas. :) – Daniel Bonnell Sep 25 '18 at 18:43
  • I've updated the question/title to make it clearer what I'm after here. – Daniel Bonnell Sep 25 '18 at 21:06