3

I have a situation where I need to return a single collection of objects from mongo, but need to use two queries to get the results. The order of these results is important because they are paginated.

Here's the first query: (listings based on a category and a price range)

my_listings = MoListing.where(criteria_a)

The second query needs to use the results of the first query as a filter. So something like:

everything_else = MoListing.where(criteria_b)

Then union the results:

my_listings << everything_else

And finally, return paginated results:

my_listings.page(1).per(25)

It seems that part of my issue is that mongo queries are not executed until they are needed. Is there a way for me to trigger the execution of a query at a given point? Or is there another approach I should take in building this result set?

Update with more info

The behavior I'm seeing is that what gets returned is just the results in listings. I have also confirmed that everything_else does contain the expected records (48 records in my_listings, 52 in everything_else as expected).

When applying .all to my queries as mentioned in the comments, no impact is made. A puts listings.inspect results in

10:57:00 web.1   |    #<Mongoid::Criteria
10:57:00 web.1   |    selector: {"price"=>{"$gte"=>25, "$lte"=>75}},
10:57:00 web.1   |    options:  {},
10:57:00 web.1   |    class:    MoListing,
10:57:00 web.1   |    embedded: false>

However, listings.count does result in 48. Am I just missing some stupid simple way of merging these results? And once I do have the results in one collection, how will this impact the pagination functions that follow. I'm using kaminari for pagination.

Update 2

Per an answer below and my own trial and error, I've found to_a to be a solution, but not an ideal one. This does function:

#merge the results together as an Array
results = (listings.to_a | everything_else.to_a)

This causes the pagination via Kaminari to have to change as we are no longer working with mongo criteria, but instead, a standard Array. Here's the new pagination method:

Kaminari.paginate_array(results).page(page).per(per_page)

Working with a small dataset of 100 records, this is fine and dandy - 54ms

"debug":{"success":true,"pre_render_duration":54.808775999999995,"overall_duration":86.36554100000001,"count":25},"pagination":{"total_pages":4,"current_page":1}}

However, using a larger dataset I'm seeing significantly slower times when using the .to_a method to combine these. Although the examples are not exactly apples to apples, this large of a difference points to the issue being with to_a returning everything, forcing Kaminari to work with a lot more actual data:

My results without to_a, simply returning all records with criteria applied - 15ms

"debug":{"success":true,"pre_render_duration":15.107164,"overall_duration":18.267599,"count":25},"pagination":{"total_pages":81,"current_page":1}}

My results with to_a, merging two resultsets - 415ms

"debug":{"success":true,"pre_render_duration":415.258199,"overall_duration":450.66537800000003,"count":25},"pagination":{"total_pages":81,"current_page":1}}

To summarize, this is not a valid option. Returning each dataset individually takes <15ms even with a large dataset, so I think what I need to accomplish is a way to merge the criteria together so that a single query is run against Mongo, allowing pagination to happen on the db, where it should be.

In SQL I would do something roughly like

select
  *
from
  listings
where
  field = "blah"
union all
select
  *
from
  listings
where
  field <> "blah"

Is it possible to do this in Mongo?

Levi Rosol
  • 4,398
  • 6
  • 28
  • 36
  • What do you mean part of your issue? Do you get incomplete results ? Do you get only listings?? Is that so? please elaborate on what is the problem here? – Jatin Ganhotra Sep 07 '12 at 15:45
  • Solution for Trigger the execution in Active Record is using .all functions. Just add `.all` after your queries & they will be executed there itself and there will be no lazy loading. – Jatin Ganhotra Sep 07 '12 at 15:46

3 Answers3

4

Maybe you could create a class to encapsulate details on how data is retrieved for that specific array and, by using Mongo driver, you could skip and limit query options to reduce transferred data sizes.

Using this approach, you could use something like this (namings not very good and i didnt tested the code, but you'll take the point):

class MoListingDataRetriever
  def initialize(page_size)
    @page_size = page_size / 2 #since you'll have two queries
    driver_instance = MoListing.db #just an exemple. You could use any of your classes that are mongo documents to do this
    @collection_driver = driver_instance.collection("mo_listing") #or whatever you collection name is on mongo
  end

  def retrieve_mo_listings(query_param_a, query_param_b, current_page)
    query_options = {
      limit: @page_size,
      page: current_page,
      skip: (@page_size * (current_page - 1)) #to skip a number of records already retrieved from the query
    }
    results_from_query_a = @driver_instance.find(query_param_a, query_options)
    results_from_query_b = @driver_instance.find(query_param_a, query_options)
    results_from_query_b.to_a.concat(results_from_query_b.to_a)    
  end
end
Rudy Seidinger
  • 1,059
  • 13
  • 22
  • This ultimately ended up being the answer. Instead of using Kaminari for paging, I needed to roll my own and do logic to figure out if records were even needed from either result set. Not ideal, but it does work and is relatively fast. – Levi Rosol Sep 17 '12 at 14:10
2

This might be crude way to do it though:

# Let us say the listings is obtained using listing_query_params
listings = MoListing.where(listing_query_params)

# and everything else is from everything_else_query_params
everything_else = MoListing.where(everything_else_query_params)

results = [listings.to_a, everything_else.to_a].flatten

results.page(1).per(25)

Is this what you want? I have tried it on one of my mongoid models and seems to work this way.

PS: But .to_a has a performance hit - the whole result sets are fetched and merged. But looking at the number of records (~50 a piece) you mention, that should be alright.

Suren
  • 190
  • 1
  • 4
  • This is very similar to the solution I came up with as well. The only difference is that I used an Array union vs flatten - results = (listings | everything_else). This had repercussions on how Kaminari works, so I'll update my post with details for clarification after more testing. Also, real world usage of this will be using 1000's of records, so you bring up a good point about performance. I'm going to test with more data to see if it's still acceptable before marking this answer as correct and awarding the bounty. It does seem like there should be a better solution than .to_a. – Levi Rosol Sep 11 '12 at 15:49
  • My response to this has been added to the original question. – Levi Rosol Sep 11 '12 at 19:42
  • I could have very well written `results = MoListing.any_of(listing_query_params, everything_else_query_params)` but you needed the ordering of results as well so I am not really sure how to order them as well in the same query! – Suren Sep 13 '12 at 17:41
0

try this:

my_listings = MoListing.where(criteria_a)
everything_else = MoListing.where(criteria_b)
all_listings = MoListing.or(my_listings.selector).or(everything_else.selector).page(1).per(25)