Setup:
- Rails 4
- MySQL
- ThinkingSphinx
I have a model (Record) in my app with almost 500 million rows. This model has 32 fields, but the only two I care about for a particular Sphinx search are name and token. name is what I am searching against using Sphinx, and token is what I want returned to perform other actions in Rails with.
My index setup is:
ThinkingSphinx::Index.define :records, :with => :real_time do
  # fields
  indexes name
  indexes token

  # attributes
  has token, as: :token_attr, type: :string
  # < several additional attributes >
end
What I want to do is query Sphinx on :records, matching against name, and have it return distinct token strings in an array.
Here's what I have:
Record.search("red", indices: %w(records), max_matches: num_tokens_i_need, group_by: :token_attr)
... where num_tokens_i_need is generally somewhere in the thousands (less than 10,000).
The above query takes between 5 and 8 minutes to complete. However, when I simply do:
Record.search("red", indices: %w(records), max_matches: num_tokens_i_need).map(&:token).uniq
The search is incredibly fast (returning several million records in a couple hundred milliseconds), but I don't get back num_tokens_i_need distinct tokens, because the .uniq call removes duplicates only after the result set has already been capped.
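To make the shortfall concrete, here is a minimal plain-Ruby sketch with made-up data. No Sphinx is involved, and the rows and token values are hypothetical; it only shows why capping first and deduplicating afterwards can return fewer tokens than requested.

```ruby
# Hypothetical matched rows as [name, token] pairs; several rows share a token.
rows = [
  ["red apple", "t1"],
  ["red car",   "t1"],
  ["red door",  "t2"],
  ["red hat",   "t2"],
  ["red kite",  "t3"],
  ["red sky",   "t4"],
]

num_tokens_i_need = 3

# Mimics max_matches followed by .uniq: cap the rows first, then deduplicate.
capped_then_uniq = rows.first(num_tokens_i_need).map { |_, token| token }.uniq

# What I actually want: deduplicate first, then take the requested count.
uniq_then_capped = rows.map { |_, token| token }.uniq.first(num_tokens_i_need)

puts capped_then_uniq.inspect
puts uniq_then_capped.inspect
```

With this data the first approach yields only two tokens, while the second yields the three I asked for. The group_by query is my attempt to get the second behaviour out of Sphinx directly.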
Basically, what I need is a fast Sphinx search that gives me back an exact number of distinct tokens for a given term (such as "red").
If seeing my sphinx.conf or anything else would be helpful, please let me know.