I am using Tire to perform searches on sets of objects that have a category attribute (there are 6 different categories).
I want the results to come back in pages of 6, with one object from each category on a page (for as long as that is possible).
Eg1. If the first, second and third categories had 2 objects each and the fourth, fifth and sixth categories had 1 object each, the pages would look like:
Data: [1,1,2,2,3,3,4,5,6]
1: 1,2,3,4,5,6
2: 1,2,3
Eg2. Data: [1,1,1,1,1,2,2,3,4,5]
1: 1,2,3,4,5,1
2: 2,1,1,1
In something like Ruby it wouldn't be too difficult to sort based on the number of times each category has already appeared.
Something like:
times_seen = {}
results.sort_by do |r|
  times_seen[r.category] ||= 0
  [times_seen[r.category] += 1, r.category]
end
E.g.
irb(main):032:0> times_seen = {};[1,1,1,1,1,2,2,3,4,5].sort_by{|i| times_seen[i] ||= 1; [times_seen[i] += 1, i];}
=> [1, 2, 3, 4, 5, 1, 2, 1, 1, 1]
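To make the link between that sort order and the pages in Eg1 and Eg2 explicit, here is a minimal in-memory sketch. The helper name `balanced_pages` and its `per_page` parameter are made up for illustration, and it works on plain category values rather than real search results:

```ruby
# Hypothetical helper (not part of Tire or Elasticsearch): orders the
# categories by how many times each one has been seen so far, then cuts
# the ordered list into pages.
def balanced_pages(categories, per_page = 6)
  times_seen = Hash.new(0) # every category starts with a count of 0

  # sort_by computes the key for each element once, in input order, so the
  # counter reflects how many earlier elements shared the same category.
  ordered = categories.sort_by { |c| [times_seen[c] += 1, c] }

  ordered.each_slice(per_page).to_a
end

balanced_pages([1, 1, 2, 2, 3, 3, 4, 5, 6])
# => [[1, 2, 3, 4, 5, 6], [1, 2, 3]]
balanced_pages([1, 1, 1, 1, 1, 2, 2, 3, 4, 5])
# => [[1, 2, 3, 4, 5, 1], [2, 1, 1, 1]]
```

This only shows the ordering I am after; it still needs every result loaded in memory.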
Doing this with a large number of results would be really slow, because we would need to pull them all into Ruby first and then sort them.
Ideally we want to do this in Elasticsearch and let it handle the pagination for us.
Elasticsearch has script-based sorting: http://www.elasticsearch.org/guide/reference/api/search/sort/
{
    "query" : {
        ....
    },
    "sort" : {
        "_script" : {
            "script" : "doc['field_name'].value * factor",
            "type" : "number",
            "params" : {
                "factor" : 1.1
            },
            "order" : "asc"
        }
    }
}
If we could do something like this, but with the times_seen logic from above in the script, it would make life really easy. However, it would require a times_seen variable that persists between executions of the script.
Any ideas on how to achieve a uniform distribution based on an attribute, or on whether it is possible to somehow use a persistent variable in the script sort?