Say you have 4 sorted sets, each with many thousands of keys and scores. Since they are sorted sets, getting the top items of a single set can be done in logarithmic time (plus the number of items returned).
The easy way would be to take the union of the sets and then get the top items. But doing so is at least linear in the total number of items across all sets.
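For reference, the easy way looks roughly like this. It is only a sketch: plain Python dicts stand in for the sorted sets, the name `top_n_naive` is made up for illustration, and it assumes a key's scores are summed across sets.

```python
import heapq
from collections import Counter


def top_n_naive(sets, n):
    """Sum every key's scores across all sets, then keep the n best.

    `sets` is a list of dicts mapping key -> score (a stand-in for the
    sorted sets; hypothetical representation).
    """
    totals = Counter()
    for s in sets:
        for key, score in s.items():
            totals[key] += score  # touches every item of every set once
    # Returns (key, total) pairs, highest total first.
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
```

The inner loop visits every item once, which is where the linear cost comes from.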
The best way I could think of is this:
- Take the top N items from every set.
- Look at the item with the lowest rank (rank N) in each of those lists, and pick the highest score among them.
- Divide that score by the number of sets. (Assuming scores are non-negative, a key that scores lower than this in every set can never be in the top N.)
- Take the union of the keys that are left. (Ignoring scores.)
- Look up the scores of those keys in all sets. (A key might have score 1 in one set and 10000 in another.)
In other words: find all keys that could possibly be in the top list, and do the union with only those keys. There are probably more efficient ways to limit the number of items to consider.
[edit] Keys occur in one or more sets, and their summed scores determine the final score. So a key that is in all sets with a low score might end up with a higher final score than a key that has a high score but is in only one set.
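To make the steps above concrete, here is a rough Python sketch of how I read them. Plain dicts stand in for the sorted sets, the name `top_n_summed` is made up for illustration, and it assumes all scores are non-negative (the pruning bound does not hold otherwise). It only shows the logic, not the cost: steps 1 and 4 scan the dicts here, whereas a real sorted set would presumably answer them with a rank query and a score-range query.

```python
import heapq


def top_n_summed(sets, n):
    """Return the n keys with the highest summed score as (key, total) pairs.

    `sets` is a list of dicts mapping key -> score, one dict per sorted set.
    Assumes every score is non-negative.
    """
    if not sets or n <= 0:
        return []
    k = len(sets)

    # Step 1: the top n (key, score) pairs of every set, best first.
    tops = [heapq.nlargest(n, s.items(), key=lambda kv: kv[1]) for s in sets]

    # Step 2: among the sets that hold at least n items, take the highest
    # score found at rank n. The n keys at or above that rank each total at
    # least this much, so the final n-th total cannot be lower.
    bound = max((top[-1][1] for top in tops if len(top) == n), default=0)

    # Step 3: a key scoring below bound / k in every set sums to less
    # than bound, so it cannot reach the top n.
    threshold = bound / k

    # Step 4: union of the keys that reach the threshold in at least one set.
    candidates = {
        key for s in sets for key, score in s.items() if score >= threshold
    }

    # Step 5: look up every candidate in every set and sum its scores.
    totals = {key: sum(s.get(key, 0) for s in sets) for key in candidates}

    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


example = [
    {"a": 10, "b": 9, "c": 1, "x": 2},
    {"d": 8, "c": 7, "a": 1, "y": 2},
    {"c": 6, "e": 5, "b": 1, "z": 2},
]
print(top_n_summed(example, 2))  # [('c', 14), ('a', 11)]
```

In the example the bound is 9 and the threshold is 3, so `x`, `y` and `z` are never looked up. `c` wins even though it is not in the top 2 of the first set, which is the situation described in the edit.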