We have a deduplication issue when our data is spread across multiple indexes and the same id exists in more than one of them.
A straight select returns X records, but a group by over the same indexes returns counts that add up to more than X. We have tracked this back to the offending id existing in more than one index.
Sphinx is smart enough to deduplicate the records for the straight select, but it doesn't deduplicate them when bucketing for a group by.
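To make the symptom concrete, here is a rough Python model of what we are seeing. It is not Sphinx code: the two index contents, the `category` attribute, and all function names are made up for illustration, and the "later index wins" rule for duplicate ids is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical contents of two indexes; id 3 exists in both.
index_a = [
    {"id": 1, "category": "x"},
    {"id": 2, "category": "y"},
    {"id": 3, "category": "x"},
]
index_b = [
    {"id": 3, "category": "x"},
    {"id": 4, "category": "y"},
]


def straight_select(*indexes):
    """Model of the plain select: one row per id (assumed: later index wins)."""
    by_id = {}
    for index in indexes:
        for row in index:
            by_id[row["id"]] = row
    return list(by_id.values())


def group_by_without_dedup(attr, *indexes):
    """Model of the group by we observe: every row in every index is counted."""
    counts = Counter()
    for index in indexes:
        for row in index:
            counts[row[attr]] += 1
    return counts


def group_by_with_dedup(attr, *indexes):
    """What we would like: deduplicate by id first, then bucket."""
    return Counter(row[attr] for row in straight_select(*indexes))


rows = straight_select(index_a, index_b)
naive = group_by_without_dedup("category", index_a, index_b)
wanted = group_by_with_dedup("category", index_a, index_b)

print(len(rows), dict(naive), dict(wanted))
```

In this model the select returns one row per id, while the group by without deduplication counts the shared id once per index, so its bucket totals exceed the select's row count. The last function shows the behavior we are hoping to get from Sphinx itself.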
Of course it would be better not to have the duplicates, and we'll hopefully find a way to deal with that, but for the time being, is there a way to tell Sphinx to deduplicate on group by as well?