I've found myself needing to support result grouping with an accurate ngroups count. This requires colocating documents on the same shard by a secondaryId field.
I'm currently indexing documents using the compositeId router in Solr. The uniqueKey is documentId, and I'm adding a shard key at the front like this:
doc.addField("documentId", secondaryId + "!" + actualDocId);
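To make the ID format concrete, here is a minimal sketch of how the value is built. The class and method names are hypothetical and only for illustration; they are not part of Solr or SolrJ.

```java
// Hypothetical helper, not part of Solr or SolrJ.
final class CompositeIds {
    // Joins the shard key and the real document ID with '!', so the
    // secondaryId prefix acts as the shard key for the compositeId router.
    static String compositeId(String secondaryId, String actualDocId) {
        return secondaryId + "!" + actualDocId;
    }
}
```

For example, compositeId("groupA", "doc-1") yields the documentId value "groupA!doc-1", so every document with secondaryId "groupA" shares the same prefix.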
The problem I'm seeing is that the document count across my 3 shards is now uneven:
shard1: ~30k
shard2: ~60k
shard3: ~30k
(This is expected to grow a lot.)
Apparently the hashes of secondaryId are not very evenly distributed, but I don't know enough about its possible values to say why.
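One thing that could be checked first is whether a handful of very large secondaryId groups explains the skew, rather than the hashing itself. The sketch below tallies documents per shard key from a list of documentId values. It assumes the IDs have already been exported from the index into a list, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr or SolrJ.
final class ShardKeyTally {
    // Counts documents per shard key, i.e. the part of each ID before the first '!'.
    static Map<String, Integer> countByShardKey(List<String> documentIds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : documentIds) {
            int bang = id.indexOf('!');
            // IDs without a '!' have no shard key prefix; tally them under the whole ID.
            String key = bang >= 0 ? id.substring(0, bang) : id;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

If a few keys account for most of the documents, the imbalance would come from group sizes, since all documents sharing a secondaryId are routed together.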
Any thoughts on getting a better distribution of these documents?