I am working with an existing MongoDB collection. The data looks like the following:
{ user_id: 123, post: { id: 123456789, title: "..." } },
{ user_id: 123, post: { id: 123456790, title: "..." } },
{ user_id: 124, post: { id: 123456791, title: "..." } }
I need to shard this collection, and I'm having trouble selecting a shard key. I often perform operations based on a user (e.g. get all posts from user 123). Should I create a shard key based on
{
  user_id: 1,
  "post.id": 1
}
or the same, but hashed?
If the key is hashed, I assume that range queries will be broadcast to all shards. But if it is not hashed, will documents be evenly distributed across shards? As you can see, the values in the sample data increase monotonically.
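To illustrate the distribution concern, here is a toy simulation. It is only a sketch and not MongoDB's actual chunk or balancer logic: the shard count, the fixed split points, and the use of MD5 modulo the shard count are all assumptions made for illustration. It models a single monotonically increasing key, where every new value is larger than the last split point.

```javascript
const crypto = require("crypto");

// Toy model, not MongoDB's real chunk logic: NUM_SHARDS and the fixed
// split points below are made-up values for illustration.
const NUM_SHARDS = 4;
const splitPoints = [1000, 2000, 3000];

// Ranged placement: an id goes to the first range whose upper bound is
// above it; anything past the last split point lands in the final range.
function rangedShard(id) {
  const idx = splitPoints.findIndex((p) => id < p);
  return idx === -1 ? NUM_SHARDS - 1 : idx;
}

// Hashed placement (assumed scheme): hash the id, then place by hash value.
function hashedShard(id) {
  const digest = crypto.createHash("md5").update(String(id)).digest();
  return digest.readUInt32BE(0) % NUM_SHARDS;
}

// Count how many ids each shard receives under a placement function.
function countInserts(shardFor, ids) {
  const counts = new Array(NUM_SHARDS).fill(0);
  for (const id of ids) counts[shardFor(id)] += 1;
  return counts;
}

// New ids keep increasing and are all above the last split point.
const newIds = Array.from({ length: 10000 }, (_, i) => 4000 + i);
const rangedCounts = countInserts(rangedShard, newIds);
const hashedCounts = countInserts(hashedShard, newIds);
console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

In this model every new insert under ranged placement goes to the last shard until the ranges are split and rebalanced, while hashed placement spreads the inserts across all four shards.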
Thanks,
EDIT: I think I made a mistake; it appears that compound indexes cannot be hashed. From the documentation (https://docs.mongodb.com/manual/core/index-compound):
You may not create compound indexes that have hashed index type. You will receive an error if you attempt to create a compound index that includes a hashed
I guess that means that this question is not sensible, so I'll close.
EDIT 2: On second thought, the question is valid, but it would be better phrased like so. I appear to have two options:
1. Hash the post.id field, which should be unique. Hashing it should help ensure even distribution of data across shards.
2. Create a compound key of user_id and post.id, like the code above. This will also guarantee uniqueness, and should help with data locality for a single user. But will it ensure even data distribution across shards?
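To make the two options concrete, here is a sketch of the `shardCollection` admin command documents as I understand them. The namespace `mydb.posts` is a hypothetical placeholder, not my real database or collection name. Since the collection already contains data, I assume an index supporting the chosen key has to exist before either command is run.

```javascript
// Hypothetical namespace; substitute the real database and collection.
const NAMESPACE = "mydb.posts";

// Option 1: hashed shard key on post.id alone.
const hashedPostId = {
  shardCollection: NAMESPACE,
  key: { "post.id": "hashed" },
};

// Option 2: ranged compound shard key on user_id, then post.id.
const compoundUserPost = {
  shardCollection: NAMESPACE,
  key: { user_id: 1, "post.id": 1 },
};

// In the mongo shell, either document would be passed to
// db.adminCommand(...); here they are only printed for inspection.
console.log(JSON.stringify(hashedPostId));
console.log(JSON.stringify(compoundUserPost));
```

Note that the dotted field name has to be quoted in both key documents.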
Thanks