How would one calculate the probability that two GUIDs start with the same N characters?
Situation:
We are considering using the first N characters of a GUID as a Cosmos DB collection partition key. We don't want to use the entire GUID, because then every document would be in its own logical partition. We also probably don't want to use just the first character, because we might then store too many documents in one partition and exceed the partition size limit.
Example:
So if we use the first 4 (a number pulled out of thin air) characters of a GUID as the partition key, how can we calculate roughly how many documents will be stored in each partition per month? For this example, let's assume we're partitioning 4 million documents a month.
Update
It sounds like every GUID character has 16 possible values: 0-9 and a-f (the hex character set). Assuming GUID characters are random (I'm not sure this is true), there should be 16^4 = 65,536 possible four-character GUID prefixes (~65k combinations). Therefore, at most we'd have ~65k partitions, and the chance that two GUIDs share the same first four characters would be 1 in 65,536. If we also assume a uniform distribution, 4,000,000 documents spread over 65,536 partitions should come to roughly 61 documents per partition, right?
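To sanity-check the arithmetic, here is a minimal Python sketch. It assumes the GUIDs are random version 4 values, so that each leading hex character is uniformly distributed; if the GUIDs are generated some other way (sequential, for example), the numbers won't hold. The function names `prefix_stats` and `simulate` are made up for this illustration.

```python
import uuid
from collections import Counter

HEX_ALPHABET_SIZE = 16  # each GUID character is one hex digit: 0-9, a-f


def prefix_stats(prefix_len, total_docs):
    """Return (possible prefixes, P(two GUIDs share a prefix), expected docs per prefix).

    Assumes every prefix character is uniformly random and independent.
    """
    partitions = HEX_ALPHABET_SIZE ** prefix_len
    return partitions, 1 / partitions, total_docs / partitions


def simulate(prefix_len, total_docs):
    """Count how many random (version 4) GUIDs fall on each prefix."""
    return Counter(uuid.uuid4().hex[:prefix_len] for _ in range(total_docs))


# Analytic estimate for the example: 4-character prefix, 4 million documents.
partitions, p_match, expected = prefix_stats(4, 4_000_000)
print(partitions, p_match, round(expected, 1))

# Empirical check on a smaller sample so it runs quickly.
counts = simulate(4, 200_000)
print(len(counts), min(counts.values()), max(counts.values()))
```

The 61 figure is an average, not a guarantee. Under the same randomness assumption, individual partition counts scatter around it by roughly sqrt(61) ≈ 8 documents, which is the spread the simulation's min and max give a feel for.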