From all the research I've done, it seems that even with a partition key in CosmosDB/DocumentDB you can still run into capacity problems if a single partition hits the 10 GB limit. I've seen strategies that append a date or another time-based value to the key (for logging, for example), and partitioning by date that way makes sense. However, I wanted a relatively safe option for general entities in my domain, as well as for entities that belong to specific users or accounts. My first pass at this is:

1. Use '_docType' as the partition key.
2. Account documents are named "Account"
3. Documents belonging to an Account are named "Account<AccountId>"
4. Documents for a specific entity type are named "<EntityName>"
5. Documents belonging to an entity are named "<EntityName>-<EntityId>"
6. Documents of a particular entity type belonging to a specific account are named "<EntityName>-Account-<AccountId>"
7. Documents belonging to a particular entity for a specific account are named "<EntityName>-<EntityId>-Account-<AccountId>"
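
To make the naming rules concrete, here is a minimal Python sketch of a helper that builds the '_docType' value for each case in the list. The function name `doc_type` and its parameters are hypothetical, chosen only for illustration; nothing here calls the Cosmos DB SDK.

```python
from typing import Optional


def doc_type(entity_name: Optional[str] = None,
             entity_id: Optional[str] = None,
             account_id: Optional[str] = None) -> str:
    """Build a '_docType' partition key value (hypothetical helper)."""
    if entity_name is None:
        if entity_id is not None:
            raise ValueError("entity_id requires entity_name")
        # Rules 2 and 3: the account documents themselves, or documents
        # that belong to one account.
        return "Account" if account_id is None else f"Account{account_id}"

    # Rules 4 to 7: start from the entity name and append the optional parts.
    parts = [entity_name]
    if entity_id is not None:
        parts.append(entity_id)
    if account_id is not None:
        parts += ["Account", account_id]
    return "-".join(parts)


# Example values; "Order", "7" and "42" are made-up identifiers.
examples = [
    doc_type(),
    doc_type(account_id="42"),
    doc_type("Order"),
    doc_type("Order", entity_id="7"),
    doc_type("Order", account_id="42"),
    doc_type("Order", entity_id="7", account_id="42"),
]
```

Keeping the key construction in one place like this means every reader and writer derives the same value, which matters because a query against the wrong '_docType' simply finds nothing.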

Does this look like a good strategy that will scale well using unlimited/partitioned collections?

Is there anything missing from this strategy?

Are there any potential flaws or issues with this method?

INNVTV
  • I've used a similar approach and it works fine. There is just no guarantee that either the Account doc types or the EntityName doc types will stay under the 10 GB limit. If you don't have any other field that you know you will always query on (date, customer id, etc.), then you don't have much choice other than to use the approach above. For a more complete solution, you could combine this strategy with some kind of auto-archiving system for documents that haven't been touched in a while (i.e. move old/inactive docs to blob storage after a while). – Dan Dec 07 '18 at 20:18
  • One other thing I've done is to have a shard "index" in the PK, which lets you do something like "ACCOUNT_123_1" and "ACCOUNT_123_2" when the first PK fills. Tracking the shards can be a pain. You can query the first page and then try to query the second page and see if it fails. You can use "well-known" items to make lookups quick (use id="exists" or something). You can also put a single id="pages" document which tracks which pages exist, but it could get stale, so that's less nice. Just checking for existence is pretty cheap. You can also email us and ask for more space, up to a certain point. :) – Chris Anderson Dec 07 '18 at 23:38
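
A minimal Python sketch of the shard-index idea from the comment above, assuming a well-known marker item with id "exists" in every shard. The `exists` callable is a stand-in for whatever point read your data layer offers, and `shard_key` and `find_shards` are hypothetical names; the in-memory set below only simulates the store.

```python
from typing import Callable, List


def shard_key(base: str, index: int) -> str:
    """Build a sharded partition key such as 'ACCOUNT_123_1'."""
    return f"{base}_{index}"


def find_shards(base: str,
                exists: Callable[[str, str], bool],
                marker_id: str = "exists") -> List[str]:
    """Probe shards 1, 2, 3, ... until a marker item is missing.

    `exists(partition_key, item_id)` is an assumed callable that reports
    whether the item is present; it is not a Cosmos DB SDK function.
    """
    shards = []
    index = 1
    while exists(shard_key(base, index), marker_id):
        shards.append(shard_key(base, index))
        index += 1
    return shards


# Simulated store: two shards exist for account 123.
store = {("ACCOUNT_123_1", "exists"), ("ACCOUNT_123_2", "exists")}


def in_store(partition_key: str, item_id: str) -> bool:
    return (partition_key, item_id) in store


shards = find_shards("ACCOUNT_123", in_store)
```

Probing costs one cheap existence check per shard plus one miss, and it avoids the staleness problem of a separate id="pages" tracking document.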
