I'm using Azure Data Factory to copy data from a data lake to Cosmos DB.
Copying a sample of 100 records completes in ~5 minutes, while copying 1 million records takes ~1 hour.
I have provisioned ~2,000 RU/s of throughput on the Cosmos DB container, and I'm using only one container.
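For context, here is my back-of-envelope check on whether ~2,000 RU/s could itself be the limit (a rough sketch; the ~5 RU cost per ~1 KB write is my assumption, not something I measured):

```python
# Rough check: can 2,000 RU/s even sustain 1M writes in under an hour?
# Assumption (not measured): each ~1 KB insert costs roughly 5 RU.
items = 1_000_000
ru_per_write = 5                 # assumed write cost
provisioned_ru_per_sec = 2_000   # what I provisioned

max_writes_per_sec = provisioned_ru_per_sec / ru_per_write  # 400 writes/s
minutes_needed = items / max_writes_per_sec / 60            # ~42 minutes
print(f"best case: ~{minutes_needed:.0f} minutes")
```

That best case (~42 minutes) is in the same ballpark as the ~1 hour I observe, which makes me wonder whether the provisioned throughput, rather than the partition key, is the bottleneck.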
Based on this documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey
I set the partition key to /PersonnelNumber (there are ~1 million unique PersonnelNumber values in the data). Could you help me understand whether this is the right partition key, or whether it is what's slowing the run down?
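In case it's relevant, the container setup is equivalent to the following (a minimal sketch with the azure-cosmos Python SDK; I actually created the container in the portal, and the endpoint, key, and database/container names here are placeholders):

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint and key, not my real account values.
client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<key>")

database = client.create_database_if_not_exists(id="hr")
container = database.create_container_if_not_exists(
    id="personnel",
    partition_key=PartitionKey(path="/PersonnelNumber"),  # ~1M unique values
    offer_throughput=2000,  # the ~2,000 RU/s mentioned above
)
```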
These two points from the docs confuse me. A partition key should:

- Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
- Have a high cardinality. In other words, the property should have a wide range of possible values.

My reading of the first point is sketched below.
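As far as I understand the immutability requirement, an item's partition key value can never be updated in place; the usual workaround is to delete the item and re-insert it under the new value. A sketch reusing the `container` client from the setup above (the id "123" and the PersonnelNumber values are made up for illustration):

```python
# Partition key values are immutable: to "change" one, read the item,
# delete it, then re-create it under the new partition key value.
item = container.read_item(item="123", partition_key="EMP-001")
container.delete_item(item="123", partition_key="EMP-001")

item["PersonnelNumber"] = "EMP-002"  # the new partition key value
container.create_item(body=item)
```

Thank you!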