
I'm using Azure Data Factory to copy data from a data lake to Cosmos DB.

Copying a sample of 100 records completes in ~5 minutes; copying 1 million records takes ~1 hour.

The throughput I've set on the Cosmos DB container is ~2000 RU/s, and I'm using only one container.
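For what it's worth, here is a rough back-of-the-envelope check I did (assuming roughly 5 RU per insert, which is an assumption on my part; the real cost depends on document size and indexing policy). It suggests 2000 RU/s alone would put the runtime in this range:

```python
# Back-of-the-envelope: is 2000 RU/s enough to explain ~1 hour for 1M docs?
# Assumption: ~5 RU per insert (typical for a small document with default
# indexing; the actual cost varies with document size and indexing policy).
docs = 1_000_000
ru_per_write = 5                # assumed average cost of one insert
provisioned_ru_per_sec = 2_000  # throughput set on the container

total_ru = docs * ru_per_write               # 5,000,000 RU
seconds = total_ru / provisioned_ru_per_sec  # 2,500 s
print(f"~{seconds / 60:.0f} minutes")        # ~42 minutes, same ballpark as ~1 hour
```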

Based on this documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey

I have set the partition key to /PersonnelNumber (there are ~1 million unique PersonnelNumber values in the data). Could you please help me understand whether this is the right partition key, or whether it is causing the run to slow down?

These two points from the documentation are confusing me. A partition key should:

- Be a property that has a value which does not change. If a property is your partition key, you can't update that property's value.
- Have a high cardinality. In other words, the property should have a wide range of possible values.

Thank you!

user989988

1 Answer


There is no "PersonnelNumber" in the document. Instead, you have "Person.PersonnelNumber" field. So, Cosmos DB is unable to find "PersonnelNumber" in the document so it is populating the partition field of "PersonnelNumber" with an empty value.

Change the field name from "Person.PersonnelNumber" to "PersonnelNumber" (for example, in the copy activity's schema mapping) and then upload.
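A minimal sketch of the reshaping (the field names come from your question; the id and values are placeholders, and in practice you would do this in the ADF mapping rather than in code):

```python
# One source record as the question describes it: PersonnelNumber is
# nested under Person, so the partition key path /PersonnelNumber
# resolves to nothing.
doc = {
    "id": "123",
    "Person": {"PersonnelNumber": "EMP-001"},
}

# Promote the nested value to a top-level property so it matches the
# container's partition key path /PersonnelNumber.
doc["PersonnelNumber"] = doc["Person"]["PersonnelNumber"]
```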

If every document has a unique partition key value, writes are spread evenly across partitions. In your case, PersonnelNumber is a good partition key for write throughput because its values are unique. The trade-off is that you will not be able to update PersonnelNumber on an existing document.
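To illustrate that last point, here is a sketch with the azure-cosmos Python SDK: there is no in-place update of a partition key property, so the only way to "change" the value is to delete the item and re-create it under the new value. The endpoint, key, names, and values below are placeholders.

```python
from azure.cosmos import CosmosClient

# Placeholders: substitute your own account endpoint, key, and names.
client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
container = client.get_database_client("<db>").get_container_client("<container>")

# Read the item under its current partition key value...
item = container.read_item(item="123", partition_key="EMP-001")

# ...then delete it and re-create it with the new value.
container.delete_item(item="123", partition_key="EMP-001")
item["PersonnelNumber"] = "EMP-002"
item = {k: v for k, v in item.items() if not k.startswith("_")}  # drop system properties
container.create_item(body=item)
```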

RCT