Situation:
I'm using the Copy activity in Azure Data Factory to copy a single 500 MB JSON file from a Storage Account blob to Cosmos DB, and also from Cosmos DB back to a Storage Account blob.
The AzureBlobStorageLinkedService is configured with a SAS token.
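For reference, the linked service definition looks roughly like this (a minimal sketch of what such a SAS-based definition might look like; the account name and token are placeholders, and the SAS URI is stored inline as a SecureString here):

```json
{
  "name": "AzureBlobStorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "sasUri": {
        "type": "SecureString",
        "value": "https://<storage-account>.blob.core.windows.net/?<sas-token>"
      }
    }
  }
}
```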
Times:
Cosmos DB to Storage Account blob: 4 minutes
Storage Account blob to Cosmos DB: 2 hours to over 7 hours (timeout)
CosmosDB:
Before the Copy activity starts, an empty collection with 20,000 RU/s is created. I looked at the Cosmos DB metrics and the account is barely utilized; there are only a few 429 (throttling) errors. We use the default indexing configuration and a partition key, which means the data spans several partition key values across several partition key ranges (physical partitions).
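By "default indexing configuration" I mean the out-of-the-box Cosmos DB indexing policy, which indexes every property path on write (shown here only for illustration; this is the standard default, nothing customized on our side):

```json
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/\"_etag\"/?" }
  ]
}
```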
Data:
The JSON file contains 48,000 JSON objects. Some are small and some are up to 200 KB.
Tries:
I tried different writeBatchSize values (a sketch of the sink configuration follows this list):
5: 2 hours
100: 2 hours
10,000: 7 hours (timeout)
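The relevant part of the Copy activity looks roughly like this (a minimal sketch with hypothetical dataset and activity names; only writeBatchSize was varied between runs):

```json
{
  "name": "CopyBlobToCosmosDb",
  "type": "Copy",
  "inputs": [ { "referenceName": "BlobJsonDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "CosmosDbDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "JsonSource" },
    "sink": {
      "type": "CosmosDbSqlApiSink",
      "writeBehavior": "insert",
      "writeBatchSize": 10000
    }
  }
}
```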
I tried it with the same and with different regions => no difference
I tried it with smaller files => they copy much faster (500 KB/s instead of 50 KB/s)
Question:
Why is it so slow? Is the 500 MB file too large?