Here is my problem: I am trying to upload a large CSV file (~14 GB) to Cosmos DB, but I am finding it difficult to use the throughput I am paying for. The metrics overview in the Azure portal shows I am consuming only 73 RU/s of the 16,600 RU/s I have provisioned. Right now I am using pymongo's bulk write function to upload to the database, but any bulk_write batch longer than 5 operations throws a hard "Request rate is large." exception. Am I doing this wrong? Is there a more efficient way to upload the data in this scenario? Network bandwidth should not be the bottleneck, since I am uploading from an Azure VM to Cosmos DB.
Here is the structure of how I am uploading in Python right now:
operations = []
for row in csv.reader(csv_file):
    row[id_index_1] = convert_id_to_useful_id(row[id_index_1])
    find_criteria = {
        # find query
    }
    upsert_dict = {
        # row data, wrapped in an update operator such as "$set"
    }
    operations.append(pymongo.UpdateOne(find_criteria, upsert_dict, upsert=True))
    if len(operations) > 5:
        results = collection.bulk_write(operations)
        operations = []
if operations:
    # flush whatever is left after the loop
    results = collection.bulk_write(operations)
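I have also thought about catching the throttling error and replaying the batch with a backoff, roughly like the sketch below. The helper name, retry count, and sleep values are just placeholders I made up, not something I have tuned:

    import time
    from pymongo.errors import BulkWriteError

    def bulk_write_with_retry(collection, operations, max_retries=5):
        # Replay the whole batch with exponential backoff whenever the
        # bulk write fails with the "Request rate is large" throttling error.
        delay = 1
        for attempt in range(max_retries):
            try:
                # ordered=False so independent upserts are not serialized
                return collection.bulk_write(operations, ordered=False)
            except BulkWriteError as exc:
                errors = exc.details.get("writeErrors", [])
                throttled = any("Request rate is large" in e.get("errmsg", "")
                                for e in errors)
                if not throttled:
                    raise
                # Every operation is an upsert, so re-running the whole
                # batch after a partial failure should be harmless.
                time.sleep(delay)
                delay *= 2
        raise RuntimeError("batch still throttled after {} retries".format(max_retries))

Even if that works, though, retrying batches of 5 operations seems unlikely to get me anywhere near the provisioned throughput, so I suspect I am missing something more fundamental.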
Any suggestions would be greatly appreciated.