I am experimenting with UPSERT statements for multiple items (around 20-30k) using python-arango and an AQL query. Overall this method seems to be much slower in Python than in NodeJS anyway, but I tried two different approaches and, surprisingly, the one I thought would be faster is much slower, and I am trying to understand why.
My initial method was to upsert every single item separately with a statement like this:
aql = "UPSERT {_key: '" + data["_key"] + "'}"
aql += " INSERT " + doc
aql += " UPDATE " + doc
aql += " IN " + col_name
self.db.aql.execute(aql)
As I said before, this was slower than a similar script in NodeJS, maybe because NodeJS works asynchronously: approx. 45 s for 27k entries (a 2.4 MB CSV dump). That is a lot of time for so little data.
Then I thought the one-by-one requests could be the problem and that it would be better to send multiple items per query to the database, like this:
aql = "FOR itm IN " + json.dumps(items)
aql += " UPSERT {_key: itm._key}"
aql += " INSERT itm"
aql += " UPDATE itm"
aql += " IN " + col_name
But actually, this was worse: it took maybe 5 minutes, the ArangoDB backend became seriously slow, and for a short time it was not even accessible.
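In case it matters, here is a variant I am considering, with the batch passed as a bind variable instead of being dumped into the query text. This is an untested sketch; my assumption is that it spares the server from parsing a huge inline JSON literal on every query, but I have not verified that this fixes the slowdown:

# Sketch: same batched UPSERT, but the documents travel as a bind
# variable (@items) rather than inside the query string itself.
# Assumption: items is a list of dicts, each containing '_key'.
aql = """
FOR itm IN @items
    UPSERT {_key: itm._key}
    INSERT itm
    UPDATE itm
    IN @@col
"""
self.db.aql.execute(aql, bind_vars={"items": items, "@col": col_name})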
I am curious: why is a single AQL statement with FOR (OK, I used batches of 2, 10, 50, and 100 items, but with no effect) so much slower than 27k separate requests? I expected the opposite. And is there a better way to perform multiple UPSERTs at once with Python?
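For example, would python-arango's import_bulk with on_duplicate="update" be the right tool here? A sketch of what I mean (untested on my data; I am assuming on_duplicate="update" gives me upsert-like semantics, i.e. existing keys get updated instead of raising an error):

# Sketch: bulk import via the collection API instead of AQL.
# Assumption: items is the full list of dicts with '_key' fields.
col = self.db.collection(col_name)
result = col.import_bulk(items, on_duplicate="update")
# result is a dict with counters such as "created" and "updated"
print(result)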
It's a Windows system with 2 cores and 16 GB RAM. Read performance is good, but write performance ...