My application is not live yet, so I'm testing the performance of my Gremlin queries before it gets into production.
To test I'm using a query that adds edges from one vertex to 300 other vertices. It does more things but that's the simple description. I added this mentioned workload of 300 just for testing.
If I run the query 300 times one after the other it takes almost 3 minutes to finish and 90.000 edges are created (300 x 300).
I'm worried because if I have like 60.000 users using my application at the same time they probably are going to be creating 90.000 edges in 2 minutes using this query and 60.000 users at the same time is not much in my case. If I have 1 million users at the same time I'm going to be needing many servers at full capacity, that is out of my budget.
Then I noticed that when my test is executing the CPU doesn't show much activity, I don't know why, I don't know how the DB works internally.
So I thought that maybe a more real situation could be to call my queries all at the same time because that is what's going to happen with the real users, when I tried test that I got ConcurrentModificationException
.
Is far as I understand this error happens because an edge or vertex is being read or written in 2 queries at the same time, this is something that could happen a lot in my application because all the user vertices are changing connections to the same 4 vertices all the time, this "collisions" are going to be happening all the time.
I'm testing in local using gremlin server 3.4.8 connecting using sockets with Node.js. My plan is to use AWS Neptune as my database when it goes to production.
What can I do to recover hope? There must be very important things about this subject that I don't know because I don't know how graph databases work internally.
Edit
I implemented a logic to retry the queries requests when receiving an error using the "Exponential Backoff" approach. It fixed the ConcurrentModificationException but there are a lot of problems in Gremlin Server when sending multiple queries at the same time that shows how multi-threading is totally unsupported and unstable in Gremlin Server and we should try multi-threading in other Gremlin compatible databases as the response says. I experienced random inconsistencies in the data returned by the queries and errors like NegativeArraySize and other random stuff coming from the database, be warned of this to not waste time thinking that your code could be bugged like it happened to me.