We use ArangoDB and PostgreSQL to store almost identical data. PostgreSQL handles the general-purpose queries that relational databases perform well. ArangoDB was selected for queries such as graph traversals, shortest-path searches, etc.
At the moment we have a table with 160000 records in PostgreSQL and a collection with the same number of documents in ArangoDB.
The API we are working on will be used by many users at the same time, so the first thing I wanted to check was how ArangoDB and PostgreSQL perform under load. I created a simple load test whose workload is a simple filtered select query, run against both ArangoDB and PostgreSQL.
The query selects the top N records/documents, filtered by a date field.
When I run the load test, all the queries to PostgreSQL complete within 0.5 seconds. Increasing the number of users from 10 to 100 does not affect the execution time at all.
The same queries to ArangoDB take about 2 seconds with a single user, and the response time then grows in direct proportion to the number of concurrent users. With 30 concurrent users, all the queries time out after waiting 60 seconds for a reply.
I tried to debug the arangojs connector and found this:
var maxTasks = typeof agent.maxSockets === 'number' ? agent.maxSockets * 2 : Infinity;
and this:
Connection.agentDefaults = {
  maxSockets: 3,
  keepAlive: true,
  keepAliveMsecs: 1000
};
which means that by default arangojs sends no more than 6 concurrent queries to ArangoDB, so all the remaining queries are queued on the Node.js side. I tried increasing the number, but it did not help, and now it looks like all the queries are queued on the ArangoDB side. If I run the load test and try to execute a query from the ArangoDB Web Interface, the query hangs for an unpredictable amount of time (depending on the number of users at the moment), then returns the result and reports that it was executed in about 4 seconds, which is not true. To me it looks like ArangoDB can only execute one query at a time while all the other queries are queued...
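To put a rough number on the client-side queueing, here is a small model of a capped task queue. It is only a sketch under my own assumptions: every query takes a fixed 2 seconds (the single-user figure from my test), waiting queries are served first-in first-out, and `simulateQueue` is a hypothetical helper, not part of arangojs.

```javascript
// Hypothetical model: at most `maxTasks` queries run at once, every query
// takes the same fixed time, and the rest wait in a FIFO queue.
// Returns the completion time (in ms) of each request, in submission order.
function simulateQueue(requestCount, maxTasks, serviceTimeMs) {
  // slots[i] is the time at which slot i becomes free again
  const slots = new Array(Math.min(maxTasks, requestCount)).fill(0);
  const completionTimes = [];
  for (let i = 0; i < requestCount; i++) {
    // pick the slot that frees up first
    let earliest = 0;
    for (let s = 1; s < slots.length; s++) {
      if (slots[s] < slots[earliest]) earliest = s;
    }
    slots[earliest] += serviceTimeMs;
    completionTimes.push(slots[earliest]);
  }
  return completionTimes;
}

const maxSockets = 3;            // arangojs default quoted above
const maxTasks = maxSockets * 2; // 6, same formula as in the connector
const times = simulateQueue(30, maxTasks, 2000); // assumed 2 s per query
console.log('first query done after', times[0], 'ms');
console.log('last query done after', Math.max(...times), 'ms');
```

Under these assumptions a single burst of 30 queries is finished after 10 seconds, so the client-side cap alone would not seem to account for the 60-second timeouts.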
Am I missing something? Are there any settings to tune ArangoDB and improve its performance under load?
Update:
We use ArangoDB 3.0 and run it as a Docker container (from the official image) with 1.5 GB of RAM.
Sample Document (we have about 16 000 of these):
{
  "type": "start",
  "from_date": "2016-07-28T10:22:16.000Z",
  "to_date": "9999-06-19T18:40:00.000Z",
  "comment": null,
  "id": "13_start",
  "version_id": 1
}
AQL Query:
FOR result IN @@collection
  FILTER (result.version_id == 1)
  FILTER (result.to_date > '2016-08-02T15:57:45.278Z')
  SORT result._key
  LIMIT 100
  RETURN result
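For completeness, this is roughly how the query and its bind variables are put together on the Node.js side. It is a simplified sketch, not our production code: `buildQuery` and the collection name `events` are hypothetical, and I have turned the version and date literals into bind parameters for illustration. A collection bind parameter is written `@@collection` in the query text and passed under the key `@collection`.

```javascript
// Hypothetical helper: builds the AQL text and the bind variables
// for the load-test query shown above.
function buildQuery(collectionName, versionId, afterDate) {
  const query = [
    'FOR result IN @@collection',
    '  FILTER result.version_id == @versionId',
    '  FILTER result.to_date > @afterDate',
    '  SORT result._key',
    '  LIMIT 100',
    '  RETURN result'
  ].join('\n');
  const bindVars = {
    '@collection': collectionName, // collection bind parameter
    versionId: versionId,
    afterDate: afterDate           // ISO 8601 string, compared as a string
  };
  return { query, bindVars };
}

// 'events' is an assumed collection name, used only for this example.
const q = buildQuery('events', 1, '2016-08-02T15:57:45.278Z');
console.log(q.query);
console.log(q.bindVars);
// With an arangojs Database instance `db`, this would be sent as:
//   db.query(q.query, q.bindVars)
```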