Gremlin query via HTTP is extremely slow

Question

I'm running two very simple Gremlin queries, both through the Gremlin Console and via an HTTP request (issued from the same machine that the Gremlin Server runs on). The queries look like this:

First query:

console: g.V(127104, 1069144, 590016, 200864).out().count()
http: curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"g.V(127104, 1069144, 590016, 200864).out().count()"}' http://localhost:8182

Second query:

console: g.V(127104, 1069144, 590016, 200864).out().in().dedup().count()
http: curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"g.V(127104, 1069144, 590016, 200864).out().in().dedup().count()"}' http://localhost:8182

It is by no means a huge graph - the first query returns 750 and the second query returns 9154. My problem is that I see huge performance differences between the queries run via HTTP and the same queries run in the console. For the first query, both the console and the HTTP request return immediately, and looking at the Gremlin Server log, I'm pleased to see that the query takes only 1-2 milliseconds in both cases. All is good.

Now for the second query, the picture changes. While the console continues to provide the answer immediately, it now takes between 4 and 5 seconds (!!) for the HTTP request to return the answer! The server log reports roughly the same execution time (some 50-60 ms) for both executions of the second query, so what is going on? I'm only doing a count(), so the slow HTTP response cannot be a serialization issue - it only needs to return a number, just as in the first query.
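To put a number on the client-side latency and compare it with the execution time in the server log, the HTTP request can be timed from a small script. This is only a sketch using the Python standard library: the endpoint is the `http://localhost:8182` from the curl commands above, and `build_payload` and `timed_post` are names made up for this example. Building the body with `json.dumps` also avoids hand-quoting the JSON.

```python
import json
import time
import urllib.request

GREMLIN_URL = "http://localhost:8182"  # endpoint used in the curl commands above


def build_payload(gremlin: str) -> bytes:
    """Serialize the query with json.dumps so the JSON body is always well-formed."""
    return json.dumps({"gremlin": gremlin}).encode("utf-8")


def timed_post(gremlin: str, url: str = GREMLIN_URL):
    """POST one query and return (elapsed_seconds, decoded JSON response)."""
    request = urllib.request.Request(
        url,
        data=build_payload(gremlin),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return time.perf_counter() - start, body


if __name__ == "__main__":
    # Requires a running Gremlin Server; repeats the second query a few times.
    query = "g.V(127104, 1069144, 590016, 200864).out().in().dedup().count()"
    for attempt in range(3):
        elapsed, _ = timed_post(query)
        print(f"attempt {attempt + 1}: {elapsed * 1000:.0f} ms")
```

Repeating the request a few times shows whether the delay is consistent or only affects the first call.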

Does anyone have any good ideas?

UPDATE:

Running profile() gives some interesting results (screenshots attached below). It looks like everything runs way slower when called via HTTP, which makes no sense to me...

From console:

Via HTTP request:

If you execute the second query again over HTTP - does it still take 4-5 seconds? — stephen mallette, Sep 07 '18 at 12:05
Yes, it consistently takes 4-5 seconds via HTTP. I've actually also tried to create another graph and the situation is easily reproducible. Calling via HTTP adds several seconds to the response-time compared to the console. — Glennie Helles Sindholt, Sep 09 '18 at 17:19
I tend to expect http to be slower, but you're showing a big margin on a `count()` and I can't think of what would cause that. Could you try this on TinkerGraph to rule out that JanusGraph is somehow the issue? Then, since it is reproducible, if you could amend your question to include a simple script to generate a graph that recreates the problem, that would be excellent. — stephen mallette, Sep 10 '18 at 10:49
Hmmm... looks like you are on to something. I've tried to load data into TinkerGraph instead and the HTTP request is now returning the result as fast as in the console. So why would JanusGraph be slowing the HTTP response down?? — Glennie Helles Sindholt, Sep 10 '18 at 17:04
when you use the console, are you sending `:remote` requests to JanusGraph hosted in Gremlin Server (i.e. Janus Server) or querying an embedded `JanusGraph` instance? — stephen mallette, Sep 10 '18 at 17:12
please include a `profile()` of your `:remote` and HTTP requests (i think you can get back a profile object over HTTP...if not, then maybe try `profile().toString()`?) — stephen mallette, Sep 11 '18 at 11:04
I have updated the question with screen shots of running `profile()`. Does this make any sense to you @stephenmallette?? — Glennie Helles Sindholt, Sep 14 '18 at 08:23
wow - i don't even have a guess at that one. it seems as though the total traversal cost is largely all in JanusGraph code, but I'm not sure what could be happening in TinkerPop code (i.e. `HttpGremlinEndpointHandler`) to influence that speed difference, but I'll keep thinking about it. perhaps you should point the JanusGraph user list to this question to see if any ideas surface there. Also, we may be reaching a point where you will need to provide a Java Flight Recording of both executions - that would hopefully yield some insight. — stephen mallette, Sep 14 '18 at 10:54
Here's the same thread on [janusgraph-users](https://groups.google.com/d/msg/janusgraph-users/P1Kd5duVp-k/LGenezzRBAAJ) — Jason Plurad, Sep 14 '18 at 13:47

Glennie Helles Sindholt · Accepted Answer · 2018-11-01T09:12:15.413

With the help of @stephen mallette I managed to find the answer to this question. It turns out that the console - which runs in a session - caches answers to queries, so when I queried the same ids multiple times the console simply retrieved the answer from the cache and didn't actually query Dynamo at all. HTTP on the other hand runs sessionless, so each query over HTTP was hitting Dynamo. Needless to say - retrieving a result from a cache is much, much faster than having to query Dynamo.

In order to force the query to hit Dynamo in the console, I have added a g.tx().rollback() after each query execution and the query now runs in comparable time whether I use the console or query via HTTP. Unfortunately it's rather slow in my opinion, but that's probably a topic for a different question :)
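The effect can be illustrated with a toy model. This is not JanusGraph or TinkerPop code: `SlowBackend`, `Session` and `sessionless_query` are hypothetical names, and the model simply assumes, as described above, that a session keeps one transaction (and its cache) alive across queries, while every sessionless request starts from an empty cache.

```python
class SlowBackend:
    """Stand-in for the storage backend; counts how many reads reach it."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]


class Session:
    """Toy session: one long-lived transaction with its own read cache."""

    def __init__(self, backend):
        self.backend = backend
        self.tx_cache = {}

    def query(self, key):
        # Only go to the backend when the transaction has not seen this key yet.
        if key not in self.tx_cache:
            self.tx_cache[key] = self.backend.read(key)
        return self.tx_cache[key]

    def rollback(self):
        # Ending the transaction discards everything it had cached.
        self.tx_cache.clear()


def sessionless_query(backend, key):
    """Toy sessionless request: a fresh transaction, so nothing is cached."""
    return Session(backend).query(key)


backend = SlowBackend({127104: "v1"})
session = Session(backend)

session.query(127104)
session.query(127104)
reads_with_cache = backend.reads        # 1: the second call came from the cache

session.rollback()
session.query(127104)
reads_after_rollback = backend.reads    # 2: rollback forced another backend read

sessionless_query(backend, 127104)
sessionless_query(backend, 127104)
reads_after_http = backend.reads        # 4: each sessionless call reads again
```

In this model, calling `rollback()` after every query makes the session behave like the sessionless path, which matches the comparable timings seen after adding g.tx().rollback() in the console.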

UPDATE: The slow response times with Dynamo were caused by read/write rate-limiting that had been added to keep the cost of Dynamo down. When I increased the rate limits significantly, the query ran much faster. This, unfortunately, gets too expensive for me going forward, so I have now switched to running with Cassandra as the backend instead, which also gets me excellent response times :)
