
I'm using JanusGraph to add vertices to a Cassandra-backed database, and I noticed a large performance discrepancy between adding a vertex with (1) the addVertex() method provided by the JanusGraph Java libraries and (2) the addV() Gremlin traversal step. Why is there such a discrepancy?

I am using JanusGraph version 0.2.0 with cql as the storage backend. I wrote a test that compares the time in milliseconds it takes to add and commit a vertex using three methods: (1) the addV() Gremlin step, (2) the addV() Gremlin step followed by a next() step to get the newly created vertex, and (3) the JanusGraph addVertex() method. I am starting from completely empty graph storage. The code I used is below.

final Builder builder = JanusGraphFactory.build()
        .set("storage.backend", "cql")
        .set("storage.hostname", Config.get(CommonConfig.cassandra_host));

final JanusGraph graph = builder.open();

long nowMillis = TimeUtils.nowMillis();
graph.traversal().addV("myLabel");
graph.traversal().tx().commit();
System.out.println("(1) - Add vertex traversal only took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

nowMillis = TimeUtils.nowMillis();
graph.traversal().addV("myLabel").next();
graph.traversal().tx().commit();
System.out.println("(2) - Add vertex traversal and next took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

nowMillis = TimeUtils.nowMillis();
graph.addVertex("myLabel");
graph.traversal().tx().commit();
System.out.println("(3) - Add vertex method took " + (TimeUtils.nowMillis() - nowMillis) + " millis");

This is a sample output of running this:

(1) - Add vertex traversal only took 15 millis
(2) - Add vertex traversal and next took 739 millis
(3) - Add vertex method took 682 millis

This suggests to me that (3), adding with JanusGraph addVertex(), does something similar to (2), but I don't understand why the time differences are so large. What causes (2) and (3) to take more than an order of magnitude longer to run than (1)?

havenwang
1 Answer


The first bit of Gremlin you are testing doesn't actually create a vertex. You are only measuring the creation of a Traversal object, not its iteration, so nothing is written to the graph. The other two do create a Vertex object in the graph. The general recommendation is to not use Graph.addVertex(), as that is not a user-focused API; it is meant for graph providers like JanusGraph. Use only the Gremlin language to interact with your graph, which gives you the widest level of code portability.
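The same build-versus-iterate distinction can be seen with plain Java streams, which are also lazy until a terminal operation runs. The sketch below is only an analogy using the standard library, not TinkerPop code: the `LazyPipelineDemo` class and `addLabel` helper are hypothetical names, and the list stands in for the graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineDemo {

    /** Builds a pipeline whose side effect records the label in the given store. */
    static Stream<String> addLabel(List<String> store, String label) {
        // peek() only registers the side effect; nothing runs at this point
        return Stream.of(label).peek(store::add);
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>(); // stand-in for the graph

        // Comparable to addV("myLabel") with no terminal step: the store stays empty
        Stream<String> pending = addLabel(store, "myLabel");
        System.out.println("after building: " + store.size());           // prints 0

        // Comparable to calling next(): the pipeline actually executes
        pending.collect(Collectors.toList());
        System.out.println("after terminal operation: " + store.size()); // prints 1
    }
}
```

In the question's test, case (1) corresponds to the first print: the pipeline was built but never run, so the 15 ms covers no write at all.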

stephen mallette
  • I see, I realized I've missed the concept of [terminal steps](http://tinkerpop.apache.org/docs/current/reference/#terminal-steps), thank you for the reply. Something like 700 milliseconds seems pretty slow to add a single vertex though, does this sound like expected performance given my use of JanusGraph + cassandra backend, or is there a way to speed this up? – havenwang Apr 30 '19 at 18:39
  • That seems slow. I wouldn't measure the `GraphTraversalSource` construction, because ideally you would do `g = graph.traversal()` once and then re-use `g`, but I don't think that should take a ton of time in your microbenchmark. If you don't figure out how to speed things up, you might want to bring your performance question to the janusgraph-users list https://groups.google.com/forum/#!forum/janusgraph-users – stephen mallette Apr 30 '19 at 18:44
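One way to follow that advice is to create the reusable object once, outside the timed region, and time only the repeated operation. The sketch below uses just the standard library: `TimedRuns` and `timeRuns` are hypothetical names, and the list and `store.add(...)` are placeholders for a reused `g` and the real add-and-commit call.

```java
import java.util.ArrayList;
import java.util.List;

public class TimedRuns {

    /** Runs the operation the given number of times and returns each run's duration in nanoseconds. */
    static long[] timeRuns(Runnable operation, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();                      // only this call is timed
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Created once before timing starts, like g = graph.traversal()
        List<String> store = new ArrayList<>();

        // Placeholder for the real work, e.g. adding a vertex and committing
        long[] elapsed = timeRuns(() -> store.add("myLabel"), 5);

        for (int i = 0; i < elapsed.length; i++) {
            System.out.println("run " + (i + 1) + " took " + elapsed[i] + " ns");
        }
    }
}
```

Reporting each run separately, rather than a single measurement, makes it easier to see whether a slow first run is followed by faster ones.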