We are using Titan with Persistit as the backend for a graph with about 100,000 vertices. Our use case is quite complex, but the current problem can be illustrated with a simple example. Let's assume that we are storing Books and Authors in the graph. Each Book vertex has an ISBN, which is unique across the whole graph.
I need to answer the following query: Give me the set of ISBN numbers of all Books in the Graph.
Currently, we are doing it like this:
// retrieve graph instance
TitanGraph graph = getGraph();
// Start a Gremlin query (I omit the generics for brevity here)
GremlinPipeline gremlin = new GremlinPipeline().start(graph);
// get all vertices in the graph which represent books (we have author vertices, too!)
gremlin.V("type", "BOOK");
// the ISBN numbers are unique, so we use a Set here
Set<String> isbnNumbers = new HashSet<String>();
// iterate over the gremlin result and retrieve the vertex property
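while (gremlin.hasNext()) {
    // the pipeline is a raw type here, so next() returns Object and needs a cast
    Vertex v = (Vertex) gremlin.next();
    // assign to a typed local so the generic getProperty() resolves to String
    String isbn = v.getProperty("ISBN");
    isbnNumbers.add(isbn);
}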
return isbnNumbers;
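To pin down the expected result independently of Titan, here is a minimal, self-contained sketch of what the query should compute. It uses plain Java collections as a stand-in for the graph; the class `IsbnQuerySketch`, the helper `vertex(...)`, and the sample values are hypothetical and only illustrate the semantics, not how Titan evaluates the traversal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IsbnQuerySketch {

    // Hypothetical helper: builds a property map that stands in for one vertex.
    public static Map<String, String> vertex(String type, String key, String value) {
        Map<String, String> props = new HashMap<String, String>();
        props.put("type", type);
        props.put(key, value);
        return props;
    }

    // Collects the ISBN of every vertex whose "type" property equals "BOOK".
    public static Set<String> isbnsOfBooks(List<Map<String, String>> vertices) {
        Set<String> isbnNumbers = new HashSet<String>();
        for (Map<String, String> v : vertices) {
            if ("BOOK".equals(v.get("type"))) {
                String isbn = v.get("ISBN");
                if (isbn != null) {
                    isbnNumbers.add(isbn);
                }
            }
        }
        return isbnNumbers;
    }

    public static void main(String[] args) {
        // Made-up sample data: two books and one author.
        List<Map<String, String>> vertices = new ArrayList<Map<String, String>>();
        vertices.add(vertex("BOOK", "ISBN", "isbn-1"));
        vertices.add(vertex("BOOK", "ISBN", "isbn-2"));
        vertices.add(vertex("AUTHOR", "name", "Some Author"));
        System.out.println(isbnsOfBooks(vertices));
    }
}
```

Author vertices are skipped and only the ISBN values of Book vertices end up in the set, which is the result the Gremlin code above is meant to produce.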
My question is: is there a smarter way to do this that runs faster? I am new to Gremlin, so it may well be that I am doing something horribly stupid here. The query currently takes 2.5 seconds, which is not too bad, but I would like to speed it up if possible. Please consider the backend as fixed.