
We are using Titan with Persistit as the backend, for a graph with about 100,000 vertices. Our use case is quite complex, but the current problem can be illustrated with a simple example. Let's assume that we are storing Books and Authors in the graph. Each Book vertex has an ISBN number, which is unique across the whole graph.

I need to answer the following query: Give me the set of ISBN numbers of all Books in the Graph.

Currently, we are doing it like this:

// retrieve graph instance
TitanGraph graph = getGraph(); 
// Start a Gremlin query (I omit the generics for brevity here)
GremlinPipeline gremlin = new GremlinPipeline().start(graph);
// get all vertices in the graph which represent books (we have author vertices, too!)
gremlin.V("type", "BOOK");
// the ISBN numbers are unique, so we use a Set here
Set<String> isbnNumbers = new HashSet<String>();
// iterate over the gremlin result and retrieve the vertex property
while (gremlin.hasNext()) {
    // cast needed because the pipeline is declared without generics
    Vertex v = (Vertex) gremlin.next();
    String isbn = v.getProperty("ISBN");
    isbnNumbers.add(isbn);
}
return isbnNumbers;

My question is: is there a smarter, faster way to do this? I am new to Gremlin, so it may well be that I am doing something horribly stupid here. The query currently takes 2.5 seconds, which is not too bad, but I would like to speed it up if possible. Please consider the backend as fixed.

stephen mallette
Martin Häusler

1 Answer


I doubt that there is a much faster way (you will always need to iterate over all book vertices). However, a less verbose solution to your task is possible with Groovy/Gremlin. On the sample graph you can run, for example, the following query:

gremlin> namesOfJavaProjs = []; g.V('lang','java').name.store(namesOfJavaProjs)
gremlin> namesOfJavaProjs
==>lop
==>ripple

Or for your book graph:

isbnNumbers = []; g.V('type','BOOK').ISBN.store(isbnNumbers)
Faber
  • Thanks for your answer. At least I know that I'm doing it the way it is supposed to be done. I will have to do some application-level caching to make it faster I guess, it's a read-mostly graph, so it should not be a big deal. – Martin Häusler Feb 19 '15 at 17:04
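
The application-level caching mentioned in the comment above could look roughly like the following. This is only a minimal sketch: `IsbnCache` and its `loader` are hypothetical names that are not part of Titan or Gremlin, and it assumes that every code path that adds or removes a Book calls `invalidate()` afterwards. The loader would wrap the existing query from the question.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical read-mostly cache for the ISBN set (not a Titan or Gremlin class).
 * The expensive graph query runs only on the first read after an invalidation.
 */
final class IsbnCache {

    // Assumption: the loader performs the full graph scan shown in the question.
    private final Supplier<Set<String>> loader;

    // null means "not loaded yet" or "invalidated by a write".
    private volatile Set<String> cached;

    IsbnCache(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached ISBN set, loading it from the graph if necessary. */
    Set<String> get() {
        Set<String> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    // Copy and wrap so that callers cannot modify the shared set.
                    result = Collections.unmodifiableSet(new HashSet<>(loader.get()));
                    cached = result;
                }
            }
        }
        return result;
    }

    /** Must be called after any write that adds, removes, or changes a Book. */
    synchronized void invalidate() {
        cached = null;
    }
}
```

It would be constructed once, for example as `new IsbnCache(() -> loadIsbnNumbersFromGraph())`, where `loadIsbnNumbersFromGraph()` stands in for the method from the question. Repeated reads then return the in-memory set, and the 2.5 second scan is paid only once after each write.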