I am new to Gremlin. I have a graph with airports as nodes and flights as edges, with arrival time and departure time as edge properties. I am trying to use TinkerPop 3.1 to get the list of connections with 1, 2, or 3 layovers, given an origin airport, a destination airport, and a departure time. I got it to work for 1 layover using the query below, but I am having a tough time generalizing it to find n connections using repeat and match. Any help is appreciated.

Sample graph script

graph = TinkerGraph.open()
g = graph.traversal()
g.addV('airport').property('name','PDX').as('PDX').
  addV('airport').property('name','JFK').as('JFK').
  addV('airport').property('name','PHX').as('PHX').
  addV('airport').property('name','ORD').as('ORD').
  addV('airport').property('name','IAD').as('IAD').
  addE('flight').property('depTime',9).property('arrTime',10).from('PDX').to('JFK').
  addE('flight').property('depTime',10).property('arrTime',11).from('PDX').to('PHX').
  addE('flight').property('depTime',12).property('arrTime',13).from('PDX').to('ORD').
  addE('flight').property('depTime',13).property('arrTime',14).from('PDX').to('IAD').
  addE('flight').property('depTime',11).property('arrTime',14).from('JFK').to('IAD').
  addE('flight').property('depTime',10).property('arrTime',12).from('PHX').to('IAD').
  addE('flight').property('depTime',14).property('arrTime',15).from('ORD').to('IAD').iterate()


Gremlin query


g.V().has("airport","name","PDX").outE().has("depTime",gt(6)).
match(
__.as('e1').values('arrTime').as('e1Arr'),
__.as('e1').outV().as('v1'),
__.as('e1').inV().as('v2'),
__.as('v2').outE().as('e2'),
__.as('e2').values('depTime').as('e2Dep')).
where('e2Dep',gt('e1Arr')).select('e2').inV().as("v3").has("airport","name","IAD").
select("v1","e1","v2","e2","v3").by("name").by(valueMap()).by("name").by(valueMap()).by("name");
  • It would help if you added a small sample graph as a script that could be run in the Gremlin Console - for example see https://stackoverflow.com/a/44886243/1831717 – stephen mallette Jul 04 '17 at 10:49

1 Answer

I don't have a graph to test it, but this query should do the trick:

g.V().has("airport","name","PDX").outE().has("depTime", gt(9)).as("e").inV().
 emit().
   repeat(outE().filter(project("a","b").by("depTime").by(select(last, "e").by("arrTime")).
            where("a", gt("b"))).as("e").inV()).times(2).
 has("name", "IAD").path().by("name").by(valueMap())
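
To make the idea behind the traversal concrete, here is a minimal sketch in plain Python (not Gremlin) of the same kind of search over the sample flights from the question. The names `FLIGHTS` and `find_itineraries` are hypothetical and exist only for this illustration. The sketch assumes a connection is valid only when the next flight departs strictly after the previous one arrives, and, unlike the traversal, it stops extending an itinerary once it reaches the destination.

```python
from collections import defaultdict

# Sample flights from the question: (origin, destination, depTime, arrTime).
FLIGHTS = [
    ("PDX", "JFK", 9, 10),
    ("PDX", "PHX", 10, 11),
    ("PDX", "ORD", 12, 13),
    ("PDX", "IAD", 13, 14),
    ("JFK", "IAD", 11, 14),
    ("PHX", "IAD", 10, 12),
    ("ORD", "IAD", 14, 15),
]


def find_itineraries(flights, origin, destination, earliest_dep, max_layovers):
    """Return itineraries (lists of flights) from origin to destination.

    The first flight must depart after earliest_dep, every later flight must
    depart after the previous flight arrives, and an itinerary may contain
    at most max_layovers intermediate airports.
    """
    # Adjacency list: airport -> flights leaving that airport.
    outgoing = defaultdict(list)
    for flight in flights:
        outgoing[flight[0]].append(flight)

    results = []

    def extend(itinerary, airport, ready_time):
        for flight in outgoing[airport]:
            _, dest, dep, arr = flight
            if dep <= ready_time:
                continue  # departs too early to be a valid connection
            candidate = itinerary + [flight]
            if dest == destination:
                results.append(candidate)
                continue  # assumption: do not fly on past the destination
            # A candidate with k flights has k - 1 layovers so far; adding
            # one more flight makes k layovers, so k must not exceed the limit.
            if len(candidate) <= max_layovers:
                extend(candidate, dest, arr)

    extend([], origin, earliest_dep)
    return results
```

Calling `find_itineraries(FLIGHTS, "PDX", "IAD", 9, 2)` plays the role of `has("depTime", gt(9))` combined with `times(2)`: the `max_layovers` argument is the part that generalizes to n layovers.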

If the dataset is relatively static, I would suggest creating shortcuts, e.g. pre-computed route vertices that have a connection to all stops. For example:

g.V().has("airport","name","PDX").as("a0").
  V().has("airport","name","JFK").as("a1").
  V().has("airport","name","IAD").as("a2").
  addV("route").property("from", "PDX").
                property("to", "IAD").
                property("stops", 2).
                property("depTime", 9).
                property("arrTime", 14).as("r").
  addE("stop").from("r").to("a0").property("n", 0).
  addE("stop").from("r").to("a1").property("n", 1).
  addE("stop").from("r").to("a2").property("n", 2).iterate()
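
As a rough illustration of why such shortcuts make lookups cheap, the sketch below (plain Python, not Gremlin) indexes pre-computed routes by origin and destination. `Route`, `build_route_index`, and `lookup` are hypothetical names used only here, and the sketch assumes the number of layovers is the number of stops minus the two end points.

```python
from collections import defaultdict
from typing import NamedTuple


class Route(NamedTuple):
    stops: tuple   # airports in travel order, e.g. ("PDX", "JFK", "IAD")
    dep_time: int  # departure time of the first flight
    arr_time: int  # arrival time of the last flight


def build_route_index(routes):
    """Group pre-computed routes by their (origin, destination) pair."""
    index = defaultdict(list)
    for route in routes:
        index[(route.stops[0], route.stops[-1])].append(route)
    return index


def lookup(index, origin, destination, earliest_dep, max_layovers):
    """Return stored routes departing after earliest_dep with few enough layovers."""
    return [
        route
        for route in index.get((origin, destination), [])
        if route.dep_time > earliest_dep and len(route.stops) - 2 <= max_layovers
    ]


# Example: the PDX -> JFK -> IAD route above, plus the direct PDX -> IAD flight.
routes = [
    Route(("PDX", "JFK", "IAD"), 9, 14),
    Route(("PDX", "IAD"), 13, 14),
]
index = build_route_index(routes)
one_layover_or_less = lookup(index, "PDX", "IAD", 6, 1)
```

The expensive path enumeration happens once, when the routes are built; each later query is a single keyed lookup followed by a cheap filter.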

These route vertices will allow you to find routes of any length very quickly. The result of the first query can be used to create the route vertices; however, since it's timing out, you'll either have to increase the timeouts and memory or run it as an OLAP query. The latter is preferred, but requires a few tweaks to the query:

gremlin> g.V().has("airport","name","PDX").outE("flight").has("depTime", gt(9)).as("e").
           values("depTime").as("d").select("e").inV().
           emit().
             repeat(outE("flight").filter(project("a","b").by("depTime").by(select(last, "d")).
                      where("a", gt("b"))).as("e").
                      values("depTime").as("d").select(last, "e").inV()).times(2).
           has("name", "IAD").path()
==>[v[0],e[13][0-flight->8],13,e[13][0-flight->8],v[8]]
==>[v[0],e[12][0-flight->6],12,e[12][0-flight->6],v[6],e[16][6-flight->8],14,e[16][6-flight->8],v[8]]
Daniel Kuppitz
  • @stephen Thank you. I added a sample graph script. This query works when I use limit(n). I have a graph with a million edges connecting different airports. When I do a count() on this query, it times out. It looks like the path step is expensive (http://tinkerpop.apache.org/docs/3.1.1-incubating/reference/#path-step). Is there a better way to keep track of paths? I even tried adding dedup, simplePath, and a condition to avoid revisiting the same airports, but the query still times out (I waited 5 min). – user3233001 Jul 04 '17 at 20:56
  • I've updated my answer to show you how to improve the query performance. Which graph db are you using? – Daniel Kuppitz Jul 05 '17 at 10:38
  • Thanks Daniel. I am using DSE Graph 5.1. I agree with pre-computing. I tried running it in OLAP mode. It gives me an error: "It is not possible to access more than a path element's id on GraphComputer". What would be the tweak? – user3233001 Jul 05 '17 at 14:52
  • @stephen, Daniel, any advice on this is appreciated. Is there any way we can pass a multi-thread hint in the Gremlin query? – user3233001 Jul 06 '17 at 17:56
  • No, this would be the task of the underlying graph db. – Daniel Kuppitz Jul 06 '17 at 20:08