Preface

  • I'm new to Gremlin and working through Kelvin Lawrence's awesome eBook on the topic in order to solve a specific use-case.
  • Due to the sheer amount to learn, I'm asking this question to get recommendations on how I might approach the challenge so that, as I read the eBook, I'll better know the sections to which to pay extra attention.
  • I intend to use AWS Neptune in the pursuit of solving this, so I tagged that topic as well.

Question

Respecting departure/arrival times of legs + other constraints, can the shortest path (the real-world, logistical meaning of "path") between origin and destination be "queried" (i.e., can I use the Gremlin console with a single statement)? Or is the use-case of such complexity that I will effectively need to write a program to accomplish it?


Use-Case / Detail

I hope to answer the question:

Starting at ORIGIN on DAY, can I get to DESTINATION while respecting [CONDITIONS]?

The good news is that I only need a true/false response (so limit(1)?) and a lack of a result (e.g., []) suffices for "no".

What are the conditions?

  • Flight schedules need to be respected. Instead of simple flight routes (i.e., a connection exists between BOSton and DALlas), I have actual flight schedules (i.e., on Wednesday, 9 Nov 2022 at 08:40, flight XYZ will depart BOSton and then arrive DALlas at 13:15) ... consequently, if/when there are connections, I need to respect arrival and departure times + some sort of buffer (i.e., a path for which a Traveler would arrive at 13:05 and depart on another leg at 13:06 isn't actually a valid path);
  • Aggregate travel time / cost limits. The answer to the question needs to be "No" if a path's aggregate travel time or aggregate cost exceeds specified limits. (Here, I believe I'll need to use sack() to track the cost - financial and time - of each leg and bail out of the repeat() until loop when either is hit?)

I apologize b/c I know this isn't a good StackOverflow question, since it's not technically specific -- my hope is that, at least, some specific technical recommendations might result.

The use-case seems like the varsity / pro version of the flight routes example presented in the eBook, which is perfect for someone brand-new to Gremlin ...

Dan

1 Answer

There are a number of ways you might model this. One way I have seen used effectively is to essentially have two graphs. The first knows only about routes; you use it to find ways to get from A to Z in x hops. The second tracks actual flights; using the results of the first search, you look in it for flights that satisfy the time constraints you need to impose. So there is really a data modeling question and then a query writing part. Obviously, the data model should enable the queries to be as efficient as possible.
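To make the two-step idea concrete, here is a minimal sketch of the logic in plain Python rather than Gremlin, using in-memory data instead of a graph database. Everything in it is an assumption made for illustration: the `Flight` record and its field names, the `candidate_routes` and `is_reachable` helpers, the 45-minute buffer, the 24-hour and 1000-unit limits, and all flights other than the BOS to DAL example from the question. It also assumes all times are in a single time zone.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical flight record; the field names are assumptions for illustration.
Flight = namedtuple("Flight", "number origin dest departs arrives cost")

def candidate_routes(routes, origin, dest, max_hops):
    """Phase 1: enumerate airport sequences using only the route graph."""
    stack = [[origin]]
    while stack:
        path = stack.pop()
        if path[-1] == dest:
            yield path
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nxt in routes.get(path[-1], ()):
            if nxt not in path:  # never revisit an airport
                stack.append(path + [nxt])

def is_reachable(routes, flights, origin, dest, day_start, max_hops=3,
                 buffer=timedelta(minutes=45),
                 max_duration=timedelta(hours=24), max_cost=1000):
    """Phase 2: test each candidate route against the actual schedule."""
    by_leg = {}
    for f in flights:
        by_leg.setdefault((f.origin, f.dest), []).append(f)
    day_end = day_start + timedelta(days=1)

    def fits(route, hop, ready_at, first_departure, spent):
        if hop == len(route) - 1:
            return True  # every leg of the route has been scheduled
        for f in by_leg.get((route[hop], route[hop + 1]), ()):
            if f.departs < ready_at:
                continue  # leaves before the traveler is ready to board
            if hop == 0 and f.departs >= day_end:
                continue  # the first leg must depart on the requested day
            start = f.departs if first_departure is None else first_departure
            if f.arrives - start > max_duration or spent + f.cost > max_cost:
                continue  # over the aggregate time or cost limit
            if fits(route, hop + 1, f.arrives + buffer, start, spent + f.cost):
                return True
        return False

    return any(fits(route, 0, day_start, None, 0)
               for route in candidate_routes(routes, origin, dest, max_hops))

# Made-up sample data; only the BOS -> DAL flight comes from the question.
routes = {"BOS": ["DAL", "JFK"], "DAL": ["LAX"], "JFK": ["LAX"]}
flights = [
    Flight("XYZ", "BOS", "DAL",
           datetime(2022, 11, 9, 8, 40), datetime(2022, 11, 9, 13, 15), 300),
    Flight("F2", "DAL", "LAX",
           datetime(2022, 11, 9, 13, 30), datetime(2022, 11, 9, 15, 0), 200),
    Flight("F3", "DAL", "LAX",
           datetime(2022, 11, 9, 14, 30), datetime(2022, 11, 9, 16, 10), 250),
]
day = datetime(2022, 11, 9)
print(is_reachable(routes, flights, "BOS", "LAX", day))
```

With this sample data the connection from XYZ to F2 is rejected because it leaves only 15 minutes on the ground, so the answer depends on F3 and on the cost limit. In a real graph the first phase would be a traversal over the route graph and the second a lookup against the flight graph; the Python above only mirrors that split.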

There are a couple of useful blog posts related to your question. They mention Neo4j but are really quite generic and mainly focus on the data modeling aspects of your question.

I would focus on the data model, and once you have that, focus on the Gremlin queries. Amazon Neptune also now supports openCypher as an alternative property graph query language.

If you already have a data model worked out and can share a sample, I'm happy to update the answer with an example query or two.

Kelvin Lawrence