Graph Database Design to Avoid Loops

Question

I am designing a system that computes the best shipping route for commercial containers.
The path a container typically takes is:
pickup -> port of load -> port of destination -> delivery

I have composed a list of known locations from which a pickup/delivery can take place (such as cities) and a list of ports as well as the connections between those.

A sample of the data can be seen here.

When looking for a route between Austin -> Frankfurt the graph should return only this path:

Austin -> Florida -> Port of Florida -> Port of Hamburg -> Frankfurt

Austin -> NYC -> Port of NYC -> Port of London -> Port of Hamburg -> Frankfurt is ruled out because it has two international steps.

The graph also returns round trips (which it should not), for example:
Austin -> Florida -> Port of Florida -> Port of Hamburg -> Berlin -> Port of Hamburg -> Frankfurt
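
To make the two acceptance rules above concrete (at most one international step, and no location visited twice), here is a minimal Python sketch of a check on a single candidate path. The `INTERNATIONAL_STOPS` table and the `is_acceptable` helper are hypothetical names made up for illustration, and the sketch assumes the international step count is stored per connection, as the `international_stops` property in my query suggests.

```python
from typing import Dict, List, Tuple

# Hypothetical edge data (an assumption for illustration):
# (source, target) -> number of international steps on that connection.
INTERNATIONAL_STOPS: Dict[Tuple[str, str], int] = {
    ("Port of Florida", "Port of Hamburg"): 1,
    ("Port of NYC", "Port of London"): 1,
    ("Port of London", "Port of Hamburg"): 1,
}


def is_acceptable(path: List[str], max_international: int = 1) -> bool:
    """Return True if the path has no repeated location and stays within
    the allowed number of international steps."""
    # Rule 1: a repeated location means the path contains a round trip.
    if len(set(path)) != len(path):
        return False
    # Rule 2: sum the international steps over consecutive location pairs;
    # connections missing from the table count as domestic (0).
    stops = sum(INTERNATIONAL_STOPS.get(edge, 0) for edge in zip(path, path[1:]))
    return stops <= max_international


wanted = ["Austin", "Florida", "Port of Florida", "Port of Hamburg", "Frankfurt"]
via_london = ["Austin", "NYC", "Port of NYC", "Port of London",
              "Port of Hamburg", "Frankfurt"]
round_trip = ["Austin", "Florida", "Port of Florida", "Port of Hamburg",
              "Berlin", "Port of Hamburg", "Frankfurt"]

print(is_acceptable(wanted), is_acceptable(via_london), is_acceptable(round_trip))
```

Applying such a check after collecting every path is exactly what I cannot afford, so the rules need to be enforced while the traversal is running.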

Thus far I have composed the following Gremlin query:

paths = (
    g.V(*from_vertices)
    .repeat(
        outE()
        .has("ff_id", within(ff_id, "ANY"))
        .has("quote_methods", containing(quote_method.value))
        .has("valid_to", gte(current_date))
        .has("valid_from", lte(current_date))
        .in_v()
    )
    .until(hasId(within(*to_vertices)))
    .path()
    .as_("p")
    .map(unfold().coalesce(values("international_stops"), constant(0)).sum_())
    .as_("international_stops")
    .filter_(select("international_stops").is_(lte(1)))
    .select("p")
    .map(unfold().coalesce(values("pricing_document_ids"), constant("")).fold())
    .to_list()
)

I face two issues:

1. Loops in the graph: the graph contains many loops. In addition to immediate ones, it also contains round trips that span an arbitrary number of edges.
2. Due to memory and performance limitations, I am unable to fetch all paths and then filter out the ones containing loops.

What is your 1 specific researched non-duplicate question? [ask] [Help] [mre] — philipxy, Nov 07 '22 at 11:28
Please note that the [tag:graphdb] tag (in contrast to the [tag:graph-databases] tag) is for a specific product, which, I suppose, you are not using, right? — Stefan - brox IT-Solutions, Nov 07 '22 at 11:43
Please also see the answer to your other question: https://stackoverflow.com/questions/74337850/aws-neptune-memorylimitexceededexception-on-a-small-dataset/74348078#74348078 — Kelvin Lawrence, Nov 07 '22 at 14:21

score 0 · Answer 1 · answered Nov 07 '22 at 12:41

I believe what you're missing here is the simplePath() step [1][2]. It filters out any traverser whose path revisits an element, so placing it inside repeat() stops cyclic paths from being extended.

For example:

g.V(<id>).repeat(out().simplePath()).until(hasId(<target-id>)).path()

[1] https://tinkerpop.apache.org/docs/current/reference/#simplepath-step

[2] https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#sp
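
To illustrate why filtering inside the loop helps with the memory problem, here is a minimal plain-Python sketch of the same idea: a depth-first search that refuses to revisit a location, so cyclic paths are never materialized. It is not how Gremlin implements simplePath(), only an analogy. The `GRAPH` data and the `simple_routes` function are hypothetical, built from the routes named in the question, and the extra pruning on the international step budget is an addition of this sketch rather than something simplePath() does.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical adjacency list (an assumption for illustration):
# source -> [(target, international steps on that connection)].
GRAPH: Dict[str, List[Tuple[str, int]]] = {
    "Austin": [("Florida", 0), ("NYC", 0)],
    "Florida": [("Port of Florida", 0)],
    "NYC": [("Port of NYC", 0)],
    "Port of Florida": [("Port of Hamburg", 1)],
    "Port of NYC": [("Port of London", 1)],
    "Port of London": [("Port of Hamburg", 1)],
    "Port of Hamburg": [("Berlin", 0), ("Frankfurt", 0)],
    "Berlin": [("Port of Hamburg", 0)],
}


def simple_routes(graph: Dict[str, List[Tuple[str, int]]], start: str,
                  target: str, max_international: int = 1) -> Iterator[List[str]]:
    """Yield loop-free routes from start to target, pruning while walking."""

    def walk(path: List[str], visited: Set[str], stops: int) -> Iterator[List[str]]:
        node = path[-1]
        if node == target:
            yield list(path)  # copy, because path is mutated during backtracking
            return
        for nxt, cost in graph.get(node, []):
            if nxt in visited:
                continue  # same effect as simplePath(): never revisit a location
            if stops + cost > max_international:
                continue  # drop the branch as soon as the budget is exceeded
            path.append(nxt)
            visited.add(nxt)
            yield from walk(path, visited, stops + cost)
            path.pop()
            visited.remove(nxt)

    yield from walk([start], {start}, 0)


for route in simple_routes(GRAPH, "Austin", "Frankfurt"):
    print(" -> ".join(route))
```

Because a branch is dropped at the moment it would revisit a location, memory use is bounded by the length of the current path rather than by the number of cyclic paths in the graph.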

Bingqing Lyu · Answer 2 · 2023-07-21T11:49:00.207

I would recommend GraphScope for your graph query needs because of its performance and scalability.

In your case, GraphScope provides a PathExpand() operator, and you can query simple paths (paths with no loops) with:

g.V().out("1..5").with('PATH_OPT', 'SIMPLE')

More details and examples can be found in the official doc.

GraphScope is designed to be efficient and scalable. It outperforms other systems such as MongoDB and Neo4j (Community Edition) by orders of magnitude.

Besides, GAIA-IR, the interactive graph query engine in GraphScope, provides a unified intermediate representation layer, making it easy to incorporate various graph query languages. Currently, GAIA-IR supports the most popular graph query languages, including Gremlin and Cypher.

You could refer to this article to learn the design of GAIA-IR and how it can be deployed and used.

Disclaimer: I'm an author of GraphScope.
