I have a query that needs to return the complete path for each value of an edge property. The graph contains locations and people. People travel from one location to another, so I want a complete map of where each person started and where they ended. Suppose:

P1 travels from a -> b -> c -> d -> e -> f
P2 travels from c -> d -> e -> f
P3 travels from a -> b -> c -> d
P4 travels from b -> c -> a -> d -> e -> f
P5 travels from e -> f -> a -> b -> c
P6 travels from d -> e -> a -> b -> c
P7 travels from a -> c -> e -> f

Those are the paths that I want from the graph, where p1, p2, ... pn are values of the edge property called name. I already came up with a query, but I don't know how to optimize it. It also can't handle people who start from a vertex and end at the same vertex. I have a time on every session (but maybe Gremlin can't traverse in time order within the same session?).

g.withSack([])
    .V() // will eventually have some starting condition of about 10 unique people
    .repeat(
        choose(
            loops().is(0),
            outE().as('outgoing')
                .where(
                    __.outV()
                    .inE().values('name')
                    .where(eq('outgoing'))
                        .by()
                        .by(values('name'))
                    .count().is(0)
                )
                .sack(assign)
                .by(
                    union(
                        sack().unfold(),
                        identity().values('name')
                    )
                    .fold()
                )
                .filter(
                    sack().unfold().dedup().count().is(1)
                )
                .inV(),
           
            outE()
                .sack(assign)
                .by(
                    union(
                        sack().unfold(),
                        identity().values('name')
                    )
                    .fold()
                )
                .filter(
                    sack().unfold().dedup().count().is(1)
                )
                .inV()
          )
    )
    .until(
        outE().filter(sack().unfold().dedup().count().is(1)).count().is(1)
    )
    .filter(path().unfold().count().is(gt(5)))
    .path()

Right now it has several limitations. It gets every path starting from a provided vertex, but the query frequently runs into:

{
  "detailedMessage": "A timeout occurred within the script during evaluation.",
  "requestId": "f34358bd-9db9-488f-be66-613a34d29f9b",
  "code": "TimeLimitExceededException"
}

Or a memory exception. Is there some way to optimize this query? I can't exactly replicate this in Gremlify since I have about 50,000 unique sessions, and each of those travels through anywhere from 2 to 50 vertices.

I will eventually perform traffic analysis on it, but I still can't get this to run within the default Neptune timeout, even with a limit of about 1000. I would like it to finish within 10 seconds if possible, or 30 seconds at the upper limit.

Here's a relatively simple way to replicate this graph:

  g.addV('place').as('1').
  property(single, 'placename', 'a').
  addV('place').as('2').
  property(single, 'placename', 'b').
  addV('place').as('3').
  property(single, 'placename', 'c').
  addV('place').as('4').
  property(single, 'placename', 'd').
  addV('place').as('5').
  property(single, 'placename', 'e').
  addV('place').as('6').
  property(single, 'placename', 'f').
  addV('place').as('7').
  property(single, 'placename', 'g').
  addV('place').as('8').
  property(single, 'placename', 'h').
  addV('place').as('9').
  property(single, 'placename', 'i').
  addE('person').from('1').to('2').
  property('name', 'p1').addE('person').
  from('2').to('3').property('name', 'p1').
  addE('person').from('3').to('4').
  property('name', 'p1').addE('person').
  from('4').to('5').property('name', 'p1').
  addE('person').from('2').to('3').
  property('name', 'p2').addE('person').
  from('3').to('4').property('name', 'p2').
  addE('person').from('4').to('5').
  property('name', 'p2').addE('person').
  from('6').to('7').property('name', 'p3').
  property('time', '2022-05-04 12:00:00').
  addE('person').from('7').to('8').
  property('name', 'p3').
  property('time', '2022-05-05 12:00:00').
  addE('person').from('8').to('9').
  property('name', 'p3').
  property('time', '2022-05-10 12:00:00').
  addE('person').from('9').to('6').
  property('name', 'p3').
  property('time', '2022-05-03 12:00:00').
  addE('person').from('5').to('6').
  property('name', 'p4').addE('person').
  from('6').to('7').property('name', 'p4').
  addE('person').from('7').to('8').
  property('name', 'p4').addE('person').
  from('8').to('9').property('name', 'p4').
  addE('person').from('3').to('4').
  property('name', 'p5').addE('person').
  from('4').to('4').property('name', 'p5').
  addE('person').from('4').to('5').
  property('name', 'p5').addE('person').
  from('5').to('6').property('name', 'p5').
  addE('person').from('6').to('7').
  property('name', 'p5').addE('person').
  from('1').to('2').property('name', 'p6').
  addE('person').from('2').to('3').
  property('name', 'p6').addE('person').
  from('3').to('4').property('name', 'p6').
  addE('person').from('4').to('5').
  property('name', 'p6')
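
Since the question mentions that every session carries a time, one way to recover a round trip without any `repeat()` is to read a single person's edges in time order. The following is only a sketch against the sample graph above, where `p3` is the one person whose edges have a `time` property. It assumes every edge of the chosen session has a `time` value and that the timestamp strings sort chronologically, and it has not been measured against the real data.

```groovy
// Sketch only: assumes all 'person' edges named 'p3' carry a 'time' property
// in 'YYYY-MM-DD hh:mm:ss' form, so plain string ordering is chronological.
g.E().hasLabel('person').has('name', 'p3').
  order().by('time').                      // earliest hop first
  project('from', 'to', 'time').
    by(outV().values('placename')).        // place the hop leaves
    by(inV().values('placename')).         // place the hop arrives at
    by('time')
```

Each result is one hop of the journey, so a person who ends where they started is returned as an ordinary ordered list of hops instead of being lost to the start-vertex check.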

Also, I am starting with about 5000 vertices in the graph, as I have set the condition that 5 people should start from a place for it to be considered a valid starting point.

Rajesh Paudel
  • Are you sure your logic prevents cyclic paths? Typical ways to remove these are the cyclicPath() and simplePath() filters or the aggregate('x').....where(without('x')) pattern. – HadoopMarc Oct 23 '22 at 09:19
  • @HadoopMarc well, the logic right now does not find any cyclic path for some reason. If a session starts from a and ends at a, then the query returns nothing, which is still a problem. I am thinking about adding simplePath, but the issue right now is the time limit. It's taking so much time that I can't even profile the query. – Rajesh Paudel Oct 24 '22 at 12:25
  • No idea which graph system you use. On JanusGraph you can add debugging print statements using closures like map{it->System.out.println(it.get())}. Also the sack() step takes closures. – HadoopMarc Oct 24 '22 at 14:01
  • @HadoopMarc I am using Neptune database – Rajesh Paudel Oct 24 '22 at 15:13
  • It looks like you are likely touching a lot of data as your query starts with `V` and has no filtering. Also rather than special case the situation where `loops` is zero, why not just process those before the repeat starts? I think this query can be simplified a lot but without any sample data it is quite hard to give you a tested answer. Are you able to edit the question to include the `addE` and `addV` steps that create a sample graph? There is an example of creating a small test graph in the answer to https://stackoverflow.com/questions/73546575/gremlin-sack-sum-once-per-distinct-value – Kelvin Lawrence Oct 24 '22 at 16:45
  • @KelvinLawrence Sounds reasonable, I will update the question with the graph. The reason we are doing that inside of repeat is that we initially get the vertices that meet some special conditions. For example, there should be about 5 unique people that start travelling from a place to another, and so on. That's why we start with V, but it will have some filtering. – Rajesh Paudel Oct 28 '22 at 12:53
  • OK I will take a look. This seems similar to the discussion we had about https://stackoverflow.com/questions/73433016/gremlin-traverse-path-along-the-same-property – Kelvin Lawrence Oct 29 '22 at 13:16
  • In line with the original question from @HadoopMarc, the sample graph has infinite loops (cycles) in it. For example look at this path where person `p5` keeps going from `place d` to `place d`: `path[a, p1, b, p1, c, p1, d, p5, d, p5, d, p5, d]` This will cause the query to time out unless the query is written in a way to handle such cases. – Kelvin Lawrence Oct 29 '22 at 13:37
  • There are also many cases where the person ends up back where they started, which is another form of cycle. The query needs to handle these in some way or it will essentially be in an infinite loop. `path[g, p3, h, p3, i, p3, f, p3, g]` – Kelvin Lawrence Oct 29 '22 at 14:04
  • @KelvinLawrence yeah, it's kind of the same. I don't know how to eradicate such flows. I tried adding simplePath, but as the query already times out, it does not provide any result. The timeout is still prevalent, so I am wondering if there's anything to prevent this. – Rajesh Paudel Oct 30 '22 at 09:48
  • The timeout is due to the cycles (infinite loops). If you add a `simplePath`, any path that includes cycles will not be included in the results. You could look at using something like `until(cyclicPath().or(not(out())))` type of logic. It really depends on what you want the `repeat` termination condition(s) to be. If you know the maximum depth that you ever want to go to, you could look at adding something like `.or().loops().is(6)` to the `until`. – Kelvin Lawrence Oct 30 '22 at 21:14
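
The termination conditions suggested in the comments above could be combined roughly as follows. This is a hedged sketch, not a tested answer: the start place `'a'` and the depth cap of 50 (taken from the "2 to 50 vertex" figure in the question) are assumptions, and it follows every person leaving the start place, including those who merely pass through it. The special case for the first hop is handled before the loop instead of with `choose(loops().is(0), ...)`.

```groovy
// Sketch only: start place 'a' and the cap of 50 hops are assumptions.
g.V().has('place', 'placename', 'a').
  outE('person').as('first').              // remember the first edge; later hops must share its name
  inV().
  until(
    or(
      cyclicPath(),                        // a place was revisited: stop instead of looping forever
      not(outE('person').
            where(eq('first')).by('name')),// this person has no further edge from here
      loops().is(50))).                    // hard depth cap as a safety net
  repeat(
    outE('person').
      where(eq('first')).by('name').       // follow only edges with the same name as the first edge
    inV()).
  path().by('placename').by('name')
```

Placing `until()` before `repeat()` checks the conditions before each hop, so a person whose journey has only one edge is still returned. Because `cyclicPath()` stops at the first revisited place, a path that closes a cycle is emitted once instead of being expanded indefinitely; replacing it with a `simplePath()` filter inside the `repeat()` would drop such paths from the results entirely, as the last comment notes.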

0 Answers