I have a performance issue when I try to get results from a projection with Gremlin. My approach is surely wrong, but I think I understand the issue well.

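I have a data model like this:

[Image: diagram of the data model. Based on the query below, the shape is: A → B; B → C → V1; B → D; D → E → V2; D → F → V3 and V4; D → G → V5 and V6.]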

I want to get a table with one row per D:

V1 | D.id | V2 | V3 | V4 | V5 | V6

To do that, I tried a query like this (every circle is a node):

g.V().hasLabel('A').
    out().hasLabel('B').as('B_node').
    out().hasLabel('C').
    out().hasLabel('V1').values('value1').as('value1').
    select('B_node').
    out().hasLabel('D').
    project('V1', 'D.id', 'V2', 'V3', 'V4', 'V5', 'V6').
        by(select('value1')).
        by(id()).
        by(out().hasLabel('E').out().hasLabel('V2').values('value2')).
        by(out().hasLabel('F').out().hasLabel('V3').values('value3')).
        by(out().hasLabel('F').out().hasLabel('V4').values('value4')).
        by(out().hasLabel('G').out().hasLabel('V5').values('value5')).
        by(out().hasLabel('G').out().hasLabel('V6').values('value6'))

The problem is that the number of D nodes, and the number of nodes going out of each D, is large. As I understand it, for each D the traversal to find F and the traversal to find G are each executed more than once. How can I avoid this, for example by creating an alias so that each of them is traversed only once?

If anything is unclear, do not hesitate to ask me questions.

Pred05
  • Can you maybe explain the significance of the different labels? Are these representative of different objects in your dataset? I'm just curious what makes 'D' significant here. Will you always have some sort of label 'D' that you're outputting with the leaf nodes in the subgraph? So is this basically "starting anywhere in the graph, find me all leaf nodes and 'D'?" – Taylor Riggan Dec 18 '20 at 16:07
  • To illustrate my business case: B is a car brand, D is the car, and all its children are the components of the car (for example, F is the engine, V3 the power of the engine, and V4 the consumption of the engine). So I want to list all cars with a small set of characteristics. Does that answer your question? – Pred05 Dec 18 '20 at 16:25
  • Makes sense. So do the characteristics (the Vx vertices) and the model of car (D) need to be in a specific order? Or are you ok having a list of values containing all of those things in any order? Is there a reason you have V1 first and then D? – Taylor Riggan Dec 18 '20 at 16:40
  • The order of the list doesn't matter for now. If we need to order the list, we will do it in post-processing, I think. V1 is a piece of information about B, so it will be repeated for each car. It is only useful for categorizing the data. – Pred05 Dec 18 '20 at 17:05

1 Answer

I'm not saying this is fully optimal, but it may give you some ideas for other things to try:

g.V().hasLabel('A').
    repeat(
        out().
        sideEffect(
            or(
                hasLabel('D'),
                not(out())
            ).
            aggregate('collect')
        )
    ).
    fold().select('collect').unfold().label()

Or another way:

g.V().hasLabel('A').
    union(
        repeat(
            out().
            sideEffect(
                hasLabel('D').
                aggregate('collect')
            )
        ).
        until(not(out())).label(),
        select('collect').unfold().label()
    )

Output:

D, V1, V2, V3, V4, V5, V6

In either case, you're starting at the vertex with label 'A' and traversing out through the graph until you reach all the leaf nodes. Along the way you also want to pick up the vehicle node(s) (label 'D'), which is what the sideEffect() step is for.
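
If the goal is the table from the question rather than a list of labels, one possible variation is to nest a project() under F and under G, so that each of them is visited once per D instead of once per column. This is only a sketch and has not been benchmarked. It assumes the property keys value1 to value6 from the question and at most one E, F, and G per D (by() only uses the first result of its traversal). The step labels 'b' and 'v1' and the map keys 'F' and 'G' are names chosen here for illustration.

```groovy
g.V().hasLabel('A').
    out().hasLabel('B').as('b').
    // look up V1 once per B, as in the question
    out().hasLabel('C').
    out().hasLabel('V1').values('value1').as('v1').
    select('b').
    out().hasLabel('D').
    project('V1', 'D.id', 'V2', 'F', 'G').
        by(select('v1')).
        by(id()).
        by(out().hasLabel('E').out().hasLabel('V2').values('value2')).
        // visit F once, then read both of its children
        by(out().hasLabel('F').
            project('V3', 'V4').
                by(out().hasLabel('V3').values('value3')).
                by(out().hasLabel('V4').values('value4'))).
        // visit G once, then read both of its children
        by(out().hasLabel('G').
            project('V5', 'V6').
                by(out().hasLabel('V5').values('value5')).
                by(out().hasLabel('V6').values('value6')))
```

Each row comes back as a map in which 'F' and 'G' hold nested maps, so the V3 to V6 columns would need to be flattened on the client side. Whether this is actually faster depends on the graph provider, so it is worth appending profile() to both versions and comparing. How a by() traversal that returns nothing is handled differs between TinkerPop versions, so wrapping optional branches in coalesce() with a default value may be needed.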

For those that stumble across this and want the graph to test with:

g.addV('A').as('A').
    addV('B').as('B').
    addV('C').as('C').
    addV('D').as('D').
    addV('E').as('E').
    addV('F').as('F').
    addV('G').as('G').
    addV('V1').as('V1').
    addV('V2').as('V2').
    addV('V3').as('V3').
    addV('V4').as('V4').
    addV('V5').as('V5').
    addV('V6').as('V6').
    addE('so-65359695').from('A').to('B').
    addE('so-65359695').from('B').to('C').
    addE('so-65359695').from('C').to('V1').
    addE('so-65359695').from('B').to('D').
    addE('so-65359695').from('D').to('G').
    addE('so-65359695').from('G').to('V5').
    addE('so-65359695').from('G').to('V6').
    addE('so-65359695').from('D').to('E').
    addE('so-65359695').from('E').to('V2').
    addE('so-65359695').from('D').to('F').
    addE('so-65359695').from('F').to('V3').
    addE('so-65359695').from('F').to('V4')
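
The graph above only creates labels, so a query that reads value1 to value6 (as the one in the question does) returns nothing against it. To try such a query, properties could be added along these lines. The values are placeholders made up for testing, not real data.

```groovy
// placeholder values, only so that values('valueN') has something to return
g.V().hasLabel('V1').property('value1', 'v1-placeholder').iterate()
g.V().hasLabel('V2').property('value2', 'v2-placeholder').iterate()
g.V().hasLabel('V3').property('value3', 'v3-placeholder').iterate()
g.V().hasLabel('V4').property('value4', 'v4-placeholder').iterate()
g.V().hasLabel('V5').property('value5', 'v5-placeholder').iterate()
g.V().hasLabel('V6').property('value6', 'v6-placeholder').iterate()
```
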
Taylor Riggan