Which data modeling is better for this hypergraph performance-wise using Gremlin and DSE Graph?

Question

I have this scenario where each (source) Entity has Properties that have a target pointing to another Entity. Those property mappings are grouped together. What I want to do is query those Entities that have specific properties with corresponding targets but are under the same group.

The hypergraph would look like this (rectangles are the hyperedges):

The JSON would look like this:

{
    "id": 1, "label": "Entity",
    "propertyGroups": [
    {
        "propertyGroupUuid": "GroupUuid1",
        "property": {"id": 1, "label": "Property", "name": "aName1"},
        "target": {"id": 2, "label": "Entity"}
    },
    {
        "propertyGroupUuid": "GroupUuid2",
        "property": {"id": 2, "label": "Property", "name": "aName2"},
        "target": {"id": 3, "label": "Entity"}
    },
    {
        "propertyGroupUuid": "GroupUuid2",
        "property": {"id": 3, "label": "Property", "name": "aName3"},
        "target": {"id": 4, "label": "Entity"}
    }]
}

The flattest version of this in the graph database could look like this:

While the most expanded version of it could look like this:

So if I want to:

get all Entities that have Property 2 and Property 3 under the same PropertyGroupUuid, "targeting" Entity 3 and Entity 4 respectively, I should get back Entity 1;
get all Entities that have Property 1 and Property 2 under the same PropertyGroupUuid, "targeting" Entity 2 and Entity 3 respectively, I should NOT get back Entity 1.
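
To pin down the expected results independently of either graph model, here is a minimal plain-Python sketch (not Gremlin, and not a statement about how DSE Graph evaluates anything). It flattens the JSON above into rows and checks the two cases. The names `MAPPINGS` and `entities_matching` are hypothetical, and the property names are the `aName…` values from the JSON rather than "Property 2" and so on.

```python
from collections import defaultdict

# Rows mirror the JSON above:
# (source entity id, property group uuid, property name, target entity id).
MAPPINGS = [
    (1, "GroupUuid1", "aName1", 2),
    (1, "GroupUuid2", "aName2", 3),
    (1, "GroupUuid2", "aName3", 4),
]


def entities_matching(mappings, wanted):
    """Return sorted ids of entities that have ALL (property, target)
    pairs in `wanted` inside one and the same property group."""
    pairs_by_group = defaultdict(set)
    for entity_id, group_uuid, prop_name, target_id in mappings:
        pairs_by_group[(entity_id, group_uuid)].add((prop_name, target_id))
    wanted = set(wanted)
    return sorted(
        {entity_id
         for (entity_id, _group), pairs in pairs_by_group.items()
         if wanted <= pairs}  # every wanted pair must sit in this one group
    )


# Case 1: both pairs live in GroupUuid2, so Entity 1 matches.
same_group = entities_matching(MAPPINGS, [("aName2", 3), ("aName3", 4)])
# Case 2: the pairs are split across GroupUuid1 and GroupUuid2, so nothing matches.
split_groups = entities_matching(MAPPINGS, [("aName1", 2), ("aName2", 3)])

print(same_group)    # [1]
print(split_groups)  # []
```

Whatever traversal is used against the real graph should agree with this check on both cases; the grouping key `(entity_id, group_uuid)` is the part that encodes "under the same group".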

How can I do that with Gremlin against the two versions of the graph, and which one is more flexible/performant when using the correct indices, like the ones provided by DSE Graph? Are there better alternatives that I haven't thought of? If the answer is detailed and well explained I will give a bounty of at least 50 :)

Thank you!

You confuse me because you are showing properties as nodes. Generally, hyper-edges are implemented as nodes. Looking only at the initial diagram you would add three HE nodes, one for each of the three rectangles and add a link from each entity to each HE node that it is in. If properties and property groups are nodes then they would have links to their containing HE nodes as well. – Paul Jackson Jan 08 '17 at 11:48
yeah my properties are actually objects with names, not graph properties. That's why I have them as nodes. – Michail Michailidis Jan 08 '17 at 17:33
So how do you access a "property"? `g.V().has("Property", "name", "Property 1")`? – Daniel Kuppitz Jan 09 '17 at 13:57
@Daniel yeah property names or their ids of course are unique – Michail Michailidis Jan 09 '17 at 18:42

Daniel Kuppitz · Answer 1 · 2017-01-09T21:09:21.670


I don't understand your first model with decoupled property nodes, but here's the traversal for model 2:

g.V().has("Property", "name", "Property 2").in("hasProperty"). /* start at any of the property 2  */
  filter(out("hasTarget").has("name", "Entity 3")).            /*   with target entity 3          */
  in("hasSubGroup").filter(                                    /* traverse to the property group  */
    out("hasSubGroup").and(                                    /* traverse to all sub-groups      */
      out("hasProperty").has("name", "Property 3"),            /* filter those that are linked to */
      out("hasTarget").has("name", "Entity 4")                 /*   property 3 w/ target entity 4  */
    )
  ).in("hasGroup")                                             /* traverse to all entities that match the above criteria */

Not knowing anything about the data in your graph, it's hard to predict the performance of this traversal. But in general, the performance should be okay if a) property names are indexed and b) the branching factor is low.

edited Jan 09 '17 at 21:09

answered Jan 09 '17 at 20:21

Daniel Kuppitz


Thank you Daniel! The first model has decoupled properties so I can look up the ids and names; it could definitely be stored in Cassandra or somewhere else. I usually go with the second approach, so I'm not sure if the first is viable. By branching factor, do you mean how many Property sub-trees there are on each of the entities? – Michail Michailidis Jan 09 '17 at 20:28
I found a typo ```has("name", "Property 4")``` needs to be ```has("name", "Entity 4")```. Also where do you check that the subgroups are either the same or different for Property 2 and 3? I was expecting something more explicit with usage of ```as()```. Thanks! – Michail Michailidis Jan 09 '17 at 20:31
could it be that ```in("hasSubGroup").filter(``` needs to be ```in("hasGroup").filter(``` – Michail Michailidis Jan 09 '17 at 20:49

You're right about the typo, but the rest looks good to me. By "branching factor" I mean the number of out edges per `PropertyGroup` and `PropertySubGroup`. – Daniel Kuppitz Jan 09 '17 at 21:02
