Does Gremlin provide the ability to clone a vertex? For instance, given v1->v2, v1->v3, v1->v4, how can I simply and efficiently create a new vertex v5 that also has edges pointing to v2, v3, and v4 (the same places that v1's edges point to), without having to set them explicitly, and instead say something like g.createV(v1).clone(v2)?

Note that I am using the AWS Neptune version of Gremlin; the solution must be compatible with that.

rossb83

1 Answer

A clone step doesn't exist (yet), but it can be solved with a single query.

Let's start with some sample data:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V(4).valueMap(true)                                   // the vertex to be cloned
==>[label:person,name:[josh],age:[32],id:4]
gremlin> g.V(4).outE().map(union(identity(), valueMap()).fold()) // all out-edges
==>[e[10][4-created->5],[weight:1.0]]
==>[e[11][4-created->3],[weight:0.4]]
gremlin> g.V(4).inE().map(union(identity(), valueMap()).fold())  // all in-edges
==>[e[8][1-knows->4],[weight:1.0]]

Now the query to clone the vertex might look a bit scary at first glance, but it's really just the same pattern over and over again: jumping between the original and the clone to copy the properties:

g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(                                                // copy vertex properties
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).
  sideEffect(                                                // copy out-edges
    select('source').outE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      to(select('e').inV()).
    select('e').properties().as('p').                        // copy out-edge properties
    select('eclone').
      property(select('p').key(), select('p').value())).
  sideEffect(                                                // copy in-edges
    select('source').inE().as('e').
    select('clone').
    addE(select('e').label()).as('eclone').
      from(select('e').outV()).
    select('e').properties().as('p').                        // copy in-edge properties
    select('eclone').
      property(select('p').key(), select('p').value()))
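To make the three copy steps easier to reason about, here is a minimal in-memory Python sketch that mirrors the three sideEffect() blocks: copy the label and vertex properties, copy each out-edge (same label, same target, same properties), and copy each in-edge (same label, same origin, same properties). The Graph, Vertex and Edge classes and the clone_vertex() method are hypothetical stand-ins for illustration only; they are not part of Gremlin, TinkerPop or Neptune, and the sketch assumes single-valued properties.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Vertex:
    id: int
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    out_v: int  # id of the vertex the edge starts at
    in_v: int   # id of the vertex the edge points to
    props: dict = field(default_factory=dict)


class Graph:
    """Hypothetical in-memory property graph, only for illustrating the clone logic."""

    def __init__(self):
        self._next_id = count(1)
        self.vertices = {}
        self.edges = {}

    def add_v(self, label, props=None):
        v = Vertex(next(self._next_id), label, dict(props or {}))
        self.vertices[v.id] = v
        return v

    def add_e(self, label, out_v, in_v, props=None):
        e = Edge(next(self._next_id), label, out_v, in_v, dict(props or {}))
        self.edges[e.id] = e
        return e

    def out_e(self, vid):
        return [e for e in self.edges.values() if e.out_v == vid]

    def in_e(self, vid):
        return [e for e in self.edges.values() if e.in_v == vid]

    def clone_vertex(self, source_id):
        source = self.vertices[source_id]
        # Snapshot the incident edges before adding anything, so the
        # copies made below are never themselves copied again.
        out_edges = self.out_e(source_id)
        in_edges = self.in_e(source_id)
        # addV() with the source's label, then copy every vertex property.
        clone = self.add_v(source.label, source.props)
        # Out-edges: same label, same target vertex, copied edge properties.
        for e in out_edges:
            self.add_e(e.label, clone.id, e.in_v, e.props)
        # In-edges: same label, same origin vertex, copied edge properties.
        for e in in_edges:
            self.add_e(e.label, e.out_v, clone.id, e.props)
        return clone
```

Run against a subset of the modern graph (marko, josh, lop, ripple), cloning josh gives the clone the same two `created` out-edges and the same `knows` in-edge, with the `weight` values copied, while josh's own edges stay as they were.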

And in action it looks like this:

gremlin> g.V(4).as('source').
......1>   addV().
......2>     property(label, select('source').label()).as('clone').
......3>   sideEffect(
......4>     select('source').properties().as('p').
......5>     select('clone').
......6>       property(select('p').key(), select('p').value())).
......7>   sideEffect(
......8>     select('source').outE().as('e').
......9>     select('clone').
.....10>     addE(select('e').label()).as('eclone').
.....11>       to(select('e').inV()).
.....12>     select('e').properties().as('p').
.....13>     select('eclone').
.....14>       property(select('p').key(), select('p').value())).
.....15>   sideEffect(
.....16>     select('source').inE().as('e').
.....17>     select('clone').
.....18>     addE(select('e').label()).as('eclone').
.....19>       from(select('e').outV()).
.....20>     select('e').properties().as('p').
.....21>     select('eclone').
.....22>       property(select('p').key(), select('p').value()))
==>v[13]
gremlin> g.V(13).valueMap(true)                                   // the cloned vertex
==>[label:person,name:[josh],age:[32],id:13]
gremlin> g.V(13).outE().map(union(identity(), valueMap()).fold()) // all cloned out-edges
==>[e[16][13-created->5],[weight:1.0]]
==>[e[17][13-created->3],[weight:0.4]]
gremlin> g.V(13).inE().map(union(identity(), valueMap()).fold())  // all cloned in-edges
==>[e[18][1-knows->13],[weight:1.0]]

UPDATE

Paging support is a little tricky. Let me split this whole thing into a 3-step process. I will use edge ids as the sort criterion and to identify the last processed edge (this might not work in Neptune, but you can use a unique sortable property instead).
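The paging scheme is keyset pagination: sort by a unique key, take at most pageSize items whose key is greater than the last key already processed, and use the largest key of that page as the cursor for the next one. A minimal Python sketch over a plain list of ids, assuming the ids are unique and mutually comparable (the function name keyset_pages is hypothetical, and the default cursor of -1 assumes numeric ids, as in the TinkerGraph example):

```python
def keyset_pages(ids, page_size, last_id=-1):
    """Yield pages of ids, mirroring hasId(gt(lastId)).order().by(id).limit(pageSize)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    while True:
        # Filter by the cursor, sort, and keep at most one page.
        page = sorted(i for i in ids if i > last_id)[:page_size]
        if page:
            yield page
        if len(page) != page_size:  # a short (or empty) page means we are done
            return
        last_id = page[-1]          # largest id in the page becomes the new cursor
```

For example, `list(keyset_pages([10, 11, 7], 2))` yields `[[7, 10], [11]]`. Because the cursor is a key rather than an offset, each page query only needs the `gt` filter plus a `limit`.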

// clone the vertex with its properties
clone = g.V(4).as('source').
  addV().
    property(label, select('source').label()).as('clone').
  sideEffect(
    select('source').properties().as('p').
    select('clone').
      property(select('p').key(), select('p').value())).next()

// clone out-edges
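pageSize = 1    // deliberately tiny to demonstrate paging; use a larger value in practice
lastId = -1     // assumes numeric edge ids, as in TinkerGraph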
while (true) {
  t = g.V(4).as('source').
    outE().hasId(gt(lastId)).
    order().by(id).limit(pageSize).as('e').
    group('x').
      by(constant('lastId')).
      by(id()).
    V(clone).
    addE(select('e').label()).as('eclone').
      to(select('e').inV()).
    sideEffect(
      select('e').properties().as('p').
      select('eclone').
        property(select('p').key(), select('p').value())).
    count()
  if (t.next() != pageSize)   // a short page means there are no more out-edges
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}

// clone in-edges
lastId = -1
while (true) {
  t = g.V(4).as('source').
    inE().hasId(gt(lastId)).
    order().by(id).limit(pageSize).as('e').
    group('x').
      by(constant('lastId')).
      by(id()).
    V(clone).
    addE(select('e').label()).as('eclone').
      from(select('e').outV()).   // an in-edge's origin is its out-vertex, not the source
    sideEffect(
      select('e').properties().as('p').
      select('eclone').
        property(select('p').key(), select('p').value())).
    count()
  if (t.next() != pageSize)   // a short page means there are no more in-edges
    break
  lastId = t.getSideEffects().get('x').get('lastId')
}

I don't know if Neptune allows you to execute full scripts; if not, you'll need to execute the outer while loops in your application's code.
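If the loops have to live in application code, the driver side reduces to: submit one page query, read back how many edges it cloned and the last processed id, and stop on a short page. A minimal Python sketch; `run_page` is a hypothetical callable standing in for whatever submits the paging traversal to Neptune and returns `(count, last_id)`, and `clone_edges_in_pages` is likewise a name made up for illustration:

```python
from typing import Callable, Tuple, TypeVar

K = TypeVar("K")


def clone_edges_in_pages(
    run_page: Callable[[K, int], Tuple[int, K]], page_size: int, start: K
) -> int:
    """Call run_page until it reports a short page; return the number of edges processed."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    last_id = start
    total = 0
    while True:
        # One round trip: clone up to page_size edges with id > last_id.
        n, new_last = run_page(last_id, page_size)
        total += n
        if n != page_size:  # same stop condition as `if (t.next() != pageSize) break`
            return total
        last_id = new_last
```

Run it once for out-edges and once for in-edges. In a test, `run_page` can be a fake that slices a sorted list of ids, which lets you check the stop condition without a database.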

Daniel Kuppitz
  • How well would you say this scales, and/or what limits would this query have? If there were 1,000,000 outgoing/incoming edges, would this still work? Will this work with the AWS Neptune API? – rossb83 Aug 18 '18 at 11:15
  • I think it ultimately depends on the graph db you're using. However, having that many incident edges is generally seen as a bad graph model. – Daniel Kuppitz Aug 18 '18 at 17:10
  • It seems like AWS Neptune is limiting this query to only 999 edges. Is there a way to paginate this query so that I can run it for nodes with a larger number of edges? – rossb83 Aug 19 '18 at 02:16
  • You'll need a guaranteed sort order and a unique property on your edges; then it might work. I'll try to put something together later tonight, perhaps just extending the previous example with a page size of 1. – Daniel Kuppitz Aug 19 '18 at 03:19
  • You are too kind! Embarrassingly, I believe I was mistaken; I must have had a typo in my latest query, because I just cloned a node with 100,000 outgoing edges on the smallest Neptune instance type, and it was successful and took under a minute. Though it's not urgent, pagination would be nice, and yes, I'll have guaranteed ordering of my node ids v0 to v100000. PS: my graph model has *lots* of incoming edges, but not many outgoing edges (which is what I believe you meant by incident edges). – rossb83 Aug 19 '18 at 03:43
  • Incident edges refer to both incoming and outgoing edges. For the paging you need a unique (sortable) identifier/property on the edges, not on the vertices. I'll post an update in a few minutes. – Daniel Kuppitz Aug 19 '18 at 04:30
  • Hi @DanielKuppitz, I tried the query with Azure CosmosDb and I am getting an error says . If I change this – A_Nabelsi Jul 19 '19 at 18:54
  • I guess you'll have to wait for CosmosDB to catch up with the latest Gremlin features. – Daniel Kuppitz Jul 19 '19 at 21:49