What is the best way to perform soft delete in Graph database?

Question

What is the best way to implement soft delete with timestamps (start date and end date) in a graph database?

score 4 · Accepted Answer · answered May 07 '21 at 11:26

Well, it's fairly straightforward to filter a traversal based on a timestamp. Take this example graph, where "ts" is a mock timestamp represented as a long:

gremlin> g.addV('person').property('name','alice').as('a').
......1>   addV('person').property('name','bob').as('b').
......2>   addV('person').property('name','claire').as('c').
......3>   addE('interacted').property('ts', 125).from('a').to('b').
......4>   addE('interacted').property('ts', 126).from('a').to('b').
......5>   addE('interacted').property('ts', 127).from('a').to('b').
......6>   addE('interacted').property('ts', 126).from('b').to('c').
......7>   addE('interacted').property('ts', 150).from('b').to('c').
......8>   addE('interacted').property('ts', 151).from('a').to('b').iterate()
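The example graph stores a single "ts" per edge. Because the question asks about a start date and an end date, one possible way to model that is to give each edge a `start` and an optional `end`, and to soft-delete by setting `end` instead of removing the edge. The plain-Python sketch below illustrates that idea under those assumptions. It is not Gremlin or any driver API, and the names `Edge`, `soft_delete` and `is_live` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    # Hypothetical edge record: soft delete is modeled with start/end timestamps.
    label: str
    out_v: str
    in_v: str
    start: int
    end: Optional[int] = None  # None means the edge has not been soft-deleted


def soft_delete(edge: Edge, at: int) -> None:
    # Soft delete closes the validity window instead of removing the edge.
    # Already-deleted edges keep their original end time.
    if edge.end is None:
        edge.end = at


def is_live(edge: Edge, t: int) -> bool:
    # Visible at time t if it started at or before t and had not ended by t.
    return edge.start <= t and (edge.end is None or t < edge.end)


edges = [
    Edge('interacted', 'alice', 'bob', start=125),
    Edge('interacted', 'bob', 'claire', start=126),
]
soft_delete(edges[1], at=140)

print([(e.out_v, e.in_v) for e in edges if is_live(e, 130)])
print([(e.out_v, e.in_v) for e in edges if is_live(e, 150)])
```

In Gremlin the same check would become a pair of `has()` filters on the two properties rather than a single filter on "ts".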

You can simply write your Gremlin to account for the "ts":

gremlin> yesterday = 130
==>130
gremlin> g.V().has('person','name','alice').
......1>   outE('interacted').has('ts',gt(yesterday)).inV().
......2>   values('name')
==>bob
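To spell out what `outE('interacted').has('ts', gt(yesterday)).inV().values('name')` computes, here is a plain-Python model of the same six edges. It is only an illustration; the tuple layout and the `recent_neighbors` helper are hypothetical, and a real application would run the Gremlin query against the database.

```python
# Each edge is (label, from_name, to_name, ts), mirroring the console example.
EDGES = [
    ('interacted', 'alice', 'bob', 125),
    ('interacted', 'alice', 'bob', 126),
    ('interacted', 'alice', 'bob', 127),
    ('interacted', 'bob', 'claire', 126),
    ('interacted', 'bob', 'claire', 150),
    ('interacted', 'alice', 'bob', 151),
]


def recent_neighbors(edges, name, label, after):
    # Rough equivalent of outE(label).has('ts', gt(after)).inV().values('name')
    return [dst for (lbl, src, dst, ts) in edges
            if src == name and lbl == label and ts > after]


yesterday = 130
print(recent_neighbors(EDGES, 'alice', 'interacted', yesterday))  # ['bob']
```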

Depending on the complexity of your requirements, repeating this filter on "ts" in every traversal may become burdensome and clutter your code. If that is the case, SubgraphStrategy might help:

gremlin> sg = g.withStrategies(SubgraphStrategy.build().edges(has('ts',gt(yesterday))).create())
==>graphtraversalsource[tinkergraph[vertices:3 edges:6], standard]
gremlin> sg.V().has('person','name','alice').out('interacted').values('name')
==>bob
gremlin> g.V().has('person','name','alice').out('interacted').values('name')
==>bob
==>bob
==>bob
==>bob
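Conceptually, SubgraphStrategy attaches the edge predicate once, at the traversal-source level, so every traversal spawned from `sg` only sees matching edges while `g` still sees all of them. The plain-Python class below is a hypothetical analogy of that behavior, not TinkerPop's implementation; `GraphView` and its `out` method are illustrative names.

```python
from typing import Callable, List, Tuple

Edge = Tuple[str, str, str, int]  # (label, from_name, to_name, ts)

EDGES: List[Edge] = [
    ('interacted', 'alice', 'bob', 125),
    ('interacted', 'alice', 'bob', 126),
    ('interacted', 'alice', 'bob', 127),
    ('interacted', 'bob', 'claire', 126),
    ('interacted', 'bob', 'claire', 150),
    ('interacted', 'alice', 'bob', 151),
]


class GraphView:
    # Wraps an edge list with a predicate that every step honors,
    # loosely analogous to g.withStrategies(SubgraphStrategy...edges(...)).
    def __init__(self, edges: List[Edge],
                 edge_filter: Callable[[Edge], bool] = lambda e: True):
        self._edges = edges
        self._edge_filter = edge_filter

    def out(self, name: str, label: str) -> List[str]:
        # Adjacent vertex names over edges that pass the view's filter.
        return [e[2] for e in self._edges
                if e[1] == name and e[0] == label and self._edge_filter(e)]


yesterday = 130
g = GraphView(EDGES)
sg = GraphView(EDGES, edge_filter=lambda e: e[3] > yesterday)

print(sg.out('alice', 'interacted'))  # ['bob']
print(g.out('alice', 'interacted'))   # ['bob', 'bob', 'bob', 'bob']
```

The design point is that the filter lives in one place (the view) instead of being repeated in each query.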

One word of warning, based on experience: soft-deleted edges and vertices can cause performance issues because they still have to be evaluated by .has('ts',gt(yesterday)), and that is not free. When a traversal touches many edges or vertices, the cost adds up; I've seen cases where 20% of the query time was spent just filtering out soft-deleted elements. A traditional relational database can easily use an index on a deleted timestamp to exclude such rows, but a graph traversal must examine each edge to see whether it is deleted. (Adam, Sep 06 '22 at 17:28)
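The comment's point is that filtering does not skip excluded edges for free: each one's timestamp is still read before it is discarded. The hypothetical counter below (plain Python, not a Gremlin profile) makes that visible for the example graph. Returning the single matching edge out of alice still means inspecting all four of her "interacted" edges. Against a real database, Gremlin's `profile()` step is the tool for measuring where traversal time goes.

```python
from typing import Dict, List, Tuple

# Hypothetical adjacency list: vertex name -> outgoing 'interacted' edges as (to_name, ts).
OUT_EDGES: Dict[str, List[Tuple[str, int]]] = {
    'alice': [('bob', 125), ('bob', 126), ('bob', 127), ('bob', 151)],
    'bob': [('claire', 126), ('claire', 150)],
    'claire': [],
}


def live_out_with_cost(name: str, after: int) -> Tuple[List[str], int]:
    # Returns (neighbors reached via edges with ts > after, number of edges inspected).
    inspected = 0
    live: List[str] = []
    for dst, ts in OUT_EDGES[name]:
        inspected += 1  # every edge's ts is read, whether or not it is filtered out
        if ts > after:
            live.append(dst)
    return live, inspected


live, inspected = live_out_with_cost('alice', 130)
print(live, inspected)  # ['bob'] 4
```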
