What is the best way to implement soft delete with timestamps (start date and end date) in a graph database?
Well, it's fairly straightforward to blind a traversal based on a timestamp. Take this example graph where "ts" is a mock timestamp represented as a long:
gremlin> g.addV('person').property('name','alice').as('a').
......1> addV('person').property('name','bob').as('b').
......2> addV('person').property('name','claire').as('c').
......3> addE('interacted').property('ts', 125).from('a').to('b').
......4> addE('interacted').property('ts', 126).from('a').to('b').
......5> addE('interacted').property('ts', 127).from('a').to('b').
......6> addE('interacted').property('ts', 126).from('b').to('c').
......7> addE('interacted').property('ts', 150).from('b').to('c').
......8> addE('interacted').property('ts', 151).from('a').to('b').iterate()
You can simply write your Gremlin to account for the "ts":
gremlin> yesterday = 130
==>130
gremlin> g.V().has('person','name','alice').
......1> outE('interacted').has('ts',gt(yesterday)).inV().
......2> values('name')
==>bob
Depending on the complexity of your requirements, adding this filter on "ts" may get burdensome and clutter your code. If that is the case, it's possible that SubgraphStrategy might help:
gremlin> sg = g.withStrategies(SubgraphStrategy.build().edges(has('ts',gt(yesterday))).create())
==>graphtraversalsource[tinkergraph[vertices:3 edges:6], standard]
gremlin> sg.V().has('person','name','alice').out('interacted').values('name')
==>bob
gremlin> g.V().has('person','name','alice').out('interacted').values('name')
==>bob
==>bob
==>bob
==>bob
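The same idea can be extended to the start date/end date pair from the question. What follows is only a sketch under assumptions: the property names "startDate" and "endDate" are hypothetical (the example graph above only has "ts"), timestamps are mock longs, and an edge with no "endDate" is treated as still live. Soft-deleting an edge then means setting its "endDate" rather than dropping it, and reads filter on both properties:

```groovy
gremlin> now = 140
gremlin> // soft delete: stamp an end date on the edge instead of calling drop()
gremlin> g.V().has('person','name','alice').
......1>   outE('interacted').has('ts',125).
......2>   property('endDate', now).iterate()
gremlin> // read only edges that have started and have not yet ended
gremlin> g.V().has('person','name','alice').
......1>   outE('interacted').
......2>   has('startDate', lte(now)).
......3>   or(hasNot('endDate'), has('endDate', gt(now))).
......4>   inV().values('name')
gremlin> // the same filter expressed once, via SubgraphStrategy
gremlin> live = g.withStrategies(SubgraphStrategy.build().
......1>   edges(has('startDate', lte(now)).or(hasNot('endDate'), has('endDate', gt(now)))).
......2>   create())
```

With this convention, restoring a soft-deleted edge amounts to removing or moving its "endDate", and traversals through "live" never need to repeat the filter.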

stephen mallette
One word of warning on this based on experience is that soft-deleted edges and vertices can cause performance issues because they still must be considered via .has('ts',gt(yesterday)) and that is not free. When looking at many edges/vertices, it can add up. I've seen instances where 20% of the query time was simply trying to filter out the soft-deleted stuff. A traditional relational database can easily index items out based on a deleted timestamp, but a graph traversal must consider the edge to see if it's deleted. – Adam Sep 06 '22 at 17:28