What is the best way to implement soft delete with timestamps (start date and end date) in a graph database?

Kelvin Lawrence
Thirumal

1 Answer

Well, it's fairly straightforward to restrict a traversal based on a timestamp. Take this example graph, where "ts" is a mock timestamp represented as a long:

gremlin> g.addV('person').property('name','alice').as('a').
......1>   addV('person').property('name','bob').as('b').
......2>   addV('person').property('name','claire').as('c').
......3>   addE('interacted').property('ts', 125).from('a').to('b').
......4>   addE('interacted').property('ts', 126).from('a').to('b').
......5>   addE('interacted').property('ts', 127).from('a').to('b').
......6>   addE('interacted').property('ts', 126).from('b').to('c').
......7>   addE('interacted').property('ts', 150).from('b').to('c').
......8>   addE('interacted').property('ts', 151).from('a').to('b').iterate()

You can simply write your Gremlin to account for the "ts":

gremlin> yesterday = 130
==>130
gremlin> g.V().has('person','name','alice').
......1>   outE('interacted').has('ts',gt(yesterday)).inV().
......2>   values('name')
==>bob
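
The question asks about a start date and an end date rather than a single "ts". The same filtering idea extends to a validity interval. The following is a minimal, database-independent sketch in plain Python (not Gremlin and not any TinkerPop API); the names `Edge`, `start`, `end`, `is_live` and `neighbours` are hypothetical and chosen only for illustration. It assumes an edge is live from `start` (inclusive) until `end` (exclusive), and that a missing `end` means the edge has not been soft-deleted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    start: int                 # mock timestamp: when the edge became valid
    end: Optional[int] = None  # mock timestamp: when it was soft-deleted; None = still live


def is_live(edge: Edge, at: int) -> bool:
    """True if the edge is valid at time `at`, using the half-open interval [start, end)."""
    return edge.start <= at and (edge.end is None or at < edge.end)


def neighbours(edges, src, at):
    """Targets reachable from `src` over edges that are live at time `at`."""
    return [e.dst for e in edges if e.src == src and is_live(e, at)]


edges = [
    Edge("alice", "bob", start=125, end=130),   # soft-deleted at 130
    Edge("alice", "bob", start=151),            # never deleted
    Edge("bob", "claire", start=126, end=150),  # soft-deleted at 150
]

print(neighbours(edges, "alice", at=160))  # only the edge created at 151 is live
print(neighbours(edges, "bob", at=160))    # the bob -> claire edge ended at 150
```

In Gremlin, the equivalent check would presumably be a pair of filters on two edge properties in place of the single has('ts', ...) step. Note that every candidate edge is still inspected to decide whether it is live, so soft-deleted elements continue to cost traversal time.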

Depending on the complexity of your requirements, adding this filter on "ts" to every traversal may become burdensome and clutter your code. If that is the case, SubgraphStrategy might help, since it applies the filter once on the traversal source:

gremlin> sg = g.withStrategies(SubgraphStrategy.build().edges(has('ts',gt(yesterday))).create())
==>graphtraversalsource[tinkergraph[vertices:3 edges:6], standard]
gremlin> sg.V().has('person','name','alice').out('interacted').values('name')
==>bob
gremlin> g.V().has('person','name','alice').out('interacted').values('name')
==>bob
==>bob
==>bob
==>bob
stephen mallette
  • One word of warning on this based on experience is that soft-deleted edges and vertices can cause performance issues because they still must be considered via .has('ts',gt(yesterday)) and that is not free. When looking at many edges/vertices, it can add up. I've seen instances where 20% of the query time was simply trying to filter out the soft-deleted stuff. A traditional relational database can easily index items out based on a deleted timestamp, but a graph traversal must consider the edge to see if it's deleted. – Adam Sep 06 '22 at 17:28