As far as I know, ArangoDB uses MVCC and therefore keeps revisions of nodes and edges for an undefined period of time, until the garbage collector removes them.

I would like to implement a graph database schema, and I need to keep the state of this database at specific times. This means I will configure times at which the database management system takes a snapshot of the state (e.g. every week).

So my question in short: is it possible to keep the revisions/versions of nodes/edges in ArangoDB (or maybe with a plugin), together with a timestamp of their creation?

If not, is there another graph database that is able to do this?

mawey
  • I can't answer your question on ArangoDB, but I thought you might want to take a look at FluxGraph (https://github.com/datablend/fluxgraph), which is built on Datomic. Some good work was done there in exploring time-aware graphs. FluxGraph is built using Blueprints, which exposes it to the TinkerPop stack (http://tinkerpop.com). Perhaps you've already explored it, but ArangoDB has its own Blueprints implementation as well (https://github.com/triAGENS/blueprints-arangodb-graph), though I'm not sure that it conforms to your needs. – stephen mallette Mar 24 '14 at 13:41
  • Thank you. I saw FluxGraph and I think it is exactly what I need. But I want to use it with ArangoDB or OrientDB, so the hint about the Blueprints API for ArangoDB is very helpful. Do you know if FluxGraph is only usable with Java, or is it possible to use Python? – mawey Mar 25 '14 at 19:01
  • I'm not super familiar with the intricacies of FluxGraph, but I think the author developed it with a series of `TimeAware` interfaces that extend the Blueprints API. I think his intent was perhaps to see those consumed into the standard Blueprints API, but that hasn't happened and I don't think it will. That does mean, however, that those interfaces could possibly be used to build a wrapper graph implementation over the OrientDB or ArangoDB Blueprints implementations. Please take this as being highly theoretical. – stephen mallette Mar 25 '14 at 19:06
  • The TinkerPop stack does have Python connectors, but ultimately they all kind of boil down to using Gremlin to interact with the graph. – stephen mallette Mar 25 '14 at 19:06

1 Answer

I think you can use the arangodump binary (see the ArangoDB client tools manual) to create a snapshot at the desired point in time. This will save the state of the database (or just the specific collections that contain your graph data) to JSON files, which can be used for auditing or for reloading the data later. arangodump is included in the ArangoDB distributions.
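As a rough sketch of how a scheduled snapshot could be scripted, the Python snippet below builds an arangodump command line that writes to a timestamped directory. The endpoint, database name, backup path, and collection names (`files`, `folders`, `assigned_to`) are assumptions for illustration, not something taken from your setup; the helper functions are hypothetical as well. Authentication options are left out, so arangodump may prompt for a password.

```python
import subprocess
from datetime import datetime, timezone


def build_dump_command(output_root, collections,
                       endpoint="tcp://127.0.0.1:8529",  # assumed local server
                       database="_system",               # assumed database name
                       now=None):
    """Build an arangodump command that writes to a timestamped directory."""
    now = now or datetime.now(timezone.utc)
    out_dir = f"{output_root}/snapshot-{now:%Y-%m-%dT%H%M%SZ}"
    cmd = [
        "arangodump",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--output-directory", out_dir,
    ]
    # Restrict the dump to the collections that hold the graph data.
    for name in collections:
        cmd += ["--collection", name]
    return cmd


def run_dump(cmd):
    """Run the dump; raises CalledProcessError if arangodump fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path and collection names; print instead of running.
    cmd = build_dump_command("/var/backups/arango",
                             ["files", "folders", "assigned_to"])
    print(" ".join(cmd))
```

Calling `run_dump(cmd)` from a weekly cron job (or any other scheduler) would then give one dump directory per snapshot.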

The data dumped by arangodump will not contain any creation timestamps, but if you need them you can make them part of your data by setting a "created" attribute on each node / edge when you create it.
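A minimal sketch of that idea in Python, independent of any particular ArangoDB driver: stamp each document with a "created" attribute before inserting it, and later select documents by time window. The helper names and the sample document are hypothetical, and storing the timestamp as an ISO 8601 UTC string is just one possible choice.

```python
from datetime import datetime, timedelta, timezone


def with_created(doc, now=None):
    """Return a copy of doc with a 'created' ISO 8601 UTC timestamp.

    An existing 'created' value is left untouched.
    """
    now = now or datetime.now(timezone.utc)
    stamped = dict(doc)
    stamped.setdefault("created", now.isoformat())
    return stamped


def created_between(docs, start, end):
    """Return the documents whose 'created' falls in [start, end)."""
    return [
        d for d in docs
        if start <= datetime.fromisoformat(d["created"]) < end
    ]


if __name__ == "__main__":
    # Hypothetical node document; insert the stamped copy with your driver.
    node = with_created({"_key": "f1", "name": "report.txt"})
    week_start = datetime.now(timezone.utc) - timedelta(days=7)
    week_end = datetime.now(timezone.utc) + timedelta(seconds=1)
    print(created_between([node], week_start, week_end))
```

The same pattern would work for a "modified" attribute that is updated on every change, if you also need to track modifications.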

I hope this helps.

stj
  • Thank you for the answer. I have thought about this solution, but the problem is that I would copy the whole collection whether or not the data has changed. So after a while there is a lot of redundant data taking up disk space. I will use the snapshots to show a user's activity over a specific period. The user should be able to choose a week in a timeline to see which files/folders/keywords etc. they used, modified, or were assigned to at that point in time. – mawey Mar 25 '14 at 18:57