24

I have a question about graph databases; can someone help me, please? I'm handling quite a lot of data in MySQL, about 5M records a day, sent by router-like devices (access points, wireless bridges). The data is usually health data, GPS, etc.; these are devices on vehicles. How do you handle time-based data in graph databases? Has anyone applied Neo4j to time-based data? It would be great to know how you query intervals and how you'd go about modelling.

I guess I could create a node every time I receive data, with properties set each time, such as changed GPS or health. It would be a time-based graph; does that sound right? With 5M rows MySQL isn't performing badly, but as the router gets new functionality, new data comes through and I need to create new models again, which isn't bad but isn't great either. I want something that is semi-structured and makes it easy to relate different things, for example that a user got kicked out because an access point associated with the router is down. My usual queries raise alerts saying that one of the devices is down, that throughput is reduced, etc. Would Neo4j help me marry up these relationships better than MySQL?

Would love to know what you guys think, any comments + thoughts appreciated.

opensourcegeek
  • For deeply querying semi-structured data, see Apache Solr. For applying rules (dynamically) to data, see Drools. – Jesvin Jose Feb 25 '12 at 07:20

2 Answers

16

Please refer to the following GraphGist for a tutorial on how to do time-based graph storage using time scales.

http://gist.neo4j.org/?github-kbastani%2Fgists%2F%2Fmeta%2FTimeScaleEventMetaModel.adoc

Time Scale Graph

In the time scale graph that is modeled above, a shortest path traversal from a blue colored node to the transparent colored node constitutes a unique time identity in bits.

The identity traced by the red path is 0→1→0→1→0→0. The reverse path is 0→0→1→0→1→0 or simply 001010, a unique identity in bits.

MATCH p=shortestPath((n1:d)-[:child_of*]->(n2:y))
WHERE n1.key = 'd10'
RETURN DISTINCT reduce(s = '' , n IN nodes(p)| n.tempo + s) AS TimeIdentity
ORDER BY TimeIdentity

The Cypher query above performs a shortest path traversal from the blue node to the transparent node. The result is a bit string that represents a time identity, so events can be ordered by their position on the time scale event subgraph.
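To make the `reduce` step concrete, here is a minimal plain-Python sketch of the same walk. It is not Neo4j code: `TimeNode`, `chain_from_leaf_bits` and `time_identity` are hypothetical names, and the tree is reduced to a single leaf-to-root chain. Only the `tempo` property and the `child_of` direction are taken from the query above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeNode:
    """Hypothetical in-memory stand-in for a node on the time scale."""
    key: str
    tempo: str                              # "0" or "1", as in the Cypher query
    child_of: Optional["TimeNode"] = None   # parent node; None at the root


def chain_from_leaf_bits(bits: str) -> TimeNode:
    """Build one leaf-to-root chain; bits[0] is the leaf's tempo. Returns the leaf."""
    parent = None
    for i, bit in enumerate(reversed(bits)):
        parent = TimeNode(key=f"n{i}", tempo=bit, child_of=parent)
    return parent


def time_identity(leaf: TimeNode) -> str:
    """Follow child_of from leaf to root, prepending each tempo (n.tempo + s)."""
    s = ""
    node = leaf
    while node is not None:
        s = node.tempo + s
        node = node.child_of
    return s


# The red path read from the leaf upwards is 0, 1, 0, 1, 0, 0.
leaf = chain_from_leaf_bits("010100")
identity = time_identity(leaf)   # "001010", the root's bit comes first
```

Because each step prepends the bit, the identity reads root first, which is why the answer reports the reverse of the traced path.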

Please see the time scale event subgraph below:

Time Scale Event Subgraph

The image above represents a time scale connected to a series of events (met). Events, represented as triangular nodes in the image, are also connected to a hierarchy of features (John, Sally, Pam, Anne) which are then further generalized into classes (Person).

Now you can run a Cypher query like the one I listed earlier, which will order the events by time of occurrence as a bit string. Note that you should also store a timestamp on the node if you need to retrieve the actual time. Each blue node represents a time-separated event but not necessarily the actual time, just a representation of events that happened in order.

MATCH p=(p0:person)-[:event]->(ev)-[:event]->(p1:person)
WITH p, ev
MATCH time_identity = (d0:d)<-[:event]-(ev)
WITH d0, p
MATCH p1=(d0)-[:child_of*]->(y0:y)
RETURN extract(x IN nodes(p)| coalesce(x.name, x.future)) AS Interaction, reduce(s = '' , n IN nodes(p1)| n.tempo + s) AS TimeIdentity
ORDER BY TimeIdentity

The hierarchies in the time scale allow you to group events and to see representations at higher levels. So selecting all green nodes below an orange node selects 4 possible events (represented by blue nodes).
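As a rough illustration of that grouping, the sketch below buckets time identities by a shared ancestor. It assumes that identities are root-first bit strings of equal length and that each one addresses exactly one leaf; neither assumption is guaranteed by the model above. `group_by_ancestor` is a hypothetical helper, not part of the GraphGist.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_ancestor(identities: Iterable[str], levels_up: int) -> Dict[str, List[str]]:
    """Group root-first bit strings by the ancestor `levels_up` hops above the leaf.

    Leaves under the same ancestor share everything except their last
    `levels_up` bits, so the shared prefix serves as the group key.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for ident in sorted(identities):
        prefix = ident[: len(ident) - levels_up]
        groups[prefix].append(ident)
    return dict(groups)


# All eight leaves of a three-level tree (assumed, for illustration only).
leaves = [format(i, "03b") for i in range(8)]
groups = group_by_ancestor(leaves, levels_up=2)
# Two hops up, each ancestor covers four possible leaves.
```

Going up one more level doubles the size of each group, which is what makes the higher levels useful as coarser time intervals.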

Let me know if you have any questions, and be sure to visit the GraphGist to see more details and actual live examples of the time scale event subgraph.

Kenny Bastani
  • Thanks, but I'm struggling to understand how the time identity string is computed. Is there any in depth details on this approach please? – opensourcegeek Oct 29 '13 at 11:36
  • The shortest path algorithm takes care of it because you are going from bottom to top. If you were to go the other way you would run into ambiguity. Trace with your finger the shortest path from the blue node to the transparent node and you realize there is only one option per hop. Go the reverse way and you see two options per hop. The bit string itself is only an address to a time interval. Like a hash table. – Kenny Bastani Nov 01 '13 at 03:41
  • What do the light blue, green and yellow nodes represent on the time graph? Is that some sort of grouping/multi-level representation? Do you have any examples with inserts? – MonkeyBonkey Feb 18 '14 at 16:43
  • Yes, the different colored nodes represent levels of depth within the binary tree. I am planning to update the GraphGist example to use looping to add depth to the tree. Also, please refer to https://gist.github.com/kbastani/8519557 for a multilevel calendar timeline in Cypher. – Kenny Bastani Mar 09 '14 at 09:19
  • Are the advantages of storing events in this way, as opposed to just storing some sort of timestamp value, available anywhere? I think I understand how the graph is used to order events, but why use this instead of just tagging the events with some sort of epoch? – Sam Storie Apr 02 '15 at 12:41
  • This kind of structure is good for doing frequency pattern analysis on events to determine a combination of events that frequently lead to a combination of other events. What's the correlation of X, Y, Z to A, B, C? But for general time stuff, use a time stamp. – Kenny Bastani Apr 04 '15 at 00:34
  • You said "unique time identity" but it seems the bit string isn't unique at all. For example, the first 3 bits in the path above are "010". But from the blue node (0), you can also go up (1), and up again (0), yielding the same bit string "010". Maybe you meant "unique given the same starting node", but it's not even unique from the exact same starting node! – Hendy Irawan Jul 01 '15 at 13:55
  • I do get the general idea, and I thank @KennyBastani for this. I plan to use it in my thesis http://lumen.hendyirawan.com/ to represent knowledge about the order of events in the Neo4j graph. I'll be sure to give proper credit to your gist. – Hendy Irawan Jul 01 '15 at 14:02
  • I think instead of labeling the nodes with bits, the bit should be assigned to relationships. So from one node, there's either 0, 1, or 2 outgoing relationships, and the relationship must uniquely (from that node) have 0 or 1 as its bit label. – Hendy Irawan Jul 01 '15 at 14:07
6

You could also look into indexing in the graph itself; see http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html for a timeline example. Otherwise, Lucene is packaged by default with Neo4j and works in much the same way as Solr.
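For a feel of how an in-graph timeline index answers interval queries, here is a small plain-Python sketch of a year → month → day tree with events hanging off the day level. It illustrates the general idea only and is not the code from the linked post; `TimelineIndex` and the event names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Any, List


class TimelineIndex:
    """Hypothetical three-level index: year -> month -> day -> events."""

    def __init__(self) -> None:
        self._tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def add(self, day: date, event: Any) -> None:
        """Attach an event to its day, creating the path to it if needed."""
        self._tree[day.year][day.month][day.day].append(event)

    def between(self, start: date, end: date) -> List[Any]:
        """Return events with start <= day <= end, in chronological order."""
        found: List[Any] = []
        for year in sorted(self._tree):
            if not start.year <= year <= end.year:
                continue  # prune whole years outside the interval
            for month in sorted(self._tree[year]):
                if not (start.year, start.month) <= (year, month) <= (end.year, end.month):
                    continue  # prune whole months outside the interval
                for day in sorted(self._tree[year][month]):
                    if start <= date(year, month, day) <= end:
                        found.extend(self._tree[year][month][day])
        return found


index = TimelineIndex()
index.add(date(2012, 2, 24), "gps_update")
index.add(date(2012, 2, 25), "access_point_down")
index.add(date(2012, 3, 1), "reduced_throughput")
alerts = index.between(date(2012, 2, 25), date(2012, 3, 1))
```

The point of the extra levels is pruning: a query for a few days never has to look at the events of other months or years.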

Peter Neubauer