I have some time-series data (roughly 1-5 points per day) that I need to access quickly in a web app backed by ArangoDB. The data is associated with a particular profile, but a single collection holds the data for all profiles. Between the profile node and the data node there is an event node and a report node. A report is simply a group of data points from a given event. The existing graph structure looks like this:
profile =====> event1 ========> reportA =======> data1
     \             \                   \=======> data2
      \             \
       \             \========> reportB =======> data3
        \                              \=======> data4
         \
          \==> event2 ========> reportA =======> data1
                   \                   \=======> data2
                    \
                     \========> reportB =======> data3
                                       \=======> data4
The chart I want would present data1 sequentially by its associated event, sorted by an attribute of the event. In tabular form, the result set I want looks like this:
event     dataAttr    value
-------------------------------
event1    data1       42
event2    data1       6
event3    data1       7
event4    data1       343
I am likely to run this query for every dataAttr in a given report, effectively creating a time-series result set for each dataAttr on a particular profile over the last 10-20 events.
When I investigated this problem in Neo4j, the recommendation was to connect sequential events directly to each other. I'm wondering whether this is also a better approach in ArangoDB.
This would mean creating an additional graph that looks something like this:
data1 (of event1) => data1 (of event2) => data1 (of event3) => data1 (of event4)
data2 (of event1) => data2 (of event2) => data2 (of event3) => data2 (of event4)
Etc.
Each dataAttr is connected to its cousin in the previous event. After traversing to the most recent event in the first graph, the second graph would then be used to traverse n levels back to past events (in practice 10-20).
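To make the chained idea concrete, here is a minimal in-memory sketch in Python. It is not ArangoDB code, only a model of the access pattern: the keys such as "e4/data1", the "prev" field, and the walk_back helper are all hypothetical names, and the values are the ones from the table above.

```python
# In-memory stand-in for the second graph: each data node links to the
# same dataAttr in the previous event. Keys and field names are hypothetical.
data_nodes = {
    "e4/data1": {"event": "event4", "dataAttr": "data1", "value": 343, "prev": "e3/data1"},
    "e3/data1": {"event": "event3", "dataAttr": "data1", "value": 7, "prev": "e2/data1"},
    "e2/data1": {"event": "event2", "dataAttr": "data1", "value": 6, "prev": "e1/data1"},
    "e1/data1": {"event": "event1", "dataAttr": "data1", "value": 42, "prev": None},
}


def walk_back(nodes, start_key, max_depth):
    """Follow 'prev' links from the newest node, visiting at most max_depth nodes."""
    series = []
    key = start_key
    while key is not None and len(series) < max_depth:
        node = nodes[key]
        series.append((node["event"], node["dataAttr"], node["value"]))
        key = node["prev"]
    series.reverse()  # oldest first, ready for charting
    return series


series = walk_back(data_nodes, "e4/data1", max_depth=20)
```

Each step is a single edge hop, so the cost grows with the number of events requested (10-20 here) rather than with the size of the profile's subtree.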
Is this likely the best way to structure the data for a query like this? Performance will be critical, as I may be loading 20 charts on a page, each fed by this query.
Would this query be faster as a plain query on a document collection with indices rather than a graph traversal? The document collection could have a hash index on the dataAttr and a skiplist index on the event (events will be sequentially ordered under string sorting).
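As a rough model of that indexed alternative, the sketch below uses a Python dict in place of the hash index and a sorted list in place of the skiplist. It is only an analogy for the lookup pattern, not how ArangoDB implements its indices; the profile key "p1" and the add_point and last_n helpers are hypothetical.

```python
from bisect import insort
from collections import defaultdict

# (profile, dataAttr) -> list of (event_key, value), kept sorted by event_key.
# The dict stands in for the hash index, the sorted list for the skiplist.
index = defaultdict(list)


def add_point(profile, data_attr, event_key, value):
    """Insert a point while keeping each series ordered by event key."""
    insort(index[(profile, data_attr)], (event_key, value))


def last_n(profile, data_attr, n):
    """Return the n most recent points for one dataAttr, oldest first."""
    if n <= 0:
        return []
    return index.get((profile, data_attr), [])[-n:]


# Points arrive in arbitrary order; event keys sort correctly as strings.
add_point("p1", "data1", "event3", 7)
add_point("p1", "data1", "event1", 42)
add_point("p1", "data1", "event4", 343)
add_point("p1", "data1", "event2", 6)

recent = last_n("p1", "data1", 3)
```

The lookup is one keyed access followed by a slice of an already ordered range, which is the behaviour I would hope to get from combining the two indices.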
I'm assuming that traversing down to data1 of event1, back up to profile, back down to data1 of event2, and so on, would be very inefficient.
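For comparison, the down-and-up pattern could in principle be replaced by a single top-down pass from the profile that groups values by dataAttr as it goes. The Python sketch below models that on a nested dict standing in for the existing graph. The structure, the series_by_attr helper, and the data2 values are hypothetical placeholders; only the data1 values come from the table above.

```python
from collections import defaultdict

# Hypothetical nested stand-in for profile -> event -> report -> data.
# data2 values are placeholders, not real measurements.
profile_events = {
    "event3": {"reportA": {"data1": 7, "data2": 0}},
    "event1": {"reportA": {"data1": 42, "data2": 0}},
    "event4": {"reportA": {"data1": 343, "data2": 0}},
    "event2": {"reportA": {"data1": 6, "data2": 0}},
}


def series_by_attr(events, last_n_events):
    """Visit each recent event once and build one series per dataAttr."""
    keys = sorted(events)  # events are ordered by string sorting
    recent = keys[-last_n_events:] if last_n_events > 0 else []
    series = defaultdict(list)
    for event_key in recent:
        for report in events[event_key].values():
            for data_attr, value in report.items():
                series[data_attr].append((event_key, value))
    return dict(series)


all_series = series_by_attr(profile_events, 20)
```

This visits every data node under the selected events once and feeds all charts from one result, instead of repeating a traversal per dataAttr.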