1

I'm looking to do some analysis on a large set of customer transaction data. We have millions of transaction events come in with some quantity and timestamp value for various entities;

{
  "txId": "tx123"
  "item": "i87"
  "qty": 3
  "time": "2016-10-31 01:42:54"
}

I want to keep a series of aggregations rolled up at the day, week level and sliding windows, e.g. sum qty of item i87 transacted in the last 28 weeks.

There seem to be a number of options for this including a time series databases like InfluxDB or OpenTSDB. However most of these seem to be targeted towards metrics and monitoring which have short aggregation periods. How would such system cope with doing both small & large rolling windows?

tk421
  • 5,775
  • 6
  • 23
  • 34
NightWolf
  • 7,694
  • 9
  • 74
  • 121
  • Pre-computed aggregations, while useful as a caching strategy, have a drawback in that the rollup data becomes incorrect/dirty in case a detailed sample arrives with a time lag. Think of pre-computed aggregations as materialized views with all the intricacies involved. Why not run a query against the detailed data. e.g. SELECT SUM(qty) WHERE tags.item = '187'? – Sergei Rodionov Oct 31 '16 at 20:40

1 Answers1

0

You can't send JSON to InfluxDB. It uses the line protocol. Unless you somehow think of converting the JSON to this line protocol, it seems to me that you are more looking for an ELK stack of some sort. Elasticsearch is open source and free, but if you want authentication (like SHIELD) you have to pay. (Or you could use Nginx reverse proxy) You can send JSON to ELK and it can automatically parse the fields after which you can do analytics on them.

willemdh
  • 796
  • 2
  • 13
  • 34