I have the following MYSQL table:
measuredata:
- ID (bigint)
- timestamp
- entityid
- value (double)
The table contains >1 billion entries. I want to be able to visualize any time-window. The time window can be size of "one day" to "many years". There are measurement values round about every minute in DB.
So the number of entries for a time-window can be quite different. Say from few hundrets to several thousands or millions.
Those values are ment to be visualiuzed in a graphical chart-diagram on a webpage.
If the chart is - lets say - 800px wide, it does not make sense to get thousands of rows from database if time-window is quite big. I cannot show more than 800 values on this chart anyhow.
So, is there a way to reduce the resultset directly on DB-side? I know "average" and "sum" etc. as aggregate function. But how can I i.e. aggregate 100k rows from a big time-window to lets say 800 final rows?
Just getting those 100k rows and let the chart do the magic is not the preferred option. Transfer-size is one reason why this is not an option.
Isn't there something on DB side I can use? Something like avg() to shrink X rows to Y averaged rows? Or a simple magic to just skip every #th row to shrink X to Y?
update: Although I'm using MySQL right now, I'm not tied to this. If PostgreSQL f.i. provides a feature that could solve the issue, I'm willing to switch DB.
update2: I maybe found a possible solution: https://mike.depalatis.net/blog/postgres-time-series-database.html See section "Data aggregation".
The key is not to use a unixtimestamp but a date and "trunc" it, avergage the values and group by the trunc'ed date. Could work for me, but would require a rework of my table structure. Hmm... maybe there's more ... still researching ...
update3: Inspired by update 2, I came up with this query:
SELECT (`timestamp` - (`timestamp` % 86400)) as aggtimestamp, `entity`, `value` FROM `measuredata` WHERE `entity` = 38 AND timestamp > UNIX_TIMESTAMP('2019-01-25') group by aggtimestamp
Works, but my DB/index/structue seems not really optimized for this: Query for last year took ~75sec (slow test machine) but finally got only a one value per day. This can be combined with avg(value
), but this further increases query time... (~82sec). I will see if it's possible to further optimize this. But I now have an idea how "downsampling" data works, especially with aggregation in combination with "group by".