My application stores performance time series data in CrateDB, and in order to get the setup right, I have a few questions, since it's going to be around 72M records a day and it should scale properly :). My goal is to visualize the resulting data with Grafana and currently I have the following structure in mind:
CREATE TABLE metrics (
ts TIMESTAMP,
hostname STRING,
servicename STRING,
perfdata OBJECT(DYNAMIC)
)
// for example
{
"hostname": "localhost",
"servicename": "ping",
"timestamp": 1483699527,
"perfdata": {
"rta": {
"current": 0.5,
"unit": "ms",
"warn": 100,
"critical": 200
},
"pl": {
"current": 0,
"unit": "%",
"warn": 10,
"crit": 20
}
}
}
The important bits are the host-/servicename, the metric's name and values, and the timestamp. Which would also be the alternative schema:
CREATE TABLE metrics (
ts TIMESTAMP,
hostname STRING,
servicename STRING,
metric OBJECT(DYNAMIC) AS (
unit STRING,
name STRING,
value DOUBLE,
)
)
So which one would be the preferred way to store the data? Do I also need partitioning? My aggregations usually show the last 24h and rarely the last month...
Thanks!