I work for a company that receives data from smart meters. Readings on the live stream can be up to 2 days old, and data may be back-filled later when errors (gaps etc.) are corrected. We currently store this data for 5 years, typically. It is then pulled into an SSAS cube and aggregated at 1 minute, 5 minute, 30 minute, 1 hour, 1 day, 1 week and 1 month resolutions. For each of these aggregations the Min, Max and Avg are also stored. Building this cube is slow and does not currently scale, since it pulls all of its data from a single source.
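To make the aggregation concrete, here is a minimal Python sketch of the kind of roll-up I mean. It is not our actual SSAS logic; the function name `roll_up` and the sample readings are made up for illustration, and timestamps are assumed to be epoch seconds.

```python
from collections import defaultdict


def roll_up(readings, bucket_seconds):
    """Group (epoch_seconds, value) readings into fixed-width buckets.

    Returns {bucket_start: (min, max, avg)} for every bucket that has data.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: (min(values), max(values), sum(values) / len(values))
        for start, values in sorted(buckets.items())
    }


# Hypothetical sample readings: (epoch_seconds, meter value).
readings = [(0, 1.0), (20, 3.0), (70, 5.0), (290, 7.0), (310, 2.0)]

one_minute = roll_up(readings, 60)
five_minute = roll_up(readings, 300)
```

The same raw readings feed every resolution here, which is roughly what the cube does today, just from one central source.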
I think that an RRD-style database per data point, driven by the data push, would be a better fit. However, I have several questions about RRD:
- Can RRD retain the original data granularity whilst also rolling the data up over time?
- Can data be fed into RRD after the fact to fill gaps?
Examples would be welcome.
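To be clear about the behaviour I am hoping for, here is a rough Python sketch. It is plain Python, not RRD, and makes no claim about how RRD actually works; the class `MeterSeries` and its methods are hypothetical. The idea is that raw readings are kept, roll-ups are maintained alongside them, and a late reading updates the affected buckets.

```python
class MeterSeries:
    """Hypothetical per-data-point store: raw readings plus min/max/avg roll-ups."""

    def __init__(self, resolutions):
        self.resolutions = resolutions               # bucket widths in seconds
        self.raw = {}                                # epoch_seconds -> value
        self.rollups = {width: {} for width in resolutions}

    def add(self, ts, value):
        """Store a reading (possibly late) and recompute the buckets it falls in."""
        self.raw[ts] = value
        for width in self.resolutions:
            start = ts - ts % width
            values = [v for t, v in self.raw.items() if start <= t < start + width]
            self.rollups[width][start] = (
                min(values),
                max(values),
                sum(values) / len(values),
            )


series = MeterSeries([60, 300])
series.add(0, 1.0)
series.add(120, 5.0)
series.add(30, 3.0)  # late reading that fills a gap in the first minute
```

After the late reading arrives, the raw value is retained and both the 1 minute and 5 minute buckets covering it reflect the correction. That is the combination I would like to know whether RRD can provide.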