0

I am currently monitoring 5 different buildings, for each building there are around 300 rooms. Each room has 4 sensors, three which monitor temperature at different points of the room and one for the amount of power (killowatts) the room is consuming.

I am currently polling every sensor every 15 minutes which is producing 576,000 entries per day, the amount of buildings I am monitoring is soon going to increase.

I am currently storing all the information in MySQL, I have a MySQL table for each sensor type so the tables are named 'power', 'temp1', 'temp2', 'temp3'. The columns within these tables are 'id', 'building_id', 'epoch', 'value'.

I then use this data to produce graphs using the Chart.js library and statistics such as the amount of power used per building within a certain time period etc, I do all this using PHP.

I don't believe my MySQL database is going to be able to handle this without serious scale and clustering

I need to be able to view historic data for 5 years although some of the granularity can be lost after a certain period of time.

I have been informed that RRD might be able to solve my problem and have done some research on it but still have some questions.

Will it still allow me to create my own graphs using specifically the Chart.js library? If I can get time / value JSON data from the RRD this should be ok.

How many different RRD files will I need to create also? Would I need one per building? Per room? Per sensor? Is it still going to be easy to manage.

I have PHP scripts which run at 15 minute intervals that pull the data from the sensors using SNMP and then insert the data into MySQL if I can use the same scripts to also insert into the RRD that would also be great, from what I have seen you can use PHP to insert into RRD so that should be ok.

EDIT: I am currently reading http://michael.bouvy.net/blog/en/2013/04/28/graph-data-rrdtool-sensors-arduino/ which has started to answer some of my questions.

Jack Brown
  • 580
  • 3
  • 6
  • 20

2 Answers2

0

Whether you have one RRD file with 6000 metrics, or 5 files with 1200 metrics etc depends on how you are managing the data.

Firstly, you should not group together metrics for which the samples arrive at different points in time. So, if you sample one room at a time, you should probably have one RRD file per room (with 4 metrics in it). This will depend on what manages your sensors; if you have one device per room or per building. Retrieving the data and graphing it works whether you have one file or a thousand (though the 'thousand' scenario work much better in the latest version of RRDTool).

Secondly, are you likely to add new data points (IE, buildings or rooms)? You cannot (easily) add new metrics to an existing RRD file. So, if you expect to add a new building in the future, or to add or remove a room, then maybe one RRD per building or one per room would be better.

Without having any more information, I would guess that you'd be better off with one RRD per room (containing 4 metrics) and update them separately. Name the files according to the building and room IDs, and they can hold the power and 3 temperature values according to Epoch.

For graphing, RRDTool is of course capable of creating its own graphs by accessing the data directly. However, if you want to extract the data and put them into a graph yourself this is possible; the Xport function will allow you to pull out the necessary datapoints (possibly from multiple RRD files and with aggregation) which you can then pass to the graphing library of your choice. There is also the Fetch function if you want the raw data.

If your data samples are coming at 15min intervals, make sure you set your RRD Interval, Heartbeat and RRAs up correctly. In particular, the RRAs will specify what aggregation is performed and for how long data are kept at higher granularities. The RRAs should in general correspond to the resolutions you expect to graph the data at (which is why people generally use 5min/30min/2h/1d as these correspond nicely to daily, weekly, monthly and yearly graphs at a 400px width)

Steve Shipway
  • 3,754
  • 3
  • 22
  • 39
0

You might want to take a look at time-series databases and test a few systems that have built-in visualization, an API that allows you to perform aggregations and PHP wrappers. Time-series databases are optimized for efficient storage of timestamped data and have built-in functionality for time-series transformations.

https://en.wikipedia.org/wiki/Time_series_database

Sergei Rodionov
  • 4,079
  • 6
  • 27
  • 44