
I have a MySQL database set up with weather data from multiple weather stations. All stations send their data every 5 minutes. A station has multiple sensors, and their values are stored as separate rows, one per sensor.

For instance: when a station sends the temperature, humidity, rain and wind speed, 4 records are stored in the 'values' table in the database. And this every 5 minutes. Our database has grown to over 1 billion records. We don't have performance problems, because we have good indexes and optimized queries. However, sometimes we have to run manual queries to extract specific data with a WHERE clause on the datetime field, and such a query takes much longer. And a few months ago we had to restore a backup, which took half a day because of the big 'values' table. So I am thinking of a solution to get a smaller 'values' table.

In our applications we need only access to the data of the last 3 months. And users can make graphs or download data based on years (2018, 2019, 2020, 2021, 2022 etc.).

I am thinking of a redesign of the database. The first idea was to create an archive cron script, that automatically archives data in year archives. Let's say different tables like 'values2018', 'values2019' etc...

In that way I have full control in the cron job to archive data and keep only 3 months of data in the 'values' table. But I see a disadvantage in this approach: the archive cron will have to loop through the data to move (INSERT) it into another table and to remove it from the 'values' table. I think this could be very time consuming, and in the meantime the whole database must remain accessible for inserting new data from the stations.
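The archive cron I have in mind would move rows in small, bounded batches so the table stays available for incoming inserts. A minimal sketch, assuming a hypothetical `values`(sensor_id, datetime, value) schema and a pre-created `values2018` archive table with the same structure:

```sql
-- Move one bounded batch per run; repeat until no rows match.
-- ORDER BY makes the LIMIT deterministic, so the DELETE removes
-- exactly the rows that were just copied.
INSERT INTO values2018
SELECT sensor_id, `datetime`, `value`
FROM `values`
WHERE `datetime` >= '2018-01-01' AND `datetime` < '2019-01-01'
ORDER BY `datetime`, sensor_id
LIMIT 10000;

DELETE FROM `values`
WHERE `datetime` >= '2018-01-01' AND `datetime` < '2019-01-01'
ORDER BY `datetime`, sensor_id
LIMIT 10000;
```

Keeping each batch small limits lock time and replication lag, but the loop still has to touch every archived row once.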

The second idea is to use MySQL partitioning and create RANGE partitions based on year. But I have no experience with partitioning, so this raises some questions:
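The partitioning I have in mind would look something like this (a sketch, assuming the partition key `datetime` is part of the primary key, which MySQL requires for every unique key on a partitioned table):

```sql
-- Rebuilds the whole table into yearly RANGE partitions.
ALTER TABLE `values`
PARTITION BY RANGE (TO_DAYS(`datetime`)) (
  PARTITION p2018 VALUES LESS THAN (TO_DAYS('2019-01-01')),
  PARTITION p2019 VALUES LESS THAN (TO_DAYS('2020-01-01')),
  PARTITION p2020 VALUES LESS THAN (TO_DAYS('2021-01-01')),
  PARTITION p2021 VALUES LESS THAN (TO_DAYS('2022-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Old years could then be removed with `ALTER TABLE ... DROP PARTITION`, which is nearly instant compared to a row-by-row DELETE.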

How fast will an ALTER TABLE to create the partitions be with over 1 billion records? Does it take long, and does it cost performance while being executed?

Will MySQL automatically optimize my existing queries that SELECT data from the 'values' table to read only the relevant partitions? Or do I need to specify which partition to SELECT from? And do I need to modify the INSERT queries for new data?
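If I understand the documentation correctly, I could verify partition pruning with EXPLAIN, whose `partitions` column shows which partitions a query would scan. A sketch against the hypothetical yearly layout above:

```sql
-- If pruning works, the `partitions` column of the EXPLAIN output
-- should list only p2021 for this date range.
EXPLAIN
SELECT AVG(`value`)
FROM `values`
WHERE `datetime` >= '2021-01-01' AND `datetime` < '2021-02-01';
```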

Which of the above described solutions is preferred (manual archive or partitioning) and why?

Luc
  • Please provide `SHOW CREATE TABLE` and the slow queries. Do you expect to purge old data? Partitioning provides no performance improvement except in rare cases; I need more info to see if any case applies. – Rick James Jan 19 '22 at 19:37
    I've been working on a blog about sensor storage. Here's a pre-release of it: http://mysql.rjweb.org/doc.php/mysql_sensor It addresses most of your questions. – Rick James Jan 19 '22 at 20:40
  • Thank you for the pre-release document. Interesting to read. We want to keep the data at least 5 years and then we might purge this data. But we did not decide that. So setting up partitioning could make sense. I am planning a summarize table too. For instance to calculate a day average temperature. So I could end up with a mix of both solutions. – Luc Jan 20 '22 at 07:41
  • Good. If you pick 5 years, then I recommend 62 partitions. Be aware that switching to partitioning requires downtime proportional to the table size. And indexes need re-thinking. Probably this continues to hold: `PRIMARY KEY(sensor_id, datetime)`. `PARTITION BY RANGE(TO_DAYS(datetime))`. See [_Partition_](http://mysql.rjweb.org/doc.php/partitionmaint) – Rick James Jan 20 '22 at 17:30

0 Answers