I have a device whose only job is to read data from multiple sensors every second. Every hour (or whenever a pull is requested) it has to push the collected data to a database on a different machine over the network. Doing this for 30 days makes the data very large, so I would like to compress the data before sending it, since storage space and computation time on the database are precious resources.
The data will look close to this:
TimeStamp | Sensor1 | Sensor2 | Sensor3 | Sensor4 | ... | Sensor64
00:00:01  | 1       | 0       | 0       | 0       | ... | 3
00:00:02  | 1       | 8       | 0       | 0       | ... | 3
00:00:03  | 1       | 8       | 0       | 0       | ... | 3
00:00:04  | 1       | 2       | 0       | 0       | ... | 3
00:00:05  | 0       | 8       | 0       | 0       | ... | 3
00:00:06  | 0       | 8       | 0       | 0       | ... | 3
00:00:07  | 0       | 0       | 0       | 0       | ... | 3
00:00:08  | 0       | 0       | 0       | 0       | ... | 3
00:00:09  | 0       | 0       | 0       | 0       | ... | 3
00:00:10  | 1       | 2       | 0       | 0       | ... | 3
There will most definitely be times when the data gets repetitive (e.g. timestamps 7-9 and 2-3 above), and I would like a way to compress those portions before the database stores them. When a webpage/app pulls the data, it will then uncompress it so it can be graphed for the user.
The planned database is MongoDB (but I am open to using other databases).
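For context, this is roughly how I picture the hourly push with pymongo. It is only a sketch; the connection string, database, and collection names are made up, and I have not settled on the final document shape:

    from pymongo import MongoClient

    # Placeholder host/database/collection names -- adjust to the real setup.
    client = MongoClient("mongodb://db-machine:27017")
    collection = client["sensor_data"]["hourly_batches"]

    def push_hour(readings):
        """readings: list of (timestamp, [64 sensor values]) collected this hour.
        Timestamps should be datetime objects or strings so BSON can store them."""
        doc = {
            "start": readings[0][0],
            "end": readings[-1][0],
            "rows": [{"t": t, "v": list(values)} for t, values in readings],
        }
        collection.insert_one(doc)  # one document per hour instead of 3600 rows

Grouping the 3600 per-second rows into one document per hour already cuts per-row overhead, but the rows themselves are still uncompressed.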
What I thought up is to delete the rows that repeat; when the front end sees that there are missing timestamps, it is understood that the row before the missing timestamp has repeated.
TimeStamp | Sensor1 | Sensor2 | Sensor3 | Sensor4 | ... | Sensor64
00:00:01  | 1       | 0       | 0       | 0       | ... | 3
00:00:02  | 1       | 8       | 0       | 0       | ... | 3
00:00:04  | 1       | 2       | 0       | 0       | ... | 3
00:00:05  | 0       | 8       | 0       | 0       | ... | 3
00:00:07  | 0       | 0       | 0       | 0       | ... | 3
00:00:10  | 1       | 2       | 0       | 0       | ... | 3
But this solution is not reliable, and it is only effective when there are long runs of repeating data.
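To make the idea concrete, this is a rough sketch of the dedup step on the device side, assuming each reading is a (timestamp, values) pair:

    def drop_repeats(readings):
        """Keep only rows whose sensor values differ from the previous kept row.
        readings: list of (timestamp, [64 sensor values]) in time order."""
        kept = []
        last_values = None
        for timestamp, values in readings:
            if values != last_values:
                kept.append((timestamp, values))
                last_values = values
        return kept

    # Front-end rule: a missing timestamp means "same values as the previous row".

I realize this could be made less ambiguous by storing an explicit repeat count per kept row (plain run-length encoding) instead of relying on missing timestamps, but it still only helps when long runs actually occur.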
Is there a way to take the raw data and compress it at the bit level, similar to how zip/rar works, such that the frontend will also be able to uncompress it?
Just a raw calculation: each of those sensors spits out a 16-bit integer.
16 bits × 64 sensors × 2,628,000 seconds in a month = 2,691,072,000 bits ≈ 336 MB
~336 MB in a single pull is still very big.
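What I am imagining is something along these lines: pack each row's 64 readings as raw 16-bit integers and deflate the whole hour with zlib before sending it. This is just a sketch under my own assumptions (unsigned values, little-endian byte order, timestamps regenerated from the hour's start time rather than stored):

    import struct
    import zlib

    def pack_hour(readings):
        """Compress an hour of readings into one binary blob.
        readings: list of (timestamp, [64 sensor values]), one row per second,
        values assumed to be unsigned 16-bit integers."""
        raw = bytearray()
        for _, values in readings:
            raw += struct.pack("<64H", *values)  # 64 little-endian uint16 per row
        return zlib.compress(bytes(raw), 9)      # DEFLATE, same algorithm family as zip

    # Repetitive stretches (like T.S 7-9 above) compress extremely well,
    # because DEFLATE finds the repeated byte patterns on its own.

Since zlib produces a standard stream, I believe the browser side could inflate it with something like the pako library and slice the buffer back into 128-byte rows (64 sensors × 2 bytes) for graphing.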