I am trying to design a system that has time-series data. Here are the requirements of the system:
- Data comes into the system anytime of the day from multiple cities and departments.
- These individual records should be available for life-time. (optional)
- Most of the queries are aggregations. However, they happen on multiple columns. Example queries:
- Sum of all measurements on a particular day in a particular city
- Sum of all measurements in a (last) month in a department
- Sum of all measurements in a (last) week in a city for a department
- Sum of all measurements on a particular day in a state (State will also be part of the record)
- All queries should be computed with a very low latency (< 300-500 ms)
Here are the constraints I have:
- I went through multiple time-series databases. From my understanding all of them require expensive hardware. I would like to know if there is a way we can run it on commodity hardware.
- I am not sure how much data will be available at the beginning. Ideally it would be around 100 MB at peak(per day). So, I am not willing to spend hundreds of dollars at the beginning.
- Another idea I have been thinking about is to have an Amazon RDS instance and have multiple indexes on various columns and aggregate them as needed. I am not sure if this is a good idea though.