I just started learning about IoT and data streaming. Apologies if this question seems too obvious or generic.
I am working on a school project, which involves streaming data from hundreds (maybe thousands) of Iot sensors, storing said data on a database, then retrieving that data for display on a web-based UI.
Things to note are:
- fault-tolerance and the ability to accept incomplete data entries
- the database has to have the ability to load and query data by stream
I've looked around on Google for some ideas on how to build an architecture that can support these requirements. Here's what I have in mind:
- Sensor data is collected by FluentD and converted into a stream
- Apache Spark manages a cluster of MongoDB servers
a. the MongoDB servers are connected to the same storage
b. Spark will handle fault-tolerance and load balancing between MongoDB servers - BigQuery will be used for handling queries from UI/web application.
My current idea of a IoT streaming architecture :
The question now is whether this architecture is feasible, or whether it would work at all. I'm open to any ideas and suggestions.
Thanks in advance!