for research purpose I'm studying an architecture to do real-time (and also offline) data analytics and semantic annotation. I've attached a basic schema: I have some sensors linked to a raspberry pi 3. I suppose can handle this link with a mqqt broker like mosquitto. However, I want to collect data on raspberry, do something, and forward them to a cluster of commodity hardware to perform real time reasoning with Spark or Storm (any hint about which?). Then these data have to be stored in a NoSql db (Cassandra or HBase probably) accessible to an Hadoop cluster to execute batch reasoning, semantic data enrichment on them and re-store on same db. Therefore clients can query system to extract useful informations.
Which technology should I use in the red block? My idea is for MQQT but Kafka maybe could fit better my purposes?