I'm designing the next generation of an analysis system that needs to process many events from many sensors in near real time. To do that, I want to use one of the big-data analytics platforms, such as Hadoop, Spark Streaming, or Flink.
In order to analyze each event, I need to use some metadata from a database table, or at least load it into a cached map.
The problem is that each mapper is going to be parallelized across several nodes.
So I have two things to handle:
- First, how can I load/pass a HashMap to a mapper?
- Second, is there any way to keep the HashMap consistent across the mappers?
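To make the intent concrete, here is a minimal plain-Java sketch of the pattern I have in mind (the sensor IDs and metadata values are made up): load the metadata once, freeze it into a read-only map, and let every parallel mapper only read from it. This is, as far as I understand, the same guarantee Spark gives with `SparkContext.broadcast` and Flink with broadcast state, where each worker receives a read-only copy.

```java
import java.util.*;
import java.util.stream.*;

public class BroadcastSketch {
    // Simulates a mapper stage: every parallel task reads the same
    // immutable metadata map (the "broadcast" variable).
    static List<String> enrich(List<String> events, Map<String, String> broadcast) {
        return events.parallelStream()
            .map(e -> {
                String[] parts = e.split(":");
                // Read-only lookup; no task can mutate the shared map.
                return broadcast.getOrDefault(parts[0], "unknown") + "=" + parts[1];
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical metadata, normally loaded once from the DB table.
        Map<String, String> meta = new HashMap<>();
        meta.put("sensor-1", "temperature");
        meta.put("sensor-2", "humidity");
        // Freezing the map keeps it consistent across mappers: nobody
        // can write to it, so all nodes see the same values.
        Map<String, String> broadcast = Collections.unmodifiableMap(meta);

        List<String> out = enrich(List.of("sensor-1:20.5", "sensor-2:0.4"), broadcast);
        System.out.println(out); // [temperature=20.5, humidity=0.4]
    }
}
```

If the metadata can change while the job is running, freezing a snapshot like this obviously isn't enough, which is the part of the consistency question I'm unsure about.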