I have a distributed system running on AWS EC2 instances; the cluster has around 2,000 nodes. I want to introduce a stream processing layer that can process the metadata each node periodically publishes (CPU usage, memory usage, I/O, etc.). The system only cares about the latest data, and it is acceptable to miss a few data points while the processing layer is down. I therefore picked Hazelcast Jet, an in-memory stream processing engine with great performance. I have a couple of questions about it:
- What is the best way to deploy Hazelcast Jet across multiple EC2 instances? (I sketch my current idea after this list.)
- How do I ingest data from thousands of sources? The sources push data rather than being polled. (Ingestion sketch below.)
- How do I configure the client so that it knows where to submit jobs? (Client sketch below.)
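
For the deployment question, here is a minimal sketch of what I currently have in mind, assuming Hazelcast 5.x (where Jet ships inside the Hazelcast platform) and its built-in AWS discovery; the cluster name and EC2 tag key/value are placeholders I made up:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class JetMember {
    public static void main(String[] args) {
        Config config = new Config();
        config.setClusterName("metrics-cluster");   // placeholder name
        config.getJetConfig().setEnabled(true);     // turn on the Jet engine

        // Multicast doesn't work on EC2, so members find each other
        // through the AWS discovery mechanism, filtered by an EC2 tag.
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(false);
        join.getAwsConfig()
            .setEnabled(true)
            .setProperty("tag-key", "hz-cluster")      // placeholder tag
            .setProperty("tag-value", "metrics-jet");  // placeholder value

        // Starting the member joins it to the cluster and keeps it running.
        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}
```

From what I've read, each member would also need an IAM role (or access keys) permitting `ec2:DescribeInstances` so that discovery can list the cluster's instances.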
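
For ingestion, since only the latest value per node matters, I'm considering having each node push its sample into an IMap keyed by node ID and letting the pipeline stream changes off the map's event journal. A sketch under that assumption; the `node-metrics` map name and the logger sink are placeholders:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class MetricsPipeline {
    public static void main(String[] args) {
        Config config = new Config();
        config.getJetConfig().setEnabled(true);
        // The map journal source requires the event journal to be
        // enabled for the map on every member.
        config.getMapConfig("node-metrics")
              .getEventJournalConfig().setEnabled(true);
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Each node would put() its latest sample into "node-metrics",
        // keyed by node ID; the journal turns those puts into a stream.
        Pipeline p = Pipeline.create();
        p.readFrom(Sources.mapJournal("node-metrics",
                JournalInitialPosition.START_FROM_CURRENT))
         .withIngestionTimestamps()
         .writeTo(Sinks.logger()); // placeholder sink

        hz.getJet().newJob(p).join();
    }
}
```

`START_FROM_CURRENT` seems to match my loss tolerance: a restarted job simply resumes from the newest entries instead of replaying history.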
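
For the client question, my understanding is that a Hazelcast client only needs the cluster name and one or more seed member addresses (the rest are discovered), after which jobs can be submitted through it; the addresses here are placeholders:

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.JournalInitialPosition;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

public class SubmitJob {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.setClusterName("metrics-cluster"); // placeholder name
        // Seed addresses; the client discovers the remaining members.
        clientConfig.getNetworkConfig()
                    .addAddress("10.0.1.10:5701", "10.0.1.11:5701"); // placeholders

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // Same pipeline as in the ingestion sketch above.
        Pipeline p = Pipeline.create();
        p.readFrom(Sources.mapJournal("node-metrics",
                JournalInitialPosition.START_FROM_CURRENT))
         .withIngestionTimestamps()
         .writeTo(Sinks.logger());

        client.getJet().newJob(p);
        client.shutdown();
    }
}
```

I believe the client can also use the same AWS discovery mechanism instead of hard-coded addresses, which would suit an autoscaling cluster better, but I haven't verified that.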
It would be super helpful if there were a comprehensive example I could learn from.