As shown in the diagram, the pet project I am working on has the following two components.
a) The "RestAPI layer" (set of micro-services)
b) The "Scalable Parallelized Algorithm" component.
I am planning to run this on AWS. I realized that I can use Elastic Beanstalk to deploy my RestAPI module (a Spring Boot JAR with embedded Tomcat).
I am trying to work out how to architect the "Scalable Parallelized Algorithm" component. Here are some design details:
- It consists of a couple of Nodes which share the same data stored on S3.
- Each node performs the "algorithm" on a chunk of the S3 data. One node works as the master node and the rest of the nodes send their partial results to it (embarrassingly parallel, master-slave paradigm). The master node is invoked by the RestAPI layer.
- A "Node" is a Spring Boot application which communicates with other nodes through HTTP.
- The number of "Nodes" is dynamic, which means I should be able to manually add a new Node as the size of the data on S3 grows.
- There is a "Node Registry" on Redis which contains the IPs of all the nodes. Each node registers itself and uses the list of IPs in the registry to communicate with the other nodes.
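To make the flow above concrete, here is a minimal sketch of the scatter-gather step as I picture it. Everything in it is a stand-in: the class and method names are hypothetical, a local thread pool takes the place of the HTTP calls between Nodes, the S3 keys are made up, and partialResult is a placeholder for the real algorithm (it just sums key lengths).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ScatterGatherSketch {

    // Splits the keys into exactly nodeCount contiguous chunks of near-equal size.
    static List<List<String>> partition(List<String> keys, int nodeCount) {
        List<List<String>> chunks = new ArrayList<>();
        int base = keys.size() / nodeCount;
        int extra = keys.size() % nodeCount;
        int start = 0;
        for (int i = 0; i < nodeCount; i++) {
            // The first "extra" chunks take one additional key each.
            int size = base + (i < extra ? 1 : 0);
            chunks.add(new ArrayList<>(keys.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    // Placeholder for the real algorithm: here it only sums the key lengths.
    static long partialResult(List<String> chunk) {
        long sum = 0;
        for (String key : chunk) {
            sum += key.length();
        }
        return sum;
    }

    // What the master would do: hand out chunks, wait for partial results, merge them.
    // Each pool thread stands in for one worker Node reached over HTTP.
    static long runOnMaster(List<String> keys, int nodeCount) {
        ExecutorService pool = Executors.newFixedThreadPool(nodeCount);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<String> chunk : partition(keys, nodeCount)) {
                Callable<Long> task = () -> partialResult(chunk);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical S3 object keys, for illustration only.
        List<String> keys = Arrays.asList("part-0", "part-1", "part-2", "part-3", "part-4");
        System.out.println(runOnMaster(keys, 2));
    }
}
```

In the real system each chunk would be loaded once at Node start-up and kept in memory, so the master would only forward the query and merge the answers.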
My questions:
1) Should I use EC2 to deploy the "Nodes", or can I use Elastic Beanstalk to deploy them as well? I know that with EC2 I can manage the number of nodes depending on the size of the S3 data, but is this possible with Elastic Beanstalk?
2) Can I use
Inet4Address.getLocalHost().getHostAddress()
to get the IP of each Node? Do EC2 instances have more than one IP? This IP should allow the RestAPI layer to communicate with the "master" Node.
3) Which component should I use to expose my RestAPI layer to the external world? I don't want to expose my "Nodes".
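Regarding question 2, the alternative I am considering is to enumerate the network interfaces with java.net.NetworkInterface and register a non-loopback IPv4 address instead of relying on getLocalHost(). A minimal sketch follows; the class and method names are hypothetical, and I have not verified on EC2 which of the returned addresses the other Nodes can actually reach.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class LocalAddresses {

    // Returns every IPv4 address bound to an interface that is up and not loopback.
    // The list may hold zero, one, or several entries depending on the host.
    static List<String> nonLoopbackIpv4() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported on this host
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                    if (address instanceof Inet4Address && !address.isLoopbackAddress()) {
                        result.add(address.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the candidate addresses a Node could write to the registry.
        for (String ip : nonLoopbackIpv4()) {
            System.out.println(ip);
        }
    }
}
```

If this returns more than one address, I would still need a rule for picking the one to put in the Node Registry.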
Update: I can't use MapReduce since the nodes have state, i.e. during initialization each Node reads its chunk of data from S3 and creates the "vector space" in memory. This is a time-consuming process, which is why the result should be kept in memory. Also, this system needs near-real-time responses, so I cannot use a "batch" system like MR.