I am trying to use the container from https://github.com/cybermaggedon/accumulo-docker to create a 3-node deployment in Google Kubernetes Engine. My main problem is how to make the nodes aware of each other. For example, the accumulo/conf/slaves config file contains a list of all the nodes (either names or IPs, one per line) and needs to be replicated across all the nodes. Also, a single Accumulo node is designated as the master, and all slave nodes point to it by making it the only name/IP in the conf/masters file.
The documentation for the Accumulo docker container configures each container in this manner by providing environment variables, which are in turn used by the container startup script to rewrite the configuration files for that container, e.g.
docker run -d --ip=10.10.10.11 --net my_network \
-e ZOOKEEPERS=10.10.5.10,10.10.5.11,10.10.5.12 \
-e HDFS_VOLUMES=hdfs://hadoop01:9000/accumulo \
-e NAMENODE_URI=hdfs://hadoop01:9000/ \
-e MY_HOSTNAME=10.10.10.11 \
-e GC_HOSTS=10.10.10.10 \
-e MASTER_HOSTS=10.10.10.10 \
-e SLAVE_HOSTS=10.10.10.10,10.10.10.11,10.10.10.12 \
-e MONITOR_HOSTS=10.10.10.10 \
-e TRACER_HOSTS=10.10.10.10 \
--link hadoop01:hadoop01 \
--name acc02 cybermaggedon/accumulo:1.8.1h
This starts one of the slave nodes: it includes itself in SLAVE_HOSTS and points to the master in MASTER_HOSTS.
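To make the mechanism concrete, here is a minimal sketch of how a startup script could turn those comma-separated variables into one-host-per-line config files. This is not the container's actual startup script, which I have not reproduced here; the function name write_host_list, the variable ACCUMULO_CONF_DIR, and its default path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the accumulo-docker image):
# writes a comma-separated host list to a file, one host per line.
write_host_list() {
    # $1 = comma-separated hosts, $2 = destination file
    printf '%s\n' "$1" | tr ',' '\n' > "$2"
}

# ACCUMULO_CONF_DIR is an assumed variable; the real image may use another path.
ACCUMULO_CONF_DIR="${ACCUMULO_CONF_DIR:-/tmp/accumulo-conf}"
mkdir -p "$ACCUMULO_CONF_DIR"

# Fall back to localhost when the variables are not set.
write_host_list "${SLAVE_HOSTS:-localhost}" "$ACCUMULO_CONF_DIR/slaves"
write_host_list "${MASTER_HOSTS:-localhost}" "$ACCUMULO_CONF_DIR/masters"
```

With SLAVE_HOSTS=10.10.10.10,10.10.10.11,10.10.10.12 this would produce a slaves file with three lines, one IP per line.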
If I implement my scaling as a StatefulSet under Kubernetes, how can I achieve a similar result? I can modify the container as needed; I have no problem creating my own version.
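One direction that might work, sketched under assumptions: a StatefulSet paired with a headless Service gives each pod a stable DNS name of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local, so the host lists could be computed at container startup instead of being passed in as fixed IPs. The names accumulo, accumulo-hs, and default below, the STS_NAME/SVC_NAME/NAMESPACE/REPLICAS variables, the choice of ordinal 0 as master, and the cluster.local domain (the Kubernetes default) are all assumptions for illustration; I have not verified that Accumulo in this image accepts DNS names in place of IPs.

```shell
#!/bin/sh
# Hypothetical helper: print comma-separated stable pod DNS names
# for ordinals 0..replicas-1 of a StatefulSet behind a headless Service.
statefulset_hosts() {
    # $1 = StatefulSet name, $2 = headless Service name,
    # $3 = namespace, $4 = number of replicas
    i=0
    hosts=""
    while [ "$i" -lt "$4" ]; do
        # Append with a comma separator except for the first entry.
        hosts="${hosts:+$hosts,}$1-$i.$2.$3.svc.cluster.local"
        i=$((i + 1))
    done
    printf '%s\n' "$hosts"
}

# Assumed names; these would come from the pod spec as environment variables.
STS_NAME="${STS_NAME:-accumulo}"
SVC_NAME="${SVC_NAME:-accumulo-hs}"
NAMESPACE="${NAMESPACE:-default}"
REPLICAS="${REPLICAS:-3}"

# All pods are slaves; pod 0 is (arbitrarily) treated as the master.
SLAVE_HOSTS="$(statefulset_hosts "$STS_NAME" "$SVC_NAME" "$NAMESPACE" "$REPLICAS")"
MASTER_HOSTS="$(statefulset_hosts "$STS_NAME" "$SVC_NAME" "$NAMESPACE" 1)"
# Inside a StatefulSet pod the hostname is the pod name, e.g. accumulo-1.
MY_HOSTNAME="${HOSTNAME:-$(hostname)}.$SVC_NAME.$NAMESPACE.svc.cluster.local"
export SLAVE_HOSTS MASTER_HOSTS MY_HOSTNAME
```

The catch with this approach is that the replica count has to be known inside the container, so scaling the StatefulSet would still require the existing pods to refresh their slaves file.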