
I'm trying to run pip install on all the slave machines of a running EMR cluster. How can I do that?

I can't do it with a bootstrap action because it is a long-running EMR cluster and I can't take it down.

The cluster is running Spark and YARN, so I would normally use Spark's slaves.sh, but I can't find that script on the master node. Is it installed somewhere I haven't looked, or is there some way to install it?

I've seen other questions that suggest using YARN's distributed-shell, but I can't find a working example of how to do that.

BTW, the cluster is running EMR 4.8.0 with Spark 1.6.1, I believe.

  • Try tools like Ansible or SaltStack to achieve your goals, or try this Linux script: https://hvivani.com.ar/2015/06/19/yarn-execute-a-script-on-all-the-nodes-of-the-cluster/ – annunarcist Dec 19 '16 at 08:40
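Following up on the Ansible suggestion above: Ansible ships a pip module that can be driven ad hoc against a host list, so one option is to build an inventory from yarn node -list and let Ansible handle the SSH fan-out. A minimal sketch, assuming the hadoop user, the same ssh_key.pem used in the answer below, and a placeholder package name:

# Build a one-hostname-per-line inventory file from the YARN node list
yarn node -list | sed -n "s/^\(ip[^:]*\):.*/\1/p" > /tmp/emr_nodes

# Ad-hoc run of Ansible's pip module on every node in that inventory;
# host key checking is disabled to match the ssh flags in the answer below
ANSIBLE_HOST_KEY_CHECKING=False ansible all -i /tmp/emr_nodes \
    -u hadoop --private-key ~/ssh_key.pem -m pip -a "name=package"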

1 Answer


You can run the yarn command on the master node to get the list of all nodes, and then use SSH to run commands on each of them. As in the article mentioned in the comments, you can run something like:

# Copy the cluster's SSH key (e.g. ssh_key.pem) to the master node
aws s3 cp s3://bucket/ssh_key.pem ~/

# Restrict permissions so ssh accepts the key
chmod 400 ~/ssh_key.pem

# Extract the node hostnames from the YARN node list and run the pip
# command on each of them over SSH, up to 10 at a time
yarn node -list | sed -n "s/^\(ip[^:]*\):.*/\1/p" | xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no -i ~/ssh_key.pem hadoop@{} "pip install package"
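
As for the yarn distributed-shell route mentioned in the question: Hadoop ships a DistributedShell example application that can run a shell command inside YARN containers. A minimal sketch follows, with the caveats that the jar path below is a guess for EMR installs and that YARN decides container placement (some nodes may be hit twice or not at all), which is why the SSH loop above tends to be more dependable:

# Locate the DistributedShell example jar (this path is a guess; it
# varies by Hadoop distribution and version)
DSHELL_JAR=$(ls /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell*.jar | head -1)

# Ask YARN to run the pip command in 10 containers; the scheduler, not
# you, decides which nodes the containers land on
hadoop jar "$DSHELL_JAR" org.apache.hadoop.yarn.applications.distributedshell.Client \
    -jar "$DSHELL_JAR" \
    -shell_command "pip install package" \
    -num_containers 10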
– jc mannem