I have an mrjob configuration that includes loading a large file from S3 into HDFS. I would like to include these commands in the configuration file, but it seems that all bootstrap commands execute on every node in the cluster. This is overkill and might also create synchronization problems.
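One workaround I have been considering (an untested sketch, not something mrjob provides) is a small guard script that exits with status 0 only on the master node, so that a bootstrap line can run the copy conditionally. It assumes EMR writes per-node metadata to `/mnt/var/lib/info/instance.json` with an `isMaster` field; that path and field name are assumptions that may differ between AMI versions, and the script name `is_master.py` is hypothetical.

```python
import json
import sys

# Assumption: EMR writes per-node metadata here; the path may differ by release.
INSTANCE_INFO = "/mnt/var/lib/info/instance.json"


def is_master(path=INSTANCE_INFO):
    """Return True if the instance metadata file marks this node as master.

    A missing or unparseable file is treated as "not master", so the guarded
    command is skipped rather than run on every node.
    """
    try:
        with open(path) as f:
            info = json.load(f)
    except (IOError, OSError, ValueError):
        return False
    return isinstance(info, dict) and info.get("isMaster") is True


if __name__ == "__main__":
    # Exit status 0 on the master, 1 elsewhere, so it can be chained with &&.
    path = sys.argv[1] if len(sys.argv) > 1 else INSTANCE_INFO
    sys.exit(0 if is_master(path) else 1)
```

A bootstrap line would then look something like `python is_master.py && <copy command>`. I am not sure this is sufficient, though: HDFS may not be running yet when bootstrap actions execute, so the copy itself might still fail even on the master.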
Is there some way to specify startup commands for the master node only in the mrjob configuration, or is the only solution to SSH into the master (head) node after the cluster is up and perform these operations there?
Yoav