I need to set up a Hadoop cluster on Google Compute Engine. While it seems straightforward using either the web console's Click & Deploy or the command-line tool bdutil, my concern is that my jobs require additional dependencies to be present on the machines, for instance Xvfb, Firefox, and others -- though all are installable via apt-get.
It's not clear to me which is the best way to go. The options that come to mind are:
1) Create a custom image with the additional packages installed, and use it to deploy the Hadoop cluster, either via bdutil or Click & Deploy. Would that work?
2) Use a standard image with bdutil and a custom configuration file (editing an existing one) that performs all the sudo apt-get install xxx commands. Is that a viable option?
Option 1) is basically what I had to do in the past to run Hadoop on AWS, and honestly it's a pain to maintain. I'd be more than happy with Option 2), but I'm not sure bdutil allows that.
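For what it's worth, here is a rough sketch of what I imagine an Option 2) setup could look like, based on how bdutil's bundled extensions seem to be structured (the file names, group name, and package list below are my own guesses, not taken from bdutil's documentation):

```shell
# my_deps_env.sh -- hypothetical extension file, passed as: ./bdutil -e my_deps_env.sh deploy

# Register a command group pointing at a per-node install script
# (the script path is an assumption for illustration).
COMMAND_GROUPS+=(
  "install-extra-deps:
     extensions/custom/install_extra_deps.sh"
)

# Run the group on both the master and the workers.
COMMAND_STEPS+=(
  'install-extra-deps,install-extra-deps'
)

# --- extensions/custom/install_extra_deps.sh would then contain something like: ---
#   #!/usr/bin/env bash
#   # Runs on each VM during deployment; installs the extra job dependencies.
#   sudo apt-get update
#   sudo apt-get install -y xvfb firefox
```

If bdutil supports something along these lines, it would avoid maintaining a custom image entirely.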
Do you see any other way to set up the Hadoop cluster? Any help is appreciated!