
Does Amazon EMR allow passing a system property to a custom jar, e.g. `hadoop jar -Dkey=value myjob.jar`? (The key and value above are used during initialization of the application itself, rather than belonging to Hadoop's Configuration object.)

The related thread How to specify mapred configurations & java options with custom jar in CLI using Amazon's EMR? discusses ways to pass system properties to the Hadoop daemons only, via Bootstrap Actions, which apparently won't do the same for the Java entry-point class.
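For clarity, here is a minimal sketch of what "used during initialization of the application itself" means; MyJob and the key name are just the placeholders from the example above. The property would be read with System.getProperty in the client JVM that runs main, which is separate from values stored in Hadoop's Configuration:

public class MyJob {
    public static void main(String[] args) {
        // Reads the JVM system property (e.g. set with -Dkey=value);
        // this is distinct from Hadoop's Configuration (conf.get("key"))
        // and is not automatically propagated to mapper/reducer JVMs.
        String value = System.getProperty("key");
        System.out.println("key = " + value);
    }
}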

Ivan Balashov

1 Answer


If you don't need to pass the property to the mappers or reducers, you can do the following in the terminal or from a script:

export HADOOP_OPTS="-Dkey=value"
hadoop jar ...

You can also put those lines in $HADOOP_HOME/conf/hadoop-env.sh if you want them applied to every job, without defining them explicitly each time you run one.
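For example, something like this appended to hadoop-env.sh (appending to any existing $HADOOP_OPTS rather than overwriting it is an assumption about your setup):

# in $HADOOP_HOME/conf/hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -Dkey=value"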

Hope this makes sense.

SSaikia_JtheRocker
  • I'm rather new to EMR. When a job flow is created in EMR, one must provide a custom jar location plus its arguments. At what point do we have terminal access to the cluster? And should this be done before the job is started? Thanks, – Ivan Balashov Sep 04 '13 at 09:07
  • Can you check the Bootstrap actions and see if you can pass environment variables? I think `export HADOOP_OPTS="-Dkey=value"` sets an environment variable (a bootstrap-action sketch follows these comments). If you want direct terminal access using ssh, [you can follow this guide](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-ssh.html). Thanks – SSaikia_JtheRocker Sep 04 '13 at 10:14
  • You might also [look at this](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html) – SSaikia_JtheRocker Sep 04 '13 at 10:17
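Tying the comments together: a bootstrap action is just a script EMR runs on every node before Hadoop starts, so one way to set the property cluster-wide is a script that appends it to hadoop-env.sh. A rough sketch, assuming the AMI keeps Hadoop's config at /home/hadoop/conf/hadoop-env.sh (verify the path on your AMI; key=value is the placeholder from the question):

#!/bin/bash
# Hypothetical bootstrap action: append the system property to HADOOP_OPTS
# on each node, so JVMs started via `hadoop jar` pick it up.
# Assumption: the config lives at /home/hadoop/conf/hadoop-env.sh on this AMI.
echo 'export HADOOP_OPTS="$HADOOP_OPTS -Dkey=value"' >> /home/hadoop/conf/hadoop-env.sh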