
I'm programmatically provisioning an EMR cluster using the Java SDK, and am trying to pass arguments to the setup-impala script. The code I have looks like this:

...
    List<BootstrapActionConfig> bootstrapActions = new ArrayList<BootstrapActionConfig>();

    // --base-path, s3://elasticmapreduce, --impala-version, 1.2.1
    BootstrapActionConfig bsInstallImpala = new BootstrapActionConfig();
    bsInstallImpala.setName( "Install Impala" );
    ScriptBootstrapActionConfig scriptActionInstallImpala = new ScriptBootstrapActionConfig();
    scriptActionInstallImpala.setPath("s3://elasticmapreduce/libs/impala/setup-impala");

    List<String> impalaArgs = new ArrayList<String>();
    impalaArgs.add( "--base-path, s3://elasticmapreduce" );
    impalaArgs.add( "--impala-version, 1.2.1" );
    scriptActionInstallImpala.setArgs(impalaArgs);
    bsInstallImpala.setScriptBootstrapAction(scriptActionInstallImpala);
    bootstrapActions.add( bsInstallImpala );

... 
    RunJobFlowRequest request = new RunJobFlowRequest()
       .withName("OneButton Test")
       .withSteps(enabledebugging, installHive, installPig)
       .withLogUri("s3://somelogs/")
       .withAmiVersion("3.0.4")
       .withBootstrapActions(bootstrapActions)
       .withInstances(new JobFlowInstancesConfig()
           .withInstanceGroups(instanceGroups)
           .withEc2KeyName("redacted")
           .withHadoopVersion("2.2.0")
           .withKeepJobFlowAliveWhenNoSteps(true)
           .withTerminationProtected(true) );

But when I send this request, the setup-impala script errors out as follows:

/usr/lib/ruby/1.8/optparse.rb:1450:in `complete': invalid option: --base-path, s3://elasticmapreduce (OptionParser::InvalidOption)
from /usr/lib/ruby/1.8/optparse.rb:1448:in `catch'
from /usr/lib/ruby/1.8/optparse.rb:1448:in `complete'
from /usr/lib/ruby/1.8/optparse.rb:1261:in `parse_in_order'
from /usr/lib/ruby/1.8/optparse.rb:1254:in `catch'
from /usr/lib/ruby/1.8/optparse.rb:1254:in `parse_in_order'
from /usr/lib/ruby/1.8/optparse.rb:1248:in `order!'
from /usr/lib/ruby/1.8/optparse.rb:1339:in `permute!'
from /usr/lib/ruby/1.8/optparse.rb:1360:in `parse!'
from /mnt/var/lib/bootstrap-actions/2/setup-impala:576:in `parse_arguments'
from /mnt/var/lib/bootstrap-actions/2/setup-impala:592:in `initialize'
from /mnt/var/lib/bootstrap-actions/2/setup-impala:902:in `new'
from /mnt/var/lib/bootstrap-actions/2/setup-impala:902

It looks like a problem with the syntax of the arguments for the bootstrap action, but I've tried every permutation that seems reasonable, and I always get this error (or a close approximation). Yet with the configuration listed above, the arguments shown in the cluster details in the web console look identical to those of a cluster that I provisioned through the web console.

Any thoughts on what is going wrong here?

mindcrime
  • 657
  • 8
  • 23

1 Answer


It turns out that the correct format is to join each option and its value with an = sign inside a single argument string, like this:

impalaArgs.add( "--base-path=s3://elasticmapreduce" );
impalaArgs.add( "--impala-version=1.2.1" );
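When a script takes several options, a small helper can keep the = form consistent. This is a minimal sketch: toArgs is a hypothetical name, not part of the AWS SDK, and its result is just the List<String> you would pass to ScriptBootstrapActionConfig.setArgs in place of the hand-built impalaArgs list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ImpalaArgs {
    // Hypothetical helper (not an SDK method): turns option/value pairs into
    // "--name=value" strings, one list element per option, in insertion order.
    static List<String> toArgs(Map<String, String> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            args.add("--" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] argv) {
        // LinkedHashMap preserves the order the options are added in.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("base-path", "s3://elasticmapreduce");
        options.put("impala-version", "1.2.1");
        // Prints [--base-path=s3://elasticmapreduce, --impala-version=1.2.1]
        System.out.println(toArgs(options));
    }
}
```

The map-based design is only a convenience; writing the two strings by hand, as above, is equivalent.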

The = form is fairly counter-intuitive, since the argument string shown in the web console now also looks different from the one for a console-created cluster, whereas you would expect there to be one (and probably only one) "right way to do it". Apparently there is some subtle difference between creating a cluster through the web console and through the API.
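One possible explanation for that difference, offered as an assumption rather than something confirmed from the EMR documentation: each element of the list passed to setArgs reaches the script as one separate command-line argument, while the console displays the whole list joined with ", ". If so, the question's two-element list and a four-element list (each option and value in its own element) render identically, even though setup-impala would receive "--base-path, s3://elasticmapreduce" as a single token and reject it as an unknown option. The render method below is hypothetical and only models that assumed display.

```java
import java.util.Arrays;
import java.util.List;

final class ConsoleView {
    // Hypothetical model of the console display: ASSUMES arguments are shown
    // joined with ", ". This is an illustration, not the console's actual code.
    static String render(List<String> args) {
        return String.join(", ", args);
    }

    public static void main(String[] argv) {
        // What the question's code sends: two elements, each containing a comma.
        List<String> combined = Arrays.asList(
                "--base-path, s3://elasticmapreduce",
                "--impala-version, 1.2.1");
        // One element per option and per value.
        List<String> separate = Arrays.asList(
                "--base-path", "s3://elasticmapreduce",
                "--impala-version", "1.2.1");

        // Both lines print: --base-path, s3://elasticmapreduce, --impala-version, 1.2.1
        System.out.println(render(combined));
        System.out.println(render(separate));
        // Prints: 2 vs 4 arguments
        System.out.println(combined.size() + " vs " + separate.size() + " arguments");
    }
}
```

Under the same assumption, passing each option and its value as separate list elements might also avoid the error, but that variant is untested here; the = form is the one confirmed to work above.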
