I'm trying to use s3distcp for an EMR job and got this exception:

Exception in thread "main" java.lang.RuntimeException: Argument --arg doesn't match.
        at emr.hbase.options.Options.parseArguments(Options.java:75)
        at emr.hbase.options.Options.parseArguments(Options.java:57)
        at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:151)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)

The controller log shows that it was running this:

2013-11-03T00:54:52.277Z INFO Executing /usr/java/latest/bin/java -cp /home/hadoop/conf:/usr/java/latest/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-tools-1.0.3.jar:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/1 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/1/tmp -Djava.library.path=/home/hadoop/native/Linux-amd64-64 org.apache.hadoop.util.RunJar /mnt/var/lib/hadoop/steps/1/s3distcp.jar --arg --src --arg 's3://s3bucket/s3/' --arg --dest --arg hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/ --arg --groupBy --arg 'd-0-([0-9]+-[0-9]+).log.gz'

which looks totally fine to me. Does anybody have any idea why it couldn't match --arg?
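The error is raised by S3DistCp's own option parser (Options.parseArguments in the trace), so it helps to look at what that parser received. Here is a minimal Python sketch. It assumes the logged command follows POSIX shell quoting rules and splits everything after s3distcp.jar (roughly what RunJar passes on to the jar's main class) the way a shell would.

```python
import shlex

# Tail of the logged command: everything after the path to s3distcp.jar.
logged_tail = (
    "--arg --src --arg 's3://s3bucket/s3/' "
    "--arg --dest --arg hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/ "
    "--arg --groupBy --arg 'd-0-([0-9]+-[0-9]+).log.gz'"
)

# Split like a POSIX shell: whitespace separates tokens, quotes are removed.
jar_args = shlex.split(logged_tail)

print(jar_args[0])   # --arg
print(jar_args[:4])  # ['--arg', '--src', '--arg', 's3://s3bucket/s3/']
```

Under that assumption, the first argument S3DistCp sees is the literal string --arg. That is consistent with the "Argument --arg doesn't match." message.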

Thanks!

Thi Duong Nguyen

2 Answers

I think hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/ likely needs to be enclosed within single quotes.

I've seen the syntax written something like this:

--arg S3DistCp-OptionName1 --arg 'S3DistCp-OptionValue1'
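In that syntax, --arg is a flag for whatever tool launches the step, and each --arg carries one token that should be forwarded to the jar. The Python sketch below shows that mapping. strip_arg_flags is a hypothetical helper (not part of any EMR tool), and the sketch assumes every --arg is followed by exactly one value.

```python
from typing import List


def strip_arg_flags(tokens: List[str]) -> List[str]:
    """Hypothetical helper: turn ['--arg', X, '--arg', Y, ...] into [X, Y, ...].

    Assumes each '--arg' is consumed by the launching tool and the token
    right after it is forwarded to the jar unchanged.
    """
    forwarded = []
    it = iter(tokens)
    for tok in it:
        if tok != "--arg":
            raise ValueError(f"expected '--arg', got {tok!r}")
        try:
            forwarded.append(next(it))  # the value that belongs to this --arg
        except StopIteration:
            raise ValueError("'--arg' at end of input has no value") from None
    return forwarded


cli_tokens = ["--arg", "--src", "--arg", "s3://s3bucket/s3/",
              "--arg", "--dest", "--arg", "hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/"]
print(strip_arg_flags(cli_tokens))
# ['--src', 's3://s3bucket/s3/', '--dest', 'hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/']
```

If the launching tool does not consume --arg this way, the flags reach S3DistCp unchanged. That matches what the stack trace in the question shows.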
SSaikia_JtheRocker
You want to enclose the EMR Step arguments in quotes as well:

 --arg "--src" --arg 's3://s3bucket/s3/' --arg "--dest" ...

I think either single or double quotes should work.
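The shell interprets quotes and removes them before the program starts. That is why single and double quotes around a plain word such as --src produce the same argument. Quotes only change the result for values that contain shell metacharacters, such as the parentheses and brackets in the --groupBy regex. The illustration below uses Python 3.8+ and the standard shlex module, which follows POSIX shell rules.

```python
import shlex

# The S3DistCp options from the question, one list element per argument.
args = [
    "--src", "s3://s3bucket/s3/",
    "--dest", "hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/",
    "--groupBy", "d-0-([0-9]+-[0-9]+).log.gz",
]

# shlex.join quotes only the elements that need it under POSIX shell rules.
line = shlex.join(args)
print(line)
# --src s3://s3bucket/s3/ --dest hdfs:///tmp/mrjob/mrjob-jobid/step-output/1/ --groupBy 'd-0-([0-9]+-[0-9]+).log.gz'

# Single quotes, double quotes, or none: a plain word yields the same argument.
assert shlex.split("'--src'") == shlex.split('"--src"') == shlex.split("--src")

# Splitting the joined line recovers the original list exactly.
assert shlex.split(line) == args
```

shlex.join leaves --src and both paths unquoted and wraps only the regex in single quotes.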

Dan Osipov