I'd like to copy some files from HDFS on EMR to an S3 bucket using s3-dist-cp. I tried this command from the EMR master node:

s3-dist-cp -Dmapred.job.name=my_copy_job --src hdfs:///user/hadoop/abc --dest s3://my_bucket/my_key/

The command runs fine, but when I check the job name in the YARN ResourceManager UI, it is displayed as: S3DistCp hdfs:///user/hadoop/abc -> s3://my_bucket/my_key/

whereas the expected job name is my_copy_job.

I'd appreciate any help!

Note: when I run hadoop distcp with the option -Dmapred.job.name=my_copy_job, the job name is displayed correctly in the YARN ResourceManager UI, but the job eventually fails.

franklinsijo
TheCodeCache

1 Answer

s3-dist-cp does not support -D style properties set at runtime the way hadoop distcp does. S3DistCp accepts only a finite set of options, as listed in its documentation. In addition to the options S3DistCp defines, it accepts the generic options of the Tool interface.

But the job name is not one of them. It is hardcoded in the S3DistCp code and cannot be overridden.
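
If a custom job name is a hard requirement, the question notes that hadoop distcp does honor the job-name property. Below is a minimal sketch of that fallback, assuming the job name and paths from the question. The helper build_distcp_cmd is hypothetical and only prints the command for review; mapreduce.job.name is the current name of the deprecated mapred.job.name key.

```shell
#!/usr/bin/env bash
# Sketch only: print a `hadoop distcp` command line that sets a custom job name.
# JOB_NAME, SRC and DEST are taken from the question; adjust them as needed.
JOB_NAME="my_copy_job"
SRC="hdfs:///user/hadoop/abc"
DEST="s3://my_bucket/my_key/"

# Hypothetical helper: formats the command, it does not run anything.
build_distcp_cmd() {
  # mapreduce.job.name replaces the deprecated mapred.job.name key.
  printf 'hadoop distcp -Dmapreduce.job.name=%s %s %s\n' "$1" "$2" "$3"
}

# Print the command so it can be checked, then run it on the master node.
build_distcp_cmd "$JOB_NAME" "$SRC" "$DEST"
```

This trades away the optimizations that s3-dist-cp offers over hadoop distcp, so it only makes sense when the job name shown in the YARN UI matters more than copy performance.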

franklinsijo
  • Thanks! So is this a limitation of s3-dist-cp, that it does not allow customizing the job name? Our requirement is to show an appropriate job name in the YARN Resource Manager UI. – TheCodeCache Apr 11 '20 at 16:18
  • Yes, the job name cannot be modified. See the updated answer. – franklinsijo Apr 11 '20 at 16:40
  • Thank you. For now I've done this with "hadoop distcp" just to fulfill the client's requirement, though this way I lose the optimizations that s3-dist-cp offers over "hadoop distcp". – TheCodeCache Apr 11 '20 at 16:46
  • How can the Tool interface be used to change some settings (like s3DistCp.copyfiles.reducer.tempDir)? By configuring some file? Using some magic parameter? (I tested with -Ds3DistCp.copyfiles.reducer.tempDir=xxx, but it complains about unrecognized options.) – okelet Feb 23 '23 at 10:20
  • Could you please provide the complete command that you tried? – franklinsijo Feb 23 '23 at 17:49