
While running s3distcp from S3 to HDFS:

 sudo -u hdfs hadoop jar /usr/lib/hadoop/lib/s3distcp.jar --src s3n://workAAAA-KKKK-logs/production-logs/Log-XXXX-click/Log-XXXXX-click-2013-03-27_06-21-19_i-7XXb2x39_00037.gz  --dest hdfs:///test/

I get the following exception.

Is there something wrong with my path syntax (s3n://, hdfs:///)? Has anyone encountered this issue before?

13/04/04 12:10:52 INFO s3distcp.S3DistCp: Using output path 'hdfs:/tmp/96a8e57b-4c68-406c-b4ca-bf212de12d93/output'
13/04/04 12:10:53 INFO s3distcp.FileInfoListing: Opening new file: hdfs:/tmp/96a8e57b-4c68-406c-b4ca-bf212de12d93/files/1
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:91)
        at org.apache.hadoop.fs.Path.<init>(Path.java:99)
        at org.apache.hadoop.fs.Path.<init>(Path.java:58)
        at com.amazon.external.elasticmapreduce.s3distcp.FileInfoListing.getOutputFilePath(FileInfoListing.java:155)
        at com.amazon.external.elasticmapreduce.s3distcp.FileInfoListing.add(FileInfoListing.java:111)
        at com.amazon.external.elasticmapreduce.s3distcp.FileInfoListing.add(FileInfoListing.java:78)
        at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.createInputFileListS3(S3DistCp.java:122)
        at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.createInputFileList(S3DistCp.java:60)
        at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:529)
        at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
bocse
  • This might not be the issue here but the source path should be a directory path and not a file path. – Amar Apr 04 '13 at 12:31
  • Are you sure `hdfs:///` shouldn't be `hdfs://`? – Quetzalcoatl Apr 04 '13 at 12:54
  • @Amar You are completely right, now it works! – bocse Apr 04 '13 at 13:06
  • @Quetzalcoatl hdfs:/// is correct (three slashes). – bocse Apr 04 '13 at 13:07
  • @Amar is correct, I guess we can only use S3DistCp to copy all files in a S3 "directory" to a directory in HDFS. Am I correct? – Simon Guo May 16 '13 at 16:13
  • Ugh. So much time wasted from s***** documentation. Amazon's FAQ's show hdfs:/// http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html – Erik Kerber May 19 '13 at 15:13
  • Hi @Amar is there a possibility to use srcPattern for specifying filenames? see http://stackoverflow.com/questions/26273181/multiple-source-files-for-s3distcp – its me Oct 09 '14 at 08:04
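Following Amar's fix, the working invocation points --src at the containing directory (the S3 prefix) rather than at a single object. A sketch, reusing the anonymized bucket and prefix from the question:

    sudo -u hdfs hadoop jar /usr/lib/hadoop/lib/s3distcp.jar \
      --src s3n://workAAAA-KKKK-logs/production-logs/Log-XXXX-click/ \
      --dest hdfs:///test/

If only a subset of the files under that prefix is wanted, S3DistCp's --srcPattern option takes a regular expression to filter them; copying an explicit list of files is covered by the --copyFromManifest answer below.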

2 Answers


There is a way to request specific files if that is what you need. You can use the --copyFromManifest option, which allows you to supply s3distcp with a manifest file that holds all the file paths (even across different folders).
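A rough sketch of what that might look like on the command line. The bucket and manifest location below are made-up placeholders, and this assumes the pairing documented for S3DistCp, where --previousManifest points at the manifest file and --copyFromManifest tells the tool to copy exactly the files it lists:

    sudo -u hdfs hadoop jar /usr/lib/hadoop/lib/s3distcp.jar \
      --copyFromManifest \
      --previousManifest=s3n://my-bucket/manifests/files-to-copy.gz \
      --dest hdfs:///test/

Each manifest entry names one source file, so files scattered across different folders can be pulled in a single run.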

Eitan Illuz

This problem also occurs when you are trying to write to a path that exists but for which you do not have access privileges.

It also happens when you try to write to a Redshift schema that does not exist.
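If a permissions problem on the HDFS side is suspected, something along these lines narrows it down quickly. The /test path is just the destination from the question, and the hdfs:hadoop owner and 775 mode are placeholder values to adjust for your cluster:

    # inspect ownership and permissions of the destination directory
    hadoop fs -ls /
    hadoop fs -ls /test

    # if needed, have the HDFS superuser open it up (owner and mode are assumptions)
    sudo -u hdfs hadoop fs -chown hdfs:hadoop /test
    sudo -u hdfs hadoop fs -chmod 775 /test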

Anxo P