SeqFilesFromDirectory() error on Amazon EMR

Question

I am trying to run a simple program on Amazon EMR that converts the text files in a directory into sequence files. The program runs fine on my local machine but gives me the following error on Amazon EMR. Could someone please tell me how to get rid of this error?

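    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.text.SequenceFilesFromDirectory;

    // Inside main(String[] args) of com.gifts.text.SeqFileDirectory
    Configuration conf = new Configuration();
    System.out.println("fs.default.name : - " + conf.get("fs.default.name"));

    // args[0] = input directory, args[1] = output directory
    Path input = new Path(URI.create(args[0]));
    Path output = new Path(URI.create(args[1]));

    ToolRunner.run(new SequenceFilesFromDirectory(), new String[] {
        "--input", input.toString(),
        "--output", output.toString(),
        "--overwrite",
        "--method", "mapreduce"});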

Thank you.

    Exception in thread "main" java.lang.IllegalArgumentException: This file system object (hdfs://172.31.4.175:9000) does not support access to the request path .. You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:384)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:129)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:513)
        at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at com.gifts.text.SeqFileDirectory.main(SeqFileDirectory.java:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
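The exception message says the default file system (hdfs://172.31.4.175:9000) was asked about a path it does not own. The sketch below is a simplified imitation of that kind of scheme and authority check using only java.net.URI; it is not Hadoop's actual implementation. The class name `FsPathCheck`, the bucket name, and the HDFS directory are hypothetical, and it assumes the input is an s3:// URI, as the follow-up comment suggests.

```java
import java.net.URI;

// Simplified imitation of a file-system ownership check; NOT Hadoop's code.
final class FsPathCheck {

    /** Returns true if a file system rooted at fsUri could serve the given path. */
    static boolean supports(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            // Scheme-less paths are resolved against the file system itself.
            return true;
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            // e.g. an s3:// path handed to an hdfs:// file system object
            return false;
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://172.31.4.175:9000"); // from the error message
        URI s3Input = URI.create("s3://my-bucket/input");       // hypothetical bucket
        URI hdfsInput = URI.create("hdfs://172.31.4.175:9000/user/hadoop/input"); // hypothetical dir

        System.out.println("s3 input supported:   " + supports(defaultFs, s3Input));
        System.out.println("hdfs input supported: " + supports(defaultFs, hdfsInput));
    }
}
```

If this is what is happening, the file system object has to be obtained from the path's own URI, which is what the message's FileSystem.get(uri, conf) hint (or Path.getFileSystem(conf)) does, rather than from the cluster default.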

Update: As a workaround, I tried writing a MapReduce job myself to convert the text files into sequence files (basically mirroring what SequenceFilesFromDirectory does). Since my directory contains many small text files, I decided to use CombineFileSplit instead of FileSplit to reduce the number of mappers. The MR job ran fine on my local machine, but when I ran it on Amazon EMR I again got an error: Exception in thread "main" java.io.FileNotFoundException: File does not exist. The s3:// prefix was stripped from the path, which caused the file-not-found exception. - user3376898, Jun 06 '14 at 22:49
So now I'm using a simple MR job with FileSplit, which means I have tons of mappers for my job. Please let me know if anyone has a brighter idea. - user3376898, Jun 06 '14 at 22:52
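The comments do not show where the s3:// prefix was lost. One common way it can happen is taking only the path component of a URI, which drops the scheme and the bucket; Hadoop's Path.toUri() returns a java.net.URI, so the same behaviour applies there. A minimal sketch with plain java.net.URI follows; the class name `SchemeStripDemo` and the bucket and file names are hypothetical.

```java
import java.net.URI;

// Shows how the scheme and authority disappear when only the path part is kept.
final class SchemeStripDemo {

    /** Full form: keeps the scheme and the bucket (authority). */
    static String qualified(URI uri) {
        return uri.toString();
    }

    /** Path component only: the scheme and authority are dropped. */
    static String pathOnly(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI file = URI.create("s3://my-bucket/docs/a.txt"); // hypothetical location

        System.out.println(qualified(file)); // s3://my-bucket/docs/a.txt
        System.out.println(pathOnly(file));  // /docs/a.txt
    }
}
```

A bare path such as /docs/a.txt would then be looked up on whatever the default file system is, which on the cluster is HDFS rather than S3, and that would be consistent with a "File does not exist" error.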

