
I have a Java program that tries to write a file to HDFS:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyFileToHDFS {
    public static void main(String[] args) {
        try {
            Configuration configuration = new Configuration();

            String msg = "message1";
            String file = "hdfs://localhost:8020/user/user1/input.txt";

            // Get a handle to HDFS and overwrite the target file with the message
            FileSystem hdfs = FileSystem.get(new URI(file), configuration);
            FSDataOutputStream outputStream = hdfs.create(new Path(file), true);
            outputStream.write(msg.getBytes());
            outputStream.close();
        }
        catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

When I run the program, it gives me an error:

    java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3.S3FileSystem not found

It looks like a configuration issue. Can anyone give me some suggestions?

Thanks

steve lin

1 Answer


Something on your classpath is registering org.apache.hadoop.fs.s3.S3FileSystem as a FileSystem service provider, via a META-INF/services/org.apache.hadoop.fs.FileSystem file, but the class itself isn't on the classpath, so the ServiceLoader fails with the error you see. One possible cause is an old, stale META-INF file; see this Spark bug report.
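To find where the stray registration comes from, you can list every META-INF/services/org.apache.hadoop.fs.FileSystem file visible on your classpath; each URL points at the jar (or directory) that contributed it. A quick diagnostic sketch (the class name ListFsProviders is just an example):

import java.net.URL;
import java.util.Enumeration;

public class ListFsProviders {
    public static void main(String[] args) throws Exception {
        // Each jar that registers FileSystem implementations ships a
        // META-INF/services/org.apache.hadoop.fs.FileSystem file; print them
        // all to find the one declaring the missing S3FileSystem provider.
        Enumeration<URL> urls = ListFsProviders.class.getClassLoader()
                .getResources("META-INF/services/org.apache.hadoop.fs.FileSystem");
        while (urls.hasMoreElements()) {
            System.out.println(urls.nextElement());
        }
    }
}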

If you're building an uber-jar, the stray registration could be inside it (see the sketch below for the usual Maven fix). If you can't find and eliminate the declaration that's causing the problem, a workaround is to put the AWS and Hadoop jars where the Spark driver/executors can find them; see this Stack Overflow question.
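For example, if the uber-jar is built with the Maven Shade plugin, shading can let one jar's META-INF/services file overwrite another's, leaving entries for classes that were never bundled. The ServicesResourceTransformer concatenates those files instead. A minimal sketch of the relevant plugin configuration (the surrounding pom and version are assumed):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries from all jars instead of
               letting one jar's file overwrite another's -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>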
