I'm trying to use the S3A magic output committer, but whatever I do I end up with the default file output committer:
INFO FileOutputCommitter: File Output Committer Algorithm version is 10
22/03/08 01:13:06 ERROR Application: Only 1 or 2 algorithm version is supported
According to the Hadoop docs, this log output is how I can tell which committer is in use, and it still shows the classic FileOutputCommitter rather than the magic one.
What am I doing wrong?
This is my relevant configuration, set via SparkConf() (I have tried many other combinations as well); a sketch of how it is wired into the job follows the settings.
.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "10")
.set("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
.set("spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a", "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory")
.set("fs.s3a.committer.name", "magic")
.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
.set("spark.sql.parquet.output.committer.class", "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
Other than the settings above, I do not have any configuration relevant to this, neither in code nor in the Hadoop/Spark config files; maybe I should? The paths I'm writing to start with s3://. I'm using Hadoop 3.2.1, Spark 3.0.0, and EMR 6.1.1.
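In case it helps with diagnosing, the settings can also be inspected at runtime from the Hadoop configuration that the committers read (a quick diagnostic sketch, assuming the spark session from the snippet above; the keys are the ones I set):

// Diagnostic only: print what the spark.hadoop.* settings resolved to
// in the underlying Hadoop configuration.
val hadoopConf = spark.sparkContext.hadoopConfiguration
Seq(
  "fs.s3a.committer.name",
  "fs.s3a.committer.magic.enabled",
  "mapreduce.outputcommitter.factory.scheme.s3a",
  "mapreduce.fileoutputcommitter.algorithm.version"
).foreach(k => println(s"$k = ${hadoopConf.get(k)}"))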