I'm using the great Databricks spark-avro connector to read and write Avro files. I have the following code:
df.write.mode(SaveMode.Overwrite).avro(someDirectory)
The problem is that when I try to read this directory back using sqlContext.read.avro(someDirectory), it fails with
java.io.IOException: Not an Avro data file
due to the existence of the _SUCCESS file in that directory.
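For reference, here is a minimal, self-contained version of what I'm doing (the sample data and output path are placeholders; I'm assuming the spark-avro package is on the classpath and a SparkContext sc is available, as in the shell):

import org.apache.spark.sql.{SQLContext, SaveMode}
import com.databricks.spark.avro._  // adds .avro(...) to DataFrameReader/DataFrameWriter

val sqlContext = new SQLContext(sc)
val someDirectory = "/tmp/avro-test"  // placeholder output path

// any small DataFrame will do
val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")

df.write.mode(SaveMode.Overwrite).avro(someDirectory)  // also leaves a _SUCCESS marker file

sqlContext.read.avro(someDirectory)  // java.io.IOException: Not an Avro data file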
Setting

sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

solves the issue, but I'd rather avoid doing it.
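In context, the workaround looks like this (a sketch; the property is the standard Hadoop FileOutputCommitter setting and has to be in place before the job that writes the output):

// Tell Hadoop's FileOutputCommitter not to create the _SUCCESS marker;
// must be set before the write runs.
sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

df.write.mode(SaveMode.Overwrite).avro(someDirectory)
sqlContext.read.avro(someDirectory)  // now reads back fine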
This seems like quite a generic problem, so am I doing something wrong?