I tried to run my Spark 2.3.0 Scala code on a Cloud Dataproc 1.4 cluster, which has Spark 2.4.8 installed. I got an error when reading Avro files. Here's my code:
sparkSession.read.format("com.databricks.spark.avro").load(input)
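For context, the surrounding code is roughly this minimal sketch (the session setup is simplified, and the app name and input path are placeholders, not my real ones):

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("AvroReadExample") // placeholder app name
  .getOrCreate()

// Placeholder path to a directory of .avro files
val input = "gs://my-bucket/path/to/avro"

// The read that originally failed on Dataproc 1.4 / Spark 2.4.8
val df = sparkSession.read.format("com.databricks.spark.avro").load(input)
df.printSchema()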
This code failed as expected. Then I added this dependency to my pom.xml
file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
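(For what it's worth, I believe the same artifact could also be supplied at submit time instead of being bundled, e.g. via spark-submit's --packages flag; the main class and jar name below are placeholders:)

spark-submit \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --class com.example.MyAvroJob \
  my-job.jar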
That made my code run successfully. And this is the part I don't understand: I'm still using the com.databricks.spark.avro format in my code. Why did adding the org.apache.spark:spark-avro dependency solve my problem, given that I'm not referencing it anywhere in my code?
I was expecting that I would need to change my code to something like this:
sparkSession.read.format("avro").load(input)