TL;DR: Is it absolutely necessary that the Spark version running spark-shell (the driver) exactly match the version of the Spark master?
I am using Spark 1.5.0 to connect to Spark 1.5.0-cdh5.5.0 via spark-shell:
spark-shell --master spark://quickstart.cloudera:7077 --conf "spark.executor.memory=256m"
It connects and instantiates the SparkContext and sqlContext fine. If I run:
sqlContext.sql("show tables").show()
it shows all my tables as expected.
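For what it's worth, the driver reports its own version from the same shell (sc.version is the standard SparkContext field):

sc.version  // returns "1.5.0" on the driver; the CDH master runs 1.5.0-cdh5.5.0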
However, if I try to access data from a table:
sqlContext.sql("select * from t1").show()
I get this error:
java.io.InvalidClassException: org.apache.spark.sql.catalyst.expressions.AttributeReference; local class incompatible: stream classdesc serialVersionUID = 370695178000872136, local class serialVersionUID = -8877631944444173448
It says that the serialVersionUIDs don't match. My hypothesis is that the problem is caused by connecting two different versions of Spark. Am I right, or is something else going on?
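If it helps, here is a rough sketch of how I thought the serialVersionUID on each side could be inspected, using the plain JDK ObjectStreamClass API, run once in this spark-shell and once in a shell on a cluster node:

import java.io.ObjectStreamClass

// Prints the serialVersionUID baked into this JVM's copy of the class;
// different numbers on the driver vs. the cluster would point to a build mismatch.
val uid = ObjectStreamClass.lookup(
  classOf[org.apache.spark.sql.catalyst.expressions.AttributeReference]
).getSerialVersionUID
println(uid)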