I'm getting an error when running the spark-shell command through cmd, and I've had no luck fixing it so far. I have Python, Java, Spark, Hadoop (winutils.exe), and Scala installed, with the versions listed below (a quick cmd check for each follows the list):
- Python: 3.7.3
- Java: 1.8.0_311
- Spark: 3.2.0
- Hadoop (winutils.exe): 2.5.x
- Scala sbt: sbt-1.5.5.msi
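To double-check those versions, these are roughly the commands I ran from a fresh cmd window (standard --version/-version flags; sbt --version assumes a recent sbt launcher):

```bat
:: Print the installed version of each tool (output trimmed here)
java -version
python --version
sbt --version
spark-shell --version
```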
I followed the steps below and ran spark-shell through cmd from C:\Program Files\spark-3.2.0-bin-hadoop3.2\bin (a cmd sketch of these settings follows the list):
- Create the JAVA_HOME variable: C:\Program Files\Java\jdk1.8.0_311\bin
- Add the following to your Path: %JAVA_HOME%\bin
- Create the SPARK_HOME variable: C:\spark-3.2.0-bin-hadoop3.2\bin
- Add the following to your Path: %SPARK_HOME%\bin
- The most important part: the Hadoop path should include a bin folder containing winutils.exe, i.e. C:\Hadoop\bin. Make sure winutils.exe is located inside this path.
- Create the HADOOP_HOME variable: C:\Hadoop
- Add the following to your Path: %HADOOP_HOME%\bin
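For reference, here's a minimal cmd sketch of the same settings (values copied verbatim from the steps above; setx is just one way to do this, and it only takes effect in newly opened cmd windows):

```bat
:: Set the variables for the current session, then persist them with setx
set "JAVA_HOME=C:\Program Files\Java\jdk1.8.0_311\bin"
set "SPARK_HOME=C:\spark-3.2.0-bin-hadoop3.2\bin"
set "HADOOP_HOME=C:\Hadoop"
setx JAVA_HOME "%JAVA_HOME%"
setx SPARK_HOME "%SPARK_HOME%"
setx HADOOP_HOME "%HADOOP_HOME%"

:: Extend Path for this session (persisting Path with setx can truncate long
:: values, so the Environment Variables GUI editor is safer for that part)
set "Path=%Path%;%JAVA_HOME%\bin;%SPARK_HOME%\bin;%HADOOP_HOME%\bin"

:: Sanity check: winutils.exe must sit under %HADOOP_HOME%\bin
dir "%HADOOP_HOME%\bin\winutils.exe"
```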
Am I missing anything? I've posted my question with the full error details in another thread (spark-shell command throwing this error: SparkContext: Error initializing SparkContext).