
We use Spark 2.4.0 to connect to a Hadoop 2.7 cluster and query a Hive Metastore (version 2.3). However, the cluster management team has decided to upgrade to Hadoop 3.x and Hive 3.x. We cannot migrate to Spark 3, which is compatible with Hadoop 3 and Hive 3, until we have verified that nothing breaks.

Is there any way to stay on Spark 2.4.x and still use Hadoop 3 and Hive 3?

I understand that backporting is one option; it would be great if you could point me in that direction.

Saiteja Parsi

1 Answer


You can compile Spark 2.4 against Hadoop 3.1 instead of relying on the default Hadoop version. Use the hadoop-3.1 profile as described in the documentation on building Spark, something like:

./build/mvn -Pyarn -Phadoop-3.1 -DskipTests clean package
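As a fuller sketch (not from the original answer), you would typically also enable the optional Hive integration and Thrift JDBC server profiles and pin the exact Hadoop release your cluster runs; the version number below is an assumption and should be replaced with your own:

# Illustrative build: Hadoop 3.1.x plus the optional Hive and Thrift-server profiles
./build/mvn -Pyarn -Phadoop-3.1 -Dhadoop.version=3.1.2 -Phive -Phive-thriftserver -DskipTests clean package

One caveat worth noting: Spark 2.4's documentation only lists Hive metastore versions up to 2.3.x as supported by spark.sql.hive.metastore.version, so even with a rebuilt distribution you would still need to verify connectivity against your Hive 3 metastore.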
Alex Ott