
Also, CDH 6 is in the beta stage; does it support Spark 2.3 without any bells and whistles? Is it possible to run the same older Spark 2.x versions (2.3 specifically) on Hadoop 3-enabled CDH or Hadoop clusters?

I'm interested in the backwards-compatibility changes to the YARN, HDFS, and MapReduce APIs.

Is anyone using this in production?

Suraj N

1 Answer


CDH 6.0 GA was announced a couple of weeks ago. In addition to Hadoop 3, it packages Spark 2.2 as the default Spark version: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_new_features.html#spark_new_features. However, it is possible to upgrade CDS (Cloudera's distribution of Spark) to a higher version (2.3.x) separately.
CDH 6 appears to be unaffected by the Hive Metastore (HMS) incompatibility in Spark, according to https://www.cloudera.com/documentation/spark2/latest/topics/spark2_troubleshooting.html#spark_troubleshooting__hive_compatibility.

mazaneicha
  • Ah yes, CDH 6.0 is GA. – Suraj N Oct 01 '18 at 10:48
  • But I would still like to know whether Spark 2.3.2 is compatible with Hadoop 3 on any of these vendors. It still looks like the version of Hive that Spark uses is not supported on Hadoop 3.x, as per https://issues.apache.org/jira/browse/SPARK-18673 – Suraj N Oct 01 '18 at 10:59
  • After re-examining Cloudera docs at https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_60_packaging.html#cdh_packaging_600, I have to correct my answer. – mazaneicha Oct 03 '18 at 11:45