When trying to write to HDFS from Spark within Zeppelin, I am receiving this ClassNotFoundException for org.apache.hadoop.mapred.DirectFileOutputCommitter:

java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.mapred.DirectFileOutputCommitter not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2106)
at org.apache.hadoop.mapred.JobConf.getOutputCommitter(JobConf.java:725)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:983)

Code that is trying to run:

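import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// someRDD is an RDD[LabeledPoint]; numIterations is an Int
val model = LinearRegressionWithSGD.train(someRDD, numIterations)
val modelPath = "hdfs:///some_path/LinearRegressionWithSGD"
model.save(sc, modelPath)  // the write to HDFS is what triggers the exception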

I cannot find this class anywhere. The closest match I can find in Hadoop is org.apache.hadoop.mapred.FileOutputCommitter.

I am using commit 18c8c9ea512a0d87699a73e2ca26192d03748661 (Oct 9) of Zeppelin, Spark 1.5.0 on YARN, and Hadoop 2.6.

Greg

1 Answer

I had the same problem. I looked for that class in "hadoop-mapreduce-client-core.X.X.X.jar", but it is not in the jar.
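A quick way to confirm this from the notebook itself, without unpacking jars, is to ask the JVM whether it can load the class. This is only a minimal sketch; the helper name classAvailable is my own and not part of any library:

```scala
import scala.util.Try

// Hypothetical helper: true if the named class can be loaded from the
// classpath of the JVM running this code, false if loading throws
// ClassNotFoundException.
def classAvailable(name: String): Boolean =
  Try(Class.forName(name)).isSuccess

// The class the exception complains about
println(classAvailable("org.apache.hadoop.mapred.DirectFileOutputCommitter"))
// The standard committer that ships with Hadoop
println(classAvailable("org.apache.hadoop.mapred.FileOutputCommitter"))
```

Note that this only checks the JVM where the paragraph runs (the driver side); the executors could in principle have a different classpath.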

I fixed the problem by adding the class org.apache.hadoop.mapred.DirectFileOutputCommitter to my own project, so that it ends up on the classpath. The source of that file can be found here: https://gist.github.com/apivovarov

I am not sure yet what the root cause of this issue is. I am digging into it and will update this answer once I know more.
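One thing worth checking in the meantime: the stack trace goes through JobConf.getOutputCommitter, which looks up the committer class from the Hadoop property mapred.output.committer.class and falls back to FileOutputCommitter when the property is unset. So it is likely that some configuration file on the cluster sets that property to DirectFileOutputCommitter. A sketch for inspecting it from Zeppelin, assuming sc is the SparkContext the notebook provides; the override in the last line is an untested assumption on my part, not a confirmed fix:

```scala
// Assumes `sc` is the SparkContext that Zeppelin injects into the notebook.
val key = "mapred.output.committer.class"

// Prints the configured committer class, or null if the property is not set
// (in which case Hadoop uses FileOutputCommitter by default).
println(sc.hadoopConfiguration.get(key))

// Possible workaround (assumption, untested): point the property back at the
// committer that ships with Hadoop before calling model.save.
sc.hadoopConfiguration.set(key, "org.apache.hadoop.mapred.FileOutputCommitter")
```

If the printed value is org.apache.hadoop.mapred.DirectFileOutputCommitter, the next step would be to find which *-site.xml file on the cluster sets it.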