I have a requirement to write the output into separate partitions based on the key, and I'm trying to use MultipleTextOutputFormat.

I found this https://stackoverflow.com/a/26051042/6561443

but when I try to do the same in spark-shell, I get an error.

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable

scala> import org.apache.spark._
import org.apache.spark._

scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._

scala> import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

scala> class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
         override def generateActualKey(key: Any, value: Any): Any =
           NullWritable.get()
         override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
           key.asInstanceOf[String]
       }

<console>:11: error: not found: type MultipleTextOutputFormat
       class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
                                                 ^
<console>:13: error: not found: value NullWritable
           NullWritable.get()

If I submit this application with spark-submit, I get

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1

Am I missing something here? Doesn't this work in spark-shell?

sr1987

1 Answer

I've encountered the same issue. Try writing the fully qualified class name

class RDDMultipleTextOutputFormat extends org.apache.hadoop.mapred.lib.MultipleTextOutputFormat[Any, Any] {

instead. That worked for me, though I don't know why.
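For reference, here is a minimal end-to-end sketch of how the class from the linked answer is typically used with `saveAsHadoopFile`, with the fully qualified superclass name as suggested above. The sample data, output path, and app name are illustrative assumptions, not from the original question:

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

// Fully qualifying the superclass avoids the "not found: type" error in the REPL
class RDDMultipleTextOutputFormat
    extends org.apache.hadoop.mapred.lib.MultipleTextOutputFormat[Any, Any] {
  // Suppress the key in the file contents; only the value is written
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
  // Use the key as the output file name, so each key gets its own file
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]
}

object PartitionByKey {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PartitionByKey"))
    // Hypothetical (key, value) pairs for illustration
    val pairs = sc.parallelize(Seq(("a", "line1"), ("b", "line2"), ("a", "line3")))
    // Writes one file per distinct key under the output directory
    pairs.saveAsHadoopFile("/tmp/output", classOf[String], classOf[String],
      classOf[RDDMultipleTextOutputFormat])
    sc.stop()
  }
}
```

With this layout, `/tmp/output` should contain one file named `a` and one named `b`, each holding only the values for that key.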

Ian Wang