1

Similar to this question I'm trying to use fileStream but receiving a compile-time error about the type arguments. I'm trying to ingest XML data using org.apache.mahout.text.wikipedia.XmlInputFormat provided by mahout-examples as my InputFormat type.

val fileStream = ssc.fileStream[LongWritable, Text, XmlInputFormat](WATCHDIR)

The compilation errors are:

Error:(39, 26) type arguments [org.apache.hadoop.io.LongWritable,scala.xml.Text,org.apache.mahout.text.wikipedia.XmlInputFormat] conform to the bounds of none of the overloaded alternatives of
 value fileStream: [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String, filter: org.apache.hadoop.fs.Path => Boolean, newFilesOnly: Boolean, conf: org.apache.hadoop.conf.Configuration)(implicit evidence$12: scala.reflect.ClassTag[K], implicit evidence$13: scala.reflect.ClassTag[V], implicit evidence$14: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)] <and> [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String, filter: org.apache.hadoop.fs.Path => Boolean, newFilesOnly: Boolean)(implicit evidence$9: scala.reflect.ClassTag[K], implicit evidence$10: scala.reflect.ClassTag[V], implicit evidence$11: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)] <and> [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String)(implicit evidence$6: scala.reflect.ClassTag[K], implicit evidence$7: scala.reflect.ClassTag[V], implicit evidence$8: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)]
    val fileStream = ssc.fileStream[LongWritable, Text, XmlInputFormat](WATCHDIR)
                         ^
Error:(39, 26) wrong number of type parameters for overloaded method value fileStream with alternatives:
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String, filter: org.apache.hadoop.fs.Path => Boolean, newFilesOnly: Boolean, conf: org.apache.hadoop.conf.Configuration)(implicit evidence$12: scala.reflect.ClassTag[K], implicit evidence$13: scala.reflect.ClassTag[V], implicit evidence$14: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)] <and>
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String, filter: org.apache.hadoop.fs.Path => Boolean, newFilesOnly: Boolean)(implicit evidence$9: scala.reflect.ClassTag[K], implicit evidence$10: scala.reflect.ClassTag[V], implicit evidence$11: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)] <and>
  [K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K,V]](directory: String)(implicit evidence$6: scala.reflect.ClassTag[K], implicit evidence$7: scala.reflect.ClassTag[V], implicit evidence$8: scala.reflect.ClassTag[F])org.apache.spark.streaming.dstream.InputDStream[(K, V)]
    val fileStream = ssc.fileStream[LongWritable, Text, XmlInputFormat](WATCHDIR)
                     ^

I'm very new to Scala, so I'm not really familiar with type classes (I'm assuming that's what's happening here?). Any help would be appreciated.

Community
  • 1
  • 1
Chris Matta
  • 3,263
  • 3
  • 35
  • 48

1 Answers1

1

The exception lists that it is searching for scala.xml.Text, whereas you need to be using org.apache.hadoop.io.Text

Justin Pihony
  • 66,056
  • 18
  • 147
  • 180