
I am writing the following code to load a file into Spark using the newAPIHadoopFile API:

val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])

But I am getting the following error:

scala> val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
<console>:34: error: inferred type arguments [org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,org.apache.hadoop.mapred.TextInputFormat] do not conform to method newAPIHadoopFile's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]]
 val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.mapred.TextInputFormat](classOf[org.apache.hadoop.mapred.TextInputFormat])
required: Class[F]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                          ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.io.Text](classOf[org.apache.hadoop.io.Text])
required: Class[K]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                                                   ^
<console>:34: error: type mismatch;
found   : Class[org.apache.hadoop.io.Text](classOf[org.apache.hadoop.io.Text])
required: Class[V]
val lines = sc.newAPIHadoopFile("new_actress.list",classOf[TextInputFormat],classOf[Text],classOf[Text])
                                                                                                 ^

What am I doing wrong in the code?

sarthak

1 Answer


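There are two problems. First, the TextInputFormat you imported is org.apache.hadoop.mapred.TextInputFormat (the old API, as the error message shows), but newAPIHadoopFile requires a subclass of org.apache.hadoop.mapreduce.InputFormat[K, V] (the new API). Second, TextInputFormat produces <LongWritable, Text> pairs (the byte offset of each line and the line itself), not <Text, Text>.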

Note the extends clause in the declaration of TextInputFormat:

@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat
extends FileInputFormat<LongWritable,Text>

This means you cannot declare both the key and value types as Text: the key type must be LongWritable. To use TextInputFormat with newAPIHadoopFile, import the new-API class and pass the matching types:

import org.apache.hadoop.mapreduce.lib.input.TextInputFormat // new API (mapreduce), not mapred
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.LongWritable
// keys are LongWritable byte offsets, values are the lines as Text
val lines = sc.newAPIHadoopFile("test.csv", classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
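One thing that usually comes up next: Hadoop's RecordReader reuses the same Writable objects for every record, and the Spark documentation for hadoopFile/newAPIHadoopFile advises copying them with a map before caching, sorting, or aggregating. The sketch below wraps the call above and turns each line into a String. It assumes Spark and Hadoop are on the classpath; the object NewApiTextLines and its readLines method are hypothetical helpers, not part of any library. In spark-shell you would pass the predefined sc.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat // new (mapreduce) API
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object NewApiTextLines {
  // Reads a text file through the new-API TextInputFormat.
  // Keys are LongWritable byte offsets of each line; values are the line contents.
  def readLines(sc: SparkContext, path: String): RDD[String] = {
    val pairs: RDD[(LongWritable, Text)] =
      sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    // The RecordReader reuses the same Text instance, so copy each value
    // into an immutable String before it is cached, collected, or shuffled.
    pairs.map { case (_, line) => line.toString }
  }
}
```

For example, NewApiTextLines.readLines(sc, "new_actress.list").take(5) would give the first five lines as plain Strings. Dropping the offset key is a choice made here; keep it (via offset.get) if you need byte positions.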

If you still want both types to be Text, you can use KeyValueTextInputFormat, which is declared as:

@InterfaceAudience.Public
@InterfaceStability.Stable
public class KeyValueTextInputFormat
extends FileInputFormat<Text,Text>

You can try:

import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
import org.apache.hadoop.io.Text
// each line is split at the first tab (the default separator) into key and value
val lines = sc.newAPIHadoopFile("test.csv", classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
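KeyValueTextInputFormat splits each line at the first separator character (a tab by default) into key and value; a line with no separator becomes the key with an empty value. If the file uses another separator, one way to set it is through the overload of newAPIHadoopFile that takes a Hadoop Configuration. This is a minimal sketch assuming Hadoop 2.x or later, where the property is named mapreduce.input.keyvaluelinerecordreader.key.value.separator and only its first character is used; KeyValueLines and readPairs are hypothetical names.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object KeyValueLines {
  // Separator property read by the new-API KeyValueLineRecordReader (Hadoop 2.x+).
  val SeparatorKey = "mapreduce.input.keyvaluelinerecordreader.key.value.separator"

  // Splits each line at the first `separator` into (key, value) Strings.
  // Lines without the separator come back as (wholeLine, "").
  def readPairs(sc: SparkContext, path: String, separator: Char = '\t'): RDD[(String, String)] = {
    // Copy the context's Hadoop configuration so this setting stays local to this read.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.set(SeparatorKey, separator.toString)
    sc.newAPIHadoopFile(path, classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text], conf)
      // Copy the reused Text objects into Strings.
      .map { case (k, v) => (k.toString, v.toString) }
  }
}
```

With a comma separator, a line such as x,y,z would come back as ("x", "y,z"), since only the first separator splits the line.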

VladoDemcak
  • Thanks... One more thing: is there any difference between `org.apache.hadoop.mapreduce.lib.input.TextInputFormat` and `org.apache.hadoop.mapred.TextInputFormat`? Which one should I choose? – sarthak Oct 17 '16 at 20:24
  • Check http://stackoverflow.com/questions/16269922/hadoop-mapred-vs-hadoop-mapreduce – VladoDemcak Oct 18 '16 at 18:21
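On the follow-up question: the two TextInputFormat classes belong to different Hadoop APIs, and Spark exposes a separate entry point for each. newAPIHadoopFile is bounded by org.apache.hadoop.mapreduce.InputFormat (the bound shown in the error in the question), while SparkContext.hadoopFile accepts the old org.apache.hadoop.mapred input formats. So the class from the original import could also be used, through hadoopFile with LongWritable keys. A minimal sketch, assuming Spark and Hadoop on the classpath; OldApiTextLines is a hypothetical name.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat // old (mapred) API
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object, for illustration only.
object OldApiTextLines {
  // Old-API input formats go through sc.hadoopFile, whose bound is
  // mapred.InputFormat[K, V]; newAPIHadoopFile expects mapreduce.InputFormat[K, V].
  def readLines(sc: SparkContext, path: String): RDD[String] =
    sc.hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
      // Copy the reused Text values into Strings, dropping the byte-offset keys.
      .map { case (_, line) => line.toString }
}
```

Both variants read the same lines; the choice is mainly which Hadoop API the rest of your code (custom input formats, configuration keys) is written against.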