
I am creating a Spark DataFrame from a text file, say an Employee file whose fields are String, Int, and Char values.

I created a case class:

case class Emp (
  Name: String, 
  eid: Int, 
  Age: Int, 
  Sex: Char, 
  Sal: Int, 
  City: String)

Created RDD1 using split; a minimal sketch of that step (assuming a comma-separated file read from a hypothetical path) is:
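
val textFileRDD1 = sc
  .textFile("employees.txt")   // hypothetical path
  .map(_.split(","))           // assumes comma-separated fields

Then created RDD2: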

val textFileRDD2 = textFileRDD1.map(attributes => Emp(
  attributes(0), 
  attributes(1).toInt, 
  attributes(2).toInt, 
  attributes(3).charAt(0), 
  attributes(4).toInt, 
  attributes(5)))

And the final DataFrame as:

val finalRDD = textFileRDD2.toDF
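
Calling toDF on an RDD of case-class instances relies on the SparkSession implicits; in spark-shell they are already in scope, while a standalone application would need roughly this (assuming a SparkSession named spark):

import spark.implicits._          // enables .toDF on RDDs of case classes
val finalRDD = textFileRDD2.toDF()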

When I create the final DataFrame it throws this error:

java.lang.UnsupportedOperationException: No Encoder found for scala.Char

Can anyone help me understand why this happens and how to resolve it?


1 Answer


Spark SQL doesn't provide an Encoder for Char, and generic Encoders are not very useful here.

One option is to keep the value as a StringType:

attributes(3).slice(0, 1)
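
For example, a sketch of the adjusted case class and mapping (keeping the other fields exactly as in the question) could be:

case class Emp(
  Name: String,
  eid: Int,
  Age: Int,
  Sex: String,   // was Char; String has a built-in Encoder
  Sal: Int,
  City: String)

val textFileRDD2 = textFileRDD1.map(attributes => Emp(
  attributes(0),
  attributes(1).toInt,
  attributes(2).toInt,
  attributes(3).slice(0, 1),   // one-character String instead of Char
  attributes(4).toInt,
  attributes(5)))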

Another option is a ShortType (or BooleanType / ByteType, if you only need a binary response):

attributes(3)(0) match {
   case 'F' => 1: Short
   ...
   case _ => 0: Short
}
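
A sketch of how this plugs into the mapping, with the Sex field declared as Short in Emp and assuming the column only holds 'F' and 'M', could be:

val sexToShort: Char => Short = {
  case 'F' => 1   // assumed encoding: 1 for 'F'
  case _   => 0   // 0 for everything else, e.g. 'M'
}

val textFileRDD2 = textFileRDD1.map(attributes => Emp(
  attributes(0),
  attributes(1).toInt,
  attributes(2).toInt,
  sexToShort(attributes(3)(0)),
  attributes(4).toInt,
  attributes(5)))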