
I wrote the following code, which aims to transform a DataFrame into a Dataset using a case class:

def toDs[T](df: DataFrame): Dataset[T] = {
  df.as[T]
}

with the case class:

case class DATA(name: String, age: Double, location: String)

I am getting:

Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
[error]     df.as[T]

Any idea how to fix this?

scalacode

1 Answer

You can read the data into a Dataset[MyCaseClass] in the following two ways:

Say you have the following case class (for illustration, with the same fields as DATA from the question):

1) First way: import the SparkSession implicits into scope and use the as operator to convert your DataFrame to a Dataset[MyCaseClass]:

case class MyCaseClass(name: String, age: Double, location: String)

import org.apache.spark.sql.{Dataset, SparkSession}

val spark: SparkSession = SparkSession.builder.enableHiveSupport.getOrCreate()

import spark.implicits._

val ds: Dataset[MyCaseClass] = spark.read.format("FORMAT_HERE").load().as[MyCaseClass]
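
This also explains the error in the question: inside the generic toDs[T] helper, the compiler has no way to find an Encoder[T]. Adding a context bound defers encoder resolution to the call site, where spark.implicits._ is in scope. A minimal sketch:

import org.apache.spark.sql.{DataFrame, Dataset, Encoder}

// Require an implicit Encoder[T]; df.as[T] picks it up
def toDs[T: Encoder](df: DataFrame): Dataset[T] = df.as[T]

// At a call site with spark.implicits._ imported:
// val ds: Dataset[MyCaseClass] = toDs[MyCaseClass](df)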

2) Second way: define your own encoder in a separate object and import it into your current code:

package com.funky.encoders

import org.apache.spark.sql.{Encoder, Encoders}

case class MyCaseClass(name: String, age: Double, location: String)

object MyCustomEncoders {

  // Encoders.product derives an Encoder for any Product type (case class)
  implicit val myCaseClassEncoder: Encoder[MyCaseClass] = Encoders.product[MyCaseClass]

}

In the file containing the main method, import the members of the above object so that the implicit encoder is in scope:

import com.funky.encoders.MyCaseClass
import com.funky.encoders.MyCustomEncoders._
import org.apache.spark.sql.{Dataset, SparkSession}

val spark: SparkSession = SparkSession.builder.enableHiveSupport.getOrCreate()

val ds: Dataset[MyCaseClass] = spark.read.format("FORMAT_HERE").load().as[MyCaseClass]
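
Either way, the fix is the same: df.as[T] needs an implicit Encoder[T] in scope. With the custom encoder imported, the context-bound toDs helper sketched above works too; a hedged usage sketch, assuming a DataFrame named df whose columns match MyCaseClass:

import com.funky.encoders.MyCaseClass
import com.funky.encoders.MyCustomEncoders._

// The imported implicit Encoder[MyCaseClass] satisfies the T: Encoder context bound
val typed: Dataset[MyCaseClass] = toDs[MyCaseClass](df)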
Yayati Sule