
I'm having an issue with this part of the Spark MLlib code from the docs (https://spark.apache.org/docs/latest/ml-collaborative-filtering.html), using either CSV or TXT files:

val ratings = spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")
  .map(parseRating)
  .toDF()

I get the following error:

Error:(31, 11) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

.map(parseRating)
      ^

I also have the following at the start of my object:

val conf = new SparkConf().setMaster("local[*]").set("spark.executor.memory", "2g")
val spark = SparkSession.builder.appName("Mlibreco").config(conf).getOrCreate()
import spark.implicits._

It seems that the read.textFile method needs an encoder. I have found a few articles on how to define an encoder, but I don't know how to apply that when reading the CSV or TXT file. Given that nothing about encoders is mentioned in the docs, it is also very likely that I have missed something obvious.
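
For reference, the parseRating function and the Rating case class in my code are taken directly from that docs example, i.e. roughly this:

case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

def parseRating(str: String): Rating = {
  // each line of the sample file is "userId::movieId::rating::timestamp"
  val fields = str.split("::")
  assert(fields.size == 4)
  Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
}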

Christopher Mills

1 Answer


Try this:

val sparkSession: SparkSession = ***                 // your existing SparkSession
import sparkSession.implicits._                      // brings the predefined encoders (primitives, case classes) into scope
val dataset = sparkSession.createDataset(dataList)   // dataList: a local Seq of a supported type, e.g. your case class

and see this link for the predefined encoders.
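
Applied to your snippet, here is a minimal sketch (assuming the Rating case class and parseRating from the docs example; the object name and layout are just for illustration). The key points are that the case class is defined at the top level, not inside a method, and that import spark.implicits._ comes after the SparkSession is created and is in scope where .map is called, so the Product encoder for Rating can be found:

import org.apache.spark.sql.SparkSession

// Top-level case class (not nested inside a method) so Spark can derive its Product encoder.
case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

object Mlibreco {
  def parseRating(str: String): Rating = {
    val fields = str.split("::")
    Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Mlibreco")
      .config("spark.executor.memory", "2g")   // same setting you passed via SparkConf
      .getOrCreate()

    // Must come after `spark` exists and be in scope where .map is called.
    import spark.implicits._

    val ratings = spark.read
      .textFile("data/mllib/als/sample_movielens_ratings.txt")
      .map(parseRating)   // Encoder[Rating] is supplied by spark.implicits._
      .toDF()

    ratings.show(5)
    spark.stop()
  }
}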

user10089632