I'm trying to apply an optional transformation (`parseRow`) to a `Dataset` with `flatMap`, so that `None`s are discarded and `Some`s are unwrapped.
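For reference, on plain Scala collections the behaviour I'm after looks like this (a minimal sketch with a hypothetical `parse` helper, not Spark):

```scala
object FlatMapDemo {
  // Hypothetical parser: Some for numeric lines, None otherwise.
  def parse(line: String): Option[Int] =
    if (line.nonEmpty && line.forall(_.isDigit)) Some(line.toInt) else None

  def main(args: Array[String]): Unit = {
    // flatMap drops the None ("oops") and unwraps the Somes.
    val parsed = List("1", "oops", "2").flatMap(parse)
    println(parsed) // List(1, 2)
  }
}
```

This works on a `List` because `Option` is implicitly converted to an `Iterable`, which is what I expected to happen on the `Dataset` too.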
```scala
import org.apache.spark.sql.SparkSession

object Main {
  def parseRow(line: String): Option[Map[String, String]] = Some(Map())

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder.getOrCreate()
    ss.read.textFile("somewhere")
      .flatMap(parseRow _)
      .write.parquet("somewhere/else")
  }
}
```
Unfortunately, I got the following error:

```
[error] Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
[error]       .flatMap(parseRow _)
[error]        ^
```
I tried the fix suggested in the error message, i.e. I added `import ss.implicits._`, but it didn't change anything.
I think I might be in a situation similar to the one described here (i.e. for some reason the implicit conversion from `Option` to `TraversableOnce` is not being applied), but I'm not good enough at Scala to find a solution for my case.
I tried changing that line to `.flatMap(line => parseRow(line))` and to `.flatMap(parseRow)`, but neither worked.
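The only workaround I've come up with so far is supplying the encoder explicitly instead of relying on `ss.implicits._` (a sketch; I'm assuming a Kryo-serialized `Map[String, String]` is acceptable here, and I'm not sure this is the idiomatic fix):

```scala
import org.apache.spark.sql.{Encoders, SparkSession}

object Main {
  def parseRow(line: String): Option[Map[String, String]] = Some(Map())

  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder.getOrCreate()
    ss.read.textFile("somewhere")
      // Pass the Encoder[Map[String, String]] explicitly as flatMap's
      // implicit argument; Encoders.kryo stores the maps as binary blobs.
      .flatMap(parseRow _)(Encoders.kryo[Map[String, String]])
      .write.parquet("somewhere/else")
  }
}
```

Is there a cleaner way to get an encoder for the `Map` so the plain `flatMap(parseRow _)` compiles?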