
So I've got this example code where I have a Dataset[Event] which I would like to group based on a key of generic type computed on the fly.

import org.apache.spark.sql.{ Dataset, KeyValueGroupedDataset }

case class Event(id: Int, name: String)
trait Key
case class NameKey(name: String) extends Key

abstract class EventProc[K <: Key] {
  def key(e: Event): K
  def group(ds: Dataset[Event]): KeyValueGroupedDataset[K, Event] = {
    import ds.sparkSession.implicits._
    ds.groupByKey { e => key(e) }
  }
}
class NameEventProc extends EventProc[NameKey] {
  def key(e: Event): NameKey = NameKey(e.name)
}

The idea is that I should be able to do new NameEventProc().group(ds), or similar with a different class that extends EventProc. But the code does not compile; it fails with the following error:

<console>:26: error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
       ds.groupByKey { e => key(e) }
                     ^

From what I gathered, Spark is unable to determine what type K is and is hence unable to find the appropriate encoder. But I am not sure how to fix this.
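One common way to address this (a sketch only, untested against a live Spark session, and an assumption about what fits the design rather than the only fix) is to thread an Encoder[K] through the abstract class as a context bound, so the implicit is supplied at the point where the concrete type is known:

```scala
import org.apache.spark.sql.{ Dataset, Encoder, KeyValueGroupedDataset }

case class Event(id: Int, name: String)
trait Key
case class NameKey(name: String) extends Key

// The context bound [K : Encoder] adds an implicit Encoder[K] constructor
// parameter, which groupByKey can then resolve inside the class body.
abstract class EventProc[K <: Key : Encoder] {
  def key(e: Event): K
  def group(ds: Dataset[Event]): KeyValueGroupedDataset[K, Event] =
    ds.groupByKey(key _)
}

// The subclass takes the encoder implicitly too, so it is resolved at the
// call site, where spark.implicits._ is in scope and NameKey (a case
// class, i.e. a Product type) gets an encoder derived for it.
class NameEventProc(implicit ke: Encoder[NameKey]) extends EventProc[NameKey] {
  def key(e: Event): NameKey = NameKey(e.name)
}

// Usage, assuming a running SparkSession named `spark`:
// import spark.implicits._
// new NameEventProc().group(ds)
```

The key point is that the encoder must be resolvable where the concrete K is fixed (at NameEventProc's instantiation site), not inside the generic EventProc, where K is still abstract.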

aa8y
  • This is the [custom encoder](http://stackoverflow.com/questions/37706420/how-to-create-a-custom-encoder-in-spark-2-x-datasets) problem that has plagued Datasets. – Alec Nov 15 '16 at 00:03
  • 2
    Oops. Meant to link to [this](http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset) – Alec Nov 15 '16 at 01:01

0 Answers