Questions tagged [apache-spark-encoders]

54 questions
6 votes • 1 answer

How to join two Spark Datasets into one with Java objects?

I have a little problem joining two datasets in Spark. I have this: SparkConf conf = new SparkConf() .setAppName("MyFunnyApp") .setMaster("local[*]"); SparkSession spark = SparkSession .builder() .config(conf) …
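A minimal Scala sketch of the typed joinWith approach this question usually ends up at (the question itself is Java; User and Order here are hypothetical stand-ins for its beans):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// hypothetical stand-ins for the question's Java objects
case class User(id: Long, name: String)
case class Order(userId: Long, amount: Double)

object JoinSketch extends App {
  val spark = SparkSession.builder().appName("MyFunnyApp").master("local[*]").getOrCreate()
  import spark.implicits._

  val users: Dataset[User] = Seq(User(1L, "ann")).toDS()
  val orders: Dataset[Order] = Seq(Order(1L, 9.99)).toDS()

  // joinWith keeps both sides as typed objects instead of flattening to Row
  val joined: Dataset[(User, Order)] =
    users.joinWith(orders, users("id") === orders("userId"), "inner")

  joined.show()
  spark.stop()
}
```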
5 votes • 1 answer

How to implement Functor[Dataset]

I am struggling with how to create an instance of Functor[Dataset]... the problem is that when you map from A to B, the Encoder[B] must be in implicit scope, but I am not sure how to achieve that. implicit val datasetFunctor: Functor[Dataset] = new…
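The crux: cats' Functor.map signature provides no Encoder[B], while Dataset.map demands one, so a lawful Functor[Dataset] cannot be written directly. A minimal sketch of the encoder-aware alternative (the name fmap is hypothetical):

```scala
import org.apache.spark.sql.{Dataset, Encoder}

// Functor.map[A, B](fa)(f) carries no Encoder[B]; threading the constraint
// explicitly compiles, at the cost of not being a cats Functor instance
object DatasetFunctorLike {
  def fmap[A, B: Encoder](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
}
```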
5 votes • 1 answer

Spark implicit encoder not found in scope

I have a problem with Spark, already outlined in "spark custom kryo encoder not providing schema for UDF", but have now created a minimal sample: https://gist.github.com/geoHeil/dc9cfb8eca5c06fca01fc9fc03431b2f class SomeOtherClass(foo: Int) case class…
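A minimal sketch of the usual remedy, reusing the gist's SomeOtherClass: since it is not a case class, no implicit encoder is derived for it, so a Kryo-backed encoder has to be put into implicit scope by hand:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

class SomeOtherClass(val foo: Int)

// Encoders.kryo yields an Encoder[T] backed by a single binary column;
// making it implicit lets createDataset/map resolve it automatically
implicit val someOtherClassEncoder: Encoder[SomeOtherClass] =
  Encoders.kryo[SomeOtherClass]
```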
4 votes • 1 answer

Spark Encoders: when to use beans()

I came across a memory management problem while using Spark's caching mechanism. I am currently utilizing Encoders with Kryo and was wondering if switching to beans would help me reduce the size of my cached dataset. Basically, what are the pros and…
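A sketch of the trade-off with a hypothetical bean class: Encoders.bean keeps a real columnar schema (so cached data can be compressed per column), while Encoders.kryo stores one opaque binary blob per row:

```scala
import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

// hypothetical JavaBean-style class (no-arg constructor, getters/setters)
class MyBean {
  @BeanProperty var id: Long = 0L
  @BeanProperty var name: String = ""
}

val beanEnc = Encoders.bean(classOf[MyBean]) // schema: id BIGINT, name STRING
val kryoEnc = Encoders.kryo(classOf[MyBean]) // schema: a single BINARY column
```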
4 votes • 2 answers

How to create an Encoder for Scala collection (to implement custom Aggregator)?

Spark 2.3.0 with Scala 2.11. I'm implementing a custom Aggregator according to the docs here. The aggregator requires 3 types for input, buffer, and output. My aggregator has to act upon all previous rows in the window so I declared it like…
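A sketch of the pattern under assumed types (Long input and output, a Seq buffer): Spark derives no implicit encoder for a Scala collection buffer, so Encoders.kryo supplies it explicitly:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// hypothetical aggregator: collect all values seen so far, then sum them
val collectAndSum = new Aggregator[Long, Seq[Long], Long] {
  def zero: Seq[Long] = Seq.empty
  def reduce(buf: Seq[Long], in: Long): Seq[Long] = buf :+ in
  def merge(a: Seq[Long], b: Seq[Long]): Seq[Long] = a ++ b
  def finish(buf: Seq[Long]): Long = buf.sum
  def bufferEncoder: Encoder[Seq[Long]] = Encoders.kryo[Seq[Long]] // explicit: no implicit derivation for Seq
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
```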
4 votes • 2 answers

How to create a Dataset of Maps?

I'm using Spark 2.2 and am running into trouble when attempting to call spark.createDataset on a Seq of Map. Code and output from my Spark shell session follow: // createDataSet on Seq[T] where T = Int works scala> spark.createDataset(Seq(1, 2,…
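Spark 2.2 ships no implicit encoder for Map (one arrived in a later release). A sketch of the usual workaround, assuming the shell's spark session:

```scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// derive an encoder explicitly; ExpressionEncoder keeps a real MapType
// schema, unlike Encoders.kryo's opaque binary column
implicit val mapEncoder: Encoder[Map[String, Int]] = ExpressionEncoder()
val ds = spark.createDataset(Seq(Map("a" -> 1), Map("b" -> 2)))
```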
4 votes • 1 answer

Generic T as Spark Dataset[T] constructor

In the following snippet, the tryParquet function tries to load a Dataset from a Parquet file if it exists. If not, it computes, persists, and returns the Dataset plan which was provided: import scala.util.{Try, Success, Failure} import…
Jivan • 21,522 • 15 • 80 • 131
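A sketch of the usual fix for this pattern: give the generic function an Encoder[T] context bound so df.as[T] can resolve its implicit encoder (the exact parameter shape here is an assumption about the snippet):

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// without the Encoder[T] bound, df.as[T] cannot find an implicit encoder
def tryParquet[T: Encoder](spark: SparkSession, path: String)(compute: => Dataset[T]): Dataset[T] =
  Try(spark.read.parquet(path)) match {
    case Success(df) => df.as[T]          // cached result exists: reuse it
    case Failure(_) =>                    // otherwise compute, persist, return
      val ds = compute
      ds.write.parquet(path)
      ds
  }
```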
4 votes • 1 answer

Spark: java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate

I'm writing a Spark application using version 2.1.1. The following code gets an error when calling a method with a LocalDate parameter: Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate -…
ca9163d9 • 27,283 • 64 • 210 • 413
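The error is expected on 2.1.1: Spark 2.x has built-in encoders for java.sql.Date but not java.time.LocalDate. A Kryo-backed encoder is one hedge (it stores an opaque binary column; converting to java.sql.Date instead keeps a real DATE schema):

```scala
import java.time.LocalDate
import org.apache.spark.sql.{Encoder, Encoders}

// register a Kryo fallback encoder for the unsupported type
implicit val localDateEncoder: Encoder[LocalDate] = Encoders.kryo[LocalDate]
```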
4 votes • 1 answer

Spark Error: Unable to find encoder for type stored in a Dataset

I am using Spark on a Zeppelin notebook, and groupByKey() does not seem to be working. This code: df.groupByKey(row => row.getLong(0)) .mapGroups((key, iterable) => println(key)) gives me this error (presumably a compilation error, since it…
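The likely culprit: the lambda returns Unit (the result of println), and there is no Encoder[Unit]. A sketch of the fix, assuming the notebook's df and spark:

```scala
import spark.implicits._  // supplies Encoder[Long] for the key and result

// return the key (or any encodable value) instead of Unit
val keys = df.groupByKey(row => row.getLong(0))
  .mapGroups((key, _) => key)   // Dataset[Long]
keys.show()
```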
3 votes • 0 answers

DataType (UDT) vs. Encoder in Spark SQL

In Spark SQL, there are a limited number of DataTypes for schemas, and a limited number of Encoders for converting JVM objects to and from the internal Spark SQL representation. In practice, we may have errors like this regarding DataType, which usually happens in…
3 votes • 1 answer

Impossible to operate on a custom type after it is encoded? Spark Dataset

Say you have this (the approach to encoding a custom type is taken from this thread): // assume we handle a custom type class MyObj(val i: Int, val j: String) implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[MyObj] val ds =…
jack • 1,787 • 14 • 30
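A sketch of the distinction, reusing the question's MyObj and assuming the shell's spark session: with a Kryo encoder the Dataset has a single binary column, so typed lambdas work but untyped column expressions do not:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import spark.implicits._   // for the Encoder[Int] used by ds.map(_.i)

class MyObj(val i: Int, val j: String)
implicit val myObjEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

val ds = spark.createDataset(Seq(new MyObj(1, "a"), new MyObj(15, "b")))
ds.filter(_.i > 10)   // typed lambda: works, Spark deserializes each object
ds.map(_.i)           // typed lambda: works
// ds.select($"i")    // fails: the only real column is the Kryo binary blob
```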
3 votes • 1 answer

Question regarding Kryo and Java encoders in Datasets

I am using Spark 2.4 and referring to https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence Bean class: public class EmployeeBean implements Serializable { private Long id; private String name; private Long…
Dev • 13,492 • 19 • 81 • 174
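A sketch of the three encoder choices for the question's bean (the Scala @BeanProperty version here is a stand-in for the Java class): bean keeps a columnar schema, while the other two serialize each row to one binary value:

```scala
import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

class EmployeeBean extends Serializable {
  @BeanProperty var id: java.lang.Long = _
  @BeanProperty var name: String = _
}

val beanEnc = Encoders.bean(classOf[EmployeeBean])              // columnar: id, name
val kryoEnc = Encoders.kryo(classOf[EmployeeBean])              // one BINARY column (Kryo)
val javaEnc = Encoders.javaSerialization(classOf[EmployeeBean]) // one BINARY column (Java serialization)
```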
3 votes • 2 answers

How to make an Encoder for a Scala Iterable (Spark Dataset)

I'm trying to create a Dataset from an RDD y of type: y: RDD[(MyObj1, scala.Iterable[MyObj2])] So I created an explicit encoder: implicit def tuple2[A1, A2]( implicit e1: Encoder[A1], e2:…
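A sketch of composing the pieces, with hypothetical MyObj1/MyObj2 stand-ins: Encoders.tuple combines the two halves, and Kryo covers the Iterable for which Spark derives nothing implicitly:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// hypothetical stand-ins for the question's types
case class MyObj1(a: Int)
case class MyObj2(b: String)

implicit val obj1Encoder: Encoder[MyObj1] = Encoders.product[MyObj1]
implicit val iterEncoder: Encoder[Iterable[MyObj2]] = Encoders.kryo[Iterable[MyObj2]]

// compose the two halves, mirroring the question's implicit def
implicit def tuple2[A1, A2](implicit e1: Encoder[A1], e2: Encoder[A2]): Encoder[(A1, A2)] =
  Encoders.tuple(e1, e2)

// with these in scope, spark.createDataset(y) can resolve
// Encoder[(MyObj1, Iterable[MyObj2])]
```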
3 votes • 0 answers

spark custom kryo encoder not providing schema for UDF

When following along with "How to store custom objects in Dataset?" and trying to register my own Kryo encoder for a data frame, I face the issue: Schema for type com.esri.core.geometry.Envelope is not supported. There is a function which will parse a…
Georg Heiler • 16,916 • 36 • 162 • 292
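The root cause, sketched below (the esri-geometry-api dependency is assumed): a Kryo encoder maps the type to a single BINARY column, so Catalyst has no DataType to hand to a UDF returning Envelope; typed Dataset operations avoid the UDF schema lookup entirely:

```scala
import com.esri.core.geometry.Envelope   // assumes the esri-geometry-api dependency
import org.apache.spark.sql.{Encoder, Encoders}

// Kryo encoding gives Envelope no Catalyst schema, only opaque bytes;
// operate on it via typed map/filter rather than a SQL UDF
implicit val envelopeEncoder: Encoder[Envelope] = Encoders.kryo[Envelope]
```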
2 votes • 1 answer

Is there an Encoder for Map type in Java Spark?

I am trying to create a custom Aggregator function producing a Map as the result; however, it requires an Encoder. As referenced in https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoders.html, there isn't one for now. Does anyone…
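Correct as of the linked 2.1.0 docs: Encoders has no map factory. The usual hedge is a Kryo-backed encoder, shown here in Scala for a java.util.Map to match the Java setting:

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// opaque binary encoding; usable as an Aggregator's buffer or output encoder
implicit val mapEncoder: Encoder[java.util.Map[String, java.lang.Long]] =
  Encoders.kryo[java.util.Map[String, java.lang.Long]]
```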