
I am using a data type called Point(x: Double, y: Double). I am trying to use columns _c1 and _c2 as input to Point(), and then create a new column of Point values, as follows:

val toPoint = udf{(x: Double, y: Double) => Point(x,y)}

Then I call the function:

val point = data.withColumn("Point", toPoint(wanted("c1"), wanted("c2")))

However, when I declare the udf I get the following error:

java.lang.UnsupportedOperationException: Schema for type com.vividsolutions.jts.geom.Point is not supported
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:729)
      at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:728)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.immutable.List.foreach(List.scala:381)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.immutable.List.map(List.scala:285)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:728)
      at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
      at org.apache.spark.sql.functions$.udf(functions.scala:3084)
      ... 48 elided

I have properly imported this data type and used it many times before. But now that I try to include it in the schema of my udf, it isn't recognized. What is the method for including types other than the standard Int, String, Array, etc.?

I am using Spark 2.1.0 on Amazon EMR.

Here are some related questions I've referenced:

How to define schema for custom type in Spark SQL?

Spark UDF error - Schema for type Any is not supported


1 Answer


You should define Point as a case class:

case class Point(x: Double, y: Double)

or, if you wish,

case class MyPoint(x: Double, y: Double) extends com.vividsolutions.jts.geom.Point(x, y)

This way, the schema is inferred automatically by Spark.
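
For example, here is a minimal sketch of the case-class approach (Spark 2.1; the DataFrame name data and the column names c1 and c2 are taken from the question and assumed to hold doubles):

import org.apache.spark.sql.functions.udf

// A plain case class: Spark SQL can derive a struct<x: double, y: double> schema for it.
case class Point(x: Double, y: Double)

val toPoint = udf { (x: Double, y: Double) => Point(x, y) }

// The new "Point" column is a struct with fields x and y,
// e.g. point.select("Point.x", "Point.y") works afterwards.
val point = data.withColumn("Point", toPoint(data("c1"), data("c2")))

Note that this Point is a plain value class carrying only the two coordinates, not the JTS geometry type from the error message.
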

– Raphael Roth
  • Doing this I get the following error: `case class myPoint has case ancestor geotrellis.vector.Point, but case-to-case inheritance is prohibited. To overcome this limitation, use extractors to pattern match on non-leaf nodes.` – user306603 May 22 '17 at 18:22