I am using Spark MLlib to generate predictions for my data and then store them in HDFS in Avro format:
```scala
val dataPredictions = myModel.transform(myData)
val output = dataPredictions.select("is", "probability", "prediction")
output.write.format("com.databricks.spark.avro").save(path)
```
I am getting the following exception:
```
com.databricks.spark.avro.SchemaConverters$IncompatibleSchemaException:
Unexpected type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7.
```
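My understanding is that the 'probability' column, which holds an ML Vector stored as a VectorUDT, cannot be serialized as Avro. (The 'prediction' column is a plain double and should not be the problem.)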
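One way to confirm which columns the Avro writer will reject is to look for vector-typed fields in the DataFrame schema. This is a minimal sketch, assuming Spark 2.x or later, where `org.apache.spark.ml.linalg.SQLDataTypes.VectorType` is the public handle for the vector UDT; the object and method names are hypothetical.

```scala
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.DataFrame

// Hypothetical helper: lists top-level columns whose type is the ML vector UDT,
// i.e. the columns the spark-avro schema converter cannot map to an Avro type.
object VectorColumns {
  def vectorColumns(df: DataFrame): Seq[String] =
    df.schema.fields
      .filter(_.dataType == SQLDataTypes.VectorType)
      .map(_.name)
      .toSeq
}
```

For example, `VectorColumns.vectorColumns(output)` can be printed before calling `write`.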
- How do I convert a VectorUDT into an Array so that I can serialize it in Avro?
- Are there any better alternatives? (I can't move away from the Avro format.)
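A common workaround is to replace each vector column with a plain `array<double>` column before writing, since Avro has a native array type. This is a sketch under stated assumptions: it uses a UDF calling `Vector.toArray` (which densifies sparse vectors), so it works on Spark 2.x; the names `VectorToArray` and `convert` are hypothetical. On Spark 3.0+, the built-in `org.apache.spark.ml.functions.vector_to_array` does the same conversion without a custom UDF.

```scala
import org.apache.spark.ml.linalg.{SQLDataTypes, Vector}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper: rewrites every top-level ML vector column as array<double>.
object VectorToArray {
  // Dense or sparse vector -> dense Array[Double]; a null vector stays null.
  private val toArray = udf((v: Vector) => if (v == null) null else v.toArray)

  // Fold over the vector-typed fields, replacing each column in place (same name).
  def convert(df: DataFrame): DataFrame =
    df.schema.fields
      .filter(_.dataType == SQLDataTypes.VectorType)
      .foldLeft(df)((acc, field) => acc.withColumn(field.name, toArray(col(field.name))))
}
```

With this in place, the original write becomes `VectorToArray.convert(output).write.format("com.databricks.spark.avro").save(path)`. Converting generically rather than naming 'probability' also covers other vector columns such as 'rawPrediction' if they are selected later.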