I am trying to run a sample application locally using Scala 2.11 and Spark 2.3.0, with the StreamSets API version 3.8.0.
(I am trying to run a Spark transformer as described in this tutorial: https://github.com/streamsets/tutorials/blob/master/tutorial-spark-transformer-scala/readme.md)
First I create a JavaRDD[Record], something like:
val testrecord = spark.read.json("...path to json file").toJavaRDD.asInstanceOf[JavaRDD[Record]]
Then I pass this JavaRDD[Record] to the transform method of the DTStream class:
new DTStream().transform(testrecord)
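For reference, the complete local driver is roughly the following sketch (the object name, master setting, and app name are placeholders I am adding here; the JSON path is elided as above):

import org.apache.spark.api.java.JavaRDD
import org.apache.spark.sql.SparkSession

import com.streamsets.pipeline.api.Record

object LocalRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dtstream-local-test")
      .getOrCreate()

    // Read the sample JSON file and cast the resulting JavaRDD to JavaRDD[Record],
    // exactly as in the snippet above.
    val testrecord = spark.read.json("...path to json file")
      .toJavaRDD
      .asInstanceOf[JavaRDD[Record]]

    // Hand the RDD to the transformer.
    new DTStream().transform(testrecord)
  }
}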
The transform method in the DTStream class itself is very simple:
override def transform(javaRDD: JavaRDD[Record]): TransformResult = {
  val recordRDD = javaRDD.rdd
  val resultMessage = recordRDD.map((record) => record) // Just passing the incoming record through as the outgoing record - no transformation at all.
  new TransformResult(resultMessage.toJavaRDD, error) // where error is already defined as a JavaPairRDD.
}
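In case it is relevant, the rest of the class is essentially this sketch. I am assuming the SparkTransformer and TransformResult types from the linked tutorial (package com.streamsets.pipeline.spark.api) and that error is a JavaPairRDD[Record, String] built from an empty RDD in init; treat those details as an approximation of my setup:

import java.util

import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}

import com.streamsets.pipeline.api.Record
import com.streamsets.pipeline.spark.api.{SparkTransformer, TransformResult}

class DTStream extends SparkTransformer with Serializable {

  // Empty "error" pair RDD that is returned alongside every TransformResult.
  var error: JavaPairRDD[Record, String] = _

  override def init(context: JavaSparkContext, parameters: util.List[String]): Unit = {
    error = JavaPairRDD.fromJavaRDD(context.emptyRDD[(Record, String)])
  }

  override def transform(javaRDD: JavaRDD[Record]): TransformResult = {
    val recordRDD = javaRDD.rdd
    // Pass each incoming record straight through, as shown above.
    val resultMessage = recordRDD.map((record) => record)
    new TransformResult(resultMessage.toJavaRDD, error)
  }
}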
When I try this simple code out, I get the following exception, thrown exactly at this line:
val resultMessage = recordRDD.map((record) => record)
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record.
Any pointers as to why I may be getting this and how to resolve it? Thanks in advance.
Note: Record here is the StreamSets datacollector-api Record interface: https://github.com/streamsets/datacollector-api/blob/master/src/main/java/com/streamsets/pipeline/api/Record.java
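For what it's worth, the sort of per-record work I eventually want to do inside the map is along these lines; the "/name" field path is made up purely for illustration:

import com.streamsets.pipeline.api.{Field, Record}

// Hypothetical per-record step: read a field from the incoming record
// and pass the record through unchanged.
def inspect(record: Record): Record = {
  // Record.get(fieldPath) returns the Field at that path ("/name" is invented).
  val name: Field = record.get("/name")
  println(s"name field: $name")
  record
}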