
I am trying to run a sample application locally using Scala (2.11) and Spark (2.3.0) with the StreamSets API, version 3.8.0.

(I am trying to run a spark transformation as described in this tutorial: https://github.com/streamsets/tutorials/blob/master/tutorial-spark-transformer-scala/readme.md )

First I create a JavaRDD[Record], something like:

val testrecord = spark.read.json("...path to json file").toJavaRDD.asInstanceOf[JavaRDD[Record]]

Then I pass this JavaRDD[Record] to the transform method in the DTStream class:

new DTStream().transform(testrecord)

The transform method in the DTStream class itself is very simple:

override def transform(javaRDD: JavaRDD[Record]): TransformResult = {
  val recordRDD = javaRDD.rdd
  // Pass each incoming record through as the outgoing record; no transformation at all.
  val resultMessage = recordRDD.map((record) => record)
  new TransformResult(resultMessage.toJavaRDD, error) // error is already defined as a JavaPairRDD
}

When I try this simple code out, I am getting the following exception exactly at this line:

val resultMessage = recordRDD.map((record) => record)
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record.

Any pointers as to why I may be getting this and how to resolve it? Thanks in advance.

Note: Record is datacollector-api/Record : https://github.com/streamsets/datacollector-api/blob/master/src/main/java/com/streamsets/pipeline/api/Record.java

blueberret

1 Answer


I don't think you can run the sample application in an IDE - you have to do so within StreamSets Data Collector itself as detailed in the tutorial.
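
As for the exception itself when run locally: spark.read.json produces a DataFrame whose elements are Row objects, and asInstanceOf[JavaRDD[Record]] does not convert them. Because the element type is erased at runtime, the cast succeeds and the failure only shows up once an element is actually used as a Record. The sketch below reproduces that behaviour with a plain List and without Spark; FakeRow, FakeRecord, and ErasureDemo are hypothetical stand-ins, not StreamSets or Spark classes.

```scala
// Hypothetical stand-ins for illustration only.
final case class FakeRow(json: String)     // plays the role of a Spark Row
trait FakeRecord { def header: String }    // plays the role of com.streamsets.pipeline.api.Record

object ErasureDemo {
  // The cast compiles and runs without error: the element type is erased at
  // runtime, so the JVM only checks that the value is a List.
  def uncheckedCast(rows: List[FakeRow]): List[FakeRecord] =
    rows.asInstanceOf[List[FakeRecord]]

  // Using an element as a FakeRecord is what finally triggers the failure.
  def firstHeader(records: List[FakeRecord]): String =
    records.head.header

  def main(args: Array[String]): Unit = {
    val records = uncheckedCast(List(FakeRow("{}")))
    println(s"cast succeeded, size = ${records.size}")
    try firstHeader(records)
    catch {
      case e: ClassCastException => println(s"failed on use: ${e.getMessage}")
    }
  }
}
```

If that is what is happening here, the JSON rows would need to be turned into real Record instances rather than cast, which is something Data Collector does for you when the transformer runs inside a pipeline.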

metadaddy
  • We tried that first: we dropped the Spark transformer jar into the StreamSets transformer pipeline, but we got the same exception, so we tried recreating it in an IDE. Is there any reason at all the transform method code could throw java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to com.streamsets.pipeline.api.Record? – blueberret Sep 24 '19 at 08:34