I have a Spark streaming application that listens to a Kafka topic.
When data arrives, I need to process it and write it to Kudu.
Currently I am using the org.apache.kudu.spark.kudu.KuduContext API
and calling its insert action with a DataFrame.
To create the DataFrame from my data, I currently call collect()
so that I can build the DataFrame with sqlContext.
Is there a way to create the DataFrame, or insert the data into Kudu, without calling collect(),
which is costly because it pulls every record back to the driver?
We are using Spark 1.6.
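
To make the question concrete, here is a rough sketch of the shape I am hoping is possible: building the DataFrame from each micro-batch RDD inside foreachRDD, so the records stay distributed. It is a sketch under assumptions, not tested code. The two-column schema, the (String, String) record type, the table name parameter, and the writeToKudu helper are all hypothetical placeholders, and I am assuming insertRows(DataFrame, String) is the insert call on KuduContext.

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.streaming.dstream.DStream

object KuduSinkSketch {

  // Hypothetical schema: replace with the columns of the real Kudu table.
  val schema: StructType = StructType(Seq(
    StructField("key", StringType, nullable = false),
    StructField("value", StringType, nullable = true)
  ))

  // Hypothetical helper: `stream` stands in for the already-processed Kafka
  // records, and `kuduContext` is the existing KuduContext instance.
  def writeToKudu(stream: DStream[(String, String)],
                  kuduContext: KuduContext,
                  tableName: String): Unit = {
    stream.foreachRDD { rdd: RDD[(String, String)] =>
      if (!rdd.isEmpty()) {
        // Reuse a single SQLContext per SparkContext (available since Spark 1.5).
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)

        // Map each record to a Row on the executors; nothing is collected.
        val rows: RDD[Row] = rdd.map { case (k, v) => Row(k, v) }

        // Build the DataFrame directly from the distributed RDD.
        val df = sqlContext.createDataFrame(rows, schema)

        // Assumed KuduContext insert call taking a DataFrame and a table name.
        kuduContext.insertRows(df, tableName)
      }
    }
  }
}
```

The idea is that createDataFrame accepts an RDD[Row] plus a schema, so a local collection on the driver should not be needed.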