I have a SparkStreaming context reading a stream from RabbitMQ with a batch interval of 30 seconds. I want to modify the values of a few columns of the corresponding rows that already exist in Cassandra and then store the data back to Cassandra. For that I need to check whether a row for the particular primary key exists in Cassandra; if it does, fetch it and do the necessary operation. The problem is that I create the StreamingContext on the driver while the actions get performed on the workers, so they cannot get hold of the StreamingContext object, because it is not serialized and sent to the workers, and I get this error:
java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext
I also know that we cannot access the StreamingContext inside foreachRDD. But how do I achieve the same functionality here without getting a serialization error?
I have looked at a few examples here, but they didn't help.
Here is a snippet of the code:
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._   // for ssc.cassandraTable

val ssc = new StreamingContext(sparkConf, Seconds(30))
val receiverStream = RabbitMQUtils.createStream(ssc, rabbitParams)
receiverStream.start()
val lines = receiverStream.map(EventData.fromString(_))

lines.foreachRDD { rdd =>
  if (rdd.toLocalIterator.nonEmpty) {
    rdd.foreachPartition { partition =>
      for (tuple <- partition) {
        val cookieid  = tuple.cookieid
        val sessionid = tuple.sessionid
        val logdate   = tuple.logdate
        // this is the call that fails: ssc only exists on the driver
        val EventRows = ssc.cassandraTable("SparkTest", CassandraTable)
          .select("*")
          .where("cookieid = '" + cookieid + "' and logdate = '" + logdate +
                 "' and sessionid = '" + sessionid + "'")
        // some logic to check whether the row exists or not for this cookieid
      }
    }
  }
}
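
The only workaround I can think of (not sure it is the right one) is to stop referencing ssc inside the closure and instead create a CassandraConnector on the driver, since the connector itself is serializable, and do the per-key lookup through a plain session per partition. This is just a sketch of what I mean; the keyspace/table names are copied from my snippet above and I am assuming cookieid, logdate and sessionid can be bound as-is:

import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector is serializable, so it can be created on the driver
// and captured by the closures that run on the workers
val connector = CassandraConnector(sparkConf)

lines.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // open one session per partition instead of one per record
    connector.withSessionDo { session =>
      for (tuple <- partition) {
        val rs = session.execute(
          s"""SELECT * FROM "SparkTest".$CassandraTable WHERE cookieid = ? AND logdate = ? AND sessionid = ?""",
          tuple.cookieid, tuple.logdate, tuple.sessionid)
        val existing = rs.one()   // null when no row exists for this primary key
        if (existing != null) {
          // modify the columns I need here and write the row back with an UPDATE/INSERT
        }
      }
    }
  }
}

Is something along these lines the recommended way to do the per-key lookup, or should I be using something like joinWithCassandraTable on the RDD instead?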