I have a use case where we are streaming events and for each event I have to do some lookups. The Lookups are in Redis and I am wondering what is the best way to create the connections. The spark streaming would run 40 executors and I have 5 such Streaming jobs all connecting to same Redis Cluster. So I am confused what approach should I be taking to create the Redis connection
Create a connection object on the driver and broadcast it to the executors ( Not sure if it really works as I have to make this object Serializable). Can I do this with broadcast variables?
Create a Redis connection for each partition, however I have the code written this way
val update = xyz.transform(rdd => { // on driver if (xyz.isNewDay) { ..... } rdd }) update.foreachRDD(rdd => { rdd.foreachPartition(partition => { partition.foreach(Key_trans => { // perform some lookups logic here } } })
So now if i create a connection inside each partition it would mean that for every RDD and for each partition in that RDD I would be creating a new connection.
Is there a way i can maintain one connection for each partition and cache that object so that I would not have to create connections again and again?
I can add more context/info if required.