I'm using Spark Structured Streaming to consume Kafka messages and save the data to Redis. The data is written through a Redis sink that extends ForeachWriter[org.apache.spark.sql.Row]. The code runs fine, but only a little over 100 records per second make it into Redis. Is there a better way to speed this up?

Code like the version below connects to and disconnects from the Redis server on every micro-batch. Is there a way to connect just once and keep the connections alive, to minimize the connection cost, which I suspect is the main source of the slowdown? I tried broadcasting the client, but neither Jedis nor JedisPool is serializable, so that didn't work.
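The connect-once setup I have in mind is a per-executor singleton, since a Scala object is initialized lazily on each JVM instead of being serialized from the driver. A rough sketch of what I mean (ExecutorRedisPool and the host/port are placeholders for illustration, not code from my project):

import redis.clients.jedis.{Jedis, JedisPool}

// Hypothetical per-executor holder: the lazy val is built once per JVM the
// first time a task touches it, so nothing has to be broadcast or serialized.
object ExecutorRedisPool {
  lazy val pool: JedisPool = new JedisPool("localhost", 6379) // placeholder host/port

  def withJedis[T](f: Jedis => T): T = {
    val jedis = pool.getResource
    try f(jedis) finally jedis.close() // close() hands the connection back to the pool
  }
}

Even with a holder like that, though, ForeachWriter still calls open() and close() for every partition of every micro-batch, so I'm not sure it addresses the whole cost.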
My sink code is below:
import org.apache.spark.sql.{ForeachWriter, Row}
import redis.clients.jedis.Jedis

class StreamDataSink extends ForeachWriter[Row] {

  var jedis: Jedis = _

  // Called once per partition per micro-batch, so a connection is
  // taken from the pool for every epoch.
  override def open(partitionId: Long, version: Long): Boolean = {
    if (null == jedis) {
      jedis = FPCRedisUtils.getPool.getResource
    }
    true
  }

  override def process(record: Row): Unit = {
    if (0 == record(3)) {
      jedis.select(Constants.REDIS_DATABASE_INDEX)
      if (jedis.exists("counter")) {
        jedis.incr("counter")
      } else {
        jedis.set("counter", 1.toString)
      }
    }
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (null != jedis) {
      jedis.close()
      jedis.disconnect()
    }
  }
}
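One direction I've been considering is batching the commands with a Jedis pipeline, so each partition does a single network round trip in close() instead of one per record. A rough, untested sketch (PipelinedRedisSink is just an illustrative name; FPCRedisUtils and Constants are my own helpers from above). It also drops the exists/set branch, since Redis INCR initializes a missing key to 0 before incrementing:

import org.apache.spark.sql.{ForeachWriter, Row}
import redis.clients.jedis.{Jedis, Pipeline}

class PipelinedRedisSink extends ForeachWriter[Row] {

  private var jedis: Jedis = _
  private var pipeline: Pipeline = _

  override def open(partitionId: Long, version: Long): Boolean = {
    jedis = FPCRedisUtils.getPool.getResource
    jedis.select(Constants.REDIS_DATABASE_INDEX) // select the DB once per partition
    pipeline = jedis.pipelined()                 // buffer commands client-side
    true
  }

  override def process(record: Row): Unit = {
    if (0 == record(3)) {
      // INCR creates a missing key at 0 first, so no exists/set branch is needed
      pipeline.incr("counter")
    }
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (pipeline != null) pipeline.sync() // flush all buffered commands in one round trip
    if (jedis != null) jedis.close()      // return the connection to the pool
  }
}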
Any suggestions would be appreciated.