I am writing a Spark Streaming job that consumes data from Kafka and writes it to an RDBMS. I am currently stuck because I do not know which would be the most efficient way to store this streaming data in the RDBMS.
On searching, I found a few methods:

- Using a `DataFrame`
- Using `JdbcRDD`
- Creating a connection and a `PreparedStatement` inside `foreachPartition()` of the RDD, and inserting with `PreparedStatement.addBatch()` / `executeBatch()`
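For reference, here is a minimal sketch of the third option. The JDBC URL, credentials, table name, and record type are all hypothetical placeholders, and this assumes a plain `DStream[(String, String)]`:

```scala
import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

// Sketch: open one connection per partition (not per record),
// accumulate rows with addBatch(), then flush with executeBatch().
def writeToRdbms(stream: DStream[(String, String)]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // Hypothetical connection details - replace with your own.
      val conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost/mydb", "user", "pass")
      val stmt = conn.prepareStatement(
        "INSERT INTO events (key, value) VALUES (?, ?)")
      try {
        partition.foreach { case (k, v) =>
          stmt.setString(1, k)
          stmt.setString(2, v)
          stmt.addBatch() // accumulate; do not execute per row
        }
        stmt.executeBatch() // one round trip per partition
      } finally {
        stmt.close()
        conn.close()
      }
    }
  }
}
```

If the data is already a `DataFrame` (e.g. with Structured Streaming), `df.write.jdbc(url, table, props)` inside `foreachBatch` is a common alternative to hand-rolled JDBC.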
I cannot figure out which of these would be the most efficient way to achieve my goal.
The same question applies to storing and retrieving data from HBase.
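For comparison, the same per-partition batching pattern with the HBase client API might look like the sketch below; the table name, column family, and qualifier are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

// Sketch: one HBase connection per partition, one batched put() call.
def writeToHBase(rdd: RDD[(String, String)]): Unit = {
  rdd.foreachPartition { partition =>
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("events")) // hypothetical table
    try {
      val puts = partition.map { case (k, v) =>
        new Put(Bytes.toBytes(k))
          .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(v))
      }.toList.asJava
      table.put(puts) // batched mutations instead of per-record calls
    } finally {
      table.close()
      conn.close()
    }
  }
}
```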
Can anyone help me with this?