
I am writing a Spark Streaming job that consumes data from Kafka and writes it to an RDBMS. I am currently stuck because I do not know which is the most efficient way to store this streaming data in an RDBMS.

While searching, I found a few approaches:

  1. Using DataFrame
  2. Using JdbcRDD
  3. Creating a connection and a PreparedStatement inside foreachPartition() of the RDD and using PreparedStatement.addBatch() / executeBatch() (see the sketch after this list)
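
For example, something along these lines is what I mean by option 3. This is only a rough sketch; the JDBC URL, table name `events`, and record type `(Int, String)` are placeholders:

    import java.sql.DriverManager

    import org.apache.spark.rdd.RDD

    def saveToRdbms(rdd: RDD[(Int, String)]): Unit = {
      rdd.foreachPartition { records =>
        // Connection and statement are created per partition, on the executor,
        // so they never have to be serialized from the driver.
        val conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/mydb", "user", "password")
        conn.setAutoCommit(false)
        val stmt = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")
        try {
          records.foreach { case (id, payload) =>
            stmt.setInt(1, id)
            stmt.setString(2, payload)
            stmt.addBatch()        // accumulate rows into a JDBC batch
          }
          stmt.executeBatch()      // one round trip for the whole partition
          conn.commit()
        } finally {
          stmt.close()
          conn.close()
        }
      }
    }

I would call this from foreachRDD() on the DStream, e.g. `stream.foreachRDD(rdd => saveToRdbms(rdd))`.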

I cannot figure out which of these would be the most efficient way to achieve my goal.

The same question applies to storing and retrieving data from HBase (a rough sketch of what I have in mind is below).
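
For HBase, the closest analogue I can think of would be something like the following, again only a sketch: the table name, column family, and record type are placeholders, and I am not sure this is the right approach either:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    import scala.collection.JavaConverters._

    def saveToHBase(rdd: RDD[(String, String)]): Unit = {
      rdd.foreachPartition { records =>
        // One HBase connection per partition, created on the executor
        val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = connection.getTable(TableName.valueOf("events"))
        try {
          val puts = records.map { case (rowKey, value) =>
            new Put(Bytes.toBytes(rowKey))
              .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(value))
          }.toList
          table.put(puts.asJava)   // batched puts for the whole partition
        } finally {
          table.close()
          connection.close()
        }
      }
    }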

Can anyone help me with this?

