
I wanted to know what effect the batchsize option has on an insert operation when using Spark JDBC. Does it mean a single bulk insert using one INSERT command, or a batch of individual INSERT commands that gets committed at the end?

Could someone clarify? This is not clearly explained in the documentation.

thebluephantom
justlikethat

1 Answer


According to the source code, the batchsize option controls how often the executeBatch method of java.sql.PreparedStatement is called; executeBatch submits a batch of queued commands to the database for execution in one round trip.

The key code:

val stmt = conn.prepareStatement(insertStmt)
while (iterator.hasNext) {
  // (row-binding code elided here: the current row's values are set on stmt)
  stmt.addBatch()
  rowCount += 1
  if (rowCount % batchSize == 0) {
    stmt.executeBatch()   // flush a full batch to the database
    rowCount = 0
  }
}

if (rowCount > 0) {
  stmt.executeBatch()     // flush the final, partial batch
}
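Stripped of the Spark internals, the same pattern with plain JDBC looks like the sketch below. The connection URL, table, columns, and sample rows are all made up for illustration, and a real run needs a live database and driver on the classpath:

```scala
import java.sql.DriverManager

// Hypothetical connection details -- replace with your own.
val conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", "user", "pass")
val stmt = conn.prepareStatement("INSERT INTO people (name, age) VALUES (?, ?)")

val batchSize = 1000
var rowCount = 0
for ((name, age) <- Seq(("alice", 30), ("bob", 25))) {
  stmt.setString(1, name)
  stmt.setInt(2, age)
  stmt.addBatch()            // queue the command client-side; nothing is sent yet
  rowCount += 1
  if (rowCount % batchSize == 0) {
    stmt.executeBatch()      // one round trip executes the whole queued batch
    rowCount = 0
  }
}
if (rowCount > 0) stmt.executeBatch()  // flush whatever is left

stmt.close()
conn.close()
```

So batchsize only determines how many commands are queued before each executeBatch round trip; it does not merge them into a single INSERT statement.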

Back to your question: yes, what actually runs is

a batch of insert commands

But "gets committed at the end" is not guaranteed, because it is possible for only part of those inserts to execute successfully; executeBatch itself imposes no extra transaction requirements. By the way, Spark adopts the default isolation level if one is not specified.
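For reference, both options are set on the DataFrame writer. A minimal sketch, where the URL, table name, and credentials are placeholders:

```scala
// Sketch only: connection details below are hypothetical.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "people")
  .option("user", "user")
  .option("password", "pass")
  .option("batchsize", "1000")                 // rows queued per executeBatch round trip
  .option("isolationLevel", "READ_COMMITTED")  // transaction isolation for the writes
  .mode("append")
  .save()
```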

chenzhongpu
  • Do you mean that if we set a batchsize of 500, each batch gets committed on its own, in its own transaction, instead of all batches being committed at once in one single database transaction? – Nikunj Kakadiya Jun 08 '22 at 10:17