I am currently working on an Azure Databricks notebook that reads files from a storage container into a DataFrame and then writes all the records to a table in MySQL. A file can have anywhere from 2 million to 10 million rows. I have the following write code in my notebook, after using read to populate my DataFrame.
newFile.repartition(16).write
  .format("jdbc")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .mode("append")
  .option("url", s"${jdbc_url}")
  .option("dbtable", pathToDBTable)
  .option("user", s"user")
  .option("password", s"pass")
  .save()
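For context, the DataFrame above comes from a plain file read along these lines. I am showing CSV with a placeholder abfss path just as an example; the real format and path differ, and the read side is not what seems slow:

val newFile = spark.read
  .format("csv")               // assuming CSV here purely for illustration
  .option("header", "true")
  .option("inferSchema", "true")
  .load("abfss://container@storageaccount.dfs.core.windows.net/path/to/file")  // placeholder path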
I have played around with the number of partitions and decided to go with 16 because my cluster will have 16 cores. Other than that, is there a faster way to insert all this data into my DB using write? Or are there any other approaches worth trying within Azure Databricks notebooks? It currently takes 10-15 minutes for a 2-million-row file.
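For concreteness, this is the kind of option tuning I have been wondering about but have not benchmarked: turning on the MySQL driver's rewriteBatchedStatements flag in the JDBC URL and raising Spark's batchsize option (default 1000 rows per batch). The 10000 value and the tunedUrl name below are just illustrative, and I am assuming jdbc_url has no query string yet:

// Same write as above, with batching options added (values are guesses, not tuned)
val tunedUrl = s"${jdbc_url}?rewriteBatchedStatements=true"  // lets Connector/J collapse batched INSERTs

newFile.repartition(16).write
  .format("jdbc")
  .option("driver", "com.mysql.cj.jdbc.Driver")
  .mode("append")
  .option("url", tunedUrl)
  .option("dbtable", pathToDBTable)
  .option("user", s"user")
  .option("password", s"pass")
  .option("batchsize", "10000")      // rows Spark sends per JDBC batch; 10000 is just an example
  .save()

Would something along these lines be the right direction, or is a different approach (e.g. not going through the DataFrame JDBC writer at all) usually faster for loads of this size?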