
Do you know if the following line can handle a JDBC connection pool:

df.write
  .mode("append")
  .jdbc(url, table, prop)

Do you have any idea? Thanks

Jacek Laskowski
a.moussa

1 Answer


I don't think so.

spark.read.jdbc requests Spark SQL to create a JDBCRelation. Eventually buildScan is executed, which in turn calls JDBCRDD.scanTable, which leads to JdbcUtils.createConnectionFactory(options) for the JDBCRDD.

There you see driver.connect(options.url, options.asConnectionProperties), so unless the driver itself handles connection pooling, Spark SQL does not do it.
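
For illustration, a heavily simplified sketch of what that factory boils down to (the real code lives in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils and its signatures differ):

import java.sql.{Connection, Driver, DriverManager}
import java.util.Properties

// Simplified sketch of JdbcUtils.createConnectionFactory: every task that
// reads a partition invokes the factory and gets a brand-new, plain JDBC
// connection straight from the driver -- there is no pool in between.
def createConnectionFactory(url: String, props: Properties): () => Connection =
  () => {
    val driver: Driver = DriverManager.getDriver(url)
    driver.connect(url, props) // one fresh physical connection per task
  }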

(just noticed that you asked another question)

df.write.jdbc is similar. It goes through JdbcUtils again and ends up in the same createConnectionFactory, so writes do not get pooling either.
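
The write side can be sketched the same way (again simplified; the actual work happens per partition inside JdbcUtils.saveTable):

// Simplified sketch of the write path: every partition task calls the same
// connection factory, so an N-partition write opens N plain connections,
// none of them pooled.
df.rdd.foreachPartition { rows =>
  val conn = createConnectionFactory(url, props)() // fresh connection per task
  try {
    // batch the rows into INSERT statements executed over conn
  } finally {
    conn.close()
  }
}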

Jacek Laskowski
  • I've been looking for something that can handle a connection pool in Spark Structured Streaming. Any suggestions? – Hao Wang Aug 21 '18 at 02:25
  • @HaoWang What would be the use case? Can you describe it in another question? – Jacek Laskowski Aug 21 '18 at 12:18
  • @JacekLaskowski I have a similar use case with Spark Structured Streaming where I am reading from Kafka, doing some processing, and finally writing each micro-batch to Oracle inside a foreachBatch loop. Right now a new connection is created for each task and I am wondering how I can avoid that. – conetfun May 04 '20 at 05:25
  • @conetfun Use Scala's `object` feature to manage a pool of connections. The executor's JVM loads the object once, so a single pool is shared by every task scheduled on that executor; the number of connection pools then equals the number of executors (see the sketch after these comments). – Jacek Laskowski May 04 '20 at 07:39
  • @JacekLaskowski Thank you. Do you have any reference article or code to refer to? Sorry, but I am quite new to Scala and to programming in general. – conetfun May 04 '20 at 17:21
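
For later readers, here is a minimal sketch of the `object`-based pool described in the comments above, assuming HikariCP as the pooling library. HikariCP itself, the JDBC URL, the credentials, and the names ConnectionPool, writeBatch, and streamingDF are illustrative assumptions, not part of the answer:

import java.sql.Connection
import org.apache.spark.sql.DataFrame
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// A Scala `object` is a singleton per JVM, so each executor creates this
// pool once (lazily, on first use) and all tasks on that executor share it.
object ConnectionPool {
  private lazy val dataSource: HikariDataSource = {
    val config = new HikariConfig()
    config.setJdbcUrl("jdbc:oracle:thin:@//db-host:1521/service") // hypothetical URL
    config.setUsername("user")     // placeholder credentials
    config.setPassword("password")
    config.setMaximumPoolSize(5)   // per-executor pool size
    new HikariDataSource(config)
  }

  def borrow(): Connection = dataSource.getConnection()
}

// Writes one micro-batch; foreachPartition runs on the executors, where
// ConnectionPool is loaded locally, so no connection object is serialized.
def writeBatch(batch: DataFrame, batchId: Long): Unit =
  batch.rdd.foreachPartition { rows =>
    val conn = ConnectionPool.borrow() // reused from the executor-local pool
    try {
      rows.foreach { row =>
        // execute an INSERT for this row over conn (omitted in this sketch)
        ()
      }
    } finally {
      conn.close() // returns the connection to the pool rather than closing the socket
    }
  }

streamingDF.writeStream
  .foreachBatch(writeBatch _)
  .start()

Passing a named method (writeBatch _) instead of an inline lambda sidesteps the foreachBatch overload ambiguity that Scala 2.12 users otherwise run into.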