Using the example from https://github.com/sutugin/spark-streaming-jdbc-source I've attempted to connect to a Postgres database as a streaming source in AWS Databricks.
I have a cluster running Databricks Runtime 11.3 LTS (Apache Spark 3.3.0, Scala 2.12).
This library is installed on my cluster: org.apache.spark:spark-streaming_2.12:3.3.2
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession
  .builder
  .appName("StructuredJDBC")
  .getOrCreate()

import spark.implicits._

val jdbcOptions = Map(
  "user" -> "myusername",
  "password" -> "mypassword",
  "database" -> "testDB",
  "driver" -> "org.postgresql.Driver",
  // standard PostgreSQL JDBC URL form: jdbc:postgresql://host:port/database
  "url" -> "jdbc:postgresql://dbhostname:5432/testDB"
)
// Create a DataFrame representing the stream of input rows from JDBC
val stream = spark.readStream
  .format("jdbc-streaming")
  .options(jdbcOptions + ("dbtable" -> "dimensions_test_table") + ("offsetColumn" -> "loaded_timestamp"))
  .load()

// Start running the query that prints the results to the console
val query = stream.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()
But I'm plagued with this error:

NoClassDefFoundError: org/apache/spark/sql/sources/v2/StreamWriteSupport
Caused by: ClassNotFoundException: org.apache.spark.sql.sources.v2.StreamWriteSupport
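For what it's worth, the missing class doesn't seem to exist on the runtime at all. A quick probe from a notebook cell (a minimal sketch, not specific to this library) reproduces the ClassNotFoundException:

// Probe for the class the library expects at load time.
// org.apache.spark.sql.sources.v2.* belonged to the Spark 2.x
// DataSource V2 API; Spark 3.x moved that API to the
// org.apache.spark.sql.connector.* packages, so the lookup fails here.
try {
  Class.forName("org.apache.spark.sql.sources.v2.StreamWriteSupport")
  println("class found")
} catch {
  case _: ClassNotFoundException => println("class not found on this runtime")
}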
The only info I can find on this error doesn't appear to apply to my situation. What am I missing?
I've looked for other libraries, but this appears to be the only one that supports JDBC as a streaming source on Scala 2.12.
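For reference, a plain batch read of the same table uses only Spark's built-in JDBC connector and doesn't touch the missing v2 classes; a sketch with the same placeholder connection details as above:

// Batch JDBC read via Spark's built-in connector; no third-party
// streaming source is involved.
val batchDf = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhostname:5432/testDB")
  .option("dbtable", "dimensions_test_table")
  .option("user", "myusername")
  .option("password", "mypassword")
  .option("driver", "org.postgresql.Driver")
  .load()

batchDf.show(5)

So what I'm missing is specifically a streaming JDBC source that works against Spark 3.x.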