I have created a Delta table and now I'm trying to insert data into it using foreachBatch(). I've followed this example. The only difference is that I'm using Java and not a notebook, but I assume that should not make any difference?
My code looks as follows:
spark.sql("CREATE TABLE IF NOT EXISTS example_src_table(id int, load_date timestamp) USING DELTA LOCATION '/mnt/delta/events/example_src_table'");
Dataset<Row> exampleDF = spark.sql("SELECT e.id as id, e.load_date as load_date FROM example e");
try {
    exampleDF
        .writeStream()
        .format("delta")
        .foreachBatch((dataset, batchId) -> {
            dataset.persist();
            // Register the micro-batch dataframe as a temp view
            dataset.createOrReplaceTempView("updates");
            // Use the view name to apply MERGE
            // NOTE: you have to use the SparkSession that was used to define the `updates` dataframe
            dataset.sparkSession().sql("MERGE INTO example_src_table e" +
                " USING updates u" +
                " ON e.id = u.id" +
                " WHEN NOT MATCHED THEN INSERT (e.id, e.load_date) VALUES (u.id, u.load_date)");
        })
        .outputMode("update")
        .option("checkpointLocation", "/mnt/delta/events/_checkpoints/example_src_table")
        .start();
} catch (TimeoutException e) {
    e.printStackTrace();
}
This code runs without any errors, but no data is written to the Delta table at the location '/mnt/delta/events/example_src_table'. Does anyone know what I'm doing wrong?
I'm using Spark 3.0 and Java 8.
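To check whether anything actually reached the table, I read the Delta location back directly (a minimal sketch, assuming the same SparkSession `spark` and the path from the CREATE TABLE statement above):

```java
// Sketch: count the rows at the Delta location to see whether the stream wrote anything.
// Assumes the same SparkSession `spark` used above and a Delta runtime on the classpath.
long rows = spark.read()
    .format("delta")
    .load("/mnt/delta/events/example_src_table")
    .count();
System.out.println("rows in example_src_table: " + rows);
```

This always prints 0 for me, which is why I believe the MERGE inside foreachBatch is never taking effect.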
EDIT
I tested the same logic in a Databricks notebook using Scala, and there it worked just fine.