
I have created a Delta table and now I'm trying to insert data into that table using foreachBatch(). I've followed this example. The only difference is that I'm using Java, not a notebook, but I suppose that shouldn't make any difference?

My code looks as follows:

spark.sql("CREATE TABLE IF NOT EXISTS example_src_table(id int, load_date timestamp) USING DELTA LOCATION '/mnt/delta/events/example_src_table'");

Dataset<Row> exampleDF = spark.sql("SELECT e.id as id, e.load_date as load_date FROM example e");

        try {
            exampleDF
                    .writeStream()
                    .format("delta")
                    .foreachBatch((dataset, batchId) -> {
                        dataset.persist();
                        // Set the dataframe to view name
                        dataset.createOrReplaceTempView("updates");
                        // Use the view name to apply MERGE
                        // NOTE: You have to use the SparkSession that has been used to define the `updates` dataframe
                        dataset.sparkSession().sql("MERGE INTO example_src_table e" +
                                " USING updates u" +
                                " ON e.id = u.id" +
                                " WHEN NOT MATCHED THEN INSERT (e.id, e.load_date) VALUES (u.id, u.load_date)");
                    })
                    .outputMode("update")
                    .option("checkpointLocation", "/mnt/delta/events/_checkpoints/example_src_table")
                    .start();
        } catch (TimeoutException e) {
            e.printStackTrace();
        }

This code runs without any problems, but no data is written to the Delta table at '/mnt/delta/events/example_src_table'. Does anyone know what I'm doing wrong?

I'm using Spark 3.0 and Java 8.

EDIT

Tested the same code in a Databricks notebook using Scala, and there it worked just fine.

RudyVerboven
  • Did you see any error? Reading from a table as a streaming query is not supported in Apache Spark. If you save the query returned from `start()` and call `query.processAllAvailable()`, it will throw the exception happening in the background. – zsxwing Sep 06 '20 at 00:22
  • No, I don't get any error. And I'm not trying to read from a Delta table here: the select query for `exampleDF` is on a temporary table I created from my stream, which in my experience is possible in Apache Spark. – RudyVerboven Sep 07 '20 at 09:19
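
Building on zsxwing's comment above, a minimal sketch of how to surface an exception the stream may be hitting in the background: keep the StreamingQuery handle that start() returns and block on it. In a standalone Java application (unlike a notebook, where the cluster keeps running) something like awaitTermination() is also needed, otherwise main() returns right after start() and the driver shuts the query down before a micro-batch can run. The structure mirrors the question's code; the variable names are illustrative.

import java.util.concurrent.TimeoutException;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;

try {
    StreamingQuery query = exampleDF
            .writeStream()
            .foreachBatch((dataset, batchId) -> {
                // ... same MERGE logic as above ...
            })
            .outputMode("update")
            .option("checkpointLocation", "/mnt/delta/events/_checkpoints/example_src_table")
            .start();

    // Blocks until all data available at this point has been processed,
    // rethrowing any exception the stream hit in the background.
    query.processAllAvailable();

    // Keeps the driver alive so the stream keeps running; without this,
    // a standalone main() exits immediately after start().
    query.awaitTermination();
} catch (TimeoutException | StreamingQueryException e) {
    e.printStackTrace();
}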

1 Answer


Try a syntax like the following in case you want to update existing rows with the new data. Note that updates go in a WHEN MATCHED clause (a WHEN NOT MATCHED clause only supports INSERT), and since the merge condition is on id, only load_date needs to be set:

WHEN MATCHED THEN
    UPDATE SET e.load_date = u.load_date
    

If you only want to insert the new rows, something like this is enough (INSERT * requires the source and target schemas to match):

WHEN NOT MATCHED THEN INSERT *
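
For completeness, a sketch of the full statement for the table in the question, combining both clauses (table alias, view name, and column names taken from the question's code):

MERGE INTO example_src_table e
USING updates u
ON e.id = u.id
WHEN MATCHED THEN
    UPDATE SET e.load_date = u.load_date
WHEN NOT MATCHED THEN
    INSERT (id, load_date) VALUES (u.id, u.load_date)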