
I posted this question on the Databricks forum; I'll copy it below, but basically I need to ingest new data from parquet files into a Delta table. I think I have to figure out how to use a MERGE statement effectively and/or use an ingestion tool.

I'm mounting some parquet files and then I create a table like this:

sqlContext.sql("CREATE TABLE myTableName USING parquet LOCATION 'myMountPointLocation'");

Then I create a Delta table with a subset of the columns and a subset of the records. With both of these in place, my queries are super fast.

sqlContext.sql("CREATE TABLE $myDeltaTableName USING DELTA SELECT A, B, C FROM myTableName WHERE Created > '2021-01-01'");

What happens if I now run:

sqlContext.sql("REFRESH TABLE myTableName");

Does my table now update with any additional data that may be present in my parquet source files? Or do I have to re-mount those parquet files to get additional data?
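
My plan for testing this was to drop a new parquet file into the source location and compare row counts before and after the refresh, something like:

spark.sql("SELECT COUNT(*) FROM myTableName").show()       # baseline
spark.sql("REFRESH TABLE myTableName")
spark.sql("SELECT COUNT(*) FROM myTableName").show()       # did this grow?
spark.sql("SELECT COUNT(*) FROM myDeltaTableName").show()  # I'd expect this one to stay flat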

Does my Delta table also update with new records? I doubt it, but one can hope...

Is this a case for Auto Loader? Or maybe some combination of mounting, re-creating/refreshing my source table, and then MERGE-ing new or updated records into my Delta table? Rough sketches of what I have in mind for both ideas are below.
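
To make the refresh-then-MERGE idea concrete, here's roughly the statement I have in mind after refreshing the source table. Untested, and it assumes column A is a unique key, which may not hold for my data:

spark.sql("""
    MERGE INTO myDeltaTableName AS t
    USING (SELECT A, B, C FROM myTableName WHERE Created > '2021-01-01') AS s
    ON t.A = s.A  -- assuming A uniquely identifies a record
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

And here's a sketch of the Auto Loader version, feeding the same merge through foreachBatch. Same caveats: untested, A-as-key is an assumption, and the checkpoint path is made up:

from delta.tables import DeltaTable

source_path = "/mnt/myMountPointLocation"               # my mounted parquet folder
checkpoint_path = "/mnt/checkpoints/myDeltaTableName"   # made-up checkpoint location

def upsert_batch(batch_df, batch_id):
    # MERGE each micro-batch into the Delta table, keyed on A (assumption)
    target = DeltaTable.forName(batch_df.sparkSession, "myDeltaTableName")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.A = s.A")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .format("cloudFiles")                               # Auto Loader
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(source_path)
    .where("Created > '2021-01-01'")
    .select("A", "B", "C")
    .writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                         # trigger(once=True) on older runtimes
    .start())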
