I posted this question on the Databricks forum; I'll copy it below, but basically I need to ingest new data from Parquet files into a Delta table. I think I have to figure out how to use a MERGE statement effectively and/or use an ingestion tool.
I'm mounting some Parquet files and then creating a table like this:
sqlContext.sql("CREATE TABLE myTableName USING parquet LOCATION 'myMountPointLocation'");
Then I create a Delta table with a subset of the columns and also a subset of the records. Doing both of these makes my queries super fast.
sqlContext.sql("CREATE TABLE $myDeltaTableName USING DELTA SELECT A, B, C FROM myTableName WHERE Created > '2021-01-01'");
What happens if I now run:
sqlContext.sql("REFRESH TABLE myTableName");
Does my table now pick up any additional data that may be present in my Parquet source files, or do I have to re-mount those files to get it?
Does my Delta table also update with new records? I doubt it, but one can hope...
Is this a case for Auto Loader? Or should I do some combination of mounting, re-creating / refreshing my source table, and then MERGE new / updated records into my Delta table (something like the sketch below)?
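For what it's worth, the MERGE I have in mind would look something like this. This is just a sketch: I'm assuming column A uniquely identifies a record and that the same Created filter is how I'd scope the incoming rows, neither of which I've confirmed for my real data.

sqlContext.sql("""
  MERGE INTO myDeltaTableName AS target
  USING (
    -- same column list / date filter I used when I first created the Delta table
    SELECT A, B, C FROM myTableName WHERE Created > '2021-01-01'
  ) AS source
  ON target.A = source.A  -- assuming A uniquely identifies a record
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""");

My guess is that Auto Loader would mainly replace the re-mount / REFRESH step for picking up new files, and something like the MERGE above would still be needed to keep the Delta table in sync, but I'd appreciate confirmation on that.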