In my PySpark notebook, I
1. read from tables to create data frames
2. aggregate the data frame
3. write to a folder
4. create a SQL table from the output folder
For
#1. I do `spark.read.format("delta").load(path)`
#2. I do `df = df.filter(...).groupBy(...).agg(...)`
#3. I do `df.write.format("delta").mode("append").save(output_folder)`
#4. I do `spark.sql(f"CREATE TABLE IF NOT EXISTS {sql_database}.{table_name} USING delta LOCATION '{path to output folder in #3}'")`
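Putting it together, here is a minimal sketch of the whole flow. All paths, database/table names, and column names below are placeholders, not my real ones:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Notebooks usually provide `spark` already; this just keeps the sketch self-contained
spark = SparkSession.builder.getOrCreate()

# 1. Read the source Delta table into a data frame
df = spark.read.format("delta").load("/data/source_table")

# 2. Filter, group, and aggregate (column names are made up)
df = (
    df.filter(F.col("status") == "active")
      .groupBy("customer_id")
      .agg(F.sum("amount").alias("total_amount"))
)

# 3. Write the aggregated data frame to the output folder as Delta
output_folder = "/data/output/aggregated"
df.write.format("delta").mode("append").save(output_folder)

# 4. Register a SQL table on top of the output folder
sql_database = "my_db"
table_name = "aggregated_table"
spark.sql(
    f"CREATE TABLE IF NOT EXISTS {sql_database}.{table_name} "
    f"USING delta LOCATION '{output_folder}'"
)
```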
The issue I am having: I have debug print statements before and after steps 3 and 4, I have verified that there are parquet files in my output folder, the path I use when creating the SQL table is correct, and no exception appears in the console.
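The debug checks look roughly like this, continuing from the sketch above (placeholder names again; the directory listing assumes the output folder is on a filesystem the driver can see):

```python
import os

output_folder = "/data/output/aggregated"  # same placeholder as above

print("Before step 4, output folder:", output_folder)
# Confirm the Delta write actually produced parquet files
parquet_files = [f for f in os.listdir(output_folder) if f.endswith(".parquet")]
print(f"Found {len(parquet_files)} parquet file(s) in the output folder")

spark.sql(
    f"CREATE TABLE IF NOT EXISTS my_db.aggregated_table "
    f"USING delta LOCATION '{output_folder}'"
)
print("After step 4: CREATE TABLE returned with no exception")
```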
But when I look for that newly created table in my SQL tool, I can't see it.
Can you please tell me if I need to wait for the 'write' to be done before I create the SQL table? If yes, what do I need to do to wait for the write to finish?