I am using Databricks' Auto Loader to incrementally ingest JSON files from a source directory and write them to a Delta table in a separate subdirectory.
My code looks like this:
transporters = (
    spark.readStream
    # Auto Loader source: pick up new JSON files from the source directory
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("recursiveFileLookup", "true")
    .schema(transporters_schema)
    .load(source_files_path)
    # Sink: append into a Delta table, processing all pending files in one batch
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", auto_loader_checkpoints_path)
    .trigger(availableNow=True)
    .start(target_table_path)
)
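For context, I run this as a one-shot batch and wait for it to finish before inspecting the output, roughly like this:

# Block until the availableNow batch has processed all pending files.
# (transporters is the StreamingQuery returned by start() above.)
transporters.awaitTermination()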
For some reason, the Delta table's directory contains hundreds of subdirectories with Parquet files inside them, like the following:
01
|___ part_00001_fsdgwsdg_afafafafa.snappy.parquet
|___ part_00002_fsdgwsdg_afafafafa.snappy.parquet
|___ part_00003_fsdgwsdg_afafafafa.snappy.parquet
02
03
0f
0J
0j
0K
0o
0R
...
I did not expect these subdirectories to appear. What could be causing them?
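For reference, this is roughly how I produced the listing above (dbutils.fs.ls is the Databricks filesystem utility; the variable name matches my code):

# List the top level of the Delta table's directory; the entries with
# two-character names (01, 0f, 0J, ...) are the unexpected subdirectories.
for entry in dbutils.fs.ls(target_table_path):
    print(entry.name)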