
I am very new to Databricks Auto Loader. I am trying to ingest a simple CSV file with 3 records in the format [Fname, Lname, age].

The following code runs successfully in Databricks, but no data gets saved. I'm sure I'm missing something very basic. Can anyone please help me see where I'm going wrong?

df = spark.readStream.format("cloudFiles") \
  .option("cloudFiles.format", "csv") \
  .option("header", "true") \
  .option("cloudFiles.schemaEvolutionMode", "failOnNewColumns") \
  .option("cloudFiles.schemaLocation", "/dbfs/FileStore/temp/schema/") \
  .load("/dbfs/FileStore/inbound/dsi/data/") \
  .writeStream.trigger(once=True) \
  .option("checkpointLocation","/dbfs/FileStore/temp/_checkpoint") \
  .outputMode("append") \
  .start("/dbfs/FileStore/outbound/dsi/output/") \
  .awaitTermination()

The data is: [screenshot of the 3 CSV records]

Thanks

marie20
  • How do you check for the new data? Also, you don't need `/dbfs` in the file names - it's only for the local file API – Alex Ott Apr 25 '23 at 06:46
  • @AlexOtt the checkpoint provided by Autoloader determines if a file was read or not.. – marie20 Apr 27 '23 at 08:00
  • I know :-) I'm asking how did you check that you have no new data? looking into destination table? – Alex Ott Apr 27 '23 at 08:09
  • @AlexOtt thanks for your reply. I was running the job manually in a notebook - when I called `df.show()` it did not show any records, although unprocessed files were present in the input directory. – marie20 May 01 '23 at 04:07
  • `df.show()` can't be used for streams. Use `display(df)` – Alex Ott May 01 '23 at 06:55
  • @marie20 did you solve this issue? It looks like I might have something similar... https://stackoverflow.com/questions/76287095/databricks-autoloader-works-on-compute-cluster-but-does-not-work-within-a-task – ojp May 19 '23 at 08:07
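
As the first comment notes, Spark APIs (including Auto Loader's `load()`, `option("cloudFiles.schemaLocation", …)`, and `start()`) address DBFS directly, so the `/dbfs` mount prefix, which belongs to the local file API, should be dropped from the paths in the question. A minimal sketch of that mapping, where `to_spark_path` is a hypothetical helper for illustration, not a Databricks API:

```python
def to_spark_path(local_path: str) -> str:
    """Map a local-file-API path (/dbfs/...) to the Spark/DBFS form (dbfs:/...).

    Spark reads DBFS natively, so the /dbfs fuse-mount prefix used by the
    local file API must not appear in paths passed to Spark readers/writers.
    Paths already in Spark form are returned unchanged.
    """
    prefix = "/dbfs/"
    if local_path.startswith(prefix):
        return "dbfs:/" + local_path[len(prefix):]
    return local_path


# The question's paths, rewritten for the Spark APIs:
print(to_spark_path("/dbfs/FileStore/inbound/dsi/data/"))   # dbfs:/FileStore/inbound/dsi/data/
print(to_spark_path("/dbfs/FileStore/temp/_checkpoint"))    # dbfs:/FileStore/temp/_checkpoint
```

With paths already consumed by a prior run against the `/dbfs/...` spelling, the checkpoint and schema locations may also need to be cleared (or pointed somewhere fresh) before re-running, since Auto Loader tracks already-ingested files in the checkpoint.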

0 Answers