Questions tagged [databricks-autoloader]
69 questions
1 vote, 1 answer
Azure Databricks is unable to create an Event Grid Subscription for Autoloader Streams
I am trying to create an autoloader stream in Azure Databricks.
When I try to start the writeStream, I get an exception saying:
com.databricks.sql.cloudfiles.errors.CloudFilesException: Failed to create an Event Grid…

R96R
1 vote, 1 answer
Databricks autoloader works on compute cluster, but does not work within a task in workflows
I feel like I am going crazy with this. I have tested a data pipeline on my standard compute cluster. I am loading new files as batch from a Google Cloud Storage bucket. Autoloader works exactly as expected from my notebook on my compute cluster.…

ojp
1 vote, 0 answers
Streaming table Schema change
Is it possible to update the schema (change the data type of a column) of a non-empty table in Databricks (loaded by streaming Autoloader) without impacting the checkpoint folder?
Is there any workaround to achieve this?
Update:
The data is read by…

marie20
1 vote, 1 answer
Databricks Auto Loader creates strange subdirectories
I am using Databricks' Auto Loader functionality to process JSON files from a directory and save them into a Delta table in another subdirectory.
My code looks like this:
transporters =…

Aleksandra Angelova
1 vote, 1 answer
Databricks processed files
I am currently setting up a data pipeline in Databricks. The situation is as follows:
Incoming data arrives as JSON files and is fetched asynchronously into the filestore. If data is received multiple times a day, it is put into the same…

bluhub
1 vote, 1 answer
How can I find the names of the files processed in Databricks Auto Loader
I am new to Databricks and PySpark, and I am trying to debug code that uses Auto Loader.
I expect 10 files to arrive in a storage account every 2 hours, and I have confirmed that they do.
I would like to know…

learner
1 vote, 1 answer
Using Great Expectations with Databricks Autoloader
I have implemented a data pipeline using Autoloader: bronze --> silver --> gold.
While doing this, I want to perform some data quality checks, and for that I'm using the Great Expectations library.
However, I'm stuck with the error below when trying to…

Chhaya Vishwakarma
1 vote, 2 answers
How to get the checkpoint location of a Delta Live Table?
Suppose you have already used a checkpoint to update a Delta table (an external table) with Autoloader. How can I find out its checkpoint location?
I tried running the code below, but it didn't work in my environment.
SELECT * FROM sys.tables WHERE name LIKE…

Saito Mieko
1 vote, 1 answer
Can Databricks Autoloader Keep Track of File Uploading Time
Is it possible to keep track of S3 file upload time with Databricks Autoloader? It looks like Autoloader adds columns for the file name and processing time, but in our use case we need to know the order in which the files were uploaded to S3.

seiya
1 vote, 1 answer
PySpark: will the partition option in Autoloader writeStream partition existing table data?
I used Autoloader to read data files and write them to a table periodically (without partitioning at first) using the code below:
.writeStream\
.option("checkpointLocation", "path") \
.format("delta")\
.outputMode("append")\
.start("table")
Now data size is…

peace
1 vote, 1 answer
What kind of nodes to choose for Autoloader on Azure
I have Autoloader working in directory listing mode because the event-driven mode requires far more elevated permissions, which we can't get in LIVE.
Basically, Autoloader reads Parquet files from many different folders…

Saugat Mukherjee
1 vote, 1 answer
Databricks: issue using Autoloader to read JSON files from Azure ADLS Gen2
I ran into an issue when using Autoloader to read JSON files from Azure ADLS Gen2. The issue occurs only for specific files. I checked that the files are good and not corrupted.
The error is as follows:
Caused by:…

swapnil kamle
1 vote, 0 answers
Azure Databricks Autoloader schema check does not recognize when a column is removed from schema
My team is writing a streaming application to load files into our data lake. Our environment is Azure, and we are using Spark and Databricks for the application. It reads mostly CSV files with a set schema. We are using…

ceddings76
1 vote, 0 answers
How to store a schema in a file, and in which file format, for Databricks Autoloader?
I am using Databricks Autoloader. The table schema will be dynamic for the incoming data, so I have to store the schema in a file and read it in Autoloader during readStream.
How can I store the schema in a file and in which format?
Whether…

Thiru Balaji G
1 vote, 0 answers
Auto Loader inferColumnTypes not working for dates
I'm trying to understand why setting this option to true works for numbers and booleans but not for dates. However, if I specify the data type as DATE using schemaHints, then it does pick it up.
Dates have the following format:…

hikizume