Questions tagged [databricks-autoloader]

69 questions
0 votes · 1 answer

databricks autoLoader - why is new data not written to the table when the original csv file is deleted and a new csv file is uploaded

I have a question about the Auto Loader writeStream. My use case: a few days ago I uploaded 2 csv files into the Databricks file system, then read them and wrote them to a table with Auto Loader. Today I found that the files uploaded days before have wrong data…
asked by peace (299)
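
A minimal sketch of the ingestion pattern this question describes, assuming a csv source folder and a managed target table (all paths and the table name are hypothetical). Auto Loader records every file it has already ingested in the checkpoint, which is why deleting a file and re-uploading a corrected one under the same name is not picked up by default; cloudFiles.allowOverwrites is the option usually pointed at for that behaviour.

```python
# Minimal Auto Loader csv ingest; `spark` is the SparkSession a Databricks
# notebook provides. All paths and the table name are placeholders.
source_path = "dbfs:/mnt/raw/csv_input/"        # hypothetical input folder
checkpoint_path = "dbfs:/mnt/chk/csv_ingest/"   # hypothetical checkpoint/schema location

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      .option("header", "true")
      # Auto Loader tracks ingested files in the checkpoint, so a file that is
      # deleted and re-uploaded under the same name is skipped by default.
      # Uncomment to reprocess files that were overwritten in place:
      # .option("cloudFiles.allowOverwrites", "true")
      .load(source_path))

(df.writeStream
   .option("checkpointLocation", checkpoint_path)
   .trigger(availableNow=True)   # process whatever is new, then stop
   .toTable("bronze.csv_ingest"))
```
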
0 votes · 1 answer

Can Databricks Auto loader infer partitions?

By default, when you're using a Hive-style partition directory structure, the Auto Loader option cloudFiles.partitionColumns adds these columns automatically to your schema (using schema inference). This is the code: checkpoint_path =…
asked by alxsbn (340)
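
A hedged sketch of the behaviour the excerpt describes, assuming a Hive-style layout such as .../events/year=2023/month=01/... (paths and column names are assumptions): the partition directories can either be inferred or listed explicitly via cloudFiles.partitionColumns.

```python
# `spark` is the notebook's SparkSession; paths and partition names are assumptions.
checkpoint_path = "dbfs:/mnt/chk/events/"

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", checkpoint_path)
      # Name the directory partition columns to add to the schema explicitly;
      # leaving the option unset lets schema inference derive them from paths
      # like .../year=2023/month=01/.
      .option("cloudFiles.partitionColumns", "year,month")
      .load("dbfs:/mnt/raw/events/"))
```
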
0 votes · 0 answers

Databricks Autoloader - dealing with combined files

I'm working with some files that have some complexities: multiple tab files concatenated into one csv; files with some metadata prior to the csv data; csv files with an extra row after the header that should be ignored; csv files with log information…
asked by stuartp (55)
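
One hedged way to handle such combined files is to let Auto Loader ingest every line as raw text, keep the originating file name, and filter out the metadata, log and extra-header rows before parsing. The paths, the rule used to recognise data rows, and the delimiter below are all assumptions.

```python
from pyspark.sql import functions as F

# Ingest each line as raw text so the mixed metadata/csv layout survives intact.
raw = (spark.readStream
       .format("cloudFiles")
       .option("cloudFiles.format", "text")
       .load("dbfs:/mnt/raw/combined/")              # hypothetical folder
       .withColumn("source_file", F.input_file_name()))

# Keep only lines that look like data rows (the leading-digit rule is an
# assumption), then split the comma-separated values into an array column.
data = (raw
        .filter(F.col("value").rlike(r"^\d"))
        .withColumn("fields", F.split(F.col("value"), ",")))
```
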
0 votes · 1 answer

Databricks autoloader writing data with invalid characters in column name

When trying to use Databricks' Auto Loader to write data, the nested columns contain invalid characters: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema. How can I deal with this issue? Note again that it is the…
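
The error quoted in the excerpt is Delta rejecting column names that contain " ,;{}()\n\t=". Two commonly suggested directions, sketched with assumed names: rename the offending columns before writing, or enable Delta column mapping so the original names can be kept. Nested struct fields would additionally need their struct type rebuilt, which this sketch does not show.

```python
import re

def sanitize(name: str) -> str:
    # Replace every character Delta rejects in column names with an underscore.
    return re.sub(r"[ ,;{}()\n\t=]", "_", name)

# Rename the top-level columns of the streaming DataFrame before writing.
clean_df = df.toDF(*[sanitize(c) for c in df.columns])

# Alternative (keeps the original names): enable column mapping on the target
# Delta table, e.g. in SQL:
#   ALTER TABLE bronze.events SET TBLPROPERTIES (
#     'delta.minReaderVersion' = '2',
#     'delta.minWriterVersion' = '5',
#     'delta.columnMapping.mode' = 'name');
```
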
0 votes · 1 answer

Trigger workflow job with Databricks Autoloader

I have a requirement to monitor an S3 bucket for (zip) files being placed. As soon as a file lands in the S3 bucket, the pipeline should start processing it. Currently I have a Workflow Job with multiple tasks that perform the processing. In the Job…
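
A hedged sketch of one common pattern for this requirement: make the first task of the Workflow job an Auto Loader run with an availableNow trigger, so each job run processes only the zip files that arrived since the previous run; the job itself can then be started on a schedule or by a Databricks file arrival trigger. Bucket, paths and table name are assumptions.

```python
# `spark` is the notebook's SparkSession; bucket, paths and table are assumptions.
zips = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")   # zip archives land as raw bytes
        .load("s3://my-bucket/incoming/"))

(zips.writeStream
     .option("checkpointLocation", "dbfs:/mnt/chk/s3_zips/")
     .trigger(availableNow=True)    # each job run drains only the newly arrived files
     .toTable("bronze.incoming_zips"))
```
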
0 votes · 0 answers

Creating a spark Dataframe within foreach() while using autoloader with BinaryFile option in databricks

I am using Auto Loader with the binaryFile option to decode .proto-based files in Databricks. I am able to decode the proto file and write it in csv format using foreach() and the pandas library, but I am having a challenge writing it in delta format. End of the…
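
A hedged alternative to writing csv from foreach(): decode the protobuf payload inside foreachBatch and append the result to a Delta table. decode_proto_bytes() below is a stand-in for the asker's existing parser, and all paths and table names are assumptions; collecting the batch to the driver only suits small files.

```python
import pandas as pd

def decode_proto_bytes(payload: bytes) -> dict:
    # Stand-in for the asker's protobuf deserialization; returns one flat record.
    return {"size_bytes": len(payload)}

def process_batch(batch_df, batch_id):
    # Pull the micro-batch to the driver (fine for small files), decode each
    # payload with pandas, and append the result to a Delta table.
    rows = batch_df.select("path", "content").collect()
    decoded = pd.DataFrame([decode_proto_bytes(r["content"]) for r in rows])
    if not decoded.empty:
        (spark.createDataFrame(decoded)
              .write.format("delta")
              .mode("append")
              .saveAsTable("bronze.proto_events"))   # assumed target table

stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "binaryFile")
          .load("dbfs:/mnt/raw/proto/"))             # hypothetical source folder

(stream.writeStream
       .foreachBatch(process_batch)
       .option("checkpointLocation", "dbfs:/mnt/chk/proto/")
       .start())
```
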
0 votes · 1 answer

Not able to access certain JSON properties in Autoloader

I have a JSON file that is loaded by two different Autoloaders. One uses schema evolution and, besides replacing spaces in the JSON property names, writes the JSON directly to a Delta table, and I can see all the values are there properly. In the…
asked by Chris de Groot (342)
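
A sketch of the two ingestion styles the question contrasts, with assumed field names and paths: one stream leaning on schema inference and evolution, and one pinning the nested properties it needs with cloudFiles.schemaHints so they are parsed as a struct and stay queryable.

```python
# `spark` is the notebook's SparkSession; paths and nested field names are assumptions.
checkpoint = "dbfs:/mnt/chk/json_events/"

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", checkpoint)
      .option("cloudFiles.schemaEvolutionMode", "addNewColumns")
      .option("cloudFiles.inferColumnTypes", "true")
      # Pin the nested properties that must be queryable; field names are made up.
      .option("cloudFiles.schemaHints", "payload STRUCT<id: STRING, amount: DOUBLE>")
      .load("dbfs:/mnt/raw/json_events/"))

values = df.select("payload.id", "payload.amount")
```
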
0 votes · 1 answer

Read data from mount in Databricks (using Autoloader)

I am using Azure Blob Storage to store data and feeding this data to Auto Loader through a mount. I was looking for a way to allow Auto Loader to load new files from any mount. Let's say I have these folders in my mount: mnt/ ├─ blob_container_1 ├─…
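
A hedged sketch of one way to cover several mounted containers with a single stream, assuming each container exposes a common sub-folder (the layout and folder names below are made up): Auto Loader accepts glob patterns in the load path.

```python
# One stream watching the same sub-folder in every mounted container.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "dbfs:/mnt/chk/all_containers/")
      .load("dbfs:/mnt/*/landing/"))   # glob over blob_container_1, blob_container_2, ...
```
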