
I am using Databricks Auto Loader. Here, the table schema will be dynamic for the incoming data, so I have to store the schema in a file and read it back in Auto Loader during readStream.

How can I store the schema in a file and in which format?

Can the file be read using the schema option or the "cloudFiles.schemaLocation" option?

spark.readStream.format("cloudFiles") \
    .schema("<schema>") \
    .option("cloudFiles.schemaLocation", "<path_to_checkpoint>") \
    .option("cloudFiles.format", "parquet") \
    .load("<path_to_source_data>")
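
A minimal sketch of one possible round-trip, assuming PySpark 3.x on Databricks (where `spark` is predefined) and a hypothetical `/dbfs/schemas/...` path for the schema file: the schema can be persisted in Spark's own JSON representation (`df.schema.json()`) and rebuilt with `StructType.fromJson` before being passed to `.schema()`.

import json
from pyspark.sql.types import StructType

schema_path = "/dbfs/schemas/incoming_table.json"  # hypothetical location

# One-time (or whenever the schema changes): capture the schema from a sample batch
sample_df = spark.read.format("parquet").load("<path_to_source_data>")
with open(schema_path, "w") as f:
    f.write(sample_df.schema.json())  # Spark's JSON schema representation

# In the streaming job: rebuild the StructType from the file
with open(schema_path) as f:
    file_schema = StructType.fromJson(json.loads(f.read()))

df = (
    spark.readStream.format("cloudFiles")
    .schema(file_schema)  # the user-supplied schema goes here
    .option("cloudFiles.schemaLocation", "<path_to_checkpoint>")
    .option("cloudFiles.format", "parquet")
    .load("<path_to_source_data>")
)

Note that `cloudFiles.schemaLocation` is the directory where Auto Loader checkpoints its own inferred/evolved schema; it does not read a user-supplied schema file, so an explicit schema has to go through `.schema()`.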
Thiru Balaji G
    Have not worked specifically with Autoloader, but I suggest trying out the options mentioned in the link below: https://community.databricks.com/s/question/0D53f00001ebRDYCA2/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-tables-column. I personally have had success with saving the output of `df.schema.toDDL()` to a file and then reusing it wherever necessary via `file_schema = StructType.fromDDL(<toDDL output>)`. Hope it works for you... – rainingdistros Nov 18 '22 at 06:37
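
A minimal sketch of that DDL round-trip, assuming PySpark 3.5+ (where `StructType.fromDDL` is available as a classmethod). `df.schema.toDDL()` is the Scala API; from PySpark the same DDL string can be reached through the internal JVM handle `df._jdf.schema().toDDL()`, which is an implementation detail rather than a public API:

from pyspark.sql.types import StructType

ddl_path = "/dbfs/schemas/incoming_table.ddl"  # hypothetical location

# Capture the schema as a DDL string, e.g. "id BIGINT, name STRING"
sample_df = spark.read.format("parquet").load("<path_to_source_data>")
with open(ddl_path, "w") as f:
    f.write(sample_df._jdf.schema().toDDL())  # internal JVM accessor, not public PySpark API

# Later: parse the DDL back into a StructType (PySpark 3.5+)
with open(ddl_path) as f:
    file_schema = StructType.fromDDL(f.read())

Since `.schema()` on a DataStreamReader also accepts a DDL-formatted string directly, the `fromDDL` parse step can be skipped and the file contents passed straight through.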

0 Answers