I'm reading Parquet files and trying to load them into a target Delta table with Auto Loader, using this code:
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .option("cloudFiles.schemaEvolutionMode", "rescue")
    .schema(<schema of my target table>)
    .load(file_path)
    .writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .toTable(table_name))
It fails with the error "Parquet column cannot be converted. Column: [my_column], Expected Timestamp, Found: INT64." The column my_column is defined as timestamp in the target table, but in some of the source files it is written as an integer (INT64).
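For context, here is a minimal sketch of how the mismatch can arise (the paths and values below are made up for illustration, not my real data):

from pyspark.sql import functions as F

# Hypothetical repro: two Parquet files land in the same source directory,
# one with my_column written as INT64 (epoch seconds), one with my_column
# written as a real timestamp.
(spark.range(1)
    .select(F.lit(1700000000).cast("long").alias("my_column"))
    .write.mode("append").parquet("/tmp/source_data"))
(spark.range(1)
    .select(F.current_timestamp().alias("my_column"))
    .write.mode("append").parquet("/tmp/source_data"))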
By specifying option("cloudFiles.schemaEvolutionMode", "rescue") I expected Auto Loader to put any data that doesn't match the declared schema into the _rescued_data column instead of throwing an error.
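In other words, after the load I planned to inspect the mismatched rows roughly like this (the row shape in the comment is my assumption of what rescue produces, not observed output):

# Find rows whose original values could not be converted to the target types.
rescued = spark.read.table(table_name).where("_rescued_data IS NOT NULL")
rescued.show(truncate=False)
# Expected something like:
#   my_column | _rescued_data
#   NULL      | {"my_column": "1700000000", "_file_path": "..."}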
Why doesn't it behave like this?