
I have a big data pipeline in Spark that writes output as Parquet to Delta Lake (backed by storage accounts on Azure). The output schemas keep changing while I'm still figuring out what they need to be, and sometimes that means a column needs to change its datatype (e.g. a string now needs to be an int). I can't simply make this change, though, because the write to the Delta table then fails with schema mismatch errors. So far my workaround has been to rename the column instead (e.g. ColumnA, which was a string, becomes ColumnAInt). This isn't very clean, but I've been told that changing the datatype of a column is very expensive, and I haven't been able to find authoritative documentation on that.
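To make the workaround concrete, this is roughly what I'm doing today (the path, source, and column names are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Placeholder path to the Delta table on the Azure storage account
delta_path = "abfss://container@account.dfs.core.windows.net/my_table"

# Placeholder source for the pipeline output
df = spark.read.parquet("/mnt/raw/input")

# Instead of changing ColumnA's type in place, I write the int values
# under a new column name so the existing Delta schema is untouched
df = df.withColumn("ColumnAInt", col("ColumnA").cast("int")).drop("ColumnA")

# mergeSchema lets the new column be added to the table schema on append
(df.write.format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .save(delta_path))
```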

I have seen this page: https://docs.delta.io/latest/delta-batch.html#-change-column-type but it doesn't mention how expensive this operation is or how it scales with the amount of data. Does anyone have answers regarding that?
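For reference, my understanding of that page is that changing a column's type means rewriting the whole table with the new schema, roughly like the sketch below (again, the path is a placeholder). My question is how the cost of this rewrite scales with the size of the table:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

delta_path = "abfss://container@account.dfs.core.windows.net/my_table"  # placeholder

# Read the whole table, cast the column, and overwrite the table with the
# new schema. overwriteSchema is required because the column type changes.
(spark.read.format("delta").load(delta_path)
     .withColumn("ColumnA", col("ColumnA").cast("int"))
     .write.format("delta")
     .mode("overwrite")
     .option("overwriteSchema", "true")
     .save(delta_path))
```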

ROODAY

0 Answers