
I am ingesting a csv file from a mounted blob storage to delta live table and here's my initial query:

CREATE INCREMENTAL LIVE TABLE table_raw
COMMENT "Ingesting data from /mnt/foo"
TBLPROPERTIES ("quality" = "bronze")
AS
SELECT * FROM cloud_files("/mnt/foo/", "csv")

I need/want to transform its column names (e.g. remove punctuations, trim whitespaces, etc.). How should I approach it?

1 Answer


First of all, data cleansing should generally be done in the silver layer; in the bronze layer you explicitly want to keep the data unchanged.

If you then want to apply some preprocessing or column renaming, you can use plain SQL syntax in combination with Spark SQL built-in functions.

E.g.:

CREATE INCREMENTAL LIVE TABLE table_silver
COMMENT "Ingesting data from table_raw"
TBLPROPERTIES ("quality" = "silver")
AS
SELECT rtrim(a) AS ColumnName1,
       rtrim(b) AS ColumnName2,
       ...
FROM LIVE.table_raw
  • The question is about cleaning up the column names, while your code is for cleaning up the data itself – Alex Ott Mar 25 '23 at 12:58
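As the comment points out, the question asks about sanitizing the column names themselves, not the column values. One way to do that is to switch this table to a Python DLT notebook, where you can rewrite every column name programmatically. A minimal sketch of the name-cleaning logic (the helper name `clean_column_name` and the exact cleaning rules are my own choices, not anything from the question):

```python
import re

def clean_column_name(name: str) -> str:
    """Trim surrounding whitespace, drop punctuation, and replace
    inner whitespace runs with underscores."""
    cleaned = name.strip()
    cleaned = re.sub(r"[^\w\s]", "", cleaned)  # remove punctuation
    cleaned = re.sub(r"\s+", "_", cleaned)     # collapse spaces to "_"
    return cleaned

# Example: a messy CSV header becomes a safe column name
print(clean_column_name("  First Name! "))  # -> First_Name
```

In a Python DLT pipeline you could then apply this to every column of the streaming DataFrame, e.g. with `df.toDF(*[clean_column_name(c) for c in df.columns])` inside the `@dlt.table`-decorated function; in pure SQL you would instead have to alias each column by hand, as in the `rtrim(...) AS ColumnName1` example above.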