Databricks autoloader writing data with invalid characters in column name

Question

When I try to write data with Databricks' Autoloader, the write fails because the nested columns contain invalid characters:

Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.

How can I deal with this? Note that the problem is in the nested columns, not the outermost columns. The latter could easily be fixed with:

import re
from pyspark.sql.functions import col

df = df.select([col(c).alias(re.sub(r"[^0-9a-zA-Z_]+", "", c)) for c in df.columns])

How do I reach the nested columns, as they're not yet exploded?

score 0 · Answer 1 · answered Nov 11 '22 at 07:39


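If you're writing to Delta Lake, you can enable column mapping on the target table to get around this. It decouples the column names in the table schema from the names stored in the underlying Parquet files, so characters such as spaces are allowed.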
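
A minimal sketch of turning column mapping on for an existing table, assuming the Delta table properties documented for this feature (reader version 2, writer version 5, mapping mode `name`). The helper `column_mapping_sql` and the table name in the comment are hypothetical and only for illustration; the helper just builds the statement, which you would then pass to `spark.sql` on a cluster.

```python
def column_mapping_sql(table_name):
    """Build an ALTER TABLE statement that enables Delta column mapping by name.

    `table_name` is a placeholder for your own target table.
    """
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5', "
        "'delta.columnMapping.mode' = 'name')"
    )

# On a Databricks cluster (hypothetical table name):
# spark.sql(column_mapping_sql("my_schema.events"))
```

Note that raising the reader and writer versions is a protocol upgrade, so older clients may no longer be able to read the table.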


Christopher Grant
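
If column mapping is not an option, another approach is to rename the nested fields before writing. The following is a sketch under stated assumptions, not a tested Autoloader recipe: it walks the schema in Spark's JSON form (as returned by `df.schema.jsonValue()`) and replaces the characters listed in the error message with underscores. The helper names `sanitize_name` and `sanitize_type` are made up for this example.

```python
import re

# The characters listed in the error message.
INVALID = re.compile(r"[ ,;{}()\n\t=]+")

def sanitize_name(name):
    """Replace each run of invalid characters with a single underscore."""
    return INVALID.sub("_", name)

def sanitize_type(dtype):
    """Recursively rename struct fields in a schema given in Spark's JSON form."""
    if not isinstance(dtype, dict):
        return dtype  # primitive type such as "string"
    kind = dtype.get("type")
    if kind == "struct":
        fields = [
            {**f, "name": sanitize_name(f["name"]), "type": sanitize_type(f["type"])}
            for f in dtype["fields"]
        ]
        return {**dtype, "fields": fields}
    if kind == "array":
        return {**dtype, "elementType": sanitize_type(dtype["elementType"])}
    if kind == "map":
        return {
            **dtype,
            "keyType": sanitize_type(dtype["keyType"]),
            "valueType": sanitize_type(dtype["valueType"]),
        }
    return dtype  # anything else is left untouched

# Assumed usage with PySpark (not run here): casting a struct to a struct
# with the same field order renames the fields by position.
# from pyspark.sql.functions import col
# from pyspark.sql.types import StructType
# clean = StructType.fromJson(sanitize_type(df.schema.jsonValue()))
# df = df.select([col(f"`{old.name}`").cast(new.dataType).alias(new.name)
#                 for old, new in zip(df.schema.fields, clean.fields)])
```

Two different names can collapse to the same sanitized name (for example "a b" and "a,b"), so check for collisions before relying on this.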
