
When I try to write data with Databricks Auto Loader, the write fails because the nested columns contain invalid characters:

Found invalid character(s) among " ,;{}()\n\t=" in the column names of your schema.

How do I deal with this issue? Note again that the problem is in the nested columns, not the outermost columns themselves. The latter could easily be fixed with:

import re
from pyspark.sql.functions import col

df = df.select([col(c).alias(re.sub(r"[^0-9a-zA-Z_]+", "", c)) for c in df.columns])

How do I reach the nested columns, as they're not yet exploded?

Alex Ott

1 Answer


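If you're writing to Delta Lake, you can enable column mapping on the target table to get around this. With column mapping in `name` mode, Delta accepts column names that contain these characters, including names of nested fields.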
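
As a rough sketch of what enabling it could look like: column mapping is turned on through table properties, and it needs reader version 2 and writer version 5 on the table. The helper `column_mapping_sql` and the table name `my_db.my_table` are made up for illustration, and the commented-out `spark.sql` call assumes you are in a notebook where `spark` already exists.

```python
# Table properties that enable Delta column mapping by name.
COLUMN_MAPPING_PROPERTIES = {
    "delta.columnMapping.mode": "name",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
}


def column_mapping_sql(table_name: str) -> str:
    """Build an ALTER TABLE statement that enables column mapping.

    `table_name` is a placeholder; pass the name of your own Delta table.
    """
    props = ", ".join(
        f"'{key}' = '{value}'" for key, value in COLUMN_MAPPING_PROPERTIES.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({props})"


statement = column_mapping_sql("my_db.my_table")  # hypothetical table name
print(statement)

# In a Databricks notebook, run it before starting the stream:
# spark.sql(statement)
```

Note that this raises the table's protocol version, so older readers and writers may no longer be able to access the table afterwards.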