
After migrating to Spark 3.2.0, I had to upgrade the external spark-avro package to spark-avro_2.12:3.2.0.
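
For context, here is roughly how I attach the package when building the session (a sketch; the Maven coordinate org.apache.spark:spark-avro_2.12:3.2.0 and the app name are assumptions, adjust them to your setup):

from pyspark.sql import SparkSession

# Pull the spark-avro package from Maven when the session starts
# (coordinate assumed; the version should match the Spark version).
spark = (
    SparkSession.builder
    .appName("avro-read")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.2.0")
    .getOrCreate()
)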

After this migration, I was unable to read any Avro file that contains spaces in its column names.

The error occurs on the read call below, so I'm not able to rename the column using .alias() or .withColumnRenamed():

spark.read.format('avro').load(
    'hdfs:///avrofile')
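
For illustration, the rename I would normally apply right after the read never gets a chance to run (a sketch; the column name is taken from the error extract below, and the target name 'Test_Data' is just an example):

# load() itself raises the AnalysisException, so nothing after it executes.
df = spark.read.format('avro').load('hdfs:///avrofile')   # fails here
df = df.withColumnRenamed('Test Data', 'Test_Data')       # never reached
# df.select(df['Test Data'].alias('Test_Data')) would hit the same problem.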

Here is an extract of the error logs:

pyspark.sql.utils.AnalysisException: Column name "Test Data" contains invalid character(s). Please use alias to rename it.

Note that I don't have this issue using spark-avro_2.12:3.1.0 or below; however, due to incompatibility issues, I'm not able to write Avro files with spark-avro versions whose Spark version is below 3.2.0.

