I am using PySpark 2.4.3 and I have a DataFrame that I wish to write to Parquet, but the column names contain spaces, such as "Hour of day".
df = spark.read.csv("file.csv", header=True)
df.write.parquet('input-parquet/')
Currently I am getting this error:
An error occurred while calling o425.parquet.
: org.apache.spark.sql.AnalysisException: Attribute name "Hour of day" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
How can I either rename the columns or give them aliases to be able to write to Parquet?
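One common workaround (sketched here with a hypothetical `sanitize` helper, not something from the question itself) is to replace every character Parquet forbids with an underscore before writing:

```python
import re

# Parquet rejects column names containing any of " ,;{}()\n\t=".
# This helper (an assumption of mine, not part of the original code)
# replaces each forbidden character with an underscore.
def sanitize(name):
    return re.sub(r'[ ,;{}()\n\t=]', '_', name)

# Applied to a Spark DataFrame this would look something like:
#   df = df.toDF(*[sanitize(c) for c in df.columns])
#   df.write.parquet('input-parquet/')

print(sanitize("Hour of day"))  # Hour_of_day
```

`toDF(*names)` returns a new DataFrame with the given column names, so this renames every column in one pass rather than chaining `withColumnRenamed` calls.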