The dataset I'm working with has whitespace in its column names, and I got stuck trying to rename the Spark DataFrame columns. I've tried almost all the solutions available on Stack Overflow, but nothing seems to work.
Note: the file must be a Parquet file.
```
df.printSchema
root
 |-- Type: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- ID: string (nullable = true)
 |-- Catg Name: string (nullable = true)
 |-- Error Msg: string (nullable = true)
```
```
df.show()
```
Error:
```
warning: there was one deprecation warning; re-run with -deprecation for details
org.apache.spark.sql.AnalysisException: Attribute name "Catg Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
```
I tried:
```
df.select(df.col("Catg Name").alias("Catg_Name"))
```
and then df.printSchema:
```
root
 |-- Type: string (nullable = true)
 |-- timestamp: string (nullable = true)
 |-- ID: string (nullable = true)
 |-- Catg_Name: string (nullable = true)
 |-- Error_Msg: string (nullable = true)
```
The schema looks correct, but when I call df.show() it still throws the same error:
```
warning: there was one deprecation warning; re-run with -deprecation for details
org.apache.spark.sql.AnalysisException: Attribute name "Catg Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
```
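For reference, a rename that covers every column at once, rather than aliasing a single one, could start from a name-cleaning helper like the one below. This is a minimal sketch under stated assumptions, not a confirmed fix: `ColumnNames.sanitize` and `SanitizeDemo` are hypothetical names, and the set of characters it replaces is copied from the `AnalysisException` message in the question.

```scala
// Hypothetical helper for illustration only: replaces every character that the
// AnalysisException lists as invalid (" ,;{}()\n\t=") with an underscore.
object ColumnNames {
  // Invalid characters copied from the error message; \n and \t are a real newline and tab.
  private val Invalid: Set[Char] = " ,;{}()\n\t=".toSet

  // Returns a copy of `name` with each invalid character mapped to '_'.
  def sanitize(name: String): String =
    name.map(c => if (Invalid.contains(c)) '_' else c)
}

object SanitizeDemo {
  def main(args: Array[String]): Unit = {
    // Column names taken from the printSchema output in the question.
    val original = Seq("Type", "timestamp", "ID", "Catg Name", "Error Msg")
    println(original.map(ColumnNames.sanitize).mkString(", "))
    // prints: Type, timestamp, ID, Catg_Name, Error_Msg
  }
}
```

To apply it to the DataFrame, the renamed result has to be assigned to a new value, because `select`, `alias` and `toDF` return a new DataFrame and leave `df` itself unchanged, e.g. `val renamed = df.toDF(df.columns.map(ColumnNames.sanitize): _*)`. This only renames columns at the DataFrame level; since the exception is raised when `show()` actually runs the Parquet read, renaming after loading may still not be enough on its own.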