I have a piece of PySpark code that converts a DataFrame into a physical table:
df.write.mode('overwrite').saveAsTable('sometablename')
If the DataFrame, df, contains columns with spaces in their names, the write fails with the following error:
18/03/08 10:33:29 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table pivot_up_spaces_Export_Data_4
org.apache.spark.sql.AnalysisException: Attribute name "SUM_count_col umn" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:581)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldName(ParquetSchemaConverter.scala:567)
When I use registerTempTable on the same DataFrame, things work fine:
df.registerTempTable('sometablename')
However, in spark-sql I am able to create tables that have spaces in their column names. Is there any way to get around this in PySpark?
I am running this on an EMR 5.10.0 cluster, which internally uses Spark 2.2.0.
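For reference, the fallback I can think of is what the error message itself suggests: renaming the offending columns before the write. Below is a minimal sketch of that idea, not a definitive fix. The helper names `sanitize_column_name` and `sanitize_columns` are hypothetical, the character set is copied from the error message above, and it assumes that replacing each invalid character with an underscore does not make two column names collide.

```python
import re

# Characters rejected by the Parquet schema check, taken from the error
# message above: space , ; { } ( ) newline tab =
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Replace every character Parquet rejects with `replacement`."""
    return _INVALID_CHARS.sub(replacement, name)


def sanitize_columns(columns, replacement="_"):
    """Sanitize a list of column names, preserving their order."""
    return [sanitize_column_name(c, replacement) for c in columns]


# Intended use with the DataFrame from the question (needs a Spark session,
# so it is left as a comment here):
# clean_df = df.toDF(*sanitize_columns(df.columns))
# clean_df.write.mode('overwrite').saveAsTable('sometablename')
```

This changes the column names in the saved table, so it does not preserve the spaces, which is why I would still prefer a way to write the original names.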