I have a snappy.parquet file that I need to open as a DataFrame in Spark and then upload to a database.
Two of the column names contain spaces (" ").
Using
emps = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_emps)
gives an error: column names contain invalid characters.
Using
df_emps = spark.read.parquet(file)
for c in df_emps.columns:
    df_emps = df_emps.withColumnRenamed(c, c.replace(" ", ""))
df_emps = spark.read.schema(df_emps.schema).parquet(file)
reads the file and creates the DataFrame, but the two columns that contained spaces are now null.
How can I read this file into a dataframe and retain the content of these fields?
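
One likely explanation: the final spark.read.schema(...).parquet(file) call throws away the renamed DataFrame and reads the file again, matching columns by name. The names without spaces do not exist in the file, so Spark fills those columns with null. A minimal sketch of a workaround, assuming the plain spark.read.parquet(file) call succeeds as described, is to rename the columns in memory and skip the second read. The helper name strip_spaces_from_columns is hypothetical and only for illustration.

```python
def strip_spaces_from_columns(df):
    """Return df with every space removed from its column names.

    Works on any object exposing `.columns` and `.toDF(*names)`,
    which a PySpark DataFrame does.
    """
    new_names = [c.replace(" ", "") for c in df.columns]
    # toDF assigns the new names by position, so the data stays attached.
    return df.toDF(*new_names)


# Intended use with the names from the question (not executed here):
#   df_emps = strip_spaces_from_columns(spark.read.parquet(file))
# Do not re-read the file with the renamed schema afterwards.
```

The original withColumnRenamed loop should also work on its own; the point is only that the renamed DataFrame is the one to keep and write to the database.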