I have a snappy.parquet file that I need to open as a DataFrame in Spark and then upload to a database.
Two of the column names contain spaces (" ").
Using

emps = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_emps)

gives an error: the column names contain invalid characters.
Using

df_emps = spark.read.parquet(file)
for c in df_emps.columns:
    df_emps = df_emps.withColumnRenamed(c, c.replace(" ", ""))
df_emps = spark.read.schema(df_emps.schema).parquet(file)

reads the file and creates the DataFrame, but the two columns that contained spaces are now null.

How can I read this file into a dataframe and retain the content of these fields?

mck
NigelLegg
  • `df_emps = spark.read.schema(df_emps.schema).parquet(file)` looks unnecessary to me. After line 3 the dataframe should be okay. Line 4 nulled those columns because the original names no longer exist in the schema you pass in (they were renamed in df_emps.schema), so nothing in the file matches them. – mck Dec 06 '20 at 12:34
  • Without line 4 I still get the error: – NigelLegg Dec 06 '20 at 14:30
  • An error was encountered: 'Attribute name "First Name" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.;' – NigelLegg Dec 06 '20 at 14:30
  • Could you show the code for uploading to the database? I don't see df_emps being used in your first code snippet. – mck Dec 06 '20 at 14:32
  • It's not there because I haven't done it yet, because of the error. I have been using an AWS Glue Dev Endpoint and a Jupyter notebook. I have the above code, then df_emps.show(), which shows a dataframe in which the two columns are null. – NigelLegg Dec 06 '20 at 15:03
  • Have you tried this? https://stackoverflow.com/a/51197279 – mck Dec 06 '20 at 15:09
  • Yes, I've tried all of those. – NigelLegg Dec 07 '20 at 07:00
  • Were you able to solve the issue? – el-aasi Apr 26 '21 at 08:14

1 Answer

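Since you are using AWS Glue, you can switch to AWS Glue 4.0, which includes Spark 3.3.0, where this issue was fixed: Parquet files whose column names contain spaces can be read there without the "invalid character(s)" error.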
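
Once the file loads, you will probably still want to strip the spaces before writing to the database. The following is only a minimal sketch, not something from the Glue or Spark docs: `sanitize` and `rename_invalid_columns` are hypothetical helper names, the character set is copied from the error message quoted in the comments, and it assumes `spark.read.parquet(file)` succeeds on your upgraded job. It renames every column in one pass with `DataFrame.toDF` and never re-reads the file with the renamed schema, which is what produced the null columns in the question.

```python
import re

# Characters listed as invalid in the Spark error message quoted in the comments.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize(name, replacement=""):
    """Return `name` with every invalid character replaced (hypothetical helper)."""
    return INVALID_CHARS.sub(replacement, name)


def rename_invalid_columns(df, replacement=""):
    """Rename all columns of an already-loaded Spark DataFrame in one pass.

    Relies only on `df.columns` and `df.toDF(*names)`, both standard
    members of pyspark.sql.DataFrame.
    """
    new_names = [sanitize(c, replacement) for c in df.columns]
    # Guard against two columns collapsing to the same name, e.g. "a b" and "ab".
    if len(set(new_names)) != len(new_names):
        raise ValueError(f"sanitized column names collide: {new_names}")
    return df.toDF(*new_names)


# Usage (not executed here; `spark` and `file` are the names from the question):
# df_emps = rename_invalid_columns(spark.read.parquet(file))
# df_emps.show()
```

Renaming after the read keeps the data attached to its columns; only the labels change. If you would rather keep word boundaries visible, pass `replacement="_"`.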

Andrey