The dataframe df_problematic in PySpark has the following columns:

+------------+-----------+------------+
|sepal@length|sepal.width|petal_length|
+------------+-----------+------------+
|         5.1|        3.5|         1.4|
|         4.9|          3|         1.4|

I'd expect the dataframe to fail to load, or to throw some error, since the column names contain "@" and ".".

But it looks like it loads just fine.

How can it be loaded?

Operations on the columns with special characters throw an error unless I surround the column name with backticks (`). However, operations that do not name those columns work just fine, e.g. sampling:

df_problematic_sampled = df_problematic.sample(fraction=0.8)
df_problematic_sampled.head(3)

Output:

[Row(sepal@length='4.7', sepal.width='3.2', petal_length='1.3', petal.width='.2', variety='Setosa'),
 Row(sepal@length='4.6', sepal.width='3.4', petal_length='1.4', petal.width='.3', variety='Setosa'),
 Row(sepal@length='4.4', sepal.width='2.9', petal_length='1.4', petal.width='.2', variety='Setosa')]

Does it mean that as long as I do not use the columns with special characters, and perform operations only on the columns with normal names, the dataframe df_problematic can be e.g. sampled/grouped/saved just fine?

Uylenburgh
  • If you have special characters or keywords in a column name, you need to use backticks (`). If it's a normal column, you don't need to. This behaviour is expected. – Equinox Feb 27 '23 at 10:13
  • Spark will allow you to use those columns in a dataframe, but you must use the backtick (`) notation when referring to those columns in operations such as: df_problematic.select("`sepal.width`").show() – Chris Feb 27 '23 at 10:43
  • As I mentioned, I do use backticks. I understand that I can't operate on those columns without backticks. My question is how those columns are actually stored: would they be preserved just fine if I never touched them and performed all operations on the columns with normal names? – Uylenburgh Feb 27 '23 at 11:59
  • @Uylenburgh, correct: the columns will be preserved and will behave as if their names contained no special characters. – Chris Feb 27 '23 at 13:36
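
To make the comments above concrete, here is a minimal sketch, not a definitive answer. The helper quote_col and the function demo are hypothetical names introduced only for illustration, and the rows are made up to mimic the column names in the question. It assumes a local PySpark installation, and that doubling a backtick is how Spark escapes one inside a quoted identifier.

```python
def quote_col(name: str) -> str:
    """Wrap a column name in backticks, doubling any backtick already in it.

    Hypothetical helper, not part of the PySpark API.
    """
    return "`" + name.replace("`", "``") + "`"


def demo() -> None:
    # Imported here so quote_col stays usable without a Spark installation.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("special-column-names")
        .getOrCreate()
    )

    # Made-up rows; only the column names are taken from the question.
    df = spark.createDataFrame(
        [("5.1", "3.5", "1.4", "Setosa"), ("4.9", "3", "1.4", "Setosa")],
        ["sepal@length", "sepal.width", "petal_length", "variety"],
    )

    # Operations that never name the special columns carry them along as-is.
    sampled = df.sample(fraction=0.8, seed=42)
    print(sampled.columns)
    df.groupBy("variety").count().show()

    # Referring to a special column by name uses the backtick-quoted form.
    df.select(quote_col("sepal.width"), quote_col("sepal@length")).show()

    spark.stop()
```

Calling demo() runs the Spark part. The sketch deliberately skips saving, since whether a given output format accepts these column names is something I have not verified here.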

0 Answers