I am using PySpark to read data from a Cassandra database.

Imports:

from pyspark.ml.feature import SQLTransformer
from transform.Base import Transform
I have loaded the data, and it looks like this:
+----+--------------+-----+---+
|time|MEM UTI PERC %|devId|Lid|
+----+--------------+-----+---+
| 482|   8.661052632|    6| 20|
| 654|   9.162190612|    6| 20|
| 364|   8.219230769|    6| 20|
+----+--------------+-----+---+
When I apply the SQLTransformer with the following SQL statement:
self.sqlstatement = "SELECT Time,MEM UTI PERC % FROM __THIS__ WHERE "
sqltrans = SQLTransformer()
sqltrans.setStatement(self.sqlstatement)
new_df = sqltrans.transform(sparkdf)
it throws this error:
mismatched input 'UTI' expecting {<EOF>, ';'}(line 1, pos 19)
So I modified the SQL statement to wrap the column name containing spaces in double quotes/single quotes, like below:
SELECT Time,"MEM UTI PERC %" FROM __THIS__ WHERE
This time the transformer doesn't throw an exception, but instead it replaces every value of that column with the column name itself, like below:
+----+--------------+
|Time|MEM UTI PERC %|
+----+--------------+
| 212|MEM UTI PERC %|
|  26|MEM UTI PERC %|
+----+--------------+
I want to get the data properly, like:
+----+--------------+
|Time|MEM UTI PERC %|
+----+--------------+
| 212|          20.7|
|  26|          40.0|
+----+--------------+