25

I have imported a JSON file into Spark and converted it into a table as follows:

myDF.registerTempTable("myDF")

I then want to run SQL queries on the resulting table:

val newTable = sqlContext.sql("select column-1 from myDF")

However, this gives me an error because of the hyphen in the column name column-1. How do I resolve this in Spark SQL?

sfactor

4 Answers

62

Backticks (`) appear to work, so

val newTable = sqlContext.sql("select `column-1` from myDF")

should do the trick, at least in Spark v1.3.x.
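For reference, a minimal self-contained sketch of the same fix in PySpark (the data and view name are made up for illustration; createOrReplaceTempView is the newer name for registerTempTable):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# A toy DataFrame with a hyphenated column name.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "column-1"])
df.createOrReplaceTempView("myDF")
# Without the backticks, Spark would parse column-1 as the expression column minus 1.
spark.sql("select `column-1` from myDF").show()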

PermaFrost
4

I was at this for a bit yesterday; it turns out there is a way to escape the colon (:) and the dot (.).

Only the field containing the colon (:) needs to be escaped with backticks:

sqlc.sql("select `sn2:AnyAddRq`.AnyInfo.noInfo.someRef.myInfo.someData.Name AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()
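This works because backticks quote one identifier at a time: escaping only the sn2:AnyAddRq segment leaves the remaining dots free to act as struct accessors. Here is a minimal self-contained PySpark sketch of that rule (the schema, data, and view name are invented for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType

spark = SparkSession.builder.getOrCreate()
# A toy schema whose top-level field name contains a colon, like a namespace prefix.
schema = StructType([
    StructField("sn2:AnyAddRq", StructType([StructField("Name", StringType())]))
])
df = spark.createDataFrame([(("example",),)], schema)
df.createOrReplaceTempView("masterTable")
# Backticks wrap the colon-bearing segment only; the dot remains a struct accessor.
spark.sql("select `sn2:AnyAddRq`.Name AS sn2_Name from masterTable").show()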
GreenThumb
2

I cannot comment, as I have less than 50 reputation.

When you are referencing a JSON structure with struct.struct.field and there is a namespace present, like

ns2:struct.struct.field

the backticks (`) do not work.

jsonDF = sqlc.read.load('jsonMsgs', format="json")
jsonDF.registerTempTable("masterTable")
sqlc.sql("select `sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name` AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()

pyspark.sql.utils.AnalysisException: u"cannot resolve 'sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name'

If I remove the sn2: fields, the query executes.

I have also tried with single quotes ('), backslashes (\), and double quotes (").

The only way it works is if I register another temp table on the sn2: structure; I am then able to access the fields within it, like so:

anotherDF = jsonDF.select("sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData")
anotherDF.registerTempTable("anotherDF")
sqlc.sql("select Name from anotherDF").show()
GreenThumb
0

This is what I do, which also works in Spark 3.x.

I define the function litCols() at the top of my program (or in some global scope):

litCols = lambda seq: ','.join(('`'+x+'`' for x in seq)) # Accepts any sequence of strings.

I then apply it as necessary to prepare my literalized SELECT columns. Here's an example:

>>> UNPROTECTED_COLS = ["RegionName", "StateName", "2012-01", "2012-02"]
>>> LITERALIZED_COLS = litCols(UNPROTECTED_COLS)
>>> print(LITERALIZED_COLS)
`RegionName`,`StateName`,`2012-01`,`2012-02`

The problematic column names in this example are the YYYY-MM columns, which Spark would otherwise resolve as arithmetic expressions (2012 - 01 and 2012 - 02), yielding 2011 and 2010, respectively.
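For completeness, a short end-to-end sketch of how this gets used (the data and view name are invented):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
litCols = lambda seq: ','.join(('`'+x+'`' for x in seq))
# Toy data whose YYYY-MM column names would otherwise parse as subtraction.
df = spark.createDataFrame(
    [("92593", "NY", 100.0, 101.0)],
    ["RegionName", "StateName", "2012-01", "2012-02"])
df.createOrReplaceTempView("regions")
spark.sql("select {} from regions".format(litCols(["RegionName", "2012-01"]))).show()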

NYCeyes