
I have

val colNames = data.schema.fieldNames
.filter(colName => colName.split("-")(0) == "20003" || colName == "eid") 

which I then use to select a subset of a dataframe:

var medData = data.select(colNames.map(c => col(c)): _*).rdd

but I get

cannot resolve '`20003-0.0`' given input columns: 
[20003-0.0, 20003-0.1, 20003-0.2, 20003-0.3];;

What is going on?

cosmosa

1 Answer


I had to include backticks like this:

var medData = data.select(colNames.map(c => col(s"`$c`")): _*).rdd

It turns out Spark isn't adding the backticks; the backticks are needed to escape column names containing dots or hyphens, which otherwise have special meaning in Spark SQL identifiers (a dot is read as struct-field access, a hyphen as subtraction).
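The escaping itself can be sketched without a Spark session: the fix just wraps each raw name in backticks before it reaches `col(...)`. The column names below are made up to mirror the question:

```scala
// Column names like those in the question: dots and hyphens are
// special characters in Spark SQL identifiers, so each name must be
// escaped with backticks before being passed to col(...).
val colNames = Seq("eid", "20003-0.0", "20003-0.1")

// Wrap each raw name in backticks; col("`20003-0.0`") then resolves
// the whole string as one column name instead of parsing the dot
// and hyphen as operators.
val escaped = colNames.map(c => s"`$c`")

println(escaped.mkString(", "))
// prints: `eid`, `20003-0.0`, `20003-0.1`
```

With the names escaped this way, `data.select(escaped.map(col): _*)` resolves each column as expected.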

cosmosa
  • Hey cosmos1990, any idea why this is happening? I just came up against the same thing and can't fathom why this might be happening – Sebastian Carroll Nov 02 '17 at 08:21
  • It looks like it's not Spark adding the backticks, but rather the backticks being required to escape the dots and hyphens - https://issues.apache.org/jira/browse/SPARK-18502 and https://stackoverflow.com/questions/30889630/how-to-escape-column-names-with-hyphen-in-spark-sql – Sebastian Carroll Nov 02 '17 at 08:27