
I have

val colNames = data.schema.fieldNames
.filter(colName => colName.split("-")(0) == "20003" || colName == "eid") 

which I then use to select a subset of a dataframe:

var medData = data.select(colNames.map(c => col(c)): _*).rdd

but I get

cannot resolve '`20003-0.0`' given input columns: 
[20003-0.0, 20003-0.1, 20003-0.2, 20003-0.3];;

What is going on?

cosmosa

1 Answer


I had to include backticks like this:

var medData = data.select(colNames.map(c => col(s"`$c`")): _*).rdd

It turns out Spark isn't adding the backticks; the backticks are needed to escape column names containing dots or hyphens, which otherwise have special meaning in Spark SQL identifiers (a dot is read as struct-field access, a hyphen as subtraction).
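The escaping itself can be sketched without a Spark session: the fix just wraps each raw name in backticks before it reaches `col(...)`. The column names below are made up to mirror the question:

```scala
// Column names like those in the question: dots and hyphens are
// special characters in Spark SQL identifiers, so each name must be
// escaped with backticks before being passed to col(...).
val colNames = Seq("eid", "20003-0.0", "20003-0.1")

// Wrap each raw name in backticks; col("`20003-0.0`") then resolves
// the whole string as one column name instead of parsing the dot
// and hyphen as operators.
val escaped = colNames.map(c => s"`$c`")

println(escaped.mkString(", "))
// prints: `eid`, `20003-0.0`, `20003-0.1`
```

With the names escaped this way, `data.select(escaped.map(col): _*)` resolves each column as expected.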

cosmosa
  • Hey cosmos1990, any idea why this is happening? I just came up against the same thing and can't fathom why this might be happening – Sebastian Carroll Nov 02 '17 at 08:21
  • It looks like it's not Spark adding the backticks, but rather the backticks being required to escape the dots and hyphens - https://issues.apache.org/jira/browse/SPARK-18502 and https://stackoverflow.com/questions/30889630/how-to-escape-column-names-with-hyphen-in-spark-sql – Sebastian Carroll Nov 02 '17 at 08:27