14

The line:

df.withColumn("test", expr("concat(lon, lat)")) 

works as expected but

df.withColumn("test", expr("concat(lon, lit(','), lat)"))

produces the following exception:

org.apache.spark.sql.AnalysisException: Undefined function: 'lit'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 12 at org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1198)

Why? And what would be the workaround?

Shaido
  • 27,497
  • 23
  • 70
  • 73
Kyunam
  • 169
  • 1
  • 1
  • 5

1 Answers1

14

The string argument to expr will be parsed as a SQL expression and used to construct a column. Since lit is not a valid SQL command this will give you an error. (lit is used in Spark to convert a literal value into a new column.)

To solve this, simply remove the lit part:

df.withColumn("test", expr("concat(lon, ',', lat)")) 

Or use the in-built Spark concat function directly without expr:

df.withColumn("test", concat($"lon", lit(","), $"lat"))

Since concat takes columns as arguments lit must be used here.

Shaido
  • 27,497
  • 23
  • 70
  • 73
  • But how to write the following using `expr`? `df.withColumn("test", concat($"lon", lit(" a nonColumn string")) ` – MehrdadAP Oct 11 '19 at 18:01
  • 1
    @MehrdadAP: When using `expr` to concat a constant string you can simply use single quotes. The equivalent of the example using `expr` would be: `df.withColumn("test", expr("concat(lon, ' a nonColumn string')"))`. – Shaido Oct 12 '19 at 03:00