
I'm new to Spark SQL. Is there an equivalent to `CASE WHEN 'CONDITION' THEN 0 ELSE 1 END` in Spark SQL?

select case when 1=1 then 1 else 0 end from table

Thanks, Sridhar

user3279189
    See answers to duplicate question: https://stackoverflow.com/questions/30783517/apache-spark-add-an-case-when-else-calculated-column-to-an-existing-d – Josiah Yoder Jul 13 '17 at 17:41

6 Answers


Before Spark 1.2.0

The supported syntax (which I just tried out on Spark 1.0.2) seems to be

SELECT IF(1=1, 1, 0) FROM table

This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.

Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.

Update for Spark 1.2.0 and beyond

As of Spark 1.2.0, the more traditional syntax is supported, in response to SPARK-3813: search for "CASE WHEN" in the test source. For example:

SELECT CASE WHEN key = 1 THEN 1 ELSE 2 END FROM testData

Update: where to find the SQL parser source now

The parser source can now be found here.

Update for more complex examples

In response to a question below, the modern syntax supports complex Boolean conditions.

SELECT
    CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdRedux
FROM customer

You can involve multiple columns in the condition.

SELECT
    CASE WHEN id = 1 OR state = 'MA' 
         THEN "OneOrMA" 
         ELSE "NotOneOrMA" END AS IdRedux
FROM customer

You can also nest CASE WHEN THEN expressions.

SELECT
    CASE WHEN id = 1 
         THEN "OneOrMA"
         ELSE
             CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END
    END AS IdRedux
FROM customer
Spiro Michaylov
    how can you add additional if statements to this such as extending with `or`. ie. `SELECT CASE WHEN key = 1 or key2 = 1 THEN 1 ELSE 2 END FROM testData` – horatio1701d Jun 17 '17 at 21:02
  • @horatio1701d The syntax of your example works for me (on Spark 2.1.0) as long as there is a column called key2 of the right type in testData. I added a bunch of examples of this [here](https://github.com/spirom/LearningSpark/blob/master/src/main/scala/sql/CaseWhenThen.scala) and have updated my answer accordingly. If you're having trouble making this work you may want to post a new question and link to it from here. – Spiro Michaylov Jun 21 '17 at 15:17
  • **@Spiro Michaylov**, does that mean for multiple `WHEN` clauses, we need to use **nested** `CASE WHEN .. ELSE WHEN`? – y2k-shubham Feb 12 '19 at 09:04

For Spark 2.x and later, use Spark's `when` function.

From documentation:

Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

    // Example: encoding gender string column into integer.

    // Scala:
    people.select(when(col("gender") === "male", 0)
      .when(col("gender") === "female", 1)
      .otherwise(2))

    // Java:
    people.select(when(col("gender").equalTo("male"), 0)
      .when(col("gender").equalTo("female"), 1)
      .otherwise(2))
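Outside of Spark, the first-match-wins semantics of `when`/`otherwise` can be checked with a plain-Python sketch (no Spark required; the helper name `encode_gender` is just for illustration):

```python
def encode_gender(gender):
    """Mimics when(...).when(...).otherwise(...): the first matching
    condition wins; without an otherwise, unmatched rows would yield None."""
    if gender == "male":      # when(col("gender") === "male", 0)
        return 0
    elif gender == "female":  # .when(col("gender") === "female", 1)
        return 1
    else:                     # .otherwise(2)
        return 2

print([encode_gender(g) for g in ["male", "female", "other"]])  # [0, 1, 2]
```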
Michael Brenndoerfer
Ehud Lev

This syntax worked for me in Databricks:

  select 
    org, 
    patient_id,
    case 
      when (age is null) then 'Not Available'
      when (age < 15) then 'Less than 15'
      when (age >= 15 and age < 25) then '15 to 25'
      when (age >= 25 and age < 35) then '25 to 35'
      when (age >= 35 and age < 45) then '35 to 45'
      when (age >= 45) then '45 and Older'
    end as age_range
  from demo
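The same bucketing can be sketched in plain Python to check the boundary handling (a hypothetical `age_range` helper, not part of the Databricks query above):

```python
def age_range(age):
    # Mirrors the CASE expression: the first matching branch wins,
    # and with no ELSE an unmatched value maps to None (SQL NULL).
    if age is None:
        return 'Not Available'
    elif age < 15:
        return 'Less than 15'
    elif age < 25:
        return '15 to 25'
    elif age < 35:
        return '25 to 35'
    elif age < 45:
        return '35 to 45'
    else:
        return '45 and Older'

print(age_range(None), age_range(24), age_range(45))
# Not Available 15 to 25 45 and Older
```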
John

An analog of Oracle SQL's `decode()` function can be written in Spark SQL as follows:

  case
    when exp1 in ('a','b','c')
      then element_at(map('a','A','b','B','c','C'), exp1)
    else exp1
  end
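Conceptually, `element_at(map(...), exp1)` is a dictionary lookup with a fall-through to the original value; a plain-Python sketch of this decode behavior (hypothetical `decode` helper):

```python
def decode(value, mapping):
    # CASE WHEN value IN (keys) THEN mapping[value] ELSE value END
    return mapping.get(value, value)

mapping = {'a': 'A', 'b': 'B', 'c': 'C'}
print(decode('b', mapping))  # B
print(decode('z', mapping))  # z  (falls through to ELSE exp1)
```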
Based on my current production code, this works:

    val identifierDF = tempIdentifierDF.select(
      tempIdentifierDF("t_item_account_id"),
      when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_cusip")), 100)
        .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_ticker")), 100)
        .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_isin")), 100)
        .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_sedol")), 100)
        .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_valoren")), 100)
        .otherwise(0)
        .alias("identifier_in_description_score")
    )
swapnil shashank

The Spark DataFrame API (Python version) also enables the following query:

df.selectExpr('time',
    'CASE WHEN (time > 1) THEN time * 1.1 ELSE time END AS updated_time')
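Per row, that expression is just a conditional scaling; a plain-Python check of the logic (no Spark session needed, `updated_time` is an illustrative helper):

```python
def updated_time(time):
    # CASE WHEN (time > 1) THEN time * 1.1 ELSE time END
    return time * 1.1 if time > 1 else time

print([updated_time(t) for t in [0.5, 1, 2]])
```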