5

I am using dbplyr to write and run queries in SQL Server, and want to apply a conditioned mutate. This can be done using ifelse or using case_when. The query works when using ifelse but throws and exception when using case_when.

The issue appears to be the SQL syntax that the two commands get translated into. The case_when syntax does not appear to be valid SQL. Can you tell me why and how to fix it? Or is this a bug?

# libraries
library(DBI)
library(dplyr)
library(dbplyr)

# establish connection to database table
connection_string = "database.specific.string"
# mine looks something like "DRIVER=...; Trusted_Connection=...; DATABASE=...' SERVER=..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
my_table = tbl(db_connection, from = my_table_name)

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 != col2 ~ "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) THEN ('a')
       WHEN CONVERT(BIT, IIF(col1 <> col2, 1.0, 0.0))) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Running this code throws an error

An expression of non-boolean type specified in a context where a condition is expected, near 'THEN'

However the ifelse query works as expected:

# attempted query
tmp = my_table %>%
    mutate(new_col = ifelse(col1 == col2, "a", "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = TRUE) THEN ('a')
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = FALSE) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Note that in both cases the SQL syntax have been produced using show_query. Using translate_sql to produce the SQL code consistently produced much cleaner SQL syntax, but this is not the syntax that gets run on the server.

Does anyone else get these SQL queries? Any suggestions as to what is wrong and how to fix this?

Updated

Posted as an issue on the tidyverse and was informed that a solution has already been developed for case_when(..., TRUE ~ "b") being translated to ELSE 'b' (here).

However, as this does not address the syntax that is causing this exception. Question edited to focus on syntax that is the cause.

Update 2

Posted as an issue on dbplyr. A response from Christophe Dervieux (cderv) identifies that the cause appears to be that SQL server needs a special translation for case_when like it has for ifelse.

In the mean time, users can use multiple ifelse or if_else statements.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • I also tried using `ifelse` instead of `case_when` this gives: `CASE WHERE (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0))) = TRUE) THEN ('a')` which does work. So perhaps this is a bug? – Simon.S.A. Aug 23 '18 at 04:40
  • There is an [ongoing PR](https://github.com/tidyverse/dbplyr/pull/142) that propose to replace the `WHEN(TRUE)` by a `ELSE` clause. That should solve your issue. – cderv Aug 27 '18 at 06:04

2 Answers2

0

Is it just that you have got your dplyr syntax slightly wrong?

Try this

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 == 'TRUE' ~ "b"
              # alternatively  col1 == 1 ~ "b"
)) 
Shinobi_Atobe
  • 1,793
  • 1
  • 18
  • 35
  • Sorry, the incorrect dplyr syntax was just a typo from my trying to think in both R and SQL at once (fixed above). The syntax you have proposed still produces the same SQL code that results in an error. – Simon.S.A. Aug 23 '18 at 20:32
  • Hmm, that's weird because it seems to work on my db... could it be to do with yow booleans are stored in your tables? as in are they stores as text 'TRUE' or 1/0? another thing to try would be using `if` and `else` (not `ifelse`) – Shinobi_Atobe Aug 24 '18 at 07:58
  • `ifelse` works correctly, but `case_when` does not. The difference is that the resulting SQL produced by `ifelse` contains ` = TRUE` on the end, while the SQL produced by `case_when` does not. – Simon.S.A. Aug 25 '18 at 03:42
  • I have no control over how booleans are stored in the db. I am not even sure how I could check this. – Simon.S.A. Aug 25 '18 at 03:42
0

See Update 2 above:

This appears to be an issue effecting case_when for SQL server.

The work-around at present is to use multiple ifelse or if_else statements:

data %>%
    mutate(new_col = ifelse(condition1, val1, NA)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition2, val2, new_col)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition3, val3, new_col))
    # etc
Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41