1

I've been using dplyr with dbplyr to generate queries for my sql server and pull the data, but I couldn't find a way to execute the NOT IN sql command.

I'm trying to generate something like this

SELECT * FROM EMPLOYEE WHERE ID NOT IN id_list

where id_list is obtained from another query in dplyr

id_list <- mutate(uid = distinct(id)) %>% select(uid) %>% collect()

id_list is a tbl_Microsoft SQL Server object.

What is the dplyr code to generate the SQL query?

Surya
  • 171
  • 1
  • 9
  • 2
    negate `%in%` ? `EMPLOYEE[!EMPLOYEE$ID %in% id_list]` Or with `dplyr`, `EMPLOYEE %>% filter(!ID %in% id_list)` – Ronak Shah May 31 '19 at 08:55
  • thanks. the negate `%in%` didn't work. But i think the filter function would do the job. edit: the !ID threw an sql error – Surya May 31 '19 at 09:01
  • `as.data.frame(id_list)$uid` did the trick – Surya May 31 '19 at 09:43
  • 1
    Please see an answer I posted that addresses how to negate this query using dbplyr. Ronak Shah's answer was on the right track just needed to put the comparison in parentheses before negating. – Matt Dzievit May 31 '19 at 11:01

1 Answers1

2

Ronak Shah's code is the correct idea, but is not negating properly. Right now it is just negating the ID instead of negating the results of the comparison.

EMPLOYEE %>% 
filter(!(ID %in% id_list)) %>%
collect()

If you put the comparison in parentheses and then negate the results of the comparison your query will work nicely (I tested it on a DB connection I have).

Matt Dzievit
  • 527
  • 3
  • 10
  • thanks. The main problem I had was that the `id_list` was a tibble and I had to convert it into a vector as I mentioned in the comment. I then used the vector with your snippet above. – Surya Jun 01 '19 at 04:25