"NOT IN" sql command in dplyr using result from a different query

Question

I've been using dplyr with dbplyr to generate queries for my sql server and pull the data, but I couldn't find a way to execute the NOT IN sql command.

I'm trying to generate something like this

SELECT * FROM EMPLOYEE WHERE ID NOT IN id_list

where id_list is obtained from another query in dplyr

id_list <- mutate(uid = distinct(id)) %>% select(uid) %>% collect()

id_list is a tbl_Microsoft SQL Server object.

What is the dplyr code to generate the SQL query?

negate `%in%` ? `EMPLOYEE[!EMPLOYEE$ID %in% id_list]` Or with `dplyr`, `EMPLOYEE %>% filter(!ID %in% id_list)` — Ronak Shah, May 31 '19 at 08:55
thanks. the negate `%in%` didn't work. But i think the filter function would do the job. edit: the !ID threw an sql error — Surya, May 31 '19 at 09:01
Please see an answer I posted that addresses how to negate this query using dbplyr. Ronak Shah's answer was on the right track just needed to put the comparison in parentheses before negating. — Matt Dzievit, May 31 '19 at 11:01

score 2 · Answer 1 · answered May 31 '19 at 11:00

2

Ronak Shah's code is the correct idea, but is not negating properly. Right now it is just negating the ID instead of negating the results of the comparison.

EMPLOYEE %>% 
filter(!(ID %in% id_list)) %>%
collect()

If you put the comparison in parentheses and then negate the results of the comparison your query will work nicely (I tested it on a DB connection I have).

answered May 31 '19 at 11:00

Matt Dzievit

527
3
10

thanks. The main problem I had was that the `id_list` was a tibble and I had to convert it into a vector as I mentioned in the comment. I then used the vector with your snippet above. – Surya Jun 01 '19 at 04:25

"NOT IN" sql command in dplyr using result from a different query

1 Answers1