0

I would like to understand the difference between dplyr joins and sql joins. I have an open connection to an oracle database in R:

con <- dbConnect(odbc::odbc(), …)

The 1st request :

dbGetQuery(con, "select * 
   from result join test on result.test_1 = test.test_1 
   join sample on test.sample = sample.id_2") %>% 
   setNames(make.names(names(.), unique = TRUE) )%>% 
   as_tibble()

gives a tibble with 9541 rows (what I want !)

The 2nd request :

tbl(con, "result")%>%
   inner_join(tbl(con, "sample"), by = c("test_1" = "id_2"))%>%
   collect()

gives a tibble with 2688 rows

test_1 and id_2 are character fields with spaces in it and numbers at the end. example: “ 3333” .

Thanks

Felipe
  • 719
  • 8
  • 20

1 Answers1

1

In SQL I see 3 tables, in R I see 2 tables : result and sample table.

Here is the probable cause of the difference.

phili_b
  • 885
  • 9
  • 27
  • yes you are right, I simply forgot one join with a third table "test" : tbl(con, "RESULT") %>% inner_join(tbl(con, "TEST"), by = c("TEST_1" = "TEST_1")) %>% inner_join(tbl(con, "SAMPLE") , by = c("SAMPLE" = "ID_2")) – Felipe Dec 01 '18 at 08:45