I want to run multiple SQL statements to select and filter, and then join with multiple tables. My approach is given below. It requires pulling intermediate results back into R, which is problematic due to memory limitations; for that reason I am only using a small subset of records.
However, if I can do all of this work in SQL Server, then I will not be limited by my local memory size.
Ideally, I am looking for a way to leave the intermediate results on the SQL Server. Combining the following SQL query and the dplyr filter into one SQL query seems difficult.
con <- DBI::dbConnect(odbc::odbc(),
                      Driver = "SQL Server",
                      Server = DEST_SERVER,
                      Database = DEST_DATABASE,
                      Trusted_Connection = "Yes")
# subset the ids and write to sql server
id_indices <- 1:30000
id_subset <- data.frame(record_id = all_ids[id_indices])
dbWriteTable(con, '#id_subset', id_subset)
# select those with id
first_out <- dbGetQuery(
  con,
  "select distinct id, claim_date, paid
   from [my_schema].[my_table]
   where id in (select record_id from #id_subset)")
# get only records for the latest date for each id
paid <- first_out %>%
  group_by(id) %>%   # the query returns lowercase 'id'; R is case-sensitive
  filter(as.Date(claim_date) == max(as.Date(claim_date), na.rm = TRUE)) %>%
  filter(paid == "Y") %>%
  ungroup()
# send the results to SQL server
dbWriteTable(con, '#paid', paid)
# the #paid table will then be joined to other tables, and so on.
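For context, the server-side equivalent I am hoping for would look roughly like the single statement below. It is only a sketch of what I think combining the query and the dplyr filter might look like: it assumes SQL Server's `ROW_NUMBER()` window function, the same `con` connection and `#id_subset` temp table as above, and the `record_id` column name from my `id_subset` data frame. The result would stay in a temp table (`#paid`) on the server instead of coming back into R.

```r
# Sketch (untested): do the latest-claim_date and paid == 'Y' filtering
# entirely on SQL Server, writing the result into a server-side temp table.
DBI::dbExecute(con, "
  select id, claim_date, paid
  into #paid
  from (
      select id, claim_date, paid,
             row_number() over (partition by id
                                order by claim_date desc) as rn
      from [my_schema].[my_table]
      where id in (select record_id from #id_subset)
  ) t
  where rn = 1 and paid = 'Y'")
```

I am not sure whether this is the idiomatic way to keep intermediate results on the server, or whether something like dbplyr's lazy tables would be a better fit.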