
What combination of dbplyr verbs is equivalent to DBI::dbSendQuery(con, "DELETE FROM <table> WHERE <condition>")?

What I want is not to query data from the database, but to remove data from and update a table in the database.

I want to do it in a dplyr way, but I am not sure if it is possible. I could not find anything similar in the package reference.

englealuze

2 Answers


dbplyr translates dplyr commands to query database tables. I am not aware of any inbuilt way to modify existing database tables using pure dbplyr.

This is likely a design choice.

  • Within R we do not need to distinguish between fetching data from a table (querying) and modifying a table. This is probably because in R we can reload the original data into memory if an error/mistake occurs.
  • But in databases querying and modifying a table are deliberately different things. When modifying a database, you are modifying the source so additional controls are used (because recovering deleted data is a lot harder).
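For illustration, everything a plain dplyr pipeline renders against a remote table is a SELECT query, which you can inspect with show_query(). A minimal sketch, assuming db_con is an existing DBI connection and a table called my_table exists in that database:

library(dplyr)
library(dbplyr)

remote_table = tbl(db_con, "my_table")   # a lazy reference to the database table

remote_table %>%
  filter(colX == 1) %>%
  show_query()
# prints the SELECT ... WHERE ... query that would be sent to the database;
# nothing in this pipeline modifies the table itself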

The DBI package is probably your best choice for modifying the database.

This is the approach I use for all my dbplyr work: often a custom function that takes the query produced by dbplyr's translation and inserts it into a DBI call (you can see examples of this in my dbplyr helpers GitHub repo).

Two approaches to consider for this: (1) an anti-join (on all columns) followed by writing a new table, (2) the DELETE FROM syntax.

Mock-up of the anti-join approach

records_to_remove = remote_table %>%
  filter(conditions)   # the rows you want to delete

# keep everything except the rows to remove
desired_final_table = remote_table %>%
  anti_join(records_to_remove, by = colnames(remote_table))

# write the result out as a new table in the database
query = paste0("SELECT * INTO output_table FROM (",
               sql_render(desired_final_table),
               ") AS subquery")

DBI::dbExecute(db_con, as.character(query))
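(Note that SELECT * INTO output_table FROM ... is SQL Server style syntax; on other backends you may need the equivalent CREATE TABLE output_table AS SELECT ... form instead.)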

Mock-up of the DELETE FROM syntax

records_to_remove = remote_table %>%
  filter(conditions)   # the rows you want to delete

# turn the rendered SELECT query into a DELETE query
# (fixed = TRUE so the * in "SELECT *" is matched literally, not as a regex)
query = sql_render(records_to_remove) %>%
  as.character() %>%
  gsub(pattern = "SELECT *", replacement = "DELETE", x = ., fixed = TRUE)

DBI::dbExecute(db_con, query)
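Note that this text substitution assumes the rendered query begins with a literal SELECT * (which is the case for a simple filter, but not necessarily for more complex pipelines), so it is worth printing or checking the final query before executing it.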

If you plan to run these queries multiple times, then wrapping them in a function with some validity checks is recommended.
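A minimal sketch of such a wrapper for the DELETE FROM approach (the function name delete_where and the specific check are my own choices, not an established API; assumes dplyr, dbplyr, and DBI are available):

delete_where = function(remote_table, db_con, ...) {
  # rows matching the filter conditions will be deleted
  records_to_remove = remote_table %>%
    filter(...)

  query = as.character(sql_render(records_to_remove))

  # validity check: only proceed if the rendered query has the expected form
  if (!startsWith(query, "SELECT *")) {
    stop("rendered query does not begin with 'SELECT *', refusing to convert it to a DELETE")
  }

  delete_query = sub("SELECT *", "DELETE", query, fixed = TRUE)

  DBI::dbExecute(db_con, delete_query)
}

# hypothetical usage: remove every row of remote_table where colX == 1
# delete_where(remote_table, db_con, colX == 1)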

For some use cases deleting rows will not be necessary.

You could think of the filter command in R as deleting rows from a table. For example in R we might run:

prepared_table = input_table %>%
  filter(colX == 1) %>%
  select(colA, colB, colZ)

And think of this as deleting rows where colX == 1 before producing output:

output = prepared_table %>%
  group_by(colA) %>%
  summarise(sumZ = sum(colZ))

(Or you could use an anti-join, as above, instead of a filter.)

But for this type of deleting, you do not need to edit the source data, as you can just filter out the unwanted rows at runtime every time. Yes it will make your database query larger, but this is normal for working with databases.

So combining the preparation and output in SQL is normal (something like this):

SELECT colA, SUM(colZ) AS sumZ
FROM (

  SELECT colA, colB, colZ
  FROM input_table
  WHERE colX = 1

) AS prepared_table
GROUP BY colA
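And dbplyr will produce this kind of nested query for you when input_table is a remote table. A sketch, assuming db_con is an existing connection:

prepared_table = tbl(db_con, "input_table") %>%
  filter(colX == 1) %>%
  select(colA, colB, colZ)

output = prepared_table %>%
  group_by(colA) %>%
  summarise(sumZ = sum(colZ))

show_query(output)        # prints SQL similar to the nested query above
result = collect(output)  # the query only runs against the database here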

So unless you need to modify the database, I would recommend filtering instead of deleting.

Simon.S.A.
  • Thank you for the comprehensive answer. For the 2nd part of your answer. I understand in SQL a query is usually combined with multiple logics including group by and filter using where statement, which is different than `dplyr` in which processing logic can be clearly separated by pipe. But I don't understand why this is relevant to deleting rows from a table. What is your actual point here? Could you explain in another way? Thanks – englealuze Dec 18 '21 at 16:00
  • Good point, my second example is not clearly addressing your question. I intended it more for future readers who might assume deleting rows from a remote table is a better solution than just filtering them out. I will edit my answer to make this clearer. – Simon.S.A. Dec 18 '21 at 20:45
  • Hopefully those edits make the second section clearer. From your question it sounds like you intend to modify the database. So the second section is less relevant as an answer to you and more an attempt to future proof the answer. – Simon.S.A. Dec 18 '21 at 21:14

Here is an alternative to @Simon.S.A.'s delete code:

dbDeleteCustom = function(x, con, schema_name, table_name) {
  # x: a data frame containing exactly 1 row, describing the record to delete
  if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")

  # build "col1 = val1 AND col2 = val2 ..." with identifiers and values quoted for this connection
  where_str  <- purrr::map_chr(colnames(x), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))
  where_str  <- paste(where_str, collapse = " AND ")

  update_str <- glue::glue("DELETE FROM {schema_name}.{table_name} WHERE {where_str}")

  # Execute ------------------------------------------------------------------

  query_res    <- DBI::dbSendQuery(con, update_str)
  rowsAffected <- DBI::dbGetRowsAffected(query_res)
  DBI::dbClearResult(query_res)

  return(rowsAffected)
}

This code is based on code from this post: How to pass data.frame for UPDATE with R DBI
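A hypothetical usage (the column values and schema/table names here are invented for illustration):

# one row describing the record to delete; column names must match the table
record_to_remove <- data.frame(id = 42L, status = "inactive")

rows_deleted <- dbDeleteCustom(record_to_remove, con, "my_schema", "my_table")
# builds and runs something like:
# DELETE FROM my_schema.my_table WHERE "id" = 42 AND "status" = 'inactive'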

As @Simon.S.A. suggests, consider wrapper functions. These could include:

tryCatch(
  pool::poolWithTransaction(con, function(conn) {
    rowsAffected <- dbDeleteCustom(record_to_remove, conn, schema, "tableName")

    log_step(paste0("removed entries for action, rows affected: ", rowsAffected))
  }),
  error = function(e) {
    "handle error - code"
  }
)

Aaron C