0

I'm working with in R and would like to insert a record using dplyr. I have included an example below of what I'm trying to achieve.

First, I load my libraries, create a toy dataset, then load it into a database.

# Load libraries
library(DBI)
library(duckdb)
library(dplyr)

# Create database connection
con <- dbConnect(duckdb(), 
                 dbdir="test.duckdb", 
                 read_only=FALSE)

# Toy data frame
df <- data.frame(a = 1:5,
                 b = letters[1:5],
                 c = runif(5))

# Write data to table
dbWriteTable(con, "foo", df)

I can now query my database using both SQL and dplyr syntax:

# Query          
dbGetQuery(con, "SELECT * FROM foo")
#   a b         c
# 1 1 a 0.1407891
# 2 2 b 0.8824569
# 3 3 c 0.2863700
# 4 4 d 0.9010191
# 5 5 e 0.8676229

tbl(con, "foo") %>% collect
#   # A tibble: 5 x 3
#       a b         c
#   <int> <chr> <dbl>
# 1     1 a     0.141
# 2     2 b     0.882
# 3     3 c     0.286
# 4     4 d     0.901
# 5     5 e     0.868

And I can insert a record with SQL syntax:

# Insert with SQL 
dbExecute(con, 
          "INSERT INTO foo VALUES (?, ?, ?)", 
          list(6, "F", 0.5))
dbGetQuery(con, "SELECT * FROM foo")

#   a b         c
# 1 1 a 0.1407891
# 2 2 b 0.8824569
# 3 3 c 0.2863700
# 4 4 d 0.9010191
# 5 5 e 0.8676229
# 6 6 F 0.5000000

Fantastic!


My question is, how do I write the SQL insert statement but using dplyr syntax instead?

Dan
  • 11,370
  • 4
  • 43
  • 68
  • Although not specific to `duckdb`, I have come across a comment by Hadley Wickham on [this answer](https://stackoverflow.com/a/26784801/1552004): "You can, but you shouldn't. dplyr is designed specifically for retrieving data from databases, not modifying in place. This answer uses internal functions - I strongly recommend against using these functions." – Dan Jun 28 '22 at 21:05

1 Answers1

1

This is almost certainly not possible.

Note that once you have written it into the database, df is not a typical (local) data frame but a local pointer to a remote database table. With a local data frame, dplyr commands can change the table, for example:

new_df = old_df %>% mutate(new_col = old_col * 2)

But databases are designed not to be modifiable in the same way. If you were to run

remote_df = tbl(con, "table_name")
new_remote_df = remote_df %>% mutate(new_col = old_col * 2)

Instead of creating a new database table, new_remote_df is reads from the original database table and constructs new_col dynamically/on-demand.

Hence, for remote database tables, dplyr does not let you change the underlying table, it only changes how the table is presented back to you.

This means that if you want to modify the database, you need to use database specific commands (such as INSERT INTO).

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • Thanks for the reply, Simon! From [this answer](https://stackoverflow.com/a/26784801/1552004) it looks like it _is_ possible, but not advised. – Dan Jun 28 '22 at 21:09
  • 1
    As Hadley is the author if many of these packages, I'd follow his advise that this is not advised. But you are right, it can be done. – Simon.S.A. Jun 29 '22 at 00:32