I am collecting live data from a website which populates a data frame in R. The rows could have the same unique id's, or new rows could be introduced. I want to send the dynamic data frame to a MariaDB database table, where the rows with existing unique id's update the columns that I have specified, rows without existing unique id's get inserted in the table as new rows. I can get this to work with the MariaDB INSERT ON DUPLICATE KEY UPDATE statement, and a function that generates the needed values from the dynamic data frame.
MWE:
install.packages("odbc")
insall.packages("RMariaDB")
library(odbc)
library(RMariaDB)
con <- dbConnect(RMariaDB::MariaDB(), host = Sys.getenv('MARIADB_DBHOST'),
port = Sys.getenv('MARIADB_DBPORT'), user = Sys.getenv('MARIADB_DBUSER'),
password = Sys.getenv('MARIADB_DBPW'), db = Sys.getenv('MARIADB_DBNAME'),
timeout = 5)
# Database table for mwe to work.
db_live <- data.frame(id = c(12, 22, 32), car_name = c("rolls royce","nissan","mercedes benz"), km = c(123,100,150), temp = c(78,60,80))
# Get table from database, id column is unique index.
db_live <- dbReadTable(con, "db_live")
print(db_live)
id car_name km temp
1 12 rols royce 123 78
2 22 nissan 100 60
3 32 mercedes benz 150 80
# Build dynamic dataframe
df_live <- data.frame(id = c(12, 22, 32, 42),
car_name = c("rolls royce","nissan","mercedes benz", "aston martin"),
km = c(140,120,200,40), temp = c(81,65,85,50))
print(df_live)
id car_name km temp
1 12 rols royce 140 81
2 22 nissan 120 65
3 32 mercedes benz 200 85
4 42 aston martin 40 50
# create function that generates a string with values for dbSendQuery.
gen_insert_values <- function(df) {
for(i in 1:nrow(df)) {
row_string <- paste(df[i,1], paste0("'",df[i,2],"'"), df[i,3], df[i,4],
collapse = ", ")
if(exists("df_string")) {
df_string <- paste0(df_string,", ",paste0("(",row_string,")"))
} else {
df_string <- paste0("(",row_string,")")
}
}
df_string
}
values <- gen_insert_values(df_live)
print(values)
"(12 'rolls royce' 140 81), (22 'nissan' 120 65), (32 'mercedes benz' 200 85), (42 'aston martin' 40 50)"
# Send query.
res <- dbSendQuery(con, paste0("INSERT INTO db_live (id,car_name,km,temp) VALUES ", values," ON DUPLICATE KEY UPDATE km = VALUES(km), temp = VALUES(temp);"))
dbClearResult(res)
#Check db table after sent query.
new_db_live <- dbReadTable(con, "db_live")
print(new_db_live)
id car_name km temp
1 12 rolls royce 140 81
2 22 nissan 120 65
3 32 mercedes benz 200 85
4 42 aston martin 40 50
This does not seem very efficient, as I have to change the query and the function in case I want to update more columns, and I include a for loop in my function which can cause the script to be slow.
Is there a more efficient way to solve this problem?