3

I am exporting data to a MySQL database from R using the RMySQL package. A big problem I am facing is duplicates: I have too many entries with same values. The system automatically produces the results and inserts the dataframe into the database.

Current Table in DB:

Name   Balance  
Bob      100
Ted      150
Carl     130

I also have a data frame (df) in R reflecting changed balances:

> df
    Name   Balance
[1] Bob      100
[2] Ted      150
[3] Bill     50

Data to be inserted in DB:

Name   Balance
Bill     50

Now after insertion, Table should look like this:

Name   Balance
 Bob      100
 Ted      100
 Carl     130
 Bill     50

But my dbwrite produces this:

Name   Balance
 Bob      100
 Ted      100
 Carl     130
 Bob      100
 Ted      150
 Bill     50

I am using this code:

dbWriteTable(con, "DB Table name", rbind(Dataframe), row.names=FALSE,append=TRUE)

Is there any way to check for existing ones and update the table with only new ones using RMYSQL in R.

Barranka
  • 20,547
  • 13
  • 65
  • 83

1 Answers1

0

Instead of using dbWriteTable() use dbSendQuery(). You can build an SQL query to insert each row of the data frame:

df <- data.frame(name=c('Bob','Ted','Bill'), balance=c(100,150,50))
strSQL <- paste(
  'insert into your_table (name, balance) values',
  paste(sprintf("('%s', %f)", df$name, df$balance), collapse=', '),
  'on duplicate key update balance = values(balance)',
  sep = ' '
)
# The query will be something like this:
#           insert into your_table (name, balance)
#           values ('Bob', 100.000000), ('Ted', 150.000000), ('Bill', 50.000000) 
#           on duplicate key update balance = values(balance);
# Important: If you want this to work, you must ensure that there's a unique
# index on column 'name' in your_table
dbSendQuery(con, strSQL)
Barranka
  • 20,547
  • 13
  • 65
  • 83
  • Thanks for the reply. I tried your query and got an error. Error in mysqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1) – Aishwaryaa Viswanathan Sep 10 '14 at 09:10
  • @AishwaryaaViswanathan Sorry! Missed the last part of the SQL sentence: `on duplicate key update balance = values(balance)`. I've just edited the answer with the correction. Should work now – Barranka Sep 10 '14 at 14:45
  • 1
    It's better to use dbGetQuery because that automatically closes the result set. You should also either quote values or use a parameterised query in order to protect against SQL injection attacks – hadley Feb 22 '15 at 17:22
  • @hadley could you please post a link or reference on how to use prepared statements with `RMySQL`? I haven't found anything about it (and I've searched) – Barranka Feb 22 '15 at 20:58
  • It's mostly in the dev version - https://github.com/rstats-db/RMySQL. Will work on an overview doc soon – hadley Feb 23 '15 at 23:20