I create a MySQL table as follows:

create table schema_name.test (
col1 varchar(10),
col2 varchar(10),
primary key (col1),
index col1_index (col1)
)

and I write a data.frame to the table using dbWriteTable as follows:

> test <- data.frame(col1 = c("a","b"),col2 = c("apple","banana"))
> types <- c("varchar(10)","varchar(10)")
> names(types) <- names(test)
> dbWriteTable(conn = con, overwrite = TRUE, value = test, name = "test", field.types = types)

After doing this and inspecting the table in the MySQL environment, I see that the primary key and the index have been removed. (Here con is a connection object created with dbConnect(odbc::odbc(), "my_dsn_name").) This is undesired.

The behavior I'm trying to achieve is to overwrite the data in the table while leaving the table's metadata/structure intact.

I understand that I could create another table with the same structure, write the data to that table with dbWriteTable, and then use an INSERT statement to copy the data into the target table (sketched below), but this is quite a bit of overhead and feels like it should be unnecessary. Sequentially deleting and then appending all of the data is similarly problematic.
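
For concreteness, here is a sketch of that workaround (the scratch-table name test_staging is made up):

# Hypothetical staging-table workaround: write to a scratch table, then
# move the rows across so that test keeps its keys and indexes.
dbWriteTable(conn = con, overwrite = TRUE, value = test,
             name = "test_staging", field.types = types)
dbExecute(con, "TRUNCATE TABLE schema_name.test")  # clear the target first
dbExecute(con, "INSERT INTO schema_name.test SELECT col1, col2 FROM test_staging")
dbExecute(con, "DROP TABLE test_staging")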

Is there a more elegant solution to this problem, and is the destruction of keys and indexes intended behavior? The documentation is quite sparse and doesn't seem to offer solutions to this issue.

Rookatu

1 Answer

First, I would like to point out that your current table definition contains a redundancy:

CREATE TABLE schema_name.test (
    col1 VARCHAR(10),
    col2 VARCHAR(10),
    PRIMARY KEY (col1),
    INDEX col1_index (col1)
);

MySQL automatically creates an index on the primary key, so the separate col1_index definition is unnecessary; you can simply drop that line.
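
For reference, the trimmed definition, issued here through the same connection via DBI's dbExecute (any MySQL client would do equally well):

# Equivalent definition without the redundant secondary index:
dbExecute(con, "
    CREATE TABLE schema_name.test (
        col1 VARCHAR(10),
        col2 VARCHAR(10),
        PRIMARY KEY (col1)
    )")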

Regarding your actual R problem: if you don't want R to overwrite the table, try calling dbWriteTable with overwrite=FALSE. From poking around, it seems that dbWriteTable with overwrite=TRUE drops your MySQL table and then recreates it from the data frame and field.types alone, which would explain why the primary key and index disappear: the column types carry no key or index information.

Based on your comments below, if you want to remove all data in the table and then insert new data, you could first truncate the table, then make your call to dbWriteTable:

dbExecute(con, "TRUNCATE TABLE schema_name.test")
dbWriteTable(conn = con, overwrite = FALSE, append = TRUE, value = test, name = "test", field.types = types)
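
If this comes up often, the two steps are easy to wrap; here is a minimal sketch (the function name overwrite_data is made up, not part of DBI, and it assumes DBI is attached as in your session):

# Truncate-then-append helper: replaces all rows while preserving the
# table's keys, indexes, and column definitions.
overwrite_data <- function(con, name, value, ...) {
  dbExecute(con, paste("TRUNCATE TABLE", dbQuoteIdentifier(con, name)))
  dbWriteTable(conn = con, name = name, value = value,
               overwrite = FALSE, append = TRUE, ...)
}

overwrite_data(con, "test", test)
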
Tim Biegeleisen
  • Thanks @Tim Biegeleisen; I was aware that MySQL creates indexes on primary keys; I was just trying to demonstrate that *both* get dropped. Regarding `overwrite = FALSE`: this gives "Error: Table test exists in database, and both overwrite and append are FALSE", and if I also set `append = TRUE` then I will not get the desired behaviour (I want to overwrite all data in the table, not append). (I have also added some detail about the desired behavior.) Any ideas (thanks again :))? – Rookatu May 14 '19 at 14:52
  • `dbExecute(connection, "delete from table_name")` will delete the data without dropping the table. – Bruce Schardt May 14 '19 at 15:01
  • @Rookatu Try truncating the table, then calling `dbWriteTable` with `overwrite=FALSE` and `append=TRUE`. – Tim Biegeleisen May 14 '19 at 15:03
  • @Tim Biegeleisen I identified this approach in my original post as a possible solution, but was trying to determine if there is anything more elegant than this; this approach seems to ask more of the user than should be necessary and I will just end up writing a wrapper function which combines these steps, leaving me to wonder why the authors of the package didn't do so as this is a standard database action. I also wonder about performance: is this really the optimal way to replace all data in the table? Thanks – Rookatu May 14 '19 at 15:11
  • The problem is that your use case is very atypical. It is unusual to have the need to completely wipe a database table each time you write to it. Maybe this is why the creators of the RMySQL package never thought to create an API which allows for this. – Tim Biegeleisen May 14 '19 at 15:12
  • @TimBiegeleisen is that really true? I've been working with databases in a large company for many years and this is extremely common in several business units here. SAS even built a load protocol in their table loaders to do exactly this (namely the teradata fastloader); regardless, the question as to whether there is a more elegant way to do this, and about performance, remains ... – Rookatu May 14 '19 at 15:16
  • My solution requires making two API calls, and I don't know of any way to do this using only one call. – Tim Biegeleisen May 14 '19 at 15:16