
I'm trying to write to a table in DB2 using overwrite = TRUE with dbWriteTable() from the DBI package in R. I would like to specify the table space I am writing to rather than writing to the default space (USERSPACE1). I would also rather not create the table beforehand and use append = TRUE, because that takes MUCH longer than overwriting (I have roughly 4 million rows and a varying number of columns each time I write it; a minimum of 100 columns). Overwriting, however, switches the table back to the default table space.

I first tried creating the table and defining the table space, but that requires appending rather than overwriting. I also tried creating the table in the table space I want it in and then overwriting, but the table switches back to the default table space. The arguments for dbWriteTable do not support altering the table space.

###################################
### Method One - Takes Too Long ###
###################################
# `variable_definitions` defines my variable types

query <- paste0(
  '
  CREATE TABLE MYSCHEMA.TABLE_NAME
  (', variable_definitions, ')
  IN "DESIRED_SPACE" DISTRIBUTE BY HASH ("FIELD_ONE")
  COMPRESS YES
  '
)

# This works to create the table in the desired table space
# (note: no trailing semicolon -- some ODBC drivers reject it)
dbExecute(myConn, query)
## 0

# This takes FOREVER
dbWriteTable(conn = myConn,
             name = "TABLE_NAME",
             value = myData,
             append = TRUE)
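
One thing worth ruling out on the slow append path (a sketch, not a confirmed fix -- whether it helps depends on the driver's commit behavior and connection settings) is per-batch commits. Wrapping the whole write in a single explicit transaction via DBI's transaction functions sometimes speeds up bulk inserts considerably:

```r
library(DBI)

# Hypothetical tweak: one explicit transaction around the append,
# so the driver pays a single commit instead of one per batch.
# Roll back and re-raise on any error.
dbBegin(myConn)
tryCatch({
  dbWriteTable(conn = myConn,
               name = "TABLE_NAME",
               value = myData,
               append = TRUE)
  dbCommit(myConn)
}, error = function(e) {
  dbRollback(myConn)
  stop(e)
})
```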


##########################################################
### Method Two - Changes the Table Space to USERSPACE1 ###
##########################################################
# `variable_definitions` defines my variable types

query <- paste0(
  '
  CREATE TABLE MYSCHEMA.TABLE_NAME
  (', variable_definitions, ')
  IN "DESIRED_SPACE" DISTRIBUTE BY HASH ("FIELD_ONE")
  COMPRESS YES
  '
)

# Again, this works to create the table in the desired table space
dbExecute(myConn, query)
## 0

# This now puts the table in the default table space
dbWriteTable(conn = myConn,
             name = "TABLE_NAME",
             value = myData,
             overwrite = TRUE)
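
This behavior is expected: overwrite drops and re-creates the table. Roughly, the driver's overwrite path does something like the following (a sketch of the behavior, not the driver's actual code -- the generated DDL is derived from the data frame's column types alone and carries no IN or DISTRIBUTE BY clauses, which is why the table lands back in USERSPACE1):

```r
# Sketch of what overwrite = TRUE effectively does:
dbExecute(myConn, "DROP TABLE TABLE_NAME")

# Re-create from the data frame's columns -- no IN "DESIRED_SPACE",
# no DISTRIBUTE BY HASH, so DB2 falls back to the default table space.
dbExecute(myConn, sqlCreateTable(myConn, "TABLE_NAME", myData))

# Then insert the rows, same as the append path.
dbAppendTable(myConn, "TABLE_NAME", myData)
```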

My desired result is to have myData (an R data frame) written to the DESIRED_SPACE table space as quickly as possible. I'm currently using dbWriteTable but am open to other methods.
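
One possible workaround (a sketch under assumptions: that the overwrite path stays fast, that `variable_definitions` matches the data frame's columns, and with `TABLE_NAME_STG` as a hypothetical staging name) is to let dbWriteTable() take its fast overwrite path into a staging table in the default space, then move the rows into a pre-created table in DESIRED_SPACE with one set-based statement:

```r
library(DBI)

# 1. Fast path: overwrite into a staging table (lands in USERSPACE1).
dbWriteTable(myConn, "TABLE_NAME_STG", myData, overwrite = TRUE)

# 2. Create the target table in the desired table space.
dbExecute(myConn, paste0(
  'CREATE TABLE MYSCHEMA.TABLE_NAME (', variable_definitions,
  ') IN "DESIRED_SPACE" DISTRIBUTE BY HASH ("FIELD_ONE") COMPRESS YES'
))

# 3. Copy the rows entirely server-side -- no round trips through R.
dbExecute(myConn,
          "INSERT INTO MYSCHEMA.TABLE_NAME SELECT * FROM TABLE_NAME_STG")

# 4. Drop the staging table.
dbExecute(myConn, "DROP TABLE TABLE_NAME_STG")
```

The copy in step 3 is a single INSERT ... SELECT executed inside DB2, so it avoids shipping 4 million rows through the R driver a second time.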

jon
  • I suspect the difference in performance may be due to the `distribute by hash` clause, not insert itself. Looking at [the sources](https://github.com/r-dbi/odbc/blob/7b35549f9df935e1d132f6221860f87a6eb64ef6/R/Table.R) they both issue simple `insert` statements, but in the latter case the table is re-created by the driver without custom clauses (`distribute by` and `in `). Why do you think you need `distribute by hash`? – mustaccio May 25 '19 at 15:05
  • Thanks for the response! I was hoping to have the hash to boost performance on using the table, but if the trade-off means 4-5 hours to write the table then I am fine with losing it. I'm rewriting the table but it has already passed the amount of time that it takes for `overwrite = TRUE`. I'm going to give it another half hour or so to see if this is any better before I kill it. – jon May 28 '19 at 12:28
  • I doubt that the workload generated by R is suitable for DPF; did you actually verify that database partitioning improves performance when accessing this table? – mustaccio May 28 '19 at 12:34
  • Yes, it definitely helps with what I'm using the table for, though the difference can be negligible. I removed these definitions in building the table, and am experiencing the same lag with writing the table using `append = TRUE`. Do you think that the data type definitions have anything to do with the slow table writing? It's the only other thing I can think of that would impact it. – jon May 28 '19 at 14:46
  • I suggest you eliminate R from the picture while you're tuning database performance. Table structure, the data you're inserting, and about half a million other things can affect performance. – mustaccio May 28 '19 at 16:00
  • The data type definitions aren't R-related. In order to use `append` from `dbWriteTable`, I need to first create the table structure and define the data types. So based on your response, I assume you agree this is more likely the issue. Note that using `overwrite` (i.e. not defining the table structure & data types) writes the table in 9 minutes, while `append` writes it in 4 hours. – jon May 28 '19 at 16:23
  • And yet when you look at the driver code they are doing the same series of inserts. The only difference overwrite has is it drops and creates the table. (unless I'm looking at the wrong driver, since you didn't say which one you're using). And no, I do not "agree this is more likely the issue", whatever "this" is; there is simply not enough information in your question to draw any conclusions. – mustaccio May 28 '19 at 16:56

0 Answers