
I have a table (df) of about 50,000 rows and 12 columns to write to SAP HANA. I use the RJDBC library and write row by row as follows:

# Returns the SQL statement to insert one row
build_insert_string <- function(db_output, row) {
  row_string <- paste(row, collapse = "','")
  statement <- paste('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
                     '" (', db_output$string_of_columns, ') VALUES (\'', row_string, '\');',
                     sep = '')
  return(statement)
}

# Insert row by row
for (i in 1:nrow(df)) {
  tryCatch({
    dbGetQuery(jdbcConnection, build_insert_string(db_output, df[i, ]))
  }, error = function(e) handle_db_errors(e))
}
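One small change worth trying before restructuring everything: RJDBC also provides dbSendUpdate(), which executes a statement without building and clearing a result set the way dbGetQuery() does, and which supports parameterized prepared statements. A sketch under the same assumptions as above (db_output and df as already defined; the ?-placeholder count simply matches the number of columns):

```
# Sketch: one parameterized INSERT reused for every row.
sql <- paste0('INSERT INTO "', db_output$SCHEMA, '"."', db_output$table_name,
              '" (', db_output$string_of_columns, ') VALUES (',
              paste(rep("?", ncol(df)), collapse = ","), ')')
for (i in 1:nrow(df)) {
  # dbSendUpdate() binds the row values to the placeholders and
  # does not fetch a result set, so the per-row overhead is lower.
  do.call(dbSendUpdate, c(list(jdbcConnection, sql), as.list(df[i, ])))
}
```

This also avoids building a quoted SQL string per row, which is fragile whenever a value contains a quote character.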

where db_output is a list containing the output constants (schema, table and columns).

Currently, writing the table takes almost half a day. It seems that HANA does not support multi-row batch inserts such as:

INSERT INTO example
  (example_id, name, value, other_value)
VALUES
  (100, 'Name 1', 'Value 1', 'Other 1'),
  (101, 'Name 2', 'Value 2', 'Other 2'),
  (102, 'Name 3', 'Value 3', 'Other 3'),
  (103, 'Name 4', 'Value 4', 'Other 4');

Did anyone encounter this challenge, and if so, did you find a way to circumvent it and improve the writing efficiency?

Sandra Rossi
NRLP

1 Answer


I’ll leave this here for posterity:

While dbGetQuery is the clean solution for large tables (it executes the query and then clears the result set after each insertion), it is also slow.

Apparently, multi-row INSERT statements into SAP HANA succeed when sent from the SQL editor but not when sent from R.

A (really) fast solution is provided by dbWriteTable():

dbWriteTable(
  conn = jdbcConnection,
  name = paste0(db_output$SCHEMA, ".", db_output$table_name),
  value = df,
  row.names = FALSE,
  field.types = db_output$string_of_columns,
  append = TRUE
)

However, dbWriteTable() is not meant for large tables (it will throw a memory-limit error). This limitation can be circumvented by increasing the Java heap via the Xmx option, e.g. options(java.parameters = "-Xmx5000m"); note that this must be set before rJava initializes the JVM (i.e. before loading RJDBC). Use it at your own peril, especially if you aim to automate the writing of increasingly big tables.
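A minimal sketch of that setup, assuming the heap option is set before any rJava-based package is loaded; the 10,000-row chunk size is an arbitrary choice, not a tuned value:

```
# Must run BEFORE rJava starts the JVM, i.e. before library(RJDBC).
options(java.parameters = "-Xmx5000m")
library(RJDBC)

# Writing in chunks keeps each individual dbWriteTable() call small.
chunk_size <- 10000  # arbitrary; adjust to your memory budget
starts <- seq(1, nrow(df), by = chunk_size)
for (s in starts) {
  e <- min(s + chunk_size - 1, nrow(df))
  dbWriteTable(
    conn = jdbcConnection,
    name = paste0(db_output$SCHEMA, ".", db_output$table_name),
    value = df[s:e, ],
    row.names = FALSE,
    append = TRUE
  )
}
```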

Another potential solution we explored was to export the R output as a .csv (or multiple .csv files if there are more than 1 million rows), and then send a query to import the .csv files into SAP HANA. Large .csv files import very quickly into SAP HANA, but this solution entails an extra step (an intermediary .csv output) and is more prone to incorrect data importation.
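A sketch of that route; the file path is a placeholder, and the IMPORT FROM options shown are assumptions to adapt (the HANA server process must be able to read the path, and IMPORT FROM requires the corresponding privilege):

```
# Write the data frame to a csv the HANA server can read.
# "/tmp/df_export.csv" is a placeholder path.
write.csv(df, "/tmp/df_export.csv", row.names = FALSE)

# Server-side bulk import; adapt delimiters/options to your data.
dbGetQuery(jdbcConnection, paste0(
  "IMPORT FROM CSV FILE '/tmp/df_export.csv' INTO \"",
  db_output$SCHEMA, "\".\"", db_output$table_name, "\" ",
  "WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY ',' SKIP FIRST 1 ROW"
))
```

SKIP FIRST 1 ROW accounts for the header line write.csv() emits.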

NRLP