3

I am working with large datasets and tidyr's spread usually gives me error messages suggesting failure to obtain memoryto perform the operation.

Therefore, I have been exploring dbplyr. However, as it says here, and also shown below, dbplyr::spread() does not work.

My question here is whether there is another way to accomplish what tidyr::spread does while working with tbl_dbi and tbl_sql data without downloading to local memory.

Using sample data from here, below I present what I get and what I would like to do and get.

#sample tbl_dbi and tbl_sql data

df_sample <- tribble(~group1, ~group2, ~group3, ~identifier, ~value, 
                      8, 24, 6, 'mt_0', 
                      12, 18, 24, 6, 'mt_1', 4)

con <- DBI::dbConnect(RSQLite::SQLite(), "")
df_db <- copy_to(con, df_sample, 'df_sample')

#attempting to spread tbl_dbi and tbl_sql without downloading to local memory

//this does not work

df_db %>% spread(identifier, value)

Error in UseMethod("spread_") : 
  no applicable method for 'spread_' applied to an object of class "c('tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"

#attempting to spread tbl_dbi and tbl_sql after downloading to local memory

//this spreads the data but the output is in memory
//I would like to keep the output as 'tbl_dbi', 'tbl_sql', and 'tbl_lazy'

df_db %<>% collect() %>% spread(identifier, value)

class(df_db)
[1] "tbl_df"     "tbl"        "data.frame"

Thanks in advance for any help

Krantz
  • 1,424
  • 1
  • 12
  • 31
  • Can you please elaborate on your use case? What do the key and value columns contain? An iteration over columns usually can be replaced with a grouped operation on the corresponding long-form dataset, this should work pretty well on the database. – krlmlr Mar 12 '19 at 22:55
  • Thanks, @krlmlr. `key=identifier, value=value`. Normal data. Just large. Could you kindly share an example of your thoughts as an answer? – Krantz Mar 14 '19 at 02:01
  • What are the operations you want to perform on the "wide" dataset? – krlmlr Mar 14 '19 at 08:00
  • @Krantz, have you found solution to this problem? Any way spread on the database yet? – x85ms16 Oct 01 '19 at 22:08

0 Answers0