I'm looking for behavior similar to inserting into an already keyed SQL table, where the new rows added are inserted into existing keys. For example, in this case:

dt <- data.table(a=1:10)
setkey(dt, a)
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
dt.2 <- rbindlist(list(dt, data.table(a=1:5)))
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
# [2,] dt.2   15 1  a      

I would like the option of having dt.2 "inherit" the key from dt (updated to cover the newly added rows, obviously), instead of ending up with no key, as actually happened.

I was at first a bit surprised at the loss of the key in the first place, but that is clearly the documented behavior.

Is there a clean way of doing this without calling setkey after each rbind/rbindlist?

BrodieG
  • the result of your `rbind` is unsorted, so you can't avoid calling `setkey` (had it been sorted you could shave off potentially a lot of time by setting the "sorted" attribute directly) – eddi Jan 13 '14 at 17:31
  • @eddi, understood, this is mostly a syntactic question. It seems that providing an option to re-create the key from within `rbind`/`rbindlist` would be reasonable, given that this is the default SQL behavior. Aside: I'm assuming that if `dt` is very large and sorted, `setkey` will take advantage of that when creating the `dt.2` key; if not, then there is definitely more than just a syntax issue here. – BrodieG Jan 13 '14 at 17:36
  • I guess a slightly different way of phrasing my last point: the result of my `rbind` is only partially unsorted; the first portion (in my use case, the large one) should already be sorted as per the original key. – BrodieG Jan 13 '14 at 17:38
  • Setting a key in a data table is not the same as creating an index in a database table. See [this question](http://www.stackoverflow.com/questions/20076511/). – jlhoward Jan 13 '14 at 20:08

1 Answer

Essentially, data.table doesn't currently support row insert at all, let alone into a keyed table. `rbind` creates a new data.table, so it isn't fast or memory efficient.

A similar question is here: "How to delete a row by reference in data.table?"

Currently, the typical workflow is to load files from disk using fread and rbindlist them together, or load data from a database using RODBC or similar.
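
As a rough illustration of that workflow, here is a minimal sketch. The temporary directory, the file names and the sample data are made up for the example; the key is set once, after everything has been bound together.

```r
library(data.table)

# Illustrative setup only: write two small CSV files to a temporary directory.
chunk_dir <- tempfile("chunks")
dir.create(chunk_dir)
write.csv(data.frame(a = 6:10), file.path(chunk_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(a = 1:5),  file.path(chunk_dir, "part2.csv"), row.names = FALSE)

# Read every file with fread and bind the pieces into one new data.table.
files <- list.files(chunk_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- rbindlist(lapply(files, fread))

# Sort and key the combined table once, at the end.
setkey(combined, a)
key(combined)
# [1] "a"
```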

We'd like to add fast row insert, but it isn't done yet.
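
In the meantime, the key can at least be restored in one step after binding. A minimal sketch, assuming a hypothetical helper named `rbind_keyed` (it is not part of data.table). It still builds a new table and re-sorts the whole result, so it is only a syntactic convenience, not a fast insert:

```r
library(data.table)

# Hypothetical helper (not a data.table function): bind the given tables
# and re-apply the key of the first one to the result.
rbind_keyed <- function(...) {
  tables <- list(...)
  k <- key(tables[[1L]])            # NULL if the first table has no key
  out <- rbindlist(tables)          # creates a new data.table
  if (!is.null(k)) setkeyv(out, k)  # re-sort and key by the original columns
  out
}

dt <- data.table(a = 1:10)
setkey(dt, a)
dt.2 <- rbind_keyed(dt, data.table(a = 1:5))
key(dt.2)
# [1] "a"
```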

Matt Dowle
  • Is there a road map for `data.table` someplace? I looked but didn't see one. – BrodieG Jan 13 '14 at 20:27
  • @BrodieG The closest to a public road map is the [feature request](https://r-forge.r-project.org/tracker/?atid=978&group_id=240&func=browse) list which is ordered by 5 levels of priority (5 is top). – Matt Dowle Jan 13 '14 at 23:56