rbind `data.tables` and preserve key

Question

I'm looking for behavior similar to inserting into an already keyed SQL table, where the new rows added are inserted into existing keys. For example, in this case:

dt <- data.table(a=1:10)
setkey(dt, a)
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
dt.2 <- rbindlist(list(dt, data.table(a=1:5)))
tables()
#      NAME NROW MB COLS KEY
# [1,] dt     10 1  a    a  
# [2,] dt.2   15 1  a

I would like to have the option of having dt.2 "inherit" the key (updated with the incremental data, obviously) from dt, instead of ending up with no key, as actually happened.

I was at first a bit surprised at the loss of the key in the first place, but that is clearly the documented behavior.

Is there a clean way of doing this without calling setkey after each rbind/rbindlist?

The result of your `rbind` is unsorted, so you can't avoid calling `setkey` (had it been sorted you could shave off potentially a lot of time by setting the "sorted" attribute directly) – eddi, Jan 13 '14 at 17:31
@eddi, understood, this is mostly a syntactic question. It seems providing the option to re-create the key from within `rbind`/`rbindlist` would be reasonable, given this is the default SQL behavior. Aside: I'm assuming that if `dt` is very large and sorted, `setkey` will take advantage of that when creating the `dt.2` key; if not, then there is definitely more than just a syntax issue here. – BrodieG, Jan 13 '14 at 17:36
To rephrase my last point: the result of my `rbind` is only partially unsorted; the first portion (in my use case, the large one) should already be sorted as per the original key. – BrodieG, Jan 13 '14 at 17:38
Setting a key in a data table is not the same as creating an index in a database table. See [this question](http://www.stackoverflow.com/questions/20076511/). – jlhoward, Jan 13 '14 at 20:08

score 9 · Accepted Answer · edited May 23 '17 at 11:48

Essentially, data.table doesn't currently support row insert at all, let alone into a keyed table. rbind creates a new data.table so isn't fast or memory efficient.

A similar question is here: How to delete a row by reference in data.table?

Currently, the typical workflow is to load files from disk using fread and rbindlist them together, or load data from a database using RODBC or similar.
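As a rough illustration of that workflow, the sketch below writes two small CSV files to a temporary directory, reads them back with `fread`, stacks them with `rbindlist`, and sets the key once at the end. The file names, the directory, and the sample data are made up for the example, and it assumes a data.table version that provides `fwrite`.

```r
library(data.table)

# Illustrative input only: two small CSV files in a temporary directory.
csv_dir <- tempfile("csv_")
dir.create(csv_dir)
fwrite(data.table(a = 6:10), file.path(csv_dir, "part1.csv"))
fwrite(data.table(a = 1:5),  file.path(csv_dir, "part2.csv"))

# Read every file, stack the pieces into one table, then key it once.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
dt <- rbindlist(lapply(files, fread))
setkey(dt, a)  # sorts by reference and marks `a` as the key

key(dt)
```

Keying once after all the pieces are bound avoids re-sorting after every individual bind.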

We'd like to add fast row insert, but it isn't done yet.
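In the meantime, if the goal is only to avoid typing `setkey` after every bind, a small wrapper can do it. The sketch below is one possible way to write such a wrapper; `rbind_keep_key` is a hypothetical helper name, not a data.table function. It still builds a new table and re-sorts it in full, so it is a syntactic convenience and not a fast insert.

```r
library(data.table)

# Hypothetical helper (not part of data.table): bind the tables, then
# re-apply the key of the first table to the combined result.
rbind_keep_key <- function(first, ...) {
  out <- rbindlist(list(first, ...), use.names = TRUE)
  k <- key(first)
  if (!is.null(k)) setkeyv(out, k)  # re-sorts `out` by reference
  out
}

dt <- data.table(a = 1:10)
setkey(dt, a)
dt.2 <- rbind_keep_key(dt, data.table(a = 1:5))

key(dt.2)
```

If the first table has no key, the helper behaves like a plain `rbindlist` call and returns an unkeyed table.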

answered Jan 13 '14 at 19:19

Matt Dowle

Is there a road map for `data.table` someplace? I looked but didn't see one. – BrodieG Jan 13 '14 at 20:27
@BrodieG The closest to a public road map is the [feature request](https://r-forge.r-project.org/tracker/?atid=978&group_id=240&func=browse) list which is ordered by 5 levels of priority (5 is top). – Matt Dowle Jan 13 '14 at 23:56
