I have two data tables. One, let's call it procedures
, contains the procedures done to a patient, and has a patient pseudonym in each row.
Pseudonym procedure
AAA CT scan
AAA glucose infusion
AAB abdominal surgery
The second one, a lookup table called allo.dt
, assigns each patient pseudonym a special date, such as
pseudonym erstes.tx.datum
AAA 2014-01-10
AAB 2018-08-17
What I want is to add a column to the first data table, where in each row, I have the value of "erstes.tx.datum" from the second one.
Pseudonym procedure tx.datum
AAA CT scan 2014-01-10
AAA glucose infusion 2014-01-10
AAB abdominal surgery 2018-08-17
I am very new to data tables and failed to do it with a join.
setkey(procedures, Pseudonym)
setkey(allo.dt, pseudonym)
b <- merge(a, allo.dt, all.x=TRUE)
> A non-empty vector of column names for `by` is required.
This was surprising, when searching for doing a left join on data.table I didn't find any examples that needed a "by". I suspect that maybe the problem is that the pseudonym is obviously not the key in the procedures table, since it is not unique, but this is not something I can change.
Instead, I tried to use sapply
. There I had two problems. First, the performance was terrible. Second, a naive access to the date gave me back a table cell instead of the value inside it, which led to a coercion of my date to an integer.
class(allo.dt[allo.dt$pseudonym=="AAA","erstes.tx.datum"])
[1] "data.table" "data.frame"
I found an ugly workaround by turning that table cell into a single-element matrix, but it is practically illegible, and the performance is even worse. The whole solution I have is
procedures$erstes.tx.datum -> unlist(sapply(ops.procedures$Pseudonym, function(x) {as.matrix(allo.dt[allo.dt$pseudonym==x,"erstes.tx.datum"])[1,1]}))
I am pretty sure there must be a much easier solution here, because that's what data.table is for, but I'm stuck. What would be a good way to do this?
I am aware that this has been asked in Merging a data frame and a lookup table in r, retaining all records from data frame, but there the OP uses data frames and the answers suggest using plyr
. I am looking specifically for a solution based on data.table
or base r, without using plyr
.