I have an unbalanced panel like the following example:
test <- read.table(
text = "
A 2010-01-01 1 rdm
A 2010-01-10 2 dfg
A 2010-01-14 3 fdgfd
A 2010-02-15 4 fdgfd
A 2010-08-17 5 dg
A 2010-12-19 6 dfg
B 2009-01-01 1 dfg
B 2010-01-01 2 ydg
B 2010-01-10 3 fdgfd
B 2010-01-14 4 dfg
B 2010-02-15 5 dfg
",header=F)
library(data.table)
setDT(test)
names(test) <- c("ID", "date", "nr", "namecol")
I would like to balance it with regard to date, i.e. every individual (A, B, etc.) has NA rows for the dates where there is no data. I do not know the minimum date per group or the minimum date across groups to begin with. Same with the maximum, but maybe it is faster to just choose a maximum which equals a certain date (as compared to calculating it across groups). The desired output is:
out <- read.table(
text = "
A 2009-01-01 NA NA
A 2010-01-01 1 rdm
A 2010-01-10 2 dfg
A 2010-01-14 3 fdgfd
A 2010-02-15 4 fdgfd
A 2010-08-17 5 dg
A 2010-12-19 6 dfg
B 2009-01-01 1 dfg
B 2010-01-01 2 ydg
B 2010-01-10 3 fdgfd
B 2010-01-14 4 dfg
B 2010-02-15 5 dfg
B 2010-08-17 NA NA
B 2010-12-19 NA NA
",header=F)
setDT(out)
names(out) <- c("ID", "date", "nr", "namecol")
My data set is very large, so I believe it would be best to do this in data.table
(or plyr
, reshape2
) or something similar suitable.