I have a longitudinal dataset in a data table, similar to the simplified example below:
> head(data)
   Country     ID       Date Value
1:      AT AT6306 2012-11-01  16.2
2:      AT AT6306 2012-11-02  12.2
3:      AT AT6306 2012-11-03  11.3
4:      AT AT6306 2012-11-04  14.2
5:      AT AT6306 2012-11-05  17.3
6:      AT AT6306 2012-11-06  12.5
> tail(data)
   Country     ID       Date Value
1:      SE SE0935 2014-06-25  16.2
2:      SE SE0935 2014-06-26  12.2
3:      SE SE0935 2014-06-27  11.3
4:      SE SE0935 2014-06-28  14.2
5:      SE SE0935 2014-06-29  17.3
6:      SE SE0935 2014-06-30  12.5
ID is the panel variable; it is unique across the whole dataset, with no overlap between countries. The unique values of Date run from 2012-10-23 to 2014-09-30. Clearly, the range of Date is not identical for each ID. Moreover, there may be missing observations within a range. To obtain a balanced panel, I want to fill in the gaps in my dataset.
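To make the imbalance concrete, here is a minimal sketch of how the date coverage per ID could be inspected. The table toy and the names coverage, n_dates and balanced are hypothetical and only stand in for the real data; the check also assumes there are no duplicated ID/Date pairs.

```r
library(data.table)

# Hypothetical toy data: two IDs covering different date ranges
toy <- data.table(
  Country = c("AT", "AT", "AT", "SE", "SE"),
  ID      = c("AT6306", "AT6306", "AT6306", "SE0935", "SE0935"),
  Date    = as.Date(c("2012-11-01", "2012-11-02", "2012-11-04",
                      "2012-11-02", "2012-11-03")),
  Value   = c(16.2, 12.2, 14.2, 11.3, 17.3)
)

# First date, last date and number of observations for each ID
coverage <- toy[, .(first = min(Date), last = max(Date), n_obs = .N), by = ID]

# Assuming no duplicated ID/Date pairs, the panel is balanced only if
# every ID has one row for each unique date in the table
n_dates  <- uniqueN(toy$Date)
balanced <- all(coverage$n_obs == n_dates)
```

Printing coverage shows which IDs start late, end early, or have fewer rows than there are unique dates.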
Adapting the answer here, as suggested by @akron, I do the following:
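library(data.table)
# All ID x Date combinations (cross join of the unique values)
data2 <- data[, CJ(ID=unique(ID), Date=unique(Date))]
setkey(data2, ID, Date)
# Keep every row of data2, so missing ID/Date pairs appear as new rows
data.new <- merge(data, data2, by=c("ID", "Date"), all.y = TRUE)
setkey(data.new, ID, Date)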
With the option all.y = TRUE, merge adds a row for every ID/Date combination that is missing from data. However, in rows that did not previously exist in data, all fields other than ID and Date are now NA. That is, my data now looks something like this:
> head(data.new)
   Country     ID       Date Value
1:      NA AT6306 2012-10-23    NA
2:      NA AT6306 2012-10-24    NA
3:      NA AT6306 2012-10-25    NA
4:      NA AT6306 2012-10-26    NA
5:      NA AT6306 2012-10-27    NA
6:      NA AT6306 2012-10-28    NA
I do want Value to be NA, since it is missing. However, since Country does not change for a given ID, I would like that field to be filled in.
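For illustration, here is a small self-contained sketch of the desired result on toy data. It is one possible approach, not necessarily the best one: it builds an ID-to-Country lookup from the original rows and uses a data.table update join to fill Country in the expanded table. The names toy, grid, toy.new and lookup are hypothetical, and the sketch assumes each ID maps to exactly one Country.

```r
library(data.table)

# Hypothetical toy data with gaps: AT1 lacks 2012-11-02, SE1 has one date only
toy <- data.table(
  Country = c("AT", "AT", "SE"),
  ID      = c("AT1", "AT1", "SE1"),
  Date    = as.Date(c("2012-11-01", "2012-11-03", "2012-11-02")),
  Value   = c(16.2, 11.3, 12.2)
)

# Expand to the full ID x Date grid, as in the question
grid    <- toy[, CJ(ID = unique(ID), Date = unique(Date))]
toy.new <- merge(toy, grid, by = c("ID", "Date"), all.y = TRUE)

# One row per ID with its Country (assumes Country is constant within ID)
lookup <- unique(toy[, .(ID, Country)])

# Update join: for each ID matched in lookup, overwrite Country by reference
toy.new[lookup, on = "ID", Country := i.Country]
```

After this, Country is filled in every row while Value stays NA in the rows that were added. A grouped assignment such as toy.new[, Country := Country[!is.na(Country)][1L], by = ID] should give the same result under the same assumption.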