I have a multi-column xts object, which goes to second accuracy. I then have another xts object which contains one value for each day. I'd like to add that daily value as a column in the main xts object. Here is an example:
Sys.setenv(TZ = "UTC")
library(xts)
set.seed(777)
xt = xts( data.frame(A=1:20,B=201:220,C=round(runif(20)*10,1)),
order.by = as.POSIXct("2015-06-21") + (runif(20) * 86400 * 14) )
xd = xts( round(runif(14) - 0.5,1), as.Date("2015-06-21") + (1:14))
Using merge
does not work: the xd
entries get given a "00:00:00" timestamp, and therefore none of them match, so I get an xts object with lots of NAs:
A B C xd
2015-06-21 10:04:36 5 205 7.0 NA
2015-06-22 00:00:00 NA NA NA -0.5
2015-06-23 00:00:00 NA NA NA -0.2
2015-06-23 11:42:38 4 204 10.0 NA
2015-06-24 00:00:00 NA NA NA 0.1
...
Expected result:
A B C xd
2015-06-21 10:04:36 5 205 7.0 NA
2015-06-23 11:42:38 4 204 10.0 -0.2
2015-06-24 21:16:18 18 218 8.7 0.1
2015-06-25 02:30:24 15 215 8.7 -0.2
2015-06-25 07:48:42 16 216 1.0 -0.2
2015-06-25 15:04:34 14 214 5.9 -0.2
2015-06-26 07:50:09 1 201 6.9 -0.3
2015-06-27 19:28:33 7 207 3.5 0.5
...
Points:
- Real data will be much larger than this example, so excessive use of memory and CPU is to be avoided.
- As shown above, NAs or missing dates are possible in xd (though relatively rare).
- There are some days not represented in
xt
(as shown by the missing 2015-06-22 above). I don't want an entry created for such days. (I suppose I could usena.omit
to remove them, but it is possible - and not shown in my simple example - that I have some real NAs in the data that I don't want removed.)
UPDATE: As an example of NA
s in the original data getting lost, consider xt[10,'B'] <- NA
. Using Joshua's merge(xt, xd, fill=na.locf)[index(xt)]
solution, 2015-06-28 19:41:45 ends up as 8 203 1.7 0.4
when it should be 8 NA 1.7 0.4
.
Whether that is a problem or not will depend on what xt
will be used for next. FXQuantTrader's answer shows a workaround to preserve NA
s using a magic number, which is turned back into an NA
at the end. One alternative (that uses more memory) is to take a copy of any columns containing NAs, and replace the whole column afterwards.