7

I have to xts/zoo objects. each has measures of different variables over a different time span. I want to create a single time series comprising all measures at all times, with NAs for missing dates/variable combinations. how do I do that? artificial example:

library(xts)
x<-cbind(a=1:3,b=3:1)
rownames(x) = as.character(Sys.Date()-1:3)

y<-cbind(a=5:7,c=3:1)
rownames(y) = as.character(Sys.Date()-5:7)

xs=as.xts(x)
ys=as.xts(y)

#now what?

#desired outcome looks like:
            a  b   c
2013-03-10  7 NA   1
2013-03-11  6 NA   2
2013-03-12  5 NA   3
2013-03-14  3  1  NA
2013-03-15  2  2  NA
2013-03-16  1  3  NA

# regular merge looks like that (adding an a.1 variable)
merge(xs,ys)
            a  b a.1  c
2013-03-10 NA NA   7  1
2013-03-11 NA NA   6  2
2013-03-12 NA NA   5  3
2013-03-14  3  1  NA NA
2013-03-15  2  2  NA NA
2013-03-16  1  3  NA NA

# simple concatenation ignores variable names and looks like that
c(xs,ys)
           a b
2013-03-10 7 1
2013-03-11 6 2
2013-03-12 5 3
2013-03-14 3 1
2013-03-15 2 2
2013-03-16 1 3

# so what should I do?
amit
  • 3,332
  • 6
  • 24
  • 32
  • what happens if `xs` and `ys` both have value for `a`? – CHP Mar 17 '13 at 09:28
  • for my purposes its ok if xs "wins", and the relevant data point in ys is thrown/ignored. I'm open to other behaviors as well. – amit Mar 17 '13 at 09:48
  • `xts` has a `merge`function that does `inner`, `outer`, `left` and `right` merges. But your problem is not specific to `any` of them. you'll have to *write your own function*. – Arun Mar 17 '13 at 11:30

4 Answers4

4

This is not a general solution. But works for this example:

cbind(rbind(xs[,1],ys[,1]), cbind(xs[,-1],ys[,-1]))
           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

Just remind that cbind.xts is just merge.xts. S you can get the same result using merge

merge(rbind(xs[,1],ys[,1]), merge(xs[,-1],ys[,-1]))
           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

The problem with this solution is that if ys and xs have some dates incommon, you will have duplicated index in your final xts object.For example, if we replace y :

rownames(y) = as.character(Sys.Date()-3:5)

You get , a duplicated index for 2013-03-14, So im anot sure that it s a valid xts object.

merge(rbind(xs[,1],ys[,1]), merge(xs[,-1],ys[,-1]))
           a  b  c
2013-03-12 7 NA  1
2013-03-13 6 NA  2
2013-03-14 3  1  3
2013-03-14 5 NA NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA

EDIT a generalization of the solution:

inter <- intersect(names(ys), names(xs))
diffx <- setdiff(names(xs),inter)
diffy <- setdiff(names(ys),inter)

merge(rbind(xs[,inter],ys[,inter]), merge(xs[,diffx],ys[,diffy]))


           a  b  c
2013-03-10 7 NA  1
2013-03-11 6 NA  2
2013-03-12 5 NA  3
2013-03-14 3  1 NA
2013-03-15 2  2 NA
2013-03-16 1  3 NA
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • well, the real case behind my simple example, is that each time series contains multiple variables (even different number of variables), and their names is something I don't know ahead of time. I look for a simple way for "concatenating" those objects, without specifying the columns or variable names explicitly, so that "missing values" are given NA, and observations of the same variables are "stacked" appropriately – amit Mar 17 '13 at 09:55
  • @amit you can see my update. I tried a generalization. This should work. – agstudy Mar 17 '13 at 10:09
  • Ok. it now looks much better. I still hoped to find something simpler. Thanks. – amit Mar 17 '13 at 10:23
  • mee too. I hope you get something simpler. Maybe if @JoshuaUlrich see this answer , he can propose better solution. – agstudy Mar 17 '13 at 10:25
  • as a simple but realistic case - a dozen (one per each month to get a full year of data). More generally, I might have some rare overlaps, and the xts may have irregular dates. but overall, I try to get a few xts objects that measure different, but overlapping set of variables over different (rarely overlapping) set of dates within one year, into a single object that represent all measures of all variables over that year. – amit Mar 17 '13 at 11:40
2

What you want => merge(data.frame(x,d),data.frame(y,d),by=c("d","a"),all=T)

You should use data.frame not name vectors/matrix, here is a general solution, what you want is just a one liner with a full outer join (look at ?merge)

x<-cbind(a=1:3,b=3:1)
d= as.character(Sys.Date()-1:3)
DT1 = data.frame(x,d)
#DT1
#   a b          d
#1: 1 3 2013-03-16
#2: 2 2 2013-03-15
#3: 3 1 2013-03-14

y<-cbind(a=5:7,c=3:1)
d = as.character(Sys.Date()-5:7)
DT2 = data.frame(y,d)
#DT2
#   a b          d
#1: 1 3 2013-03-12
#2: 2 2 2013-03-11
#3: 3 1 2013-03-10
merge(DT1,DT2,by=c("d","a"),all=T)
#           d a  b  c
#1 2013-03-10 7 NA  1
#2 2013-03-11 6 NA  2
#3 2013-03-12 5 NA  3
#4 2013-03-14 3  1 NA
#5 2013-03-15 2  2 NA
#6 2013-03-16 1  3 NA
statquant
  • 13,672
  • 21
  • 91
  • 162
  • Here the OP is using `xts` object. They are matrix. – agstudy Mar 17 '13 at 10:22
  • All the same... what he wants is a join... that's what data.frame are for – statquant Mar 17 '13 at 10:25
  • thanks. I just found out that for dataframes there is a solution as you did above. the specific "by" parameter is not generic enough, but it works without. the only caveat is that the outcome is not an xts object, and it takes a couple more commands to convert it back to an xts – amit Mar 17 '13 at 10:27
  • The `by` is as generic as it can be, you can specify `by.x` and `by.y` if the names are different and they are dynamic. The outcome is a `data.frame` one line away from any type you want... – statquant Mar 17 '13 at 10:29
1

ok. spent some time thinking on this. because eventually I need to "merge" many such dataframes/xts into one, and not just merging two of them, I thought that it makes sense to do all this in one step: create a big matrix of all date/var combinations. then plugging into this big matrix all the observed data, object by object. the code looks as follows (will be happy to get comments on it, and feel free to use, without any kind of warranty, of course):

alltogether = function(dlist) {
    all.vars = unique(unlist(lapply(dlist,colnames)))
    all.obs = unique(unlist(lapply(dlist,rownames)))    
    res = array(NA,dim=c(length(all.obs),length(all.vars)),
          dimnames=list(all.obs,all.vars))
    for(d in dlist) {
            res[rownames(d),colnames(d)]=d
    }
    return(res)
}

alltogether.xts = function(xlist) {
    dlist = lapply(xlist,as.matrix)
    res = alltogether(dlist)
    xres = as.xts(res)
    return(xres)
}
amit
  • 3,332
  • 6
  • 24
  • 32
0

I would say to convert it to an numerical array (as.numeric(ts)), concatenate it with cbind(ts1,ts2) and then go back to the time series, ts(c(as.numeric(ts1),as.numeric(ts2))