
Adding a column to an xts object is straightforward if you know the name of the column ahead of time. For example, to add a column named "b":

library(xts)

n <- 5
x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
x$b <- rnorm(n)

Adding a dynamically-named column (i.e., a column whose name is known only at runtime) is harder:

new.col.name <- 'c' # known only at runtime
x[, new.col.name] <- rnorm(n) # this generates an error

One approach is to add a column with a temporary name and then rename it:

stopifnot(!('tmp' %in% names(x)))
x$tmp <- rnorm(n)
names(x)[names(x) == 'tmp'] <- new.col.name

Is there a better way to do this? (Also, does assigning to names of an xts object result in a copy of the object being made? So, for example, would the above approach work well if n were very large?)

banbh

2 Answers

The easiest/clearest thing to do is merge the original object with the new column(s), after you convert the new column(s) to a matrix (so you can set the column name).

set.seed(21)
newData <- rnorm(n)
x1 <- merge(x, matrix(newData, ncol=1, dimnames=list(NULL, new.col.name)))
# another way to do the same thing
dim(newData) <- c(nrow(x), 1)
colnames(newData) <- new.col.name
x2 <- merge(x, newData)
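The merge approach above can be wrapped in a small function. This is only a sketch: `add_named_column` is a hypothetical helper name (it is not part of xts), and the checks it performs are an assumption about what you might want to guard against.

```r
library(xts)

# Hypothetical helper (not part of xts): append `values` to `x` as a new
# column whose name is only known at runtime.
add_named_column <- function(x, values, name) {
  # Guard against length mismatches and clobbering an existing column.
  stopifnot(length(values) == nrow(x), !(name %in% colnames(x)))
  # A one-column matrix lets us attach the runtime name via dimnames.
  new_col <- matrix(values, ncol = 1, dimnames = list(NULL, name))
  merge(x, new_col)
}

n <- 5
x <- merge(xts(order.by = as.Date("2015-01-01") + 1:n), a = rnorm(n))
new.col.name <- "c"  # stands in for a name known only at runtime
x <- add_named_column(x, rnorm(n), new.col.name)
colnames(x)
```

Because merge returns a new object, this does not avoid copying; it only avoids the temporary-name step.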

To answer your second question: yes, assigning names (and colnames) on an xts object creates a copy. You can see this with tracemem and the output from gc.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2892400>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259260 13.9     592000 31.7   350000 18.7
Vcells 1445207 11.1    4403055 33.6  3445276 26.3
R> colnames(x) <- "hi"
tracemem[0x2892400 -> 0x24c1ad0]: 
tracemem[0x24c1ad0 -> 0x2c62d30]: colnames<- 
tracemem[0x2c62d30 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3033660 -> 0x3403f90]: dimnames<-.xts dimnames<- colnames<- 
tracemem[0x3403f90 -> 0x37d48c0]: colnames<- dimnames<-.xts dimnames<- colnames<- 
tracemem[0x37d48c0 -> 0x3033660]: dimnames<-.xts dimnames<- colnames<- 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  259696 13.9     592000 31.7   350000 18.7
Vcells 1445750 11.1    4403055 33.6  3949359 30.2
R> print(object.size(x), units="Mb")
7.6 Mb

You can see the colnames<- call causes ~4MB of extra memory to be used (the "max used (Mb)" increased by that amount). The entire xts object is ~8MB, half of which is the coredata and the other half is the index. So the 4MB of extra memory used is to copy the coredata.
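As a rough check on those figures, here is the arithmetic as a small sketch. It assumes 4 bytes per integer element (R's integer storage) and ignores the small fixed overhead of each vector; the variable names are only for illustration.

```r
# Rough size arithmetic for the example object (ignores per-vector overhead).
n_obs <- 1e6
bytes_per_integer <- 4                              # R integers are 32-bit

coredata_mb <- n_obs * bytes_per_integer / 2^20     # one integer vector of data
index_mb    <- n_obs * bytes_per_integer / 2^20     # the index is the same length
total_mb    <- coredata_mb + index_mb

round(c(coredata = coredata_mb, index = index_mb, total = total_mb), 1)
```

The total comes to about 7.6 Mb, consistent with the object.size output above, and a single copy of the coredata accounts for roughly half of that.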

If you want to avoid the copy, you can set it manually. But be careful, because you could do something that would otherwise be caught by the "checks" in colnames<-.xts.

> R -q  # new R session
R> x <- xts::.xts(1:1e6, 1:1e6)
R> tracemem(x)
[1] "<0x2cc5330>"
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256397 13.7     592000 31.7   350000 18.7
Vcells 1440915 11.0    4397699 33.6  3441761 26.3
R> attr(x, 'dimnames') <- list(NULL, "hi")
tracemem[0x2cc5330 -> 0x28f4a00]: 
R> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  256403 13.7     592000 31.7   350000 18.7
Vcells 1440916 11.0    4397699 33.6  3441761 26.3
R> print(object.size(x), units="Mb")
7.6 Mb
Joshua Ulrich

I believe there is no good alternative, but column names are just an attribute, so they are cheap to modify and no copies will be made. (EDIT: this contradicts Joshua's answer; see the discussion in the comments. It seems dimnames<-.xts does more than just set an attribute, and does involve copying the underlying data, so be careful.)

You can also use cbind(), which is a synonym for merge.xts, but (AFAIK) it offers no advantage over the x$b method you showed:

n <- 5
x <- merge(xts(order.by = as.Date('2015-1-1') + 1:n), a = rnorm(n))
x$b <- rnorm(n)
x = cbind(x, c = rnorm(n))
colnames(x)[3] = "real name"

I've also shown one way to change the column name. If you don't know that it is the 3rd column, then a generic approach is: colnames(x)[length(colnames(x))] = "real name"
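Applied to the original question, the same idea might look like the sketch below. The placeholder name `tmp` is an arbitrary choice, and `ncol(x)` is used here as a shorter equivalent of `length(colnames(x))`. As the comments note, the rename still copies the data.

```r
library(xts)

n <- 5
x <- merge(xts(order.by = as.Date("2015-01-01") + 1:n), a = rnorm(n))
new.col.name <- "c"  # stands in for a name known only at runtime

x <- cbind(x, tmp = rnorm(n))         # append under a placeholder name
colnames(x)[ncol(x)] <- new.col.name  # rename whichever column is last
colnames(x)
```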

Darren Cook
  • As I said in my answer, setting `colnames` does create copies. For example: `x <- .xts(1,1); tracemem(x); colnames(x) <- "hi"` – Joshua Ulrich Oct 09 '15 at 15:19
  • I was editing, while you added that comment @JoshuaUlrich The actual matrix data isn't copied is it? – Darren Cook Oct 09 '15 at 15:20
  • The actual matrix data is copied, sadly. I've added some detail to my answer to demonstrate. – Joshua Ulrich Oct 09 '15 at 15:39
  • @JoshuaUlrich That sounds crazy! I found `setAttrib()` https://github.com/wch/r-source/blob/edc54840351102073c3adf700432d91f2853c389/src/main/attrib.c#L214 but I cannot see anything about copying the underlying data the attributes are attached to. – Darren Cook Oct 09 '15 at 15:40
  • Note that I said `colnames<-` creates a copy for xts objects. You're correct that setting attributes does not create a copy, but that's not all `colnames<-` does on an xts object (see `dimnames<-.xts`). – Joshua Ulrich Oct 09 '15 at 15:45
  • Thanks @JoshuaUlrich Very educational discussion. I might delete my answer, as I don't think it adds anything now. – Darren Cook Oct 09 '15 at 15:50
  • Well, you do show another way to set the column name. Some might find that useful. – Joshua Ulrich Oct 09 '15 at 15:59