1

I would like to have unique numeric factors as part of an xts, so that over time...each number refers to a specific factor, independent of time.

To give an example, imagine a stock index that changes its constituents every day. We can simulate this if I have the following universe of two letter stock tickers

universe <- apply(as.data.frame(expand.grid(letters,letters)),1,paste0,collapse="")

and each day an index is created that is a random subsample of 20 of the stock tickers from the universe.

subsample.list <- lapply(1:50, function(y){
     sort(sample(universe,20,replace=FALSE))
 })

the key of unique stocks over the 50 days is:

uni.subsample <- sort(unique(unlist(subsample.list)))

I would like to basically be able to see which stocks were in the index each day if i had the xts object and unique factors.

Although it is not meant to be used this way....I was thinking something like:

tmp <- xts(do.call(rbind,subsample.list),Sys.Date()-c(50:1))

to create the xts.

however I would like to covert the coredata into a numeric matrix, where each number is the ticker from uni.subsample

so if tmp.adjusted['20130716'][1,] would be the numeric vector of numbers of length 20 that represents the numerical values of uni.subsample for the 16th July 2013, so I would expect that I would be able to get all of 2013-07-16's index members by using the xts objecting the following way uni.subsample[tmp.adjusted['20130716'][1,]]...i.e. the adjustment from tmp to tmp.adjusted converts the strings into factors, with unique levels associated with uni.subsample

I hope this makes sense...its kinda hard to explain....

h.l.m
  • 13,015
  • 22
  • 82
  • 169

2 Answers2

3

Here a vectorized solution:

   tmp.int <- xts(matrix(as.integer(factor(tmp,levels=uni.subsample,ordered=TRUE)),
          ncol=ncol(tmp)),index(tmp))

You are basically trying to code a matrix of ordered factor by their levels order.

EDIT adding some benchmarking :

set.seed(1233)
N <- 5000
subsample.list <- lapply(seq(N), function(y){
  sort(sample(universe,20,replace=FALSE))
})
uni.subsample <- sort(unique(unlist(subsample.list)))
tmp <- xts(do.call(rbind,subsample.list),Sys.Date()-seq(N))
ag <- function() xts(matrix(as.integer(factor(tmp,levels=uni.subsample,ordered=TRUE)),
                      ncol=ncol(tmp)),index(tmp))
no <- function()xts(apply(X=tmp, 
                          MARGIN=c(1,2), function(x) which(uni.subsample == x)),
               index(tmp))
library(microbenchmark)
microbenchmark(ag(),no(),times=1)

## N = 50 ag 24 faster
microbenchmark(ag(),no(),times=1)
Unit: milliseconds
 expr       min        lq    median        uq       max neval
 ag()  1.126405  1.126405  1.126405  1.126405  1.126405     1
## N = 500 ag 135 fatser
microbenchmark(ag(),no(),times=10)
Unit: milliseconds
 expr        min         lq     median         uq        max neval
 ag()   23.38484   26.19744   31.13428   35.51057   44.96251    10
 no() 3115.24902 3220.04940 3250.63773 3288.66867 3470.35053    10
     no() 24.000003 24.000003 24.000003 24.000003 24.000003     1
agstudy
  • 119,832
  • 17
  • 199
  • 261
1

How about:

tmp.int <- xts(apply(X=tmp, MARGIN=c(1,2), function(x) which(uni.subsample == x)),
    index(tmp))

# to perform the lookup (e.g., 'find the name of the first value on May 27, 2013'):
uni.subsample[tmp.int['2013-05-27'][,1]]
Noah
  • 1,404
  • 8
  • 12