I want to split my large xts object into a list of regular one-second periods that together contain all the observations of the original object. The goal is to send each list element to a node on my cluster for processing.

I came up with this solution, which is quite elaborate. I'm wondering if this code can be simplified:

library(xts)
set.seed(123)
myts = xts(1:10000, as.POSIXlt(1366039619, tz="EST", origin="1970-01-01") + rnorm(10000, 1, 100))

# ensure we have at least one observation per second
secs = seq(trunc(index(head(myts, 1))), trunc(index(tail(myts, 1))), by="s")

# generate second periods endpoints
myts = merge(myts, secs, fill=na.locf)
myts.aligned = align.time(myts, 1)
myts.ep = endpoints(myts.aligned, "seconds", 1)

# split large xts object in list of second periods
myts.list = lapply(1:(length(myts.ep)-1), function(x, myts, ep) { myts[ep[x]:ep[x+1],] }, myts, myts.ep)

# call to parLapply here...
Robert Kubrick

1 Answer

I think this does what you want:

split(myts, "secs")

It will create a list where each component is 1 second of non-overlapping data.

See ?split.xts
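
As a rough illustration of how this could feed a cluster, here is a minimal sketch. It assumes a smaller sample series than the one in the question, a local two-worker cluster from base R's parallel package, and a placeholder per-chunk function; the names `chunks`, `cl` and `sums` are made up for the example.

```r
library(xts)
library(parallel)

set.seed(123)
# Smaller illustrative series: 1000 irregularly spaced observations
start <- as.POSIXct(1366039619, origin = "1970-01-01", tz = "UTC")
myts <- xts(1:1000, start + rnorm(1000, 1, 100))

# One list element per second that contains at least one observation
chunks <- split(myts, f = "secs")

# Every observation lands in exactly one chunk
stopifnot(sum(sapply(chunks, nrow)) == nrow(myts))

# Placeholder work per chunk, run on a hypothetical two-worker local cluster
cl <- makeCluster(2)
clusterEvalQ(cl, library(xts))
sums <- parLapply(cl, chunks, function(x) sum(coredata(x)))
stopCluster(cl)
```

Seconds with no observations produce no list element here, so this alone does not reproduce the filled-in grid built in the question.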

GSee
  • That's what I thought too, but `myts.list` contains overlapping/duplicate observations. – Joshua Ulrich May 07 '13 at 15:45
  • @JoshuaUlrich That's true, I need the last observation from the previous second to generate a beginning of period value at the turn of each second, but that's the only reason why I have that overlapping point. split() with a fill parameter would be perfect. – Robert Kubrick May 07 '13 at 15:54
  • It looks like I can replace my call to lapply() with split() but I still have to keep the code to generate the turn of each second observation. – Robert Kubrick May 07 '13 at 15:59
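
One possible way to combine the two ideas from the comments, sketched under assumptions: split first, then prepend the last observation of the previous chunk to each chunk so it can serve as a beginning-of-period value. This is not the code from the question. It keeps the carried observation's original timestamp rather than moving it to the second boundary, and it does not create elements for seconds with no observations. The name `with_open` is hypothetical.

```r
library(xts)

set.seed(123)
# Smaller illustrative series, same construction as in the question
start <- as.POSIXct(1366039619, origin = "1970-01-01", tz = "UTC")
myts <- xts(1:1000, start + rnorm(1000, 1, 100))

# Non-overlapping one-second chunks
chunks <- split(myts, f = "secs")

# Prepend the last row of the previous chunk to every chunk after the first
with_open <- c(
  chunks[1],
  Map(function(prev, cur) rbind(tail(prev, 1), cur),
      chunks[-length(chunks)], chunks[-1])
)
```

Each element after the first then holds one extra leading row, which is the deliberate overlap discussed above.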