1

I have a large xts object and want to subset the seconds in the time column, but only if there is a sequence of minimum 5 consecutive seconds. I have up to 8 data points per second (which shouldn't be counted as 5 consecutive points as they are measured within the same second).

And_sub_xts is my xts object

> str(And_sub_xts)
An ‘xts’ object on 2010-04-09 20:32:56/2010-04-26 06:56:57 containing:
 Data: chr [1:164421, 1:11] "0.255416" "0.168836" "0.212126" "0.229442" "0.238100" "0.212126" "0.168836" ...
- attr(*, "dimnames")=List of 2
 ..$ : NULL
 ..$ : chr [1:11] "CalSurge" "CalSway" "CalHeave" "Stat_Surge" ...
 Indexed by objects of class: [POSIXct,POSIXt] TZ: 
 xts Attributes:  
NULL

and the first 100 values for

abs(diff(.indexsec(And_sub_xts)) are

56 8 23 34 40 40 41 42 25 27 34 35 38 38 40 40 41 56 59 59 19 19 20 20 20 20 22 22 23 23 24 24 24 25 25 26 27 27 27 27 27 28 28 30 30 30 37 38 40 40 41 44 44 46 46 47 48 51 52 54 54 54 54 55 56 59 1 4 4 4 6 6 6 6 7 7 11 12 12 14 14 15 16 16 17 18 18 19 19 21 21 22 22 23 23 25 25 26 26 26

I marked the keeps in bold, so the subset should just consist of these data points.

I just realize that theorethically it could happen that there are some data points distributed like this

2010-04-09 20:32:20
2010-04-09 20:32:20
2010-04-09 20:32:21
2010-04-09 20:32:22
2010-04-09 20:32:22
2010-04-09 20:40:22
2010-04-09 22:52:23
2010-04-10 20:52:24

which wouldn't be 5 consecutive seconds, but you can't account for this with the .indexsec command - maybe anybody knows a way to go around this.

Thanks for your help!

Pat
  • 217
  • 1
  • 6

1 Answers1

1

Here's one way to do it. x is sample data that contains index values with seconds equal to your first 100 values.

require(xts)
# sample data
s <- c(56, 8, 23, 34, 40, 40, 41, 42, 25, 27, 34, 35, 38, 38, 40, 
40, 41, 56, 59, 59, 19, 19, 20, 20, 20, 20, 22, 22, 23, 23, 24, 
24, 24, 25, 25, 26, 27, 27, 27, 27, 27, 28, 28, 30, 30, 30, 37, 
38, 40, 40, 41, 44, 44, 46, 46, 47, 48, 51, 52, 54, 54, 54, 54, 
55, 56, 59, 1, 4, 4, 4, 6, 6, 6, 6, 7, 7, 11, 12, 12, 14, 14, 
15, 16, 16, 17, 18, 18, 19, 19, 21, 21, 22, 22, 23, 23, 25, 25, 
26, 26, 26)
S <- cumsum(ifelse(c(0, diff(s)) < 0, 1, 0)) * 60 + s
x <- .xts(seq_along(S), S, tzone="UTC")

The basic idea is to aggregate your data to 1-second resolution, so you can use rle (run-length encoding) to find the consecutive 5-second observations. Then find the first and last timestamps of the sets of 5-second observations in your aggregated data, and then find the locations of those timestamps in your original data. Finally, use the locations of the timestamps in your original data to create sets of sequences you can use to subset the consecutive 5-second groups of observations.

# aggregate data to 1-second resolution
oneSec <- period.apply(x, endpoints(x, 'seconds'), identity) 
# find the runs of 5 or more consecutive one-second increments
consec <- rle(diff(.index(oneSec)))
gte5s <- consec$lengths >= 5
# get the location of the first obs of the run in the 1-second data
begLoc <- cumsum(c(1,consec$lengths))[gte5s]
endLoc <- begLoc + consec$lengths[gte5s]
# get the timestamp of the first and last obs from the original data
beg <- lapply(index(oneSec)[begLoc], function(i) first(x[i, which.i=TRUE]))
end <- lapply(index(oneSec)[endLoc], function(i) last(x[i, which.i=TRUE]))
# create index vector between each value in 'beg' and 'end'
loc <- unlist(mapply(seq, beg, end))
# subset original object using index vector
X <- x[loc,]
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
  • Hi, I've run the code now on two different xts objects (both have the same str as above). One works fine, the other one gives me the following Warning (followed by an Error later) `Warning message: In begLoc + consec$lengths[gte5s] : longer object length is not a multiple of shorter object length` . Do you have an idea why that could be? I compared their structure and apart from the length it is really the same... – Pat Apr 23 '15 at 21:16
  • It's hard to say without actual data, but my guess is that there are no consecutive 5 second data points. – Joshua Ulrich Apr 23 '15 at 22:19