0

I would like to create a sliding window where the start of the window is anchored and the end of the window grows in increments of one unit. So in the data frame below the start of the window would stay at 0.10 and the other end would move over 0.08, 0.15 and so on implementing a function each time it moves over column Speed. If the criteria of the function is not met then the end of the window keeps moving. Once the criteria is met I would like some output in a second column Out to coincide with all the elements previous in that whole window.

Once the criteria are met then the window terminates before anchoring again at the end of the last window and starting again, growing one unit at a time.

For example, with this data frame the criteria could be that the mean of the window is greater than 0.1 before starting again so:

mean(c(0.10, 0.08)) = 0.09 - criteria not met
mean(c(0.10, 0.08, 0.15)) = 0.11 - criteria met so all previous elements are labelled 'A'

Next:

mean(c(0.13, 0.14)) = 0.14 - criteria met so all previous elements are labelled 'B'

Next:

mean(c(0.08, 0.10)) = 0.09 - criteria not met
mean(c(0.08, 0.10, 0.07)) = 0.08 - criteria not met
mean(c(0.08, 0.10, 0.07, 0.15)) = 0.1 - criteria met so all previous elements are labelled 'C'


Speed  Out  
0.10    A    
0.08    A
0.15    A
0.13    B
0.14    B
0.08    C
0.10    C
0.07    C
0.15    C

I have already tried modifying the solutions in THIS Cross Validated post (answer by @mbq and also by @r_evolutionist without luck. Also I have tied using rollapply in the package zoo but I feel this requires a homemade function.

PharmR
  • 260
  • 1
  • 3
  • 12
  • 1
    It's not clear to me what you're asking. Could you provide the example desired output? – Hack-R Apr 06 '18 at 15:02
  • Thanks. I have edited the post so hopefully it's clearer now. – PharmR Apr 06 '18 at 15:08
  • `cumsum`, and some other cumulative functions are in `base`, `dplyr` has `cummean` (or [it's quite easy to implement if you don't want the dependency](https://stackoverflow.com/q/11074665/903061)). You can do the full cummean, then iteratively look for window breaks. I don't think you'll be able to do a fully vectorized solution for finding the window breaks, so if you need efficiency you could code your own version in Rcpp (the Rcpp implementation of cummean in my link should get you started). – Gregor Thomas Apr 06 '18 at 15:14
  • Thanks Gregor. I actually want to implement a different function to the window and not the mean but I thought this could be an easier example to demonstrate. Could a for loop do this? – PharmR Apr 06 '18 at 15:18

1 Answers1

1

1) First define a cummean function. Then using Speed defined in the Note at the end define st to return the start index for calculating the mean to the jth element of Speed where i is the start index for the prior element of Speed. Then use Reduce to apply st to 1:n where Speed has n elements. That gives a grouping variable g such that we apply cummean of Speed separately to each subset of Speed having common elements in g.

cummean <- function(x) cumsum(x) / seq_along(x)

st <- function(i, j) if (mean(Speed[i:j]) > 0.1) j+1 else i
g <- Reduce(st, seq_along(Speed), acc = TRUE)

ave(Speed, g, FUN = cummean)
## [1] 0.1000000 0.0900000 0.1500000 0.1300000 0.1400000 0.1150000 0.1033333
## [8] 0.0950000 0.0960000

The value of g produced is

g
## [1] 1 1 4 5 6 6 6 6 6

2) A different way of constructing g is to recognize that this can be cast as the set partitioning problem of integer linear programming where the components of the partition must be contiguous and have mean > 0.1 . Append Inf to the end of Speed and take its length, n. Then find all combinations of two elements of 0:n and if a zero appears replace it with the other element. Convert that to a zero one vector and then only keep those for which the mean > 0.1 giving const.mat. The right hand side is all ones as is the objective function. At the end we convert the 0-1 solution vector to g. Note that the actual values in g do not matter except for which positions have equal values.

library(lpSolve)

n <- length(Speed)+1
f <- function(x) {
  if (x[1] == 0) x[1] <- x[2]
  replace(numeric(n), x[1]:x[2], 1)
}
const.mat <- combn(0:n, 2, f)
ok <- apply(const.mat, 2, function(x) mean(c(Speed, Inf)[x == 1]) > .1)
const.mat <- const.mat[, ok]
const.rhs <- rep(1, nrow(const.mat))
obj <- rep(1, ncol(const.mat))
result <- lp("max", obj, const.mat, "=", const.rhs, all.bin = TRUE)
result
result$solution
g <- rowSums(const.mat[, result$solution == 1] %*% diag(1:result$objval))[-n]
g
## [1] 2 2 2 1 3 3 3 4 4

Now use g with the ave statement in (1).

Notes

  1. Note that cummean could be implemented as:

    library(zoo)
    cummean <- function(x) rollapplyr(x, seq_along(x), mean)
    

    which has the advantage that it is easy to replace mean with some other function.

  2. The input used above is:

    Speed <- c(0.1, 0.08, 0.15, 0.13, 0.14, 0.09, 0.08, 0.07, 0.1)
    
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks for this @G.Grothendieck - I have edited the post so that I make more sense hopefully - is this what your function does essentially? – PharmR Apr 07 '18 at 12:46
  • The two solutions each calculate `g` which groups the points such that each group has mean > 0.1 and then performs a `cummean` on each group. For example, the first three elements of `g` are the same so they form the first group. The 4th element of `g` is unique so it forms the next group, etc. As I read it this does answer the question. – G. Grothendieck Apr 07 '18 at 13:25
  • Continued. The `g` in the answer here is your `Out` except I think there is an error in `Out` in the question. The two points labelled `B` should be divided into two groups since each is > 0.1 – G. Grothendieck Apr 07 '18 at 13:35
  • Ah yes I realise now. I think I can modify to suit my needs. Thanks very much! – PharmR Apr 08 '18 at 09:09