5

I have a dataframe DF, with two columns A and B shown below:

A                    B                  
1                    0             
3                    0               
4                    0                   
2                    1                    
6                    0                    
4                    1                     
7                    1                 
8                    1                     
1                    0   

A sliding window approach is applied as shown below. The mean of column B is calculated in a sliding window of size 3 that slides by 1, using: rollapply(DF$B, width = 3, by = 1, FUN = mean). The mean value for each window is shown on the right side.

    A:         1    3    4    2    6    4    7    8    1                                          
    B:         0    0    0    1    0    1    1    1    0                                
              [0    0    0]                                              0
                    [0    0    1]                                        0.33
                          [0    1    0]                                  0.33
                                [1    0    1]                            0.66
                                      [0    1    1]                      0.66
                                            [1    1    1]                1
                                                 [1    1    0]           0.66
output:        0   0.33 0.33 0.66   0.66    1     1    1   0.66

Now, for each row/coordinate in column A, all windows containing that coordinate are considered, and the highest of their mean values is retained. This gives the results shown as 'output' above.

I need to obtain the output shown above. The output should look like:

A                   B                  Output
1                   0                  0
3                   0                  0.33
4                   0                  0.33
2                   1                  0.66
6                   0                  0.66
4                   1                  1
7                   1                  1
8                   1                  1
1                   0                  0.66

Any help in R?

Prradep
chas
  • (+1) Now I understand the question. Let me try to see if I can figure something out. Just one more thing. I think you lost the final output "mean_A" in this edit. Could you add it as well? Thanks. – Arun Apr 11 '13 at 11:09
  • @Arun Now I have added Mean_A. – chas Apr 11 '13 at 11:27
  • is `A` always a sequence 1:N? I don't see how the values in `A` matter to your calculation. It's pretty much `rollmax(rollmean(B,3),3)` so far as I understand it. – Carl Witthoft Apr 11 '13 at 11:39
  • @CarlWitthoft, not quite. user1779730, check my answer. – Arun Apr 11 '13 at 12:02
  • @CarlWitthoft, I hope the reframed question helps to understand the problem. – chas Apr 11 '13 at 13:03
  • Sorry, I don't have the patience to *re-read* your question. And it's not recommended to completely re-write the question. In these cases, it's recommended to mark the answer for that question and ask a separate new question. – Arun Apr 11 '13 at 13:08
  • More information, also on performance issues, can be found at http://stats.stackexchange.com/questions/3051/mean-of-a-sliding-window-in-r – Jasper Nov 17 '14 at 11:38

2 Answers

6

Try this:

# form input data
library(zoo)
B <- c(0, 0, 0, 1, 0, 1, 1, 1, 0)

# calculate
k <- 3
rollapply(B, 2*k-1, function(x) max(rollmean(x, k)), partial = TRUE)

The last line returns:

[1] 0.0000000 0.3333333 0.3333333 0.6666667 0.6666667 1.0000000 1.0000000
[8] 1.0000000 0.6666667
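As a cross-check, the same numbers can be reproduced in base R by following the definition in the question literally: compute the mean of every full window, then for each position take the largest mean among the windows that contain it. This is only a minimal sketch; the names win_means and max_window_mean are made up for illustration and are not part of zoo.

```r
B <- c(0, 0, 0, 1, 0, 1, 1, 1, 0)
k <- 3
n <- length(B)

# Mean of every full window of width k; window j covers B[j:(j + k - 1)]
win_means <- sapply(seq_len(n - k + 1), function(j) mean(B[j:(j + k - 1)]))

# Position i is contained in windows max(1, i - k + 1) .. min(i, n - k + 1)
max_window_mean <- sapply(seq_len(n), function(i) {
  first <- max(1, i - k + 1)
  last  <- min(i, n - k + 1)
  max(win_means[first:last])
})

max_window_mean
```

For the sample data this should agree with the rollapply result; for long vectors and large k the sapply loops are likely to be slow, so treat it as a verification aid rather than a replacement.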

If there are NA values you might want to try this:

k <- 3
B <- c(1, 0, 1, 0, NA, 1)
rollapply(B, 2*k-1, function(x) max(rollapply(x, k, mean, na.rm = TRUE)), partial = TRUE)

where the last line gives this:

[1] 0.6666667 0.6666667 0.6666667 0.5000000 0.5000000 0.5000000

Expanded out, these six values are computed as:

c(mean(B[1:3], na.rm = TRUE), ##
max(mean(B[1:3], na.rm = TRUE), mean(B[2:4], na.rm = TRUE)), ##
max(mean(B[1:3], na.rm = TRUE), mean(B[2:4], na.rm = TRUE), mean(B[3:5], na.rm = TRUE)),
max(mean(B[2:4], na.rm = TRUE), mean(B[3:5], na.rm = TRUE), mean(B[4:6], na.rm = TRUE)),
max(mean(B[3:5], na.rm = TRUE), mean(B[4:6], na.rm = TRUE)), ##
mean(B[4:6], na.rm = TRUE)) ##

If you don't want the k-1 components at each end (marked with ## above) drop partial = TRUE.
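To illustrate the difference, here is a small sketch using the first example vector. Without partial = TRUE only the positions that have a complete window of width 2*k-1 around them are returned; adding fill = NA (a standard rollapply argument) is one way to keep the result the same length as the input. The variable names inner and padded are just illustrative.

```r
library(zoo)

B <- c(0, 0, 0, 1, 0, 1, 1, 1, 0)
k <- 3

# Only complete windows of width 2*k - 1: the k - 1 positions at each end are dropped
inner <- rollapply(B, 2 * k - 1, function(x) max(rollmean(x, k)))

# Same calculation, but padded with NA so it lines up with B row by row
padded <- rollapply(B, 2 * k - 1, function(x) max(rollmean(x, k)), fill = NA)

length(inner)   # length(B) - 2 * (k - 1)
length(padded)  # length(B)
```

The padded form is convenient if the result is to be stored as a new column of the data frame.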

G. Grothendieck
  • There -- I knew someone would formulate my comment above correctly :-) – Carl Witthoft Apr 11 '13 at 15:10
  • @G.Grothendieck Thanks. What is 5 in the rollapply function? – chas Apr 11 '13 at 16:12
  • @G.Grothendieck On what basis is the width set to 5? This is just sample data. The real data has a window size of 5000 that slides by 1. In this case, how would we determine the width of the window? – chas Apr 11 '13 at 16:34
  • @G.Grothendieck Thanks a lot for the very simple and effective solution. Now it seems we can simulate for any width. One more query: I initially used rollapply(DF$B, width=3,by=1) to calculate the mean of a window of size 3 sliding by 1 position. But in your solution, nothing is said about the by parameter. Can I assume it calculates the mean in the same manner, sliding by 1 position? – chas Apr 11 '13 at 16:46
  • There are two length 5 windows. I have deleted my comments since there were getting to be too many and have added some additional info at the end of the answer. – G. Grothendieck Apr 13 '13 at 19:22
  • @G.Grothendieck Hi, I used rollapply(x, 2*k-1, function(x) max(rollmean(x, k)), partial = TRUE) -> output with k=5000. The first few values of the output were 0.2730, 0.2732, 0.2732, 0.2734, 0.2734... I tried to cross-check by calculating mean(x[1:5000]), which is 0.3538889, but the result from the rollapply function starts at 0.2730. Could you please explain the reason for the mismatch? – chas Apr 18 '13 at 20:56
  • Try it with a smaller set of data to verify your understanding. I have shown the correspondence in the case of example data at the end of the post above. Also note that `rollmean` is not equivalent to `rollapply(..., mean)` in all cases. `rollmean` does not support `na.rm` and also the sliding window size must be odd for `rollmean` as mentioned in `?rollmean`. These restrictions were made for sake of speed. – G. Grothendieck Apr 18 '13 at 23:28
0

The R package TTR has a number of functions for calculating averages over sliding windows, for example SMA (simple moving average):

library(TTR)
data$sma <- SMA(data$B, 3)

More documentation is here http://cran.r-project.org/web/packages/TTR/TTR.pdf
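Note that SMA on its own returns only the trailing window means (the first n - 1 entries are NA), so the "highest mean over all windows containing each row" step from the question still has to be added. A minimal sketch of one way to do that, assuming SMA returns a plain numeric vector for a numeric input and that the data has at least k rows; the names sma and out are illustrative only:

```r
library(TTR)

B <- c(0, 0, 0, 1, 0, 1, 1, 1, 0)
k <- 3
n <- length(B)

# sma[i] is the mean of B[(i - k + 1):i]; the first k - 1 entries are NA
sma <- SMA(B, n = k)

# The windows containing position i are those ending at i, i + 1, ..., i + k - 1
# (capped at n); na.rm = TRUE skips the incomplete windows at the start
out <- sapply(seq_len(n), function(i) {
  max(sma[i:min(i + k - 1, n)], na.rm = TRUE)
})

out
```

For the sample data this should match the 'output' column in the question.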

Cyanophage