9
input = cbind(c(3,7,3,5,2,9,1,4,6,4,7,3,7,4))
library(zoo)
output = cbind(rollmean(input,4))
print(input)
print(output)

output:

      [,1]
 [1,]    3
 [2,]    7
 [3,]    3
 [4,]    5
 [5,]    2
 [6,]    9
 [7,]    1
 [8,]    4
 [9,]    6
[10,]    4
[11,]    7
[12,]    3
[13,]    7
[14,]    4
      [,1]
 [1,] 4.50
 [2,] 4.25
 [3,] 4.75
 [4,] 4.25
 [5,] 4.00
 [6,] 5.00
 [7,] 3.75
 [8,] 5.25
 [9,] 5.00
[10,] 5.25
[11,] 5.25

but when I try to cbind it:

Error in cbind(input, output) :
  number of rows of matrices must match (see arg 2)
Calls: print -> cbind
Execution halted

I'd like to use a function that would be smart enough and do not give up if it doesn't get data on both ends of a vector and calculating output then according to only the data it is having. so for example in input[1] it will calculate only mean from right

Felipe Dalla Lana
  • 615
  • 1
  • 5
  • 12
rsk82
  • 28,217
  • 50
  • 150
  • 240
  • I changed title because it was not said clear that if rollmean function do not have parameter to calculate boundary cases then maybe is there something similar in other packages. – rsk82 Dec 12 '10 at 15:53
  • 3
    This was already addressed in my answer to your prior question: http://stackoverflow.com/questions/4418643/two-sided-moving-average – G. Grothendieck Dec 12 '10 at 15:56
  • The changed title still doesn't make clear what question you want answered. Your question is concerned with the length of the output, except for the last paragraph where you mention that you'd like a function that would handle the boundary cases. – Joshua Ulrich Dec 12 '10 at 16:01

3 Answers3

14

Although this is an old question, for anyone reading this, hope it helps.

Using rollapply with function mean, and partial = TRUE will keep the initial values where function cannot be calculated.

x <- rollapply(input, width = 5, FUN = mean, align = "centre", partial = TRUE)

??rollapply 
??rollapplyr # for right aligned moving average
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
Anusha
  • 1,716
  • 2
  • 23
  • 27
14

Look at the na.pad argument to rollmean(), and set it to TRUE. Missed the last bit; so you need also to align the means to the right:

> input <- c(3,7,3,5,2,9,1,4,6,4,7,3,7,4)
> rollmean(input, 4, na.pad = TRUE, align = "right")
 [1]   NA   NA   NA 4.50 4.25 4.75 4.25 4.00 5.00 3.75 5.25 5.00 5.25 5.25

Unless you need these things as 1-column matrices, drop the cbind() calls.

OK, from further clarifications it appears you want to compute some means that aren't really comparable to the other means in the result vector. But if you must...

> k <- 4
> c( cumsum(input[1:(k-1)]) / 1:(k-1), rollmean(input, k, align = "right") )
 [1] 3.000000 5.000000 4.333333 4.500000 4.250000 4.750000 4.250000 4.000000
 [9] 5.000000 3.750000 5.250000 5.000000 5.250000 5.250000

As the OP is interested in estimating the MA to then fit a spline to it, it might be instructive to see what one gains by doing this instead of estimating the spline directly from the data.

> ## model observed data
> mod <- smooth.spline(seq_along(input), input, df = 3)
> ## plot data and fitted spline
> plot(seq_along(input), input)
> lines(predict(mod, seq_along(input)), col = "red", lwd = 2)
> ## model the fudged MA
> mod2 <- smooth.spline(seq_along(input),
+                       c( cumsum(input[1:(k-1)]) / 1:(k-1),
+                         rollmean(input, k, align = "right") ), df = 3)
> ## add this estimated spline
> lines(predict(mod2, seq_along(input)), col = "blue", lwd = 2)

You'd be hard pushed to tell the difference between these two Comparison of direct smooth and smooth of MA

and the curves deviate most at the beginning where you are forcing estimation of the MA.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • i know na.pad but it gives what it name says, it is padding vector with na's. And that is not what I'm looking for, I want this function to really calculate mean for these boundary cases, as I wrote in the first post. – rsk82 Dec 12 '10 at 15:46
  • @user393087; sorry, but that was not clear. I've updated my answer to address the output you wanted. I wonder why you want to compute your means like this? The first 3 means are not the same "thing" as the other means computed... – Gavin Simpson Dec 12 '10 at 16:27
  • Because I need to do spline on it snd all data should be calculated in a way to possibly supress in greatest extent possible the errors that are from the missing data. So I think it wouldn't be simbple mean to perform that. – rsk82 Dec 12 '10 at 16:50
  • @user393087; the moving average (MA) is a simplistic way of estimating the underlying level or expectation of a series of data. When you model with a spline (or something similar) you are attempting to estimate the underlying level or expectation also. Your approach is to smooth the data twice and fudge the ends because the MA is not defined. Why not estimate the spline directly from the data? – Gavin Simpson Dec 12 '10 at 17:05
  • @user393087; I've updated my answer to discuss smoothing the MA or the raw data. I find it somewhat ironic that you are trying to minimise bias in the fitted spline by fudging the MA not to provide missing data. – Gavin Simpson Dec 12 '10 at 17:14
2

So far the question has been seen as ambiguous by three experience R coders, but it seems that you do want some sort of extrapolated value for the missing means. Whether you wanted the imputed values at the beginning or the end remains unclear. This code will return a right-aligned vector and replace the beginning NA's with the first not-NA value. There would also be the na.locf function in zoo if you wanted to work with left-aligned rollmeans.

long.roll <- function(input, k) { rtroll <-  
                           rollmean(input, k, align="right", na.pad=TRUE)
                return(c(rep(rtroll[k], k-1), rtroll[-(1:(k-1))]) ) }
long.roll(input,4)
#  [1] 4.50 4.50 4.50 4.50 4.25 4.75 4.25 4.00 5.00 3.75 5.25 5.00 5.25
# [14] 5.25
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • As Gabor said in his answer to a previous question by the same user, the development version of `zoo::rollapply` already does what the questioner wants. Nice effort though. ;-) – Joshua Ulrich Dec 12 '10 at 16:13
  • Agreed. I hadn't seen your comment because I was at the time already looking back at Gabor's reply, noting the "coming attraction" of a partial argument and upvoting his answer along with chiding user393087 for not recognizing a high quality answer. – IRTFM Dec 12 '10 at 16:30
  • & @Joshua `long.roll` and the development `zoo::rollapply` do different things with the "extrapolation". `long.roll` replicates the first valid mean into the `NA`, whilst `zoo::rollapply` with `partial = TRUE` computes the means using only the available data. The latter can be achieved via a trivial `cumsum` for the first `k-1` observations until the new `rollapply` hits CRAN. – Gavin Simpson Dec 12 '10 at 16:33
  • The answer by G. Grothendieck: http://stackoverflow.com/questions/4418643/two-sided-moving-average/4422120#4422120 - but as his answer involved DEV version, this is a good answer too but the output is differen from that by DEV rollmean: 4.333333 4.5 4.25 4.75 4.25 4 5 3.75 5.25 5 5.25 5.25 4.666667 5.5 – rsk82 Dec 12 '10 at 16:56
  • It is different. The difference comes from ambiguity in specifying what is wanted at the ends. I'm guessing that Gabor's new version will "slide" the mean-window in from the right and out to the left, so with an odd k argument there will not be a need to specify a right or a left alignment. The [1] value will be mean(x[1:((k-1)/2) and the [length(x)] value will be mean(x[(length(x)-1/2):length(x)]. – IRTFM Dec 13 '10 at 01:36