
I am trying to plot a time series with its corresponding 9-year running mean, using the rollapply function from the "zoo" package.

I don't know why the running-mean series is not aligned properly, even though I change the "align" argument of the function.

The time series runs from 1961 to 2009.

Here's the data that I am using:

structure(list(Year = 1961:2009, Rain = c(7.6656130268, 8.1981182796, 
14.4514275121, 13.1530337942, 9.2569892473, 14.1592933948, 10.8212829069, 
3.2401689708, 14.5850998464, 9.614093702, 13.1677048572, 4.7452764977, 
20.7346774194, 9.3896697389, 21.9528735632, 22.5482334869, 6.0696620584, 
7.100640041, 4.706154987, 7.9103302611, 9.9548387097, 8.0649001536, 
6.2932888395, 3.8337173579, 23.5, 2.4107142857, 14.7172784575, 
9.7700076805, 7.6785330261, 7.5453917051, 8.8073044123, 7.7576420891, 
17.0896697389, 10.2380952381, 19.1981460882, 7.0900537634, 5.0630184332, 
22.1928955453, 17.3850945495, 14.71593702, 12.7344086022, 6.0408602151, 
8.0338524286, 7.1766513057, 21.8706989247, 10.6695852535, 21.4467185762, 
10.5718894009, 3.9693548387)), .Names = c("Year", "Rain"), class = 
"data.frame", row.names = c(NA, 
-49L))

Here's my script:

library(zoo)      # rollapply()
library(ggplot2)  # ggplot() and scales/themes

dat <- read.csv("test.csv", header = TRUE, sep = ",")
dat[dat == -999] <- NA   # treat -999 as missing
dat[dat == -888] <- 0    # set -888 values to 0
dat <- data.frame(dat)   # redundant: read.csv() already returns a data frame

# 9-year running mean; align = "right" stores each mean at the last year of its window
dat$mav <- rollapply(dat$Rain, width = 9, mean, fill = NA, align = "right")

p <- ggplot(dat, aes(x = Year))
p <- p + geom_line(aes(y = Rain, color = "test"))
p <- p + geom_point(aes(y = Rain, color = "test"), size = 1)
p <- p + geom_line(aes(y = mav, color = "9-year running mean"), lwd = 1)
p <- p + theme(panel.background = element_rect(fill = "white"),
               plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm"),
               panel.border = element_rect(colour = "black", fill = NA, size = 1),
               axis.line.x = element_line(colour = "black"),
               axis.line.y = element_line(colour = "black"),
               axis.text = element_text(size = 20, colour = "black", family = "serif"),
               axis.title = element_text(size = 15, colour = "black", family = "serif"),
               legend.position = "top")
p <- p + scale_colour_manual(name = "", values = c("test" = "steelblue4", "9-year running mean" = "green"))
p <- p + scale_y_continuous(breaks = seq(0, 50, by = 10), limits = c(0, 50), expand = c(0, 0))
# Year is numeric, so use a continuous x scale (scale_x_discrete() is meant for categories)
p <- p + scale_x_continuous(breaks = seq(1961, 2009, 9), expand = c(0, 0))
p <- p + labs(x = "Year", y = "Rainfall (mm/day)")
p

Here's the output image: [image not available]

What I am expecting:

[a] The running-mean series should start at 1969 and its last value should be at 2000. But in the output image, the series is shifted to the right and ends at 2009.

[b] When I set the 'align' to "center", the running mean starts at 1965.

[c] Any suggestion on how to do this correctly in R?
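A quick base-R check of where the non-NA running-mean values land for the 49 years posted above. As an assumption, `stats::filter()` with equal weights stands in for `zoo::rollapply(..., mean, fill = NA)`: for an odd width and a series without NAs, `sides = 1` gives the same running means as `align = "right"` and `sides = 2` the same as `align = "center"` (up to floating-point rounding). The names `mav_right`, `mav_center` and `span_years()` are illustrative only.

```r
# Rainfall series posted in the question (1961-2009, no missing values)
dat <- data.frame(
  Year = 1961:2009,
  Rain = c(7.6656130268, 8.1981182796, 14.4514275121, 13.1530337942,
           9.2569892473, 14.1592933948, 10.8212829069, 3.2401689708,
           14.5850998464, 9.614093702, 13.1677048572, 4.7452764977,
           20.7346774194, 9.3896697389, 21.9528735632, 22.5482334869,
           6.0696620584, 7.100640041, 4.706154987, 7.9103302611,
           9.9548387097, 8.0649001536, 6.2932888395, 3.8337173579,
           23.5, 2.4107142857, 14.7172784575, 9.7700076805,
           7.6785330261, 7.5453917051, 8.8073044123, 7.7576420891,
           17.0896697389, 10.2380952381, 19.1981460882, 7.0900537634,
           5.0630184332, 22.1928955453, 17.3850945495, 14.71593702,
           12.7344086022, 6.0408602151, 8.0338524286, 7.1766513057,
           21.8706989247, 10.6695852535, 21.4467185762, 10.5718894009,
           3.9693548387)
)

# Assumed base-R stand-in for rollapply(x, 9, mean, fill = NA, align = ...):
# sides = 1 averages the current and 8 previous values (right-aligned),
# sides = 2 averages 4 values on each side of the current one (centred).
w <- 9
mav_right  <- as.numeric(stats::filter(dat$Rain, rep(1 / w, w), sides = 1))
mav_center <- as.numeric(stats::filter(dat$Rain, rep(1 / w, w), sides = 2))

# Years of the first and last non-NA running-mean values
span_years <- function(m) range(dat$Year[!is.na(m)])
span_years(mav_right)   # 1969 2009
span_years(mav_center)  # 1965 2005
```

With `width = 9` a right-aligned mean cannot exist before the 9th year (1969) and the last one lands on 2009; centring moves both ends back by 4 years. None of the alignments yields a 1969-2000 span from a single 9-year window.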

Lyndz
  • Try `rollapply(1:10, 5, mean, fill=NA, align='right')`, and you'll see that the non-`NA` values range from indices 5-10; this means that the first `n-1` values are `NA`, all others are usable values. In your data with width 9, that means the first 8 values should be `NA` and the remainder (through year 2009) are usable. As far as *"doing it correctly"* ... if you expect it to lose the first 8 and last 9, doesn't that mean your width should be 18? Otherwise, doing it right is a matter of perspective, and I'm afraid I'm going to side with R on this one. – r2evans Apr 25 '18 at 05:00
  • Hi. I get your point. But what I mean is that wherever a full 9-value window is not available, the result should be filled with NA. So 9 time steps at both ends should be empty. – Lyndz Apr 25 '18 at 05:04
  • No. The gap should be `n-1`, not `n`, and it should be either (a) all on one side, or (b) split between the two sides. Think of it this way: the `i`th value is the mean of the window that ends at `i` (`align='right'`), is centred on `i` (`align='center'`), or starts at `i` (`align='left'`). So by using `align='right'`, you are saying to place the return value in the right-most spot. This means there should never be a gap on the right side. – r2evans Apr 25 '18 at 05:07
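The window rule in these comments can be made concrete with a small base-R sketch. `window_for()` is a hypothetical helper, not a zoo function; it assumes an odd width and returns the indices that are averaged for the result stored at position `i`. It also counts, for a length-10 series, how many positions have no complete window under each alignment.

```r
# Hypothetical helper (not part of zoo): indices of the window whose mean is
# stored at position i, for an odd width.
window_for <- function(i, width, align) {
  before <- switch(align,
                   right  = width - 1,          # window ends at i
                   center = (width - 1) %/% 2,  # window is centred on i
                   left   = 0)                  # window starts at i
  start <- i - before
  start:(start + width - 1)
}

window_for(5, 5, "right")   # 1 2 3 4 5
window_for(5, 5, "center")  # 3 4 5 6 7
window_for(5, 5, "left")    # 5 6 7 8 9

# Positions whose window would run off either end of a length-10 series
n <- 10
width <- 5
gaps <- sapply(c("right", "center", "left"), function(a) {
  sum(vapply(seq_len(n), function(i) {
    w <- window_for(i, width, a)
    min(w) < 1 || max(w) > n
  }, logical(1)))
})
gaps  # 4 for every alignment, i.e. width - 1
```

Only the placement of the `width - 1` gap changes: all on the left for `right`, split for `center`, all on the right for `left`.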

1 Answer


I think you might be misunderstanding how width, fill, and alignment work in a rolling apply.

vec <- 1:10
rollapply(vec, 5, mean, fill=NA, align='right')
#  [1] NA NA NA NA  3  4  5  6  7  8

It first takes the first n=5 values and calculates their mean:

mean(vec[1:5])
# [1] 3

Where to put it? Since we said align='right', it places it in the right-most spot, so index 5.

#  [1]  1  2  3  4  5  6  7  8  9 10
#                   ^
#                   3

and, since you said fill=NA, it keeps the preceding positions and fills them with NA:

#  [1]  1  2  3  4  5  6  7  8  9 10
#       ^  ^  ^  ^
#  [1] NA NA NA NA  3

For the next iteration, it takes the mean of the 2nd through 6th positions:

mean(vec[2:6])
# [1] 4

which it then places in the 6th position:

#  [1]  1  2  3  4  5  6  7  8  9 10
#                      ^
#  [1] NA NA NA NA  3  4

When we get to the last iteration, we are averaging positions len-n+1 (10-5+1=6) through len (10), so

mean(vec[6:10])
# [1] 8

and it is put in the last position:

#  [1]  1  2  3  4  5  6  7  8  9 10
#                                  ^
#  [1] NA NA NA NA  3  4  5  6  7  8
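The iterations walked through above can be reproduced with a plain loop. This is a base-R sketch of the same arithmetic under `width = 5`, `fill = NA`, `align = 'right'`, not a claim about how zoo implements `rollapply()` internally.

```r
vec   <- 1:10
width <- 5
len   <- length(vec)

out <- rep(NA_real_, len)          # fill = NA: every slot starts out empty
for (i in width:len) {
  window <- (i - width + 1):i      # align = 'right': the window ends at i
  out[i] <- mean(vec[window])      # first window 1:5 -> index 5, last 6:10 -> index 10
}
out  # NA NA NA NA 3 4 5 6 7 8
```

The loop starts at `i = width` because that is the first position with a complete window ending at it, which is why exactly `width - 1` leading slots stay NA.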

So, because we had width=5 and fill=NA, we end up with 5-1=4 positions filled with NA. (There might be more if the data contained NAs, since any window containing an NA averages to NA.) Had we instead used width=5 without fill, those 5-1=4 positions would simply be missing, meaning

rollapply(vec, 5, mean)
# [1] 3 4 5 6 7 8
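For the no-fill case, base R's `embed()` gives one row per complete window, which makes the `len - n + 1` count easy to see. This is a swapped-in base-R illustration rather than zoo's own code; the second part shows the parenthetical point about extra NAs when the data itself contains an NA (`vec_na` is a made-up example).

```r
vec   <- 1:10
width <- 5

# embed() returns one row per complete window (columns x[i], x[i-1], ..., x[i-width+1]),
# so there are length(vec) - width + 1 = 6 rows and no padding.
means <- rowMeans(embed(vec, width))
means          # 3 4 5 6 7 8
length(means)  # 6

# An NA in the data makes every window that contains it NA as well
vec_na   <- replace(vec, 7, NA)
means_na <- rowMeans(embed(vec_na, width))
means_na       # 3 4 NA NA NA NA
```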

Had we done width=5, fill=NA, align='left', then we should see:

rollapply(vec, 5, mean, fill=NA, align='left')
#  [1]  3  4  5  6  7  8 NA NA NA NA

because we asked for NA padding instead of removal, and we said to put each value in the left-most position of its window of width 5. The last iteration (mean(vec[6:10]), with a value of 8) was put in the left-most position of the last window, index 6, leaving four positions to its right filled with NA, i.e. known unknowns.
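Since the complete windows produce the same means whatever the alignment, `align` only decides where the `width - 1` NA slots go. A minimal base-R sketch of that idea for the right and left cases; the results should match the two `rollapply()` outputs shown above, and the variable names are illustrative only.

```r
vec   <- 1:10
width <- 5

# Means of the complete windows, in order: 3 4 5 6 7 8
means <- vapply(seq_len(length(vec) - width + 1),
                function(s) mean(vec[s:(s + width - 1)]),
                numeric(1))

pad <- rep(NA_real_, width - 1)
right_aligned <- c(pad, means)  # each mean sits at the last index of its window
left_aligned  <- c(means, pad)  # each mean sits at the first index of its window

right_aligned  # NA NA NA NA 3 4 5 6 7 8
left_aligned   # 3 4 5 6 7 8 NA NA NA NA
```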

r2evans