-1

I have daily data for over 100 years that looks like

01.01.1856   12
02.01.1956   9
03.01.1956   -12
04.01.1956    7
etc. 

I wish to calculate the 30 year running average for this huge data. I tried converting the data into a time series but cant still figure out how to go about it. I will prefer a simple method that has to do with working with a data.frame.

Dearie.K
  • 11
  • 1
  • it does not have to be a timeseries (but representing a series of observations), take a look at http://www.inside-r.org/packages/cran/zoo/docs/rollmean or http://svitsrv25.epfl.ch/R-doc/library/caTools/html/runminmax.html – MLavoie Jan 16 '16 at 11:03
  • The `TTR` package provides all the moving averages one can possibly think of. – RHertel Jan 16 '16 at 11:50
  • The `movav()` function from the `prospectr` package does it as well – Frank Jan 16 '16 at 12:02

2 Answers2

1

I guess the preparation is the difficulty considering some leapyears. So I try to show some way for preparing, before using the already mentioned function runmean of package require(caTools). First we create example data (which is not necessary for you, but for the understanding). Second I divide the data frame into a list of data frames, one for each year and taking the mean values for each year. These two steps could be done at once, but I think the separated way is easier to understand and to adapt.

#example data
Days        <- seq(as.Date("1958-01-01"), as.Date("2015-12-31"), by="days")
Values      <- runif(length(Days))
DF          <- data.frame(Days = Days, Values = Values)
#start of script
Years       <- format(DF$Days, "%Y")
UniqueYears <- unique(format(DF$Days, "%Y"))
#Create subset of years
#look for every unique year which element of days is in this year.
YearlySubset <- lapply(UniqueYears, function(x){
              DF[which(Years == x), ]
            })
YearlyMeanValues <- sapply(YearlySubset, function(x){
              mean(x$Values)
            })

Now the running mean is applied:

#install.packages("caTools")
require(caTools)
RM <- data.frame(Years = UniqueYears, RunningMean30y = runmean(YearlyMeanValues, 30))

Just if I didn't got you right at first and you want some running mean for every day within about 30 years, of course you could simply do:

RM <- cbind(DF, runmean(DF$Values, 365 * 30))

And considering your problems creating a timeseries:

DF[ , 1] <- as.Date(DF[ , 1], format = "%Y.%m.%d")
Phann
  • 1,283
  • 16
  • 25
1

I would also suggest exploring RcppRoll in combination with dplyr which provides a fairly convenient solution to calculate rolling averages, sums, etc.

Code

# Libs
library(RcppRoll)  # 'roll'-ing functions for R vectors and matrices.
library(dplyr)     # data grammar (convenience)
library(zoo)       # time series (convenience)
library(magrittr)  # compound assignment pipe-operator (convenience)

# Data 
data("UKgas")
## Convert to data frame to make example better
UKgas <- data.frame(Y = as.matrix(UKgas), date = time(UKgas))

# Calculations
UKgas %<>%
    # To make example more illustrative I converted the data to a quarterly format
    mutate(date = as.yearqtr(date)) %>% 
    arrange(date) %>%
    # The window size can be changed to reflect any period
    mutate(roll_mean = roll_mean(Y, n = 4, align = "right", fill = NA))

Notes

As the data provided in the example was fairly modest I used quarterly UK gas consumption data available via the data function in the utils package.

Konrad
  • 17,740
  • 16
  • 106
  • 167