1

I have a data frame, a time series with 4 observation every hour for each site.

I'd like to compute moving average for every 4 hours, using data.table, I can do it per hour but not every n hours.

dput(df)
structure(list(time = structure(c(1414502100, 1414503000, 1414503900, 
1414504800, 1414505700, 1414506600, 1414507500, 1414508400, 1414509300, 
1414510200, 1414511100, 1414512000, 1414512900, 1414513800, 1414514700, 
1414515600, 1414516500, 1414517400, 1414518300, 1414519200, 1414520100, 
1414521000, 1414521900, 1414522800, 1414523700, 1414524600, 1414525500, 
1414526400, 1414527300, 1414528200, 1414529100, 1414530000, 1414530900, 
1414531800, 1414532700, 1414533600, 1414534500, 1414535400, 1414536300, 
1414537200), class = c("POSIXct", "POSIXt"), tzone = ""), site = c(2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 2108L, 
2108L, 2108L, 2108L), val = c(38L, 38L, 35L, 35L, 35L, 35L, 37L, 
38L, 38L, 36L, 36L, 35L, 33L, 31L, 27L, 26L, 20L, 16L, 14L, 11L, 
7L, 5L, 2L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L), month = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10), 
    hour = c("14", "14", "14", "15", "15", "15", "15", "16", 
    "16", "16", "16", "17", "17", "17", "17", "18", "18", "18", 
    "18", "19", "19", "19", "19", "20", "20", "20", "20", "21", 
    "21", "21", "21", "22", "22", "22", "22", "23", "23", "23", 
    "23", "00"), min = c("15", "30", "45", "00", "15", "30", 
    "45", "00", "15", "30", "45", "00", "15", "30", "45", "00", 
    "15", "30", "45", "00", "15", "30", "45", "00", "15", "30", 
    "45", "00", "15", "30", "45", "00", "15", "30", "45", "00", 
    "15", "30", "45", "00"), day = c(28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 
    28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 28L, 29L)), .Names = c("time", 
"site", "val", "month", "hour", "min", "day"), class = "data.frame", row.names = 191430:191469)



 dt<- data.table(df)
    dt[, ':=' ('hsd' = sd(val)), by = list(site, hour, day)]

 head(dt, 10)
                   time site val month hour min day      hsd
 1: 2014-10-28 14:15:00 2108  38    10   14  15  28 1.732051
 2: 2014-10-28 14:30:00 2108  38    10   14  30  28 1.732051
 3: 2014-10-28 14:45:00 2108  35    10   14  45  28 1.732051
 4: 2014-10-28 15:00:00 2108  35    10   15  00  28 1.000000
 5: 2014-10-28 15:15:00 2108  35    10   15  15  28 1.000000
 6: 2014-10-28 15:30:00 2108  35    10   15  30  28 1.000000
 7: 2014-10-28 15:45:00 2108  37    10   15  45  28 1.000000

Is this the right way to compute moving average and how can I calculate it for more than one hour

Frank
  • 66,179
  • 8
  • 96
  • 180
Sasukethorpido
  • 193
  • 3
  • 9
  • I suspect data.table doesn't have the best tools for rollers. You can use it together with RcppRoll or zoo, though. Maybe I've misunderstood... showing your desired output would help clarify. – Frank Nov 05 '15 at 19:36
  • 1
    I'm wondering what you mean by calculate every 4 hours? You want means for 4 hour intervals? – IRTFM Nov 05 '15 at 19:43
  • yes that is what I need – Sasukethorpido Nov 05 '15 at 19:52
  • Take a look [here](http://stackoverflow.com/questions/32649913/calculating-sum-of-previous-3-rows-in-r-data-table-by-grid-square/) or many similar questions. You just need to divide by some `n` in order to convert sum to a mean. – David Arenburg Nov 05 '15 at 20:16
  • I don't understand why you're asking about moving averages but then computing a standard deviation. – Dean MacGregor Nov 10 '15 at 21:12

1 Answers1

2

You can do this with dplyr and zoo. Here I'm grouping by site even though your sample data only includes one site, because I'm guessing that your actual data includes many. I'm also assuming that you want values for overlapping intervals, not sequential ones.

library(zoo)
library(dplyr)

new.df <- df %>%
  group_by(site) %>%  # This only matters if your actual data have multiple sites
  mutate(moving.avg = rollmean(x = val, width = 16,  # 16 is 4 hours x 4 obs per hour
    align = "right", fill = NA))

If you only want the mean for sequential windows -- i.e., every four hours, or in this case every 16th observation within each group -- then use rollapply with the by and align = "right" options specified, i.e.:

mutate(moving.avg = rollapply(data = val, FUN = mean, width = 16, by = 16,
  align = "right", fill = NA)) 
ulfelder
  • 5,305
  • 1
  • 22
  • 40
  • This solution will compute incorrect values if some periods are missing in the data. Data has to be expanded first. – jangorecki Mar 04 '19 at 15:40