0

I am new to coding. I have a data set of daily stream flow averages over 20 years. Following is an example:

          DATE   FLOW
1    10/1/2001   88.2
2    10/2/2001   77.6
3    10/3/2001   68.4
4    10/4/2001   61.5
5    10/5/2001   55.3
6    10/6/2001   52.5
7    10/7/2001   49.7
8    10/8/2001   46.7
9    10/9/2001   43.3
10  10/10/2001   41.3
11  10/11/2001   39.3
12  10/12/2001   37.7
13  10/13/2001   35.8
14  10/14/2001   34.1
15  10/15/2001   39.8

I need to create a loop summing the previous 6 days as well as the current day (rolling weekly average), and print it to an array for the designated water year. I have already created an aggregate function to separate yearly average daily means into their designated water years.

# Separating dates into specific water years

wtr_yr <- function(dates, start_month=9)
  # Convert dates into POSIXlt
  POSIDATE = as.POSIXlt(NEW_DATE)
  # Year offset
  offset = ifelse(POSIDATE$mon >= start_month - 1, 1, 0)
  # Water year
  adj.year = POSIDATE$year + 1900 + offset
  
# Aggregating the water year function to take the mean
  
mean.FLOW=aggregate(data_set$FLOW,list(adj.year), mean)
camille
  • 16,432
  • 18
  • 38
  • 60

1 Answers1

0

It seems that it can be done much more easily. But first I need to prepare a bit more data.

library(tidyverse)
library(lubridate)

df = tibble(
  DATE = seq(mdy("1/1/2010"), mdy("12/31/2022"), 1),
  FLOW = rnorm(length(DATE), 40, 10)
) 

output

# A tibble: 4,748 x 2
   DATE        FLOW
   <date>     <dbl>
 1 2010-01-01  34.4
 2 2010-01-02  37.7
 3 2010-01-03  55.6
 4 2010-01-04  40.7
 5 2010-01-05  41.3
 6 2010-01-06  57.2
 7 2010-01-07  44.6
 8 2010-01-08  27.3
 9 2010-01-09  33.1
10 2010-01-10  35.5
# ... with 4,738 more rows

Now let's do the aggregation by year and week number

df %>% 
  group_by(year(DATE), week(DATE)) %>% 
  summarise(mean = mean(FLOW))

output

# A tibble: 689 x 3
# Groups:   year(DATE) [13]
   `year(DATE)` `week(DATE)`  mean
          <dbl>        <dbl> <dbl>
 1         2010            1  44.5
 2         2010            2  39.6
 3         2010            3  38.5
 4         2010            4  35.3
 5         2010            5  44.1
 6         2010            6  39.4
 7         2010            7  41.3
 8         2010            8  43.9
 9         2010            9  38.5
10         2010           10  42.4
# ... with 679 more rows

Note, for the function week, the first week starts on January 1st. If you want to number the weeks according to the ISO 8601 standard, use the isoweek function. Alternatively, you can also use an epiweek compatible with the US CDC.

df %>% 
  group_by(year(DATE), isoweek(DATE)) %>% 
  summarise(mean = mean(FLOW))

output

# A tibble: 681 x 3
# Groups:   year(DATE) [13]
   `year(DATE)` `isoweek(DATE)`  mean
          <dbl>           <dbl> <dbl>
 1         2010               1  40.0
 2         2010               2  45.5
 3         2010               3  33.2
 4         2010               4  38.9
 5         2010               5  45.0
 6         2010               6  40.7
 7         2010               7  38.5
 8         2010               8  42.5
 9         2010               9  37.1
10         2010              10  42.4
# ... with 671 more rows

If you want to better understand how these functions work, please follow the code below

df %>% 
  mutate(
    w1 = week(DATE),
    w2 = isoweek(DATE),
    w3 = epiweek(DATE)
  )

output

# A tibble: 4,748 x 5
   DATE        FLOW    w1    w2    w3
   <date>     <dbl> <dbl> <dbl> <dbl>
 1 2010-01-01  34.4     1    53    52
 2 2010-01-02  37.7     1    53    52
 3 2010-01-03  55.6     1    53     1
 4 2010-01-04  40.7     1     1     1
 5 2010-01-05  41.3     1     1     1
 6 2010-01-06  57.2     1     1     1
 7 2010-01-07  44.6     1     1     1
 8 2010-01-08  27.3     2     1     1
 9 2010-01-09  33.1     2     1     1
10 2010-01-10  35.5     2     1     2
# ... with 4,738 more rows
Marek Fiołka
  • 4,825
  • 1
  • 5
  • 20