0

I need to obtain the last 30 years mean value of TOTAL column month over month.

The dataset is avaible here:

library(dplyr)
    
ENSO <-read.table("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/detrend.nino34.ascii.txt", header = TRUE)
glimpse(ENSO)

For example for the sep-2021 I need to calculate:

 $$
 (TOTAL_{sep-2021} + 
 TOTAL_{sep-2020} +
 TOTAL_{sep-2019} +
 ...
 TOTAL_{sep-1991}) / 30 
 $$

I tried to use dplyr::mutate but I think that slider or zoo maybe can be helpful inside an condition because the time series begins in jan-1950 and obviously I wouldn't have the last 30 average values MoMs.

4 Answers4

0

You can either fix the current year to 2021 or get the higher year in the table.

Then, you only need to filter out years that are lower than this current year minus 30.

If you want to sugar-coat this, you can even use a custom name for your column.

Here is the code:

current_y = max(ENSO$YR)
col_name = paste0("total_mean_", current_y-30, "_to_", current_y)
ENSO %>% 
    filter(YR>current_y-30) %>% 
    group_by(MON) %>% 
    summarise(!!col_name:=mean(TOTAL))

# # A tibble: 12 x 2
#     MON total_mean_1991_to_2021
#   <int>                   <dbl>
# 1     1                    26.5
# 2     2                    26.7
# 3     3                    27.3
# 4     4                    27.8
# 5     5                    27.9
# 6     6                    27.7
# 7     7                    27.3
# 8     8                    26.8
# 9     9                    26.7
#10    10                    26.7
#11    11                    26.7
#12    12                    26.5
Dan Chaltiel
  • 7,811
  • 5
  • 47
  • 92
0

Here is a data.table approach.

functional:
1 - Split the table to a list of tables by month
2 - caculate the rolling mean of the last 30 months
3 - rowbind the monthly tables to a single table

library(data.table)
# Make ENSO a data.table, key by year and month
setDT(ENSO, key = c("YR", "MON"))
# Split by MON
L <- split(ENSO, by = "MON")
# Loop over L, create monthly mean over the last 30 entries
L <- lapply(L, function(x) {
  x[, MON30_avg := frollmean(TOTAL, n = 30)]
})
# Rowbind List together again
final <- rbindlist(L, use.names = TRUE, fill = TRUE)
Wimpel
  • 26,031
  • 1
  • 20
  • 37
0

Thank you Bloxx!

I can use the new variable by using

library(dplyr)
library(fpp3)

ENSO <- read.table("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/ensostuff/detrend.nino34.ascii.txt", header = TRUE) %>%
mutate(
Dates = paste(YR, "-", MON),
Dates = yearmonth(Dates),
Month_Year = paste(month.name[month(Dates)],"/", year(Dates)),
diff_total = difference(TOTAL),
ANOM = round( TOTAL - ClimAdjust, digits = 2),

 # TMA = TMA_{t-1} + TMA_{t} + TMA_{t+1}
TMA = round( slide_dbl(ANOM, mean, .before = 1, .after = 1), digits=2 ), 

# ´Climatic Condition`= if 5 last consecutives TMA > 0.5 then El Niño, otherwise if 5 last consecutives TMA < -0.5 then La Niña    

`Climatic Condition` = 
  lag( case_when(
         rollapplyr(TMA < -0.5, 5, all, fill = FALSE) ~ "La Niña", 
         rollapplyr(TMA >  0.5, 5, all, fill = FALSE) ~ "El Niño") ),
      
`3 months` = 
       
       case_when( 
         month(Dates) ==  1 ~ "DJF",
         month(Dates) ==  2 ~ "JFM",    
         month(Dates) ==  3 ~ "FMA",
         month(Dates) ==  4 ~ "MAM",
         month(Dates) ==  5 ~ "AMJ",
         month(Dates) ==  6 ~ "MJJ",
         month(Dates) ==  7 ~ "JJA",
         month(Dates) ==  8 ~ "JAS",
         month(Dates) ==  9 ~ "ASO",
         month(Dates) ==  10 ~ "SON",
         month(Dates) ==  11 ~ "OND",
         month(Dates) ==  12 ~ "NDJ" )

  ) %>% as_tsibble(index = Dates)

  ENSO <- ENSO %>% # To reorder the dtaaframe
  select(

  Dates,
  Month_Year,
  YR,
  MON,
  TOTAL,
  ClimAdjust,
  ANOM,
  TMA,
  `3 months`,
  `Climatic Condition`, 
   diff_total

   )


  ClimAdj <- ENSO %>% 
  group_by(MON) %>% 
  summarise(ClimAdj = mean(TOTAL) )

  ENSO <- left_join(ENSO, ClimAdj %>%
          select(Dates, ClimAdj), by = c("Dates" = "Dates"))   

  ENSO <- ENSO %>%
       select(
           -MON.y
              ) %>%
   rename(
           MON = "MON.x"
          ) 

      ENSO <- ENSO %>%
    select(

    Dates,
    Month_Year,
    YR,
    MON,
    TOTAL,
    #ClimAdjust,
    ClimAdj,
    ANOM,
    TMA,
    `3 months`,
    `Climatic Condition`, 
    diff_total

 )  

 glimpse(ENSO)
-1

Here is the updated code. You first arrange by year and month and then slice last 360 months (30 years)! Then group by month and then calculate mean:

ENSO %>% arrange(YR, MON) %>% slice_tail(n = 360) %>% group_by(MON) %>% summarise(mean(TOTAL))

Hope this is what you want. Each month has mean for the last 30 years.

Bloxx
  • 1,495
  • 1
  • 9
  • 21
  • This is missing the "last 30 years" part of the question. – r2evans Oct 14 '21 at 13:27
  • I edited the answer. – Bloxx Oct 14 '21 at 18:50
  • 2
    The assumption that you always have three perfect years of data is fragile and too-often unreliable in real data. A more robust approach should take into account missing data, so not hard-code row indexing. – r2evans Oct 14 '21 at 19:30
  • I agree, but in this case I am looking at the potential research question. E.g., What is the average temperature per month in last 30 years. Lets assume the month December would have 10 NAs, it would be wrong to take 30 data points, cos that would actually check the average from 1981-2021... Correct way would be to take only 20 data points, and in that way still look at the original time scale. – Bloxx Oct 14 '21 at 19:43
  • 1
    I understand. In my experience, hard-coding something along the lines of "360 for 30 years of data" has almost always broken when one or more assumptions are proven incorrect. (*"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so"*, often attributed to Mark Twain with no evidence.) It works here if/when the data is perfect, but my recommendations all lean towards more generic, defensive, extensible solutions. – r2evans Oct 14 '21 at 20:58