Groupby, filter, summarise and then apply the result to the whole column

Question

I am working with R and I have a series x at quarterly frequency and for which I want to extract the mean over the four quarters in 2012 and store that value in all rows of a newly created column. I have this kind of dataset

date durabl services 
2011-10-01 56.7 37.1
2012-01-01 68.1 90.6
2012-04-01 34.1 29.1
2012-07-01 22.56 34.12
2012-10-01 44.89 66.8

And I want to get to this

date durabl services base_durabl base_services
2011-10-01 56.7 37.1 42.4125 55.155
2012-01-01 68.1 90.6 42.4125 55.155
2012-04-01 34.1 29.1 42.4125 55.155
2012-07-01 22.56 34.12 42.4125 55.155
2012-10-01 44.89 66.8 42.4125 55.155

I tried using dplyr

df %>% group_by(year(date)) %>% filter(year(date)==2012) %>% mutate(base_durabl= mean(durabl),base_services=mean(services))

or with summarise but of course I just get a smaller tibble.

Why do you want the mean for 2012 also assigned to the row 1? — langtang, Oct 30 '22 at 14:42
I basically want that mean to rebase the whole durabl and services series, meaning I want to divide the second and third column by the fourth and fifth, respectively. And I would need to do that in two steps, so first store the means of 2012 in new columns, then rebase the series. — Fef894, Oct 30 '22 at 14:45

score 2 · Accepted Answer · answered Oct 30 '22 at 14:44

You can simply index the rows of services and durabl for which you want the mean estimated:

df %>% 
  mutate(base_durabl = mean(durabl[year(date)==2012]),
         base_services = mean(services[year(date)==2012]))

Output:

         date durabl services base_durabl base_services
1: 2011-10-01  56.70    37.10     42.4125        55.155
2: 2012-01-01  68.10    90.60     42.4125        55.155
3: 2012-04-01  34.10    29.10     42.4125        55.155
4: 2012-07-01  22.56    34.12     42.4125        55.155
5: 2012-10-01  44.89    66.80     42.4125        55.155

That makes sense, I don’t know why I wanted to use group_by haha. Thanks!! — Fef894, Oct 30 '22 at 14:50

score 2 · Answer 2 · answered Oct 30 '22 at 15:22

Here is an alternative way to @langtang's comprehensibly answer:

library(dplyr)

df %>% 
  mutate(across(-date, ~mean(.[grep("2012", date)], na.rm=TRUE), .names = "base_{col}"))

        date durabl services base_durabl base_services
1 2011-10-01  56.70    37.10     42.4125        55.155
2 2012-01-01  68.10    90.60     42.4125        55.155
3 2012-04-01  34.10    29.10     42.4125        55.155
4 2012-07-01  22.56    34.12     42.4125        55.155
5 2012-10-01  44.89    66.80     42.4125        55.155

score 2 · Answer 3 · answered Oct 30 '22 at 16:08

Using base R

df[paste0("base_", names(df)[-1])] <- as.list(colMeans(subset(df, select = -date,
                 format(as.Date(date), "%Y") == "2012")))

-output

> df
        date durabl services base_durabl base_services
1 2011-10-01  56.70    37.10     42.4125        55.155
2 2012-01-01  68.10    90.60     42.4125        55.155
3 2012-04-01  34.10    29.10     42.4125        55.155
4 2012-07-01  22.56    34.12     42.4125        55.155
5 2012-10-01  44.89    66.80     42.4125        55.155

data

df <- structure(list(date = c("2011-10-01", "2012-01-01", "2012-04-01", 
"2012-07-01", "2012-10-01"), durabl = c(56.7, 68.1, 34.1, 22.56, 
44.89), services = c(37.1, 90.6, 29.1, 34.12, 66.8)), 
class = "data.frame", row.names = c(NA, 
-5L))

Groupby, filter, summarise and then apply the result to the whole column

3 Answers3

data