Create column for mean of another column, filtered after a dummy variable

Question

My table looks (simplified) like this:

| countrycode | year | dummy | Value

| "AUS" | 2008 | 1 | -3

| "AUS" | 2009 | 0 | -2

| "BEL" | 2008 | 0 | -5

| "BEL" | 2009 | 1 | -1

| "BEL" | 2010 | 1 | -2

| "BEL" | 2011 | 1 | -1

| "CAN" | 2008 | 0 | -2

| "CAN" | 2009 | 0 | -5

| "CAN" | 2010 | 1 | 2

| "CAN" | 2011 | 0 | -4

And I would like to get the Mean of the Values grouped by countrycode, like that:

| countrycode | year | dummy | Value | Mean

| "AUS" | 2008 | 1 | -3 | -3

| "AUS" | 2009 | 0 | -2 | -3

| "BEL" | 2008 | 0 | -5 | -1,333

| "BEL" | 2009 | 1 | -1 | -1,333

| "BEL" | 2010 | 1 | -2 | -1,333

| "BEL" | 2011 | 1 | -1 | -1,333

| "CAN" | 2008 | 1 | -2 | -0,5

| "CAN" | 2009 | 0 | -5 | -0,5

| "CAN" | 2010 | 1 | 1 | -0,5

| "CAN" | 2011 | 0 | -4 | -0,5

My question is how can I create a new column for mean from another column that is filtered after a dummy variable?

Please provide your data with `dput()`, because it makes easier to reproduce an example — Vinícius Félix, Sep 15 '21 at 15:34
Does this answer your question? [Using dplyr to group\_by and conditionally mutate a dataframe by group](https://stackoverflow.com/questions/42980374/using-dplyr-to-group-by-and-conditionally-mutate-a-dataframe-by-group) — user438383, Sep 15 '21 at 15:42

Ben · Answer 1 · 2021-09-16T01:48:59.147

Perhaps this is what you are looking for. After grouping by countrycode, you can use mutate to create your new column. For each group, you can get the mean Value where dummy is 1. By using mutate you will have this added to all rows in the group, even if dummy is 0.

library(dplyr)

df %>% 
  group_by(countrycode) %>% 
  mutate(mean = mean(Value[dummy == 1]))

Output

   countrycode  year dummy Value  mean
   <chr>       <int> <int> <int> <dbl>
 1 AUS          2008     1    -3 -3   
 2 AUS          2009     0    -2 -3   
 3 BEL          2008     0    -5 -1.33
 4 BEL          2009     1    -1 -1.33
 5 BEL          2010     1    -2 -1.33
 6 BEL          2011     1    -1 -1.33
 7 CAN          2008     1    -2 -0.5 
 8 CAN          2009     0    -5 -0.5 
 9 CAN          2010     1     1 -0.5 
10 CAN          2011     0    -4 -0.5

Data

df <- structure(list(countrycode = c("AUS", "AUS", "BEL", "BEL", "BEL", 
"BEL", "CAN", "CAN", "CAN", "CAN"), year = c(2008L, 2009L, 2008L, 
2009L, 2010L, 2011L, 2008L, 2009L, 2010L, 2011L), dummy = c(1L, 
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L), Value = c(-3L, -2L, -5L, 
-1L, -2L, -1L, -2L, -5L, 1L, -4L)), class = "data.frame", row.names = c(NA, 
-10L))

Vinícius Félix · Answer 2 · 2021-09-15T20:46:34.677

0

library(dplyr)

your_data_frame %>% 
  filter(dummy == 1) %>%
  group_by(countrycode) %>% 
  summarize(mean = mean(Value, na.rm = TRUE))

edited Sep 15 '21 at 20:46

answered Sep 15 '21 at 15:34

Vinícius Félix

8,448
6
16
32

How do I add that I only want to use it on values on the same row with dummy=1? – Passiv Sep 15 '21 at 16:45
Updated my answer, using `filter` – Vinícius Félix Sep 15 '21 at 19:25
doing this, I'll get the mean of all values together with dummy ==1, but I want for every country an unique value (being the mean of their values). Thanks for helping btw. – Passiv Sep 15 '21 at 20:45
Ok, `mutate` keeps the data dimensions, to summarise you can use `summarize`/`summarise` instead – Vinícius Félix Sep 15 '21 at 20:47
With summarize I'll still get only one value (mean of all variables with dummy==1). – Passiv Sep 15 '21 at 21:04
You want the mean of each country code, only for dummy = 1, correct? – Vinícius Félix Sep 15 '21 at 21:05

Create column for mean of another column, filtered after a dummy variable

2 Answers2