-1

My table looks (simplified) like this:

| countrycode | year | dummy | Value

| "AUS" | 2008 | 1 | -3

| "AUS" | 2009 | 0 | -2

| "BEL" | 2008 | 0 | -5

| "BEL" | 2009 | 1 | -1

| "BEL" | 2010 | 1 | -2

| "BEL" | 2011 | 1 | -1

| "CAN" | 2008 | 0 | -2

| "CAN" | 2009 | 0 | -5

| "CAN" | 2010 | 1 | 2

| "CAN" | 2011 | 0 | -4

And I would like to get the Mean of the Values grouped by countrycode, like that:

| countrycode | year | dummy | Value | Mean

| "AUS" | 2008 | 1 | -3 | -3

| "AUS" | 2009 | 0 | -2 | -3

| "BEL" | 2008 | 0 | -5 | -1,333

| "BEL" | 2009 | 1 | -1 | -1,333

| "BEL" | 2010 | 1 | -2 | -1,333

| "BEL" | 2011 | 1 | -1 | -1,333

| "CAN" | 2008 | 1 | -2 | -0,5

| "CAN" | 2009 | 0 | -5 | -0,5

| "CAN" | 2010 | 1 | 1 | -0,5

| "CAN" | 2011 | 0 | -4 | -0,5

My question is how can I create a new column for mean from another column that is filtered after a dummy variable?

user438383
  • 5,716
  • 8
  • 28
  • 43
Passiv
  • 1
  • 1
  • Please provide your data with `dput()`, because it makes easier to reproduce an example – Vinícius Félix Sep 15 '21 at 15:34
  • Does this answer your question? [Using dplyr to group\_by and conditionally mutate a dataframe by group](https://stackoverflow.com/questions/42980374/using-dplyr-to-group-by-and-conditionally-mutate-a-dataframe-by-group) – user438383 Sep 15 '21 at 15:42

2 Answers2

1

Perhaps this is what you are looking for. After grouping by countrycode, you can use mutate to create your new column. For each group, you can get the mean Value where dummy is 1. By using mutate you will have this added to all rows in the group, even if dummy is 0.

library(dplyr)

df %>% 
  group_by(countrycode) %>% 
  mutate(mean = mean(Value[dummy == 1]))

Output

   countrycode  year dummy Value  mean
   <chr>       <int> <int> <int> <dbl>
 1 AUS          2008     1    -3 -3   
 2 AUS          2009     0    -2 -3   
 3 BEL          2008     0    -5 -1.33
 4 BEL          2009     1    -1 -1.33
 5 BEL          2010     1    -2 -1.33
 6 BEL          2011     1    -1 -1.33
 7 CAN          2008     1    -2 -0.5 
 8 CAN          2009     0    -5 -0.5 
 9 CAN          2010     1     1 -0.5 
10 CAN          2011     0    -4 -0.5 

Data

df <- structure(list(countrycode = c("AUS", "AUS", "BEL", "BEL", "BEL", 
"BEL", "CAN", "CAN", "CAN", "CAN"), year = c(2008L, 2009L, 2008L, 
2009L, 2010L, 2011L, 2008L, 2009L, 2010L, 2011L), dummy = c(1L, 
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L), Value = c(-3L, -2L, -5L, 
-1L, -2L, -1L, -2L, -5L, 1L, -4L)), class = "data.frame", row.names = c(NA, 
-10L))
Ben
  • 28,684
  • 5
  • 23
  • 45
0
library(dplyr)

your_data_frame %>% 
  filter(dummy == 1) %>%
  group_by(countrycode) %>% 
  summarize(mean = mean(Value, na.rm = TRUE))
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32