I am trying to find the best way to recode the values in a column when a name is associated with more than one race.
I have been working with a dataframe like this:
df <- data.frame('Name' = c("Jon", "Jon", "Bobby", "Sarah", "Fred"),
                 'Race' = c("Black", "White", "Asian", "Asian", "Black"))
What I am trying to do is find every name that appears with more than one race, recode its race as "Multiracial", and keep a single row per name.
The end goal is to construct a dataframe like the one below:
df1 <- data.frame('Name' = c("Jon", "Bobby", "Sarah", "Fred"),
                  'Race' = c("Multiracial", "Asian", "Asian", "Black"))
My current approach is to count the distinct races recorded for each name, keep only the names with more than one, and then overwrite the race for those names with a multiracial label. Code shown below:
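library(dplyr)  # provides %>% and n_distinct()

# Drop exact duplicate Name/Race rows
df1 <- unique(df[, c('Name', 'Race')])
# Count the distinct races recorded for each name
multi_answer <- df1 %>%
  dplyr::group_by(Name) %>%
  dplyr::summarise(n_answers = dplyr::n_distinct(Race))
# Keep only the names with more than one race
multi_answer <- multi_answer[multi_answer$n_answers > 1, ]
# Recode those names, then collapse the now-identical rows
df1[df1$Name %in% multi_answer$Name, 'Race'] <- 'Multiracial'
df1 <- unique(df1)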
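For comparison, the same recoding can probably be written as one grouped pipeline. This is only a minimal sketch, assuming dplyr is installed. The name `result` is introduced here for illustration, and `stringsAsFactors = FALSE` is an added assumption so that the Race column is character on older R versions too.

```r
library(dplyr)

# Same example data as above; stringsAsFactors = FALSE is an assumption
# added here so Race is a character column on R < 4.0 as well.
df <- data.frame(
  Name = c("Jon", "Jon", "Bobby", "Sarah", "Fred"),
  Race = c("Black", "White", "Asian", "Asian", "Black"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Name) %>%
  # Within each name: more than one distinct race becomes "Multiracial",
  # otherwise the existing value is kept unchanged.
  mutate(Race = if (n_distinct(Race) > 1) "Multiracial" else Race) %>%
  ungroup() %>%
  # Collapse the rows that are now identical, keeping first-seen order.
  distinct()

print(result)
```

Since `distinct()` keeps rows in their original order, `result` should list the names in the same order as the desired dataframe (Jon, Bobby, Sarah, Fred), although it comes back as a tibble rather than a plain data.frame.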