Creating a new column in a dataframe based on the answer choices in the other columns

Question

I'm a bit confused about how to populate a new column based on the combinations of values in my other columns.

Here is my original dataframe:

# Note: data.frame() applies make.names() (check.names = TRUE), so the column 'NA' is stored as 'NA.'
df <- data.frame('Hispanic'=c("N", "Y", "N", "N"), 'Black'=c("Y", "N", "N", "Null"), 'Asian'=c("N", "Y", "N", "N"), 
                  'HN'=c("N", "N", "N", "N"), 'AN'=c("N", "N", "N", "Y"), 'White'=c("N", "Y", "N", "Null"), 
                  'NA'=c("N", "N", "Y", "Y"))

I want to fill the new column based on different combinations of race and ethnicity. Specifically, I'm trying to group these answers into the categories Black (Non-Hispanic), Asian (Non-Hispanic), Native Hawaiian (Non-Hispanic), American Indian/Alaska Native (Non-Hispanic), Multiracial (Non-Hispanic) and Hispanic. Whenever a record has Hispanic = Y, the new value should simply be Hispanic. If Hispanic = N, the value should name the single race selected followed by Non-Hispanic (e.g., Black, NH), or, if more than one race was selected, Multiracial and Non-Hispanic (e.g., Multiracial, NH).
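The rule above can also be sketched in base R without reshaping: count the "Y" answers across the race columns per row and check Hispanic first. This is a minimal sketch, not taken from the answer below; the helper name race_ethnicity is hypothetical, it assumes every cell holds "Y", "N" or "Null" (only "Y" counts as selected), and it labels a single race by its raw column name. Rows with no race selected get NA.

```r
# Hypothetical helper (not from the original post): label each row by the rule above.
# Assumes cells hold "Y", "N" or "Null"; only "Y" counts as a selected answer.
race_ethnicity <- function(df) {
  race_cols <- setdiff(names(df), "Hispanic")
  selected  <- df[race_cols] == "Y"          # logical matrix, one column per race
  n_races   <- rowSums(selected)             # number of races selected in each row
  # Column name of the first selected race in each row (NA when none is selected)
  first_race <- race_cols[apply(selected, 1, function(r) which(r)[1])]
  ifelse(df$Hispanic == "Y", "Hispanic",
         ifelse(n_races > 1, "Multiracial, NH",
                ifelse(n_races == 1, paste0(first_race, ", NH"), NA_character_)))
}

df <- data.frame('Hispanic'=c("N", "Y", "N", "N"), 'Black'=c("Y", "N", "N", "Null"), 'Asian'=c("N", "Y", "N", "N"),
                 'HN'=c("N", "N", "N", "N"), 'AN'=c("N", "N", "N", "Y"), 'White'=c("N", "Y", "N", "Null"),
                 'NA'=c("N", "N", "Y", "Y"))
df$R_E <- race_ethnicity(df)
df$R_E
```

For the question's data this gives "Black, NH", "Hispanic", "NA., NH" and "Multiracial, NH". The third label is "NA., NH" because data.frame() renames the 'NA' column to 'NA.'; a lookup from column names to display labels would be needed to get "Native American, NH".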

The goal is to get something that looks like the results below:

df1 <- data.frame('Hispanic'=c("N", "Y", "N", "N"), 'Black'=c("Y", "N", "N", "Null"), 'Asian'=c("N", "Y", "N", "N"), 
                  'HN'=c("N", "N", "N", "N"), 'AN'=c("N", "N", "N", "Y"), 'White'=c("N", "Y", "N", "Null"), 
                  'NA'=c("N", "N", "Y", "Y"), 
                  'R_E'=c("Black, NH", "Hispanic", "Native American, NH", "Multi-racial, NH" ))

Row 2 is Y on Hispanic, Asian and White. That should be coded as Hispanic, is that correct? – Chamkrai, Apr 28 '22 at 21:36

Onyambu · Accepted Answer · 2022-04-29T15:39:41.433


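library(dplyr)   # group_by(), mutate(), summarise(), left_join()
library(tidyr)   # pivot_longer()
library(tibble)  # rowid_to_column()

df %>%
  rowid_to_column() %>%
  left_join(pivot_longer(., -rowid) %>%      # one row per (rowid, column)
    group_by(rowid) %>%
    mutate(value = value == 'Y') %>%         # "N" and "Null" both become FALSE
    summarise(value = if (any(name == 'Hispanic' & value))
      'Hispanic' else paste(if (sum(value) > 1)
      'multiracial' else name[value], 'NH')),
    by = 'rowid')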

  rowid Hispanic Black Asian HN AN White NA.          value
1     1        N     Y     N  N  N     N   N       Black NH
2     2        Y     N     Y  N  N     Y   N       Hispanic
3     3        N     N     N  N  N     N   Y         NA. NH
4     4        N  Null     N  N  Y  Null   Y multiracial NH

edited Apr 29 '22 at 15:39

answered Apr 28 '22 at 21:48

Onyambu


This is close to what I'm trying to do, but row 2 is reported as "multiracial NH" because Hispanic is included in the count. Whenever the Hispanic column is Y, the value should always be Hispanic. – snailwhale Apr 29 '22 at 15:26
@snailwhale you never stated that in your original post – Onyambu Apr 29 '22 at 15:27
@snailwhale check the edit now – Onyambu Apr 29 '22 at 15:38
Apologies! I assumed that describing the expected race/ethnicity categories and showing the desired results in df1 was enough detail. – snailwhale Apr 29 '22 at 15:39
Thank you! This was my first post, and this was very informative! – snailwhale Apr 29 '22 at 15:54
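The accepted answer prints "Black NH" and "NA. NH", while df1 expects "Black, NH" and "Native American, NH". The "NA." label comes from data.frame(), which (with check.names = TRUE) passes names through make.names() and turns the reserved word NA into NA.. A possible follow-up is sketched here: it keeps the answer's pivot-and-summarise approach but maps column names through a lookup vector and uses ", NH" as the suffix. The race_labels vector is hypothetical; only "Native American" comes from df1, and the HN and AN labels are guesses based on the categories listed in the question.

```r
library(dplyr)   # group_by(), summarise(), left_join()
library(tidyr)   # pivot_longer()
library(tibble)  # rowid_to_column()

df <- data.frame('Hispanic'=c("N", "Y", "N", "N"), 'Black'=c("Y", "N", "N", "Null"), 'Asian'=c("N", "Y", "N", "N"),
                 'HN'=c("N", "N", "N", "N"), 'AN'=c("N", "N", "N", "Y"), 'White'=c("N", "Y", "N", "Null"),
                 'NA'=c("N", "N", "Y", "Y"))

# Hypothetical display label per column (note the 'NA' column is stored as 'NA.').
# "Native American" is taken from df1; the HN and AN labels are assumptions.
race_labels <- c(Black = "Black", Asian = "Asian", HN = "Native Hawaiian",
                 AN = "American Indian/Alaska Native", White = "White",
                 "NA." = "Native American")

out <- df %>%
  rowid_to_column() %>%
  left_join(
    pivot_longer(., -rowid) %>%               # one row per (rowid, column)
      group_by(rowid) %>%
      summarise(R_E = {
        yes <- value == "Y"                   # "N" and "Null" count as not selected
        if (any(name == "Hispanic" & yes)) {
          "Hispanic"                          # Hispanic = Y always wins
        } else if (sum(yes) > 1) {
          "Multi-racial, NH"                  # spelling as in df1
        } else if (sum(yes) == 1) {
          paste0(race_labels[[name[yes]]], ", NH")
        } else {
          NA_character_                       # no race selected
        }
      }),
    by = "rowid")

out$R_E
```

With the question's data, out$R_E should match the R_E column of df1. Spelling the join key out with by = "rowid" also avoids relying on dplyr guessing the common column.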
