I am working with survey data that has a question about race. Each race category is its own variable. Here is what I want to do:
- Create a new variable, `p.race`.
- Assign `p.race` the value of one of the eight race/ethnicity categories (below).
- Determine whether an individual marked two or more races and assign `p.race` the value "Two or more races" in such cases.
- Assign `p.race` the value "Hispanic or Latino" when they indicated this ethnicity.
- Create a new variable, `p.poc`, to indicate whether they are a person of color (i.e., not white, including Hispanic/Latino). It should be 0 or 1.
The eight race categories are white*, black*, Asian*, AIAN*, NHPI*, some other race*, two or more races*, and Hispanic; where * denotes not Hispanic or Latino ethnicity.
Here is what I tried so far for parsing out "Two or more races":
p['p.race'] <- NA # create new variable for race
# list of variable names that store a string indicating the race
## e.g., `race_white` would be either blank or contain "White, European, Middle Eastern, or Caucasian"
race.list <- c('p.race_white', 'p.race_black', 'p.race_asian', 'p.race_aian', 'p.race_nhpi', 'p.race_other')
# iterate through each record
for ( n in 1:length(p) ) {
  multiflag = 0
  # iterate through the race list
  for ( i in race.list ) {
    # if it is not blank, +1 to multiflag
    if ( p$i[n] != '' ) {
      multiflag <- multiflag + 1
    }
  }
  # if multiflag was flagged more than once, assign "Two or more races" to `race`
  if ( multiflag > 1 ) {
    p$p.race[n] <- 'Two or more races'
  }
}
When executed, it returns this error:
> Error in if (p$i[n] != "") { : argument is of length zero
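A minimal sketch of what this error seems to point to, using a made-up data frame `d` (hypothetical, not the survey data): `$` does not evaluate the loop variable, so `p$i` looks for a column literally named `i` and returns `NULL`, whereas `[[` uses the value stored in `i`. Separately, `length()` on a data frame counts columns, so `nrow()` is probably what the outer loop needs.

```r
# Toy data frame (hypothetical values, not the survey data)
d <- data.frame(race_white = c("White", "", "White"),
                race_asian = c("", "Asian", "Asian"),
                stringsAsFactors = FALSE)

i <- "race_white"

by.dollar  <- d$i     # NULL: `$` looks for a column literally named "i"
by.bracket <- d[[i]]  # the column whose name is stored in i

# Comparing an element of NULL gives a zero-length logical,
# which is what if() rejects with "argument is of length zero"
cmp <- d$i[1] != ""
length(cmp)

# length() counts columns, nrow() counts records
n.cols <- length(d)
n.rows <- nrow(d)
```

With `p[[i]][n]` in place of `p$i[n]`, the comparison has length one, although it can still be `NA` for columns that hold missing values.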
And here is my `p.poc` variable coding, with the error below:
p['p.poc'] <- 0 # create a new variable for whether they are a person of color
for ( n in 1:length(p) ) {
  if ( p$p.race_black[n] == 'Black, African-American, or African'
     | p$p.race_asian[n] == 'Asian or Asian-American'
     | p$p.race_aian[n] == 'American Indian or Alaskan Native'
     | p$p.race_nhpi[n] == 'Native Hawaiian or other Pacific Islander'
     | p$p.race_other[n] == 'Other (please specify)'
     | p$p.hispanic[n] == 'Yes') {
    p$p.poc[n] <- 1
  }
}
> Error in if (p$p.race_black[n] == "Black, African-American, or African" | :
missing value where TRUE/FALSE needed
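A small sketch of where this message can come from, on a made-up vector `x` (hypothetical values): comparing `NA` with `==` yields `NA`, and `FALSE | NA` is also `NA`, so `if()` has nothing to branch on. In the `dput` output below, `p.race_nhpi` is entirely `NA`, which would trigger this for any record where no other condition is `TRUE`. One option, shown here as an assumption rather than the only fix, is `%in%`, which returns `FALSE` for `NA`.

```r
# Toy vector (made-up values): one match, one NA, one blank
x <- c("Asian or Asian-American", NA, "")

eq   <- x == "Asian or Asian-American"    # NA stays NA
isin <- x %in% "Asian or Asian-American"  # NA becomes FALSE

# if (eq[2]) ... would stop with "missing value where TRUE/FALSE needed"
# A vectorised 0/1 flag needs neither a loop nor if():
flag <- as.integer(isin)
```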
I don't really know where to start for assigning the new `race` variable one of the eight race categories without writing very long code.
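One possible vectorised sketch, under stated assumptions: the toy data frame reuses the question's column names with made-up values, the short `labels` are hypothetical, "Hispanic or Latino" is assumed to take precedence over any race answer (since the other categories are defined as not Hispanic), and a record with no answers gets `NA` for `p.race` and 0 for `p.poc`.

```r
# Toy data with the question's column names (values are made up)
w <- "White, European, Middle Eastern, or Caucasian"
p <- data.frame(
  p.hispanic   = c("Yes", "No", "No", "No", ""),
  p.race_white = c(w, w, "", w, ""),
  p.race_black = c("", "", "Black, African-American, or African", "", ""),
  p.race_asian = c("", "", "", "Asian or Asian-American", ""),
  p.race_nhpi  = NA,  # an all-NA column, as in the dput output
  stringsAsFactors = FALSE
)

race.list <- c("p.race_white", "p.race_black", "p.race_asian", "p.race_nhpi")
labels    <- c("White", "Black", "Asian", "NHPI")  # hypothetical short labels

# TRUE where a race box was ticked; NA and blank both count as not ticked
marked   <- sapply(p[race.list], function(x) !is.na(x) & x != "")
n.marked <- rowSums(marked)

# Label of the first ticked race per record (NA when none is ticked)
single <- labels[apply(marked, 1, function(r) which(r)[1])]

hisp <- p$p.hispanic %in% "Yes"
p$p.race <- ifelse(hisp, "Hispanic or Latino",
            ifelse(n.marked > 1, "Two or more races",
            ifelse(n.marked == 1, single, NA_character_)))

# Person of color: Hispanic, or any non-white race ticked
nonwhite <- rowSums(marked[, race.list != "p.race_white", drop = FALSE]) > 0
p$p.poc  <- as.integer(hisp | nonwhite)
```

Extending `race.list` and `labels` to the remaining columns (`p.race_aian`, `p.race_other`) should cover all eight categories without per-record loops.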
If it helps, below are the survey questions:
Q1. Do you consider yourself of Hispanic, Latino, or Spanish origin?
- Yes
- No
Q2. Which race do you identify with (check all that apply)?
- White, European, Middle Eastern, or Caucasian
- Black, African-American, or African
- Asian or Asian-American
- American Indian or Alaskan Native
- Native Hawaiian or other Pacific Islander
- Other (please specify)
And here is the sample output (text truncated):
> p[264:271]
#   p.hispanic  p.race_white  p.race_black  p.race_asian  p.race_aian  p.race_nhpi  p.race_other
# 1 Yes         White
# 2 No          White
# 3 No                        Black
# 4 No          White                       Asian
# 5 Yes                                                                             Some other race
And here is the `dput` output:
> dput(p[264:270])
structure(list(p.hispanic = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "No", "Yes"
), class = "factor"), p.race_white = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("",
"White, European, Middle Eastern, or Caucasian"), class = "factor"),
p.race_black = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"Black, African-American, or African"), class = "factor"),
p.race_asian = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("",
"Asian or Asian-American"), class = "factor"), p.race_aian = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "American Indian or Alaskan Native"
), class = "factor"), p.race_nhpi = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
p.race_other = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"Other (please specify)"), class = "factor")), .Names = c("p.hispanic",
"p.race_white", "p.race_black", "p.race_asian", "p.race_aian",
"p.race_nhpi", "p.race_other"), class = "data.frame", row.names = c(NA,
-79L))