I am looking for a more efficient way of recoding column entries in a data frame, where the recoding is conditional on the entries in other columns.
Take this simple example, which demonstrates my current procedure: create a new column for the recoded data, convert it to character, and then recode the data using square-bracket subsetting (is there an official name for this procedure?).
```r
## example data frame
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25))

# add a new column based on column y; convert to character
df$newY <- as.character(df$y)

# recode newY entries based on conditions in other columns
# (newY is character, so 1 and 3 are stored as the strings "1" and "3")
df$newY[df$x == "W" & df$newY == "B"] <- 1
df$newY[df$x == "Z" & df$newY == "D"] <- 3
```
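For reference, the bracket form in the last two lines is what the R Language Definition calls subset assignment: `x[i] <- value` is evaluated by calling the replacement function `` `[<-` ``. The minimal sketch below (the vector `v` and the names `v1`, `v2` are illustrative only) shows the two spellings giving the same result on a character vector.

```r
v <- c("A", "B", "C", "D")
cond <- v == "B"

# Ordinary square-bracket subset assignment, as in the example above
v1 <- v
v1[cond] <- 1

# The same operation written as an explicit call to the replacement function
v2 <- `[<-`(v, cond, 1)

identical(v1, v2)  # TRUE; both are c("A", "1", "C", "D")
```

Because `v` is a character vector, the numeric `1` is coerced to `"1"` in both cases.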
This procedure is fine for recoding variables with a small number of conditions, but it becomes cumbersome with a larger number of conditions or when there are many distinct variables to recode.
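One way to keep the conditions as data rather than as repeated lines of code is a lookup (rule) table. The sketch below is one possible base-R approach, not the only one; the `rules` data frame, the `key`/`rkey`/`hit` names and the `"_"` key separator are illustrative assumptions (the separator must not occur inside the values). Rules are matched against the original `x` and `y` with `match()`, so, unlike the sequential bracket assignments, a later rule never sees an earlier recode; for this example both give the same result.

```r
# Example data, as in the question (stringsAsFactors set explicitly for older R)
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = FALSE)

# Hypothetical rule table: one row per condition, so adding a condition
# means adding a row instead of another subsetting line
rules <- data.frame(x    = c("W", "Z"),
                    y    = c("B", "D"),
                    newY = c("1", "3"),
                    stringsAsFactors = FALSE)

# Combined keys for every data row and every rule
key  <- paste(df$x, df$y, sep = "_")
rkey <- paste(rules$x, rules$y, sep = "_")
hit  <- match(key, rkey)   # index of the matching rule, NA where none applies

# Start from the original values and overwrite only the rows that matched
df$newY <- as.character(df$y)
df$newY[!is.na(hit)] <- rules$newY[hit[!is.na(hit)]]
```

Recoding another variable then only needs a different rule table, not new subsetting code.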
Could anyone help me find a more efficient way of doing this?
Thanks!