More efficient method of recoding one column in a data.frame conditional on other column entries

Question

I am looking for a more efficient method of recoding column entries in a data frame, where the recoding is conditional on the entries in other columns.

Take this simple example, which demonstrates my current procedure: create a new column for the recoded data, convert it to character, and then use square-bracket subsetting to recode the data (is there an official name for this procedure?).

# example data frame
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25))

# add a new column based on column y; convert to character
df$newY <- as.character(df$y)

# change newY entries to numbers based on conditions in other columns
df$newY[df$x == "W" & df$newY == "B"] <- 1
df$newY[df$x == "Z" & df$newY == "D"] <- 3

This procedure is fine for recoding variables with a small number of conditions, but it becomes cumbersome with a larger number of conditions or when there are many distinct variables to recode.

Could anyone help me find a more efficient way of doing this?

Thanks!
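One way to cut down the typing while keeping the same subset-assignment procedure is to put the conditions in a small table and loop over its rows, so each new condition is one more row rather than one more line of code. This is a minimal sketch, not taken from the thread; the `rules` table and its column names are hypothetical.

```r
# Example data from the question; stringsAsFactors = FALSE keeps x and y as character
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = FALSE)

# Hypothetical rules table: one row per condition (x value, y value, new code)
rules <- data.frame(x    = c("W", "Z"),
                    y    = c("B", "D"),
                    newY = c("1", "3"),
                    stringsAsFactors = FALSE)

# Start from a character copy of y, as in the question
df$newY <- as.character(df$y)

# Apply each rule with the same logical-index subset assignment as the question.
# Conditions are checked against the original y, not newY, so the output of one
# rule can never be picked up by a later rule.
for (i in seq_len(nrow(rules))) {
  hit <- df$x == rules$x[i] & df$y == rules$y[i]
  df$newY[hit] <- rules$newY[i]
}
```

Adding a condition then means adding a row to `rules`; the loop itself does not change.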

Is there some kind of logic or pattern in the recoding? By efficiency, do you mean a method that requires less typing or do you mean faster performance / memory efficiency? — talat, Feb 24 '16 at 10:17
Would something like this solve your problem: `df$newY = as.factor( paste0(df$y, df$x) ) ; as.numeric(df$newY)` — Raad, Feb 24 '16 at 10:21
@MaxPD In my data, recoding is conditional on one other column in the data frame (as in the example), but there are up to four variants of the same value that need to be converted to the same new code (e.g. imagine there were A1, A2, A3, etc. in the example above, all needing to be recoded to 1 when x is "W"). There are also 8 distinct values in the conditional column (the "x" column above) and up to 11 distinct values in the y column, meaning 8 blocks of ~11 lines of recoding. I hope that is clear. – user3237820, Feb 24 '16 at 10:29
@docendodiscimus By efficiency I just mean less typing, sorry. As for a pattern, it's difficult: for instance, a value such as A may have to be recoded as 1 when x is Z, but as 2 when x is W. Perhaps a better approach would be to reshape the data frame from long to wide format and recode each variable as a separate column... – user3237820, Feb 24 '16 at 10:31
Well, that's life. If there are that many conditions, so be it: you have to code them. – CuriousBeing, Feb 24 '16 at 10:32
@NBATrends Thank you, although this solution returns a column combining the two conditions (e.g. AW, AZ). For my example I need numeric codes (although I may be missing the versatility of your solution!). – user3237820, Feb 24 '16 at 10:34
@MaxPD Yep, that's what I was thinking! Thanks for your help. — user3237820, Feb 24 '16 at 10:35
The call to as.numeric in the second statement of my suggestion does that for you.
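The comments describe a many-to-one mapping (A1, A2, A3 all recoded to the same value) where the code also depends on x (e.g. A gets 1 under Z but 2 under W). A lookup table keyed on both columns handles that without one line per condition. This sketch is not from the thread; the data, the `lookup` table, and the `|` key separator are illustrative assumptions.

```r
# Hypothetical data: y has several variants (A1, A2, ...) that share a code
df <- data.frame(x = c("W", "W", "Z", "Z", "W"),
                 y = c("A1", "A2", "A1", "B1", "C1"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per (x, y) combination to recode.
# Several y variants map to the same code, and the code depends on x.
lookup <- data.frame(x    = c("W",  "W",  "W",  "Z",  "Z",  "Z"),
                     y    = c("A1", "A2", "A3", "A1", "A2", "A3"),
                     code = c(2,    2,    2,    1,    1,    1),
                     stringsAsFactors = FALSE)

# Build a combined key; the separator avoids ambiguous pastes
# such as "W" + "A1" versus "WA" + "1"
key_df     <- paste(df$x, df$y, sep = "|")
key_lookup <- paste(lookup$x, lookup$y, sep = "|")

# match() gives the position of the first matching key, or NA if none
df$newY <- lookup$code[match(key_df, key_lookup)]
df
```

Rows with no entry in the lookup table get NA. Base R's merge() is another way to do the same join, though by default it sorts the result by the key columns.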

score 1 · Accepted Answer · answered Feb 24 '16 at 10:48


Some approaches to this:

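df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = TRUE)  # R >= 4.0 defaults to FALSE; factors are needed below

# Take the product of the factor codes (my preference)
# Note: products are not unique here, e.g. W/B and Z/A both give 2
as.numeric(df$x) * as.numeric(df$y)

# Create new factor based on x and y and convert to numeric (one code per combination)
as.numeric(as.factor(paste0(df$x, df$y)))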
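A quick way to check whether the product approach above gives each x/y combination its own code is to count distinct values. This sketch assumes x and y are factors (as with stringsAsFactors = TRUE) and adds an offset-based variant for comparison; `offset_code` and the other variable names are hypothetical, not part of the original answer.

```r
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = TRUE)

# Product of the factor codes: x gives 1 or 2, y gives 1 to 4
prod_code <- as.numeric(df$x) * as.numeric(df$y)

# Distinct x/y combinations versus distinct product codes
n_combos <- nrow(unique(df[, c("x", "y")]))  # 8 combinations
n_prod   <- length(unique(prod_code))        # 6: W/B and Z/A give 2, W/D and Z/B give 4

# Collision-free alternative: shift y's code by x's position
offset_code <- (as.numeric(df$x) - 1) * nlevels(df$y) + as.numeric(df$y)
length(unique(offset_code))                  # 8
```

In this example the offset codes coincide with the paste0() codes, because the pasted levels sort as WA, WB, ..., ZD. Base R's interaction() can also build the combined factor, though its level order (and hence the numbers) may differ.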

Raad

Thanks! This helps a lot. I missed your second example in the comment to the OP. – user3237820 Feb 24 '16 at 10:56
