R. Conditional replace of characters in data frame if two columns match

Question

I have a data frame with columns c1 to c11, which looks like this:

c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11  
G A G 1 1 0 1 1 0 0 1
T C T 0 0 1 1 0 1 0 1
C C T 0 1 1 1 1 1 1 0

I would like to do the following: if the character in c1 is the same as the one in c3, replace, in columns c4 to c11, 1s with 2s and 0s with 3s. Otherwise, replace 1s with 3s and 0s with 2s.

At the end I would get this data frame:

c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11  
G A G 2 2 3 2 2 3 3 2
T C T 3 3 2 2 3 2 3 2
C C T 2 3 3 3 3 3 3 2

G. Grothendieck · Answer 1 · 2018-01-12T20:11:53.650

1) Converting x = 0, 1 to y = 3, 2 is the same as subtracting x from 3. Also converting x = 0, 1 to y = 2, 3 is the same as adding 2 to x. Thus:

DF[4:11] <- with(DF, (c1 == c3) * (3 - DF[4:11]) + (c1 != c3) * (DF[4:11] + 2))

giving:

> DF
  c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
1  G  A  G  2  2  3  2  2  3   3   2
2  T  C  T  3  3  2  2  3  2   3   2
3  C  C  T  2  3  3  3  3  3   3   2
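
To see why the expression in 1) works, it may help to try it on bare vectors first. This is only an illustrative sketch: `x` and `same` are made-up stand-ins for one 0/1 column and for the result of `c1 == c3`, not part of the original data.

```r
x    <- c(1, 0, 1, 0)                # hypothetical 0/1 values from one column
same <- c(TRUE, TRUE, FALSE, FALSE)  # hypothetical result of c1 == c3

# TRUE/FALSE are coerced to 1/0 in arithmetic, so for each element
# only one of the two terms survives:
#   same  -> 3 - x  (1 becomes 2, 0 becomes 3)
#   !same -> x + 2  (1 becomes 3, 0 becomes 2)
y <- same * (3 - x) + (!same) * (x + 2)
print(y)
```

With a data frame in place of `x`, the logical vector is applied down each column, which is what makes the one-liner work row by row.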

2) It could also be done like this, which is longer but follows more directly from the definition of what is wanted:

DF[4:11] <- with(DF, (c1 == c3) * (2 * (DF[4:11] == 1) + 3 * (DF[4:11] == 0)) +
                     (c1 != c3) * (3 * (DF[4:11] == 1) + 2 * (DF[4:11] == 0)))

Note

We used this as the input. Note that c1, c2 and c3 are assumed to be character, not factor, and the remainder numeric.

Lines <- "
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11  
G A G 1 1 0 1 1 0 0 1
T C T 0 0 1 1 0 1 0 1
C C T 0 1 1 1 1 1 1 0"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
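
The type assumption can be checked before running either solution. The following is a minimal sketch, assuming the data is read with `read.table` as above. It matters because, in base R, comparing two factors whose level sets differ (as c1 and c3 would here) raises an error instead of returning TRUE/FALSE.

```r
Lines <- "
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
G A G 1 1 0 1 1 0 0 1
T C T 0 0 1 1 0 1 0 1
C C T 0 1 1 1 1 1 1 0"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

# Class of every column: c1..c3 should be "character";
# c4..c11 come back as "integer" here, which is fine for the arithmetic.
classes <- sapply(DF, class)
print(classes)

# If c1..c3 had been read as factors, converting them first is one option.
# On character columns this is a no-op.
DF[1:3] <- lapply(DF[1:3], as.character)
```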

Thank you very much, @G. Grothendieck! Your solution also works very nicely — Lucas, Jan 12 '18 at 20:17

score 1 · Accepted Answer · answered Jan 12 '18 at 19:43

Try the following. It uses nested ifelse and an index vector. Maybe there are simpler ways, but this one only uses base R.

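# TRUE for rows where c1 and c3 hold the same character
inx <- as.character(data$c1) == as.character(data$c3)

# Recode one column: where inx is TRUE, 1 -> 2 and 0 -> 3;
# otherwise 1 -> 3 and 0 -> 2
fun <- function(x, inx){
    ifelse(inx,
        ifelse(x == 1, 2, 3),
        ifelse(x == 1, 3, 2)
    )
}

data[4:11] <- lapply(data[4:11], fun, inx = inx)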
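
As a quick check, the nested `ifelse` approach can be run end to end on the sample data from the question. This is a sketch under assumptions: the question does not show how the data frame was created, so `data` is built here with `read.table`, and the index vector is passed to `fun` as an argument so the function does not depend on a global variable.

```r
# Sample data from the question; construction via read.table is an assumption
data <- read.table(text = "
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
G A G 1 1 0 1 1 0 0 1
T C T 0 0 1 1 0 1 0 1
C C T 0 1 1 1 1 1 1 0", header = TRUE, as.is = TRUE)

# TRUE for rows where c1 and c3 match
inx <- as.character(data$c1) == as.character(data$c3)

# Recode one column according to the row-wise index vector
fun <- function(x, inx) {
    ifelse(inx,
        ifelse(x == 1, 2, 3),
        ifelse(x == 1, 3, 2)
    )
}

data[4:11] <- lapply(data[4:11], fun, inx = inx)
print(data)
```

The printed rows should match the desired output shown in the question.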
