Symmetric percent change between data frames in R

Question

I have two data frames. It is easy to calculate percent change from t1 to t2 like this:

t1 <- data.frame("gene1" = c(1,5,10), "gene2" = c(1,1,1), "gene3" = c(5,5,20))
row.names(t1) <- c("patient1", "patient2", "patient3")
t2 <- data.frame("gene1" = c(0.5,5,20), "gene2" = c(2,4,8), "gene3" = c(2.5,20,5))
row.names(t2) <- c("patient1", "patient2", "patient3")

t3 <- (t2-t1)/t1 *100

t3
#>             gene1      gene2      gene3
#> patient1      -50        100        -50
#> patient2        0        300        300
#> patient3      100        700        -75

But what if I want a symmetric percent change, such that a change in value from 20 to 5 is not -75 but -300? I tried this:

t3 <- ifelse(t2 > t1, ((t2-t1)/t1) * 100, ((t2-t1)/t2) * 100)

but that gives me some weird list of 3x9.

In principle, using ifelse should work. If I reduce the complexity, it works just fine:

t3 <- ifelse(t2 > t1, "a", "b")
t3
#>             gene1      gene2      gene3
#> patient1        b          a          b
#> patient2        b          a          a
#> patient3        a          a          b

Ideally my output would be:

t3
#>             gene1      gene2      gene3
#> patient1     -100       100        -100
#> patient2        0       300         300
#> patient3      100       700        -300

Just to be clear: the `-100` (at 1, 1 in `t3`) means that the value was reduced by 100% of the later value (from 1 down to 0.5), right? — David, Oct 14 '20 at 15:22
Wouldn't that imply that for gene 1, patient 3, the change would be 50 (= `(20 - 10) / 20 * 100`) instead of the 100 you put in your expected output? — David, Oct 14 '20 at 15:28
@David: by the nonsymmetric calculation, yes. But I'm interested in doing it in a symmetric fashion, like: `ifelse(t2 > t1, ((t2-t1)/t1) * 100, ((t2-t1)/t2) * 100)` — strugglebus, Oct 14 '20 at 15:39
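
To make the definition discussed in these comments concrete, here is a small sketch for single values. The helper name `sym_change_scalar` is made up for illustration. It divides the difference by whichever of the two values is smaller, which is what the `ifelse` expression does for each cell.

```r
# Hypothetical helper (not from the original post): symmetric percent
# change for one pair of values, dividing by the smaller of the two.
sym_change_scalar <- function(before, after) {
  (after - before) / min(before, after) * 100
}

sym_change_scalar(20, 5)   # -300, where the plain percent change is -75
sym_change_scalar(10, 20)  #  100, same as the plain percent change
sym_change_scalar(1, 0.5)  # -100
```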

David · Accepted Answer · 2020-10-14T16:01:32.213

How about this one?

# recreate your data
t1 <- data.frame("gene1" = c(1,5,10), "gene2" = c(1,1,1), "gene3" = c(5,5,20))
row.names(t1) <- c("patient1", "patient2", "patient3")
t2 <- data.frame("gene1" = c(0.5,5,20), "gene2" = c(2,4,8), "gene3" = c(2.5,20,5))
row.names(t2) <- c("patient1", "patient2", "patient3")

t1
#>          gene1 gene2 gene3
#> patient1     1     1     5
#> patient2     5     1     5
#> patient3    10     1    20

t2
#>          gene1 gene2 gene3
#> patient1   0.5     2   2.5
#> patient2   5.0     4  20.0
#> patient3  20.0     8   5.0

# iterate over each column and compute the ifelse...
res <- lapply(seq_len(ncol(t1)), function(i) {
  x <- t2[, i]
  y <- t1[, i]
  diff <- x - y
  ifelse(x > y, diff / y, diff / x) * 100
})
# convert to data.frame and reset the names and rownames
res <- as.data.frame(res)
rownames(res) <- rownames(t1)
names(res) <- names(t1)
res
#>          gene1 gene2 gene3
#> patient1  -100   100  -100
#> patient2     0   300   300
#> patient3   100   700  -300

Created on 2020-10-14 by the reprex package (v0.3.0)
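
The same column-by-column idea can be written a little more compactly. This is only a sketch of an alternative, not part of the original answer: `Map()` walks the columns of `t2` and `t1` in parallel, and assigning into `res[]` keeps the row and column names of `t1`, so they do not need to be reset afterwards. The helper name `sym_col` is an assumption made for this example.

```r
t1 <- data.frame(gene1 = c(1, 5, 10), gene2 = c(1, 1, 1), gene3 = c(5, 5, 20),
                 row.names = c("patient1", "patient2", "patient3"))
t2 <- data.frame(gene1 = c(0.5, 5, 20), gene2 = c(2, 4, 8), gene3 = c(2.5, 20, 5),
                 row.names = c("patient1", "patient2", "patient3"))

# Hypothetical helper: symmetric percent change for one pair of columns
sym_col <- function(x, y) {
  d <- x - y
  ifelse(x > y, d / y, d / x) * 100
}

res <- t1                      # copy of t1, used as a template for the names
res[] <- Map(sym_col, t2, t1)  # replace every column, keep row/column names
res
```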

Edit

Even better and probably faster:

t3 <- (t2 - t1) / pmin(t1, t2) * 100
t3
#>          gene1 gene2 gene3
#> patient1  -100   100  -100
#> patient2     0   300   300
#> patient3   100   700  -300

Note that pmin, like ifelse, works element-wise: it takes the minimum of each pair of corresponding elements of its inputs. So pmin(t1, t2) returns a data.frame holding the smaller value at each position, which saves us the ifelse call.
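
To see the element-wise behaviour in isolation, here is a small sketch on plain vectors, followed by the one-liner wrapped in a function. The name `sym_pct_change` and the dimension check are assumptions added for illustration; the formula itself is the one from the answer.

```r
# pmin() compares its inputs position by position
pmin(c(1, 5, 20), c(0.5, 5, 5))
#> [1] 0.5 5.0 5.0

# Hypothetical wrapper around the one-liner from the answer
sym_pct_change <- function(before, after) {
  stopifnot(identical(dim(before), dim(after)))  # both inputs must have the same shape
  (after - before) / pmin(before, after) * 100
}

t1 <- data.frame(gene1 = c(1, 5, 10), gene2 = c(1, 1, 1), gene3 = c(5, 5, 20),
                 row.names = c("patient1", "patient2", "patient3"))
t2 <- data.frame(gene1 = c(0.5, 5, 20), gene2 = c(2, 4, 8), gene3 = c(2.5, 20, 5),
                 row.names = c("patient1", "patient2", "patient3"))

t3 <- sym_pct_change(t1, t2)
t3
```

One thing to keep in mind: if the smaller of the two values is 0, the division yields Inf (or NaN when both values are 0), so such cells may need separate handling.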

Yes! The second answer was beautifully simple. You get the bounty after the 24 hour waiting period — strugglebus, Oct 14 '20 at 16:23
Yeah, the solution was really simple, but it also took me a for loop and the lapply solution to find it :D — David, Oct 14 '20 at 16:26
This is outside the scope of the original post, but do you know why the ifelse statement acts so strangely? It seems to be running the test expression on each cell, but outputting the results for the entire column...or something — strugglebus, Oct 14 '20 at 17:54
ifelse works best when given vectors (you can see its code by typing `ifelse` in the console). Internally, the test (`t2 > t1`) is treated as a vector of 9 elements, and the yes/no arguments are recycled to that length and picked by position. Because yes and no are data frames here (that is, lists of columns), each pick is a whole column, so a list of 9 elements is returned. Does that make sense? — David, Oct 14 '20 at 18:25
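
A small sketch of what that last comment describes, using the `t1` and `t2` from the question. With data frames as the yes/no arguments, `ifelse` returns a list. Converting the inputs to numeric matrices first lets the original expression work as intended. The matrix conversion is one possible workaround and was not proposed in the thread; the names `bad`, `good`, `m1` and `m2` are made up for this example.

```r
t1 <- data.frame(gene1 = c(1, 5, 10), gene2 = c(1, 1, 1), gene3 = c(5, 5, 20),
                 row.names = c("patient1", "patient2", "patient3"))
t2 <- data.frame(gene1 = c(0.5, 5, 20), gene2 = c(2, 4, 8), gene3 = c(2.5, 20, 5),
                 row.names = c("patient1", "patient2", "patient3"))

# yes/no are data frames (lists of columns), so ifelse() hands back a list
bad <- ifelse(t2 > t1, (t2 - t1) / t1 * 100, (t2 - t1) / t2 * 100)
is.list(bad)
length(bad)

# With matrices, test, yes and no are all atomic and have the same shape
m1 <- as.matrix(t1)
m2 <- as.matrix(t2)
good <- ifelse(m2 > m1, (m2 - m1) / m1, (m2 - m1) / m2) * 100
good  # numeric matrix that keeps the patient/gene dimnames
```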
