Is there any way to replace values of df in R using sum of rows?

Question

I have an issue that looks easy to solve, but I'm stuck. I have a data frame whose columns are significant pathways retrieved from GSEA and whose rows are Entrez gene IDs. Each cell is 1 if the gene is present in the pathway and 0 if it is not. This is my data frame:

                         Path_A      Path_B       Path_C
Gene_1                   0           1            0
Gene_2                   1           1            0
Gene_3                   0           0            1
Gene_4                   1           1            1

I want to sum the rows (genes) to calculate how many times a gene is present in distinct pathways, and thus get something like this:

                          Path_A      Path_B       Path_C
Gene_1                   0           1            0
Gene_2                   2           2            0
Gene_3                   0           0            1
Gene_4                   3           3            3

So far, I tried my_df <- mutate(my_df, sum = rowSums(my_df)) to create a new column sum, intending to then recode each 1 in the pathway columns with that row's sum value; however, I could not get the recoding step to work.

Thanks in advance

score 4 · Answer 1 · answered Aug 24 '21 at 22:38

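Use rowSums, expand the sums so there is one value per cell (indexing by row), multiply by the original 0/1 values, and assign the result back into the data frame: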

df1[] <- rowSums(df1, na.rm = TRUE)[row(df1)] * df1

-output

> df1
       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3

data

df1 <- structure(list(Path_A = c(0L, 1L, 0L, 1L), Path_B = c(1L, 1L, 
0L, 1L), Path_C = c(0L, 0L, 1L, 1L)), class = "data.frame", 
row.names = c("Gene_1", 
"Gene_2", "Gene_3", "Gene_4"))
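
To make the one-liner above easier to follow, here is a minimal base R sketch that splits it into its intermediate steps. The names counts, idx, expanded, explicit and recycled are only illustrative and not part of the original answer. The last line shows a shorter form that should give the same result, assuming the usual behaviour of arithmetic between a data frame and a vector with one element per row (the vector is recycled down each column).

```r
# Same data as in the answer, rebuilt with data.frame() for readability
df1 <- data.frame(
  Path_A = c(0L, 1L, 0L, 1L),
  Path_B = c(1L, 1L, 0L, 1L),
  Path_C = c(0L, 0L, 1L, 1L),
  row.names = c("Gene_1", "Gene_2", "Gene_3", "Gene_4")
)

# Step 1: number of pathways each gene appears in (one value per row)
counts <- rowSums(df1, na.rm = TRUE)

# Step 2: a 4 x 3 matrix in which every cell holds its own row number
idx <- row(df1)

# Step 3: index the counts by that matrix to get one count per cell,
# laid out column by column (length 12)
expanded <- counts[idx]

# Step 4: multiply by the 0/1 values and assign into a copy, keeping
# the data frame structure and row names
explicit <- df1
explicit[] <- expanded * df1

# Shorter form (illustrative): the length-4 vector is recycled down
# each column, so every row is multiplied by its own count
recycled <- df1 * counts
```

Both explicit and recycled keep the gene names as row names; the values become doubles because rowSums returns a double vector.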

score 3 · Accepted Answer · answered Aug 24 '21 at 22:56

You could use dplyr but the base R solution akrun posted is more reasonable:

library(dplyr)

df1 %>% 
  mutate(across(Path_A:Path_C, ~ .x * rowSums(across(Path_A:Path_C))))

returns

       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3

Martin Gal

Thanks for your help! It worked really fine! Both answers solved my issue. – necrosnake Aug 25 '21 at 00:00

score 2 · Answer 3 · answered Aug 24 '21 at 23:56

Here is a dplyr variation. I first thought of using across() with rowSums(.), but as I recently learned, using . inside rowSums() bypasses across(). We can do it with a helper column instead:

library(dplyr)
df1 %>% 
    mutate(helper = rowSums(.)) %>% 
    mutate(across(everything(), ~ifelse(. != 0, helper, .))) %>% 
    select(-helper)

       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3

TarJae

Doesn't matter at all, but I think you could remove the `ifelse` function and simplify to `~ (. != 0) * helper`. The condition inside the parentheses evaluates to `FALSE` for `0`, which is coerced to `0` in the multiplication. – Martin Gal Aug 25 '21 at 11:08
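
The coercion mentioned in this comment can be checked in base R without dplyr. This is only a small sketch on plain vectors: present and helper are made-up stand-ins for one pathway column and the helper column, not objects from the answers.

```r
# Hypothetical stand-ins: one 0/1 pathway column and the per-gene row sums
present <- c(0L, 1L, 0L, 1L)
helper  <- c(1, 2, 1, 3)

# The ifelse() version from the answer, applied to a single column
with_ifelse <- ifelse(present != 0, helper, present)

# The simplification from the comment: the logical vector is coerced
# to 0/1 by the multiplication, so FALSE cells become 0
with_product <- (present != 0) * helper

# Both approaches should agree element by element
stopifnot(all(with_ifelse == with_product))
```

For 0/1 data like this, multiplying by the column itself (present * helper) would give the same values, which is what the accepted answer does.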
