
I have a problem that looks easy to solve, but I'm stuck. I have a data frame whose columns are significant pathways retrieved from GSEA and whose rows are Entrez gene IDs. Each cell is 1 if the gene is present in the pathway and 0 if it is not. This is my data frame:

                         Path_A      Path_B       Path_C
Gene_1                   0           1            0
Gene_2                   1           1            0
Gene_3                   0           0            1
Gene_4                   1           1            1

I want to sum each row (gene) to count how many distinct pathways the gene is present in, and then replace every 1 in that row with the count, to get something like this:

                         Path_A      Path_B       Path_C
Gene_1                   0           1            0
Gene_2                   2           2            0
Gene_3                   0           0            1
Gene_4                   3           3            3

So far, I have tried my_df <- mutate(my_df, sum = rowSums(my_df)) to create a new column, sum, intending to then recode each 1 in the pathway columns with that row's sum value. However, I could not get the recoding step to work.

Thanks in advance

necrosnake

3 Answers


Use rowSums, replicate the sums by row index, multiply by the original 0/1 values, and assign the result back to the data frame:

df1[] <- rowSums(df1, na.rm = TRUE)[row(df1)] * df1

-output

> df1
       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3
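
To make the one-liner above easier to follow, here is a sketch that splits it into separate steps. The intermediate names totals, expanded and res are only illustrative and are not part of the original answer; the data is the same as in the question.

```r
# Same data as in the question (rows are genes, columns are pathways).
df1 <- data.frame(
  Path_A = c(0L, 1L, 0L, 1L),
  Path_B = c(1L, 1L, 0L, 1L),
  Path_C = c(0L, 0L, 1L, 1L),
  row.names = c("Gene_1", "Gene_2", "Gene_3", "Gene_4")
)

# Step 1: number of pathways each gene belongs to (one value per row).
totals <- rowSums(df1, na.rm = TRUE)

# Step 2: row(df1) is a matrix of row indices with the same shape as df1,
# so indexing by it repeats each gene's total once per column.
expanded <- totals[row(df1)]

# Step 3: multiplying by the 0/1 values keeps the total where the gene is
# present and leaves 0 elsewhere. Assigning with `res[] <-` keeps the
# data frame structure (row and column names) intact.
res <- df1
res[] <- expanded * df1
res
```

Working on a copy (res) rather than overwriting df1 is a choice made here so the original 0/1 table stays available for comparison.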

data

df1 <- structure(list(Path_A = c(0L, 1L, 0L, 1L),
                      Path_B = c(1L, 1L, 0L, 1L),
                      Path_C = c(0L, 0L, 1L, 1L)),
                 class = "data.frame",
                 row.names = c("Gene_1", "Gene_2", "Gene_3", "Gene_4"))
akrun

You could use dplyr, but the base R solution akrun posted is more reasonable:

library(dplyr)

df1 %>% 
  mutate(across(Path_A:Path_C, ~ .x * rowSums(across(Path_A:Path_C))))

returns

       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3
Martin Gal

Here is a dplyr variation. I first thought of using across with rowSums, but as I recently learned, using . inside rowSums bypasses across(). Instead, we can do it with a helper column:

library(dplyr)
df1 %>% 
    mutate(helper = rowSums(.)) %>% 
    mutate(across(everything(), ~ifelse(. != 0, helper, .))) %>% 
    select(-helper)

       Path_A Path_B Path_C
Gene_1      0      1      0
Gene_2      2      2      0
Gene_3      0      0      1
Gene_4      3      3      3
TarJae
  • Doesn't matter at all, but I think you could remove the `ifelse` function and simplify to `~ (. != 0) * helper`. The condition inside the brackets evaluates to `FALSE` for `0`, which is treated as `0` in the multiplication. – Martin Gal Aug 25 '21 at 11:08
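
For reference, the simplification suggested in the comment might look like this when dropped into the helper-column pipeline. This is a sketch that assumes dplyr is installed and uses the data from the question; the name res is only illustrative.

```r
library(dplyr)

# Same data as in the question (rows are genes, columns are pathways).
df1 <- data.frame(
  Path_A = c(0L, 1L, 0L, 1L),
  Path_B = c(1L, 1L, 0L, 1L),
  Path_C = c(0L, 0L, 1L, 1L),
  row.names = c("Gene_1", "Gene_2", "Gene_3", "Gene_4")
)

res <- df1 %>%
    # Row totals are computed once, before any column is modified.
    mutate(helper = rowSums(.)) %>%
    # (. != 0) is TRUE/FALSE; multiplying coerces it to 1/0, so a cell
    # becomes the row total where the gene is present and 0 otherwise.
    mutate(across(everything(), ~ (. != 0) * helper)) %>%
    select(-helper)
res
```

Because everything() also selects helper, that column is multiplied by its own non-zero indicator, which leaves it unchanged before it is dropped.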