
Dear fellow Stackoverflow users,

I am a beginner using R to analyse biological data, and I am facing a problem that I haven't been able to solve yet. Maybe someone more experienced can help me out with this?

I have a large data frame that is a binary matrix: each row represents a different gene, and each column a different condition in an experiment.

"1" in a cell indicates that the gene is present in the given condition; "0" indicates that it is not present.

How do I get a vector with the row names of the rows that contain a "1" only in a given column and in no other column (i.e., genes that are uniquely present in that condition)?

And how can I get a vector with the row names of the rows that contain "1" in a specified set of columns but "0" in all other columns (i.e., genes that are uniquely present in, for example, conditions/columns 1, 2 and 5)?

I am looking forward to your suggestions!

Many thanks:-)

Cettt
Rose Cave

1 Answer


Here is one possibility using the tidyverse package. Since you did not provide any data, I created some dummy data that looks like this:

EDIT: I included row names.

> mydata
     A B C D E
id_1 0 1 1 0 0
id_2 0 1 0 1 0
id_3 1 1 1 1 0
id_4 1 0 0 0 0
id_5 0 0 1 1 1
id_6 1 0 1 0 0
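The answer does not show how `mydata` was built, so here is one way to recreate it in base R. This is a sketch that only reproduces the printed values; the construction itself is an assumption, not necessarily how the data was originally made.

```r
# Recreate the printed dummy data: 6 genes (rows) x 5 conditions (columns).
# Values are listed row by row, hence byrow = TRUE; as.integer() gives the
# integer columns shown in the tibble output further down.
mydata <- as.data.frame(matrix(
  as.integer(c(0, 1, 1, 0, 0,
               0, 1, 0, 1, 0,
               1, 1, 1, 1, 0,
               1, 0, 0, 0, 0,
               0, 0, 1, 1, 1,
               1, 0, 1, 0, 0)),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("id_", 1:6), LETTERS[1:5])
))
mydata
```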

So there are six rows (named id_1 to id_6) and five columns (named A to E).

Say I want to keep all rows where columns "B" and "D" are both 1 and all other columns are 0. This can be done as follows:

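library(tidyverse)
# Move the row names into a column called "id" so they are kept
mydata %>% as_tibble(rownames = "id") %>% 
  # keep rows where B and D are both 1 ...
  filter_at(vars(c("B", "D")), all_vars(. == 1)) %>% 
  # ... and every other column (apart from id) is 0
  filter_at(vars(-c("B", "D", "id")), all_vars(. == 0))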

# A tibble: 1 x 6
  id        A     B     C     D     E
  <chr> <int> <int> <int> <int> <int>
1 id_2     0     1     0     1     0
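Since the question asks for a vector of row names (the gene names), here is a base R sketch that needs no extra packages. The helper name `genes_unique_to` is hypothetical. It assumes every cell is 0 or 1, so a row qualifies exactly when all chosen columns are 1 and the row total equals the number of chosen columns. The single-column case from the first part of the question is just a set of size one, and columns can be given by name or by position.

```r
# Dummy data from above (6 genes x 5 conditions, 0/1 values)
mydata <- as.data.frame(matrix(
  as.integer(c(0, 1, 1, 0, 0,
               0, 1, 0, 1, 0,
               1, 1, 1, 1, 0,
               1, 0, 0, 0, 0,
               0, 0, 1, 1, 1,
               1, 0, 1, 0, 0)),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("id_", 1:6), LETTERS[1:5])
))

# Hypothetical helper: names of the rows that are 1 in every column of `cols`
# and 0 in all other columns. Assumes `m` contains only 0/1 values.
genes_unique_to <- function(m, cols) {
  m <- as.matrix(m)
  # every requested column must be 1 ...
  in_all <- rowSums(m[, cols, drop = FALSE] == 1) == length(cols)
  # ... and, for 0/1 data, the row total must then equal length(cols),
  # which means every other column is 0
  nothing_else <- rowSums(m) == length(cols)
  rownames(m)[in_all & nothing_else]
}

genes_unique_to(mydata, c("B", "D"))  # returns "id_2"
genes_unique_to(mydata, "A")          # returns "id_4" (unique to condition A)
genes_unique_to(mydata, c(3, 4, 5))   # columns by position; returns "id_5"
```

If the data may contain values other than 0 and 1, the row-sum shortcut no longer holds and the "other columns are 0" condition would need to be checked explicitly.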
Cettt
  • Thanks, that seems to work for returning the rows :) However, I would like to get the rownames of the returned rows since those contain the gene names I am interested in. Do you have any idea how to return the rownames? I haven't managed so far – Rose Cave Mar 06 '19 at 16:05
  • Is there a way to keep the rownames in the returned data frame? – Rose Cave Mar 06 '19 at 16:25
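Regarding the comments: in the tidyverse pipeline above, the row names travel along in the `id` column, so appending `%>% pull(id)` should return them as a plain character vector. If you would rather keep a data frame with its row names intact, ordinary base R subsetting preserves them. A minimal sketch, again assuming the 0/1 dummy data from above:

```r
# Dummy data from above (6 genes x 5 conditions, 0/1 values)
mydata <- as.data.frame(matrix(
  as.integer(c(0, 1, 1, 0, 0,
               0, 1, 0, 1, 0,
               1, 1, 1, 1, 0,
               1, 0, 0, 0, 0,
               0, 0, 1, 1, 1,
               1, 0, 1, 0, 0)),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("id_", 1:6), LETTERS[1:5])
))

# Rows where B and D are 1 and no other column is 1 (assumes 0/1 data)
keep <- mydata$B == 1 & mydata$D == 1 & rowSums(mydata) == 2
# Subsetting a data frame with `[` keeps the row names of the selected rows
result <- mydata[keep, , drop = FALSE]
print(result)
rownames(result)  # returns "id_2", the gene names as a character vector
```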