df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

While getting accustomed to dplyr and reshape2, I stumbled over what should be a "simple" task: selecting rows based on several conditions. I want the genes (the name variable) that have value above 0 in experiment 1 (exp == 1) AND, at the same time, value below 0 in experiment 2; in df this would be "gene2". Surely there are many ways to do this, e.g. subset df for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the results of these subsets:

library(dplyr)
inner_join(
  filter(df, exp == 1 & value > 0),
  filter(df, exp == 2 & value < 0),
  by = "name"
)[[1]]

Although this works, it looks very awkward. I feel that such conditional filtering lies at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me here?

talat
user3375672

4 Answers


One alternative that comes to mind is to transform the data to a "wide" format and then do the filtering.

Here's an example using "data.table" (for the convenience of compound-statements):

library(data.table)
# dcast() dispatches to the data.table method; value.var is given explicitly
dcast(as.data.table(df), name ~ exp, value.var = "value")[`1` > 0 & `2` < 0]
#     name 1  2
# 1: gene2 1 -1

Similarly, with "dplyr" and "tidyr":

library(dplyr)
library(tidyr)
df %>% 
  spread(exp, value) %>% 
  filter(`1` > 0 & `2` < 0)
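
As a side note, newer tidyr releases (1.0.0 and later) provide pivot_wider() as the successor to spread(). The following is only a sketch of the same wide-then-filter idea under that assumption; the "exp" column prefix and the object names wide and hits are illustrative choices, not part of the answer above. Prefixing the new columns gives syntactically valid names, so no backticks are needed.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1),
  stringsAsFactors = FALSE
)

# One row per gene, one column per experiment: exp1, exp2
wide <- df %>%
  pivot_wider(names_from = exp, values_from = value, names_prefix = "exp")

# Keep genes that are positive in experiment 1 and negative in experiment 2
hits <- wide %>%
  filter(exp1 > 0, exp2 < 0)

hits$name
```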
A5C1D2H2I1M1N2O1R2T1
  • This solution is so easy and brilliant at the same time! Thanks! It has allowed me to learn a lot! By the way, may I ask you a question? Why should the new column names be written with backticks (`1`) instead of '1' or "1"? Thank you! – Francisco Rodríguez Algarra Dec 01 '14 at 15:47
  • @FranciscoRodriguezAlgarra, I didn't try the others, to tell the truth. Generally, when dealing with problematic variable names, I go straight to the backticks, which is also what the help page at `?Quotes` recommends. – A5C1D2H2I1M1N2O1R2T1 Dec 01 '14 at 16:35
  • @Ananda Mahto, could you expand a little on your comment regarding backticks? Is it the variable names that might be problematic (since they are named `1` and `2`)? – user3375672 Dec 01 '14 at 16:55
  • @user3375672, yes. `dcast` doesn't try to make syntactically valid names, so in this case we end up with columns named "1" & "2". Since they are not syntactically valid, they need to be quoted in some way. – A5C1D2H2I1M1N2O1R2T1 Dec 01 '14 at 16:59
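
To illustrate the backtick question from the comments, here is a small sketch (the object names wide, by_backtick and by_quote are made up for this example). Inside filter(), a backticked `1` is looked up as a column, whereas "1" is just a character constant, so the quoted version compares strings with numbers and never touches the data.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1),
  stringsAsFactors = FALSE
)

# Wide table with non-syntactic column names "1" and "2"
wide <- spread(df, exp, value)

# Backticks: `1` and `2` refer to the columns
by_backtick <- filter(wide, `1` > 0 & `2` < 0)

# Quotes: "1" and "2" are string constants, so the condition is a single
# logical value that is the same for every row
constant_condition <- "1" > 0 & "2" < 0
by_quote <- filter(wide, "1" > 0 & "2" < 0)

nrow(by_backtick)  # rows selected using the columns
nrow(by_quote)     # rows selected using the constant condition
```

Outside of dplyr's evaluation, plain strings do work for extraction, e.g. wide[["1"]], because there the string is used as a name rather than evaluated as an expression.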

Another dplyr option is:

group_by(df, name) %>% filter(value[exp == 1] > 0 & value[exp == 2] < 0)

#Source: local data frame [2 x 3]
#Groups: name
#
#  exp  name value
#1   1 gene2     1
#2   2 gene2    -1
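
One caveat worth sketching: the grouped filter above assumes every gene has exactly one row per experiment. If a gene is missing an experiment, value[exp == 2] is empty and the condition has length zero, which filter() may reject or handle unexpectedly depending on the dplyr version. A possible variation, shown here as an assumption rather than as part of the answer, wraps each condition in any(), which returns FALSE for an empty input. Note that any() also changes the meaning slightly to "at least one matching row" when a gene has several rows per experiment. The extra gene3 row and the names df2 and hits are made up for illustration.

```r
library(dplyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1),
  stringsAsFactors = FALSE
)

# Hypothetical extra gene that was only measured in experiment 1
df2 <- rbind(
  df,
  data.frame(exp = 1, name = "gene3", value = 5, stringsAsFactors = FALSE)
)

# any() collapses each per-group condition to a single TRUE/FALSE,
# and yields FALSE when the experiment is absent for that gene
hits <- df2 %>%
  group_by(name) %>%
  filter(any(value[exp == 1] > 0), any(value[exp == 2] < 0))

unique(hits$name)
```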
talat

Probably this is even more convoluted than your own solution, but I think it has a "dplyr" feel:

df %>% 
    filter((exp == 1 & value > 0) | (exp == 2 & value < 0)) %>% 
    group_by(name) %>% 
    filter(length(unique(exp)) == 2) %>% 
    select(name) %>% 
    unique()

#Source: local data frame [1 x 1]
#Groups: name

#   name
#1 gene2

filter accepts multiple conditions separated by commas, in the same way that select accepts multiple columns. Each extra condition is combined with AND:

group_by(df, name) %>% filter(value[exp == 1] > 0, value[exp == 2] < 0)

From official documentation: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

The examples shown there are:

  • flights[flights$month == 1 & flights$day == 1, ] in base R

  • filter(flights, month == 1, day == 1) in dplyr.
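
As a quick check on the question's data, the sketch below compares the comma form with the & form; the object names with_comma, with_amp and same_rows are illustrative only. Both are expected to select the same rows.

```r
library(dplyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1),
  stringsAsFactors = FALSE
)

# Conditions separated by a comma
with_comma <- df %>%
  group_by(name) %>%
  filter(value[exp == 1] > 0, value[exp == 2] < 0)

# The same conditions joined with &
with_amp <- df %>%
  group_by(name) %>%
  filter(value[exp == 1] > 0 & value[exp == 2] < 0)

# Compare as plain data frames to ignore grouping metadata
same_rows <- identical(as.data.frame(with_comma), as.data.frame(with_amp))
same_rows
```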

Pablo Casas