Isolating rows from a table that have columns ranked in a desired manner

Question

I have a tab delineated table in which the last three columns contain statistical values, and I would like to retrieve only rows in which the columns are ranked in a desired manner. The rows I am interested in are those where the values in the columns are ranked as such A_C>A_B>B_C.

Here is an example of the table:

marker  chr A_B A_C B_C
rs1000073   1   0.097328991622858   0.101954778294364   0.0155614929271569
rs1000283   1   0.194891573233045   0.0612572864045251  0.0287416461802493
rs1000352   1   0.146693199204067   0.166583183464355   -0.00301950205401285
rs1000451   1   0.116693199204067   0.266583183464355   0.00401950205401285

So in this case, I would only want to retrieve the rs1000352 and rs1000073 rows (the actual table has more than a million rows in it, but you get the idea).

From there I will write the rows of interest to a new tab deliminated text file (I know how to do this part).

Does anyone have any suggestions on how to do this?

@Thomas in what this comment can help the OP? "I know how to do this part"? I think you miss read the question. — agstudy, Jul 23 '13 at 12:09
@agstudy I missed the "I know how to do this part" (re: writing), but the OP still does not show us what they've actually tried. — Thomas, Jul 23 '13 at 12:22
@Thomas good point! The OP never show what he was tried even in his previous question. — agstudy, Jul 23 '13 at 12:24

score 2 · Answer 1 · answered Jul 23 '13 at 12:02

2

Do you mean this (after using read.table to initialize my.df):

my.df.new <- subset(my.df, (A_C>A_B) & (A_B>B_C))

(...which also appears to return rs1000451, but it seems like that is intended.)

answered Jul 23 '13 at 12:02

texb

547
2
13

score 2 · Accepted Answer · answered Jul 23 '13 at 12:26

a data.table solution, syntax sugar!:

DT <- data.table(dt)
dt <- DT[(A_C>A_B) & (A_B>B_C)]

You can even check the result visually:

library(reshape2)
dtl <- melt(dt)
library(ggplot2)
ggplot(subset(dtl,variable!='chr'))+
  geom_point(aes(marker,value,color=variable),size=5)
ggplot(subset(dtl,variable!='chr'))+
  geom_point(aes(marker,value,color=reorder(variable,value)),size=5)

enter image description here

score 1 · Answer 3 · answered Jul 23 '13 at 12:07

An alternative if you wanna get the indices of the rows:

df<- data.frame(marker = c('rs1000073','rs1000283','rs1000283', 'rs1000352'), A_B= c(0.097328991622858, 0.194891573233045, 0.146693199204067, 0.116693199204067), 
            A_C= c(0.101954778294364,0.0612572864045251,0.166583183464355,0.266583183464355), B_C = c(0.0155614929271569, 0.0287416461802493,  -0.00301950205401285,0.00401950205401285))
i<- which((df$A_C>df$A_B )& (df$A_B>df$B_C))

Isolating rows from a table that have columns ranked in a desired manner

3 Answers3