I have a tab-delimited table in which the last three columns contain statistical values, and I would like to retrieve only the rows in which those columns are ordered in a particular way: A_C > A_B > B_C.

Here is an example of the table:

marker     chr  A_B                A_C                 B_C
rs1000073  1    0.097328991622858  0.101954778294364   0.0155614929271569
rs1000283  1    0.194891573233045  0.0612572864045251  0.0287416461802493
rs1000352  1    0.146693199204067  0.166583183464355   -0.00301950205401285
rs1000451  1    0.116693199204067  0.266583183464355   0.00401950205401285

So in this case, I would only want to retrieve the rs1000352 and rs1000073 rows (the actual table has more than a million rows in it, but you get the idea).

From there I will write the rows of interest to a new tab-delimited text file (I know how to do this part).

Does anyone have any suggestions on how to do this?

Brian Tompsett - 汤莱恩
user2439887

3 Answers

2

Do you mean something like this (after reading the table into my.df with read.table)?

my.df.new <- subset(my.df, (A_C > A_B) & (A_B > B_C))

(This also returns rs1000451, which satisfies A_C > A_B > B_C as well, so that seems to be intended.)
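
For completeness, here is a minimal self-contained sketch of that approach. It assumes the table is read with read.table; the data comes from an inline string via the `text` argument rather than from a file, purely so the example runs on its own. The file name in the comment is hypothetical.

```r
# Inline copy of the example table from the question. A real file would be
# read with something like:
#   read.table("markers.txt", header = TRUE, sep = "\t")   # hypothetical file name
txt <- "marker\tchr\tA_B\tA_C\tB_C
rs1000073\t1\t0.097328991622858\t0.101954778294364\t0.0155614929271569
rs1000283\t1\t0.194891573233045\t0.0612572864045251\t0.0287416461802493
rs1000352\t1\t0.146693199204067\t0.166583183464355\t-0.00301950205401285
rs1000451\t1\t0.116693199204067\t0.266583183464355\t0.00401950205401285"

my.df <- read.table(text = txt, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

# Keep only the rows where A_C > A_B > B_C, written as two pairwise comparisons
my.df.new <- subset(my.df, (A_C > A_B) & (A_B > B_C))

print(my.df.new$marker)
```

Note that subset() drops rows for which the condition evaluates to NA, so markers with a missing value in any of the three columns are excluded.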

texb
2

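A data.table solution, with some syntactic sugar:

library(data.table)
DT <- data.table(dt)                 # dt: the data frame read from the table
dt <- DT[(A_C > A_B) & (A_B > B_C)]  # keep rows where A_C > A_B > B_C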

You can even check the result visually:

library(reshape2)
dtl <- melt(dt)
library(ggplot2)
ggplot(subset(dtl,variable!='chr'))+
  geom_point(aes(marker,value,color=variable),size=5)
ggplot(subset(dtl,variable!='chr'))+
  geom_point(aes(marker,value,color=reorder(variable,value)),size=5)

agstudy
1

An alternative, if you want to get the indices of the matching rows:

df <- data.frame(marker = c('rs1000073', 'rs1000283', 'rs1000352', 'rs1000451'),
                 A_B = c(0.097328991622858, 0.194891573233045, 0.146693199204067, 0.116693199204067),
                 A_C = c(0.101954778294364, 0.0612572864045251, 0.166583183464355, 0.266583183464355),
                 B_C = c(0.0155614929271569, 0.0287416461802493, -0.00301950205401285, 0.00401950205401285))
# Indices of the rows where A_C > A_B > B_C
i <- which((df$A_C > df$A_B) & (df$A_B > df$B_C))
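
The index vector can then be used to pull out the matching rows. Here is a short sketch that rebuilds the data frame from the question's example rows so it runs on its own; `selected` is a name introduced only for illustration.

```r
# Example rows from the question (value columns only, plus the marker name)
df <- data.frame(
  marker = c("rs1000073", "rs1000283", "rs1000352", "rs1000451"),
  A_B = c(0.097328991622858, 0.194891573233045, 0.146693199204067, 0.116693199204067),
  A_C = c(0.101954778294364, 0.0612572864045251, 0.166583183464355, 0.266583183464355),
  B_C = c(0.0155614929271569, 0.0287416461802493, -0.00301950205401285, 0.00401950205401285),
  stringsAsFactors = FALSE
)

# Integer positions of the rows where A_C > A_B > B_C
i <- which((df$A_C > df$A_B) & (df$A_B > df$B_C))

# Use the positions to extract the rows themselves
selected <- df[i, ]

print(i)
print(selected$marker)
```

Because which() returns only the positions where the condition is TRUE, rows with NA in any compared column are skipped rather than returned as NA rows.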
WAF