I am looking for a fast way to extract a large number of rows from an even larger table. The top of my table is as follows:
> head(dbsnp)
snp gene distance
rs5 rs5 KRIT1 1
rs6 rs6 CYP51A1 1
rs7 rs7 LOC401387 1
rs8 rs8 CDK6 1
rs9 rs9 CDK6 1
rs10 rs10 CDK6 1
And the dimensions:
> dim(dbsnp)
[1] 11934948 3
I want to select the rows that have the rownames contained in a list:
> head(features)
[1] "rs1367830" "rs5915027" "rs2060113" "rs1594503" "rs1116848" "rs1835693"
> length(features)
[1] 915635
Not surprisingly, the straightforward way of doing this temptable = dbsnp[features,]
takes quite a long time.
I've been looking into ways to do this through the sqldf package in R. I thought that that might be faster. Unfortunately, I can't figure out how to select rows with certain rownames in SQL.
Thanks.