SNP-to-gene mapping with lowest p-value

Question

I am doing SNP-to-gene mapping, and after mapping SNPs to genes within 50 kb I have the following file. For example (snp, gene, pvalue):

 1. ars113  ap1 0.1 
 2. ars113  ap1 0.1 
 3. ars113  ap1 0.2 
 4. ars113  ap1 0.2
 5. ars113  ap2 0.1 
 6. ars113  ap2 0.2 
 7. ars114  ap6 0.1 
 8. ars114  ap6 0.3

How do I choose only the markers with the lowest p-value for each gene? Is there an easier way to do the whole process?

score 0 · Answer 1 · answered Oct 27 '16 at 18:58

This should do it. It returns the minimum p-value for each gene/SNP pair:

aggregate(pvalue ~ gene + snp, df, min)

Or, if you want a wide format (genes as rows, SNPs as columns):

tapply(df$pvalue, INDEX=list(df$gene, df$snp), min)
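As a quick check, the sketch below rebuilds the question's sample rows as a data frame and runs the `aggregate` call on it. The name `df` and the column names are assumed from the question's description. Dropping `snp` from the formula is a variation, not part of the call above, that gives a single minimum per gene.

```r
# Sample rows copied from the question; column names are assumed.
df <- data.frame(
  snp    = c("ars113", "ars113", "ars113", "ars113",
             "ars113", "ars113", "ars114", "ars114"),
  gene   = c("ap1", "ap1", "ap1", "ap1", "ap2", "ap2", "ap6", "ap6"),
  pvalue = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Minimum p-value for each gene/SNP pair.
per_pair <- aggregate(pvalue ~ gene + snp, df, min)

# Variation: minimum p-value per gene only (the SNP column is dropped).
per_gene <- aggregate(pvalue ~ gene, df, min)

print(per_pair)
print(per_gene)
```

Note that the per-gene variation no longer tells you which SNP carried the minimum, so it only fits when the SNP identity is not needed.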

emilliman5

score 0 · Answer 2 · answered Oct 27 '16 at 19:09

Read in the file as a data frame, then group it by gene and keep the rows with the minimum p-value for each gene.

library(dplyr)
library(readr)

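# Read the space-delimited file; it has no header row, so supply column names.
df <- read_delim("filename.txt", delim = " ", col_names = c("snp", "gene", "pvalue"))
# Within each gene, keep the row(s) whose p-value equals the gene's minimum (ties are all kept).
df %>% group_by(gene) %>% filter(pvalue == min(pvalue))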
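If dplyr is not available, or exactly one row per gene is wanted, a base R sketch along the following lines may work. It sorts by gene and p-value and keeps the first row of each gene, so tied rows (such as the two `ap1` rows at 0.1 in the question) collapse to one. The data frame is rebuilt from the question's sample, with assumed column names, and `ordered_df` and `best` are illustrative names.

```r
# Sample rows copied from the question; column names are assumed.
df <- data.frame(
  snp    = c("ars113", "ars113", "ars113", "ars113",
             "ars113", "ars113", "ars114", "ars114"),
  gene   = c("ap1", "ap1", "ap1", "ap1", "ap2", "ap2", "ap6", "ap6"),
  pvalue = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Sort so that the smallest p-value comes first within each gene.
ordered_df <- df[order(df$gene, df$pvalue), ]

# Keep only the first (lowest p-value) row for each gene.
best <- ordered_df[!duplicated(ordered_df$gene), ]

print(best)
```

Unlike the aggregate-style summaries, this keeps the `snp` column, so each gene stays paired with the marker that produced its lowest p-value.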
