
I am doing SNP-to-gene mapping. After mapping each SNP to the genes within 50 kb of it, I have a file like the following (columns: snp, gene, pvalue):

    ars113 ap1 0.1
    ars113 ap1 0.1
    ars113 ap1 0.2
    ars113 ap1 0.2
    ars113 ap2 0.1
    ars113 ap2 0.2
    ars114 ap6 0.1
    ars114 ap6 0.3

How do I keep only the marker with the lowest p-value for each gene? Is there an easier way to do the whole process?

Prradep

2 Answers


This should do it; it returns the minimum p-value for each gene/SNP combination:

# df: data frame with columns snp, gene, pvalue
aggregate(pvalue ~ gene + snp, data = df, FUN = min)

Or, if you want a wide format (genes as rows, SNPs as columns, NA where a gene/SNP pair does not occur):

tapply(df$pvalue, INDEX=list(df$gene, df$snp), min)
emilliman5
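Note that `pvalue ~ gene + snp` groups by each gene/SNP pair, so a gene mapped to several SNPs still returns one row per SNP. A base-R sketch (an alternative, not part of the answer above) that keeps only the row(s) with the smallest p-value for each gene uses `ave()` to put the per-gene minimum next to every row. It rebuilds the question's example as `df`; with the real file you would read it in instead:

```r
# Example data from the question (columns snp, gene, pvalue)
df <- data.frame(
  snp    = rep(c("ars113", "ars114"), times = c(6, 2)),
  gene   = rep(c("ap1", "ap2", "ap6"), times = c(4, 2, 2)),
  pvalue = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Per-gene minimum p-value, repeated on every row belonging to that gene
gene_min <- ave(df$pvalue, df$gene, FUN = min)

# Keep the row(s) that reach their gene's minimum; tied rows are all kept
best <- df[df$pvalue == gene_min, ]
print(best)
```

Ties are kept, so `ap1` appears twice here because the input contains the line `ars113 ap1 0.1` twice; `unique(best)` drops such exact duplicate rows.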

Read the file into a data frame, group it by gene, and keep the row(s) with the minimum p-value for each gene.

library(dplyr)
library(readr)

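# Read the space-delimited file (no header line) and name the three columns
df <- read_delim("filename.txt", delim = " ", col_names = c("snp", "gene", "pvalue"))

# Within each gene, keep the row(s) whose p-value equals that gene's minimum
df %>% group_by(gene) %>% filter(pvalue == min(pvalue))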
yeedle
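The `filter()` approach keeps every tied row, so with the example data `ap1` comes back twice (the two identical `ars113 ap1 0.1` lines). If exactly one marker per gene is wanted, one base-R sketch (an alternative, not part of the answer above) sorts by gene and then by p-value and keeps the first row of each gene. Because `order()` is stable, a tie is resolved by keeping whichever row came first in the input, which is an arbitrary choice. The example data is rebuilt here as `df`, and `best_one` is just an illustrative name:

```r
# Example data from the question (columns snp, gene, pvalue)
df <- data.frame(
  snp    = rep(c("ars113", "ars114"), times = c(6, 2)),
  gene   = rep(c("ap1", "ap2", "ap6"), times = c(4, 2, 2)),
  pvalue = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Sort by gene, then by ascending p-value within each gene
sorted <- df[order(df$gene, df$pvalue), ]

# The first row of each gene now carries its smallest p-value; drop the rest
best_one <- sorted[!duplicated(sorted$gene), ]
print(best_one)
```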