I have a long list of gene names and would like to map each name to its corresponding gene ID. I tried the R package org.Hs.eg.db, but it returns more IDs than names, which makes it hard to match the results back to the input, especially when the list is long.

Example of an input file (7 gene names):

RPS6KB2
PSME4
PDE4DIP
APMAP
TNRC18
PPP1R26
NAA20

Ideal output would be (7 IDs):

6199
23198
9659
57136
84629
9858
51126

Current output (8 IDs !!):

6199
23198
9659
57136
27320 *undesired output ID*
84629
9858
51126

Any suggestions on how to solve this issue, or on other simple tools for mapping gene names to IDs?

This is the code I'm using:

library("org.Hs.eg.db") #load the library

input <- read.csv("myfile.csv", header = TRUE, sep = ",") #read input file

GeneCol = as.character(input$Gene.name) #access the column that has gene names in my file

output = unlist(mget(x = GeneCol, envir = org.Hs.egALIAS2EG, ifnotfound=NA)) #get IDs

write.csv(output, file = "GeneIDs.csv") #write the list of IDs to a CSV file
Bayram Sarilmaz

1 Answer

Use mapIds() with the org.Hs.eg.db package. The reason you're seeing 8 IDs is that the mapping from gene names to Entrez IDs is not 1:1: your code looks the names up in org.Hs.egALIAS2EG, and a single alias can belong to more than one gene. You'll need to decide on a strategy for dealing with such multiple matches. Also, questions about Bioconductor packages are best asked on the Bioconductor support site, https://support.bioconductor.org.
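To see where the extra ID comes from, here is a minimal base-R sketch. The names (GENE_A and so on) and IDs are made up for illustration and are not real annotations; the list only stands in for the kind of value mget() returns, with one element per queried name.

```r
# Hypothetical lookup result (made-up names and IDs), shaped like the
# value of mget(): one list element per name. GENE_B matches two IDs.
hits <- list(GENE_A = "101", GENE_B = c("202", "203"), GENE_C = "304")

# unlist() flattens the list, so 3 names turn into 4 IDs.
# The repeated name gets a numeric suffix (GENE_B1, GENE_B2).
flat <- unlist(hits)

# Strategy 1: keep only the first ID for each name (one ID per name).
first_id <- vapply(hits, function(ids) ids[1], character(1))

# Strategy 2: keep every ID, but as one row per (name, id) pair,
# so each ID can still be traced back to its name.
pairs <- data.frame(
    name = rep(names(hits), lengths(hits)),
    id   = unlist(hits, use.names = FALSE),
    stringsAsFactors = FALSE
)
```

Either way the result stays aligned with the input: first_id has exactly one entry per name, and pairs carries the name next to each ID.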

Here's a complete example (note that it does not need your file 'myfile.csv' to run, so it is easy to reproduce):

library(org.Hs.eg.db)
symbol <- c(
    "RPS6KB2", "PSME4", "PDE4DIP", "APMAP", "TNRC18",
    "PPP1R26", "NAA20"
)
mapIds(org.Hs.eg.db, symbol, "ENTREZID", "SYMBOL")

The output is

> mapIds(org.Hs.eg.db, symbol, "ENTREZID", "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
RPS6KB2   PSME4 PDE4DIP   APMAP  TNRC18 PPP1R26   NAA20 
 "6199" "23198"  "9659" "57136" "84629"  "9858" "51126" 
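If the names really are aliases rather than official symbols, the multiVals argument of mapIds() is one way to choose a strategy for multiple matches. The sketch below assumes the "ALIAS" keytype is appropriate for your data and uses an illustrative data frame layout (the column names name, first_id and all_ids are my own); which IDs come back depends on the installed annotation version, so no output is shown.

```r
library(org.Hs.eg.db)

symbol <- c(
    "RPS6KB2", "PSME4", "PDE4DIP", "APMAP", "TNRC18",
    "PPP1R26", "NAA20"
)

# multiVals = "list" keeps every match, but still returns exactly
# one list element per input key.
all_ids <- mapIds(org.Hs.eg.db, keys = symbol, column = "ENTREZID",
                  keytype = "ALIAS", multiVals = "list")

# multiVals = "first" (the default) keeps a single ID per key.
first_id <- mapIds(org.Hs.eg.db, keys = symbol, column = "ENTREZID",
                   keytype = "ALIAS", multiVals = "first")

# One row per input name; multiple IDs are collapsed into one cell.
result <- data.frame(
    name     = symbol,
    first_id = unname(first_id),
    all_ids  = vapply(all_ids, paste, character(1), collapse = ";"),
    stringsAsFactors = FALSE
)
```

Other documented choices for multiVals include "asNA" (return NA for keys with several matches) and "filter" (drop those keys), depending on how strict you want to be.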
Martin Morgan