
My data are gene IDs in a list-of-lists structure, like this:

>listoflists <- list(samp1 = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419", "ENSG00000000457"),
              samp2 = c("ENSG00000002834", "ENSG00000002919", "ENSG00000002933"),
              samp3 = c("ENSG00000000971", "ENSG00000001036", "ENSG00000001084", "ENSG00000001167"))

I'm trying to convert gene identifiers. When working with similar data in a dataframe structure, I've successfully used code like this:

>library(org.Hs.eg.db)
>gene_df$symbol <- mapIds(org.Hs.eg.db,keys=rownames(gene_df),column="SYMBOL",keytype="ENSEMBL",multiVals="first")

But now I'm working with a list of lists. I would like to keep the same structure, and I think the answer provided here should give me insight, but when I try to use a nested apply command like this:

>convertedLoL <- lapply(listoflists, function(x) lapply(listoflists[x], function(i)mapIds(org.Hs.eg.db,keys=listoflists[i],column="SYMBOL",keytype="ENSEMBL",multiVals="first")))
 Error in listoflists[[i]] : 
  attempt to select less than one element in get1index 

>convertedLoL <- lapply(listoflists, function(x) lapply(listoflists[x], function(i)mapIds(org.Hs.eg.db,keys=listoflists[[x]][[i]],column="SYMBOL",keytype="ENSEMBL",multiVals="first")))
 Error in listoflists[[x]] : no such index at level 1 

I keep getting errors. I think my issues stem from the fact that I don't fully comprehend how apply works and how to reference lists. Could someone help me?

EDIT

I thought I'd figured it out, but it still isn't quite right.

>convertedLoL <- lapply(listoflists, function(x) sapply(x, function(i)mapIds(org.Hs.eg.db,keys=i,column="SYMBOL",keytype="ENSEMBL",multiVals="first")))

will give me what might be a list of a list of a list. It's also REALLY slow. So I still need help...

strugglebus

1 Answer


You show a list of vectors in your example. You could simply do:

lapply(listoflists, function(x) mapIds(org.Hs.eg.db, keys=x, column="SYMBOL", keytype="ENSEMBL", multiVals="first"))
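
As for why the nested attempts failed: inside lapply(listoflists, function(x) ...), x is already the element itself (a character vector of IDs), not an index into listoflists, so indexing listoflists with it again does not work. A minimal base-R sketch of that behaviour, using a made-up lookup vector (toy_map, toy_lol and the IDs/symbols are placeholders, not real annotations) in place of the database call:

```r
# Hypothetical stand-in for the annotation lookup: names are IDs, values are symbols
toy_map <- c(id1 = "A", id2 = "B", id3 = "C")

# Same shape as listoflists: a named list of character vectors
toy_lol <- list(samp1 = c("id1", "id2"), samp2 = c("id3"))

# x is each character vector in turn, so it can be used directly as the keys
res <- lapply(toy_lol, function(x) unname(toy_map[x]))
str(res)
```

The outer list structure and its names are preserved, and the lookup is called once per sample rather than once per ID, which is also why this should be faster than the inner sapply over single IDs.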

Regarding speed: with many lists (or vectors with overlapping elements), you are likely better off mapping all IDs that are actually used to SYMBOL once, and then doing a lookup on the resulting data.frame, data.table, or named vector.

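library(org.Hs.eg.db)

# collect every ID used in the lists, once each
geneids <- unique(unlist(listoflists, use.names = FALSE))
# query the annotation database a single time
# (the AnnotationDbi:: prefix avoids clashes with dplyr::select)
key.table <- AnnotationDbi::select(org.Hs.eg.db, keys = geneids, columns = c("SYMBOL","ENSEMBL"),
    keytype = "ENSEMBL")
# named vector: names are ENSEMBL IDs, values are symbols
keys <- setNames(key.table$SYMBOL, key.table$ENSEMBL)

# look up each sample's IDs in the named vector
convertedLoL <- lapply(listoflists, function(x) keys[x])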
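
Two details of the named-vector lookup may be worth knowing: if an ID maps to several symbols, indexing by name returns the first match (similar in spirit to multiVals="first"), and IDs that are absent from the table come back as NA with an NA name. A small sketch with a made-up table (toy_table and its IDs/symbols are hypothetical) illustrates both, plus one way to keep the input IDs as names:

```r
# Hypothetical mapping table: id2 maps to two symbols, id4 is absent
toy_table <- data.frame(
  ENSEMBL = c("id1", "id2", "id2", "id3"),
  SYMBOL  = c("A", "B1", "B2", "C"),
  stringsAsFactors = FALSE
)
toy_keys <- setNames(toy_table$SYMBOL, toy_table$ENSEMBL)

toy_lol <- list(samp1 = c("id1", "id2"), samp2 = c("id3", "id4"))

# Plain lookup: duplicated names resolve to the first match, unknown IDs give NA
converted <- lapply(toy_lol, function(x) toy_keys[x])

# Variant that keeps the original IDs as names even when the symbol is NA
converted_named <- lapply(toy_lol, function(x) setNames(toy_keys[x], x))
```

Whether the NA names matter depends on what is done with the result downstream; the second variant is just one option.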

user12728748