
I have downloaded an extensive dataset from NIH GEO and am attempting to convert the Ensembl gene IDs in the first column to MGI symbols.

The table, which I've named SOD, is shown below.

SOD Data - Total rows = 15,396

I used the following code:

setwd("C:/R/Project")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("biomaRt", version = "3.8")
library(BiocManager)
library(biomaRt)
SOD<-read.csv("Static Organoid Data.csv")
names_only<-data.frame(SOD[,1])
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Gene_list <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                   values     = names_only, 
                   mart       = mart)
View(Gene_list)

This outputs a table of Ensembl IDs and MGI symbols with over 55,000 rows, far more than the 15,396 rows in my data.

I have tried adding filter = "ensembl_gene_id" to the getBM call, but the output then has 0 rows and 0 columns.

What am I doing wrong here?

neilfws
mikephel

1 Answer


Your Ensembl IDs are versioned, meaning that they end in a .# suffix, whereas the Ensembl IDs in biomaRt do not. Because of that mismatch, the filter finds nothing. To fix this, you need to remove the .# at the end of the names as follows:

# Drop the trailing version suffix; pass the column as a character vector, not a data.frame
names_only <- sub("\\.[0-9]+$", "", as.character(SOD[, 1]))
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Gene_list <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                   filters    = "ensembl_gene_id",
                   values     = names_only,
                   mart       = mart)
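
To check the suffix stripping without querying Ensembl, here is a minimal base R sketch. The IDs and symbols in it are hypothetical placeholders, not values from your data, and gene_list only imitates the shape of a getBM() result. It also shows one way to line the symbols back up with your original rows, since getBM() does not return results in input order and omits IDs it cannot match.

```r
# Hypothetical versioned IDs standing in for SOD[, 1] (assumption, not real data).
ids <- c("ENSMUSG00000000001.4", "ENSMUSG00000000028.15", "ENSMUSG00000000037")

# Remove a trailing ".<digits>" version suffix; IDs without one are unchanged.
stripped <- sub("\\.[0-9]+$", "", ids)
print(stripped)

# Hypothetical lookup table shaped like a getBM() result.
# The symbols are placeholders, and the rows are deliberately out of order.
gene_list <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000000037", "ENSMUSG00000000001"),
  mgi_symbol       = c("SymbolB", "SymbolA"),
  stringsAsFactors = FALSE
)

# Align symbols to the original row order; unmatched IDs become NA.
symbols <- gene_list$mgi_symbol[match(stripped, gene_list$ensembl_gene_id)]
print(symbols)
```

If stripped still contains dots or looks like one long string starting with "c(", the input was probably a data.frame rather than a character vector.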
GordonShumway
  • Thanks for the advice! I threw your code directly in and now when I do 'head(Gene_list)' my output is '[1] ensembl_gene_id mgi_symbol <0 rows> (or 0-length row.names)' Something else silly I'm doing here? – mikephel Nov 21 '18 at 03:03
  • Can you add the output of dput(head(SOD[,1])) to your question so I can test the code? – GordonShumway Nov 21 '18 at 16:47