I have used lapply with biomaRt to extract the homologues for three species. I also need to extract the target IDs for all of the homologues, and I was hoping to use lapply for that step as well so that I don't have to repeat the call for each species. The code I have so far is below:

Load biomaRt:

library(biomaRt)

Set the species vector

species <- c("hsapiens", "mmusculus", "ggallus")

Make a connection to ensembl for all species

ensembl_hsapiens <- useMart("ensembl", 
                            dataset = "hsapiens_gene_ensembl")
ensembl_mmusculus <- useMart("ensembl", 
                         dataset = "mmusculus_gene_ensembl")
ensembl_ggallus <- useMart("ensembl",
                           dataset = "ggallus_gene_ensembl")  
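
The three connections differ only in the dataset name, so they could also be built in one pass and kept in a list keyed by species. The helper below is a sketch rather than part of the original code: `connect_all` and its `connect` argument are hypothetical names, and `connect` defaults to `biomaRt::useMart` only so that it can be swapped for a stand-in when there is no network access.

```r
# Sketch: build one connection per species and return them as a named list.
# `connect_all` and `connect` are illustrative names, not biomaRt functions.
connect_all <- function(species,
                        connect = function(d) biomaRt::useMart("ensembl", dataset = d)) {
  # Dataset names follow the "<species>_gene_ensembl" pattern used above;
  # setNames keeps the species prefix as the element name.
  datasets <- setNames(paste0(species, "_gene_ensembl"), species)
  # lapply preserves the names of its input, so the result is keyed by species.
  lapply(datasets, connect)
}
```

Calling `ensembl <- connect_all(species)` would then allow lookups such as `ensembl[["mmusculus"]]` in place of separate `ensembl_mmusculus`-style variables.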

Get the human genes

hsapien_PC_genes <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), 
                          filters = "biotype", 
                          values = "protein_coding", 
                          mart = ensembl_hsapiens)

ensembl_gene_ID <- hsapien_PC_genes$ensembl_gene_id

Get the homologues, excluding human (already retrieved above) by using species[-1]

# Query the human mart once per non-human species; s is the species prefix
all_homologues <- lapply(species[-1], function(s) getBM(attributes = c("ensembl_gene_id", 
                                                                   "external_gene_name", 
                                                                   paste0(s, c("_homolog_ensembl_gene",
                                                                               "_homolog_associated_gene_name"))),
                                                    filters = "ensembl_gene_id",
                                                    values = ensembl_gene_ID,
                                                    mart = ensembl_hsapiens))

# Name the elements so they can be looked up by species, e.g. all_homologues[["mmusculus"]]
names(all_homologues) <- species[-1]
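
Two base R behaviours matter for this step: indexing past the end of a vector pads the result with NA, and lapply over an unnamed character vector returns an unnamed list, so lookups by species name return NULL unless names are attached. The offline sketch below illustrates both with plain strings in place of getBM results; the variable names are made up for the illustration.

```r
species <- c("hsapiens", "mmusculus", "ggallus")

# Indexing past the end of a 3-element vector pads the result with NA
past_end <- species[2:9]

# Negative indexing drops the first element whatever the vector length
non_human <- species[-1]

# lapply over an unnamed character vector gives an unnamed list
unnamed <- lapply(non_human, function(s) paste0(s, "_homolog_ensembl_gene"))

# Option 1: attach the names afterwards
named <- setNames(unnamed, non_human)

# Option 2: sapply with simplify = FALSE names the list by the input strings
also_named <- sapply(non_human,
                     function(s) paste0(s, "_homolog_ensembl_gene"),
                     simplify = FALSE, USE.NAMES = TRUE)
```

With names attached, `named[["mmusculus"]]` retrieves the mouse element directly, which is what the per-species lookups further down rely on.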

This is where I run into problems: I don't know how to subset the ensembl_gene_id for each species inside lapply. What I have tried so far is below:

target_id <- list()

target_id <- lapply(species, function(s) getBM(attributes = c("ensembl_gene_id", 
                                             "external_gene_name", 
                                             "hsapiens_homolog_associated_gene_name", 
                                             "hsapiens_homolog_perc_id"), 
                              filters = "ensembl_gene_id", 
                              values = c(all_homologues[[]][["ensembl_gene_id"]]), 
                              mart = get(paste0("ensembl_", s))))

I can get it to work the normal way like this:

target_id[["mmusculus"]] <- getBM(attributes = c("ensembl_gene_id", 
                                             "external_gene_name", 
                                             "hsapiens_homolog_associated_gene_name", 
                                             "hsapiens_homolog_perc_id"), 
                              filters = "ensembl_gene_id", 
                              values = c(all_homologues[["mmusculus"]]$ensembl_gene_id), 
                              mart = ensembl_mmusculus)

target_id[["ggallus"]] <- getBM(attributes = c("ensembl_gene_id", 
                                                 "external_gene_name", 
                                                 "hsapiens_homolog_associated_gene_name", 
                                                 "hsapiens_homolog_perc_id"), 
                                  filters = "ensembl_gene_id", 
                                  values = c(all_homologues[["ggallus"]]$ensembl_gene_id), 
                                  mart = ensembl_ggallus)

But this is not as efficient as getting R to change the species for me automatically.

Jack Dean
  • Is this a double post from a week ago? Please don't make duplicate posts about essentially the same topic. – Amar Apr 30 '18 at 10:17
  • I appreciate the help, but although the two questions are similar, they are asking for different things. In the previous question, I asked what the best way to change the species name in my code would be. I now know that the best way is to use lapply; what I don't know is how to change the species name for both the values and the mart at the same time. I have tried using paste0, but I think I'm having an issue subsetting from the list. I hope this makes my question clearer. – Jack Dean Apr 30 '18 at 14:15

1 Answer

I have found a solution. Note that it relies on all_homologues and a list of marts, ensembl, both being named by species:

target_id <- lapply(species[-1], function(s) getBM(attributes = c("ensembl_gene_id", 
                                             "external_gene_name", 
                                             "hsapiens_homolog_associated_gene_name", 
                                             "hsapiens_homolog_perc_id"), 
                              filters = "ensembl_gene_id", 
                              values = all_homologues[[paste0(s)]][paste0(s, "_homolog_ensembl_gene")], 
                              mart = ensembl[[paste0(s)]]))
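
The same idea can be wrapped in a small function that looks up both the IDs and the mart by species name and returns a named list. This is only a sketch under assumptions: `fetch_targets` and its `fetch` argument are hypothetical names, `homologues` and `marts` are assumed to be lists named by species, and dropping empty, missing, and duplicate IDs before the query is an added step rather than part of the solution above.

```r
# Sketch: query each species' own mart with its homologue IDs.
# `fetch` defaults to biomaRt::getBM but can be replaced by a stand-in offline.
fetch_targets <- function(species, homologues, marts, fetch = biomaRt::getBM) {
  out <- lapply(species, function(s) {
    # Pull the homologue ID column for this species as a plain vector
    ids <- homologues[[s]][[paste0(s, "_homolog_ensembl_gene")]]
    # Assumption: genes without a homologue appear as "" or NA; drop those
    ids <- unique(ids[!is.na(ids) & nzchar(ids)])
    fetch(attributes = c("ensembl_gene_id",
                         "external_gene_name",
                         "hsapiens_homolog_associated_gene_name",
                         "hsapiens_homolog_perc_id"),
          filters = "ensembl_gene_id",
          values = ids,
          mart = marts[[s]])
  })
  # Name the results so they can be retrieved by species
  setNames(out, species)
}
```

A call such as `target_id <- fetch_targets(species[-1], all_homologues, ensembl)` would then give `target_id[["mmusculus"]]` and `target_id[["ggallus"]]`.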
Jack Dean