
I am very new to GO analysis and a bit confused about how to apply it to my list of genes.

I have a list of genes (n=10):

gene_list

    SYMBOL ENTREZID                              GENENAME
1    AFAP1    60312   actin filament associated protein 1
2  ANAPC11    51529 anaphase promoting complex subunit 11
3   ANAPC5    51433  anaphase promoting complex subunit 5
4     ATL2    64225                     atlastin GTPase 2
5    AURKA     6790                       aurora kinase A
6    CCNB2     9133                             cyclin B2
7    CCND2      894                             cyclin D2
8    CDCA2   157313      cell division cycle associated 2
9    CDCA7    83879      cell division cycle associated 7
10  CDCA7L    55536 cell division cycle associated 7-like

and I simply want to find their function. It has been suggested that I use GO analysis tools, but I am not sure whether that is the correct way to do it. Here is my attempt:

library(org.Hs.eg.db)
x <- org.Hs.egGO

# Get the GO annotations mapped to the Entrez gene identifiers in gene_list
xx <- as.list(x[as.character(gene_list$ENTREZID)])

So I get a list, keyed by Entrez ID, with several GO terms assigned to each gene. For example:

> xx$`60312`
$`GO:0009966`
$`GO:0009966`$GOID
[1] "GO:0009966"

$`GO:0009966`$Evidence
[1] "IEA"

$`GO:0009966`$Ontology
[1] "BP"


$`GO:0051493`
$`GO:0051493`$GOID
[1] "GO:0051493"

$`GO:0051493`$Evidence
[1] "IEA"

$`GO:0051493`$Ontology
[1] "BP"

My question is: how can I find the function of each of these genes in a simpler way? I also wonder whether I am doing this right, because I want to add the function to gene_list as a function/GO column.

Thanks in advance,

Amit Kumar Gupta
user3576287
  • You can find some good information on https://www.bioconductor.org/help/workflows/ – JeremyS Feb 09 '16 at 09:36
  • Check out clusterProfiler. You can do a GO analysis and plot results in two lines. http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html – Leo Brueggeman Mar 29 '19 at 16:33

1 Answer


EDIT: There is a new Bioinformatics SE (currently in beta mode).


I hope I understand what you are aiming for here.

By the way, for bioinformatics-related topics you can also have a look at Biostars, which serves the same purpose as SO but for bioinformatics.

If you just want a list of the functions related to each gene, you can query a database such as Ensembl through the biomaRt Bioconductor package, which is an API for querying BioMart databases. You will need an internet connection to run the query, though.

Bioconductor provides packages for bioinformatics studies, and these packages generally come with good vignettes that walk you through the different steps of an analysis (and even highlight how you should design your data or point out some of the pitfalls).

In your case, this comes directly from the biomaRt vignette, task 2 in particular.

Note: there are slightly quicker ways than the one shown below:

# load the library
library("biomaRt")

# I prefer Ensembl, so that is the one I will query, but you can
# query other databases; try listMarts()
ensembl=useMart("ensembl")

# as it seems that you are looking for human genes:
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
# if you want other model organisms have a look at:
#listDatasets(ensembl)

You need to build your query from your list of Entrez IDs. To see which filters you can query on:

filters = listFilters(ensembl)

Then you want to retrieve attributes: the GO ID and its description. To see the list of available attributes:

attributes = listAttributes(ensembl)
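The attribute table is long, so it can help to search it by keyword rather than read it top to bottom. Here is a minimal sketch of that idea. The small data frame is a made-up stand-in for the result of listAttributes(ensembl), using the `name` and `description` columns it returns; the exact description strings are an assumption and may differ between Ensembl releases.

```r
# Mock stand-in for listAttributes(ensembl); the real table is much longer
# and its description text may differ (assumption for illustration only).
attributes <- data.frame(
  name        = c("ensembl_gene_id", "entrezgene", "go_id", "name_1006"),
  description = c("Ensembl Gene ID", "EntrezGene ID",
                  "GO Term Accession", "GO Term Name"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose description mentions a GO term
go_attributes <- attributes[grepl("go term", attributes$description,
                                  ignore.case = TRUE), ]
print(go_attributes)
```

This is how you would spot that `name_1006` is the attribute holding the GO term name.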

For you, the query would look something like:

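goids = getBM(
        # entrezgene tells you which gene each row belongs to; go_id is the
        # GO ID and name_1006 is the identifier of the 'GO term name'.
        # (Recent Ensembl releases call the Entrez attribute and filter
        # 'entrezgene_id'; check listAttributes(ensembl) if this fails.)
        attributes=c('entrezgene','go_id', 'name_1006'),
        filters='entrezgene',
        values=gene_list$ENTREZID,
        mart=ensembl)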

The query itself can take a while.

Then you can always collapse the information into two columns (but I wouldn't recommend it for anything other than reporting purposes).

Go.collapsed <- Reduce(rbind, lapply(gene_list$ENTREZID, function(x) {
    # all GO rows returned for this gene
    tempo <- goids[goids$entrezgene == x, ]
    data.frame('ENTREZGENE' = x,
               'Go.ID' = paste(tempo$go_id, collapse = ' ; '),
               'GO.term' = paste(tempo$name_1006, collapse = ' ; '))
}))
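Since the goal was to add the GO information to gene_list as extra columns, the collapse step can also be sketched with base R's aggregate() followed by merge(). This is only a sketch under assumptions: `goids` and `gene_list` below are tiny mock data frames with the same column names as in the post, the term names are placeholders, and the third gene is deliberately left without GO rows in the mock just to show what happens then.

```r
# Mock stand-ins (hypothetical values) with the same column names as above
gene_list <- data.frame(
  SYMBOL   = c("AFAP1", "AURKA", "CCND2"),
  ENTREZID = c(60312, 6790, 894),
  stringsAsFactors = FALSE
)
goids <- data.frame(
  entrezgene = c(60312, 60312, 6790),
  go_id      = c("GO:0009966", "GO:0051493", "GO:placeholder"),
  name_1006  = c("term A", "term B", "term C"),
  stringsAsFactors = FALSE
)

# One row per gene, with GO IDs and term names pasted together
collapsed <- aggregate(goids[c("go_id", "name_1006")],
                       by = list(entrezgene = goids$entrezgene),
                       FUN = paste, collapse = " ; ")

# Left join onto the gene list; genes without GO rows get NA
annotated <- merge(gene_list, collapsed,
                   by.x = "ENTREZID", by.y = "entrezgene", all.x = TRUE)
print(annotated)
```

Using all.x = TRUE keeps every gene from gene_list in the result, even when the query returned nothing for it.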


Edit:

If you want to query a past version of the Ensembl database:

ens82<-useMart(host='sep2015.archive.ensembl.org',
               biomart='ENSEMBL_MART_ENSEMBL',
               dataset='hsapiens_gene_ensembl')

and then the query would be:

goids = getBM(attributes=c('entrezgene','go_id', 'name_1006'),  
        filters='entrezgene',values=gene_list$ENTREZID, 
        mart=ens82)
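If the current service is sometimes unavailable, one option is to try the current mart first and fall back to the archive only when the query errors. Below is a rough sketch of that pattern with tryCatch(). `query_with_fallback` and the mock objects are hypothetical names introduced here for illustration; they are not part of biomaRt, and the demo runs without any network access.

```r
# Try each mart in order and return the first result that does not error.
# `query_fun` is a hypothetical wrapper: it takes a mart and returns a data frame.
query_with_fallback <- function(query_fun, marts) {
  for (mart_name in names(marts)) {
    result <- tryCatch(
      query_fun(marts[[mart_name]]),
      error = function(e) {
        message("Query against '", mart_name, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) return(result)
  }
  stop("All marts failed")
}

# Mock demo: the first "mart" always fails, the second one answers
mock_marts <- list(current = "down", archive = "up")
mock_query <- function(mart) {
  if (identical(mart, "down")) stop("Request to BioMart web service failed.")
  data.frame(entrezgene = 60312, go_id = "GO:0009966")
}
res <- query_with_fallback(mock_query, mock_marts)
print(res)
```

With real objects, `query_fun` would wrap the getBM() call shown above and `marts` would be something like `list(current = ensembl, archive = ens82)`. Note that creating the mart object itself may also fail while the service is down, which this sketch does not handle.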


However, if you have a GO enrichment analysis in mind, your list of genes is too short.
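As for the nested list you already obtained from org.Hs.egGO, it can be flattened into a table with one row per gene and GO term, which works offline. This is a minimal sketch, assuming each mapped entry is a list of terms with GOID, Evidence and Ontology fields as in your printed output, and that entries without any mapping are not lists. The `xx` below is a mock copying the values from the question plus one hypothetical unmapped key.

```r
# Mock of the structure printed in the question, plus one unmapped entry
xx <- list(
  `60312` = list(
    `GO:0009966` = list(GOID = "GO:0009966", Evidence = "IEA", Ontology = "BP"),
    `GO:0051493` = list(GOID = "GO:0051493", Evidence = "IEA", Ontology = "BP")
  ),
  hypothetical_unmapped = NA
)

# Drop entries that carry no GO terms (assumed to be non-list values)
mapped <- Filter(is.list, xx)

# Build one row per (gene, GO term) pair
go_table <- do.call(rbind, lapply(names(mapped), function(id) {
  terms <- mapped[[id]]
  data.frame(
    ENTREZID = id,
    GOID     = vapply(terms, function(term) term$GOID, character(1)),
    Evidence = vapply(terms, function(term) term$Evidence, character(1)),
    Ontology = vapply(terms, function(term) term$Ontology, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}))
print(go_table)
```

This gives GO IDs only; the term names would still have to be looked up separately.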
Mitra
  • There are probably better and more elegant ways to collapse the information with `dplyr` – Mitra Feb 09 '16 at 10:55
  • Thanks a lot! I still have a problem with getBM because the BioMart web services are temporarily down! :/ "Error in value[[3L]](cond) : Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down." – user3576287 Feb 09 '16 at 14:52
  • @user3576287 Actually, yes, it seems so. If you don't mind using a slightly older version of Ensembl (i.e. the version released in September 2015 instead of December 2015), it works. To retrieve the previous version: `ens82<-useMart(host='sep2015.archive.ensembl.org', biomart='ENSEMBL_MART_ENSEMBL', dataset='hsapiens_gene_ensembl')` And then in the query you should replace `mart=ensembl` with `mart=ens82` – Mitra Feb 09 '16 at 15:57
  • BioMart doesn't seem to have been updated since October 2015. – Parsa Jul 17 '17 at 15:45
  • @par, I don't know what you mean. For either GRCh37 or GRCh38 (the default), the latest version accessible through the R package is 'Ensembl Genes 89', which was published in May 2017. (Basically, the same version as displayed by the website itself: http://www.ensembl.org/biomart/martview/) – Mitra Jul 18 '17 at 11:17