How can I convert Affymetrix probes into gene Symbols?

Question

I have carried out Affymetrix data analysis with oligo and limma. Now I need to perform gene enrichment analysis on the upregulated and downregulated genes (on EnrichR, by searching the gene symbols). However, when I annotated my data (with the clariomshumantranscriptcluster.db library, as I am 100% sure that the data belong to human cells) and looked up the corresponding gene symbol for each probe ID, a lot of IDs gave "NA" values.

I have tried to use DAVID and the Affymetrix.com conversion tool, but both give no results. I am very confused after reading this on Affymetrix.com: "Annotations beginning with "TC" refer to the TIGR Mouse Gene Index. Annotations beginning with "HT" (Human) or "ET" (other species) are sequence IDs from The Expressed Gene Anatomy Database (EGAD)." This confuses me because the IDs that I have come in different formats: some start with "TC", some start with "HT", and some are just a number.

I am not sure if I am doing the query search wrong by selecting the wrong GeneChip or by selecting the wrong NetAffx search; or if I am supposed to carry out 3 different searches after separating the different ID formats between HT, TC and number.

Hi Martina! Welcome to SO. Please consider removing unnecessary tags such as genetic-algorithm and genetic-programming which can be misleading. — Stefano Barbi, Mar 24 '22 at 11:21

score 1 · Answer 1 · answered Mar 24 '22 at 11:11

Here is an approach that uses the biomaRt package to query the Ensembl database.

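library(biomaRt)

# Example probe IDs in HG-U133 Plus 2 format; replace with your own.
probes <- c("1007_s_at", "1053_at", "117_at",
            "121_at", "1255_g_at", "1294_at",
            "1316_at", "1320_at", "1405_i_at",
            "1431_at")

# Connect to the Ensembl human gene dataset.
mart <- biomaRt::useEnsembl(biomart="ensembl",
                            dataset="hsapiens_gene_ensembl")

# Return the gene symbol and Ensembl gene ID for each probe.
# The filter must name the array that the probe IDs come from.
biomaRt::getBM(attributes=c("hgnc_symbol", "ensembl_gene_id",
                            "affy_hg_u133_plus_2"),
               filters = "affy_hg_u133_plus_2",
               values = probes,
               mart = mart)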

##>    hgnc_symbol ensembl_gene_id affy_hg_u133_plus_2
##> 1         CCL5 ENSG00000274233           1405_i_at
##> 2         DDR1 ENSG00000234078           1007_s_at
##> 3         DDR1 ENSG00000215522           1007_s_at
##> 4         DDR1 ENSG00000230456           1007_s_at
##> 5         DDR1 ENSG00000137332           1007_s_at
##> 6       PTPN21 ENSG00000070778             1320_at
##> 7         RFC2 ENSG00000049541             1053_at
##> 8       GUCA1A ENSG00000048545           1255_g_at
##> 9     GUCA1ANB ENSG00000287363           1255_g_at
##> 10        THRA ENSG00000126351             1316_at
##> 11      CYP2E1 ENSG00000130649             1431_at
##> 12        DDR1 ENSG00000204580           1007_s_at
##> 13        CCL5 ENSG00000271503           1405_i_at
##> 14       HSPA6 ENSG00000173110              117_at
##> 15       HSPA7 ENSG00000225217              117_at
##> 16        PAX8 ENSG00000125618              121_at
##> 17        UBA7 ENSG00000182179             1294_at
##> 18     MIR5193 ENSG00000283726             1294_at
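
The output above contains several rows per probe, either because one gene has several Ensembl IDs (DDR1) or because one probe matches several genes (117_at). Before pasting symbols into an enrichment tool, it may help to deduplicate them. The following is a minimal base R sketch under stated assumptions: `res` is a hand-typed toy copy of a few rows from the output above (not a fresh query), and dropping ambiguous probes is only one possible policy, not a recommendation from biomaRt.

```r
# Toy copy of a few rows from the output above (hand-typed, not a new query).
res <- data.frame(
  hgnc_symbol = c("CCL5", "DDR1", "DDR1", "HSPA6", "HSPA7", "CCL5"),
  affy_hg_u133_plus_2 = c("1405_i_at", "1007_s_at", "1007_s_at",
                          "117_at", "117_at", "1405_i_at"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing or empty symbol.
res <- res[!is.na(res$hgnc_symbol) & res$hgnc_symbol != "", ]

# Keep one row per probe/symbol pair.
pairs <- unique(res[, c("affy_hg_u133_plus_2", "hgnc_symbol")])

# Probes that still map to more than one symbol are ambiguous.
n_symbols <- table(pairs$affy_hg_u133_plus_2)
ambiguous <- names(n_symbols)[n_symbols > 1]

# Assumed policy: keep unique symbols from unambiguous probes only.
gene_list <- unique(pairs$hgnc_symbol[!pairs$affy_hg_u133_plus_2 %in% ambiguous])
print(gene_list)
```

Whether to drop ambiguous probes or keep all of their symbols depends on the analysis; the sketch only makes the choice explicit.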

Thank you. However, the list of IDs I have contains different formats (some start with "HT", some with "TC" and some are 8-digit numbers), so I have to separate them by format first. Do you know which Ensembl datasets I have to use in the code above for the "TC", "HT" and 8-digit formats? — Martina M, Mar 24 '22 at 12:08
@MartinaM Which platform are you using? As for the other questions, I am quite sure that enrichment analysis is performed on a single species. Moreover, most gene sets have no "ortholog", e.g. between human and mouse. So I guess analyses of more than one species cannot be combined straightforwardly. — Stefano Barbi, Mar 24 '22 at 14:37
Thank you. I have used R this whole time, I just imported the clariom s. human annotation library and joined it with my gene expression data. I will find the most suitable libraries for the other kinds of files and annotate them separately. — Martina M, Mar 27 '22 at 22:49
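
For the separation by ID format discussed in the comments above, a minimal base R sketch could look like the following. The IDs are made-up placeholders, not real probe IDs, and `classify_id` is a hypothetical helper; the patterns only encode the three formats described in the question ("TC" prefix, "HT" prefix, 8-digit number). Which annotation package suits each group is not settled here.

```r
# Made-up placeholder IDs (hypothetical, not real probe IDs).
ids <- c("TC_example_1", "HT_example_2", "12345678",
         "TC_example_3", "other_id")

# Hypothetical helper: label each ID by the format it matches.
classify_id <- function(x) {
  ifelse(grepl("^TC", x), "TC",
  ifelse(grepl("^HT", x), "HT",
  ifelse(grepl("^[0-9]{8}$", x), "numeric8", "other")))
}

id_class <- classify_id(ids)

# One character vector of IDs per format, ready to annotate separately.
by_format <- split(ids, id_class)
print(by_format)
```

Each element of `by_format` can then be passed to whichever annotation source matches that ID type.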

score 0 · Answer 2 · answered Mar 26 '22 at 20:25

It depends on what you mean by "a lot of IDs". Some IDs refer to control regions and have no gene symbol associated with them, but there are not many of these. If there is no special reason to use limma & co., why not use the free Transcriptome Analysis Console (TAC) software from Affymetrix, which provides ID mapping natively along with several other functions?

https://www.thermofisher.com/it/en/home/life-science/microarray-analysis/microarray-analysis-instruments-software-services/microarray-analysis-software/affymetrix-transcriptome-analysis-console-software.html

Thank you. I tried to download it, but my computer cannot open it because it doesn't support Windows applications. I guess I could try with a virtual machine. — Martina M, Mar 28 '22 at 09:43
