
I am running an enrichment analysis with gseapy's enrichr on a list of genes, using the following code:

import gseapy

# glist is my list of gene symbols, defined earlier
enr_res = gseapy.enrichr(gene_list=glist[:5000],
                         organism='Mouse',
                         gene_sets=['GO_Biological_Process_2021'],
                         description='pathway',
                         # cutoff=0.5
                         )

The result looks like this:

enr_res.results.head(10)

[Screenshot: the first 10 rows of enr_res.results]

My question is: how do I get the full set of genes (the rightmost column, Genes, in the picture) for each individual pathway?

When I try the following code, I only get the genes that are already displayed. I added some clean-up steps to turn the output into a list that I can use for further analysis.

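x = 'fatty acid beta-oxidation (GO:0006635)'

# Render the 'Genes' entry of the matching row as a string
g_list = enr_res.results[enr_res.results.Term == x]['Genes'].to_string()

# Split on ';' (the delimiter is re-appended to each piece and stripped again below)
delimiter = ';'
g_list = [section + delimiter for section in g_list.split(delimiter) if section]

# Remove delimiters, spaces and dots
g_list = [s.replace(';', '') for s in g_list]
g_list = [s.replace(' ', '') for s in g_list]
g_list = [s.replace('.', '') for s in g_list]

# Drop the first character of the first entry (e.g. a single-digit row index printed by to_string())
first_gene = g_list[0:1]
first_gene = [sub[1:] for sub in first_gene]
g_list[0:1] = first_gene

# Convert to mouse-style capitalization, e.g. 'ACADM' -> 'Acadm'
for i in range(len(g_list)):
    g_list[i] = g_list[i].lower()
for i in range(len(g_list)):
    g_list[i] = g_list[i].capitalize()

g_list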

I suspect my approach is wrong and only returns the displayed genes rather than all of them. Does anybody have an idea how to get all the genes?

Greenline
  • Your code is slightly more verbose than it needs to be, but it should return a list of genes matching the condition. Could you provide a link to your results, or a part of them? It is too hard to help otherwise. – Vovin Jun 30 '22 at 13:50
  • This code gives me only the displayed genes shown above. The three dots at the end indicate that there are more genes, but I cannot read them out. – Greenline Jul 04 '22 at 08:56
  • I found a solution. It was due to a limit on the number of displayed characters in JupyterLab. However, I do not understand why this affected the actual list of genes returned. – Greenline Jul 07 '22 at 07:51
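The behaviour puzzled over in the last comment can be reproduced without gseapy. This is a minimal sketch using a made-up Genes value rather than real Enrichr output: Series.to_string() formats values for display, so it cuts each cell to display.max_colwidth characters (50 by default) and appends '...', while the stored value stays complete.

```python
import pandas as pd

# Made-up 'Genes' cell: 20 semicolon-separated placeholder symbols, well over 50 characters
long_genes = ";".join(f"GENE{i}" for i in range(20))
genes = pd.Series([long_genes], name="Genes")

# Display rendering: cut to display.max_colwidth (default 50) and ending in '...'
truncated = genes.to_string(index=False)

# Widening the option only inside this block renders the full value
with pd.option_context("display.max_colwidth", None):
    full = genes.to_string(index=False)

# The stored value itself was never shortened
stored = genes.iloc[0]

print(truncated.strip().endswith("..."))  # True
print(full.strip() == stored)             # True
```

pd.option_context restores the previous setting when the with block ends, so nothing else in the notebook is affected.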

1 Answer

import pandas as pd

pd.set_option('display.max_colwidth', 3000)

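This raises the maximum number of characters pandas displays per cell, and it solves the problem for me. :) The reason it matters: Series.to_string() returns the display representation of the data, so any cell longer than display.max_colwidth (50 characters by default) is cut off and ends in '...'. The stored Genes values were never truncated; only their string rendering was.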
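A possibly simpler route avoids the display machinery altogether: read the cell value itself and split it. This is a sketch under two assumptions: enr_res.results is an ordinary pandas DataFrame with Term and Genes columns (as used in the question), and the Genes cell holds semicolon-separated symbols. The table below is a hypothetical stand-in with placeholder genes, and genes_for_term is a hypothetical helper name.

```python
from typing import List

import pandas as pd

# Hypothetical stand-in for enr_res.results: only the two columns used below,
# with placeholder gene symbols instead of real Enrichr output
results = pd.DataFrame({
    "Term": [
        "fatty acid beta-oxidation (GO:0006635)",
        "hypothetical other term",
    ],
    "Genes": [
        ";".join(f"GENE{i}" for i in range(1, 31)),  # long enough to be truncated on display
        "GENEX;GENEY",
    ],
})


def genes_for_term(results: pd.DataFrame, term: str, mouse_case: bool = True) -> List[str]:
    """Hypothetical helper: return every gene listed for `term`."""
    # Boolean mask + .iloc[0] returns the stored string, not its display rendering
    cell = results.loc[results["Term"] == term, "Genes"].iloc[0]
    genes = [g.strip() for g in cell.split(";") if g.strip()]
    # Optional: 'GENE1' -> 'Gene1', like the capitalization step in the question
    return [g.capitalize() for g in genes] if mouse_case else genes


# Every term at once: Term -> list of (unmodified) gene symbols
all_genes = dict(zip(results["Term"], results["Genes"].str.split(";")))

term = "fatty acid beta-oxidation (GO:0006635)"
g_list = genes_for_term(results, term)
print(len(g_list))  # 30
print(g_list[:3])   # ['Gene1', 'Gene2', 'Gene3']
```

With the real output, pass enr_res.results in place of the stand-in table. Note that .iloc[0] raises an IndexError if the term is not present in the table.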

Greenline