Apologies if a similar question has been asked before, I was unable to find one, possibly because of wording of the question.
Some current sample data looks like this, where the first column is a list of identifiers (genes) and the second column is a set of descriptors (gene ontology IDs):
Gene Gene_Ontology_ID
Gene1 GO1, GO2, GO4, GO6
Gene2 GO2, GO3, GO4
Gene3 GO5, GO7
I am wondering if there is an efficient way to transform a large table formatted similarly so that the "Gene_Ontology_ID" column now serves as the indentifying column, and the "Gene" column is now a list of genes with that Gene_Ontology_ID, like this:
Gene_Ontology_ID Gene
GO1 Gene1
GO2 Gene1,Gene2
GO3 Gene2
GO4 Gene1,Gene2
GO5 Gene3
GO6 Gene1
GO7 Gene3
Is there a solution to this, preferably using Unix, Python, or R? Any help would be greatly appreciated, thank you.