I have two files. One contains a list of genes that I'm interested in. The other contains genes and the pathways they are associated with. The first list looks like this:
Solyc08g062250
Solyc02g069270
Solyc07g064990
Solyc09g065800
Solyc02g077620
Solyc01g104400
Solyc02g065290
Solyc02g090220
The second file lists genes and the "pathways" they belong to (this is only a sample; the real file is much larger and covers many pathways and genes):
Solyc10g008120 1,3,5-trimethoxybenzene biosynthesis
Solyc02g069920 1,4-dihydroxy-2-naphthoate biosynthesis I
Solyc04g005180 1,4-dihydroxy-2-naphthoate biosynthesis I
Solyc04g005190 1,4-dihydroxy-2-naphthoate biosynthesis I
Solyc04g005200 1,4-dihydroxy-2-naphthoate biosynthesis I
Solyc05g005180 1,4-dihydroxy-2-naphthoate biosynthesis I
Solyc06g071030 1,4-dihydroxy-2-naphthoate biosynthesis I
The catch is that several of my genes fall into more than one pathway. For each gene ID in my input set, I need all of the pathways it belongs to listed next to that gene ID.
I was originally trying to use the command
c<-b[b$GeneID %in% a$GeneIDs,]
where b was my pathway/GeneID table and a was the list of gene IDs I wanted, but it only returns one pathway per gene, and I know a number of these genes fall into several pathways.
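To make the goal concrete, here is a minimal sketch in base R of the kind of output I'm after. Everything in it is made up for illustration: the toy gene IDs, the pathway names, and the column names `GeneID` and `Pathway` are assumptions, not my real data. It keeps the matching rows, collapses each gene's pathways into one string, and keeps genes that have no pathway at all.

```r
# Toy data only: gene IDs, pathway names and column names are hypothetical.
genes <- data.frame(GeneID = c("geneA", "geneB", "geneC"),
                    stringsAsFactors = FALSE)
pathways <- data.frame(
  GeneID  = c("geneA", "geneA", "geneB", "geneD"),
  Pathway = c("pathway X", "pathway Y", "pathway X", "pathway Z"),
  stringsAsFactors = FALSE
)

# Keep every pathway row whose gene is in the list of interest.
hits <- pathways[pathways$GeneID %in% genes$GeneID, ]

# Collapse all pathways for the same gene into one "; "-separated string.
per_gene <- aggregate(Pathway ~ GeneID, data = hits,
                      FUN = function(p) paste(unique(p), collapse = "; "))

# Join back to the gene list so genes with no pathway show up as NA.
result <- merge(genes, per_gene, by = "GeneID", all.x = TRUE)
print(result)
```

In this sketch `geneA` ends up on a single row with both of its pathways, and `geneC` is kept with `NA` because it has no entry in the pathway table.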
I'm entirely new to programming, so I've been having trouble with this. Any help would be appreciated! I don't know how to search for this on the Internet because I don't know what this operation is called.