I have one data set that includes a list of all the metabolite IDs from KEGG, and another data set with the metabolite IDs I have found in my samples. The goal is to use the IDs I've found to select the matching rows from the KEGG data frame, and only those rows.
This may seem trivial, but my data contains only the IDs, not the molecule names, while the KEGG data includes the names. I need the molecule names for further research, and figuring this out would save me hours of time. I've tried to use the filter and mutate commands, as shown in my code below. I am pretty new to R, so maybe this approach can work and I've just botched it somewhere.
For example, the two data frames look like this:
kegg_data <- data.frame("ID" = c("C00001" , "C00002" , "C00003" , "C00004"),
                        "molecule" = c("H2O" , "ATP" , "NAD" , "NADH"))
my_data <- data.frame("ID" = c("C00002", "C00004"))
In practice, both data sets contain many more IDs.
Here is the code I have tried:
library(dplyr)
your_kegg_IDs <- kegg_data %>%
  filter(my_data == my_data$ID)
Running the filter command gives this error: Error in filter_impl(.data, quo) : Evaluation error: level sets of factors are different.
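Two things likely go wrong in the filter call. First, == compares values position by position (recycling the shorter vector), so it does not ask "is this KEGG ID anywhere in my list?"; the membership test in R is %in%. Second, this particular message usually appears when == compares two factors with different level sets, which is plausible here because before R 4.0 data.frame() converted character columns to factors by default. The sketch below assumes dplyr is installed and rebuilds the example data with stringsAsFactors = FALSE (the default from R 4.0 on), then filters with %in%:

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE keeps the IDs as
# plain character vectors instead of factors on older R versions
kegg_data <- data.frame(ID = c("C00001", "C00002", "C00003", "C00004"),
                        molecule = c("H2O", "ATP", "NAD", "NADH"),
                        stringsAsFactors = FALSE)
my_data <- data.frame(ID = c("C00002", "C00004"),
                      stringsAsFactors = FALSE)

# %in% returns TRUE for every KEGG row whose ID appears anywhere in my_data$ID,
# so filter() keeps exactly those rows along with their molecule names
your_kegg_IDs <- kegg_data %>%
  filter(ID %in% my_data$ID)

print(your_kegg_IDs)
```

With the example data this keeps the rows for C00002 (ATP) and C00004 (NADH). The base R equivalent without dplyr is kegg_data[kegg_data$ID %in% my_data$ID, ].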
Honestly, I do not know if I am on the right track here. Any help is appreciated. Ideally, I would end up with a data frame that contains only the IDs I've found, each with its molecule name.
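Because the desired result is "my IDs plus their molecule names", a join on the shared ID column is another natural fit. This is a minimal sketch using base R's merge(), so no extra packages are assumed; it rebuilds the question's example data and assumes both data frames name the key column ID:

```r
# Example data from the question
kegg_data <- data.frame(ID = c("C00001", "C00002", "C00003", "C00004"),
                        molecule = c("H2O", "ATP", "NAD", "NADH"),
                        stringsAsFactors = FALSE)
my_data <- data.frame(ID = c("C00002", "C00004"),
                      stringsAsFactors = FALSE)

# Inner join on ID (merge's default, all = FALSE): keeps only IDs present in
# both data frames and attaches the matching molecule name to each one.
# The result is sorted by ID because merge() uses sort = TRUE by default.
found_molecules <- merge(my_data, kegg_data, by = "ID")

print(found_molecules)
```

If you would rather keep every ID from my_data, including ones with no KEGG entry, merge(my_data, kegg_data, by = "ID", all.x = TRUE) fills molecule with NA for the unmatched IDs. In dplyr the corresponding calls are inner_join(my_data, kegg_data, by = "ID") and left_join(my_data, kegg_data, by = "ID"). A join also carries over any other columns in my_data (for example, measurements per sample), which a plain filter on kegg_data would not.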