I have a dataframe with more than 300 unique samples. Each sample has two columns of related information, and for each sample I'd like to filter one of those columns for 34 specific values. I've included a screenshot of the data to help visualize the problem. Basically, I want to generate a new dataframe containing only the information for the 34 values that I specify. My apologies if this question is difficult to understand; I hope the screenshot helps to define the problem better.
In this screenshot, each "sampleID_r.variant" column needs to be filtered for specific values that I have in a separate dataframe. There are only 34 that I'm interested in. For each match, I'd also like to keep the corresponding value to its left in the "sampleID_reads" column, paired with it like a dictionary. If anyone can help with this, I'd greatly appreciate it. Thank you so much.
EDIT: The original dataframe is in the following format:
sampleID_reads | sampleID_r.variant |
---|---|
1 | r.79_80ins79+1_79+76 |
64 | r.79_80ins79+10857_79+10938 |
53 | r.79_80ins80-13725_80-13587 |
72 | r.79_80ins80-5488_80-5435 |
16 | r.79_80ins79+2861_79+2900 |
The 34 values of interest are in the following format:
r_dot |
---|
r.646_729del |
r.-19_-18ins-19+428_-19+535 |
r.-25_-20del |
r.4186_4188del |
r.5333_5406del |
...so on and so forth |
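
To make the goal concrete, here is a minimal sketch of one possible approach with pandas. It rests on several assumptions: that every sample contributes a pair of columns named `<sample>_reads` and `<sample>_r.variant`, that the 34 values live in a column called `r_dot`, and that the sample names (`S1`, `S2`), the toy read counts, and the variable names (`wide`, `targets`, `filtered`, `by_sample`) are placeholders made up for illustration, not the real data.

```python
import pandas as pd

# Hypothetical wide frame: one "<sample>_reads" / "<sample>_r.variant" pair per sample.
wide = pd.DataFrame({
    "S1_reads": [1, 64, 53],
    "S1_r.variant": ["r.646_729del", "r.79_80ins79+10857_79+10938", "r.-25_-20del"],
    "S2_reads": [72, 16, 5],
    "S2_r.variant": ["r.79_80ins80-5488_80-5435", "r.646_729del", "r.79_80ins79+2861_79+2900"],
})

# Hypothetical stand-in for the separate dataframe holding the values of interest.
targets = pd.DataFrame({"r_dot": ["r.646_729del", "r.-25_-20del", "r.4186_4188del"]})

suffix = "_r.variant"
wanted = set(targets["r_dot"])
pieces = []

for col in wide.columns:
    if not col.endswith(suffix):
        continue  # skip the "_reads" columns; they are picked up alongside their variant column
    sample = col[: -len(suffix)]
    # Take the reads/variant pair for this sample and drop padding rows, if any.
    pair = wide[[f"{sample}_reads", col]].dropna()
    pair.columns = ["reads", "r_variant"]
    # Keep only rows whose variant is one of the values of interest.
    pair = pair[pair["r_variant"].isin(wanted)].assign(sample=sample)
    pieces.append(pair[["sample", "reads", "r_variant"]])

# Long-format result: one row per (sample, matching variant).
filtered = pd.concat(pieces, ignore_index=True)

# Optional dictionary view: {sample: {variant: reads}}.
by_sample = {
    sample: dict(zip(group["r_variant"], group["reads"]))
    for sample, group in filtered.groupby("sample")
}

print(filtered)
print(by_sample)
```

Stacking the per-sample pairs into one long dataframe with a `sample` column is a design choice in this sketch; it keeps each variant next to its reads value and makes the dictionary view a simple `groupby`.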