
I have converted the elements of a column into a set:

set_genes = set(df['genes'].unique())

I also have a table (a TSV file) in which one column contains values that match my set. I want to extract from this table the lines whose value in that column is in the set.

Example

print(set_genes)
{'IDA'}

print(file)

1    1      10  IDA     ID1
1    10     20  IDA     ID2
1    20     30  IDA     ID3
2    1      10  IDB     ID1
2    20     20  IDB     ID2
2    30     30  IDB     ID3 

Results

1    1      10  IDA     ID1
1    10     20  IDA     ID2
1    20     30  IDA     ID3
Abhyuday Vaish

2 Answers


If your TSV file has been loaded into a DataFrame called df, use the following, where column_name is the name of the column that contains the values in set_genes:

df.loc[df['column_name'].isin(set_genes)]
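
For completeness, here is a minimal sketch of loading the TSV and then filtering it. It assumes the file has no header row and that the gene IDs sit in the fourth column (index 3). The inline text stands in for the real file, whose path is not given in the question, and the names tsv_text, table and matches are only illustrative.

```python
import io
import pandas as pd

# Stand-in for the real file: replace io.StringIO(tsv_text) with the path to your TSV.
tsv_text = (
    "1\t1\t10\tIDA\tID1\n"
    "1\t10\t20\tIDA\tID2\n"
    "1\t20\t30\tIDA\tID3\n"
    "2\t1\t10\tIDB\tID1\n"
    "2\t20\t20\tIDB\tID2\n"
    "2\t30\t30\tIDB\tID3\n"
)

set_genes = {"IDA"}

# header=None is an assumption: the file has no header row, so columns are numbered 0..4.
table = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)

# Keep only the rows whose fourth column (index 3) is in the set.
matches = table.loc[table[3].isin(set_genes)]

# Print the matching rows back out as tab-separated text.
print(matches.to_csv(sep="\t", header=False, index=False))
```

If the real file does have a header row, drop header=None and use the column's name instead of 3.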

Sample example:

import pandas as pd

df = pd.DataFrame({'C1': [1,1,1,2,2,2], 'C2': [1, 10, 20, 1, 10 ,30], 'C3': [10,20,30,10,20,30], 'C4': ['IDA', 'IDA', 'IDA', 'IDB', 'IDB', 'IDB'], 'C5':['ID1', 'ID2', 'ID3','ID1', 'ID2', 'ID3']})
df
   C1  C2  C3   C4   C5
0   1   1  10  IDA  ID1
1   1  10  20  IDA  ID2
2   1  20  30  IDA  ID3
3   2   1  10  IDB  ID1
4   2  10  20  IDB  ID2
5   2  30  30  IDB  ID3
set_genes = {'IDA'}
df2 = df.loc[df['C4'].isin(set_genes)]
df2
   C1  C2  C3   C4   C5
0   1   1  10  IDA  ID1
1   1  10  20  IDA  ID2
2   1  20  30  IDA  ID3
Abhyuday Vaish

You can try something like this:

import pandas as pd
  
data = {
    'A': ['d', 'q', 's', 'a', 'a'],
    'genes': ['ID1', 'ID2', 'ID3', 'ID4', 'ID4'],
}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# print(df)
# Get the unique values of the 'genes' column
df.genes.unique()

The output is:

array(['ID1', 'ID2', 'ID3', 'ID4'], dtype=object)
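
To get from those unique values to the rows the question asks for, one possible continuation is sketched below. The second DataFrame, table, is a hypothetical stand-in for the TSV file, and its column names gene and value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['d', 'q', 's', 'a', 'a'],
    'genes': ['ID1', 'ID2', 'ID3', 'ID4', 'ID4'],
})

# unique() returns a NumPy array; wrapping it in set() gives the set from the question.
set_genes = set(df['genes'].unique())

# Hypothetical second table standing in for the TSV file.
table = pd.DataFrame({
    'gene': ['ID1', 'ID9', 'ID4'],
    'value': [10, 20, 30],
})

# Keep only the rows whose 'gene' value appears in set_genes.
filtered = table[table['gene'].isin(set_genes)]
print(filtered)
```

Here the row with ID9 is dropped because ID9 is not in set_genes.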
bara-elba