Getting indeces of rows given query of header names and respective values in Pandas

Question

I am given a dataframe, a subset of headers and values of those columns. I am interested in finding the indices of the dataframe that contain the values of interest in the columns of interest without explicitly typing in the column name and value; i.e. using df.index[df['BoolCol'] == VALUE and df['BoolCol2' == VALUE2] as I wont know what the header and values will be, and they will change every so often. I'm not sure how to do this when you can't explicitly in the code type the column names and values, and simply using variables that contain the lists of headers, and list of values.

Code Summary/Example:

df:
    Pretreat  Setup
0        3.0    0.5
1        3.0    0.5
2        3.0    3.0
3        3.0    3.0
4        3.0    5.0
5        3.0    5.0
6        3.0    0.5
7        3.0    0.5

query_labels = ['Pretreat', 'Setup'] #querying against 2 columns, Pretreat and Setup
query_values = [(3.0, 0.5)] #looking for indeces where Pretreat == 3.0 and Setup == 0.5 (in order of query_labels)

#Expecting:
{(3.0, 0.5): [0, 1, 6, 7]}

score 0 · Accepted Answer · answered Jul 03 '22 at 21:09

You can convert to series and check equqlity of all columns:

s = pd.Series(query_values[0], index=query_labels)

df[df.eq(s).all(1)].index

Output:

Int64Index([0, 1, 6, 7], dtype='int64')

If there are many items in query_labels:

out = {k: df[df.eq(pd.Series(k, index=query_labels)).all(1)].index.to_list() 
       for k in query_values}

Output: {(3.0, 0.5): [0, 1, 6, 7]}

score 0 · Answer 2 · answered Jul 03 '22 at 21:14

I am not sure i 100% understand what you require, but here's a solution of what i think u ask for ->

import pandas as pd

t = {'Pretreat': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
     'Setup': [0.5, 0.5, 3.0, 3.0, 5.0, 5.0, 0.5, 0.5]
     }

data = pd.DataFrame(t)
print(data)
# make a list of all rows in first column
pretest_list = list(data.iloc[:, 0]) 
# iloc is a dataframe method that slices data as specified in [$1,$2]
# $1 what rows you specify, $2 what columns you specify
satup_list = list(data.iloc[:, 1])
print(pretest_list)
print(satup_list)

i = 0
# each loop x is an element in pretest_list, y is an element in satup_list
for x, y in zip(pretest_list, satup_list):
    if(x == 3.0 and y == 0.5):
        print(i)
    i += 1

So the goal is to generalize the querying: i.e. the df would have more columns. I want to get indexes that contain the matching values without explicitly mentioning which headers as they will change with each run. `I.e. df = {'feature1': [1,2,3,...], 'feature2': [1,2,...], ...}` And then query `Run1: querylabels = ['feature1', 'feature3'], query_values = [(1, 3), (2,4)] Run2: querylabels = ['feature1', 'feature2', 'feature4'], query_values = [(1,2,3), (1, 4,2), (1, 6, 3)]` where I specify the query labels and values each run, but would prefer to not change the algo. — wholesome biochemist, Jul 03 '22 at 22:57

Getting indeces of rows given query of header names and respective values in Pandas

2 Answers2