How to shorten the command when filtering data-frame in Python

Question

In Python, a common way to filter a data frame is like this

df.loc[(df['field 1'] == 'a') & (df['field 2'] == 'b'), 'field 3']...

When df name is long, or when there are more filter conditions (only two in the above), the above line will be long naturally. Moreover, it is a bit tedious to have type out df name for each condition. In R or SQL, we don't really need to do that. So, my question is if there is a way to shorten the above line in Python. For example, is there a way that I don't have to write down df name in each condition? Thanks.

try to break down your script by assigning each condition to a variable (mask) — NassimH, Dec 27 '22 at 05:13

score 3 · Answer 1 · answered Dec 27 '22 at 05:41

We can write a function that accepts lists of field names and conditions, iterates in parallel to create masks for each name-condition pair in a generator expression, combines those masks with numpy.logical_and (since we want to replicate &; it would be logical_or for |) - skipping a redundant pair of parentheses due to a special syntax rule - and applies the result:

def matching(df, fields, values):
    return df.loc[np.logical_and(
        df[field] == value
        for field, value in zip(fields, values)
    )]

Which should then allow us to simplify the code:

matching(df, ['field 1', 'field 2', 'field 3'], ['a', 'b', 'c'])

Alternately, we could specify the input as field-value pairs (and skip the zip step), as keys and values of a dictionary (iterating over its .items()), etc. The generalization of arbitrarily many &s using numpy.logical_and is the key insight here; the rest is just software engineering.

Omri Bahat Treidel · Accepted Answer · 2022-12-28T23:49:52.983

3

you can use the query method as follows which is has an SQL like syntax

df.query('field_1 == "a" and field_2 == "b" and field_3 > 3.2 ')

Here are the docs for it https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html

edited Dec 28 '22 at 23:49

answered Dec 27 '22 at 05:54

Omri Bahat Treidel

528
4
11

What about adding links to your answer, e.g. docs for [`DataFrame.query()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) ? – hc_dev Dec 27 '22 at 09:46

score 2 · Answer 3 · edited Dec 27 '22 at 09:17

2

You can use dataframe.eval() function. It can be done like below.

df[df.eval("(field 1 == 'a') & (field 2 == 'b')")]

edited Dec 27 '22 at 09:17

Julia Meshcheryakova

3,162
3
22
42

answered Dec 27 '22 at 05:31

Peddi Santhoshkumar

31
2

How to shorten the command when filtering data-frame in Python

3 Answers3