Pandas filter rows based on multiple conditions

Question

I have some values in the risk column that are neither, Small, Medium or High. I want to delete the rows with the value not being Small, Medium and High. I tried the following:

df = df[(df.risk == "Small") | (df.risk == "Medium") | (df.risk == "High")]

But this returns an empty DataFrame. How can I filter them correctly?

I've tried to create a dataframe with such data, and your string of code works properly. Could you give more information about what contains in dataframe and how do you generate it? — Mikhail Elizarev, Apr 27 '14 at 14:39
You requirement is a little unclear, if all your values can ever be small, mediu, or high and you want to drop rows that are any of these values then this will result in now rows so could you explain clearer what you require — EdChum, Apr 27 '14 at 14:49
Hmm.. your code is correct so I think you need to post data and code that reproduces your problem — EdChum, Apr 27 '14 at 15:04
For example, it would be useful to see what `df.risk.value_counts()` returns. — DSM, Apr 27 '14 at 15:17
@EdChum. Your previous (now deleted post) had df = df[df.risk.isin(['Small','Medium','High'])]. That gave the desired result ! — ArtDijk, Apr 27 '14 at 16:40
Before the advise by EdChum:In [154]: df.risk.value_counts() Out[154]: Small 23520 Medium 6164 High 402 None 251 After code from EdChum the 'None' was gone. :-) — ArtDijk, Apr 27 '14 at 16:41
Could you post some data though and also as @DSM suggests the output from df.risk.value_counts() — EdChum, Apr 27 '14 at 16:53
@ EdChum. I want to filter out all none Small, Medium or High values. — ArtDijk, Apr 27 '14 at 18:43
@ArtDijk well your code sample should've worked and it worked on my toy dataset, whilst my alternative suggestion works for you it'd be better to post some actual data if you can and the output from `df.risk.value_counts()` — EdChum, Apr 27 '14 at 19:20

EdChum · Accepted Answer · 2014-04-27T16:53:23.667

34

I think you want:

df = df[(df.risk.isin(["Small","Medium","High"]))]

Example:

In [5]:
import pandas as pd
df = pd.DataFrame({'risk':['Small','High','Medium','Negligible', 'Very High']})
df

Out[5]:

         risk
0       Small
1        High
2      Medium
3  Negligible
4   Very High

[5 rows x 1 columns]

In [6]:

df[df.risk.isin(['Small','Medium','High'])]

Out[6]:

     risk
0   Small
1    High
2  Medium

[3 rows x 1 columns]

edited Apr 27 '14 at 16:53

answered Apr 27 '14 at 14:39

EdChum

376,765
198
813
562

1

Aren't these logical statements equivalent to the one provided by author? – Mikhail Elizarev Apr 27 '14 at 14:42
I've tryed your example with author's slicing - it works correctly – Mikhail Elizarev Apr 27 '14 at 14:44
@MikhailElizarev it is slightly different but the OP's question is a little unclear as what they are doing would result in no results – EdChum Apr 27 '14 at 14:48
To make it mor universally, one could do this instead: `df[df["risk"].isin(['Small','Medium','High'])]` – Hacker Dec 25 '21 at 15:09

score 4 · Answer 2 · answered Feb 12 '19 at 16:57

4

Another nice and readable approach is the following:

small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"

Then you can use it like this:

df[small_risk | medium_risk | high_risk]

or

df[small_risk & medium_risk]

answered Feb 12 '19 at 16:57

Rafael

7,002
5
43
52

df[small_risk & medium_risk] always returns nothing. df["risk"] cannot be simultaneously equal "Small" AND "Medium" – Leonid Ganeline Apr 23 '21 at 00:39
1

That example was to show that AND gate in cases it’s applicable. – Rafael Apr 23 '21 at 06:31

score 1 · Answer 3 · answered Mar 03 '22 at 04:31

You could also use query:

df.query('risk in ["Small","Medium","High"]')

You can refer to variables in the environment by prefixing them with @. For example:

lst = ["Small","Medium","High"]
df.query("risk in @lst")

If the column name is multiple words, e.g. "risk factor", you can refer to it by surrounding it with backticks ` `:

df.query('`risk factor` in @lst')

query method comes in handy if you need to chain multiple conditions. For example, the outcome of the following filter:

df[df['risk factor'].isin(lst) & (df['value']**2 > 2) & (df['value']**2 < 5)]

can be derived using the following expression:

df.query('`risk factor` in @lst and 2 < value**2 < 5')

Pandas filter rows based on multiple conditions

3 Answers3

Linked