39

I'm trying to do boolean indexing with a couple conditions using Pandas. My original DataFrame is called df. If I perform the below, I get the expected result:

temp = df[df["bin"] == 3]
temp = temp[(~temp["Def"])]
temp = temp[temp["days since"] > 7]
temp.head()

However, if I do this (which I think should be equivalent), I get no rows back:

temp2 = df[df["bin"] == 3]
temp2 = temp2[~temp2["Def"] & temp2["days since"] > 7]
temp2.head()

Any idea what accounts for the difference?

Marco
  • 2,007
  • 17
  • 28
anon_swe
  • 8,791
  • 24
  • 85
  • 145

4 Answers4

72

Use () because operator precedence:

temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]

Alternatively, create conditions on separate rows:

cond1 = df["bin"] == 3    
cond2 = df["days since"] > 7
cond3 = ~df["Def"]

temp2 = df[cond1 & cond2 & cond3]

Sample:

df = pd.DataFrame({'Def':[True] *2 + [False]*4,
                   'days since':[7,8,9,14,2,13],
                   'bin':[1,3,5,3,3,3]})

print (df)
     Def  bin  days since
0   True    1           7
1   True    3           8
2  False    5           9
3  False    3          14
4  False    3           2
5  False    3          13


temp2 = df[~df["Def"] & (df["days since"] > 7) & (df["bin"] == 3)]
print (temp2)
     Def  bin  days since
3  False    3          14
5  False    3          13
Anton vBR
  • 18,287
  • 5
  • 40
  • 46
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is getting so much better with samples! – Anton vBR Feb 25 '18 at 21:29
  • no parentheses for the first condition? This seems to change things for me. – khhc Jul 18 '19 at 11:10
  • 1
    If you have a LARGE DataFrame each of the conditions is filtering the *complete* DataFrame. Can you successively reduce the search space by chaining the filters? e.g. aplly the first, apply the second to *that* result, ...? – wwii Nov 24 '19 at 17:37
  • @wwii if really huge dataframe thwn you are right. first filter by condition which remove the most rows, assign to dataframe. then filter by second and similar... – jezrael Nov 24 '19 at 18:09
  • @wwii do it means here `temp1 = df[df["bin"] == 3] temp2 = temp1[temp1["days since"] > 7] temp3 = temp2[~temp2["Def"]]` – jezrael Nov 24 '19 at 18:14
  • 1
    What would be the efficient way when you have a large number of condition values. for example `df[(df.col1==0) & (df.col2==1) & (df.col3==1)]` has 3 column conditions, but what if there are 50 column condition values? is there any easy way where you put the columns and condition values as 2 lists something simpler like `column_list= df.columns[11:61] value_list= 'a list of 50 values' df[df[column_list]== value_list]` – Tanzin Farhat Sep 19 '20 at 05:02
9

OR

 df_train[(df_train["fold"]==1) | (df_train["fold"]==2)]

AND

 df_train[(df_train["fold"]==1) & (df_train["fold"]==2)]
Joe Ferndz
  • 8,417
  • 2
  • 13
  • 33
RAHUL KUMAR
  • 1,123
  • 11
  • 9
  • 2
    While this code may provide a solution to the question, it's better to add context as to why/how it works. This can help future users learn, and apply that knowledge to their own code. You are also likely to have positive feedback from users in the form of upvotes, when the code is explained. – Amit Verma Jan 27 '21 at 09:09
3

Alternatively, you can use the method query:

df.query('not Def & (`days since` > 7) & (bin == 3)')
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
1

If you want multiple conditions:

Del_Det_5k_top_10 = Del_Det[(Del_Det['State'] == 'NSW') & (Del_Det['route'] == 2) |
                            (Del_Det['State'] == 'VIC') & (Del_Det['route'] == 3)]
4b0
  • 21,981
  • 30
  • 95
  • 142
Loku
  • 201
  • 2
  • 5