This is a quick question with a yes-or-no answer. I couldn't find an answer on Google or here (it is difficult to google).

I just want to know if I am doing this the correct way.

I am trying to select data matching certain conditions. Here is a snippet from my code.

c1 = (data['recency']<=3) # seen in the last 3 months
c2 = (data['transactions_per_month']>=1) # buys a ticket once a month
c3 = (data['av_spend_per_month']>=30) # spends at least €30 per month
c4 = (data['Driver']==1) # is a driver

# slice the df
data[c1 & (c2 | c3) & c4]

Is the `(c2 | c3)` part correct? Can I put a `|` condition in the middle of my `&` conditions?

If it is wrong, what is the correct way to do it?

SCool
  • Yes, but it needs to be like `data[(col1 > 10) or (col2 < 10)]`, i.e. passing the filters and not the filtered data. See more here: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing – Equinox Oct 02 '20 at 15:15
  • Yes, you can. https://stackoverflow.com/questions/48978550/pandas-filtering-multiple-conditions – Siva Kumar Sunku Oct 02 '20 at 15:25
  • @venky__ so you used `or` instead of `|`. Do I need to change the `|` to `or`? – SCool Oct 02 '20 at 15:30
  • Use the operators (`&`, `|`). I should have written `|`. – Equinox Oct 02 '20 at 15:33
  • Should be safer: `data[((c1) & ((c2) | (c3))) & (c4)]`. The parentheses you have around the condition definitions make it look like a tuple object with a Series in it, which is why I put the parentheses again. – user2827262 Oct 02 '20 at 17:45

1 Answer

Yes, this is a perfectly reasonable thing to do.

According to the Pandas manual, you can combine multiple selectors using boolean operators such as &, |, and ~.

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).

(Source.)
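As a minimal sketch of the quoted passage, the snippet below reuses the column names from the question on a small made-up DataFrame (the values are invented purely for illustration). It shows that `c1 & (c2 | c3) & c4` works as a mask, and that both the `or` keyword and unparenthesized comparisons raise a `ValueError` instead of filtering.

```python
import pandas as pd

# Hypothetical sample data: column names come from the question,
# the values are invented for illustration only.
data = pd.DataFrame({
    "recency": [1, 2, 5, 3],
    "transactions_per_month": [2, 0, 3, 0],
    "av_spend_per_month": [10, 50, 40, 5],
    "Driver": [1, 1, 1, 1],
})

c1 = data["recency"] <= 3                  # seen in the last 3 months
c2 = data["transactions_per_month"] >= 1   # buys a ticket once a month
c3 = data["av_spend_per_month"] >= 30      # spends at least 30 per month
c4 = data["Driver"] == 1                   # is a driver

# Element-wise AND / OR; the parentheses around (c2 | c3) control grouping.
mask = c1 & (c2 | c3) & c4
selected = data[mask]

# The `or` keyword asks for a single truth value of a whole Series,
# which pandas refuses to give.
or_error = None
try:
    data[c2 or c3]
except ValueError as exc:
    or_error = str(exc)

# Without parentheses, & binds tighter than <= and ==, so this becomes a
# chained comparison on Series and also fails.
precedence_error = None
try:
    data[data["recency"] <= 3 & data["Driver"] == 1]
except ValueError as exc:
    precedence_error = str(exc)

print(selected)
```

With this sample data, only the first two rows satisfy the combined condition.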

You might also explore the DataFrame.query() method, which can accomplish a similar thing.
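For comparison, here is a sketch of the same filter written with `DataFrame.query()`, again on made-up sample data with the question's column names. Inside a query string the keywords `and` and `or` are accepted, and parentheses group the conditions in the usual way.

```python
import pandas as pd

# Hypothetical sample data: column names come from the question,
# the values are invented for illustration only.
data = pd.DataFrame({
    "recency": [1, 2, 5, 3],
    "transactions_per_month": [2, 0, 3, 0],
    "av_spend_per_month": [10, 50, 40, 5],
    "Driver": [1, 1, 1, 1],
})

# Same logic as c1 & (c2 | c3) & c4, expressed as a query string.
selected = data.query(
    "recency <= 3"
    " and (transactions_per_month >= 1 or av_spend_per_month >= 30)"
    " and Driver == 1"
)

# Boolean-mask version, for a side-by-side check.
mask = (
    (data["recency"] <= 3)
    & ((data["transactions_per_month"] >= 1) | (data["av_spend_per_month"] >= 30))
    & (data["Driver"] == 1)
)

print(selected.equals(data[mask]))
```

Which form to use is mostly a matter of taste: the mask version keeps each condition as a reusable variable, while the query string can be easier to read when there are many conditions.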

Nick ODell