
Because of a typo, I happened upon some Pandas DataFrame boolean indexing syntax that I'm not familiar with and I can't find any information describing what is actually happening.

I was trying to filter a dataframe on two conditions with an & but typed a * instead, and I was surprised to see that the results are the same:

    ex1 = dist[(dist['token'].str.isalnum()) * (dist['count']>2000)]
    ex2 = dist[(dist['token'].str.isalnum()) & (dist['count']>2000)]

    ex1 == ex2
    # returns
    #     token  count
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
    #     True   True
jb1225

1 Answer


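& is the bitwise AND operator, which pandas applies element-wise to boolean Series as a logical AND. * is arithmetic multiplication, which treats booleans as the numbers 1 and 0, so the product is true only where both operands are true. That is why both expressions select the same rows.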

You can get more details here https://www.pyblog.in/programming/bitwise-operators-in-python/
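To make the equivalence concrete, here is a minimal sketch. The `dist` frame below is made-up illustration data (the question does not show the real one); only the `token` and `count` column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `dist` frame (values are invented).
dist = pd.DataFrame({
    "token": ["the", "!!", "cat", "dog42", "..."],
    "count": [5000, 9000, 100, 2500, 10],
})

is_alnum = dist["token"].str.isalnum()   # True for "the", "cat", "dog42"
is_frequent = dist["count"] > 2000       # True for "the", "!!", "dog42"

# Element-wise AND of the two boolean masks.
mask_and = is_alnum & is_frequent

# Element-wise product: a row survives only if both factors are True.
mask_mul = is_alnum * is_frequent

ex1 = dist[mask_mul]
ex2 = dist[mask_and]

print(ex1.equals(ex2))
print(ex2)
```

Plain Python shows the arithmetic view of the same idea: `True * True` is `1` and `True * False` is `0`, which matches the truth table of AND.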

Hugolmn
  • Got it, Thanks! I personally like the python order of precedence better for the * than the &. Are there any disadvantages to using the real method? – jb1225 Jun 18 '20 at 13:33
  • Pandas can use [numexpr](https://github.com/pydata/numexpr), when it is installed, to evaluate expressions (faster than python). Using * with booleans is not supported by numexpr, so pandas lets Python do the job instead. I tried with a dataframe of 1e8 rows: filtering on 5 columns, & was twice as fast. – Hugolmn Jun 18 '20 at 16:17
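A rough way to check the speed difference yourself is sketched below. The helper names (`build_frame`, `mask_with_and`, `mask_with_mul`), the column names, and the row count are all assumptions for illustration, and this is not the benchmark from the comment. Whether any gap shows up depends on whether numexpr is installed and on the frame size, so try a much larger `n_rows` if the two timings look the same.

```python
import timeit

import numpy as np
import pandas as pd


def build_frame(n_rows, n_cols=5, seed=0):
    """Build a frame of random integers in [0, 100) with columns c0..c4."""
    rng = np.random.default_rng(seed)
    data = rng.integers(0, 100, size=(n_rows, n_cols))
    return pd.DataFrame(data, columns=[f"c{i}" for i in range(n_cols)])


def mask_with_and(df):
    """Combine one condition per column using &."""
    mask = df["c0"] > 50
    for col in df.columns[1:]:
        mask = mask & (df[col] > 50)
    return mask


def mask_with_mul(df):
    """Combine the same conditions using * instead of &."""
    mask = df["c0"] > 50
    for col in df.columns[1:]:
        mask = mask * (df[col] > 50)
    return mask


df = build_frame(200_000)  # small on purpose; increase to stress the difference

t_and = timeit.timeit(lambda: mask_with_and(df), number=5)
t_mul = timeit.timeit(lambda: mask_with_mul(df), number=5)

print(f"&: {t_and:.4f}s   *: {t_mul:.4f}s")
```

Both helpers produce the same mask, so the choice between them comes down to readability and speed rather than results.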