
I am trying to wrap my head around some unexpected behaviour of the pandas DataFrame.query method.

Assuming a test dataframe:

>>> df = pd.DataFrame([[1,1,1,2,2,2],[1,2,3,4,5,6]], columns=['a', 'b', 'c', 'd', 'e', 'f'])
>>> df
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6

One can select the first row with the following query expression:

>>> df.query('a == b == c == 1 & d == e == f == 2')
   a  b  c  d  e  f
0  1  1  1  2  2  2
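As far as I can tell, query expands a chained comparison like a == b == c == 1 pairwise, the way Python itself does, and gives & the precedence of "and". Assuming that, the same selection can be written as an explicit boolean mask. The names first and second are my own, chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

# a == b == c == 1, expanded into pairwise element-wise comparisons
first = (df['a'] == df['b']) & (df['b'] == df['c']) & (df['c'] == 1)
# d == e == f == 2, expanded the same way
second = (df['d'] == df['e']) & (df['e'] == df['f']) & (df['f'] == 2)

# Explicit parentheses above matter: in plain Python, & binds tighter than ==
selected = df[first & second]
print(selected)
```

Outside of query, writing the mask without those parentheses would change its meaning, because & is then evaluated before the comparisons.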

My aim, however, is to select all rows except those satisfying the above expression. Intuitively, that should work by simply wrapping the entire expression in parentheses and prepending a logical not. Right?

>>> df.query('~(a == b == c == 1 & d == e == f == 2)')
   a  b  c  d  e  f
0  1  1  1  2  2  2
1  1  2  3  4  5  6

Clearly that is not the expected result. If, however, one moves the not inside the expression with a little algebra (De Morgan's law), the whole thing does work:

>>> df.query('~(a == b == c == 1) | ~(d == e == f == 2)')
   a  b  c  d  e  f
1  1  2  3  4  5  6
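The "little algebra" here is De Morgan's law: not (X and Y) is equivalent to (not X) or (not Y). A minimal sketch checking that identity element-wise on boolean Series, and then negating the mask in Python instead of inside the query string, so nothing depends on how query parses ~. The names first, second and rest are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

# Boolean masks for the two halves of the original query string
first = (df['a'] == df['b']) & (df['b'] == df['c']) & (df['c'] == 1)
second = (df['d'] == df['e']) & (df['e'] == df['f']) & (df['f'] == 2)

# De Morgan: ~(X & Y) equals ~X | ~Y, element by element
assert (~(first & second)).equals(~first | ~second)

# Negate the combined mask outside of query to keep every other row
rest = df[~(first & second)]
print(rest)
```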

Can anybody explain what is going on here? The last two query strings are logically identical, yet they return different results.

ARF
  • Which version of pandas are you using? The second example (the one that gives the unexpected output) works for me (pandas 0.15.1). – joris Dec 08 '14 at 15:40
  • @joris I was using the Anaconda Python distribution. Its standard pandas version is currently 0.14.1. After updating to 0.15.1, the problem disappears. Thanks. – ARF Dec 08 '14 at 16:35
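To check whether a given installation is affected, a small sketch like the following prints the pandas version and compares the two query strings directly. It assumes a pandas release without the 0.14.1 behaviour described above; on 0.14.1 the final assertion would presumably fail, since the negated query returned both rows there.

```python
import pandas as pd

# The thread reports the problem with 0.14.1 but not with 0.15.1
print(pd.__version__)

df = pd.DataFrame([[1, 1, 1, 2, 2, 2], [1, 2, 3, 4, 5, 6]],
                  columns=['a', 'b', 'c', 'd', 'e', 'f'])

negated = df.query('~(a == b == c == 1 & d == e == f == 2)')
de_morgan = df.query('~(a == b == c == 1) | ~(d == e == f == 2)')

# On an unaffected version, both strings should select only the second row
assert negated.equals(de_morgan)
```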
