
I ran into unexpected behavior in pandas when comparing two Series, and I want to know whether it is intended or a bug.

Suppose I run:

import pandas as pd
x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

x > y

yields:

a     True
b    False
c     True
d    False
e    False
f    False
Name: Value, dtype: bool

This isn't what I wanted: I expected the comparison to line up the indexes first. Instead, I have to align them explicitly to get the desired result:

x > y.reindex_like(x)

yields:

a     True
b     True
c     True
d    False
e    False
f    False
Name: Value, dtype: bool

Which is what I expected.
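As a minimal sketch of other ways to force label alignment before comparing, assuming the same `x` and `y` as above: `Series.align` returns both series on a shared index, and the flex method `Series.gt` aligns on labels itself. The result names (`by_align`, `by_flex`, `by_reindex`) are illustrative only.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

# Option 1: put both series on the union of their labels, then compare.
x_aligned, y_aligned = x.align(y, join='outer')
by_align = x_aligned > y_aligned

# Option 2: the flex comparison method aligns on labels before comparing.
by_flex = x.gt(y)

# Option 3: reorder y to match x's index, as in the question.
by_reindex = x > y.reindex_like(x)

print(by_align)
```

All three compare values label by label, so `d`, `e` and `f` come out `False` and the rest `True`. The bare `x > y` is left out of the sketch on purpose, because how it treats differently ordered indexes may depend on the pandas version.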

What's worse, if I run:

x + y

I get:

a    1
b    1
c    1
d    2
e    2
f    2
Name: Value, dtype: int64

So arithmetic operations align the indexes, but comparisons do not. Is my observation accurate? Is this intended for some purpose?
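To make the two pairings concrete, here is a small sketch, assuming the same `x` and `y`: the addition pairs values by label, while comparing the underlying arrays pairs them by position and reproduces the first, unwanted result. The names `total` and `positional` are illustrative.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

# Arithmetic pairs values by label, e.g. x['d'] + y['d'] == 0 + 2.
total = x + y

# Comparing the raw arrays ignores the labels and pairs values by position.
positional = pd.Series(x.values > y.values, index=x.index)

print(total)
print(positional)
```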

Thanks,

-PiR

CT Zhu
piRSquared
  • 2
    Yeah, that doesn't feel right at all. – DSM Aug 21 '14 at 20:45
  • 5
    There is an issue about this: https://github.com/pydata/pandas/issues/1134, and a closed PR (https://github.com/pydata/pandas/pull/6860) – joris Aug 21 '14 at 20:57
  • 3
    Looks like it has to be expressed as `(x-y)>0` instead of `x>y` to have the elements aligned up then... – CT Zhu Aug 21 '14 at 21:32
  • 3
    nice comments on: https://github.com/pydata/pandas/issues/1134. Read especially snth comment: " At some point I did check the documentation to see if my understanding of index alignment was correct and there was no mention there that this only applies to the +, -, *, / operators and not to ==, !=, <, <=, >, >=." – Joop Sep 10 '14 at 09:59
  • can't seem to use series != series, does not produce correct values. pandas version '0.23.4' – xgg Jan 31 '19 at 21:48

1 Answer


Whether or not this is a bug, I would suggest building a DataFrame and comparing the series inside it.

import pandas as pd
x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value_x')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value_y')

df = pd.DataFrame({"Value_x":x, "Value_y":y})
df['Value_x'] > df['Value_y']

Out[3]:

a     True
b     True
c     True
d    False
e    False
f    False
dtype: bool
firelynx
  • 2
    Small point, you could alternatively build the data frame from the series with `df=pd.concat([x,y], axis=1)`. – Robert Apr 13 '17 at 08:25
  • @Robert Thanks for pointing that out. I do not feel this is as explicit as my code though. "Explicit is better than implicit" - Zen of python – firelynx Sep 08 '17 at 08:52
  • As you please, but `pd.concat` is very much the standard idiom in Pandas. – Robert Sep 09 '17 at 09:38
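
For reference, the `pd.concat` variant mentioned in the comments could look roughly like this. It is a sketch that assumes the renamed series from the answer; the column names come from each series' `name`, and `result` is an illustrative name.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value_x')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value_y')

# Concatenating along the columns aligns both series on their index labels.
df = pd.concat([x, y], axis=1)

# Both columns now share one index, so the comparison is label by label.
result = df['Value_x'] > df['Value_y']

print(result)
```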