Pandas Series.ne operator returning unexpected result against two slices of same Series

Question

I have the following Series of integers:

from pandas import Series
s = Series([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

I want to count how many times the value changes along the Series. The following gives the expected result:

[i != s[:-1][idx] for idx, i in enumerate(s[1:])]
Out[5]: 
[True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False]

From there I could simply count the True values. But this is obviously not the best way to operate on a pandas Series, and I'm adding this somewhere performance matters, so I tried the following, expecting identical results. The output surprised and confused me.

s[1:].ne(s[:-1])
Out[4]: 
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39     True
dtype: bool

Not only does the output of Series.ne make no logical sense to me, it is also longer than either of the inputs, which is especially confusing.

I think this might be related to https://github.com/pandas-dev/pandas/issues/1134

Regardless, I'm curious what I'm doing wrong and what the best way to accomplish this would be.

tl;dr:

Where s is a pandas.Series of ints:

[i != s[:-1][idx] for idx, i in enumerate(s[1:])] != s[:-1].ne(s[1:]).tolist()

Edit: Thanks all. Going by some of the answers below, a possible solution is sum(s.diff().astype(bool)) - 1; however, I'm still curious why the approach above doesn't work.

filed an issue https://github.com/pandas-dev/pandas/issues/19855 — Hunter Jackson, Feb 23 '18 at 01:29
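On the remaining "why": the output shown in the question is consistent with Series.ne aligning its operands on index labels rather than on position. s[1:] carries labels 1..39 and s[:-1] carries labels 0..38, so the comparison runs over the union 0..39. Each shared label compares a value with itself (False), and the two labels present on only one side (0 and 39) are compared against a missing value (True). The sketch below reproduces that and contrasts it with a positional comparison via to_numpy(); it assumes the default integer index, and the variable names are only illustrative.

```python
import pandas as pd

# Same values as the Series in the question.
s = pd.Series([1, 2, 3] * 10 + [1] * 10)

# Label-aligned comparison: operands are matched on index labels.
aligned = s[1:].ne(s[:-1])
print(len(aligned))                      # 40: union of labels 0..39
print(aligned[aligned].index.tolist())   # [0, 39]: labels missing on one side

# Positional comparison: drop the index so element k+1 is compared with element k.
values = s.to_numpy()
changed = values[1:] != values[:-1]
print(int(changed.sum()))                # 30
```

Comparing the underlying arrays sidesteps alignment entirely, at the cost of losing the index on the result.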

score 1 · Answer 1 · answered Feb 22 '18 at 19:48

IIUC, using shift:

s!=s.shift()
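A minimal sketch of turning this into a count, using the Series from the question. shift() moves the values down one position while keeping the index, so both operands carry identical labels and no alignment surprises occur. The first element is compared with NaN and always comes out True, so it is skipped here with iloc[1:] (that slicing step is an addition, not part of the answer above).

```python
import pandas as pd

s = pd.Series([1, 2, 3] * 10 + [1] * 10)

# shift() keeps the index, so the labels of both operands match.
changed = s != s.shift()

print(bool(changed.iloc[0]))         # True: first element is compared with NaN
print(int(changed.iloc[1:].sum()))   # 30
```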

BENY

score 1 · Answer 2 · answered Feb 22 '18 at 19:49

You can use diff

s.diff().ne(0)

piRSquared

also as `s.diff().astype(bool)` – Zero Feb 22 '18 at 19:52
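A quick check that the two spellings agree, sketched with the example values from the accepted answer. It assumes numeric data, since diff() relies on subtraction. Casting to bool maps 0.0 to False and everything else, including the leading NaN, to True, which is why both forms report an extra True for the first element.

```python
import pandas as pd

s = pd.Series([1, 2, 1, 3, 3, 1, 1])

via_ne = s.diff().ne(0)
via_astype = s.diff().astype(bool)   # NaN and nonzero values become True

print(via_ne.equals(via_astype))     # True
print(int(via_ne.sum()) - 1)         # 4: subtract the True caused by the leading NaN
```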

Marco · Accepted Answer · 2018-02-22T23:59:14.350

You could take advantage of diff

>>> from pandas import Series
>>> s = Series([1, 2, 1, 3, 3, 1, 1])
>>> s.diff()
0    NaN
1    1.0
2   -1.0
3    2.0
4    0.0
5   -2.0
6    0.0
dtype: float64
>>> s.diff().ne(0)  # Same as s.diff() != 0
0     True
1     True
2     True
3     True
4    False
5     True
6    False
dtype: bool
>>> # To count how many times the value changed, count the True values
... # and subtract one: the first True comes from the NaN that
... # diff() produces for the first element.
...
>>> sum(s.diff().ne(0)) - 1
4

edited Feb 22 '18 at 23:59

answered Feb 22 '18 at 19:52

Marco

Combined this with @Zero's comment, taking advantage of `astype(bool)` – Hunter Jackson Feb 22 '18 at 20:27
