
So I have the Series of integers shown below:

from pandas import Series
s = Series([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

And I want to see how many times the number changes over the series, so I can do the following and get the expected result:

[i != s[:-1][idx] for idx, i in enumerate(s[1:])]
Out[5]: 
[True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 True,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False]

From there I could easily count the number of Trues. But this is obviously not the best way to operate on a pandas Series, and I'm adding this in a situation where performance matters, so I did the following expecting identical results. However, I was very surprised and confused:

s[1:].ne(s[:-1])
Out[4]: 
0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26    False
27    False
28    False
29    False
30    False
31    False
32    False
33    False
34    False
35    False
36    False
37    False
38    False
39     True
dtype: bool

Not only does the output of the Series.ne method make no logical sense to me, but it is also longer than either of the inputs (40 rows versus 39), which is especially confusing.
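The extra row comes from index alignment: Series.ne, like the other flexible binary methods, matches elements by index label rather than by position, and the result is indexed by the union of both indexes. A minimal sketch, assuming the default RangeIndex and using .iloc (equivalent to the plain s[1:] and s[:-1] slices here):

```python
from pandas import Series

# Same 40 values as in the question: "1, 2, 3" ten times, then ten 1s.
s = Series([1, 2, 3] * 10 + [1] * 10)

left = s.iloc[1:]    # keeps labels 1..39
right = s.iloc[:-1]  # keeps labels 0..38

# .ne() aligns on labels, so the result covers the union 0..39.
result = left.ne(right)
print(len(left), len(right), len(result))  # 39 39 40

# Labels 0 and 39 exist on only one side; the missing side becomes NaN,
# and NaN compares unequal to everything. Every shared label compares
# an element with itself, hence False.
print(result[result].index.tolist())  # [0, 39]
```

By contrast, the plain != operator does not align at all: it raises a ValueError when the two Series are not identically labeled.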

I think this might be related to this issue: https://github.com/pandas-dev/pandas/issues/1134

Regardless, I'm curious what I'm doing wrong and what the best way to accomplish this would be.

tl;dr:

Where s is a pandas.Series of ints:

[i != s[:-1][idx] for idx, i in enumerate(s[1:])] != s[:-1].ne(s[1:]).tolist()
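The two sides of the tl;dr differ because the list comprehension compares by position while .ne compares by label. One way to get a positional comparison, sketched here assuming pandas 0.24 or later (for Series.to_numpy()), is to compare the underlying arrays, or to drop the labels with reset_index(drop=True) before calling .ne:

```python
from pandas import Series

s = Series([1, 2, 3] * 10 + [1] * 10)

values = s.to_numpy()
# Element k+1 against element k, by position. Plain NumPy arrays carry
# no index, so no label alignment takes place.
changed = values[1:] != values[:-1]

# Staying in pandas: discard the labels so both sides are indexed 0..38.
changed_pd = s.iloc[1:].reset_index(drop=True).ne(
    s.iloc[:-1].reset_index(drop=True)
)

print(changed.sum(), changed_pd.sum())  # 30 30
```

The NumPy version avoids building any intermediate index, which may matter in the performance-sensitive setting described above.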

Edit: Thanks all. Based on some of the answers below, a possible solution is sum(s.diff().astype(bool)) - 1; however, I'm still curious why the above solution doesn't work.
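The - 1 in that formula is needed because Series.diff() has no predecessor for the first element and emits NaN there, and casting NaN to bool yields True. A small sketch on the question's data:

```python
from pandas import Series

s = Series([1, 2, 3] * 10 + [1] * 10)

d = s.diff()            # float64; the first entry is NaN
flags = d.astype(bool)  # 0.0 -> False, any other number (including NaN) -> True
print(flags.iloc[0])    # True: the spurious entry that "- 1" removes
print(sum(flags) - 1)   # 30

# Equivalent count that skips the first slot instead of subtracting.
print(d.iloc[1:].astype(bool).sum())  # 30
```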

Hunter Jackson

3 Answers


IIUC, using shift:

s != s.shift()
BENY
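Because s and s.shift() share the same index, the != above lines up neighbouring values and alignment does not interfere. shift() leaves NaN in the first slot, which always compares unequal, so that slot has to be ignored when counting. A minimal sketch on the question's data:

```python
from pandas import Series

s = Series([1, 2, 3] * 10 + [1] * 10)

# Same labels on both sides, so this is effectively positional.
changed = s != s.shift()
print(changed.iloc[0])         # True (1 != NaN)
print(changed.iloc[1:].sum())  # 30 changes once the first slot is skipped
```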

You can use diff:

s.diff().ne(0)
piRSquared

You could take advantage of diff:

>>> from pandas import Series
>>> s = Series([1, 2, 1, 3, 3, 1, 1])
>>> s.diff()
0    NaN
1    1.0
2   -1.0
3    2.0
4    0.0
5   -2.0
6    0.0
dtype: float64
>>> s.diff().ne(0) # Same as s.diff() != 0
0     True
1     True
2     True
3     True
4    False
5     True
6    False
dtype: bool
>>> # To know how many times the values changed, count the number
... # of Trues, excluding the first one, which is caused by the NaN
... # that diff() produces for the first element.
...
>>> sum(s.diff().ne(0)) - 1
4
Marco
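For a performance-sensitive count, a further option (NumPy's np.diff, which is not part of the answers above) is to work on the raw array: np.diff returns len(s) - 1 differences with no leading NaN, so nothing needs to be subtracted. This sketch assumes the Series holds no missing values, since a NaN difference would be counted as a change:

```python
import numpy as np
from pandas import Series

s = Series([1, 2, 1, 3, 3, 1, 1])

# Integer differences between neighbours: [1, -1, 2, 0, -2, 0].
# Every nonzero difference is one change.
n_changes = np.count_nonzero(np.diff(s.to_numpy()))
print(n_changes)  # 4
```

The values stay in their integer dtype here, whereas Series.diff() converts to float64 to make room for the leading NaN.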