Find index where elements change value pandas dataframe

Question

Regarding this question/answer, is there a way to accomplish the same thing for a pandas DataFrame without casting it to a numpy array?

It would be better if you make this a self-contained question — ayhan, Apr 20 '17 at 16:28

Scott Boston · Accepted Answer · 2023-02-07T13:50:57.213

15

Update: per @LorenzoMeneghetti, a single diff() is enough (with s defined as in the original answer further down):

s[s.diff() != 0].index.tolist()

Output:

[0, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16]

s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

print(s.diff()[s.diff() != 0].index.values)

OR:

df = pd.DataFrame([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

print(df[0].diff()[df[0].diff() != 0].index.values)

Output:

[ 0  5  8  9 10 11 12 13 14 15 16]
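A note on why index 0 always appears in the output: diff() returns NaN for the first row, and NaN != 0 evaluates to True. The sketch below illustrates that, and also shows an alternative that is not part of the answer above: comparing the series with a shifted copy via ne() and shift(), which should also work for non-numeric values where diff() is not defined. The names changed, change_idx, labels, and label_idx are made up for this illustration.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

# The first element of diff() is NaN, and NaN != 0 is True,
# so the first row is always reported as a "change".
first_is_nan = pd.isna(s.diff().iloc[0])

# Alternative (an assumption, not the answer's method): compare each value
# with the previous one instead of subtracting.
changed = s.ne(s.shift())
change_idx = s.index[changed].tolist()

# The same comparison applied to strings, where diff() would not apply.
labels = pd.Series(["a", "a", "b", "b", "a"])
label_idx = labels.index[labels.ne(labels.shift())].tolist()

print(change_idx)
print(label_idx)
```

If the first row should not count as a change, slicing it off with change_idx[1:] is one option.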

edited Feb 07 '23 at 13:50

answered Apr 20 '17 at 16:28

Scott Boston


Thank you for your answer! One additional question, though: why would the dataframe (from read_csv) return every index instead of only the indices where the value changes? The code I use to read the CSV is read_csv(file, sep=',', header=None, skiprows=1, usecols=[colNum], dtype=np.float64, na_values=[" "]). I printed out the DataFrame from read_csv, which gives me [6,6,6,6,1,1,1,1,1,2,2,2,2,2], but the code df[0].diff()... returns [0,1,2,3,4...11,12,13]. – ntmt Apr 20 '17 at 17:19
I suspect that your first column, column 0, is really a line number and not the changing value you are expecting. It is hard for me to tell without the CSV and the exact read statement you are using. – Scott Boston Apr 20 '17 at 17:27
Ah, thank you, it seems that I have to match the colNum with df[colNum]. – ntmt Apr 20 '17 at 17:47
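A small sketch of what this exchange describes, under assumptions: the CSV text, the column layout (a line-number column followed by a value column), and the names csv_text, col_num, values, and full are all invented here, since the original file was never posted. With header=None, read_csv labels columns by position, and usecols keeps that original label, so the column has to be selected with the same number.

```python
import io
import pandas as pd

# Hypothetical CSV: column 0 is a line number, column 1 holds the values of interest.
csv_text = "line,value\n0,6\n1,6\n2,1\n3,1\n4,2\n"
col_num = 1

# Read only the value column; its label stays 1, not 0.
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None,
                 skiprows=1, usecols=[col_num])
column_labels = df.columns.tolist()

values = df[col_num]
change_idx = values[values.diff() != 0].index.tolist()

# Reading every column and using column 0 (the line numbers) reports
# a change on every row, because each line number differs from the last.
full = pd.read_csv(io.StringIO(csv_text), sep=",", header=None, skiprows=1)
line_numbers = full[0]
every_idx = line_numbers[line_numbers.diff() != 0].index.tolist()

print(column_labels, change_idx, every_idx)
```

Here change_idx comes out as [0, 2, 4], while every_idx lists all five rows, which matches the symptom described in the comment.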
Why the double diff()? Can't I just do s[s.diff() != 0].index.values? – Lorenzo Meneghetti Dec 02 '20 at 14:31
@LorenzoMeneghetti Honestly, that was an old answer I wrote when I was first learning pandas. You are correct; that is a better solution. – Scott Boston Dec 02 '20 at 15:11
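To illustrate why the second diff() is redundant: the boolean mask only selects rows, and the result reads just the index, so it does not matter which object the mask is applied to. A minimal check, with the variable names mask, double, single, and direct chosen here for illustration:

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

# Compute the mask once and reuse it.
mask = s.diff() != 0

double = s.diff()[mask].index.tolist()  # original form: mask applied to s.diff()
single = s[mask].index.tolist()         # simplified form: mask applied to s
direct = s.index[mask].tolist()         # mask applied to the index itself

print(double == single == direct)
```

All three select the same index labels; the last form skips building an intermediate Series.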
