Regarding this question/answer, is there a way to accomplish the same thing for a pandas DataFrame without casting it to a NumPy array?
1 Answer
Update: per @LorenzoMeneghetti's comment, this can be simplified to:
s[s.diff() != 0].index.tolist()
Output:
[0, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16]
Original answer:
import pandas as pd
s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])
print(s.diff()[s.diff() != 0].index.values)
Or, for a DataFrame column:
df = pd.DataFrame([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])
print(df[0].diff()[df[0].diff() != 0].index.values)
Output:
[ 0  5  8  9 10 11 12 13 14 15 16]
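As a rough sketch of how this could be wrapped up for reuse: the helper name `change_indices` below is made up for illustration, and it swaps `diff()` for a comparison against `shift()`, which is a different technique from the one above but also works on non-numeric columns such as strings. Note that, like the `diff()` version, it always reports the first row, and consecutive NaN values would each count as a change.

```python
import pandas as pd


def change_indices(series):
    """Return the index labels where a value differs from the previous row.

    Hypothetical helper, not part of pandas. The first row is always
    included because it has no predecessor to compare against.
    """
    # shift() moves every value down one row; ne() is an element-wise "!=".
    changed = series.ne(series.shift())
    return series.index[changed].tolist()


s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])
print(change_indices(s))  # [0, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16]

# The same helper applied to a DataFrame column holding strings.
df = pd.DataFrame({"label": ["a", "a", "b", "b", "b", "c"]})
print(change_indices(df["label"]))  # [0, 2, 5]
```

For a purely numeric column, `s[s.diff() != 0].index.tolist()` gives the same result and is shorter; the `shift()` form is only worth it if the column may hold strings or other values that cannot be subtracted.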

Scott Boston
-
Thank you for your answer! One additional question, though: why would the DataFrame (from read_csv) return every index instead of only the indices where the value changes? The code I use to read the CSV is read_csv(file, sep=',', header=None, skiprows=1, usecols=[colNum], dtype=np.float64, na_values=[" "]). Printing the DataFrame from read_csv gives [6,6,6,6,1,1,1,1,1,2,2,2,2,2], but df[0].diff()... returns [0,1,2,3,4...11,12,13]. – ntmt Apr 20 '17 at 17:19
-
I suspect that your first column, column 0, is really the line number and not the changing value you are expecting. It is hard for me to tell without the CSV and the exact read statement you are using. – Scott Boston Apr 20 '17 at 17:27
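To illustrate that suspicion, here is a small sketch with a made-up two-column CSV (the file contents and column layout are assumptions, not the asker's actual data). When column 0 holds a line number, every row differs from the one before it, so every index comes back; the value column gives the expected change points.

```python
import io

import pandas as pd

# Hypothetical CSV: the first column is a line number, the second holds the values.
csv_text = "line,value\n0,6\n1,6\n2,6\n3,1\n4,1\n5,2\n"
df = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)

# Column 0 increases on every row, so diff() is never 0.
line_changes = df[0][df[0].diff() != 0].index.tolist()
print(line_changes)  # [0, 1, 2, 3, 4, 5]

# Column 1 only changes where the value actually changes.
value_changes = df[1][df[1].diff() != 0].index.tolist()
print(value_changes)  # [0, 3, 5]
```

Printing `df.head()` and `df.columns` right after the read is a quick way to check which column actually holds the values.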
-
Why the double diff()? Can't I just do s[s.diff() != 0].index.values? – Lorenzo Meneghetti Dec 02 '20 at 14:31
-
@LorenzoMeneghetti Honestly, that was an old answer I wrote when I was first learning pandas. You are correct; that is a better solution. – Scott Boston Dec 02 '20 at 15:11