9

Regarding this question/answer, is there a way to accomplish the same thing for a pandas DataFrame without casting it to a NumPy array?

ntmt

1 Answer

15

Update: per @LorenzoMeneghetti's comment, a single diff() is enough (s is the Series defined in the original answer further down):

s[s.diff() != 0].index.tolist()

Output:

[0, 5, 8, 9, 10, 11, 12, 13, 14, 15, 16]

Original answer:

import pandas as pd

s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

print(s.diff()[s.diff() != 0].index.values)

OR:

df = pd.DataFrame([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

print(df[0].diff()[df[0].diff() != 0].index.values)

Output:

[ 0  5  8  9 10 11 12 13 14 15 16]
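Index 0 shows up in both outputs because diff() returns NaN for the first row, and NaN != 0 evaluates to True. The following is a minimal sketch, not part of the original answer, of how that first row could be excluded. It also shows a shift()-based comparison for columns where diff() does not apply (for example strings). The names with_first, without_first, labels and label_changes are made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 3, 4, 3, 4, 3, 4, 5, 5, 5])

# diff() yields NaN for the first row, and NaN != 0 is True,
# so index 0 is always reported by the expression from the answer.
with_first = s[s.diff() != 0].index.tolist()

# Assumption: the first row should count as "no change".
# Filling the leading NaN with 0 drops it from the result.
without_first = s[s.diff().fillna(0) != 0].index.tolist()

# For non-numeric data, compare each value with the previous one instead.
# Illustrative data only; the first row is reported here as well.
labels = pd.Series(["a", "a", "b", "b", "a"])
label_changes = labels[labels != labels.shift()].index.tolist()

print(with_first)
print(without_first)
print(label_changes)
```

Whether the first row counts as a change depends on the use case; the original answer keeps it.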
Scott Boston
  • Thank you for your answer! One additional question though: why would the DataFrame (from read_csv) return every index instead of only the indices where the value changes? The code I use to read the CSV is read_csv(file, sep=',', header = None, skiprows = 1, usecols = [colNum], dtype = np.float64, na_values = [" "]). I printed out the DataFrame from read_csv, which gives me [6,6,6,6,1,1,1,1,1,2,2,2,2,2], but the code df[0].diff()... returns [0,1,2,3,4...11,12,13]. – ntmt Apr 20 '17 at 17:19
  • I suspect that your first column (column 0) is really the line number and not the changing value you are expecting. It is hard for me to tell without the CSV and the exact read statement you are using. – Scott Boston Apr 20 '17 at 17:27
  • Ah, thank you, it seems that I have to match the colNum with df[colNum]. – ntmt Apr 20 '17 at 17:47
  • Why the double diff()? Can't I just do s[s.diff() != 0].index.values? – Lorenzo Meneghetti Dec 02 '20 at 14:31
  • @LorenzoMeneghetti Honestly, that was an old answer I wrote when I was first learning pandas. You are correct, that is a better solution. – Scott Boston Dec 02 '20 at 15:11
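The read_csv exchange in the comments can be sketched as follows. This is an illustration under assumptions, not the commenter's actual file: csv_text and col_num are made up, and the dtype and na_values arguments are left out for brevity. It shows that with header=None and usecols=[col_num], the surviving column keeps its positional label, so it has to be selected as df[col_num] rather than df[0].

```python
import io

import pandas as pd

# Hypothetical CSV standing in for the commenter's file: one header row,
# a line-number column, then the column whose changes are of interest.
csv_text = "line,value\n0,6\n1,6\n2,1\n3,1\n4,2\n"
col_num = 1  # hypothetical position of the value column

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=None,
    skiprows=1,          # skip the header row, as in the comment
    usecols=[col_num],
)

# With header=None the columns are labelled by position, and usecols
# keeps that label, so the only column here is labelled col_num.
print(df.columns.tolist())

values = df[col_num]
changes = values[values.diff() != 0].index.tolist()
print(changes)
```

If the label is not known in advance, df.iloc[:, 0] selects the first remaining column by position regardless of its label.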