Hi so I have a dataframe and I would like to find the index whenever one of the column's cumulative sum is equal to a threshold. It will then reset and start the cumsum again.
For example:
d = np.random.randn(10, 1) * 2
df = pd.DataFrame(d.astype(int), columns=['data'])
pd.concat([df,df.cumsum()],axis=1)
Outout:
Out[34]:
data data1
0 1 1
1 2 3
2 3 6
3 2 8
4 0 8
5 1 9
6 0 9
7 -1 8
8 1 9
9 2 11
So in the above sample data, data1
is the cumsum of column 1. If I set thres=5
this means that whenever the running sum of column 1 is greater than or equal to 5, I save the index. After that happens, the running sum resets and start again until the next running total sum is greater than or equal to 5 is reached.
Right now I am doing a loop and keeping to track the running sum an manually resetting. I was wondering if there is a fast vectorized way in pandas to do it as my dataframe is millions of rows long.
Thanks