Sum of column that resets to zero through out a process

Question

Is there an easy way to go about a total for a column that increments but can reset back to zero through out the dataset? I have started to go down the path of a for loop and keeping track of previous value if it isn't a zero and using multiple variables, but wanted to check if there is a better way to go about it. Here is an example of the data I'm trying to work with:

id	Time	Extruder1
157	5/22/2023 10:14:09.229 PM	1560.0
158	5/22/2023 10:16:28.582 PM	1563.0
159	5/23/2023 7:17:37.831 AM	1563.0
160	5/23/2023 7:19:57.184 AM	0.0
161	5/23/2023 7:33:53.302 AM	2.0
162	5/23/2023 7:36:12.655 AM	4.0
163	5/23/2023 7:38:32.008 AM	6.0
334	5/23/2023 2:15:41.371 PM	789.0
335	5/23/2023 2:18:00.724 PM	792.0
336	5/23/2023 2:20:20.077 PM	794.0
349	5/23/2023 2:50:31.666 PM	2.0

I would need to get a total of 3 from the first three rows and then get 6 from the next four, and so on. Is there a method in Pandas that will work for this type of column?

To find indices where the extruder resets you can use: `ix_reset = df.Extruder1.diff().fillna(0).lt(0)` — James, Jun 07 '23 at 13:01

Corralien · Accepted Answer · 2023-06-07T14:31:01.880

Following our conversation, if there is a gap greater than 10, we can consider we have a new group:

THRESHOLD = 10
df['Group'] = df['Extruder1'].diff().abs().gt(THRESHOLD).cumsum()
total = df.groupby('Group')['Extruder1'].agg(lambda x: x.iloc[-1] - x.iloc[0]).sum()

Output:

>>> total
14.0

>>> df
     id                       Time  Extruder1  LineSpeed  Group
0   157  5/22/2023 10:14:09.229 PM     1560.0        0.0      0
1   158  5/22/2023 10:16:28.582 PM     1563.0        0.0      0
2   159   5/23/2023 7:17:37.831 AM     1563.0        0.0      0  # 1563 - 1560 = 3

3   160   5/23/2023 7:19:57.184 AM        0.0        0.0      1
4   161   5/23/2023 7:33:53.302 AM        2.0        0.0      1
5   162   5/23/2023 7:36:12.655 AM        4.0        0.0      1
6   163   5/23/2023 7:38:32.008 AM        6.0        0.0      1  # 6 - 0 = 6

7   334   5/23/2023 2:15:41.371 PM      789.0        0.0      2
8   335   5/23/2023 2:18:00.724 PM      792.0        0.0      2
9   336   5/23/2023 2:20:20.077 PM      794.0        0.0      2  # 794 - 789 = 5

10  349   5/23/2023 2:50:31.666 PM        2.0        0.0      3  # 2 - 2 = 0

IIUC, you can use:

df['Group'] = df['Extruder1'].lt(df['Extruder1'].shift()).cumsum()
# OR
df['Group'] = df['Extruder1'].diff().lt(0).cumsum()

print(df)

# Output
     id                       Time  Extruder1  LineSpeed  Group
0   157  5/22/2023 10:14:09.229 PM     1560.0        0.0      0
1   158  5/22/2023 10:16:28.582 PM     1563.0        0.0      0
2   159   5/23/2023 7:17:37.831 AM     1563.0        0.0      0
3   160   5/23/2023 7:19:57.184 AM        0.0        0.0      1
4   161   5/23/2023 7:33:53.302 AM        2.0        0.0      1
5   162   5/23/2023 7:36:12.655 AM        4.0        0.0      1
6   163   5/23/2023 7:38:32.008 AM        6.0        0.0      1
7   334   5/23/2023 2:15:41.371 PM      789.0        0.0      1
8   335   5/23/2023 2:18:00.724 PM      792.0        0.0      1
9   336   5/23/2023 2:20:20.077 PM      794.0        0.0      1
10  349   5/23/2023 2:50:31.666 PM        2.0        0.0      2

Step by step:

>>> out = df['Extruder1'].diff()
0        NaN
1        3.0
2        0.0
3    -1563.0
4        2.0
5        2.0
6        2.0
7      783.0
8        3.0
9        2.0
10    -792.0
Name: Extruder1, dtype: float64

>>> out = out.lt(0)
0     False
1     False
2     False
3      True
4     False
5     False
6     False
7     False
8     False
9     False
10     True
Name: Extruder1, dtype: bool

>>> out = out.cumsum()
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    2
Name: Extruder1, dtype: int64

Another representation (Python>=3.8):

>>> pd.concat([out := df['Extruder1'],
               out := out.diff(),
               out := out.lt(0),
               out.cumsum()], axis=1,
              keys=['Extruder1', 'diff', 'lt', 'cumsum'])

    Extruder1    diff     lt  cumsum
0      1560.0     NaN  False       0
1      1563.0     3.0  False       0
2      1563.0     0.0  False       0
3         0.0 -1563.0   True       1
4         2.0     2.0  False       1
5         4.0     2.0  False       1
6         6.0     2.0  False       1
7       789.0   783.0  False       1
8       792.0     3.0  False       1
9       794.0     2.0  False       1
10        2.0  -792.0   True       2

Hi Corralien, I did not explain it well enough. I need to total the Extruder1 column from where it starts until it resets 0 or ends. the first three columns would give me a total of 3(1560 to 1563), then the next four columns would give me a total of 6(0 to 6) to add to my total, then add 5(789 to 794) so my grand total in Extruder1 is 11. — Parker3306, Jun 07 '23 at 13:49
The last row just indicates that the counter on our extruder got reset and may go from 2.0 to 43, for example. — Parker3306, Jun 07 '23 at 14:06
How do you detect a new cycle? Can we use a gap (threshold). For example, how can I determine a new group between 6 and 789? — Corralien, Jun 07 '23 at 14:09
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253990/discussion-between-parker3306-and-corralien). — Parker3306, Jun 07 '23 at 14:11

Sum of column that resets to zero through out a process

1 Answers1