1

Is there an easy way to go about a total for a column that increments but can reset back to zero through out the dataset? I have started to go down the path of a for loop and keeping track of previous value if it isn't a zero and using multiple variables, but wanted to check if there is a better way to go about it. Here is an example of the data I'm trying to work with:

id Time Extruder1 LineSpeed
157 5/22/2023 10:14:09.229 PM 1560.0 0.0
158 5/22/2023 10:16:28.582 PM 1563.0 0.0
159 5/23/2023 7:17:37.831 AM 1563.0 0.0
160 5/23/2023 7:19:57.184 AM 0.0 0.0
161 5/23/2023 7:33:53.302 AM 2.0 0.0
162 5/23/2023 7:36:12.655 AM 4.0 0.0
163 5/23/2023 7:38:32.008 AM 6.0 0.0
334 5/23/2023 2:15:41.371 PM 789.0 0.0
335 5/23/2023 2:18:00.724 PM 792.0 0.0
336 5/23/2023 2:20:20.077 PM 794.0 0.0
349 5/23/2023 2:50:31.666 PM 2.0 0.0

I would need to get a total of 3 from the first three rows and then get 6 from the next four, and so on. Is there a method in Pandas that will work for this type of column?

James
  • 32,991
  • 4
  • 47
  • 70
Parker3306
  • 63
  • 3
  • 9

1 Answers1

3

Following our conversation, if there is a gap greater than 10, we can consider we have a new group:

THRESHOLD = 10
df['Group'] = df['Extruder1'].diff().abs().gt(THRESHOLD).cumsum()
total = df.groupby('Group')['Extruder1'].agg(lambda x: x.iloc[-1] - x.iloc[0]).sum()

Output:

>>> total
14.0

>>> df
     id                       Time  Extruder1  LineSpeed  Group
0   157  5/22/2023 10:14:09.229 PM     1560.0        0.0      0
1   158  5/22/2023 10:16:28.582 PM     1563.0        0.0      0
2   159   5/23/2023 7:17:37.831 AM     1563.0        0.0      0  # 1563 - 1560 = 3

3   160   5/23/2023 7:19:57.184 AM        0.0        0.0      1
4   161   5/23/2023 7:33:53.302 AM        2.0        0.0      1
5   162   5/23/2023 7:36:12.655 AM        4.0        0.0      1
6   163   5/23/2023 7:38:32.008 AM        6.0        0.0      1  # 6 - 0 = 6

7   334   5/23/2023 2:15:41.371 PM      789.0        0.0      2
8   335   5/23/2023 2:18:00.724 PM      792.0        0.0      2
9   336   5/23/2023 2:20:20.077 PM      794.0        0.0      2  # 794 - 789 = 5

10  349   5/23/2023 2:50:31.666 PM        2.0        0.0      3  # 2 - 2 = 0

IIUC, you can use:

df['Group'] = df['Extruder1'].lt(df['Extruder1'].shift()).cumsum()
# OR
df['Group'] = df['Extruder1'].diff().lt(0).cumsum()

print(df)

# Output
     id                       Time  Extruder1  LineSpeed  Group
0   157  5/22/2023 10:14:09.229 PM     1560.0        0.0      0
1   158  5/22/2023 10:16:28.582 PM     1563.0        0.0      0
2   159   5/23/2023 7:17:37.831 AM     1563.0        0.0      0
3   160   5/23/2023 7:19:57.184 AM        0.0        0.0      1
4   161   5/23/2023 7:33:53.302 AM        2.0        0.0      1
5   162   5/23/2023 7:36:12.655 AM        4.0        0.0      1
6   163   5/23/2023 7:38:32.008 AM        6.0        0.0      1
7   334   5/23/2023 2:15:41.371 PM      789.0        0.0      1
8   335   5/23/2023 2:18:00.724 PM      792.0        0.0      1
9   336   5/23/2023 2:20:20.077 PM      794.0        0.0      1
10  349   5/23/2023 2:50:31.666 PM        2.0        0.0      2

Step by step:

>>> out = df['Extruder1'].diff()
0        NaN
1        3.0
2        0.0
3    -1563.0
4        2.0
5        2.0
6        2.0
7      783.0
8        3.0
9        2.0
10    -792.0
Name: Extruder1, dtype: float64

>>> out = out.lt(0)
0     False
1     False
2     False
3      True
4     False
5     False
6     False
7     False
8     False
9     False
10     True
Name: Extruder1, dtype: bool

>>> out = out.cumsum()
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    2
Name: Extruder1, dtype: int64

Another representation (Python>=3.8):

>>> pd.concat([out := df['Extruder1'],
               out := out.diff(),
               out := out.lt(0),
               out.cumsum()], axis=1,
              keys=['Extruder1', 'diff', 'lt', 'cumsum'])

    Extruder1    diff     lt  cumsum
0      1560.0     NaN  False       0
1      1563.0     3.0  False       0
2      1563.0     0.0  False       0
3         0.0 -1563.0   True       1
4         2.0     2.0  False       1
5         4.0     2.0  False       1
6         6.0     2.0  False       1
7       789.0   783.0  False       1
8       792.0     3.0  False       1
9       794.0     2.0  False       1
10        2.0  -792.0   True       2
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Hi Corralien, I did not explain it well enough. I need to total the Extruder1 column from where it starts until it resets 0 or ends. the first three columns would give me a total of 3(1560 to 1563), then the next four columns would give me a total of 6(0 to 6) to add to my total, then add 5(789 to 794) so my grand total in Extruder1 is 11. – Parker3306 Jun 07 '23 at 13:49
  • Do you mean "rows" instead of "columns"? – Corralien Jun 07 '23 at 13:51
  • Yes, sorry! I mean the rows of Extruder1 column... – Parker3306 Jun 07 '23 at 14:00
  • What about the last row? – Corralien Jun 07 '23 at 14:03
  • The last row just indicates that the counter on our extruder got reset and may go from 2.0 to 43, for example. – Parker3306 Jun 07 '23 at 14:06
  • How do you detect a new cycle? Can we use a gap (threshold). For example, how can I determine a new group between 6 and 789? – Corralien Jun 07 '23 at 14:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253990/discussion-between-parker3306-and-corralien). – Parker3306 Jun 07 '23 at 14:11