Calculate cumulative sum from last non-zero entry in python

Question

I have a numeric series like [0,0,0,0,1,1,1,0,0,1,1,0]. I would like to calculate a cumulative sum that restarts after each zero, i.e. the running total resets to zero whenever a zero entry occurs.

input: [0,0,0,0,1,1,1,0,0,1,1,0]
output:[0,0,0,0,1,2,3,0,0,1,2,0]

Is there a built-in Python function that can do this, or a better way to calculate it without a loop?

Yes, do numpy and pandas have any related function? – AAA Jun 12 '19 at 22:22

benvc · Answer 1 · 2021-06-02T16:18:12.763

9

You can do it with itertools.accumulate. Besides the iterable, it accepts an optional second argument: a two-argument function whose first argument is the accumulated result so far and whose second argument is the current element of the iterable. A simple lambda is enough here: add the current element to the running total, unless the current element is zero, in which case restart at 0.

from itertools import accumulate

nums = [0,0,0,0,1,1,1,0,0,1,1,0]

result = accumulate(nums, lambda acc, elem: acc + elem if elem else 0)
print(list(result))
# [0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
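The same idea can be spelled out with a named step function, which makes the reset rule easier to read and test. This is only a sketch: the names `reset_step` and `cumsum_reset_on_zero` are made up for illustration. Note that `accumulate` emits the first element unchanged (the function is only called from the second element on), so a series that starts with a non-zero value keeps that value.

```python
from itertools import accumulate

def reset_step(acc, elem):
    """Return the new running total: restart at 0 on a zero, otherwise add elem."""
    if elem == 0:
        return 0
    return acc + elem

def cumsum_reset_on_zero(values):
    """Cumulative sum that resets to 0 whenever a zero entry occurs."""
    return list(accumulate(values, reset_step))

print(cumsum_reset_on_zero([0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]))
# [0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
print(cumsum_reset_on_zero([2, 3, 0, 4]))
# [2, 5, 0, 4]
```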

edited Jun 02 '21 at 16:18

answered Jun 12 '19 at 22:18

benvc

Thanks for your quick response. Perfect! By the way, can the accumulate function be used on a dataframe? Say I have a matrix where each column is a numeric series, and I would like to calculate the cumsum for each column. – AAA Jun 12 '19 at 22:27
@AAA sure, `itertools.accumulate` will accept any iterable, so lots of ways to use with a dataframe, series, etc. – benvc Jun 12 '19 at 22:29
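
For the column-wise case raised in the comments, here is a minimal sketch that uses a plain dict of lists as a stand-in for a dataframe. The `columns` data and the helper name `cumsum_reset_on_zero` are hypothetical; since `accumulate` takes any iterable, the helper should work on other column-like objects as well.

```python
from itertools import accumulate

def cumsum_reset_on_zero(values):
    """Cumulative sum that resets to 0 whenever a zero entry occurs."""
    return list(accumulate(values, lambda acc, elem: acc + elem if elem else 0))

# Hypothetical data: each key is a column name, each value is one numeric series.
columns = {"a": [0, 1, 1, 0, 1], "b": [1, 1, 0, 0, 1]}

# Apply the same reset-aware cumulative sum to every column independently.
result = {name: cumsum_reset_on_zero(col) for name, col in columns.items()}
print(result)
# {'a': [0, 1, 2, 0, 1], 'b': [1, 2, 0, 0, 1]}
```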

score 1 · Answer 2 · answered Jun 12 '19 at 23:06

We can do this in numpy with two passes of np.cumsum. First we calculate the cumulative sum of the array:

import numpy as np

a = np.array([0,0,0,0,1,1,1,0,0,1,1,0])
c = np.cumsum(a)

This gives us:

>>> c
array([0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5])

Next we take the values of c at the positions where a is 0, and compute the difference between each of those values and the previous one (with a 0 prepended so the first value has a predecessor):

corr = np.diff(np.hstack(((0,), c[a == 0])))

This is the correction we need to apply at those positions:

>>> corr
array([0, 0, 0, 0, 3, 0, 2])

We can then make a copy of a (or do this in place), and subtract the correction:

a2 = a.copy()
a2[a == 0] -= corr

This gives us:

>>> a2
array([ 0,  0,  0,  0,  1,  1,  1, -3,  0,  1,  1, -2])

Now we can calculate the cumulative sum of a2, which resets to 0 at every 0, because each correction cancels exactly what has accumulated since the previous reset:

>>> a2.cumsum()
array([0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0])

or as a function:

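import numpy as np

def cumsumreset(iterable, reset=0):
    a = np.array(iterable)
    c = a.cumsum()  # plain running total
    a2 = a.copy()
    mask = a == reset  # positions where the sum must restart
    # at each reset position, subtract what has accumulated since the previous reset
    a2[mask] -= np.diff(np.hstack(((0,), c[mask])))
    return a2.cumsum()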

This then gives us:

>>> cumsumreset([0,0,0,0,1,1,1,0,0,1,1,0])
array([0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0])
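
To see why the correction works, the same two-pass arithmetic can be traced in plain Python. This is only an illustrative sketch (the name `cumsumreset_pure` is made up, and it uses an explicit loop, so it is not a replacement for the vectorised version): at each reset position it subtracts whatever the running total has gained since the previous reset, so the second cumulative sum lands on 0 there.

```python
from itertools import accumulate

def cumsumreset_pure(values, reset=0):
    """Two-pass cumulative sum with reset, mirroring the NumPy steps above."""
    values = list(values)
    c = list(accumulate(values))      # first pass: plain running total
    adjusted = values[:]              # plays the role of a2
    prev_total = 0                    # running total at the previous reset position
    for i, v in enumerate(values):
        if v == reset:
            # subtract what has accumulated since the previous reset
            adjusted[i] = v - (c[i] - prev_total)
            prev_total = c[i]
    return list(accumulate(adjusted))  # second pass: total now restarts at resets

print(cumsumreset_pure([0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]))
# [0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
```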
