I have a numeric series like [0,0,0,0,1,1,1,0,0,1,1,0]. I would like to calculate a cumulative sum that restarts after each zero, i.e. the cumsum is reset to zero whenever a zero entry occurs.

input: [0,0,0,0,1,1,1,0,0,1,1,0]
output:[0,0,0,0,1,2,3,0,0,1,2,0] 

Is there a built-in Python function that can do this, or a better way to calculate it without a loop?

Mala Pokhrel
AAA

2 Answers

9

You can do this with itertools.accumulate. Besides the iterable, it accepts an optional second argument: a two-argument function whose first argument is the accumulated result so far and whose second argument is the current element of the iterable. A short lambda that adds the current element to the running total, or returns 0 when the current element is zero, gives the resetting sum.

from itertools import accumulate

nums = [0,0,0,0,1,1,1,0,0,1,1,0]

result = accumulate(nums, lambda acc, elem: acc + elem if elem else 0)
print(list(result))
# [0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
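
One detail worth knowing: without an initial value, accumulate emits the first element unchanged and only calls the function from the second element on. That is harmless when the reset value is 0, but it matters if the reset marker is some other value. A minimal sketch, assuming Python 3.8+ for the `initial` keyword; the names `cumsum_reset` and `step` are made up for illustration:

```python
from itertools import accumulate

def cumsum_reset(values, reset=0):
    """Running total that drops back to 0 whenever `reset` is seen."""
    def step(acc, elem):
        return 0 if elem == reset else acc + elem

    # initial=0 makes the first element go through step() as well;
    # the seed value itself is emitted first, so slice it off.
    return list(accumulate(values, step, initial=0))[1:]

print(cumsum_reset([0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]))
# [0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0]
print(cumsum_reset([-1, 2, 3, -1, 4], reset=-1))
# [0, 2, 5, 0, 4]
```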
benvc
  • Thanks for your quick response. Perfect! By the way, can the accumulate function be used on a dataframe? Say I have a matrix where each column is a numeric series, and I would like to calculate the cumsum for each column. – AAA Jun 12 '19 at 22:27
  • @AAA sure, `itertools.accumulate` will accept any iterable, so lots of ways to use with a dataframe, series, etc. – benvc Jun 12 '19 at 22:29
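
For the column-wise case raised in the comments, one possible reading is to apply the same accumulate call to each column in turn. The sketch below uses only the standard library and assumes the matrix is a plain list of rows; `cumsum_reset` and `cumsum_reset_columns` are hypothetical helper names, and a pandas DataFrame is not shown here:

```python
from itertools import accumulate

def cumsum_reset(values):
    # Same lambda as in the answer: add to the running total, reset on 0.
    return list(accumulate(values, lambda acc, elem: acc + elem if elem else 0))

def cumsum_reset_columns(rows):
    """Apply the resetting cumsum to every column of a list-of-rows matrix."""
    columns = zip(*rows)                         # transpose rows -> columns
    result_columns = [cumsum_reset(col) for col in columns]
    return [list(row) for row in zip(*result_columns)]  # transpose back

matrix = [
    [0, 1],
    [1, 1],
    [1, 0],
    [0, 1],
    [1, 1],
]
print(cumsum_reset_columns(matrix))
# [[0, 1], [1, 2], [2, 0], [0, 1], [1, 2]]
```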
1

We can do this in numpy with two passes of np.cumsum. First we calculate the cumsum of the array:

import numpy as np

a = np.array([0,0,0,0,1,1,1,0,0,1,1,0])
c = np.cumsum(a)

This gives us:

>>> c
array([0, 0, 0, 0, 1, 2, 3, 3, 3, 4, 5, 5])

Next we take the values of c at the positions where a is 0, prepend a 0, and calculate the difference between each of these values and its predecessor:

corr = np.diff(np.hstack(((0,), c[a == 0])))

This is the correction we need to apply to those elements:

>>> corr
array([0, 0, 0, 0, 3, 0, 2])

We can then make a copy of a (or do this in place), and subtract the correction:

a2 = a.copy()
a2[a == 0] -= corr

this gives us:

>>> a2
array([ 0,  0,  0,  0,  1,  1,  1, -3,  0,  1,  1, -2])

Now we can calculate the cumulative sum of a2, which resets to 0 at every 0 in a, because each correction cancels exactly the amount accumulated since the previous 0:

>>> a2.cumsum()
array([0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0])

or as a function:

import numpy as np

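def cumsumreset(iterable, reset=0):
    a = np.array(iterable)
    c = a.cumsum()
    a2 = a.copy()
    # positions where the running total must drop back to 0
    mask = a == reset
    # subtract what accumulated since the previous reset position
    a2[mask] -= np.diff(np.hstack(((0,), c[mask])))
    return a2.cumsum()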

this then gives us:

>>> cumsumreset([0,0,0,0,1,1,1,0,0,1,1,0])
array([0, 0, 0, 0, 1, 2, 3, 0, 0, 1, 2, 0])
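
As a sanity check, the vectorized version can be compared against a plain loop on random integer inputs, including negative values. This is only a sketch: `cumsumreset_loop` is a hypothetical reference helper, and the function is repeated so the snippet runs on its own:

```python
import random
import numpy as np

def cumsumreset(iterable, reset=0):
    # Same steps as the function above.
    a = np.array(iterable)
    c = a.cumsum()
    a2 = a.copy()
    mask = a == reset
    a2[mask] -= np.diff(np.hstack(((0,), c[mask])))
    return a2.cumsum()

def cumsumreset_loop(iterable, reset=0):
    # Straightforward reference: keep a running total, zero it on `reset`.
    total, out = 0, []
    for value in iterable:
        total = 0 if value == reset else total + value
        out.append(total)
    return out

rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(200):
    data = [rng.randint(-3, 3) for _ in range(rng.randint(1, 30))]
    assert cumsumreset(data).tolist() == cumsumreset_loop(data)
print("all cases agree")
```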
Willem Van Onsem