
I have a numpy array with only -1, 1 and 0, like this:

np.array([1,1,-1,-1,0,-1,1])

I would like a new array that counts the -1 encountered. The counter must reset when a 0 appears and remain the same when it's a 1:

Desired output:

np.array([0,0,1,2,0,1,1])

The solution must be fast when used on larger arrays (up to 100,000 elements).


Edit: Thanks for your contribution, I've a working solution for now.

I'm still looking for a non-iterative way to solve it (no for loop). Maybe with a pandas Series and the cumsum() method?

tdy

5 Answers

2

Maybe with a pandas Series and the cumsum() method?

Yes, use Series.cumsum and Series.groupby:

s = pd.Series([1, 1, -1, -1, 0, -1, 1])

s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])

Step-by-step

  1. Create pseudo-groups that reset when equal to 0:

    groups = s.eq(0).cumsum()
    # array([0, 0, 0, 0, 1, 1, 1])
    
  2. Then groupby these pseudo-groups and cumsum when equal to -1:

    s.eq(-1).groupby(groups).cumsum().to_numpy()
    # array([0, 0, 1, 2, 0, 1, 1])
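The same reset-on-zero idea can also be sketched without pandas. This is only an illustrative alternative and is not one of the functions timed in this answer: the name `count_neg_ones` is made up here, and the approach relies on the running total of -1 values being non-decreasing, so that `np.maximum.accumulate` can carry the total seen at the most recent 0 forward.

```python
import numpy as np

def count_neg_ones(a):
    """Count -1 values cumulatively, resetting at each 0 (hypothetical helper)."""
    a = np.asarray(a)
    running = np.cumsum(a == -1)               # total number of -1 seen so far
    at_zero = np.where(a == 0, running, 0)     # running total at each 0, else 0
    baseline = np.maximum.accumulate(at_zero)  # total at the most recent 0
    return running - baseline

print(count_neg_ones([1, 1, -1, -1, 0, -1, 1]))
# [0 0 1 2 0 1 1]
```

Subtracting the baseline plays the role of the groupby: every position only keeps the -1 values counted since the last 0.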
    

Timings

not time consuming when used with larger array (up to 100,000)

groupby + cumsum is ~8x faster than looping, given np.random.choice([-1, 0, 1], size=100_000):

%timeit series_cumsum(a)
# 3.29 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit miki_loop(a)
# 26.5 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit skyrider_loop(a)
# 26.8 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
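The bodies of the timed functions are not shown above. A rough way to reproduce this kind of comparison outside IPython is sketched below. Here `series_cumsum` wraps the groupby + cumsum one-liner, while `python_loop` is a stand-in written for illustration (a generic counter loop, not necessarily identical to `miki_loop` or `skyrider_loop`). Absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

def series_cumsum(a):
    # groupby + cumsum approach from this answer
    s = pd.Series(a)
    return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()

def python_loop(a):
    # Stand-in for the loop-based answers (assumed, not their exact code)
    out, count = [], 0
    for x in a:
        if x == -1:
            count += 1
        elif x == 0:
            count = 0
        out.append(count)
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.choice([-1, 0, 1], size=100_000)

# Both implementations should agree before their speed is compared
assert (series_cumsum(a) == python_loop(a)).all()

for fn in (series_cumsum, python_loop):
    # Best of 3 repeats, 10 calls per repeat, reported per call
    best = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```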
tdy
1

Let's first save your numpy array in a variable:

a = np.array([1,1,-1,-1,0,-1,1])

I define a variable, count, to hold the running value and set it to zero. Then I define a list, l, to hold the new elements. Next I iterate over the elements of a, naming the current element i. Inside each iteration, I implement the logic:

  • if i is -1, increase the counter
  • else, if i is 0, reset the counter
  • otherwise, do nothing

Finally, I append the counter to l on every iteration, and after the loop I convert l to a numpy array, out.

l = []
count = 0
for i in a:
    if i == -1:
        count+=1
    elif i==0: 
        count = 0
    l.append(count)
out = np.array(l)
out
Fatemeh Sangin
  • 1
    While this code may answer the question, [including an explanation](https://meta.stackoverflow.com/questions/392712/explaining-entirely-code-based-answers) of how or why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – ppwater Dec 09 '21 at 09:25
  • 1
    dear @ppwater, Is it better now? – Fatemeh Sangin Dec 09 '21 at 10:12
1

With numba, I seem to get a 10x speedup over the pandas solution for this benchmark:

import numpy as np
import pandas as pd
from numba import jit

inp1 = np.array([1,1,-1,-1,0,-1,1], dtype=int)
inp2 = np.random.randint(-1, 10, size=10**6)

@jit
def with_numba(arr):
  val = 0
  put = np.zeros_like(arr)
  for i in range(arr.size):
    if arr[i] == -1:
      val += 1
    elif arr[i] == 0:
      val = 0
    put[i] = val

  return put

def with_pandas(inp):
  s = pd.Series(inp)
  return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
  
assert (with_numba(inp1) == with_pandas(inp1)).all()
assert (with_numba(inp2) == with_pandas(inp2)).all()

%timeit with_numba(inp2)
# 100 loops, best of 5: 4.57 ms per loop
%timeit with_pandas(inp2)
# 10 loops, best of 5: 46.3 ms per loop
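One caveat when timing JIT-compiled code by hand rather than with %timeit: the first call also pays for compilation, so it is usually excluded with a warm-up call. The sketch below illustrates that under a few assumptions: the name `count_resetting` and the pure-Python fallback used when numba is not installed are added here for illustration and are not part of the benchmark above.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without numba (no speedup in that case)
    def njit(func):
        return func

@njit
def count_resetting(arr):
    val = 0
    out = np.zeros_like(arr)
    for i in range(arr.size):
        if arr[i] == -1:
            val += 1
        elif arr[i] == 0:
            val = 0
        out[i] = val
    return out

arr = np.random.randint(-1, 2, size=100_000)  # values drawn from -1, 0, 1

count_resetting(arr[:10])  # warm-up call: triggers compilation under numba

start = time.perf_counter()
result = count_resetting(arr)
elapsed = time.perf_counter() - start
print(f"{elapsed * 1e3:.2f} ms after warm-up")
```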
hilberts_drinking_problem
0

Use a for loop. Set a variable which starts at 1 and reset it each time you encounter a different number. For example:

counter = 1
outputArray = []
for number in npArray:
    if number == -1:
        outputArray.append(counter)
        counter += 1
    elif number == 1:
        outputArray.append(0)
    else:
        outputArray.append(0)
        counter = 1
print(outputArray)
0

Here is a fix for @skyrider's code:

npArray = [1,1,-1,-1,0,-1,1]
counter = 0
outputArray = []
for number in npArray:
    if number == -1:
        counter += 1
        outputArray.append(counter)
    elif number == 0:
        outputArray.append(0)
        counter = 0
    else:
        outputArray.append(counter)
print(outputArray)
Miki