Different results when counting zero crossings of a large sequence

Question

This question stems from the answers to another question about counting the number of zero crossings. Several answers were provided that solve the problem, but the NumPy approach was dramatically faster than the others.

However, when I compared four of the answers, I noticed that the NumPy solution gives a different result for large sequences. The four answers in question are the loop solution, the simple generator expression, the better (modified) generator expression, and the NumPy solution.

Question: Why does the NumPy solution provide a different result than the other three? (and which is correct?)

Here are the results for counting the number of zero crossings:

Blazing fast NumPy solution
total time: 0.303605794907 sec
Zero Crossings Small: 8
Zero Crossings Med: 54464
Zero Crossings Big: 5449071

Loop solution
total time: 15.6818780899 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Simple generator expression solution
total time: 16.3374049664 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Modified generator expression solution
total time: 13.6596589088 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

And the code used to get the results:

import time
import numpy as np

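def zero_crossings_loop(sequence):
    # Count adjacent pairs whose product is negative (a strict sign change).
    # A pair containing a zero has product 0 and is never counted.
    s = 0
    for ind, _ in enumerate(sequence):
        if ind+1 < len(sequence):  # the last element has no successor
            if sequence[ind]*sequence[ind+1] < 0:
                s += 1
    return s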

def print_three_results(r1, r2, r3):
    print 'Zero Crossings Small:', r1
    print 'Zero Crossings Med:', r2
    print 'Zero Crossings Big:', r3
    print '\n'

small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

print 'Blazing fast NumPy solution'
tic = time.time()
z1 = (np.diff(np.sign(small)) != 0).sum()
z2 = (np.diff(np.sign(med)) != 0).sum()
z3 = (np.diff(np.sign(big)) != 0).sum()
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Loop solution'
tic = time.time()
z1 = zero_crossings_loop(small)
z2 = zero_crossings_loop(med)
z3 = zero_crossings_loop(big)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Simple generator expression solution'
tic = time.time()
z1 = sum(1 for i, _ in enumerate(small) if (i+1 < len(small)) if small[i]*small[i+1] < 0)
z2 = sum(1 for i, _ in enumerate(med) if (i+1 < len(med)) if med[i]*med[i+1] < 0)
z3 = sum(1 for i, _ in enumerate(big) if (i+1 < len(big)) if big[i]*big[i+1] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Modified generator expression solution'
tic = time.time()
z1 = sum(1 for i in xrange(1, len(small)) if small[i-1]*small[i] < 0)
z2 = sum(1 for i in xrange(1, len(med)) if med[i-1]*med[i] < 0)
z3 = sum(1 for i in xrange(1, len(big)) if big[i-1]*big[i] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
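For readers on Python 3 (where `print` is a function and `xrange` no longer exists), the same "product less than zero" criterion can be sketched by pairing each element with its successor via `zip`. This is only an illustration of the loop/generator criterion, not one of the timed solutions; the name `zero_crossings_pairs` is hypothetical.

```python
def zero_crossings_pairs(sequence):
    """Count adjacent pairs whose product is negative (hypothetical helper).

    Mirrors the loop and generator solutions: a pair that contains a zero
    has product 0, so it is never counted as a crossing.
    """
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a * b < 0)


small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
         -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
print(zero_crossings_pairs(small))  # 8, matching the "Small" rows above
```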

Aside: right now, your non-numpy methods (treating them as your reference behaviour) would say that [-1,0,1,0,-1] never crossed zero, because you're only looking for when you cross zero in one step. Is that what you intended? — DSM, May 16 '15 at 18:55
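To make DSM's point concrete: every adjacent pair in `[-1, 0, 1, 0, -1]` contains a zero, so every product is exactly 0 and the `< 0` test never fires, even though the nonzero values go negative, positive, negative. A minimal check:

```python
seq = [-1, 0, 1, 0, -1]

# Product of each element with its successor.
products = [a * b for a, b in zip(seq, seq[1:])]
print(products)  # [0, 0, 0, 0]

# The "< 0" criterion used by the loop and generator solutions.
strict_count = sum(1 for p in products if p < 0)
print(strict_count)  # 0
```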
That was my interpretation of the original question. I'll leave the decision of what is correct to the question's author. — Scott, May 16 '15 at 18:57

jwilner · Accepted Answer · 2015-05-17T17:42:58.770

5

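Your solutions differ in their treatment of zero. The numpy.diff solution still registers a change when the sign goes from -1 to 0 or from 1 to 0 (and from 0 back to -1 or 1), counting each of those as a zero crossing, while your iterative solutions don't, because they use "the product is less than zero" as their criterion. Instead, test for <= 0, and the numbers will be nearly equivalent. The one remaining difference is two consecutive zeros: np.diff(np.sign(...)) gives 0 - 0 = 0 there and does not count it, while the product 0 * 0 does satisfy <= 0.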
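A small NumPy-free sketch compares the three criteria side by side. `sign_change_count` is meant to mimic what `(np.diff(np.sign(x)) != 0).sum()` counts (any adjacent pair whose signs differ, with 0 treated as its own sign); both helper names are illustrative only. The second input contains two adjacent zeros, the case where `<= 0` and the sign-based count still disagree.

```python
def sign(x):
    """Return -1, 0 or 1, like np.sign does for real numbers."""
    return (x > 0) - (x < 0)


def sign_change_count(seq):
    # Counts pairs whose signs differ, so 1 -> 0 and 0 -> -1 each count once.
    return sum(1 for a, b in zip(seq, seq[1:]) if sign(a) != sign(b))


def product_count(seq, inclusive):
    # inclusive=False is the original "< 0" test, inclusive=True the "<= 0" test.
    if inclusive:
        return sum(1 for a, b in zip(seq, seq[1:]) if a * b <= 0)
    return sum(1 for a, b in zip(seq, seq[1:]) if a * b < 0)


for seq in ([1, 0, -1], [1, 0, 0, -1]):
    print(seq, sign_change_count(seq),
          product_count(seq, False), product_count(seq, True))
# [1, 0, -1] 2 0 2
# [1, 0, 0, -1] 2 0 3
```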

edited May 17 '15 at 17:42

answered May 16 '15 at 18:42

jwilner

Mike Müller · Answer 2 · 2015-05-16T22:09:53.070

I get the same results as the loop with:

((array[:-1] * array[1:]) < 0).sum()

This:

small = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
                 -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

for name, array in [('small', small), ('med', med), ('big', big)]:
    print('loop ', name, zero_crossings_loop(array))
    print('Numpy', name, ((array[:-1] * array[1:]) < 0).sum())

prints:

loop  small 8
Numpy small 8
loop  med 44901
Numpy med 44901
loop  big 4496911
Numpy big 4496911
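As a rough sanity check on these figures, assuming `np.random.randint(-10, 10)` draws uniformly from the 20 integers -10..9 (the upper bound is exclusive), the per-pair probabilities can be computed exactly. Multiplied by the N - 1 adjacent pairs, they predict about 45,000 strict crossings and about 54,500 sign changes for N = 100000, in line with the 44960 and 54464 "Med" counts in the question.

```python
from fractions import Fraction

# Assumption: values are drawn uniformly and independently from -10..9.
values = range(-10, 10)
n = len(values)  # 20
p_neg = Fraction(sum(1 for v in values if v < 0), n)    # 10/20
p_zero = Fraction(sum(1 for v in values if v == 0), n)  # 1/20
p_pos = Fraction(sum(1 for v in values if v > 0), n)    # 9/20

# Adjacent draws are independent, so pair probabilities multiply.
strict = 2 * p_neg * p_pos                           # product < 0
sign_change = 1 - (p_neg**2 + p_zero**2 + p_pos**2)  # np.diff(np.sign(x)) != 0
inclusive = 1 - (p_neg**2 + p_pos**2)                # product <= 0

print(float(strict), float(sign_change), float(inclusive))  # 0.45 0.545 0.5475
```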

UPDATE

This version avoids the problem with zeros:

def numpy_zero_crossings2(array):
    nonzero_array = array[np.nonzero(array)]
    return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()

It gives the same result as the answer by @djsutton:

>>> numpy_zero_crossings2(big) == numpy_zero_crossings(big)     
True
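The two zero-dropping versions must agree: once zeros are removed, every remaining sign is -1 or +1, so `sign(a) != sign(b)` holds exactly when `a * b < 0`. A pure-Python sketch of that equivalence follows; the helper names are hypothetical, and Python's `random.randint(-10, 9)` (inclusive on both ends) stands in for `np.random.randint(-10, 10)`.

```python
import random


def sign(x):
    return (x > 0) - (x < 0)


def drop_zeros_product(seq):
    # Analogue of numpy_zero_crossings2: remove zeros, count negative products.
    nz = [v for v in seq if v != 0]
    return sum(1 for a, b in zip(nz, nz[1:]) if a * b < 0)


def drop_zeros_sign(seq):
    # Analogue of numpy_zero_crossings: remove zeros, count sign changes.
    nz = [v for v in seq if v != 0]
    return sum(1 for a, b in zip(nz, nz[1:]) if sign(a) != sign(b))


rng = random.Random(0)
data = [rng.randint(-10, 9) for _ in range(10000)]
print(drop_zeros_product(data) == drop_zeros_sign(data))  # True
```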

but seems a bit faster:

%timeit numpy_zero_crossings2(big)
1 loops, best of 3: 194 ms per loop

vs:

%timeit numpy_zero_crossings(big)
1 loops, best of 3: 227 ms per loop

This is actually faster than the numpy.diff/numpy.sign solution. Very cool. @RahulMurmuria, this is faster than fast! With the option to count zeros as well as crossings (or not) by using either `<0` or `<=0`. Though it doesn't address @djsutton's observation. Not saying that it was supposed to. — Scott, May 16 '15 at 19:55
You could post this as an answer here: http://stackoverflow.com/questions/30272538/python-code-for-counting-number-of-zero-crossings-in-an-array/ and point out that it can solve both cases counting zero and not. — Scott, May 16 '15 at 21:30

score 3 · Answer 3 · edited May 16 '15 at 19:23

Neither the iterative nor the NumPy solutions handle crossings well when a data element is exactly zero. For the data [1,0,-1] the iterative solution gives 0 crossings and the NumPy solution gives 2 crossings, and neither seems correct.

One solution would be to drop data elements equal to zero. In NumPy you might try something like:

def numpy_zero_crossings(data):
    return (np.diff(np.sign(data)[np.nonzero(data)]) != 0).sum()

However, this introduces another pass through the array, which adds another O(n) step to the run time (the overall cost is still linear).
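If the extra pass is a concern, the same "ignore zeros" semantics can be obtained in a single pass with a plain loop that remembers the sign of the last nonzero element. This is a different technique from the NumPy one-liner above and, being pure Python, will likely be much slower than vectorised NumPy on large arrays; the function name is hypothetical.

```python
def zero_crossings_skip_zeros(sequence):
    """Count sign changes between consecutive nonzero elements in one pass."""
    count = 0
    last_sign = 0  # 0 means "no nonzero element seen yet"
    for value in sequence:
        if value == 0:
            continue  # zeros neither start nor end a crossing
        current = 1 if value > 0 else -1
        if last_sign and current != last_sign:
            count += 1
        last_sign = current
    return count


print(zero_crossings_skip_zeros([1, 0, -1]))         # 1
print(zero_crossings_skip_zeros([-1, 0, 1, 0, -1]))  # 2
```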

edited May 16 '15 at 19:23

Mike Müller

answered May 16 '15 at 19:13

djsutton

Great observation. @RahulMurmuria might be interested in this. – Scott May 16 '15 at 19:19