5

This question stems from looking at the answers provided to this question regarding counting the number of zero crossings. Several answer were provided that solve the problem, but the NumPy appproach destroyed the others with respect to time.

When I compared four of the answers however I notice that the NumPy solution provides a different result for large sequences. The four answers in question are loop and simple generator, better generator expression , and NumPy solution.

Question: Why does the NumPy solution provide a different result than the other three? (and which is correct?)

Here are the results for counting the number of zero crossings:

Blazing fast NumPy solution
total time: 0.303605794907 sec
Zero Crossings Small: 8
Zero Crossings Med: 54464
Zero Crossings Big: 5449071

Loop solution
total time: 15.6818780899 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Simple generator expression solution
total time: 16.3374049664 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

Modified generator expression solution
total time: 13.6596589088 sec
Zero Crossings Small: 8
Zero Crossings Med: 44960
Zero Crossings Big: 4496847

And the code used to get the results:

import time
import numpy as np

def zero_crossings_loop(sequence):
    s = 0
    for ind, _ in enumerate(sequence):
        if ind+1 < len(sequence):
            if sequence[ind]*sequence[ind+1] < 0:
                s += 1
    return s

def print_three_results(r1, r2, r3):
    print 'Zero Crossings Small:', r1
    print 'Zero Crossings Med:', r2
    print 'Zero Crossings Big:', r3
    print '\n'

small = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

print 'Blazing fast NumPy solution'
tic = time.time()
z1 = (np.diff(np.sign(small)) != 0).sum()
z2 = (np.diff(np.sign(med)) != 0).sum()
z3 = (np.diff(np.sign(big)) != 0).sum()
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Loop solution'
tic = time.time()
z1 = zero_crossings_loop(small)
z2 = zero_crossings_loop(med)
z3 = zero_crossings_loop(big)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Simple generator expression solution'
tic = time.time()
z1 = sum(1 for i, _ in enumerate(small) if (i+1 < len(small)) if small[i]*small[i+1] < 0)
z2 = sum(1 for i, _ in enumerate(med) if (i+1 < len(med)) if med[i]*med[i+1] < 0)
z3 = sum(1 for i, _ in enumerate(big) if (i+1 < len(big)) if big[i]*big[i+1] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)

print 'Modified generator expression solution'
tic = time.time()
z1 = sum(1 for i in xrange(1, len(small)) if small[i-1]*small[i] < 0)
z2 = sum(1 for i in xrange(1, len(med)) if med[i-1]*med[i] < 0)
z3 = sum(1 for i in xrange(1, len(big)) if big[i-1]*big[i] < 0)
print 'total time: {0} sec'.format(time.time()-tic)
print_three_results(z1, z2, z3)
Community
  • 1
  • 1
Scott
  • 6,089
  • 4
  • 34
  • 51
  • 1
    Aside: right now, your non-numpy methods (treating them as your reference behaviour) would say that [-1,0,1,0,-1] never crossed zero, because you're only looking for when you cross zero in one step. Is that what you intended? – DSM May 16 '15 at 18:55
  • That was my interpretation of the original question. I'll leave the decision of what is correct to the question's author. – Scott May 16 '15 at 18:57

3 Answers3

5

Your solutions differ in their treatment of zero. The numpy.diff solution will still return a diff going from -1 to 0 or 1 to 0, counting those as a zero crossing, while your iterative solutions don't because they use the product being less than zero as their criterion. Instead, test for <= 0, and the numbers will be equivalent.

jwilner
  • 6,348
  • 6
  • 35
  • 47
4

I get the same results as the loop with:

((array[:-1] * array[1:]) < 0).sum()

This:

small = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
                 -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
med = np.random.randint(-10, 10, size=100000)
big = np.random.randint(-10, 10, size=10000000)

for name, array in [('small', small), ('med', med), ('big', big)]:
    print('loop ', name, zero_crossings_loop(array))
    print('Numpy', name, ((array[:-1] * array[1:]) < 0).sum())

prints:

loop  small 8
Numpy small 8
loop  med 44901
Numpy med 44901
loop  big 4496911
Numpy big 4496911

UDPATE

This version avoids the problem with zeros:

def numpy_zero_crossings2(array):
    nonzero_array = array[np.nonzero(array)]
    return ((nonzero_array[:-1] * nonzero_array[1:]) < 0).sum()

It gives the same result as the answer by @djsutton:

>>> numpy_zero_crossings2(big) == numpy_zero_crossings(big)     
True

but seesm a bit faster:

%timeit numpy_zero_crossings2(big)
1 loops, best of 3: 194 ms per loop

vs:

%timeit numpy_zero_crossings(big)
1 loops, best of 3: 227 ms per loop
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
  • This is actually faster than the numpy.diff/numpy.sign solution. Very cool. @RahulMurmuria, this is faster than fast! With the option to count zeros as well as crossings (or not) by using either `<0` or `<=0`. Though it doesn't address @djsutton observation. Not saying that it was supposed to. – Scott May 16 '15 at 19:55
  • 1
    You could post this as an answer here: http://stackoverflow.com/questions/30272538/python-code-for-counting-number-of-zero-crossings-in-an-array/ and point out that it can solve both cases counting zero and not. – Scott May 16 '15 at 21:30
3

Both the iterative and numpy solutions do not do well at counting crossings when a data element is equal to zero. For the data [1,0,-1] the iterative solution gives 0 crossings and the numpy solution gives 2 crossings, neither of which seems correct.

One solution would be to drop data elements equal to zero. In NumPy you might try something like

def numpy_zero_crossings(data):
    return (np.diff(np.sign(data)[np.nonzero(data)]) != 0).sum()

However, this introduces another iteration through the array, so it will increase run time by another O(n)

Mike Müller
  • 82,630
  • 20
  • 166
  • 161
djsutton
  • 171
  • 2
  • 9