
I am looking to count the number of times the values in an array change in polarity (EDIT: Number of times the values in an array cross zero).

Suppose I have an array:

[80.6  120.8  -115.6  -76.1  131.3  105.1  138.4  -81.3
 -95.3  89.2  -154.1  121.4  -85.1  96.8  68.2]

I want the count to be 8.

One solution is to run a loop and check for greater than or less than 0, and keep a history of the previous polarity.
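The loop described above can be sketched as follows. The function name is hypothetical, and the sketch assumes that zero counts as non-negative (the question does not say how zeros should be treated).

```python
def count_polarity_changes(values):
    """Count sign changes by remembering the polarity of the previous element.

    Assumption: zero is treated as non-negative, i.e. same polarity as positives.
    """
    count = 0
    prev_negative = None  # None until the first element has been seen
    for v in values:
        is_negative = v < 0
        if prev_negative is not None and is_negative != prev_negative:
            count += 1
        prev_negative = is_negative
    return count


data = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
        -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
print(count_polarity_changes(data))  # 8
```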

Can we do this faster?

EDIT: My purpose is really to find something faster, because I have these arrays of length around 68554308, and I have to do these calculations on 100+ such arrays.

Mike Müller
Rahul Murmuria
  • Are you sure your expected count is 8, not 6? – Scott May 16 '15 at 06:34
  • Ok, so you are looking for the number of zero crossings? – Scott May 16 '15 at 06:36
  • The solution you suggested yourself seems to be a good one. You're not looking for a faster solution, but for a solution with less code. – aliep May 16 '15 at 06:52
  • @Scott, yes, I am looking for the number of zero crossings. – Rahul Murmuria May 16 '15 at 07:38
  • @zero.zero.seven, I am actually looking for something faster. See EDIT in my question. – Rahul Murmuria May 16 '15 at 07:49
  • You should consider parallelising your code: look at `multiprocessing.Pool.map_async`, https://github.com/pydata/numexpr , PyCUDA, or MapReduce. – Kolmar May 16 '15 at 08:42
  • @RahulMurmuria There seems to be a different result provided by the numpy solution as compared to the others. I don't know which is correct. I posted a question http://stackoverflow.com/questions/30279315/different-results-to-counting-zero-crossings-of-a-large-sequence – Scott May 16 '15 at 18:33
  • Got an answer to my question in the link above. Might be of use to you. – Scott May 16 '15 at 19:05
  • @Scott, sorry I wasn't around during your delving. I had investigated this issue myself and found that 0 is handled differently while reading the documentation for np.diff(). I don't have any 0 in my arrays so it is a non-issue in my case. However, it is vital to understand this for any other applications. – Rahul Murmuria May 16 '15 at 21:18
  • Ok good. I was just curious myself and thought you should be aware (which you were). If you didn't see it you should check this answer http://stackoverflow.com/a/30279745/4663466 – Scott May 16 '15 at 21:31
  • Indeed! I have been experimenting with that. Seems so simple, I would have never attempted doing that multiplication, betting that it would be too costly. Definitely the best answer. – Rahul Murmuria May 16 '15 at 21:54

6 Answers


This produces the same result:

import numpy as np
my_array = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,  
                     89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
((my_array[:-1] * my_array[1:]) < 0).sum()

gives:

8

and seems to be the fastest solution:

%timeit ((my_array[:-1] * my_array[1:]) < 0).sum()
100000 loops, best of 3: 11.6 µs per loop

Compared to the fastest so far:

%timeit (np.diff(np.sign(my_array)) != 0).sum()
10000 loops, best of 3: 22.2 µs per loop

Also for larger arrays:

big = np.random.randint(-10, 10, size=10000000)

this:

%timeit ((big[:-1] * big[1:]) < 0).sum()
10 loops, best of 3: 62.1 ms per loop

vs:

%timeit (np.diff(np.sign(big)) != 0).sum()
1 loops, best of 3: 97.6 ms per loop
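The trick works because `my_array[:-1]` and `my_array[1:]` line up each element with its right-hand neighbour, and a negative product means the two neighbours have opposite signs (a pair containing a zero has product 0 and is not counted). For arrays of about 68 million float64 values, the product alone is a temporary of roughly 550 MB, plus a boolean mask of about 69 MB. The sketch below wraps the one-liner in a function and adds a block-wise variant that bounds that memory; the function names and the `chunk` parameter are hypothetical. One caveat: with extremely small magnitudes the float product can underflow to zero, so such a crossing would be missed.

```python
import numpy as np


def count_crossings(arr):
    """Count neighbouring pairs with opposite signs (negative product)."""
    arr = np.asarray(arr)
    # arr[:-1][i] and arr[1:][i] are the neighbours arr[i] and arr[i + 1]
    return int(np.count_nonzero(arr[:-1] * arr[1:] < 0))


def count_crossings_chunked(arr, chunk=1_000_000):
    """Same count, but only `chunk` pairs are multiplied at a time.

    `chunk` is a hypothetical tuning parameter chosen for illustration.
    """
    arr = np.asarray(arr)
    n_pairs = max(arr.shape[0] - 1, 0)
    total = 0
    for start in range(0, n_pairs, chunk):
        stop = min(start + chunk, n_pairs)
        # elements start..stop form pairs start..stop-1; consecutive blocks
        # share one element so that no pair is skipped or counted twice
        total += count_crossings(arr[start:stop + 1])
    return total


my_array = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3,
                     -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
print(count_crossings(my_array))                   # 8
print(count_crossings_chunked(my_array, chunk=4))  # 8
```

Both functions give the same count; the chunked one only trades a Python-level loop over blocks for a smaller peak memory footprint.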
Mike Müller

Here's a numpy solution. Numpy's methods are generally pretty fast and well-optimized, but if you're not already working with numpy there's probably some overhead from converting the list to a numpy array:

import numpy as np
my_list = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,  89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
(np.diff(np.sign(my_list)) != 0).sum()
Out[8]: 8
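To see why this works, and why zeros are a problem, here are the intermediate values on two tiny arrays chosen for illustration. `np.sign` maps each value to 1, 0 or -1, and `np.diff` is nonzero exactly where consecutive signs differ; a zero introduces an extra sign value, so one crossing through zero produces two nonzero steps.

```python
import numpy as np

a = np.array([3.0, -1.0, -2.0, 4.0])
signs = np.sign(a)      # 1, -1, -1, 1
steps = np.diff(signs)  # -2, 0, 2: nonzero exactly where the sign flips
print(int((steps != 0).sum()))  # 2

# A zero has sign 0, so a single crossing through it becomes two nonzero steps.
b = np.array([1.0, 0.0, -1.0])
print(np.diff(np.sign(b)))                    # [-1. -1.]
print(int((np.diff(np.sign(b)) != 0).sum()))  # 2
```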
Marius
  • This is crazy fast. I ran this against my and @Alik solution and I get a different result using numpy. Any idea why? See http://stackoverflow.com/questions/30279315/different-results-to-counting-zero-crossings-of-a-large-sequence. – Scott May 16 '15 at 18:34
  • This counts an extra crossing for each 0 in the input. I think that's fixed by doing the following: `(np.diff(np.sign(my_list)) != 0).sum() - (np.asarray(my_list) == 0).sum()` – user2437378 Oct 03 '17 at 00:06
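Subtracting the number of zeros needs a NumPy array (on a plain list, `my_list == 0` is just `False`), and it still miscounts runs of consecutive zeros: for `[1, 0, 0, -1]` there are two nonzero sign steps and two zeros, which gives 0 instead of 1. One alternative, sketched here under an assumed definition (zeros are ignored, so `+, 0, 0, -` is one crossing and `+, 0, +` is none), is to drop the zeros before comparing neighbours. The function name is hypothetical.

```python
import numpy as np


def count_crossings_ignoring_zeros(arr):
    """Count sign changes after discarding exact zeros.

    Assumed definition: +, 0, 0, - is one crossing; +, 0, + (touching zero) is none.
    """
    arr = np.asarray(arr)
    nonzero = arr[arr != 0]   # drop exact zeros, keep order
    negative = nonzero < 0    # polarity of each remaining value
    # a crossing is any place where neighbouring polarities differ
    return int(np.count_nonzero(negative[1:] != negative[:-1]))


print(count_crossings_ignoring_zeros([1, 2, 0, -1, 0, 0, -1, 2]))  # 2
print(count_crossings_ignoring_zeros([1, 0, 0, -1]))               # 1
```

Boolean indexing copies the nonzero values, so this costs an extra temporary compared with the one-liners.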

Based on Scott's answer

The generator expression proposed by Scott uses enumerate, which returns tuples containing the index and the list item. The list items are never used in the expression and are simply discarded. So a better solution in terms of time would be:

sum(1 for i in range(1, len(a)) if a[i-1]*a[i]<0)

If your list a is really huge, range may throw an exception. You can replace it with itertools.islice and itertools.count.

In Python 2.x, use xrange instead of range. In Python 3, xrange is no longer available; range itself is already lazy.
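The islice idea can also be used without any index arithmetic: zipping the list with a view of itself that starts at the second element yields each adjacent pair directly. This is a swapped-in variant (zip plus itertools.islice rather than islice plus itertools.count), sketched under the assumption that `a` is a list or other re-iterable sequence; the function name is hypothetical.

```python
from itertools import islice


def count_crossings_pairs(a):
    """Count adjacent pairs with a negative product, without index arithmetic.

    Assumes `a` is a sequence such as a list: zip() walks it once from the
    start and once from the second element (islice(a, 1, None)).
    """
    return sum(1 for prev, cur in zip(a, islice(a, 1, None)) if prev * cur < 0)


a = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,
     89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
print(count_crossings_pairs(a))  # 8
```

On Python 3.10 and later, `itertools.pairwise(a)` produces the same pairs and also works for one-shot iterators. Like the expression above, this treats a pair containing a zero as no crossing.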

Konstantin
  • This **does not work when there is a zero** in the array! `[ 1, 2, 0, -1, 0, 0, -1, 2]` should yield `2` zero crossings, which it does not. Here is [a solution that handles zeros correctly](http://stackoverflow.com/a/40809378/2192488). – Serge Stroobandt Nov 25 '16 at 16:48

I think a loop is a straightforward way to go:

a = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3, 89.2, -154.1, 121.4, -85.1, 96.8, 68.2]

def change_sign(v1, v2):
    return v1 * v2 < 0

s = 0
for ind, _ in enumerate(a):
    if ind+1 < len(a):
        if change_sign(a[ind], a[ind+1]):
            s += 1
print(s)  # prints 8

You could use a generator expression but it gets ugly:

z_cross = sum(1 for ind, val in enumerate(a) if (ind+1 < len(a)) 
              if change_sign(a[ind], a[ind+1]))
print(z_cross)  # prints 8

EDIT:

@Alik pointed out that for huge lists the best option in space and time (at least out of the solutions we have considered) is not to call change_sign in the generator expression but to simply do:

z_cross = sum(1 for i, _ in enumerate(a) if (i+1 < len(a)) if a[i]*a[i+1]<0)
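The comments below report that the inlined multiplication beats the version that calls change_sign by roughly 25%. To check that on your own machine, here is a minimal timeit harness; the list size, the random seed and the number of repetitions are arbitrary assumptions, and no particular timings are implied.

```python
import random
import timeit


def change_sign(v1, v2):
    return v1 * v2 < 0


def with_function_call(a):
    # calls change_sign once per adjacent pair
    return sum(1 for i, _ in enumerate(a) if i + 1 < len(a) if change_sign(a[i], a[i + 1]))


def inlined(a):
    # same test with the multiplication written inline
    return sum(1 for i, _ in enumerate(a) if i + 1 < len(a) if a[i] * a[i + 1] < 0)


random.seed(0)
data = [random.uniform(-100, 100) for _ in range(100_000)]

assert with_function_call(data) == inlined(data)  # both variants must agree
for fn in (with_function_call, inlined):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```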
Scott
  • Just a note: `v1` and `v2` have different signs if `v1*v2 < 0`, so you can simplify your code a little bit – Konstantin May 16 '15 at 07:32
  • Yeah I was thinking that as I was writing it. Do you think that makes it less readable? – Scott May 16 '15 at 07:34
  • I am not sure, I am used to testing signs with multiplication so for me it makes code more readable. Anyway you can always leave a comment about `change_sign` function – Konstantin May 16 '15 at 07:38
  • It might be useful in your generator expression, but it will certainly require a comment. `z_cross = sum(1 for i, _ in enumerate(a) if (i+1 < len(a)) if a[i]*a[i+1]<0)`. Also note, that this is not a list comprehension, but a [generator expression](https://www.python.org/dev/peps/pep-0289/#rationale). – Konstantin May 16 '15 at 07:39
  • @Alik I agree, which is why I wrote the (mostly useless) function with a human readable name `change_sign`. Anyway, I think the short answer, among our many long answers, to the original question is: No, stick with a loop. Though there could be something in scipy/numpy since zero-crossings pops up a bit in signal processing. – Scott May 16 '15 at 07:45
  • @Alik Also thanks for highlighting your link, edited from list comprehension to generator. I started trying to solve this with list comprehension but then saw that '@fmatheis' beat me to it. – Scott May 16 '15 at 07:52
  • @Scott, I hope there is something in scipy/numpy that is faster! Thank you for actively brainstorming. I will spend a few more minutes on this before selecting an answer. – Rahul Murmuria May 16 '15 at 07:53
  • @RahulMurmuria If you have truly huge lists, the generator expression is a good choice as far as memory use. – Scott May 16 '15 at 08:03
  • @Scott, did a few tests. Plain cycle seems to be the slowest solution, then goes your generator expression and [modified generator expression](http://stackoverflow.com/questions/30272538/python-code-for-counting-number-of-zero-crossings-in-an-array#comment48645217_30272583) leads by 25% – Konstantin May 16 '15 at 08:09
  • @Alik Wow. I was running timeit as well and added the "modified generator expression" after your comment. That's impressive what a function call does 1e7 times. I'll point that out in my answer. – Scott May 16 '15 at 08:14
  • @Alik, I checked Soon's answer as well and it seems that the "modified generator expression" is the fastest – Rahul Murmuria May 16 '15 at 08:22
  • @Alik, I suppose Marius has given a faster solution! – Rahul Murmuria May 16 '15 at 08:29
  • @RahulMurmuria test it, upvote and accept it then. Upvote Scott's answer as well, because someone might find his approach useful – Konstantin May 16 '15 at 08:33
  • This **does not work when there is a zero** in the array! `[ 1, 2, 0, -1, 0, 0, -1, 2]` should yield `2` zero crossings, which it does not. Here is [a solution that handles zeros correctly](http://stackoverflow.com/a/40809378/2192488). – Serge Stroobandt Nov 25 '16 at 16:48

It seems like you want to group numbers by their sign. This can be done with groupby from the itertools module:

In [2]: l = [80.6,  120.8,  -115.6,  -76.1,  131.3,  105.1,  138.4,  -81.3, -95.3,  89.2,  -154.1,  121.4,  -85.1,  96.8,  68.2]

In [3]: from itertools import groupby

In [5]: list(groupby(l, lambda x: x < 0))
Out[5]: 
[(False, <itertools._grouper at 0x7fc9022095f8>),
 (True, <itertools._grouper at 0x7fc902209828>),
 (False, <itertools._grouper at 0x7fc902209550>),
 (True, <itertools._grouper at 0x7fc902209e80>),
 (False, <itertools._grouper at 0x7fc902209198>),
 (True, <itertools._grouper at 0x7fc9022092e8>),
 (False, <itertools._grouper at 0x7fc902209240>),
 (True, <itertools._grouper at 0x7fc902209908>),
 (False, <itertools._grouper at 0x7fc9019a64e0>)]

Then use len to get the number of groups:

In [7]: len(list(groupby(l, lambda x: x < 0)))
Out[7]: 9

A non-empty list always has at least one group, so if you want the number of points where the sequence changes its polarity, subtract one from the number of groups (here 9 - 1 = 8). Do not forget the empty-list case, which has no groups at all.

You should also take care with zero elements: should they be put into a group of their own? If so, just change the key argument (the lambda function) passed to groupby.
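The two caveats above (subtract one group, handle the empty list) and the optional zero group can be folded into a small function. The names count_sign_changes_groupby and three_way_sign are hypothetical. Note that with the three-way key a zero creates a boundary on each side, so it counts changes between positive, zero and negative rather than true zero crossings.

```python
from itertools import groupby


def count_sign_changes_groupby(seq, key=lambda x: x < 0):
    """Count polarity changes as (number of groups - 1); 0 for an empty sequence."""
    n_groups = sum(1 for _ in groupby(seq, key))  # count groups without building a list
    return max(n_groups - 1, 0)


def three_way_sign(x):
    """Hypothetical key: 1, 0 or -1, so zeros form groups of their own."""
    return (x > 0) - (x < 0)


l = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,
     89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
print(count_sign_changes_groupby(l))                               # 8
print(count_sign_changes_groupby([]))                              # 0
print(count_sign_changes_groupby([1, 0, -1]))                      # 1
print(count_sign_changes_groupby([1, 0, -1], key=three_way_sign))  # 2
```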

awesoon
  • Very promising solution! Could this be faster than looping? I am running some tests. – Rahul Murmuria May 16 '15 at 08:00
  • this is definitely a different approach, but using groupby and a lambda will make it more expensive than a simple for loop. – Shan Valleru May 16 '15 at 08:32
  • @RahulMurmuria, no, this can't be faster than looping, but it is definitely more readable than a regular loop-based solution. If you are talking about speed, you should also state the worst-case and expected time. – awesoon May 16 '15 at 09:34

You can achieve it using a list comprehension:

myList = [80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,  89.2, -154.1, 121.4, -85.1, 96.8, 68.2]
len([x for i, x in enumerate(myList) if i > 0 and ((myList[i-1] > 0 and myList[i] < 0) or (myList[i-1] < 0 and myList[i] > 0))])
fmatheis
  • on the first iteration aren't you comparing the last value in mylist with the first? i.e. when `i=0` you have `myList[-1]>0 and myList[0]<0...` – Scott May 16 '15 at 07:18
  • This could be bad, say, if the first value were -80.6: in our example we would then expect 9 zero crossings, but your solution would give 10. – Scott May 16 '15 at 07:21
  • Thanks Scott, it is true, I added a condition to fix that problem. – fmatheis May 16 '15 at 07:29