I have a list, e.g. my_list = [1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]

I want a function that will return all values from the list where the difference between that value and the previous value is not equal to 2, e.g. the function will return [1, 14, 22, 28, 41] for the above list. Note that the first value of my_list will always appear as the first value of the output. The input lists are non-empty and have lengths of the order of hundreds.

So far I have this:

def get_output(array):
    start = [array[0]]  # the first value is always kept
    for i in range(1, len(array)):  # check every later element, including the last one
        if (array[i] - array[i-1]) != 2:
            start.append(array[i])

    return start

Is there a vectorised solution that would be faster, bearing in mind I will be applying this function to thousands of input arrays?

juanpa.arrivillaga
Imran
  • why is it returning `1` when there is no element before it? – Azat Ibrakov Sep 12 '17 at 07:08
  • @AzatIbrakov that's what I want it to return. First element of output is always first element of input. – Imran Sep 12 '17 at 07:09
  • To vectorize your function you need to use numpy. [This](https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array) may help. – RedEyed Sep 12 '17 at 07:15

4 Answers

To avoid concatenating a leading value onto the output of np.diff yourself (e.g. with np.concatenate), use np.ediff1d instead of np.diff; it takes a to_begin argument that is prepended to the result:

>>> import numpy as np
>>> my_list = [1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]
>>> arr = np.array(my_list)
>>> np.ediff1d(arr, to_begin=0)
array([0, 2, 2, 2, 7, 2, 2, 4, 6, 2, 2, 9, 2])

Then select the elements with boolean indexing:

>>> arr[np.ediff1d(arr, to_begin=0) != 2]
array([ 1, 14, 22, 28, 41])
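Wrapped up as a drop-in replacement for the loop in the question, the same idea might look like the sketch below. It reuses the question's function name `get_output` and assumes 1-D numeric input; `np.asarray` is added so plain lists work too.

```python
import numpy as np


def get_output(values):
    """Keep the first value plus every value whose gap to its predecessor is not 2."""
    arr = np.asarray(values)
    # to_begin=0 puts a 0 in front of the differences, so the mask has the
    # same length as arr and its first entry (0 != 2) is always True
    return arr[np.ediff1d(arr, to_begin=0) != 2]


print(get_output([1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]).tolist())  # [1, 14, 22, 28, 41]
print(get_output([5]).tolist())  # [5]
```

A one-element input also works: the difference array is empty, so the mask is just the prepended `0 != 2`, i.e. `[True]`.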
juanpa.arrivillaga

Apart from the first element, which you can add manually (although, as Azat Ibrakov's comment points out, it doesn't really make sense), you can use np.where:

import numpy as np

a = np.array([1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43])
a[np.where(a[1:] - a[:-1] != 2)[0] + 1]

array([14, 22, 28, 41])

Adding first element:

[a[0]] + list(a[np.where(a[1:] - a[:-1] != 2)[0] + 1])

[1, 14, 22, 28, 41]
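If you'd rather keep the result as a NumPy array instead of converting to a list, one option is to concatenate a one-element slice with the selected values. This is a sketch; `np.flatnonzero` is used here as an equivalent of `np.where(...)[0]` for 1-D input.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43])

# Positions (in a) of elements whose gap to the previous element is not 2;
# np.flatnonzero(x) gives the same indices as np.where(x)[0] for a 1-D x
idx = np.flatnonzero(a[1:] - a[:-1] != 2) + 1

# a[:1] is a one-element array, so the concatenation stays an ndarray
result = np.concatenate((a[:1], a[idx]))
print(result.tolist())  # [1, 14, 22, 28, 41]
```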
Julien
  • Does a[1:] return a copy object? I mean, how much memory does a[1:] use? As far as I know, slices return a copy, so it is not memory efficient. – RedEyed Sep 12 '17 at 07:18
  • No copy: slices are just views in numpy. – Julien Sep 12 '17 at 07:19
  • Could you back this up (some links)? Slices of lists are copies, aren't they? – RedEyed Sep 12 '17 at 07:20
  • try it yourself: modify a slice of an np.array and the original will be modified too. – Julien Sep 12 '17 at 07:21
  • Try `a = np.arange(10)`, `b = a[:5]`, `b[0] = 10`, `print(a)` – Daniel F Sep 12 '17 at 07:24
  • Boolean masks and lists of indices create copies; slices create views. – Daniel F Sep 12 '17 at 07:25
  • @Julien, thanks! I checked it out and you are right! What about python lists, is there a method to make list slices like numpy slices(views)? – RedEyed Sep 12 '17 at 07:26
  • @DanielF it would take some serious hacking with perhaps `struct` to get the underlying buffer for the `list`. This would rely on python-version-specific implementation details. It would also be entirely unsafe, seeing as Python lists are resizable, and the underlying memory could be released and reallocated somewhere else every time the list resizes. – juanpa.arrivillaga Sep 12 '17 at 07:33
  • @DanielF, Thanks, I'm not. I heard about memoryview, I'll try it. – RedEyed Sep 12 '17 at 07:33
  • And if you want speed and vectorization, numpy is the way, don't waste time hacking python lists for worse results... – Julien Sep 12 '17 at 07:35
  • I think I created a monster >.< Deleting that comment so no one else tries it. – Daniel F Sep 12 '17 at 07:38
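The experiment suggested in the comments can be run as a small script. This sketch extends it with `np.shares_memory` to check both the slice case and the boolean-mask case, and contrasts them with a plain Python list slice.

```python
import numpy as np

a = np.arange(10)

# Basic slicing returns a view: writing through b changes a
b = a[:5]
b[0] = 10
print(a[0])                     # 10
print(np.shares_memory(a, b))   # True

# Boolean-mask (and integer-list) indexing returns a copy
c = a[a > 5]
c[0] = -1
print(a[6])                     # 6
print(np.shares_memory(a, c))   # False

# Python list slices, by contrast, are copies
lst = list(range(10))
sub = lst[:5]
sub[0] = 10
print(lst[0])                   # 0
```

So `a[1:] - a[:-1]` does not copy `a` to build its operands; only the difference array itself is newly allocated.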

You could use boolean array indexing for NumPy arrays and np.diff to get the difference between values:

>>> my_list = [1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]
>>> import numpy as np
>>> my_arr = np.array(my_list)
>>> my_mask = np.ones(my_arr.shape, dtype=bool)  # initial mask
>>> my_mask[1:] = np.diff(my_arr) != 2           # set all elements to False that have a difference of 2
>>> my_arr[my_mask]                              # mask the array
array([ 1, 14, 22, 28, 41])
MSeifert
  • Might be a little more efficient to initialize `my_mask` with `np.empty(my_arr.shape, dtype=bool)` and then set `my_mask[0] = True`; there's no need to fill it with ones first. – Daniel F Sep 12 '17 at 07:23
  • That's right but the solution already is quite long-ish and the benefit will be quite small. But thank you, I didn't think of that! :) – MSeifert Sep 12 '17 at 07:30
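The `np.empty` variant suggested in the comment can be sketched as a small function. The name `keep_non_two_gaps` is hypothetical, and it assumes a non-empty 1-D input (as the question guarantees), since `mask[0]` would fail on an empty array. No timing difference is claimed here.

```python
import numpy as np


def keep_non_two_gaps(values):
    """Hypothetical helper: keep values[0] plus every value whose gap to its predecessor is not 2."""
    arr = np.asarray(values)
    mask = np.empty(arr.shape, dtype=bool)  # uninitialised; every slot is assigned below
    mask[0] = True                          # the first element is always kept
    mask[1:] = np.diff(arr) != 2            # the rest depend on the gap to the previous element
    return arr[mask]


print(keep_non_two_gaps([1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]).tolist())  # [1, 14, 22, 28, 41]
```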
import numpy as np

my_list = [1, 3, 5, 7, 14, 16, 18, 22, 28, 30, 32, 41, 43]
a = np.array(my_list)
# Prepend True so the first element is always kept; the remaining entries flag gaps that are not 2
output = a[[True] + list(a[1:]-a[:-1] != 2)]
print(output)
FooBar167