
I have

x = np.array([20000, 700, 1000, -5000, -250, 30, -1000, 50, -30, 75, -999])

and I want to keep only the negative values, excluding -1000 and -30 because the corresponding positive values 1000 and 30 appear before them in x. I want to get

y = np.array([-5000, -250, -999])
user270199

3 Answers


There is a fast, correct, and vectorized NumPy implementation that runs in O(n log n) time.

The idea is to find the unique values in x (with np.unique) together with the first position of each unique value. A value v in x is then kept if v < 0 and -v does not appear earlier in x. To check whether -v appears earlier, perform a binary search for -v in the sorted unique values (with np.searchsorted) and compare the first position of the match with the current index.

Here is the resulting code:

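import numpy as np

# sorted unique values and the index of their first occurrence in x
xUnique, xFirstPos = np.unique(x, return_index=True)
xIsNeg = x < 0
xNeg = -x
# binary search for -v; clamp the position because searchsorted returns
# len(xUnique) when -v is larger than every value in x
xNegUniquePos = np.minimum(np.searchsorted(xUnique, xNeg), len(xUnique) - 1)
xNegIsFound = xUnique[xNegUniquePos] == xNeg
# -v exists and its first occurrence is before the current index
xHasNegBefore = np.logical_and(xNegIsFound, xFirstPos[xNegUniquePos] < np.arange(len(x)))
result = x[np.logical_and(xIsNeg, np.logical_not(xHasNegBefore))]
print(result)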

Here is the result on some examples:

x = np.array([20000, 700, 1000, -5000, -250, 30, -1000, 50, -30, 75, -999])
result = np.array([-5000,  -250,  -999])

x = np.array([-5, 5, -5])
result = np.array([-5])
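
To check more than two hand-picked inputs, the approach can be compared against a slow but obvious reference on random arrays. This is a minimal sketch: the function names `keep_unmatched_negatives` and `reference` are hypothetical, and the clamp on the searchsorted position is an assumption of this sketch that guards against an out-of-bounds index when -v is larger than every value in x.

```python
import numpy as np

def keep_unmatched_negatives(x):
    """Vectorized approach described above, wrapped in a function (hypothetical name)."""
    x_unique, x_first_pos = np.unique(x, return_index=True)
    neg = -x
    # Clamp so that a value larger than every element of x_unique stays in bounds.
    pos = np.minimum(np.searchsorted(x_unique, neg), len(x_unique) - 1)
    found = x_unique[pos] == neg
    has_opposite_before = found & (x_first_pos[pos] < np.arange(len(x)))
    return x[(x < 0) & ~has_opposite_before]

def reference(x):
    """Slow reference: keep a negative v unless -v appears at an earlier index."""
    kept = [v for i, v in enumerate(x) if v < 0 and -v not in x[:i]]
    return np.array(kept, dtype=x.dtype)

# Small value range so that matching pairs occur often.
rng = np.random.default_rng(0)
for _ in range(200):
    sample = rng.integers(-10, 10, size=rng.integers(1, 30))
    assert np.array_equal(keep_unmatched_negatives(sample), reference(sample))
```

Agreement on random inputs is not a proof, but it catches ordering mistakes such as using the last position of a value instead of the first.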

Here are timings for a random array of size 100_000 (values in the range -1_000_000 to 2_000_000, so about 33% of them are negative):

Mad Physicist's Numpy implementation:         38900.0 ms
Emi OB's implementation:                       1360.0 ms (incorrect so far)
Mad Physicist's pure Python implementation:      40.0 ms
This implementation:                             14.1 ms

This implementation is much faster than the others posted so far. It is worth noting that Mad Physicist's NumPy implementation takes several GiB of memory for this input size, while the other solutions (including this one) take no more than 10 MiB.
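
The "several GiB" figure can be sanity-checked with a rough estimate. The broadcasted comparison `-neg == x[:, None]` in Mad Physicist's NumPy implementation builds a boolean array with one row per element of x and one column per negative element, at one byte per entry. The sketch below assumes that exactly one third of the values are negative, which is only an approximation of the benchmark input.

```python
n = 100_000              # array size used in the timings
n_neg = n // 3           # assumption: about one third of the values are negative
mask_bytes = n * n_neg   # one byte per entry of the (n, n_neg) boolean array
print(f"{mask_bytes / 2**30:.1f} GiB")  # prints "3.1 GiB"
```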

Jérôme Richard

You likely won't get an O(n) implementation with pure numpy, but you could use a dict:

# iterate in reverse so that each positive value maps to its first index in x
lookup = {v: i for i, v in reversed(list(enumerate(x))) if v > 0}
result = [v for i, v in enumerate(x) if v < 0 and lookup.get(-v, x.size) > i]

This will be a bit slow in practice, but has good time complexity. A more practical solution would be to use numpy throughout:

# identify the negative numbers
idx = np.flatnonzero(x < 0)
# get the corresponding negative values
neg = x[idx]
# find the index of the first corresponding positive
# will contain false zero for non-matching
p = (-neg == x[:, None]).argmax(0)
# set non-matching to large number
p[x[p] != -neg] = x.size
# return only elements that have smaller index than corresponding positive
result = x[idx[p > idx]]

This is O(n^2) in time and memory because of the broadcasted comparison and the argmax, but it is likely faster than a pure Python implementation for arrays that you are likely to encounter in practice.
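
The "false zero" mentioned in the comments is easiest to see on a tiny input. The following sketch runs the same steps on a four-element array chosen for illustration (the array and the intermediate name `match` are not from the original answer).

```python
import numpy as np

x = np.array([5, -5, 7, -3])
idx = np.flatnonzero(x < 0)    # positions of the negatives: [1, 3]
neg = x[idx]                   # the negatives themselves: [-5, -3]
match = -neg == x[:, None]     # boolean array of shape (len(x), len(neg))
p = match.argmax(0)            # first True per column; 0 when a column has no True
print(p)                       # [0 0]: the second 0 is a false zero, 3 never occurs in x
p[x[p] != -neg] = x.size       # mark non-matching entries with a large index
print(p)                       # [0 4]
result = x[idx[p > idx]]
print(result)                  # [-3]
```

Here -5 is dropped because 5 sits at index 0, before it, while -3 is kept because no 3 exists anywhere in x.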

Mad Physicist

Edited given new information

You can use a list comprehension with enumerate. Go through each element of x and keep it only if it is negative and its negation (the value multiplied by -1) does not appear in x before that position, i.e. in x[:pos].

y = np.array([i for pos, i in enumerate(x) if i < 0 and i*-1 not in x[:pos]])
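
The membership test `i*-1 not in x[:pos]` rescans the prefix of x for every negative element, so the comprehension is quadratic in the worst case. A minimal sketch of a single-pass variant that remembers the values seen so far in a set is shown here; the function name `keep_unmatched_negatives_set` is hypothetical and not part of the original answer.

```python
import numpy as np

def keep_unmatched_negatives_set(x):
    """Keep each negative v unless -v has already been seen (hypothetical helper)."""
    seen = set()
    kept = []
    for v in x.tolist():      # plain Python ints are cheap to hash and compare
        if v < 0 and -v not in seen:
            kept.append(v)
        seen.add(v)
    return np.array(kept, dtype=x.dtype)

x = np.array([20000, 700, 1000, -5000, -250, 30, -1000, 50, -30, 75, -999])
y = keep_unmatched_negatives_set(x)
print(y)
```

Set lookups take constant time on average, so the loop does work proportional to the length of x, at the cost of running in the Python interpreter rather than in vectorized NumPy code.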

Emi OB
  • This is what I was looking for @EmiOB. I'm wondering whether a list comprehension is the best way to do this when the numpy array is large though. – user270199 Aug 27 '21 at 12:22
  • @user270199. This doesn't pass the test case in the comments – Mad Physicist Aug 27 '21 at 15:07
  • Yes, I noticed that. Thank you for pointing it out @MadPhysicist – user270199 Aug 27 '21 at 15:09
  • @MadPhysicist I misinterpreted what OP wanted to achieve, and wrote this before their comment further explaining what they were after. This was correct at the time, given the limited information in the original question – Emi OB Aug 27 '21 at 15:22
  • I was able to interpret the original question correctly based on what I thought was pretty unambiguous information. I asked the clarification in the comment because I could not understand why everyone was misreading the question. – Mad Physicist Aug 27 '21 at 15:23
  • It's my fault. I could have been a bit more precise with my question and especially with test cases. I'll keep that in mind for next time. Thank you for your answer @EmiOB. – user270199 Aug 27 '21 at 16:06
  • @user270199 I have changed my solution to hopefully now also give the desired result (I have been busy until now) – Emi OB Aug 31 '21 at 06:42