Here is a fast, vectorized NumPy implementation that is correct and runs in O(n log n) time.
The idea is to find the unique values in x (with np.unique) and, for each unique value, the position of its first occurrence. A value v in x is then selected if v < 0 and -v does not occur earlier in x. To check whether -v occurs earlier, you can run a binary search on the sorted unique values (with np.searchsorted) and test whether the current index is greater than the first position recorded for -v.
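To make the two building blocks concrete, here is a small sketch on the array [-5, 5, -5]. The variable names (uniq, first_pos, pos, found) are illustrative only and are not the ones used in the answer's code.

```python
import numpy as np

x = np.array([-5, 5, -5])

# Sorted unique values, plus the index in x where each one first appears
uniq, first_pos = np.unique(x, return_index=True)
print(uniq.tolist())       # [-5, 5]
print(first_pos.tolist())  # [0, 1]

# Binary search: the slot each -x[i] would occupy in the sorted unique values
pos = np.searchsorted(uniq, -x)
print(pos.tolist())        # [1, 0, 1]

# A slot is only a hit if the value stored there really equals -x[i]
found = uniq[pos] == -x
print(found.tolist())      # [True, True, True]
```

Comparing first_pos[pos] with the running index then tells whether the opposite value was seen strictly earlier.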
Here is the resulting code:
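import numpy as np

# Sorted unique values and the index of each one's first occurrence in x
xUnique, xFirstPos = np.unique(x, return_index=True)
xIsNeg = x < 0
xNeg = -x
# Binary search for -x in the unique values; clip so a miss past the end stays in bounds
xNegUniquePos = np.minimum(np.searchsorted(xUnique, xNeg), len(xUnique) - 1)
xNegIsFound = xUnique[xNegUniquePos] == xNeg
# -x[i] occurs before i if its first occurrence is at a smaller index
xHasNegBefore = np.logical_and(xNegIsFound, xFirstPos[xNegUniquePos] < np.arange(len(x)))
# Keep the negative values whose opposite has not been seen earlier
result = x[np.logical_and(xIsNeg, np.logical_not(xHasNegBefore))]
print(result)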
Here is the result on some examples:
x = np.array([20000, 700, 1000, -5000, -250, 30, -1000, 50, -30, 75, -999])
result = np.array([-5000, -250, -999])
x = np.array([-5, 5, -5])
result = np.array([-5])
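One way to gain confidence in the vectorized logic is to compare it against a plain loop on many small random inputs. The sketch below is only an illustration: the function names select_vectorized and select_reference are hypothetical, and the np.minimum clip is a defensive addition of this sketch for inputs such as [-5, 3], where the search for 5 would otherwise land one past the end of the unique values.

```python
import numpy as np

def select_vectorized(x):
    """Negative values of x whose opposite does not occur earlier in x."""
    uniq, first_pos = np.unique(x, return_index=True)
    neg = -x
    # Clip so values larger than every unique value stay in bounds
    pos = np.minimum(np.searchsorted(uniq, neg), len(uniq) - 1)
    found = uniq[pos] == neg
    has_before = found & (first_pos[pos] < np.arange(len(x)))
    return x[(x < 0) & ~has_before]

def select_reference(x):
    """Straightforward loop that tracks the values seen so far in a set."""
    seen, out = set(), []
    for v in x.tolist():
        if v < 0 and -v not in seen:
            out.append(v)
        seen.add(v)
    return np.array(out, dtype=x.dtype)

# Compare both versions on many small random arrays
rng = np.random.default_rng(0)
for _ in range(200):
    x = rng.integers(-20, 20, size=rng.integers(1, 30))
    assert np.array_equal(select_vectorized(x), select_reference(x))
```

Small value ranges are used on purpose so that opposite pairs and repeated values show up often.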
Here are timings on a random array of 100_000 values in the range -1_000_000 to 2_000_000 (about 33% of them negative):
Mad Physicist's Numpy implementation: 38900.0 ms
Emi OB's implementation: 1360.0 ms (incorrect so far)
Mad Physicist's pure Python implementation: 40.0 ms
This implementation: 14.1 ms
So far, this implementation is much faster than the others. It is worth noting that Mad Physicist's NumPy implementation takes several GiB of memory for this input size, while the other solutions (including this one) take no more than 10 MiB.
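For anyone who wants to time this on their own machine, a rough harness could look like the following. It is a sketch under assumptions: the answer does not say how the random array was generated, so the uniform integer draw, the seed, and the function name select_vectorized are all choices made here, and the measured time will vary with hardware and NumPy version.

```python
import timeit
import numpy as np

def select_vectorized(x):
    """Negative values of x whose opposite does not occur earlier in x."""
    uniq, first_pos = np.unique(x, return_index=True)
    neg = -x
    # Clip so values larger than every unique value stay in bounds
    pos = np.minimum(np.searchsorted(uniq, neg), len(uniq) - 1)
    found = uniq[pos] == neg
    has_before = found & (first_pos[pos] < np.arange(len(x)))
    return x[(x < 0) & ~has_before]

# Assumed input: uniform integers, so roughly one third are negative
rng = np.random.default_rng(42)
x = rng.integers(-1_000_000, 2_000_000, size=100_000)
print(f"negative fraction: {np.mean(x < 0):.2f}")

# Average wall-clock time over a few calls
runs = 10
seconds = timeit.timeit(lambda: select_vectorized(x), number=runs) / runs
print(f"select_vectorized: {seconds * 1e3:.1f} ms per call")
```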