Hey guys, this question might be more about logic than code; hopefully someone can shed some light on it.
So, I have a data list that contains some outliers, and I want to remove them by looking at the difference between consecutive items on the list and identifying where that difference is far too big.
In the example below, I want to remove the indexes [2, 3, 4] from the data list. What is the best way to do it?
I have tried using np.argwhere() to find the indexes; however, I am stuck on how to use its result to slice a np.array.
import numpy as np
data=[4.0, 4.5, 22.5, 40.5, 22.5, 3.5, 3.0, 3.5, 4.5, 3.5, 2.5]
data=np.array(data)
d = data[:-1] - data[1:]
print(np.mean(d))
In this example, when I print the difference (d), I get this:
print(d) # returns:[ -0.5 -18. -18. 18. 19. 0.5 -0.5 -1. 1. 1. ]
That is good. Now, the logic I applied was to find where d contains a value higher than the average of the original data.
x = np.argwhere(d>np.mean(data))
print(x) # returns: array([3], dtype=int64), array([4], dtype=int64)
indices_to_extract = [x[0]-1,x[-1]]
print(indices_to_extract) # returns: [array([2], dtype=int64), array([[4]], dtype=int64)]
a1 = np.delete(data, indices_to_extract, axis=0)
print(a1) #returns: [ 4. 4.5 40.5 3.5 3. 3.5 4.5 3.5 2.5]
# Desired output:
[ 4. 4.5 3.5 3. 3.5 4.5 3.5 2.5]
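For what it's worth, a quick look at the shapes seems to explain why only two elements disappear: np.argwhere() returns a 2-D array with one row per match, and the list passed to np.delete() holds just the two endpoints rather than every index between them. A minimal sketch, assuming the same data as above (the names `flat`, `endpoints` and `removed` are only illustrative):

```python
import numpy as np

data = np.array([4.0, 4.5, 22.5, 40.5, 22.5, 3.5, 3.0, 3.5, 4.5, 3.5, 2.5])
d = data[:-1] - data[1:]

x = np.argwhere(d > np.mean(data))
print(x.shape)  # (2, 1): one row per match, one column per array dimension

# Flatten to a 1-D array of plain integer positions.
flat = x.ravel()
print(flat.tolist())  # [3, 4]

# Only the two endpoints (2 and 4) are listed, so index 3 is left in place.
endpoints = [flat[0] - 1, flat[-1]]
removed = np.delete(data, endpoints)
print(removed.tolist())  # [4.0, 4.5, 40.5, 3.5, 3.0, 3.5, 4.5, 3.5, 2.5]
```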
The main question is: how can I turn the result from np.argwhere() into a range of indexes that can be used for slicing?
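One direction that might work, sketched under the assumption that the outliers form a single contiguous run and that the same threshold is kept: flatten the np.argwhere() result and expand its first and last entries into a full range with np.arange(). The names `hits`, `start`, `stop`, `to_remove` and `cleaned` are hypothetical.

```python
import numpy as np

data = np.array([4.0, 4.5, 22.5, 40.5, 22.5, 3.5, 3.0, 3.5, 4.5, 3.5, 2.5])
d = data[:-1] - data[1:]

# Positions in d where the drop exceeds the mean of the data, as a 1-D array.
hits = np.argwhere(d > np.mean(data)).ravel()  # positions 3 and 4

# Assumption: one contiguous block of outliers. Step back by one so the
# first outlier is included (same offset as x[0]-1 in the attempt above).
start = hits[0] - 1   # 2
stop = hits[-1] + 1   # 5, exclusive upper bound

# Every index from start up to (but not including) stop.
to_remove = np.arange(start, stop)
print(to_remove.tolist())  # [2, 3, 4]

cleaned = np.delete(data, to_remove)
print(cleaned.tolist())  # [4.0, 4.5, 3.5, 3.0, 3.5, 4.5, 3.5, 2.5]
```

The same bounds could also be used as a plain slice, for example np.concatenate((data[:start], data[stop:])). With several separate outlier runs this sketch would delete everything between the first and last hit, so it would need adapting.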