I am getting NaN values as decision scores when using PyOD's Angle-Based Outlier Detector (ABOD), and because of this no outliers are detected.
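```python
import numpy as np
from pyod.models.abod import ABOD
from sklearn.preprocessing import MinMaxScaler


def outlier_ABOD(data, outliers_fraction=0.1):
    # ABOD expects a 2-D array, so reshape the 1-D input into a single column
    data = np.array([data]).reshape(-1, 1)
    # Scale the single feature to [0, 1]
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data)
    clf = ABOD(contamination=outliers_fraction)
    clf.fit(data)
    # predict() returns 1 for outliers and 0 for inliers
    y_pred = clf.predict(data)
    print(clf.decision_scores_)
    # Indices of the points flagged as outliers
    return np.where(y_pred)[0]


X1 = np.array([1, 1, 3, 2, 1, 2, 1, 2, 3, 2, 1, 88, 1234, 8888, 1, 2, 3, 2])
outliers = outlier_ABOD(X1, 0.1)
```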
Output:

```
Decision Scores: [ nan nan -0.00000000e+00 nan
 nan nan nan nan
 -0.00000000e+00 nan nan -5.77145973e+03
 -3.60509466e+00 -6.08142776e-03 nan nan
 -0.00000000e+00 nan]
Outliers: array([], dtype=int64)
```
As the output shows, some of the decision scores are NaN, which makes `clf.threshold_` NaN as well. As a result, `clf.predict()` returns all zeros, i.e. it reports no outliers, even though the data clearly contains some (for example 88, 1234 and 8888). How can I prevent this?
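The effect described above can be reproduced with NumPy alone. This is a minimal sketch that assumes PyOD sets `threshold_` to the `100 * (1 - contamination)` percentile of the training decision scores and labels a point as an outlier when its score is strictly greater than `threshold_`; the `scores` array is a shortened, illustrative stand-in for the output above, not the real scores.

```python
import numpy as np

# Illustrative decision scores containing NaN (a shortened stand-in
# for the real ABOD output above, not the actual values)
scores = np.array([np.nan, np.nan, -0.0, -5771.46, -3.605, -0.00608])
contamination = 0.1

# Assumption: threshold = (1 - contamination) percentile of the scores.
# np.percentile propagates NaN, so the threshold itself becomes NaN.
threshold = np.percentile(scores, 100 * (1 - contamination))

# Every comparison with NaN is False, so every point is labelled 0 (inlier)
labels = (scores > threshold).astype(int)

print(threshold)  # nan
print(labels)     # [0 0 0 0 0 0]
```

Under that assumption, a single NaN score is enough to make the threshold NaN, which is consistent with `np.where(y_pred)[0]` returning an empty array.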
EDIT: When I use a different X1:

```python
X1 = np.array([3, 2, 1, 88, 9, 7, 90, 1, 2, 3, 1, 98, 8888])
outliers = outlier_ABOD(X1, 0.1)
```
the output is:

```
Decision scores: [-3.14048147e+14 -5.54457418e+15 -3.46535886e+14 -1.58233289e+12
 -4.38660405e+12 -4.02831074e+13 -2.36040501e+12 -3.46535886e+14
 -5.54457418e+15 -3.14048147e+14 -3.46535886e+14 -7.76901896e+10
 -3.35886302e-05]
Outliers: array([ 1, 1, 1, 98, 8888])
```
So for the first X1 the decision scores contain NaNs and no outliers are detected, while for the second X1 there are no NaNs and outliers are detected. I don't understand why ABOD produces NaN scores for some X1 values and not for others.
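A plausible explanation, assuming PyOD's default `method='fast'` with `n_neighbors=5`: only the 5 nearest neighbours of each point are used, and (as far as I can tell from PyOD's source; this may differ between versions) any pair of neighbours containing an exact copy of the current point is skipped because no angle can be formed. If fewer than two neighbours differ from the point, no pair is left, and the variance of an empty list is NaN. In the first X1 the values 1 and 2 each occur six times, so every copy has five identical neighbours; in the second X1 no value occurs more than three times. The sketch below is a hypothetical, simplified pure-NumPy re-implementation (`abod_fast_sketch` is not a PyOD function) that only aims to show which scores become NaN.

```python
import warnings
from itertools import combinations

import numpy as np


def abod_fast_sketch(x, n_neighbors=5):
    """Hypothetical, simplified 'fast' ABOD for 1-D data (not PyOD's code).

    Assumes PyOD-like behaviour: min-max scaling, k nearest neighbours,
    skipping pairs that contain an exact copy of the current point,
    and score = -variance of the weighted cosines.
    """
    x = np.asarray(x, dtype=float)
    # Min-max scale to [0, 1], as in the question's function
    x = (x - x.min()) / (x.max() - x.min())
    scores = np.empty(len(x))
    for i, p in enumerate(x):
        # k nearest neighbours by absolute distance, excluding the point itself
        dist = np.abs(x - p)
        dist[i] = np.inf
        neighbours = np.argsort(dist, kind="stable")[:n_neighbors]
        wcos = []
        for a, b in combinations(neighbours, 2):
            # No angle can be formed with an exact duplicate of p: skip the pair
            if x[a] == p or x[b] == p:
                continue
            da, db = x[a] - p, x[b] - p
            # Weighted cosine <a-p, b-p> / (|a-p|^2 * |b-p|^2), here in 1-D
            wcos.append(da * db / (da ** 2 * db ** 2))
        with warnings.catch_warnings():
            # np.var([]) warns and returns NaN when every pair was skipped
            warnings.simplefilter("ignore", RuntimeWarning)
            scores[i] = -np.var(wcos)
    return scores


X1_a = np.array([1, 1, 3, 2, 1, 2, 1, 2, 3, 2, 1, 88, 1234, 8888, 1, 2, 3, 2])
X1_b = np.array([3, 2, 1, 88, 9, 7, 90, 1, 2, 3, 1, 98, 8888])
s_a = abod_fast_sketch(X1_a)
s_b = abod_fast_sketch(X1_b)
print(np.isnan(s_a))
print(np.isnan(s_b).any())  # False
```

Under these assumptions the NaN positions for the first X1 coincide exactly with the 1s and 2s, as in the output above, and the second X1 gives no NaN. If this is the cause, the NaNs should disappear once no value is repeated `n_neighbors` or more times, for example by passing a larger `n_neighbors` to `ABOD`; this is untested against PyOD itself.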