Isolation Forest gives different results when predicting one point instead of all

Question

I am trying to detect anomalies in some data. I have normal data and data which are considered anomalous.

I use Isolation Forest from the scikit-learn library in Python. I created a model from the normal data like this:

from sklearn.ensemble import IsolationForest

model = IsolationForest(n_estimators=100, contamination=0.002)
model.fit(new_features)

When I predict on all the anomalous samples at once:

predicted = model.predict(transformed_anomaly)

It works correctly. 35 out of 36 are detected as anomalies.

If I do this:

for anomaly in transformed_anomaly:
    predicted = model.predict(anomaly.reshape(1, -1))

Suddenly all points are classified as inliers.

I checked the shape of 'anomaly.reshape(1,-1)' and it is (1, 2). The shape of 'transformed_anomaly' is (36, 2).

Could someone point out the problem with it?

Can you add your complete code along with some data samples which are causing this problem? - Vivek Kumar, Oct 19 '17 at 01:53
OK, I restarted the notebook kernel today and it works as I would expect. Now there is no difference whether I predict all anomaly samples at once or one by one. This is so weird. - user1872329, Oct 19 '17 at 08:43

score 2 · Answer 1 · answered Feb 11 '20 at 23:02

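Pass random_state=0 to IsolationForest to get the same results on every run. Without it, each call to fit builds a different set of random trees, so predictions for borderline points can change whenever the model is refitted:

model = IsolationForest(n_estimators=100, contamination=0.002, random_state=0)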
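
To check this on your side, here is a minimal sketch. The data is synthetic and only stands in for new_features and transformed_anomaly from the question, and fit_model is a hypothetical helper. It compares batch prediction with one-by-one prediction on the same fitted model, and then compares two separate fits that share the same random_state:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-ins for the question's data (assumed, not the asker's data).
rng = np.random.RandomState(42)
new_features = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))        # "normal" data
transformed_anomaly = rng.normal(loc=8.0, scale=0.5, size=(36, 2))   # far-away points


def fit_model():
    # Hypothetical helper: same hyperparameters as above, with a fixed seed.
    model = IsolationForest(n_estimators=100, contamination=0.002, random_state=0)
    return model.fit(new_features)


model_a = fit_model()
model_b = fit_model()

# Predict all points at once, then one row at a time with the same fitted model.
batch_pred = model_a.predict(transformed_anomaly)
single_pred = np.array(
    [model_a.predict(row.reshape(1, -1))[0] for row in transformed_anomaly]
)

# Predictions from a second fit that used the same seed.
refit_pred = model_b.predict(transformed_anomaly)

print("batch == one-by-one:", np.array_equal(batch_pred, single_pred))
print("fit A == fit B:     ", np.array_equal(batch_pred, refit_pred))
```

If the first comparison holds but your own results still differ between cells, the likely cause is that the model was refitted in between without a fixed seed, which would also fit with the kernel restart making the difference disappear.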

Gopal

graj499 · Answer 2 · 2021-05-11T02:00:07.963

Another option is to fix the seed values globally, like this:

# Set a seed value
seed_value = 123

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
#    (this only affects Python processes started after this point)
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

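This will help you get the same result every time on the same data. It does not remove the randomness from the model; it makes the random tree construction repeatable. When random_state is not set, scikit-learn draws from NumPy's global generator, so the np.random.seed call is the one that matters here, and it has to be run again before each fit.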
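
A small sketch of the global-seed approach, assuming synthetic data and a hypothetical helper named fit_with_global_seed. The model is created without random_state, and NumPy's global generator is re-seeded right before each fit:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

seed_value = 123

# Synthetic training data (an assumption for illustration only).
X = np.random.RandomState(0).normal(size=(500, 2))


def fit_with_global_seed(seed):
    # Hypothetical helper: re-seed NumPy's global generator just before fitting.
    # random_state is left unset, so the forest draws from that global generator.
    np.random.seed(seed)
    return IsolationForest(n_estimators=100, contamination=0.002).fit(X)


scores_first = fit_with_global_seed(seed_value).score_samples(X)
scores_second = fit_with_global_seed(seed_value).score_samples(X)

print("same scores after re-seeding:", np.allclose(scores_first, scores_second))
```

Seeding once at the top of a notebook and then fitting several times will generally give a different model on each fit, because every fit advances the global generator. Passing random_state to the estimator avoids that dependency on cell execution order.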
