
I am trying to detect anomalies in some data. I have normal data and data which are considered anomalous.

I use Isolation Forest from the scikit-learn library in Python. I created a model from the normal data like this:

from sklearn.ensemble import IsolationForest

model = IsolationForest(n_estimators=100, contamination=0.002)
model.fit(new_features)

When I predict on all the anomalous samples at once:

predicted = model.predict(transformed_anomaly)

It works correctly. 35 out of 36 are detected as anomalies.

But if I predict one sample at a time:

for anomaly in transformed_anomaly:
    predicted = model.predict(anomaly.reshape(1, -1))

Suddenly all points are classified as inliers.

I checked the shape of 'anomaly.reshape(1,-1)'; it is (1, 2). The shape of 'transformed_anomaly' is (36, 2).

Could someone point out the problem with it?

user1872329
  • Can you add your complete code along with some data samples which are causing this problem? – Vivek Kumar Oct 19 '17 at 01:53
  • Ok, I restarted notebook kernel today and it works as I would expect. Now there is no difference if I predict all anomaly samples or one by one. This is so weird – user1872329 Oct 19 '17 at 08:43

2 Answers


Pass random_state=0 to IsolationForest to get the same results on every run:

model = IsolationForest(n_estimators=100, contamination=0.002, random_state=0)
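To check this on your side, here is a minimal sketch. It assumes synthetic 2-D data in place of your `new_features` and `transformed_anomaly` (the names `normal` and `anomalies` and the distributions are made up for illustration). It compares batch prediction against one-by-one prediction on the same fitted model, and then against a second model fitted with the same `random_state`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-ins for the data in the question (assumption, not the real data).
rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))    # "normal" training data
anomalies = rng.uniform(low=6.0, high=8.0, size=(36, 2))   # points far from the training data

# Fixing random_state makes the fitted forest reproducible across runs.
model = IsolationForest(n_estimators=100, contamination=0.002, random_state=0)
model.fit(normal)

# Predict all samples at once, then one sample at a time on the same model.
batch = model.predict(anomalies)
one_by_one = np.array([model.predict(row.reshape(1, -1))[0] for row in anomalies])
same_model_consistent = np.array_equal(batch, one_by_one)

# Fit a second model with the same random_state and compare its predictions.
refit = IsolationForest(n_estimators=100, contamination=0.002, random_state=0).fit(normal)
refit_consistent = np.array_equal(batch, refit.predict(anomalies))

print("batch == one-by-one:", same_model_consistent)
print("same seed, refit == original:", refit_consistent)
```

If the first comparison holds but your notebook still shows different results, the likely cause is that the model was refitted between the two cells without a fixed seed, rather than anything in `predict` itself.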

Gopal

Another option is to fix the seed values globally, like this:

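# Set a seed value
seed_value = 123

# 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
# (it is read at interpreter startup, so this affects Python processes started after this point)
import os
os.environ['PYTHONHASHSEED'] = str(seed_value)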

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)

This helps you get the same result every time on the same data, because the random draws used to build the model start from a fixed state.
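As a rough check of the global-seed approach, the sketch below assumes synthetic data (the arrays `normal` and `anomalies` are invented for illustration) and an IsolationForest left at its default `random_state=None`, in which case scikit-learn draws from NumPy's global random state. Note that the seed is reset right before each `fit`; seeding once at the top of a notebook does not make a later refit identical to the first one.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

seed_value = 123

# Synthetic stand-ins for the real data (assumption for illustration only).
data_rng = np.random.RandomState(0)
normal = data_rng.normal(size=(500, 2))
anomalies = data_rng.uniform(low=6.0, high=8.0, size=(36, 2))

# With random_state=None the forest uses NumPy's global random state,
# so re-seeding it immediately before each fit reproduces the same trees.
np.random.seed(seed_value)
model_a = IsolationForest(n_estimators=100, contamination=0.002).fit(normal)

np.random.seed(seed_value)
model_b = IsolationForest(n_estimators=100, contamination=0.002).fit(normal)

scores_a = model_a.score_samples(anomalies)
scores_b = model_b.score_samples(anomalies)
seeded_fits_match = np.allclose(scores_a, scores_b)

print("scores match after re-seeding:", seeded_fits_match)
```

Passing `random_state` directly to the estimator is usually the more local choice, since it does not depend on what other code has drawn from the global generator in the meantime.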

graj499