
I want to display information about the output of an isolation forest, such as the isolation indices (on the plot) and the accuracy of the prediction.

I use sklearn's IsolationForest class.

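import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# X_train and X_test are two-column NumPy arrays (see below)
clf = IsolationForest()
clf.fit(X_train)
yPredTest = clf.predict(X_test)

# Evaluate the decision function on a 2D grid to draw the background
xx, yy = np.meshgrid(np.linspace(-3, 88), np.linspace(-1, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.title("Isolation Forest")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
b = plt.scatter(X_test[:, 0], X_test[:, 1], c='black')
plt.show()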

The result looks like the image, but with only one cluster (plus a few scattered points), and all points are the same colour. This problem was resolved by passing yPredTest as the colour.

Another problem is that I do not know how to use more than two features. I have two sets (train and test) that look like [[0,1,34,380,24],[98,938,238,23,1],[...],[0,13,3,23,49]], and the algorithm forces me to truncate them to two columns, as in X_train = np.array(list)[:100,[1,2]] and X_test = np.array(list)[101:,[1,2]]. Otherwise (with np.array(list)[:100,] and np.array(list)[101:,]) it stops with this error:

ValueError: Number of features of the model must match the input. Model n_features is 8 and input n_features is 2

The error seems to be raised at this line: Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])

Chènevis

1 Answer


I see "another question", but where is the first? You got the same color because of the argument c='black' when scattering. Try to assign yPredTest to this argument.
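As a minimal sketch of that change, the example below passes the predictions to the `c` argument of `plt.scatter`. The data are synthetic stand-ins (an assumption, not the asker's dataset), and the `coolwarm_r` colormap and `random_state=0` are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data (assumption): one cluster plus a few scattered points
rng = np.random.RandomState(0)
X_train = rng.normal(loc=[40, 25], scale=5, size=(100, 2))
X_test = np.vstack([
    rng.normal(loc=[40, 25], scale=5, size=(40, 2)),          # points near the cluster
    rng.uniform(low=[-3, -1], high=[88, 50], size=(10, 2)),   # scattered points
])

clf = IsolationForest(random_state=0)
clf.fit(X_train)
yPredTest = clf.predict(X_test)  # +1 for inliers, -1 for outliers

plt.title("Isolation Forest")
# Colour each test point by its predicted label instead of a fixed 'black'
plt.scatter(X_test[:, 0], X_test[:, 1], c=yPredTest, cmap="coolwarm_r", edgecolor="k")
plt.show()
```

Because `predict` returns only +1 and -1, the plot ends up with two colours, one per predicted class.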

xx and yy form the grid of the 2D plot plane (you can print them to check what they contain), so the grid passed to decision_function has only two columns. If you want to use more than two features, PCA may help.
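One possible reading of the PCA suggestion, sketched under assumptions: project the data onto two principal components, fit the forest on the projected data, and build the grid in that 2D space so that the grid and the model agree on the number of features. The five-feature data below are synthetic placeholders, and fitting on the reduced data (rather than on all features) is a choice made here only so the contour plot works.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Synthetic five-feature data (assumption, standing in for the real sets)
rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 5))
X_test = np.vstack([rng.normal(size=(40, 5)), rng.uniform(-6, 6, size=(10, 5))])

# Reduce to two components; fit PCA on the training set only
pca = PCA(n_components=2).fit(X_train)
X_train_2d = pca.transform(X_train)
X_test_2d = pca.transform(X_test)

# The model now expects 2 features, matching the 2-column grid below
clf = IsolationForest(random_state=0).fit(X_train_2d)
yPredTest = clf.predict(X_test_2d)

# Build the grid from the range of the projected test points
x_min, x_max = X_test_2d[:, 0].min() - 1, X_test_2d[:, 0].max() + 1
y_min, y_max = X_test_2d[:, 1].min() - 1, X_test_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50), np.linspace(y_min, y_max, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.title("Isolation Forest on 2 PCA components")
plt.contourf(xx, yy, Z, cmap=plt.cm.Blues_r)
plt.scatter(X_test_2d[:, 0], X_test_2d[:, 1], c=yPredTest, cmap="coolwarm_r", edgecolor="k")
plt.show()
```

The trade-off is that the forest only sees what the two components retain. Fitting on all features and using PCA purely for the scatter plot is also possible, but then the contour background cannot be drawn from a 2-column grid.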

Zealseeker
  • The questions were: display the isolation indices, display the accuracy of the prediction, change the colour, and add other features. Do you want me to use PCA together with the isolation forest, or in addition to it (= iForest) in another file? – Chènevis Jan 16 '17 at 11:00
  • @Chènevis I think you should learn from the [tutorial](http://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html#sphx-glr-auto-examples-ensemble-plot-isolation-forest-py) how it was plotted. And this is the [doc](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) about how to use IsolationForest. Use IsolationForest.predict to get the indices of the test set. – Zealseeker Jan 16 '17 at 13:07
  • @Chènevis IsolationForest supports more than two features. Just change the shape of your data. But you have to realize that, as humans, we cannot plot points in a space of more than 3 dimensions. So if you only want to display the information visually, using PCA to reduce your features is a good idea, I think. If you are interested in plotting, try to scatter them in a 3D space, which is beyond my knowledge. – Zealseeker Jan 16 '17 at 13:14
  • Thanks, passing yPredTest as the colour works great :) I have already read this documentation and it did not help me. In fact, I am not sure that it covers what I mean; I will update my question to explain it better. – Chènevis Jan 16 '17 at 19:31
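On the two points from the comments that remain open (the isolation indices and the accuracy), a rough sketch follows. It assumes that "isolation indices" means the per-sample values from `decision_function`, and that ground-truth labels are available; the `y_true` array here is hypothetical, since accuracy cannot be computed without known labels. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score

# Synthetic five-feature data (assumption): 40 normal points, 10 far-away points
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
X_test = np.vstack([rng.normal(size=(40, 5)), rng.uniform(6, 8, size=(10, 5))])

# Hypothetical ground truth in the same convention as predict(): +1 inlier, -1 outlier
y_true = np.r_[np.ones(40, dtype=int), -np.ones(10, dtype=int)]

clf = IsolationForest(random_state=0).fit(X_train)

# Per-sample score: lower means more anomalous, negative means predicted outlier
scores = clf.decision_function(X_test)
yPredTest = clf.predict(X_test)

# Fraction of test points whose predicted label matches the known label
acc = accuracy_score(y_true, yPredTest)
print("lowest scores:", np.sort(scores)[:5])
print("accuracy:", acc)
```

The `scores` array can also be passed as `c=scores` to `plt.scatter` to show a continuous colour scale instead of two classes.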