Questions tagged [isolation-forest]

44 questions
5
votes
0 answers

Evaluate multiple Isolation Forest estimators during GridSearchCV with custom scorer function

I have a sample of values that don't have a y target value. Actually, the X features (predictors) are all used to fit the Isolation Forest estimator. The goal is to identify which of those X-features and the ones to come in the future are actually…
NikSp
  • 1,262
  • 2
  • 19
  • 42
4
votes
3 answers

What is the difference between decision function and score_samples in isolation_forest in SKLearn

I have read the documentation of the decision function and score_samples here, but could not figure out what the difference between these two methods is or which one I should use for an outlier detection algorithm. Any help would be appreciated.
Anne
  • 77
  • 8
2
votes
0 answers

Threshold of anomaly score in scikit-learn's IsolationForest

I'm trying to understand more about how the contamination parameter affects the threshold_ at which a sample is predicted to be an anomaly or not in IsolationForest. In the code for IsolationForest here, in fit(), the threshold_ is set…
Rayne
  • 14,247
  • 16
  • 42
  • 59
2
votes
1 answer

Calculate memory usage of RandomForestClassifier and IsolationForest

I'd like to evaluate how much memory is used by both sklearn.ensemble.IsolationForest and sklearn.ensemble.RandomForestClassifier. But sys.sizeof(my_isolation_forest_model) and sys.sizeof(my_random_forest_classifier_model) always return a value of 48,…
GooseIt
  • 69
  • 4
2
votes
1 answer

How to give more importance to some features in sklearn Isolation Forest

I am using sklearn isolation forest for an anomaly detection task. Isolation forest consists of iTrees. As this paper describes, the nodes of the iTrees are split in the following way: We select any feature (uniformly) randomly and perform a split…
2
votes
0 answers

IsolationForest is always predicting 1

I am working on a project to detect out-of-domain text input with the help of IsolationForest and tf-idf features. The following is a summary of my work: TRAINING On tfidf: Fit and transform in-domain dataset using CountVectorizer(). Fit a…
hafiz031
  • 2,236
  • 3
  • 26
  • 48
2
votes
1 answer

Isolation forest with multiple features detecting everything as an anomaly

I have an isolation forest implementation where I take the features (all are numerical); scale them to be between 0 and 1 from sklearn.preprocessing import MinMaxScaler scaler = MinMaxScaler() data = scaler.fit_transform(df) x =…
kikee1222
  • 1,866
  • 2
  • 23
  • 46
1
vote
1 answer

Anomaly Detection for Cumulative Timeseries Data

I'm engaged in anomaly detection utilizing machine learning techniques. My specific challenge involves handling a timeseries dataset with cumulative values. To elaborate, each data point's value represents a cumulative sum; for instance, if the…
1
vote
0 answers

Scikit-learn Isolation Forest: Any way to extract the path lengths?

I'm using an IsolationForest classifier in my script and I'd like to extract some information for each prediction, such as the path length, which refers to the number of edges an observation must pass in the tree from the root to the terminal node.…
MVMV
  • 37
  • 6
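
The path-length definition quoted in the question above (edges traversed from the root to the terminal node) can be illustrated with a toy isolation tree. This is only a minimal pure-Python sketch, not scikit-learn's implementation: the names build_tree and path_length, the dict-based node layout, and the max_depth value are all hypothetical choices made for illustration.

```python
import random

def build_tree(points, depth=0, max_depth=8, rng=random):
    """Toy isolation tree (hypothetical helper, not scikit-learn's code)."""
    # Stop when the node holds at most one point or the depth cap is reached.
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))  # pick a feature uniformly at random
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # no split possible on this feature, treat the node as terminal
        return {"size": len(points)}
    threshold = rng.uniform(lo, hi)  # random split value within the observed range
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(tree, point):
    """Count the edges a point passes from the root to its terminal node."""
    edges = 0
    node = tree
    while "feature" in node:  # internal nodes carry a split, terminal nodes do not
        side = "left" if point[node["feature"]] < node["threshold"] else "right"
        node = node[side]
        edges += 1
    return edges

# Illustrative data: a 7x7 grid of inliers plus one far-away point.
rng = random.Random(0)
inliers = [(i / 10, j / 10) for i in range(7) for j in range(7)]
outlier = (100.0, 100.0)
center = (0.3, 0.3)
data = inliers + [outlier]

trees = [build_tree(data, rng=rng) for _ in range(100)]
avg_outlier = sum(path_length(t, outlier) for t in trees) / len(trees)
avg_center = sum(path_length(t, center) for t in trees) / len(trees)
```

In this sketch the far-away point tends to be separated after very few splits, so its average path length across the trees is shorter than that of a point in the middle of the grid.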
1
vote
0 answers

How to update an Isolation Forest model with new data in Python, without combining it with the existing data and without losing the old data's knowledge

I need to update the isolation model with newly fetched data without combining it with the existing data. I used DSL queries to extract data from network logs. That data is then fed as input to the isolation model. Now I need to write code such that, if…
1
vote
3 answers

Filter a CSV table to have just 2 columns with Python pandas

I have a .csv file with lines like this…
Gaara
  • 39
  • 5
1
vote
0 answers

"Most outlier" feature

I am using the Sklearn implementation of Isolation Forest (IF) to detect outliers on a set of data of 20-30 features. It is working very well, but I would like insight into which feature has the highest impact when an outlier is detected. Please…
1
vote
1 answer

Python Vetiver model - use alternative prediction method

I'm trying to use Vetiver to deploy an isolation forest model (for anomaly detection) to an API endpoint. All is going well by adapting the example here. However, when deployed, the endpoint uses the model.predict() method by default (which returns…
1
vote
1 answer

How can I update a trained IsolationForest model with new datasets/dataframes in Python?

Let's say I fit IsolationForest() algorithm from scikit-learn on time-series based Dataset1 or dataframe1 df1 and save the model using the methods mentioned here & here. Now I want to update my model for new dataset2 or df2. My findings: this…
1
vote
0 answers

IsolationForest, transforming data

A colleague and I are trying to detect anomalies in a large dataset. We want to try out different algorithms (LOF, OC-SVM, DBSCAN, etc.) but we are currently working with IsolationForest. Our dataset is currently shaped as follows. It's a count…