I am working through a random forest model for the first time and have come across an issue with my accuracy quantification.

Currently, I split the dataset (30% as the test set), fit the model, predict y values for the test set, and score the model on those predictions. However, I am getting 100% accuracy, and I am wondering whether that is caused by the parameters I set for the model or by a mistake in my code.

Split the dataset into training and test sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=1)

Create and fit the model

# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators = 1000,
                           random_state = 42,
                           min_samples_split = 10,
                           max_features = "sqrt",
                           bootstrap = True)

# Train the model on training data
rf.fit(X_train, y_train)

Predict on test set and calculate accuracy

y_pred = rf.predict(X_test)

print("Accuracy:", round((rf.score(X_test, y_pred)*100),2), "%")

>> 100.0%

I am definitely learning as I go, but I have had some formal training. I am thrilled about modeling and want to figure out what mistakes I am making as I continue learning this process.

MaxDawg27
  • If you are just looking for accuracy, you can go directly with the `accuracy_score()` function from scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html – Ashwin Geet D'Sa Apr 15 '21 at 22:14

1 Answer

You are almost there! The score() method expects X_test and y_test, not your predictions. The logic behind score() is roughly:

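# simplified logic behind score()
from sklearn.metrics import r2_score

def score(model, X, y):
    # predict from X, then compare the predictions with the true targets y
    y_predicted = model.predict(X)
    # for regressors the metric is R^2
    return r2_score(y, y_predicted)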

The above logic is just to show how the score works.

To get the score in your code:

rf.score(X_test, y_test)

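This returns the R^2 score (see the scikit-learn docs for score()). Do you see now why you got 100%? In your code, y_pred was passed as the target, so score() compared the model's predictions with themselves.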
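To make the difference concrete, here is a minimal sketch that scores the same fitted model both ways. The data is synthetic (`make_regression` stands in for the X and y from the question, which are not shown), and the forest is kept small (50 trees, an arbitrary choice) so it runs quickly.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the original X and y are not shown)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

rf = RandomForestRegressor(n_estimators=50, random_state=42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)

# Scoring against the model's own predictions: predictions are compared
# with themselves, so R^2 comes out perfect regardless of model quality
self_score = rf.score(X_test, y_pred)

# Scoring against the held-out targets: the actual test-set R^2
test_score = rf.score(X_test, y_test)

print("R^2 vs. own predictions:", self_score)
print("R^2 vs. true targets:", round(test_score, 3))
```

The first number says nothing about the model; only the second reflects how well it generalizes to the test set.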

If you want other metrics, compute the predictions yourself and pass them, together with y_test, to one of the regression metrics: https://scikit-learn.org/stable/modules/classes.html#regression-metrics
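As a rough illustration of that approach, the sketch below computes a few common regression metrics from `sklearn.metrics`. The two arrays are hypothetical values standing in for y_test and rf.predict(X_test); in your code you would pass those instead.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical values standing in for y_test and rf.predict(X_test)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Each metric takes the true targets first, then the predictions
mae = mean_absolute_error(y_true, y_hat)
mse = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)  # root of the MSE, in the same units as the target
r2 = r2_score(y_true, y_hat)

print("MAE: ", mae)
print("MSE: ", mse)
print("RMSE:", round(float(rmse), 4))
print("R^2: ", round(r2, 4))
```

Note the argument order: the true values come first, which is the same mistake to avoid as with score().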

You can also use AutoML as a learning aid (for your own learning, not the model's). Run AutoML to create baseline models; it will compute many metrics for you. Then you can write your own script and compare the results.

Machavity
pplonski