I am building a random forest model for the first time and have run into an issue with how I quantify its accuracy.
Currently, I split the dataset (30% held out as the test set), fit the model, predict y values for the test set, and then score the model on those predicted values. However, I am getting 100% accuracy, and I am wondering whether this is caused by the parameters I set on the model or by a syntax error I made along the way.
Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
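As a quick sanity check on the split, here is a minimal sketch on hypothetical random data (the real X and y are not shown in the post) that prints the resulting shapes. With 100 rows and test_size=0.30, 30 rows should land in the test set and 70 in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: the post's real X and y are not shown.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Same call as in the post: hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# Row counts of each part; together they cover the whole dataset.
print(X_train.shape, X_test.shape)
```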
Create and fit the model
# Import the model we are using
from sklearn.ensemble import RandomForestRegressor
# Instantiate model with 1000 decision trees
rf = RandomForestRegressor(n_estimators=1000,
                           random_state=42,
                           min_samples_split=10,
                           max_features="sqrt",
                           bootstrap=True)
# Train the model on training data
rf.fit(X_train, y_train)
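One thing worth checking at this step: RandomForestRegressor is a regression model, and its score method returns the coefficient of determination (R²), not classification accuracy. If y is actually a categorical label (an assumption; the post does not say), a sketch of the classifier counterpart, scored with accuracy_score against the true test labels, might look like this. The data below is synthetic and hypothetical, and the tree count is reduced only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with a binary class label (assumption: the
# post does not say whether y is continuous or categorical).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# Classifier counterpart of the post's regressor with the same settings,
# except fewer trees to keep the example quick.
clf = RandomForestClassifier(n_estimators=200,
                             random_state=42,
                             min_samples_split=10,
                             max_features="sqrt",
                             bootstrap=True)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
# Compare predictions with the held-out true labels, not with themselves.
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", round(acc * 100, 2), "%")
```

For classifiers, clf.score(X_test, y_test) computes this same accuracy, so either call works as long as the second argument is y_test.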
Predict on test set and calculate accuracy
y_pred = rf.predict(X_test)
print("Accuracy:", round((rf.score(X_test, y_pred)*100),2), "%")
>> 100.0%
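The 100% comes from the scoring call rather than from the model parameters: rf.score(X_test, y_pred) predicts X_test again and compares the result with y_pred, which holds exactly those predictions, so R² is always 1.0. A minimal sketch on hypothetical synthetic data, assuming a continuous target, contrasting that call with scoring against y_test:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with a continuous target (the post's real
# X and y are not shown).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

rf = RandomForestRegressor(n_estimators=200,
                           random_state=42,
                           min_samples_split=10,
                           max_features="sqrt",
                           bootstrap=True)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Scoring predictions against themselves: score() re-predicts X_test and
# compares the result with y_pred, which is identical, so R^2 is 1.0.
self_score = rf.score(X_test, y_pred)

# Scoring against the held-out true targets gives a meaningful R^2.
test_score = rf.score(X_test, y_test)

# For regressors, score() is R^2, i.e. the same as r2_score(y_true, y_pred).
manual_score = r2_score(y_test, y_pred)

print("R^2 vs. own predictions:", self_score)
print("R^2 vs. true test targets:", round(test_score, 3))
```

Scoring against y_test (equivalently, r2_score(y_test, y_pred)) is the number that reflects performance on unseen data.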
I am definitely learning as I go, though I have had some formal training. I am really excited about modeling, and I want to figure out what mistakes I am making as I continue to learn the process.