I am running a random forest regression model, but my results are not great. Someone recommended checking for interaction effects.
Surprisingly, I do not see many questions about this. This one did not help me. I am also not sure how to incorporate sklearn.preprocessing.PolynomialFeatures into my code.
My data is very simple:
My code:
# MSE is assumed to be an alias for sklearn's mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split

# Target: the first column (total_amount)
y = starbucks_log.iloc[:, 0]
# Predictors: all other columns
X = starbucks_log.drop('total_amount', axis=1)
# Set seed for reproducibility
SEED = 1
# Split dataset into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=SEED)
# Instantiate a random forest regressor 'rf' with 400 estimators
rf = RandomForestRegressor(n_estimators=400,
                           min_samples_leaf=1,
                           random_state=SEED)
# Fit 'rf' to the training set
rf.fit(X_train, y_train)
# Predict on the test set and on the training set
y_pred = rf.predict(X_test)
y_pred_train = rf.predict(X_train)
# Evaluate the test and train set RMSE
rmse_test = MSE(y_test, y_pred)**(1/2)
rmse_train = MSE(y_train, y_pred_train)**(1/2)
# Print the test and train set RMSE
print('Test set RMSE of rf: {:.5f}'.format(rmse_test))
print('Train set RMSE of rf: {:.5f}'.format(rmse_train))
I would like to add all possible interaction effects of income, age, and male (gender). It would be easier to drop some of them later.
Thanks!