
I am running a random forest regression model, but the results are not great. Someone recommended that I check for interaction effects.

Surprisingly, I do not see many questions about this. This one did not help me. I am also not sure how to incorporate sklearn.preprocessing.PolynomialFeatures into my code.

My data is very simple:

My code:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split

# Target: first column (assumed to be 'total_amount', which is dropped from X below)
y = starbucks_log.iloc[:, 0]

# Predictors: all other columns
X = starbucks_log.drop('total_amount', axis=1)

# Set seed for reproducibility
SEED = 1

# Split dataset into 80% train and 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED)

# Instantiate a random forest regressor 'rf' with 400 estimators
rf = RandomForestRegressor(n_estimators=400,
                           min_samples_leaf=1,
                           random_state=SEED)

# Fit 'rf' to the training set
rf.fit(X_train, y_train)

# Predict on the test set and on the training set
y_pred = rf.predict(X_test)
y_pred_train = rf.predict(X_train)

# Evaluate the test and train RMSE
rmse_test = MSE(y_test, y_pred) ** 0.5
rmse_train = MSE(y_train, y_pred_train) ** 0.5

# Print the test and train RMSE
print('Test set RMSE of rf: {:.5f}'.format(rmse_test))
print('Train set RMSE of rf: {:.5f}'.format(rmse_train))

I would like to add all possible interaction effects of income, age, and male (gender). It would be easier to drop some of them later.

Thanks!

Anakin Skywalker
  • I don't understand what you mean by interaction. Do you mean income*age? Interaction terms are not generally used with random forests, because a random forest is not a parametric model like linear or logistic regression, and it makes no assumption that the variables are independent. You could manually create the interaction term in your dataset and pass it to the model. – Quantum Dreamer Aug 14 '20 at 03:20
  • Also refer here: https://stats.stackexchange.com/questions/201893/how-to-include-an-interaction-term-in-a-random-forest-model. I suggest posting Data Science/ML/Stats related questions in their own groups. – Quantum Dreamer Aug 14 '20 at 03:20
  • @QuantumDreamer, yes, income*age. Ok, thanks for this tip. I am new to this. – Anakin Skywalker Aug 14 '20 at 03:21
  • Yeah, I have no problem doing this in R, but I am new to Python. – Anakin Skywalker Aug 14 '20 at 03:21
  • Please close this question if you are satisfied with the information from the comments. Thank you. – Quantum Dreamer Aug 14 '20 at 03:23
  • I am not sure how to close a question, based on comments. – Anakin Skywalker Aug 14 '20 at 03:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/219788/discussion-between-quantum-dreamer-and-anakin-skywalker). – Quantum Dreamer Aug 14 '20 at 03:30
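The manual route suggested in the comments (creating product columns such as income*age directly in the dataset) might be sketched as follows. The helper name add_interactions, the "_x_" column naming, and the toy values are all hypothetical choices made for this illustration.

```python
from itertools import combinations

import pandas as pd


def add_interactions(df, cols):
    """Return a copy of df with one product column per pair of columns in cols."""
    out = df.copy()
    for a, b in combinations(cols, 2):
        # Hypothetical naming scheme: "<first>_x_<second>"
        out[f"{a}_x_{b}"] = out[a] * out[b]
    return out


# Invented example rows, standing in for the real predictors.
toy = pd.DataFrame({
    "income": [40000.0, 85000.0],
    "age": [25, 47],
    "male": [1, 0],
})

toy_inter = add_interactions(toy, ["income", "age", "male"])
print(toy_inter.columns.tolist())
```

Because each product is an ordinary column, individual interactions can be removed later with DataFrame.drop before the data is passed to the model.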

0 Answers