I want to use a random forest classifier in Python without using train_test_split. I have a training dataset in one file, and I want to train the model on that dataset and then apply the model to a separate test dataset. Each dataset is in its own Excel file.

I have tried SMOTE oversampling, but for that I need to define `X_train`:

from imblearn.over_sampling import SMOTE, SMOTENC, BorderlineSMOTE, SVMSMOTE, KMeansSMOTE

sm = SMOTE(sampling_strategy='not majority', random_state=None, k_neighbors=10)
x_train_res, y_train_res = sm.fit_resample(X_train, y_train)

NikhilR
  • How do you load your data? – tomjn Aug 26 '19 at 09:36
  • `import numpy as np`; `import pandas as pd`; `import matplotlib as plt`; `from sklearn.ensemble import RandomForestClassifier`; `from imblearn.over_sampling import (SMOTE, BorderlineSMOTE, SVMSMOTE, SMOTENC, KMeansSMOTE)`; `from imblearn.over_sampling import RandomOverSampler`; `df = pd.read_excel("E:\Python_tutorial\Train26082019.xlsx")` – NikhilR Aug 26 '19 at 10:20
  • So is your question how to make `df` into `X_train` and `y_train`? – tomjn Aug 26 '19 at 12:00
  • Actually, I wanted to use one DataFrame, `df1`, as the training set and another, `df2`, as the test set. To work around this, I have replaced `X_train` and `y_train` with `df1` and `X_test` with `df2` – NikhilR Aug 28 '19 at 05:56

0 Answers