Machine Learning train and test data

Question

Does anyone know how I can split the data manually instead of using the "train_test_split()" method?

Let me explain: I have 3 train files and 2 test files, so I'd like to assign the train files' data to X_train and y_train, and the test files' data to X_test and y_test.

Thanks in advance!

You can concatenate dataframes together. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html So concat the training ones together and do likewise for test dataframes. — , Oct 24 '19 at 19:46
You don't have to use `train_test_split()` at all. Load your three train files as x1, x2, x3, then use `np.hstack()`, `np.vstack()`, or `pd.concat()` to merge them, and name the result X_train. Do the same for X_test. — erncyp, Oct 24 '19 at 19:47
Actually, the difference between the train and test files is that the test files contain a column that is empty, so I'm wondering what can be assigned to the y_test variable. — Mehdi, Oct 24 '19 at 19:54
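
A minimal sketch of the `pd.concat()` approach suggested in the comments above. It assumes the files are CSVs that share the same columns and that the label lives in a column called `target`; that column name, the feature names, and the in-memory `io.StringIO` stand-ins for the real files are all hypothetical, so swap in your own paths and names.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the 3 train files and 2 test files.
# With real data, replace each io.StringIO(...) with a file path.
train_files = [
    io.StringIO("f1,f2,target\n1,2,0\n3,4,1\n"),
    io.StringIO("f1,f2,target\n5,6,0\n"),
    io.StringIO("f1,f2,target\n7,8,1\n9,10,1\n"),
]
test_files = [
    io.StringIO("f1,f2,target\n11,12,\n"),       # target column left empty
    io.StringIO("f1,f2,target\n13,14,\n15,16,\n"),
]

# Stack the rows of each group of files into a single DataFrame.
train_df = pd.concat([pd.read_csv(f) for f in train_files], ignore_index=True)
test_df = pd.concat([pd.read_csv(f) for f in test_files], ignore_index=True)

# Separate the features from the (assumed) label column.
X_train = train_df.drop(columns="target")
y_train = train_df["target"]
X_test = test_df.drop(columns="target")
y_test = test_df["target"]  # only missing values when the test labels are empty
```

If the label column in the test files really is empty, `y_test` holds only missing values, so there is nothing to score against: you can still call `predict` on `X_test`, but evaluating the model would need a validation set held out from the labelled train files.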

score 0 · Answer 1 · answered Oct 25 '19 at 21:01


You can try something like this:

# Row index of the split: the first 99% of rows go to training, the last 1% to testing
split_row = int(x_data.shape[0] * 0.99)

x_train = x_data[:split_row]
y_train = y_data[:split_row]
x_test = x_data[split_row:]
y_test = y_data[split_row:]
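
For reference, here is a self-contained version of the positional split above that can be run as is. The arrays are made-up toy data (100 rows, 2 features); the snippet in the answer assumes `x_data` and `y_data` already exist with matching row counts.

```python
import numpy as np

# Made-up toy data: 100 rows with 2 features each, and one label per row.
x_data = np.arange(200).reshape(100, 2)
y_data = np.arange(100)

# Row index of the split: the first 99% of rows go to training.
split_row = int(x_data.shape[0] * 0.99)

# Slice both arrays at the same row so features and labels stay aligned.
x_train, x_test = x_data[:split_row], x_data[split_row:]
y_train, y_test = y_data[:split_row], y_data[split_row:]

print(x_train.shape, x_test.shape)
```

Note that this split is purely positional and does no shuffling, so if the rows are sorted (by date or by class, say), the test rows may not be representative of the rest.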


George Yu
