Does anyone know how I can split data manually instead of using the `train_test_split()` method?

Let me explain: I have 3 train files and 2 test files, so I'd like to assign the train files' data to X_train and y_train, and the test files' data to X_test and y_test.

Thanks in advance!

Mehdi
  • You can concatenate dataframes together with `pd.concat`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html Concatenate the training dataframes together and do likewise for the test dataframes. –  Oct 24 '19 at 19:46
  • You don't have to use `train_test_split()` at all. Load your three train files as x1, x2, x3, then use `np.vstack()` (or `np.hstack()` for 1-D arrays) or `pd.concat()` to merge them, and name the result X_train. Follow the same process for X_test. – erncyp Oct 24 '19 at 19:47
  • Actually, the difference between the train and test files is that the test files have a column that is empty, so I'm wondering what can be assigned to the y_test variable. – Mehdi Oct 24 '19 at 19:54
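
The concatenation approach suggested in the comments could look roughly like the sketch below. The frames are built in memory as stand-ins for the three train files and two test files (in practice each would be loaded with something like `pd.read_csv`), and the column names `feature` and `label` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Stand-ins for the three train files; in practice each frame would be
# loaded from disk. Column names are hypothetical.
train_parts = [
    pd.DataFrame({"feature": [1.0, 2.0], "label": [0, 1]}),
    pd.DataFrame({"feature": [3.0], "label": [1]}),
    pd.DataFrame({"feature": [4.0, 5.0], "label": [0, 0]}),
]
# Stand-ins for the two test files, whose label column is empty.
test_parts = [
    pd.DataFrame({"feature": [6.0], "label": [None]}),
    pd.DataFrame({"feature": [7.0, 8.0], "label": [None, None]}),
]

# Stack the rows of each group; ignore_index=True renumbers rows from 0.
train_df = pd.concat(train_parts, ignore_index=True)
test_df = pd.concat(test_parts, ignore_index=True)

X_train = train_df.drop(columns=["label"])
y_train = train_df["label"]
X_test = test_df.drop(columns=["label"])
# The test labels are empty here, so there is nothing meaningful to put in y_test.
```

If the label column in the test files really is empty, a model can still produce predictions for `X_test`, but those predictions cannot be scored without labels from somewhere else.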

1 Answer

You can try something like this:

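# x_data and y_data are assumed to be already loaded, with rows in matching order.
split_row = int(x_data.shape[0] * 0.99)  # first 99% of rows for training, last 1% for testing

x_train = x_data[:split_row]
y_train = y_data[:split_row]
x_test = x_data[split_row:]
y_test = y_data[split_row:]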
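
As a self-contained illustration of the same slicing idea, here is a minimal sketch on toy NumPy arrays. The data values and the 80/20 ratio are assumptions made only for the example, not something taken from the question.

```python
import numpy as np

# Toy data standing in for x_data and y_data (hypothetical values).
x_data = np.arange(200).reshape(100, 2)  # 100 samples, 2 features
y_data = np.arange(100)                  # one label per sample

split_row = int(x_data.shape[0] * 0.8)   # 80/20 split, chosen only for illustration

# Rows before split_row go to the training set, the rest to the test set.
x_train, x_test = x_data[:split_row], x_data[split_row:]
y_train, y_test = y_data[:split_row], y_data[split_row:]

print(x_train.shape, x_test.shape)
```

Note that plain slicing keeps the rows in their original order, whereas `train_test_split()` shuffles by default, so this only gives a representative split if the rows are not sorted in some meaningful way.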
George Yu