How to choose data columns and target columns in a DataFrame for train_test_split?

Question

I'm trying to set up a train_test_split with data I have read from a CSV into a pandas DataFrame. The book I am reading says I should separate the data into X_train as the data and y_train as the target, but how do I define which column is the target and which columns are the data? So far I have the following:

import pandas as pd
from sklearn.model_selection import train_test_split
Data = pd.read_csv("Data.csv")

I have read that the split should be done in the following way; however, that example used a Bunch in which the data and target were already defined:

X_train, X_test, y_train, y_test = train_test_split(iris_dataset['data'],
                                                    iris_dataset['target'], random_state=0)

score 9 · Accepted Answer · answered Nov 04 '19 at 20:04

You can do it like this:

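Data = pd.read_csv("Data.csv")
# Features: every column except the target, as a NumPy array
X = Data.drop(['name of the target column'], axis=1).values
# Target: the single target column, as a NumPy array
y = Data['name of the target column'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)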
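
As a minimal sketch of the name-based approach, assuming a small made-up DataFrame in place of Data.csv (the column names age, income, and label are hypothetical and only serve the illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for pd.read_csv("Data.csv"); all column names are made up.
Data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36],
    "income": [30, 45, 80, 62, 55, 38, 70, 52],
    "label": [0, 0, 1, 1, 1, 0, 1, 0],
})

target = "label"  # assumed name of the target column

X = Data.drop(columns=[target])  # every column except the target
y = Data[target]                 # the target column alone

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)
```

With the default test_size of 0.25, 6 of the 8 rows go to the training set and 2 to the test set. Here X and y are kept as pandas objects rather than converted with .values, so the row index still links each feature row to its target value after the split.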

If the target variable is the last column of the data set, as is often the case, you can also select the columns by position:

Data = pd.read_csv("Data.csv")
X = Data.iloc[:, :-1]  # all columns except the last
y = Data.iloc[:, -1]   # the last column only
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
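
Selecting by position only works if the target really is the last column, so it may be worth checking before splitting. A rough sketch, again using a made-up DataFrame (the column names are hypothetical) instead of Data.csv:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data; in practice this would come from pd.read_csv("Data.csv").
Data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36],
    "income": [30, 45, 80, 62, 55, 38, 70, 52],
    "label": [0, 0, 1, 1, 1, 0, 1, 0],
})

X = Data.iloc[:, :-1]  # all columns except the last
y = Data.iloc[:, -1]   # the last column only

# Sanity check: confirm which columns ended up as features and as target.
print("features:", list(X.columns))
print("target:", y.name)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

If the printed target name is not the column you expected, select it by name instead of by position.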
