I tried to do a train test split on credit card default data from https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#

This is my code:

import sklearn
import pandas as pd

data = pd.read_excel("default of credit card clients.xls", sep=";")

x = data.drop(columns=['ID', 'default payment next month'], axis=1)

y = data['default payment next month']

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(test_size=0.1)

When I try to run it, I get the following message:

File "C:\Users\Kizo\Anaconda3\envs\tensorflow\lib\site-packages\sklearn\model_selection\_split.py", 
line 2086, in train_test_split
     raise ValueError("At least one array required as input")
ValueError: At least one array required as input

It seems to me that x and y are not being passed to the train_test_split function, even though they look like arrays when I print them. Please help!

1 Answer

As you correctly noted, x and y are never passed in the call you posted. You only pass the keyword argument test_size, so train_test_split receives no arrays to split and raises the ValueError. Pass x and y as the first positional arguments:

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.1)
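
To see the difference between the two calls in isolation, here is a minimal sketch. It uses small synthetic arrays (x_demo and y_demo are made-up stand-ins, not the credit card data) and assumes scikit-learn and NumPy are installed:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for x and y (assumption: 20 rows, 3 features).
x_demo = np.arange(60).reshape(20, 3)
y_demo = np.arange(20) % 2

# Calling without any arrays reproduces the error from the question.
try:
    train_test_split(test_size=0.1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing the arrays positionally, before test_size, fixes it.
x_train, x_test, y_train, y_test = train_test_split(
    x_demo, y_demo, test_size=0.1
)
print(x_train.shape, x_test.shape)
```

The first call should fail with the same ValueError as in your traceback, while the second returns four arrays whose row counts add up to the original 20.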

You can also shorten the call by importing the function directly:

from sklearn.model_selection import train_test_split

Then call it like this:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)
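
Putting the pieces together, a self-contained sketch might look like the following. The DataFrame is made up so the snippet runs without the Excel file: only the 'ID' and 'default payment next month' column names come from the question, while feature_a, feature_b and all values are hypothetical. The random_state argument is an optional addition that makes the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows standing in for the real dataset (hypothetical values).
data = pd.DataFrame({
    "ID": range(1, 11),
    "feature_a": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],  # hypothetical column
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],             # hypothetical column
    "default payment next month": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Features: everything except the row ID and the target column.
x = data.drop(columns=["ID", "default payment next month"])
# Target: the column to predict.
y = data["default payment next month"]

# Both x and y are passed, so rows are shuffled and split consistently.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=0
)

print(x_train.shape, x_test.shape)
print(y_train.shape, y_test.shape)
```

Because x and y are split in the same call, each row of x_train keeps the same index as its label in y_train.
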
Celius Stingher