-1

To split my data into training and test sets, I use:

X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)

But when I run it, I get this error (the shapes of X and y are given below):

Traceback (most recent call last):
  ...
   , in <module>
   X_train, X_test, y_train, y_test = train_test_split(X, y.iloc[:,1], test_size=0.3,random_state=seed, stratify=y)
    AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

EDIT: The shapes are:

Shape(X)= (284807, 28)
Shape(y)= (284807,)

Then I used:

X_train, X_test, y_train, y_test = train_test_split(X, y[:,1], test_size=0.3,random_state=seed, stratify=y)

But I saw:

IndexError: too many indices for array

How can I solve this problem?

user10296606

2 Answers

0

As the comments suggest, try replacing y.iloc[:,1] by y:

X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.3,
                                                    random_state=seed)

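Edit: The ValueError reported in the comments ("The least populated class in y has only 1 member") is raised by the stratify option. A stratified split uses the values passed to stratify as class labels and needs at least 2 members of every class, so that each class can appear in both the training and the test set. If some class in y has a single member, either drop stratify, as in the call above, or deal with the rare class before splitting.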
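To illustrate the ValueError about the least populated class quoted in the comments, here is a minimal sketch on made-up data. The arrays, the seed, and the filtering step are assumptions for illustration, not taken from the question; dropping rare classes is only one possible workaround and may not suit every dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 60 samples of class 0, 40 of class 1, one sample of class 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(101, 3))
y = np.array([0] * 60 + [1] * 40 + [2])

# Stratifying on y fails because class 2 has only one member.
try:
    train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    stratify_failed = False
except ValueError:
    stratify_failed = True

# Possible workaround (an assumption, not the only option):
# keep only the classes that have at least 2 members, then stratify.
classes, counts = np.unique(y, return_counts=True)
keep = np.isin(y, classes[counts >= 2])
X_train, X_test, y_train, y_test = train_test_split(
    X[keep], y[keep], test_size=0.3, random_state=0, stratify=y[keep]
)
```

Running `np.unique(y, return_counts=True)` on the real `y` first is a quick way to see which classes are too small to stratify on.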

Alexandre B.
  • Check the shape of `y` with `print(y.shape)`. The new error message means `y` is a one-dimensional vector and you are trying to index it along two dimensions. For more explanation, have a look at [IndexError: too many indices for array](https://stackoverflow.com/questions/28036812/indexerror-too-many-indices-for-array) – Alexandre B. Jan 05 '20 at 11:20
  • I have already written the shape of y in the question: Shape(X)= (284807, 28) Shape(y)= (284807,). I used [:,1] to try to change it, but it does not work! – user10296606 Jan 05 '20 at 11:22
  • What about the updated answer? – Alexandre B. Jan 05 '20 at 11:24
  • \site-packages\sklearn\model_selection\_split.py", line 1636, in _iter_indices raise ValueError("The least populated class in y has only 1" ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2. – user10296606 Jan 05 '20 at 11:28
  • The `stratify` option is raising this error. As the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) describes, the array passed to `stratify` is used as the class labels for the split, and the error message says that every class needs at least 2 members. – Alexandre B. Jan 05 '20 at 11:36
-1

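iloc is an indexer of pandas DataFrame and Series objects; a NumPy ndarray does not have it, which is why the AttributeError is raised.

To access elements, either index the ndarray directly with NumPy indexing and slicing notation, or convert the ndarray to a pandas DataFrame as follows: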

import pandas as pd
# nda is assumed to be a 2-D ndarray with at least two columns
df = pd.DataFrame(nda)
y = df.iloc[:, 1].to_numpy()  # convert the selected Series back to an ndarray

DataFrames offer great flexibility in working with data. train_test_split accepts DataFrames as well as ndarrays, so the conversion is optional; when an ndarray is needed, a DataFrame can be converted with DataFrame.to_numpy.

neotam