
I am reading a file in Python using pandas and then saving it in a NumPy array. The file has 11303402 rows x 10 columns. I need to split the data for cross-validation, so I sliced it into an 11303402 x 9 array of examples (features) and an 11303402 x 1 array of labels. The following is the code:

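import numpy as np
import pandas as pd

tdata = pd.read_csv('train.csv')
tdata.columns = ['Arrival_Time', 'Creation_Time', 'x', 'y', 'z', 'User', 'Model', 'Device', 'sensor', 'gt']

User_Data = np.array(tdata)
features = User_Data[:, 0:9]   # first 9 columns: examples
labels = User_Data[:, 9:10]    # last column, kept as a 2-D (n, 1) slice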
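The slice `9:10` keeps the column axis, while an integer index such as `9` drops it. A minimal sketch on a small random array (a hypothetical stand-in for `User_Data`, not the real file) shows the difference in shapes:

```python
import numpy as np

# Hypothetical stand-in for User_Data: 6 rows x 10 columns of random integers.
rng = np.random.default_rng(0)
user_data = rng.integers(0, 3, size=(6, 10))

features = user_data[:, 0:9]      # shape (6, 9)
labels_2d = user_data[:, 9:10]    # slice keeps the column axis: shape (6, 1)
labels_1d = user_data[:, 9]       # integer index drops it: shape (6,)

print(features.shape)   # (6, 9)
print(labels_2d.shape)  # (6, 1)
print(labels_1d.shape)  # (6,)
```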

The error occurs in the following code:

classes=np.unique(labels)
idx=labels==classes[0]
Yt=labels[idx]
Xt=features[idx,:]

On the line:

Xt=features[idx,:]

it raises 'IndexError: too many indices for array'.

The shapes of all three arrays are:

np.shape(tdata)     # (11303402, 10)
np.shape(features)  # (11303402, 9)
np.shape(labels)    # (11303402, 1)
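A standalone reproduction, sketched with small hypothetical arrays that have the same layout (5 rows instead of 11303402), shows where the extra dimension comes from:

```python
import numpy as np

# Hypothetical small data with the question's layout.
features = np.arange(5 * 9).reshape(5, 9)        # shape (5, 9)
labels = np.array([[0], [1], [0], [2], [1]])      # shape (5, 1), like User_Data[:, 9:10]

classes = np.unique(labels)     # np.unique flattens its input: array([0, 1, 2])
idx = labels == classes[0]      # element-wise comparison keeps shape (5, 1)
print(idx.shape)                # (5, 1)

error_raised = False
try:
    # The 2-D boolean mask uses up both axes of features, so ':' is a third index.
    Xt = features[idx, :]
except IndexError as err:
    error_raised = True
    print("IndexError:", err)
```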

Does anyone know what causes this?

Has QUIT--Anony-Mousse
Farhan Javed
  • What is `c`? A complete, standalone, runnable program with self-generated random (or zero) input data would help. – John Zwinck May 11 '16 at 12:53
  • Try removing the comma so that you have `Xt=features[idx:]`. – kazbeel May 11 '16 at 12:54
  • And what is the shape of `idx`? – John Zwinck May 11 '16 at 12:54
  • @JohnZwinck Sorry, I updated the code. It just meant the first class in `classes`, and the shape of `idx` is `(11303402,1)`. @WoozyCoder Nope, that didn't work. – Farhan Javed May 11 '16 at 16:58
  • Does this answer your question? [IndexError: too many indices for array](https://stackoverflow.com/questions/28036812/indexerror-too-many-indices-for-array) – AMC Apr 05 '20 at 18:40

1 Answer


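The problem is that idx has shape (11303402, 1), because the element-wise comparison returns an array with the same shape as labels. A 2-D boolean index consumes both dimensions of features, so the trailing `:` becomes one index too many. The quick workaround is to index with the first column of idx: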

Xt=features[idx[:,0],:]
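A minimal sketch on small hypothetical data checks the `idx[:, 0]` fix against two other common ways of getting a 1-D mask: flattening `labels` with `ravel()` before comparing, or comparing `labels[:, 0]` directly (equivalent to slicing the labels as `User_Data[:, 9]` in the first place). All three select the same rows:

```python
import numpy as np

# Hypothetical small data with the question's layout.
features = np.arange(5 * 9).reshape(5, 9)        # shape (5, 9)
labels = np.array([[0], [1], [0], [2], [1]])      # shape (5, 1)

classes = np.unique(labels)
idx = labels == classes[0]                        # shape (5, 1)

Xt_a = features[idx[:, 0], :]                         # the answer's fix: first column of the mask
Xt_b = features[labels.ravel() == classes[0], :]      # flatten labels first -> 1-D mask
Xt_c = features[labels[:, 0] == classes[0]]           # 1-D mask; trailing ':' is optional

Yt = labels[idx[:, 0], 0]                         # matching 1-D label vector

print(Xt_a.shape)  # (2, 9)
print(Yt)          # [0 0]
```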
Keith Prussing