I'm trying to split a dataset into train and test groups in Python using a method similar to what I'm used to in R (I realize there are other options). So I'm defining an array of row numbers that will make up my train set. I then want to grab the remaining row numbers for my test set using np.delete. Since there are 170 rows total and 136 go to the train set, the test set should have 34 rows. But it's got 80 -- the actual number varies when I change my random seed ... What have I got wrong here?

import numpy as np

np.random.seed(222)
marriage = np.random.rand(170,55)
rows,cols = marriage.shape
sample = np.random.randint(0,rows-1,(round(.8*rows)))
train = marriage[sample,:]
test = np.delete(marriage, sample, axis=0)

print(marriage.shape)
print(len(sample))
print(train.shape)
print(test.shape)
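
A possible explanation, offered as a sketch rather than a definitive answer: np.random.randint draws with replacement, so the 136 values in sample can contain repeats. np.delete removes each distinct row only once, so the test set ends up with 170 minus the number of unique indices, not 170 minus 136. The upper bound of randint is also exclusive, so rows-1 means the last row can never be drawn. The snippet below assumes np.random.choice with replace=False is an acceptable substitute for drawing the indices; the variable names dup_sample and n_train are introduced here for illustration only.

```python
import numpy as np

np.random.seed(222)
marriage = np.random.rand(170, 55)
rows, cols = marriage.shape
n_train = round(.8 * rows)  # 136 rows for the training set

# Original approach: randint samples WITH replacement, so indices can repeat.
dup_sample = np.random.randint(0, rows - 1, n_train)
print(len(dup_sample), len(np.unique(dup_sample)))  # 136 drawn, fewer unique

# Assumed fix: draw n_train distinct row indices from 0..rows-1.
sample = np.random.choice(rows, size=n_train, replace=False)
train = marriage[sample, :]
test = np.delete(marriage, sample, axis=0)

print(train.shape)  # (136, 55)
print(test.shape)   # (34, 55)
```

With distinct indices, the train and test row counts add up to the total, whatever the seed.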
LMM3