I am a newbie in ML and I have been trying out the Udacity ML project. However, I got an error that I am having a hard time solving. The code seems okay, but I can't iterate through the data. I know it has to do with the changes made to the new StratifiedShuffleSplit. The code is below.

def Stratified_Shuffle_Split(X,y,num_test):
    sss = StratifiedShuffleSplit(y, 1, test_size=num_test, random_state = None)
    for train, test in sss:
        X_train, X_test = X.iloc[train], X.iloc[test]
        y_train, y_test = y.iloc[train], y.iloc[test]
    return X_train, X_test, y_train, y_test

# First, decide how many training vs test samples you want
num_all = student_data.shape[0]  # same as len(student_data)
num_train = round(num_all*0.75)  # about 75% of the data
num_test = num_all - num_train
#print(num_test)

y = student_data['passed'] # identify target variable
X_train, X_test, y_train, y_test = Stratified_Shuffle_Split(X_all, y, num_test)

print("Training Set: {0:.2f} Samples".format(X_train.shape[0]))
print("Testing Set: {0:.2f} Samples".format(X_test.shape[0]))

The error I get is this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-2147158fcaf2> in <module>
     13 
     14 y = student_data['passed'] # identify target variable
---> 15 X_train, X_test, y_train, y_test = Stratified_Shuffle_Split(X_all, y, num_test)
     16 
     17 print("Training Set: {0:.2f} Samples".format(X_train.shape[0]))

<ipython-input-20-2147158fcaf2> in Stratified_Shuffle_Split(X, y, num_test)
      1 def Stratified_Shuffle_Split(X,y,num_test):
      2     sss = StratifiedShuffleSplit(y, 1, test_size=num_test, random_state = None)
----> 3     for train, test in sss:
      4         X_train, X_test = X.iloc[train], X.iloc[test]
      5         y_train, y_test = y.iloc[train], y.iloc[test]

TypeError: 'StratifiedShuffleSplit' object is not iterable
merv

1 Answer

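According to the documentation, you need to call the .split() method on the StratifiedShuffleSplit object; it generates the indices that you're trying to slice with. Note also that in sklearn.model_selection the constructor no longer takes the labels: its first argument is n_splits, and y is passed to .split() instead. So this part could be:

from sklearn.model_selection import StratifiedShuffleSplit

def Stratified_Shuffle_Split(X, y, num_test):
    # The labels are no longer passed to the constructor; n_splits comes first.
    sss = StratifiedShuffleSplit(n_splits=1, test_size=num_test, random_state=None)
    # .split() yields (train, test) arrays of positional indices.
    for train, test in sss.split(X, y):
        X_train, X_test = X.iloc[train], X.iloc[test]
        y_train, y_test = y.iloc[train], y.iloc[test]
    return X_train, X_test, y_train, y_test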
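To check the newer API in isolation, here is a minimal self-contained sketch. The toy DataFrame, its column names, and the test_size=5 and random_state=0 values are made up for illustration and are not from the Udacity project. It assumes scikit-learn 0.18 or later, where StratifiedShuffleSplit lives in sklearn.model_selection.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: 20 rows, 'passed' is the target (12 "yes", 8 "no").
toy = pd.DataFrame({
    "absences": range(20),
    "passed": ["yes"] * 12 + ["no"] * 8,
})
X_toy = toy[["absences"]]
y_toy = toy["passed"]

# The splitter is configured without the labels; they go to .split().
sss = StratifiedShuffleSplit(n_splits=1, test_size=5, random_state=0)

# With a single split, next() takes the first (train, test) index pair,
# so no for loop is needed.
train_idx, test_idx = next(sss.split(X_toy, y_toy))

X_train_toy, X_test_toy = X_toy.iloc[train_idx], X_toy.iloc[test_idx]
y_train_toy, y_test_toy = y_toy.iloc[train_idx], y_toy.iloc[test_idx]

print("Training Set: {} Samples".format(X_train_toy.shape[0]))
print("Testing Set: {} Samples".format(X_test_toy.shape[0]))
```

Because test_size is an integer here, it is read as an absolute number of test rows, which matches how num_test is computed in the question.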

I'm also not sure you need to define a new function: StratifiedShuffleSplit is a ready-made splitter class, and the for loop you already have does what you want with it.
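If a single stratified split is all that is needed, one possible alternative (not something the question uses) is train_test_split with its stratify argument, which returns the four pieces directly. The sketch below runs on made-up toy data; the column names and the test_size and random_state values are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for student_data.
demo = pd.DataFrame({
    "absences": range(20),
    "passed": ["yes"] * 12 + ["no"] * 8,
})
X_demo = demo[["absences"]]
y_demo = demo["passed"]

# stratify=y_demo keeps the class proportions similar in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=5, stratify=y_demo, random_state=0
)

print("Training Set: {} Samples".format(X_tr.shape[0]))
print("Testing Set: {} Samples".format(X_te.shape[0]))
```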

M Thomas