I am using an extreme learning machine (ELM) model for regression, with K-fold cross-validation to validate its predictions. When I run the code below, I get this error message:

ValueError: The number of folds must be of Integral type. [array([[0.25      , 0. ........

The prediction is also never printed.

My code:

dataset = pd.read_excel("ar.xls")

X=dataset.iloc[:,:-1]
y=dataset.iloc[:,-1:]

#----------Scaler----------
scaler = MinMaxScaler(feature_range=(0, 1))
X=scaler.fit_transform(X)


#---------------------- Divide the dataset ----------------------

kfolds = KFold(train_test_split(X, y) ,n_splits=5, random_state=16, shuffle=False)   
for train_index, test_index in kfolds.split(X):
    
    X_train_split, X_test_split = X[train_index], X[test_index]
    y_train_split, y_test_split = y[train_index], y[test_index]
      
#------------------------INPUT------------------

input_size = X.shape[1]

#---------------------------(Number of neurons)-------
hidden_size = 26

#---------------------------(To fix the RESULT)-------
seed =26   # can be any number, and the exact value does not matter
np.random.seed(seed)

#---------------------------(weights & biases)------------
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])

#----------------------(Activation Function)----------
def relu(x):
   return np.maximum(x, 0, x)

#--------------------------(Calculations)----------
def hidden_nodes(X):
    G = np.dot(X, input_weights)
    G = G + biases
    H = relu(G)
    return H

#Output weights 
output_weights = np.dot(pinv2(hidden_nodes(X)), y)
output_weights = np.dot(pinv2(hidden_nodes(X_train_split)), y_train_split)


#------------------------(Def prediction)---------
def predict(X):
    out = hidden_nodes(X)
    out = np.dot(out, output_weights)
    return out

#------------------------------------(Make_PREDICTION)--------------
prediction = predict(X_test_split)
print(prediction)
sera

1 Answer

KFold takes n_splits as its first argument, as its signature shows: class sklearn.model_selection.KFold(n_splits=5, *, shuffle=False, random_state=None). You are passing the result of train_test_split(X, y) in that position, which is why you get this error. Also, in the loop below

for train_index, test_index in kfolds.split(X):
    
    X_train_split, X_test_split = X[train_index], X[test_index]
    y_train_split, y_test_split = y[train_index], y[test_index]

You are overwriting your variables on every iteration, so once the loop finishes you are only working with the values from the last fold. A correct approach is shown below:

kfolds = KFold(n_splits=5, shuffle=False)  # random_state only applies when shuffle=True; recent scikit-learn raises a ValueError if it is set otherwise

train_folds_idx = []
valid_folds_idx = []

for train_index, valid_index in kfolds.split(dataset.index):
    train_folds_idx.append(train_index)
    valid_folds_idx.append(valid_index)
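
To make the per-fold idea concrete, here is a minimal sketch of how the ELM from the question could be trained and evaluated inside the loop, so that no fold overwrites another. It rests on several assumptions: the data is synthetic (the contents of "ar.xls" are unknown), the helper fit_elm and the fold_rmse list are hypothetical names, and np.linalg.pinv stands in for pinv2, which is no longer available in recent SciPy releases. Note also that y in the question is a DataFrame, so it would need y.iloc[train_index] or y.to_numpy() before positional indexing like this works.

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for the spreadsheet data (assumption: 100 rows, 4 features).
rng = np.random.default_rng(26)
X = rng.random((100, 4))
y = X @ np.array([[1.0], [-2.0], [0.5], [3.0]]) + 0.1

hidden_size = 26
input_weights = rng.normal(size=(X.shape[1], hidden_size))
biases = rng.normal(size=hidden_size)

def hidden_nodes(X):
    # Random projection followed by ReLU, as in the question.
    return np.maximum(X @ input_weights + biases, 0)

def fit_elm(X_train, y_train):
    # Hypothetical helper: least-squares output weights via the pseudo-inverse.
    return np.linalg.pinv(hidden_nodes(X_train)) @ y_train

def predict(X, output_weights):
    return hidden_nodes(X) @ output_weights

kfolds = KFold(n_splits=5, shuffle=False)
fold_rmse = []
for train_index, valid_index in kfolds.split(X):
    # Fit on the training part of this fold only, then score on the held-out part.
    output_weights = fit_elm(X[train_index], y[train_index])
    prediction = predict(X[valid_index], output_weights)
    error = prediction - y[valid_index]
    fold_rmse.append(float(np.sqrt(np.mean(error ** 2))))

print(fold_rmse)  # one RMSE value per fold
```

Keeping the fit and the prediction inside the loop means every fold gets its own output weights, and the list of per-fold scores can be averaged afterwards.
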
Abhishek Prajapat
  • Thank you, I wrote your code, but I get this error message: NameError: name 'valid_index' is not defined. How can I define the 'valid_index' @Abhishek Prajapat – sera Aug 15 '21 at 12:29
  • Come on dude. That was just a variable name error. I have edited the answer. – Abhishek Prajapat Aug 15 '21 at 12:30
  • Ok, I corrected the code and then I got this error message: ValueError: shapes (5,640) and (26,26) not aligned: 640 (dim 1) != 26 (dim 0) @Abhishek Prajapat – sera Aug 15 '21 at 12:42
  • You are doing a matrix multiplication, and for that the second dimension of the first matrix must equal the first dimension of the second matrix. I am not sure where you are doing it, but for now I would recommend you take a course on `Numpy` and `implement` just logistic regression using Python and Numpy. That would give a lot of clarity. – Abhishek Prajapat Aug 15 '21 at 13:35
  • Ok. Thank you very much. @ Abhishek Prajapat – sera Aug 15 '21 at 13:56
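
The shape rule mentioned in the comments can be checked directly. The sketch below reproduces the mismatch using the shapes from the reported error message, then shows a pair of shapes that do align. The arrays are placeholders filled with ones, not the asker's data, and the names a, b, X_batch and W are purely illustrative.

```python
import numpy as np

# Shapes taken from the reported error: (5, 640) times (26, 26).
a = np.ones((5, 640))
b = np.ones((26, 26))

try:
    np.dot(a, b)
except ValueError as err:
    # The inner dimensions differ (640 vs 26), so NumPy refuses to multiply.
    message = str(err)
    print(message)

# Shapes that align: (n_samples, n_features) times (n_features, hidden_size).
X_batch = np.ones((5, 4))
W = np.ones((4, 26))
H = np.dot(X_batch, W)
print(H.shape)  # (5, 26)
```

Printing .shape for both operands just before each np.dot call is usually the quickest way to find which multiplication is misaligned.
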