
I used an extreme learning machine (ELM) model for prediction. I have a training dataset and a testing dataset, and I want to validate the model with K-fold cross-validation. How can I add code to perform K-fold cross-validation?

#------------------------------(imports)----------------
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.preprocessing import MinMaxScaler
from scipy.linalg import pinv2   # on SciPy >= 1.9, use np.linalg.pinv instead

#------------------------------(import data)------------
train = pd.read_excel('nametrain.xlsx')
test = pd.read_excel('nametest.xlsx')

#--------------------------------(scale data)------------
scaler_X = MinMaxScaler()   # scaler for the input features
scaler_Y = MinMaxScaler()   # scaler for the target
# fit_transform on the training data:
X_train = scaler_X.fit_transform(train.values[:,:-1])
y_train = scaler_Y.fit_transform(train.values[:,-1:])
# transform only (no re-fitting) on the test data:
X_test = scaler_X.transform(test.values[:,:-1])
y_test = scaler_Y.transform(test.values[:,-1:])
#----------------------------(input size)-------------
input_size = X_train.shape[1]

#---------------------------(Number of neurons)-------
hidden_size = 17

#---------------------------(To fix the RESULT)-------
seed = 16   # can be any number; the exact value does not matter
np.random.seed(seed)

#---------------------------(weights & biases)------------
input_weights = np.random.normal(size=[input_size,hidden_size])
biases = np.random.normal(size=[hidden_size])

#----------------------(Activation Function)----------
def relu(x):
    return np.maximum(x, 0)   # element-wise max(x, 0)

#--------------------------(Calculations)----------
def hidden_nodes(X):
    G = np.dot(X, input_weights)
    G = G + biases
    H = relu(G)
    return H

# Output weights: least-squares solution via the Moore-Penrose pseudo-inverse
output_weights = np.dot(pinv2(hidden_nodes(X_train)), y_train)


#------------------------(Def prediction)---------
def predict(X):
    out = hidden_nodes(X)
    out = np.dot(out, output_weights)
    return out
#------------------------------------(Make prediction)--------------
prediction = predict(X_test)
# undo the MinMax scaling of the target (hard-coded target min = 1.23333333, max = 4.5862069)
unscaler_prediction = prediction*(4.5862069-1.23333333)+1.23333333
unscaler_y_test = y_test*(4.5862069-1.23333333)+1.23333333

#--------------------------(Calculate metrics)---------------

mse = metrics.mean_squared_error(y_test, prediction)
rmse = np.sqrt(mse) # or mse**(0.5)  
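
As an aside, the hard-coded minimum and maximum used in the un-scaling step above can be replaced by the fitted target scaler itself. A minimal sketch using the scaler_Y already defined (equivalent only if those constants really are the target's min and max):

#--------------------------(un-scale via the fitted scaler)---------------
unscaler_prediction = scaler_Y.inverse_transform(prediction)
unscaler_y_test = scaler_Y.inverse_transform(y_test)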
sera
1 Answer


You can use scikit-learn's `KFold`, `GroupKFold`, or `RepeatedKFold` (all in `sklearn.model_selection`).

from sklearn.model_selection import KFold

# KFold splits the dataset into k consecutive folds (without shuffling by default);
# here shuffle=True randomizes the rows first and random_state keeps the split reproducible.
kfolds = KFold(n_splits=5, shuffle=True, random_state=16)
for train_index, test_index in kfolds.split(X_train, y_train):
    X_train_folds, X_test_folds = X_train[train_index], X_train[test_index]
    y_train_folds, y_test_folds = y_train[train_index], y_train[test_index]

    # Put all of the model-fitting code inside the loop, so the model is
    # re-fitted on every (X_train_folds, y_train_folds) pair, then call
    # predict() on the corresponding X_test_folds.
    prediction = predict(X_test_folds)
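
For completeness, here is a minimal sketch of what "all code in the for loop" can look like for the ELM in the question: the output weights are re-computed from each fold's training split only, and the fold's RMSE is collected. It reuses hidden_nodes and the scaled arrays from the question; np.linalg.pinv stands in for pinv2, and the variable names are only illustrative.

cv_rmse = []
for train_index, test_index in kfolds.split(X_train, y_train):
    X_tr, X_val = X_train[train_index], X_train[test_index]
    y_tr, y_val = y_train[train_index], y_train[test_index]

    # re-fit the ELM on this fold: least-squares output weights from the fold's training part
    fold_output_weights = np.dot(np.linalg.pinv(hidden_nodes(X_tr)), y_tr)

    # predict on the held-out part of the fold with the fold-specific weights
    val_prediction = np.dot(hidden_nodes(X_val), fold_output_weights)

    cv_rmse.append(np.sqrt(metrics.mean_squared_error(y_val, val_prediction)))

print('mean CV RMSE:', np.mean(cv_rmse))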
Priya
  • Thank you very much. But after executing your code I get this error message: ValueError: Found input variables with inconsistent numbers of samples: [185, 99]. How can I solve this problem? @Priya – sera Aug 11 '21 at 15:58
  • Please check whether the lengths of `X_train` and `y_train` are equal, i.e. whether `X_train.shape[0] == y_train.shape[0]`. KFold just splits the indices of the `X_train` and `y_train` samples and returns a set of samples `(X_train_folds, y_train_folds)` on every iteration of the for loop. – Priya Aug 11 '21 at 16:54
  • Dear Priya, I checked the lengths of X_train and y_train and they are equal, but the error message still appears. @Priya – sera Aug 11 '21 at 17:14
  • It's really difficult to tell without seeing the data... at least post the stack trace of the error. – Priya Aug 11 '21 at 17:23
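
A note on the error discussed in these comments: that ValueError is raised when a scikit-learn function receives two arrays whose first dimensions differ. One possible suspect is a metric call that compares the full test set (185 rows) with a prediction made on a single fold (99 rows), but that is only a guess without the stack trace. A small diagnostic sketch, using the variable names from the code above:

# print the first dimension of every pair of arrays that is passed together
# into a scikit-learn call; the mismatched pair produces the [185, 99] error
print('X_train:', X_train.shape, 'y_train:', y_train.shape)
print('X_test:', X_test.shape, 'y_test:', y_test.shape)
print('X_test_folds:', X_test_folds.shape, 'prediction:', prediction.shape)

# inside the fold loop, the metric must compare the fold's own targets with
# the fold's own predictions, e.g.:
fold_mse = metrics.mean_squared_error(y_test_folds, predict(X_test_folds))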