
I am using k-nearest neighbors regression (KNeighborsRegressor) from scikit-learn in Python, with n_neighbors = 20. I trained the model and then saved it using this code:

import pickle
from sklearn import neighbors

knn = neighbors.KNeighborsRegressor(n_neighbors, weights='uniform')
knn.fit(trainInputs, trainOutputs)
filename = "KNN_model_%d_%d.sav" % (n_neighbors, windowSize)
pickle.dump(knn, open(filename, 'wb'))

Now I am trying to load the model and predict the output value for a new input using this method:

filename = 'KNN_model_20_720.sav'
loaded_knn_model = pickle.load(open(filename, 'rb'))
nextPrediction = loaded_knn_model.predict(data_pred_input_window)

However, when I do this, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-bc1f744a44b3> in <module>()
     26 filename = 'KNN_model_20_720_Solar11months.sav'
     27 loaded_knn_model = pickle.load(open(filename, 'rb'))
---> 28 nextPrediction = loaded_knn_model.predict(data_pred_input_window)
     29 
     30 print(nextPrediction)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X)
    142         X = check_array(X, accept_sparse='csr')
    143 
--> 144         neigh_dist, neigh_ind = self.kneighbors(X)
    145 
    146         weights = _get_weights(neigh_dist, self.weights)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\neighbors\base.py in kneighbors(self, X, n_neighbors, return_distance)
    341                 "Expected n_neighbors <= n_samples, "
    342                 " but n_samples = %d, n_neighbors = %d" %
--> 343                 (train_size, n_neighbors)
    344             )
    345         n_samples, _ = X.shape

ValueError: Expected n_neighbors <= n_samples,  but n_samples = 1, n_neighbors = 20

I have no idea why this is happening. I know I am passing only one input row to predict, but I assumed that would not matter, since the saved model should contain the training data that the KNN search runs against. How can I resolve this issue?

Paolo Forgia


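The scikit-learn documentation suggests using joblib for model persistence, because it is more efficient than pickle for objects that carry large NumPy arrays internally, as fitted estimators often do.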

import joblib  # older scikit-learn exposed this as sklearn.externals.joblib, which has since been removed
from sklearn import neighbors

knn = neighbors.KNeighborsRegressor(n_neighbors, weights='uniform')
knn.fit(trainInputs, trainOutputs)
joblib.dump(knn, f"KNN_model_{n_neighbors}_{windowSize}.joblib")

# load the model from a file
model = joblib.load(f"KNN_model_{n_neighbors}_{windowSize}.joblib")
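A self-contained sketch of that joblib round trip, assuming synthetic random data in place of trainInputs/trainOutputs (100 training rows of 720 features; the sizes, the temporary directory and the variable names are illustrative only). It checks that the reloaded model still holds its training data and predicts a single query row exactly as the original model does, as long as that row is passed as a 2-D array of shape (1, n_features).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import neighbors

# Hypothetical stand-ins for trainInputs/trainOutputs: 100 rows, 720 features each.
rng = np.random.default_rng(0)
n_neighbors, window_size = 20, 720
train_inputs = rng.random((100, window_size))
train_outputs = rng.random(100)

knn = neighbors.KNeighborsRegressor(n_neighbors, weights="uniform")
knn.fit(train_inputs, train_outputs)

# Save to a temporary directory so the example leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), f"KNN_model_{n_neighbors}_{window_size}.joblib")
joblib.dump(knn, path)
model = joblib.load(path)

# A single query row must still be 2-D: shape (1, n_features).
query = rng.random((1, window_size))
original_prediction = knn.predict(query)
loaded_prediction = model.predict(query)
print(loaded_prediction)
```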

Also, your original code opens the files without a context manager (a with block), so they are never explicitly closed. That alone may not fix this error, but a with block guarantees each file is flushed and closed as soon as you are done with it.
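A sketch of the original pickle approach rewritten with with blocks, again using hypothetical synthetic data and helper names (fit_and_save, load_and_predict). It also illustrates a point visible in the traceback: the check in kneighbors compares n_neighbors against train_size, so "n_samples = 1" counts the rows the model was fitted on, not the rows passed to predict. A model fitted on 100 rows predicts one query row fine after reloading, while a model fitted on a single row raises ValueError. One possibility worth checking, therefore, is whether trainInputs really had one row per sample when the model was saved, and whether the file being loaded is the model you think it is.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import neighbors

rng = np.random.default_rng(0)
n_neighbors, window_size = 20, 720


def fit_and_save(train_inputs, train_outputs, path):
    """Fit a KNN regressor and pickle it; the with block closes the file."""
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights="uniform")
    knn.fit(train_inputs, train_outputs)
    with open(path, "wb") as f:
        pickle.dump(knn, f)


def load_and_predict(path, query):
    """Unpickle the model (closing the file) and predict the query rows."""
    with open(path, "rb") as f:
        model = pickle.load(f)
    return model.predict(query)


tmp_dir = tempfile.mkdtemp()
query = rng.random((1, window_size))  # one query row, shape (1, n_features)

# Enough training rows: a single query row predicts fine after reloading.
good_path = os.path.join(tmp_dir, "knn_100_rows.sav")
fit_and_save(rng.random((100, window_size)), rng.random(100), good_path)
good_prediction = load_and_predict(good_path, query)
print(good_prediction)

# Only one training row: predict fails because n_neighbors (20) exceeds
# the number of fitted training rows (1), regardless of the query size.
bad_path = os.path.join(tmp_dir, "knn_1_row.sav")
fit_and_save(rng.random((1, window_size)), rng.random(1), bad_path)
one_row_error = None
try:
    load_and_predict(bad_path, query)
except ValueError as exc:
    one_row_error = exc
    print("ValueError:", exc)
```

Printing the shape of trainInputs just before calling fit is a quick way to confirm which case applies.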

davidrpugh