Sklearn-KNN allows one to set weights (e.g., uniform, distance) when calculating the mean of the x nearest neighbours.
Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?
There is no built-in parameter for predicting with the median rather than the mean (you can see in the source that the mean is hard-coded). But because scikit-learn estimators are just Python classes, you can subclass KNeighborsRegressor and override the predict method to do whatever you want.
Here's a quick example, where I've copied and pasted the original predict() method and modified the relevant piece:

    import numpy as np

    # Import path as used in the scikit-learn release this was written against;
    # newer releases have reorganized these modules, so adjust it if the import fails.
    from sklearn.neighbors.regression import KNeighborsRegressor, check_array, _get_weights

    class MedianKNNRegressor(KNeighborsRegressor):
        def predict(self, X):
            X = check_array(X, accept_sparse='csr')

            neigh_dist, neigh_ind = self.kneighbors(X)

            weights = _get_weights(neigh_dist, self.weights)

            _y = self._y
            if _y.ndim == 1:
                _y = _y.reshape((-1, 1))

            ######## Begin modification
            if weights is None:
                # Unweighted case: median over the neighbour axis instead of the mean
                y_pred = np.median(_y[neigh_ind], axis=1)
            else:
                # y_pred = weighted_median(_y[neigh_ind], weights, axis=1)
                raise NotImplementedError("weighted median")
            ######### End modification

            if self._y.ndim == 1:
                y_pred = y_pred.ravel()

            return y_pred

    X = np.random.rand(100, 1)
    y = 20 * X.ravel() + np.random.rand(100)

    clf = MedianKNNRegressor().fit(X, y)
    print(clf.predict(X[:5]))
    # Example output (varies between runs, since the data are random):
    # [ 2.38172861 13.3871126 9.6737255 2.77561858 17.07392584]

I've left out the weighted version, because I don't know of a simple way to compute a weighted median with numpy/scipy, but it would be straightforward to add in once that function is available.
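For what it's worth, here is a rough sketch of what such a helper might look like using plain numpy. The name `weighted_median` is just the placeholder from the commented-out line above, not an existing numpy/scipy function, and the definition is one assumption among several possible ones: it returns the "lower" weighted median, i.e. the smallest neighbour value whose cumulative weight reaches half of the total. With equal weights and an even number of neighbours this gives the lower of the two middle values rather than their average, so it will not match `np.median` exactly in that case. It is also fixed to the neighbour axis (axis 1) rather than taking an `axis` argument.

```python
import numpy as np

def weighted_median(values, weights):
    """Lower weighted median along axis 1 (hypothetical helper).

    values:  shape (n_queries, k) or (n_queries, k, n_outputs)
    weights: shape (n_queries, k), non-negative
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Work in 3D internally so single- and multi-output targets share one path.
    squeeze = values.ndim == 2
    if squeeze:
        values = values[:, :, np.newaxis]

    # Each output column uses the same per-neighbour weights.
    w = np.broadcast_to(weights[:, :, np.newaxis], values.shape)

    # Sort the neighbour values and carry the weights along.
    order = np.argsort(values, axis=1)
    sorted_vals = np.take_along_axis(values, order, axis=1)
    sorted_w = np.take_along_axis(w, order, axis=1)

    # First position where the cumulative weight reaches half of the total.
    cum_w = np.cumsum(sorted_w, axis=1)
    half = 0.5 * cum_w[:, -1:, :]
    idx = np.argmax(cum_w >= half, axis=1)  # shape (n_queries, n_outputs)

    result = np.take_along_axis(sorted_vals, idx[:, np.newaxis, :], axis=1)[:, 0, :]
    return result[:, 0] if squeeze else result
```

If that definition suits your use case, the `else` branch in `predict` could presumably become `y_pred = weighted_median(_y[neigh_ind], weights)`; I haven't tested it against every `weights` option, so treat it as a starting point.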