
Sklearn's KNN regressor lets you set the weights (e.g., uniform, distance) used when computing the mean of the k nearest neighbours' targets.

Instead of predicting with the mean, is it possible to predict with the median (perhaps with a user-defined function)?

Eugene Yan

1 Answer


There is no built-in parameter that switches the prediction from the mean to the median (you can see in the source that the mean is hard-coded). But because scikit-learn estimators are just Python classes, you can subclass KNeighborsRegressor and override its predict method to do whatever you want.

Here's a quick example, where I've copied and pasted the original predict() method and modified the relevant piece:

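```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils import check_array
# _get_weights is a private helper whose module has moved between versions:
# older releases exposed it via sklearn.neighbors.regression, 0.22+ via sklearn.neighbors._base.
from sklearn.neighbors._base import _get_weights


class MedianKNNRegressor(KNeighborsRegressor):
    def predict(self, X):
        X = check_array(X, accept_sparse='csr')

        # Distances to, and indices of, the k nearest training points
        neigh_dist, neigh_ind = self.kneighbors(X)

        # None for weights='uniform'; an array of shape (n_queries, k) otherwise
        weights = _get_weights(neigh_dist, self.weights)

        # Training targets stored by fit(); made 2-D so multi-output y also works
        _y = self._y
        if _y.ndim == 1:
            _y = _y.reshape((-1, 1))

        ######## Begin modification
        if weights is None:
            # _y[neigh_ind] has shape (n_queries, k, n_outputs); take the median over k
            y_pred = np.median(_y[neigh_ind], axis=1)
        else:
            # y_pred = weighted_median(_y[neigh_ind], weights, axis=1)  (hypothetical helper)
            raise NotImplementedError("weighted median")
        ######### End modification

        if self._y.ndim == 1:
            y_pred = y_pred.ravel()

        return y_pred


X = np.random.rand(100, 1)
y = 20 * X.ravel() + np.random.rand(100)
clf = MedianKNNRegressor().fit(X, y)
print(clf.predict(X[:5]))
# e.g. [  2.38172861  13.3871126    9.6737255    2.77561858  17.07392584]
# (X and y are random, so the values differ from run to run)
```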

I've left out the weighted version, because I don't know of a simple way to compute a weighted median with numpy/scipy, but it would be straightforward to add once such a function is available.

jakevdp
    Just found the [wquantiles](https://pypi.python.org/pypi/wquantiles) package which claims to implement the weighted median. I haven't checked it out, but you might find it useful! – jakevdp Nov 15 '15 at 05:21
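If you'd rather not add a dependency, a weighted median can also be computed row-wise with plain NumPy: sort each row, accumulate the weights, and take the first value whose cumulative weight reaches half of the row's total. The `weighted_median` helper below is a hypothetical sketch under that definition. It returns the *lower* weighted median, so for an even number of equally weighted neighbours it can differ from `np.median`, which averages the two middle values.

```python
import numpy as np


def weighted_median(values, weights):
    """Row-wise lower weighted median (hypothetical helper, not a NumPy API).

    values, weights: arrays of shape (n_rows, k).
    Returns an array of shape (n_rows,).
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort each row by value and carry the matching weights along
    order = np.argsort(values, axis=1)
    sorted_vals = np.take_along_axis(values, order, axis=1)
    sorted_w = np.take_along_axis(weights, order, axis=1)

    # Cumulative weight per row, compared against half of the row total
    cum_w = np.cumsum(sorted_w, axis=1)
    half = 0.5 * cum_w[:, -1:]

    # argmax on a boolean array returns the first True, i.e. the first
    # sorted value whose cumulative weight reaches half the total
    idx = np.argmax(cum_w >= half, axis=1)
    return sorted_vals[np.arange(sorted_vals.shape[0]), idx]


# Neighbour targets for two query points and their (e.g. distance-based) weights
vals = np.array([[1.0, 2.0, 3.0], [5.0, 1.0, 9.0]])
w = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 10.0]])
print(weighted_median(vals, w))  # [2. 9.]
```

To use it in the `else` branch of `MedianKNNRegressor`, where `_y[neigh_ind]` has shape `(n_queries, k, n_outputs)` and `weights` has shape `(n_queries, k)`, one option is to apply it per output column: `y_pred = np.column_stack([weighted_median(_y[neigh_ind][:, :, j], weights) for j in range(_y.shape[1])])`.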