
I am using the scikit-learn KNeighborsClassifier for classification on a dataset with 4 output classes. The following is the code that I am using:

from sklearn import neighbors

knn = neighbors.KNeighborsClassifier(n_neighbors=7, weights='distance', algorithm='auto', leaf_size=30, p=1, metric='minkowski')

The model works correctly. However, I would like to provide user-defined weights for each sample point. The code currently scales each neighbour's vote by the inverse of its distance via the weights='distance' parameter.

I would like to keep the inverse distance scaling, but each sample point also has a probability weight that I would like to fold into the distance calculation. For example, if x is the test point and y, z are two nearest neighbours with weights wy and wz, then I would like the distances to be computed as (sum|x-y|)*wy and (sum|x-z|)*wz respectively.

I tried to define a function to pass into the weights argument, but I want it to apply the inverse distance scaling in addition to my user-defined weights, and I do not know what function sklearn uses internally for the inverse distance scaling. I could not find an answer in the documentation.
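For reference, here is my understanding of the weights callable (toy data made up for illustration): it receives the array of neighbour distances and must return an array of the same shape, and passing 1/d reproduces weights='distance'. The callable never sees which training samples the distances belong to, which is why I cannot fold my per-sample weights into it:

```python
import numpy as np
from sklearn import neighbors

# Toy data just to illustrate (my real data has 4 classes)
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A callable passed to weights= gets the neighbour-distance array and
# returns same-shaped weights; 1/d reproduces weights='distance'
def inverse_distance(dist):
    return 1.0 / dist

knn_builtin = neighbors.KNeighborsClassifier(n_neighbors=3, weights='distance').fit(X, y)
knn_custom = neighbors.KNeighborsClassifier(n_neighbors=3, weights=inverse_distance).fit(X, y)

# Both give identical predictions on query points away from the training data
Xq = np.array([[1.5], [9.5]])
```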

Any suggestions?

3 Answers


KNN in sklearn doesn't support sample weights, unlike other estimators such as DecisionTreeClassifier. Personally, I think that is a disappointment: it would not be hard to make KNN support sample weights, since the predicted label is just the majority vote of the neighbours. A crude workaround is to replicate samples yourself according to their weights, e.g. if a sample has weight 2, make it appear twice in the training set.
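For integer weights, the duplication trick is a one-liner with np.repeat (toy data below is made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1])
w = np.array([1, 1, 1, 2, 2])  # integer sample weights

# A sample with weight k simply appears k times in the training set
X_rep = np.repeat(X, w, axis=0)
y_rep = np.repeat(y, w)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_rep, y_rep)
```

Non-integer weights would first need rescaling and rounding, or the probabilistic resampling shown in the other answer.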

Kai Wang

You can use resampling to adapt your sample weights with K-neighbors since the sklearn implementation does not include sample weights. Here is how you could do this:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Get training and testing data; get_train_data()/get_test_data() are
# placeholders for however you load your own dataset
Xtrain, ytrain, sample_weight_train = get_train_data()
Xtest, ytest, sample_weight_test = get_test_data()

# Derive probability values from your sample weights
prob_train = np.asarray(sample_weight_train) / np.sum(sample_weight_train)
upsample_size = int(np.max(prob_train) / np.min(prob_train) * len(ytrain))
newids = np.random.choice(range(len(ytrain)), size=upsample_size, p=prob_train, replace=True)

# Upsample training data using sample weights as probabilities
# so that the data distribution is upsampled to fit the corresponding sample weights
Xtrain, ytrain = Xtrain[newids,:], ytrain[newids]

# Fit your model
model = KNeighborsClassifier()
model = model.fit(Xtrain, ytrain)
ypred = model.predict(Xtest)
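To make the idea concrete, here is the same procedure end-to-end on made-up toy data, with the data-loading placeholders replaced by hard-coded arrays and a seeded generator for reproducibility:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy data: class 1 samples carry double the weight of class 0 samples
Xtrain = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
ytrain = np.array([0, 0, 0, 1, 1, 1])
sample_weight_train = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])

# Turn sample weights into resampling probabilities
prob_train = sample_weight_train / sample_weight_train.sum()
upsample_size = int(prob_train.max() / prob_train.min() * len(ytrain))
newids = rng.choice(len(ytrain), size=upsample_size, p=prob_train, replace=True)

# Heavier samples appear more often in the resampled training set
model = KNeighborsClassifier(n_neighbors=3).fit(Xtrain[newids, :], ytrain[newids])
```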

ProteinGuy

sklearn.neighbors.KNeighborsClassifier.score() has a sample_weight parameter. Is that what you're looking for?
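Worth noting: sample_weight in score() only reweights the accuracy computation; it does not affect fitting or prediction. A quick sketch on made-up data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

Xtrain = np.array([[0.0], [1.0], [10.0], [11.0]])
ytrain = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=1).fit(Xtrain, ytrain)

Xtest = np.array([[0.2], [9.8], [4.0]])  # 4.0 is nearest to a class 0 sample ...
ytest = np.array([0, 1, 1])              # ... but labelled 1, so it is misclassified

# sample_weight only reweights the reported accuracy;
# the fitted model and its predictions are unchanged
acc_unweighted = knn.score(Xtest, ytest)                                       # 2/3
acc_weighted = knn.score(Xtest, ytest, sample_weight=np.array([1.0, 1.0, 2.0]))  # 2/4 = 0.5
```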

ItM