Usually, I would perform hyperparameter tuning by using the default scoring, like this:

import skopt 
import sklearn
from skopt.space import Real, Categorical, Integer
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

x = np.random.random((100,10))
y = np.random.random((100,1))

knn = KNeighborsRegressor(n_neighbors=10, p=10)

n_iter = 20
ps = {"n_neighbors": Integer(30,50, prior='log-uniform'),
      "p": Categorical([1,2]),
      "weights": Categorical(["distance"])}

grid = skopt.BayesSearchCV(knn, ps, n_jobs=-1, n_iter=n_iter, refit=True, verbose=0)

grid.fit(x, np.ravel(y))

However, now I would like to change the optimization score and use the weighted mean absolute error instead. Following the tutorial, I defined a custom scorer as below:


import skopt 
import sklearn
from skopt.space import Real, Categorical, Integer
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

from sklearn.metrics import mean_absolute_error, make_scorer

x = np.random.random((100,10))
y = np.random.random((100,1))
w = np.random.random((100,1))

wmae_score = make_scorer(mean_absolute_error,
                          #multioutput=w,
                          sample_weight=w,
                          greater_is_better=False)

knn = KNeighborsRegressor(n_neighbors=10, p=10)

n_iter = 20
ps = {"n_neighbors": Integer(30,50, prior='log-uniform'),
      "p": Categorical([1,2]),
      "weights": Categorical(["distance"])}

grid = skopt.BayesSearchCV(knn, ps, n_jobs=-1, n_iter=n_iter, refit=True, verbose=0,
                           scoring=wmae_score)

grid.fit(x, np.ravel(y))

but this doesn't work. Specifically, I get the error ValueError: Found input variables with inconsistent numbers of samples: [20, 20, 100]. I think this happens because the cross-validation splits the data into 5 folds and scores 20 datapoints at a time, while the scorer always receives all 100 weights. How can I define the scorer so that it uses the right subset of weights for each fold?
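For reference, the mismatch I describe can be reproduced directly, without BayesSearchCV: calling the metric on one 20-sample fold while passing the full 100-element weight vector (which is what the scorer built with make_scorer(..., sample_weight=w) effectively does on every fold) raises the same error:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
y_true = rng.random(20)   # one CV test fold (100 samples / 5 folds)
y_pred = rng.random(20)   # predictions for that fold
w = rng.random(100)       # full-dataset weights baked into the scorer

# The scorer forwards the full weight vector to every fold:
try:
    mean_absolute_error(y_true, y_pred, sample_weight=w)
except ValueError as e:
    print(e)  # inconsistent numbers of samples: [20, 20, 100]
```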
