Usually, I would perform hyperparameter tuning by using the default scoring, like this:
import skopt
import numpy as np
from skopt.space import Real, Categorical, Integer
from sklearn.neighbors import KNeighborsRegressor

x = np.random.random((100, 10))
y = np.random.random((100, 1))

knn = KNeighborsRegressor(n_neighbors=10, p=10)
n_iter = 20
ps = {"n_neighbors": Integer(30, 50, prior='log-uniform'),
      "p": Categorical([1, 2]),
      "weights": Categorical(["distance"])}

grid = skopt.BayesSearchCV(knn, ps, n_jobs=-1, n_iter=n_iter, refit=True, verbose=0)
grid.fit(x, np.ravel(y))
However, now I would like to change the optimization score to a weighted mean absolute error. Following the tutorial, I defined a custom scorer as below:
import skopt
import numpy as np
from skopt.space import Real, Categorical, Integer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error, make_scorer

x = np.random.random((100, 10))
y = np.random.random((100, 1))
w = np.random.random((100, 1))

wmae_score = make_scorer(mean_absolute_error,
                         # multioutput=w,
                         sample_weight=w,
                         greater_is_better=False)

knn = KNeighborsRegressor(n_neighbors=10, p=10)
n_iter = 20
ps = {"n_neighbors": Integer(30, 50, prior='log-uniform'),
      "p": Categorical([1, 2]),
      "weights": Categorical(["distance"])}

grid = skopt.BayesSearchCV(knn, ps, n_jobs=-1, n_iter=n_iter, refit=True, verbose=0,
                           scoring=wmae_score)
grid.fit(x, np.ravel(y))
but this doesn't work. Specifically, I get the error

ValueError: Found input variables with inconsistent numbers of samples: [20, 20, 100]

I think this happens because the cross-validation splits the data into 5 folds, so the metric only receives 20 validation points at a time, while sample_weight still contains all 100 weights. How can I define the scorer so that each fold uses the matching subset of weights?
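My hypothesis about the fold size can be reproduced outside of BayesSearchCV. This is a minimal sketch, assuming one fold of 20 validation points (100 samples / 5 folds) scored against the full 100-element weight vector; the variable names are just illustrative:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

w = np.random.random((100, 1))      # weights for the full data set
y_val = np.random.random((20, 1))   # one CV fold: 100 / 5 = 20 validation targets
y_pred = np.random.random((20, 1))  # the fold's predictions

# The scorer calls the metric with only the fold's 20 samples,
# but sample_weight still has 100 entries, raising the same ValueError.
try:
    mean_absolute_error(y_val, y_pred, sample_weight=w)
    err = None
except ValueError as e:
    err = e

print(err)
```

which prints the same "inconsistent numbers of samples" message as the search does.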