
I'm using LGBMRanker for a ranking problem and want to optimize its hyperparameters with GridSearchCV.

I have three splits of data (X,y):

X_1, X_2, X_3, y_1, y_2, y_3

I also have the query group size for each split (three lists): gp_1, gp_2, gp_3
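For context, here is a minimal sketch of what those group lists encode, using hypothetical toy sizes rather than the real gp_1/gp_2/gp_3. LightGBM reads `group` as sizes of consecutive blocks of rows, so each list should sum to the number of rows in its split. The same information can also be expressed as one query id per row.

```python
import numpy as np

# Toy stand-ins (hypothetical): 3 queries with 2, 3 and 1 documents -> 6 rows.
gp_toy = [2, 3, 1]
y_toy = np.array([1, 0, 2, 1, 0, 1])

# Group sizes describe consecutive blocks of rows, so they must
# add up to the number of rows in the split.
assert sum(gp_toy) == len(y_toy)

# Equivalent per-row query ids: row i belongs to query qid[i].
qid = np.repeat(np.arange(len(gp_toy)), gp_toy)
print(qid)  # [0 0 1 1 1 2]
```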

I'm defining a specific split for cross-validation:

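import numpy as np
import lightgbm
from sklearn.metrics import make_scorer, ndcg_score
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# Each split has n rows; rows of split k are validated in fold k.
valid_fold = list(np.zeros(n)) + list(np.ones(n)) + list(2 * np.ones(n))
ps = PredefinedSplit(valid_fold)
ranker = lightgbm.LGBMRanker(**estimator_params)

grid = GridSearchCV(ranker, params_grid, cv=ps, verbose=2,
                    scoring=make_scorer(ndcg_score, greater_is_better=True), refit=False)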
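With this layout, fold k validates on split k and trains on the other two splits in their original order. Because each split is a contiguous block of rows, the `group` arrays each fold would need can be assembled directly from the per-split lists. A minimal sketch with hypothetical toy sizes; the boolean masks reproduce the train/validation indices that `PredefinedSplit(valid_fold).split()` should yield for these fold labels, and `fold_groups` is a helper introduced here for illustration.

```python
import numpy as np

def fold_groups(groups, k):
    """Return (train_group, valid_group) for fold k, assuming split k is
    the validation block and the remaining splits train, in order."""
    train = [g for j, gp in enumerate(groups) if j != k for g in gp]
    return train, list(groups[k])

# Toy group sizes (hypothetical); n rows per split as in the question.
gp_1, gp_2, gp_3 = [2, 1], [1, 2], [3]
n = 3
valid_fold = np.repeat([0, 1, 2], n)  # same fold labels as the question's valid_fold

for k in range(3):
    # Row indices for fold k, as PredefinedSplit would produce them.
    train_idx = np.flatnonzero(valid_fold != k)
    valid_idx = np.flatnonzero(valid_fold == k)
    train_group, valid_group = fold_groups([gp_1, gp_2, gp_3], k)
    # Each group list must cover exactly the rows of its subset.
    assert sum(train_group) == len(train_idx)
    assert sum(valid_group) == len(valid_idx)
    print(k, train_group, valid_group)
```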

How do I feed the data to grid.fit? The following code doesn't work:

X_concat = np.concatenate((X_1, X_2, X_3), axis=0)
y_concat = np.concatenate((y_1, y_2, y_3), axis=0)

grid.fit(X_concat, y_concat, group= next(iter([gp_1, gp_2, gp_3])), **params_fit)
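A likely cause, as far as I understand scikit-learn's handling of fit parameters: GridSearchCV only slices a fit parameter per fold when its length equals the number of samples. A `group` list has one entry per query, so it is passed unchanged to every fold and no longer matches the training rows. Note also that `next(iter([gp_1, gp_2, gp_3]))` evaluates to `gp_1` alone. Separately, sklearn's `ndcg_score` expects 2-D input of shape (n_queries, n_docs), so `make_scorer(ndcg_score)` on flat 1-D labels may fail too. One workaround is to skip GridSearchCV and loop over the parameter grid yourself. The sketch below assumes hypothetical hooks: `make_model(**params)` returns something with `fit(X, y, group=...)` and `predict(X)` (with LightGBM, perhaps `lambda **p: lightgbm.LGBMRanker(**estimator_params, **p)`), and `score(y_true, y_pred, group)` returns a higher-is-better float, for example a per-query NDCG averaged over queries. A toy model stands in for LGBMRanker so the loop can run on its own.

```python
import numpy as np
from itertools import product

def param_grid(grid):
    """Yield every combination of a dict of lists (like sklearn's ParameterGrid)."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def manual_search(make_model, score, grid, Xs, ys, groups):
    """Grid search over predefined folds: fold k validates on split k and
    trains on the other splits, each with its own matching group sizes.
    Returns (mean_score, params) for the best parameter combination."""
    results = []
    for params in param_grid(grid):
        fold_scores = []
        for k in range(len(Xs)):
            others = [j for j in range(len(Xs)) if j != k]
            X_tr = np.concatenate([Xs[j] for j in others])
            y_tr = np.concatenate([ys[j] for j in others])
            g_tr = np.concatenate([groups[j] for j in others])
            model = make_model(**params)
            model.fit(X_tr, y_tr, group=g_tr)
            fold_scores.append(score(ys[k], model.predict(Xs[k]), groups[k]))
        results.append((float(np.mean(fold_scores)), params))
    return max(results, key=lambda r: r[0])

class MeanModel:
    """Toy stand-in for a ranker (hypothetical): predicts mean label + shift."""
    def __init__(self, shift=0.0):
        self.shift = shift

    def fit(self, X, y, group=None):
        assert sum(group) == len(X)  # group sizes must match the training rows
        self.mean_ = float(np.mean(y)) + self.shift
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def neg_mse(y_true, y_pred, group):
    """Toy higher-is-better score; a real run would use a ranking metric."""
    return -float(np.mean((np.asarray(y_true) - y_pred) ** 2))

# Toy data (hypothetical): three splits of 3 rows, each with queries of size 2 and 1.
Xs = [np.zeros((3, 1))] * 3
ys = [np.array([1.0, 0.0, 2.0])] * 3
groups = [np.array([2, 1])] * 3
best_score, best_params = manual_search(MeanModel, neg_mse, {"shift": [0.0, 5.0]}, Xs, ys, groups)
print(best_params)  # {'shift': 0.0}
```

With `refit=False` in the original call, nothing is lost by dropping GridSearchCV here; the loop returns the best mean validation score and its parameters, and refitting on all data is left to a separate step.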