I use the Python implementation of XGBoost. One of the objectives is rank:pairwise, and it minimizes the pairwise loss (Documentation). However, the documentation does not say anything about the range of the output. I see numbers between -10 and 10, but can it in principle be anywhere from -inf to inf?

-
Did you find out what the output is? – Mpizos Dimitris Sep 14 '16 at 09:45
-
Nope. I did not find the answer yet. – Soerendip Sep 15 '16 at 15:49
-
It also does not say what type of loss it uses. Does anyone have a clue? – ambodi Jun 16 '21 at 12:25
3 Answers
Good question. You may want to have a look at the Kaggle competition:
In the Learning to Rank field, we are trying to predict a relative score for each document with respect to a specific query. That is, this is neither a regression problem nor a classification problem. Hence, if a document attached to a query gets a negative predicted score, it means only that it is less relevant to the query than other documents with positive scores.
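To make the "relative score" point concrete, here is a minimal sketch. It assumes a RankNet-style logistic loss on score differences, which is an assumption about the form of the pairwise loss and is not confirmed anywhere in this thread. The function name pairwise_logistic_loss and the sample labels and scores are made up for illustration. The sketch shows that such a loss depends only on differences between scores in a group, so shifting every score by a constant leaves it unchanged, and nothing in the objective itself pins the scores to a fixed range.

```python
import math

def pairwise_logistic_loss(scores, labels):
    """Mean logistic loss over all pairs (i, j) with labels[i] > labels[j].

    Hypothetical helper: assumes a RankNet-style loss log(1 + exp(-(s_i - s_j))).
    """
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                diff = scores[i] - scores[j]
                # Numerically stable form of log(1 + exp(-diff))
                total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

# Illustrative relevance labels and scores for one query group
labels = [2, 1, 0]
scores = [1.5, -0.3, -2.0]

# Shifting all scores by a constant keeps every pairwise difference the same
shifted = [s + 1000.0 for s in scores]
# Scaling correctly ordered scores widens the differences
scaled = [s * 10.0 for s in scores]

loss_a = pairwise_logistic_loss(scores, labels)
loss_b = pairwise_logistic_loss(shifted, labels)
loss_c = pairwise_logistic_loss(scaled, labels)
print(loss_a, loss_b, loss_c)
```

Under this assumed loss, the shifted scores give the same loss up to floating-point error, and the scaled scores give a lower loss, so only the ordering and the gaps between scores matter, not their absolute level.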
-
In this case, would it not just be a kind of structured regression problem? Or at least it is "closer" to a regression problem: since we have ground-truth ranks, our objective is to assign ranks such that, in aggregate, our new ranking is close to the original ranking. I would imagine we would penalize ranking the #3 item as #200 much more than ranking it as #5. – information_interchange Jul 03 '19 at 20:42
-
It gives a predicted score for ranking. However, the scores are only valid for ranking within their own groups, so we must set the groups for the input data.
For easy ranking, refer to my project xgboostExtension
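As a rough illustration of why scores are only comparable within a group, the sketch below turns a flat list of predicted scores into per-group ranks using plain Python. The helper rank_within_groups, the query IDs, and the score values are all hypothetical and are not taken from xgboostExtension or from XGBoost itself.

```python
from collections import defaultdict

def rank_within_groups(query_ids, scores):
    """Return a 1-based rank for each item, computed separately per query group."""
    groups = defaultdict(list)
    for idx, (qid, score) in enumerate(zip(query_ids, scores)):
        groups[qid].append((score, idx))

    ranks = [0] * len(scores)
    for members in groups.values():
        # Highest score in the group gets rank 1
        ordered = sorted(members, key=lambda m: -m[0])
        for rank, (_, idx) in enumerate(ordered, start=1):
            ranks[idx] = rank
    return ranks

# Made-up predictions for two query groups
query_ids = ["q1", "q1", "q1", "q2", "q2"]
scores = [0.2, -1.3, 4.1, -7.5, -9.0]

ranks = rank_within_groups(query_ids, scores)
print(ranks)  # [2, 3, 1, 1, 2]
```

Note that the item scored -7.5 is ranked first in its own group even though it scores lower than every item in the other group, which is why comparing raw scores across groups is not meaningful.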

If I understand your question correctly, you mean the output of the predict function on a model fitted using rank:pairwise.
Predict gives the predicted variable (y_hat).
This is the same for reg:linear, binary:logistic, etc. The only difference is that reg:linear builds trees to Min(RMSE(y, y_hat)), while rank:pairwise builds trees to Max(Map(Rank(y), Rank(y_hat))). However, the output is always y_hat.
Depending on the values of your dependent variable, the output can be anything. But I typically expect the output to have much smaller variance than the dependent variable. This is usually the case because it is not necessary to fit extreme data values: the trees just need to produce predictions that are large or small enough to be ranked first or last in the group.
-
The question is aimed more at the range of the output. Can it be -9999999999999999 to +9999999999999999999, or is it limited to certain boundaries, e.g. -1000 to 1000? – Soerendip Mar 25 '21 at 23:08
-
As mentioned, I expect the range of the output to be smaller than the range of the dependent variable. That said, this is more from a statistical point of view. Within the code itself, I don't believe there is any hard constraint on the output. – hjw Mar 26 '21 at 11:45