What is the output of XGBoost using 'rank:pairwise'?

Question

I use the Python implementation of XGBoost. One of the objectives is rank:pairwise, which minimizes the pairwise loss (Documentation). However, the documentation says nothing about the range of the output. I see numbers between -10 and 10, but can it in principle range from -inf to inf?

It also does not say what type of loss it uses. Does anyone have a clue? – ambodi, Jun 16 '21 at 12:25

score 8 · Answer 1 · edited Aug 10 '17 at 02:20

Good question. You may want to have a look at this Kaggle competition:

Actually, in the Learning to Rank field, we try to predict a relative score for each document with respect to a specific query. That is, this is neither a regression problem nor a classification problem. Hence, if a document attached to a query gets a negative predicted score, it means only that it is less relevant to the query than the other document(s) with positive scores.
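To make the "relative score" point concrete, here is a minimal plain-Python sketch. The scores are made-up numbers, not XGBoost output, and `rank_documents` is a hypothetical helper. It shows that shifting every score of a query by the same constant leaves the ranking unchanged, so the sign and magnitude of a single score carry no meaning on their own.

```python
def rank_documents(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical predicted scores for four documents of one query.
scores = [-1.3, 0.4, 2.1, -0.2]

# The same scores moved far away from zero: all are now positive.
shifted = [s + 100.0 for s in scores]

order = rank_documents(scores)
order_shifted = rank_documents(shifted)

print(order)                   # ranking from the original scores
print(order == order_shifted)  # the shift does not change the ranking
```

Only the ordering within the query is used, which is why a negative score is not "bad" in any absolute sense.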

edited Aug 10 '17 at 02:20

maxymoo

answered Dec 02 '16 at 08:26

Kehe CAI

in this case, would it not just be a kind of structured regression problem? Or at least it is more "close" to a regression problem: since we have ground truth ranks, our objective is to assign the ranks such that in aggregate, our new ranking is close to the original ranking. I would imagine we would penalize ranking the #3 item as #200 much more than ranking it as #5 – information_interchange Jul 03 '19 at 20:42
This does not answer the OP question – OmerB Jan 13 '20 at 13:26

score 7 · Answer 2 · answered Dec 01 '17 at 05:15

It gives the predicted score for ranking. However, the scores are only valid for ranking within their own group, so we must set the groups for the input data.
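As a rough illustration of why groups matter, here is a plain-Python sketch (it is not code from any library). It assumes the groups are described as the sizes of consecutive blocks of rows; `rank_within_groups` and the scores are hypothetical.

```python
def rank_within_groups(scores, group_sizes):
    """Split a flat score list by group size and rank each group on its own.

    Returns, for each group, the row indices (into the flat list) ordered
    from highest to lowest predicted score.
    """
    if sum(group_sizes) != len(scores):
        raise ValueError("group sizes must add up to the number of scores")
    rankings = []
    start = 0
    for size in group_sizes:
        rows = range(start, start + size)
        rankings.append(sorted(rows, key=lambda i: scores[i], reverse=True))
        start += size
    return rankings


# Hypothetical predictions: rows 0-2 belong to one query, rows 3-4 to another.
scores = [0.2, -1.0, 1.5, 7.0, 6.5]
group_sizes = [3, 2]

rankings = rank_within_groups(scores, group_sizes)
print(rankings)
```

Row 4 has a higher score than every row of the first group, yet it is ranked last in its own group. Comparing scores across groups would therefore be misleading.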

For easy ranking, refer to my project xgboostExtension.

answered Dec 01 '17 at 05:15

bigdong

score 6 · Answer 3 · edited Jan 21 '20 at 09:56

If I understand your question correctly, you are asking about the output of the predict function on a model fitted using rank:pairwise.

Predict gives the predicted variable (y_hat).

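This is the same for reg:linear / binary:logistic etc. The only difference is that reg:linear builds trees to minimize RMSE(y, y_hat), while rank:pairwise builds trees to minimize a pairwise loss, so that within each group the ordering of y_hat agrees with the ordering of y as closely as possible. (Maximizing MAP directly is what the separate rank:map objective is for.) However, the output is always y_hat.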

Depending on the values of your dependent variable, the output can be anything. But I typically expect the output to have much smaller variance than the dependent variable. This is usually the case because it is not necessary to fit extreme data values; the trees just need to produce predictions that are large or small enough to be ranked first or last in the group.
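The following sketch illustrates why no particular output range is implied. It assumes the pairwise loss is a RankNet-style logistic loss on score differences, log(1 + exp(-(s_better - s_worse))); that form is an assumption for illustration and is not stated in this thread. `pairwise_logistic_loss` is a hypothetical helper, not an XGBoost function.

```python
import math


def pairwise_logistic_loss(score_better, score_worse):
    """Loss for one pair: log(1 + exp(-(score_better - score_worse)))."""
    diff = score_better - score_worse
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Same score gap (1.0), very different absolute levels.
loss_small = pairwise_logistic_loss(0.5, -0.5)
loss_shifted = pairwise_logistic_loss(1000.5, 999.5)

# A larger gap in the correct direction gives a smaller loss.
loss_large_gap = pairwise_logistic_loss(5.0, -5.0)

# A pair scored in the wrong order is penalized more.
loss_wrong_order = pairwise_logistic_loss(-0.5, 0.5)

print(loss_small, loss_shifted, loss_large_gap, loss_wrong_order)
```

Under this assumption the loss depends only on the difference between two scores, so nothing in it pins the scores to a fixed interval. What keeps them moderate in practice is more likely the learning rate, the number of trees, and regularization than any hard bound.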

edited Jan 21 '20 at 09:56

OmerB

answered Jul 20 '17 at 07:59

hjw

The question is more aiming at the range of the output. Can it be -9999999999999999 to +9999999999999999999 or is it limited to certain boundaries? E.g. -1000 to 1000. – Soerendip Mar 25 '21 at 23:08
As mentioned, I expect the range of the output to be smaller than the range of the dependent variable. That said, this is more from a statistical point of view. Within the code itself, I don't believe there is any hard constraint on the output. – hjw Mar 26 '21 at 11:45
