3

I have a ranking task, where my training data looks like this:

session_id   item_id   item_features   target
---------------------------------------------
session1     item1     ...             1
session1     item2     ...             0
...
sessionN     item1     ...             0
sessionN     itemX     ...             10
sessionN     itemY     ...             0
...

I am using xgboost in R with the objective "rank:pairwise" for training the model. xgboost expects grouped data (same session_id) to be bunched together in the training and test sets. The lines belonging to the same session_id have to be specified using the function setinfo() (e. g. setinfo(model, 'group', group_info).

When I evaluate the model in R, applying new data works perfectly. However, I have used the package pmml to convert the model into a pmml file in order to use it in Java.

In Java the pmml file gets parsed and evaluated via the org.jpmml pmml-evaluator dependency (v. 1.3.15). Feeding the same data as in R to the org.jpmml.evaluator.Evaluator yields different results, though. The results are mostly negative values - which is no valid result in my setup- all predicted targets should be positive.

I have come up with two possible explanations:

  • There might be a bug in the pmml conversion in my scenario
  • I have no idea, where I can apply the equivalent of setinfo() in Java. Since I am only applying the model to a single session at a time, I was under the impression that I did not need to specify it. But maybe, I was wrong.

Please contact me for fully working example including training and test data, I will send via mail. But for starters, here is the R code from training the model:

library(xgboost)
example_matrix_train <- xgb.DMatrix(X, label = y)
setinfo(example_matrix_train, 'group', example_train_groupInfo)
example.model <- xgboost(data = example_matrix_train, objective = "rank:pairwise", max.depth = 8, eta = 0.2, nthread = 8, nround = 10, verbose=0)

library(pmml)
library(pmmlTransformations)
xgb.dump(example.model, "example.model.dumped.trees")
logfile <- file(paste0("pmml_example_model",Sys.Date(),".txt"), open="a")
sink(logfile)
pmml(example.model, inputFeatureNames = colnames(example_train), outputLabelName = "prediction1", xgbDumpFile = "example.model.dumped.trees")
sink()

Any help is welcome

kb-kou
  • 31
  • 2

2 Answers2

1

I have come up with two possible explanations: There might be a bug in the pmml conversion

This is the true explanation - the pmml package is producing incorrect PMML for XGBoost models. The technical reason is that it is using XGBoost text dump file as input, but the information contained therein is incomplete (eg. rounded threshold values).

If you're looking to export XGBoost models into PMML, then you should be using the r2pmml package, which is using XGBoost binary files as input.

user1808924
  • 4,563
  • 2
  • 17
  • 20
0

In truth, the 'pmml' package currently does not support the 'rank:pairwise' objective function you need. The upcoming release of the 'pmml' package (version 1.5.3) includes a check for unsupported objective functions.

Tridi
  • 186
  • 2