
I'm trying to deploy a GBDT model with SynapseML LightGBM (0.9.5) on Google Dataproc (image 2.0-debian10). I use Spark's StringIndexer to index the string categorical columns and then assemble all columns into a single feature vector. When I declare the categorical features, the model's error does not converge, and the logs contain many warnings like this:

DEFAULT [LightGBM] [Warning] Met negative value in categorical features, will convert it to NaN

This is strange, because I checked that all categorical feature values lie in [0.0, 72234.0], which is well within the Int32 range (see https://github.com/microsoft/LightGBM/issues/1359).
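A check like the one described above can be sketched in plain Python. This is a minimal sketch, assuming the indexed categorical values have already been collected from the DataFrame into a Python list of floats; the function name `check_categorical_values` is hypothetical and this is not LightGBM's own validation logic. It flags any value that is missing, non-finite, fractional, negative, or larger than the Int32 maximum.

```python
import math

# Largest value representable as a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def check_categorical_values(values):
    """Hypothetical helper: return (position, value, reason) for every entry
    that is not a finite, non-negative integer within the Int32 range."""
    problems = []
    for i, v in enumerate(values):
        # Check None/NaN/inf first so int(v) below cannot fail.
        if v is None or math.isnan(v) or math.isinf(v):
            problems.append((i, v, "missing or non-finite"))
        elif v != int(v):
            problems.append((i, v, "not an integer"))
        elif v < 0:
            problems.append((i, v, "negative"))
        elif v > INT32_MAX:
            problems.append((i, v, "exceeds Int32 range"))
    return problems


# Example: StringIndexer output should only ever produce values like the first three.
indices = [0.0, 17.0, 72234.0, -1.0, 3.5, float("nan")]
for pos, val, reason in check_categorical_values(indices):
    print(pos, val, reason)
```

If this reports nothing on the collected data but LightGBM still warns about negative values, the values are presumably being altered somewhere between Spark and the native library rather than being wrong in the source column.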

Then I removed the categorical metadata and treated all features as numeric. The warning disappeared, but the validation metric still looks wrong.

[Image: LightGBM validation metric in the logs]

The same model works in a local Spark environment, so I suspect something goes wrong when the data is passed from the JVM to the native (C) LightGBM library on Dataproc. Can anybody help?

CHEEKATLAPRADEEP
