synapseml lightgbm model doesn't converge on Dataproc

Asked Feb 18 '22 at 12:06

Active Mar 08 '22 at 17:27


I'm trying to deploy a GBDT model with SynapseML LightGBM [0.9.5] on Google Dataproc [2.0-debian10]. I use Spark's StringIndexer to index the string categorical columns and then assemble all columns into a single feature vector. With the categorical features declared, the model error doesn't converge and there are lots of warnings:

DEFAULT [LightGBM] [Warning] Met negative value in categorical features, will convert it to NaN

This is strange, because I checked that all categorical feature values are in [0.0, 72234.0], which is well within the Int32 range (see https://github.com/microsoft/LightGBM/issues/1359).
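For reference, a range check of this kind can be sketched in plain Python as below. This is only an illustration of the idea, not the code that was run on the cluster: the helper name `find_invalid_categories` and the sample column are hypothetical, and in a real job the values would first have to be collected from the Spark DataFrame.

```python
import math

# Largest value representable as a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def find_invalid_categories(values, max_value=INT32_MAX):
    """Return (index, value) pairs that are not clean category codes.

    A value is flagged if it is missing, NaN or infinite, negative,
    larger than max_value, or not a whole number.
    (Hypothetical helper, for illustration only.)
    """
    bad = []
    for i, v in enumerate(values):
        if v is None or not math.isfinite(v):
            bad.append((i, v))
        elif v < 0 or v > max_value or v != int(v):
            bad.append((i, v))
    return bad


# Made-up sample column: three valid codes followed by three suspicious values.
column = [0.0, 3.0, 72234.0, -1.0, 2.5, float("nan")]
for idx, val in find_invalid_categories(column):
    print(f"row {idx}: suspicious category value {val}")
```

If a check like this comes back empty on the driver but the warning still appears, that would suggest the values change somewhere between the DataFrame and the native library rather than in the input data itself.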

Then I removed the categorical metadata and treated all features as numeric. The warning is gone, but the validation metric still looks weird.

[Image: LightGBM validation metric in logs]

The model works in a local Spark environment, so I guess something goes wrong with the data passed from the JVM to the native C code on Dataproc. Can anybody help?

edited Mar 08 '22 at 17:27

CHEEKATLAPRADEEP


asked Feb 18 '22 at 12:06

jiamei shen

Is it possible to share your code and sample data here? – Dagang Feb 21 '22 at 02:46


0 Answers