I train GBM models with H2O and want to use them in my backend (not Java). To do so, I download the MOJO, convert it to ONNX, and run it in my apps.
To run inference, I need to know how categorical columns are transformed into their one-hot encoded versions. I was able to find this in the POJO:
static final void fill(String[] sa) {
sa[0] = "Age";
sa[1] = "Fare";
sa[2] = "Pclass.1";
sa[3] = "Pclass.2";
sa[4] = "Pclass.3";
sa[5] = "Pclass.missing(NA)";
sa[6] = "Sex.female";
sa[7] = "Sex.male";
sa[8] = "Sex.missing(NA)";
}
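For illustration, here is a minimal Python sketch of how I would build the input vector for one row from that ordering. `FEATURE_NAMES` and `encode_row` are my own hypothetical names, not part of H2O. Sending unseen or absent levels to the `missing(NA)` slot and passing missing numeric values as NaN are assumptions that I have not verified against the converted model.

```python
# Column order copied from the POJO's fill() method shown above.
FEATURE_NAMES = [
    "Age", "Fare",
    "Pclass.1", "Pclass.2", "Pclass.3", "Pclass.missing(NA)",
    "Sex.female", "Sex.male", "Sex.missing(NA)",
]

MISSING = ".missing(NA)"


def encode_row(row, feature_names=FEATURE_NAMES):
    """Build the model input vector, in POJO column order, from a dict of raw values."""
    # A column is treated as categorical if it has a "<column>.missing(NA)" slot.
    categorical = [n[:-len(MISSING)] for n in feature_names if n.endswith(MISSING)]
    index = {name: i for i, name in enumerate(feature_names)}
    vector = [0.0] * len(feature_names)

    for col in categorical:
        value = row.get(col)
        key = f"{col}.{value}"
        # Assumption: absent or unseen levels go to the missing(NA) slot.
        if value is None or key not in index:
            key = col + MISSING
        vector[index[key]] = 1.0

    for name in feature_names:
        if any(name.startswith(c + ".") for c in categorical):
            continue  # one-hot slot, already handled above
        value = row.get(name)
        # Assumption: a missing numeric value is passed through as NaN.
        vector[index[name]] = float("nan") if value is None else float(value)

    return vector
```

For example, `encode_row({"Age": 22, "Fare": 7.25, "Pclass": 3, "Sex": "male"})` sets the `Pclass.3` and `Sex.male` slots to 1.0 and leaves the other one-hot slots at 0.0.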
So, here is the workflow for a non-Java backend as I see it:
- Encode categorical features with OneHotExplicit.
- Train GBM model.
- Download MOJO and convert to ONNX.
- Download POJO and find feature alignment in the source code.
- Implement the inference in your backend.
Is this the most straightforward and correct way?
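The "find feature alignment in the source code" step could probably be scripted instead of done by hand. Below is a rough Python sketch that pulls the names out of a `fill` method body like the one above. `feature_names_from_pojo` is a hypothetical helper of mine, and it assumes `source` contains only the text of the one relevant `fill` method, since a full POJO file may contain other `sa[...]` assignments.

```python
import re


def feature_names_from_pojo(source):
    """Return the names assigned as sa[i] = "name"; ordered by index i."""
    # Each match is an (index, name) pair taken from one assignment statement.
    pairs = re.findall(r'sa\[(\d+)\]\s*=\s*"([^"]*)"\s*;', source)
    ordered = sorted((int(i), name) for i, name in pairs)
    return [name for _, name in ordered]


# Shortened copy of the method body shown earlier, used as sample input.
sample = '''
static final void fill(String[] sa) {
sa[0] = "Age";
sa[1] = "Fare";
sa[2] = "Pclass.1";
}
'''
names = feature_names_from_pojo(sample)
```

The resulting list can then be stored next to the ONNX file so the backend does not need the POJO at runtime.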