While going through the LightGBM docs I found that predict supports a pred_leaf argument. The docs say:
pred_leaf (bool, optional (default=False)) – Whether to predict leaf index.
However, when I run the following:
# data is a single row with shape (1, 28)
# gbm is a booster trained with num_boost_round = X
embedding = gbm.predict(data, pred_leaf=True)
embedding.shape  # (1, X)
print(embedding[0, :])  # [29, 2, 8, 26, 2, 2, 16, 18, 25, 30, 16, 25, 0, 17, 15]
I don't understand why the output is an array filled with integers rather than a one-hot vector or a single scalar. The docs say it predicts the leaf index, so what do these values represent? Can this output be used as an "embedding" fed into another model?
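To make the "embedding" question concrete, here is a minimal sketch of the transformation I have in mind. It rests on two assumptions of mine, not on anything the docs state explicitly: that each entry is the index of the leaf the row falls into for one tree, and that every tree has at most 31 leaves (LightGBM's default num_leaves, which would be consistent with the largest value above being 30). The names leaf_idx, num_leaves and one_hot are only illustrative.

```python
import numpy as np

# Leaf indices for one row, copied from the output above (one entry per tree).
leaf_idx = np.array([[29, 2, 8, 26, 2, 2, 16, 18, 25, 30, 16, 25, 0, 17, 15]])

# Assumption: every tree has at most 31 leaves, so indices fall in 0..30.
num_leaves = 31

n_rows, n_trees = leaf_idx.shape

# One one-hot block of width num_leaves per tree, concatenated along each row.
one_hot = np.zeros((n_rows, n_trees * num_leaves), dtype=np.float32)
offsets = np.arange(n_trees) * num_leaves      # start column of each tree's block
rows = np.repeat(np.arange(n_rows), n_trees)   # row index for every (row, tree) pair
one_hot[rows, (leaf_idx + offsets).ravel()] = 1.0

print(one_hot.shape)        # (1, 465)
print(one_hot.sum(axis=1))  # [15.]
```

If that reading is right, each row would end up with exactly one active column per tree, and the resulting sparse vector is what I would pass to the downstream model.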
PS: I would have posted this on stats.stackexchange, but 1) the question is specific to LightGBM and 2) that site doesn't have a lightgbm tag.