What happens if I pass a categorical value to an ML.NET prediction that was never seen during training?

Question

For example, suppose I trained the model on these values:

Column1 = A , Column2 = B , Column3 = C , Label = 10 
Column1 = D , Column2 = E , Column3 = F , Label = 20
Column1 = G , Column2 = H , Column3 = I , Label = 30

What if I then want to predict for this row?

Column1 = A , Column2 = B , Column3 = Z

What does the model do with the unseen value Z?

score 1 · Answer 1 · answered Aug 13 '18 at 16:19

It depends on how you process the categorical data. If, for example, you used a dictionary-based one-hot vectorizer:

new CategoricalOneHotVectorizer("Column1", "Column2", "Column3")

then the model will build a dictionary of terms per column:

Column1 -> [A, D, G]
Column2 -> [B, E, H]
Column3 -> [C, F, I]

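If the value has not been seen (is not present in the dictionary), the CategoricalOneHotVectorizer assigns zero to all the 'one-hot' slots for that column. So your example A B Z will turn into 1 0 0 1 0 0 0 0 0: A sets the first Column1 slot, B sets the first Column2 slot, and Z matches nothing in Column3, so its three slots stay at zero.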
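To make the all-zeros behaviour concrete, here is a minimal stand-alone sketch of dictionary-based one-hot encoding. It does not call ML.NET at all; the class and method names (DictionaryOneHotSketch, BuildDictionary, Encode, EncodeRow) are made up for illustration, and the real transform may differ in details such as slot ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a hand-rolled stand-in, not the ML.NET implementation.
static class DictionaryOneHotSketch
{
    // Collect the distinct values of one column, in first-seen order.
    public static List<string> BuildDictionary(IEnumerable<string> columnValues)
    {
        var dictionary = new List<string>();
        foreach (string value in columnValues)
            if (!dictionary.Contains(value))
                dictionary.Add(value);
        return dictionary;
    }

    // One slot per dictionary term; an unseen value leaves every slot at zero.
    public static int[] Encode(List<string> dictionary, string value)
    {
        var slots = new int[dictionary.Count];
        int index = dictionary.IndexOf(value); // -1 when the value was never seen
        if (index >= 0)
            slots[index] = 1;
        return slots;
    }

    // Encode each column with its own dictionary and concatenate the slots.
    public static int[] EncodeRow(IReadOnlyList<List<string>> dictionaries, IReadOnlyList<string> row) =>
        row.SelectMany((value, i) => Encode(dictionaries[i], value)).ToArray();

    static void Main()
    {
        var dictionaries = new[]
        {
            BuildDictionary(new[] { "A", "D", "G" }), // Column1
            BuildDictionary(new[] { "B", "E", "H" }), // Column2
            BuildDictionary(new[] { "C", "F", "I" }), // Column3
        };

        int[] encoded = EncodeRow(dictionaries, new[] { "A", "B", "Z" });
        Console.WriteLine(string.Join(" ", encoded)); // prints: 1 0 0 1 0 0 0 0 0
    }
}
```

The learner therefore sees the row as if Column3 carried no information, rather than failing on the unknown value.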

If, on the other hand, you use hash-based one-hot encoding:

new CategoricalHashOneHotVectorizer("Column1", "Column2", "Column3")

the incoming value Z will be hashed in the same way as the seen values C, F and I, and this will activate one of the 2^HashBits slots of the output column, based on the value of the hash.
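The hashing variant can be sketched the same way. The code below is an illustration under stated assumptions, not ML.NET's implementation: it uses a 32-bit FNV-1a hash only because it is short and deterministic, so the slot numbers it prints will not match what ML.NET produces, and the names HashOneHotSketch, Fnv1a, SlotFor and Encode are hypothetical.

```csharp
using System;
using System.Text;

// Illustration only: FNV-1a stands in for whatever hash the real transform uses.
static class HashOneHotSketch
{
    // 32-bit FNV-1a over the UTF-8 bytes of the value.
    public static uint Fnv1a(string value)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (byte b in Encoding.UTF8.GetBytes(value))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }
    }

    // Keep the low hashBits bits, giving a slot index in [0, 2^hashBits).
    public static int SlotFor(string value, int hashBits) =>
        (int)(Fnv1a(value) & ((1u << hashBits) - 1));

    // Every value, seen or unseen, activates exactly one slot.
    public static int[] Encode(string value, int hashBits)
    {
        var slots = new int[1 << hashBits];
        slots[SlotFor(value, hashBits)] = 1;
        return slots;
    }

    static void Main()
    {
        const int hashBits = 3; // 2^3 = 8 slots; a small value chosen for the demo
        foreach (string value in new[] { "C", "F", "I", "Z" })
            Console.WriteLine($"{value} -> slot {SlotFor(value, hashBits)} of {1 << hashBits}");
    }
}
```

Because no dictionary is involved, Z is never "unknown": it always lands in some slot, and that slot may or may not coincide with the slot of a value seen during training.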

The documentation for CategoricalOneHotVectorizer is not very clear on this point, but it does say:

The Key value is the one-based index of the slot set in the Ind/Bag options. If the Key option is not found, it is assigned the value zero.

I created an issue on the ML.NET Github page to clarify the docs. https://github.com/dotnet/machinelearning/issues/675 — Zruty, Aug 13 '18 at 16:27
