H2O POJO: scoring test data that has extra columns and is sometimes missing columns from the training data set

Question

I have created my model POJO. Do I have to keep my columns in the same order and with the same data types when generating predictions with a Hive UDF? What is the cleanest way to ignore extra columns and to add the columns that are present in the training data set but missing from the test data set? All of my columns are either double or long.

score 1 · Answer 1 · answered Nov 08 '18 at 10:38

If you use the Easy wrapper (the EasyPredictModelWrapper class in h2o-genmodel), it does this for you automatically.

If you are not using the Easy wrapper, then you need to implement the same kind of behavior yourself.

With the Easy wrapper, new columns are ignored and missing columns are treated as N/A.
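If you are not using the Easy wrapper, the behavior described above can be approximated by hand. The following is only a minimal sketch, not H2O's own implementation: RowAligner, alignRow, and the column names are hypothetical, and it assumes every column is numeric (double or long, as in the question). It builds a double[] in the training column order, skips input columns the model does not know, and fills missing or null columns with NaN, which is assumed here to be how a missing numeric value is passed to the raw POJO scoring method.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of h2o-genmodel.
class RowAligner {

    // Builds a row in the training column order.
    // Columns absent from the input (or null) become NaN;
    // input columns not listed in trainColumns are never read, so they are ignored.
    static double[] alignRow(String[] trainColumns, Map<String, ?> input) {
        double[] row = new double[trainColumns.length];
        for (int i = 0; i < trainColumns.length; i++) {
            Object value = input.get(trainColumns[i]);
            row[i] = (value instanceof Number)
                    ? ((Number) value).doubleValue() // handles both Long and Double
                    : Double.NaN;                    // missing or null column
        }
        return row;
    }

    public static void main(String[] args) {
        // Assumed training columns, in the order the model was trained on.
        String[] trainColumns = {"age", "income", "tenure"};

        // Incoming record: "income" is missing, "zip" is an extra column.
        Map<String, Object> input = new HashMap<>();
        input.put("tenure", 3.0);
        input.put("zip", 12345L);
        input.put("age", 42L);

        double[] row = alignRow(trainColumns, input);
        System.out.println(Arrays.toString(row)); // prints [42.0, NaN, 3.0]
    }
}
```

Inside a Hive UDF, the same idea would apply: build the map from column name to value for each record, align it against the training column list, and pass the resulting array to the model. Looking columns up by name rather than by position is what makes the input column order irrelevant.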

TomKraljevic
