I built an H2O (v3.14) GLM model. However, when I check the predictions using h2o.predict, I get very different results depending on how many rows of the validation set I score.

Calling h2o.predict on the first 10 rows, I get:

# Predict using the first 10 rows in the validation set
h2o.predict(glm.test, df.valid[1:10,])
# Result:
  predict        p0           p1
1       0 0.9999224 7.756014e-05
2       0 0.9962711 3.728930e-03
3       0 0.9997378 2.622195e-04
4       0 0.9999556 4.437544e-05
5       0 0.9998994 1.006037e-04
6       0 0.9999394 6.062479e-05

But if I call h2o.predict on the first 100 rows, I get a very different result:

h2o.predict(glm.test, df.valid[1:100,])
# Result:
  predict         p0        p1
1       1 0.06196439 0.9380356
2       1 0.15371122 0.8462888
3       1 0.01654756 0.9834524
4       1 0.12830090 0.8716991
5       1 0.07195659 0.9280434
6       1 0.09725532 0.9027447

Below is the code that reproduces the problem. The data set (which is very sparse) can be downloaded from https://www.dropbox.com/s/58ul6zrekpmjh20/dt.truth.csv.gz

library(h2o)
h2o.init()
h2o.removeAll()

# Note: The zipped data file can be downloaded from:
#       https://www.dropbox.com/s/58ul6zrekpmjh20/dt.truth.csv.gz

df.truth <- h2o.importFile(
  path = "data/dt.truth.csv.gz", sep = ",", header = TRUE)

df.truth$isTarget <- h2o.asfactor(df.truth$isTarget)

# Split into train / test
splits <- h2o.splitFrame(df.truth, c(0.7), seed=1234)
df.train <- h2o.assign(splits[[1]], "df.train.hex")   
df.valid <- h2o.assign(splits[[2]], "df.valid.hex")

# Build a GLM model
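# (x is omitted, so all columns except the response are used as predictors)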
glm.test <- h2o.glm(         
  training_frame = df.train,        
  y="isTarget",                 
  family = "binomial",
  missing_values_handling = "MeanImputation",
  seed = 1000000) 

# Predict using the first 10 rows in the validation set
h2o.predict(glm.test, df.valid[1:10,])

# Predict using the first 100 rows in the validation set. Very different result!
h2o.predict(glm.test, df.valid[1:100,])
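
To make the discrepancy explicit, the 10 rows shared by both calls can be compared directly. This is a minimal sanity check added for debugging, not part of the original repro; it assumes both prediction frames fit in R memory:

# The first 10 rows appear in both calls, so their predicted probabilities
# should be identical. On this data set they are not.
pred.10  <- as.data.frame(h2o.predict(glm.test, df.valid[1:10,]))
pred.100 <- as.data.frame(h2o.predict(glm.test, df.valid[1:100,]))

# Largest absolute difference in p1 over the shared rows; expected to be ~0.
max(abs(pred.10$p1 - pred.100$p1[1:10]))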
  • Thanks, we will take a look at this. Did you see the same behavior on another dataset (e.g. iris), or does this only happen on your particular dataset? – Erin LeDell Nov 21 '17 at 05:55
  • @ErinLeDell I couldn't reproduce this on any other dataset. So far I can only reproduce it on my own data set, which is very sparse. – Patrick Ng Nov 21 '17 at 06:48
  • @PatrickNg, thanks for the report. We found a bug in the GLM implementation (jira here: https://0xdata.atlassian.net/browse/PUBDEV-5096). We have a fix ready and it is currently in testing. If everything goes well this fix will be included in 3.16 release (coming this week). – Michal Kurka Nov 21 '17 at 22:58
  • @MichalKurka That will be great! I found this bug as part of my investigation of another problem here (https://stackoverflow.com/questions/47390133/h2o-glm-model-saved-mojos-prediction-is-very-different-when-running-on-the-sam), which I hope can be resolved by your fix as well. – Patrick Ng Nov 21 '17 at 23:09
  • @MichalKurka I think you could post that comment and link as the answer? (Then it can be accepted and marked as answered.) – Darren Cook Nov 22 '17 at 08:49
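
Update: per the comments above, the fix is slated for the 3.16 release. A quick way to confirm that the backend is new enough before re-testing (a minimal sketch; h2o.getVersion() is part of the h2o R API):

# Report the version of the running H2O backend; it should be 3.16.x or
# later to include the fix for PUBDEV-5096.
library(h2o)
h2o.init()
h2o.getVersion()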
