
I am working with the h2o glrm function. I am trying to pass the loss_by_col argument to specify a different loss function for each column in my data frame (I have normal, Poisson and binomial variables, so I am passing "Quadratic", "Poisson" and "Logistic" losses), but the objective is not computed: testmodel@model$objective returns NaN. At the same time, summary shows that a few iterations were made and that the objective was NA for all of them. The quality of the model is very bad, yet the archetypes are somehow computed, so I am confused. How should I pass a different loss for every variable in my dataset? Here is a (I hope) reproducible example:

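```r
library(h2o)
h2o.init()

set.seed(1)  # make the random data reproducible
df <- data.frame(p1 = rpois(100, 5), n1 = rnorm(100), b1 = rbinom(100, 1, 0.5))
df$b1 <- factor(df$b1)
h2df <- as.h2o(df)

# intended loss per column: p1 (counts) -> Poisson, n1 (normal) -> Quadratic,
# b1 (binary) -> Logistic
testmodel <- h2o.glrm(h2df,
                      k = 3,
                      loss_by_col = c("Poisson", "Quadratic", "Logistic"),
                      transform = "STANDARDIZE")
testmodel@model$objective
summary(testmodel)
plot(testmodel)
```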

1 Answer


Please note that there is a JIRA ticket for this issue.

It's interesting that you don't get an error when you run your code snippet. When I run it, I get the following error:

Error: DistributedException from localhost/127.0.0.1:54321: 'Poisson loss L(u,a) requires variable a >= 0', caused by java.lang.AssertionError: Poisson loss L(u,a) requires variable a >= 0

I can resolve this error by removing transform="STANDARDIZE": standardization subtracts each column's mean, so any count below the mean becomes negative, which the Poisson loss does not allow. For more information on what the transformations do, take a look at the H2O user guide. For your convenience, here is its definition of standardization: "Standardize: Standardizing subtracts the mean and then divides each variable by its standard deviation."
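A minimal base-R sketch of why the transform matters (it does not need H2O; the seed and variable names are illustrative only): applying the mean/standard-deviation standardization quoted above to non-negative Poisson counts necessarily produces negative values, which is what the `a >= 0` assertion in the Poisson loss rejects.

```r
# Sketch: standardizing non-negative counts yields negative values.
# Standardization maps x to (x - mean(x)) / sd(x); every count below the
# column mean becomes negative, while the Poisson loss L(u, a) needs a >= 0.
set.seed(42)                      # arbitrary seed, for reproducibility only
counts <- rpois(100, 5)           # non-negative Poisson counts, like p1
standardized <- (counts - mean(counts)) / sd(counts)

# base R's scale() performs the same centering and scaling
stopifnot(isTRUE(all.equal(standardized, as.vector(scale(counts)))))

min(counts)            # never negative: valid input for a Poisson loss
min(standardized)      # negative whenever some count lies below the mean
sum(standardized < 0)  # how many values the Poisson loss would reject
```

Because centered data with non-zero variance always has some values strictly below zero, this is not a matter of bad luck with the sample: any Poisson column passed through `STANDARDIZE` will trip the assertion.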

Lauren
I am using 3.20.0.8, and maybe the reason I don't get an explicit error is related to the version. But of course, standardizing the Poisson variable was the cause. Thanks for pointing this out; it should have been obvious to me from the start. So if I have some normal variables in the set that I want standardized, alongside some Poisson variables, I know I should standardize the normal variables myself before running GLRM. Thanks very much! – Paweł Kozielski-Romaneczko Oct 07 '18 at 19:48
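A minimal base-R sketch of the approach described in the comment: standardize only the normal columns yourself and leave the Poisson and binary columns untouched. `standardize_cols` is a hypothetical helper written for this example, not part of the h2o package, and the data frame mirrors the question's example with an arbitrary mean and sd for `n1`.

```r
# Hypothetical helper (not an h2o function): standardize only the columns
# named in `cols`, leaving count (Poisson) and factor columns as they are.
standardize_cols <- function(df, cols) {
  for (col in cols) {
    x <- df[[col]]
    df[[col]] <- (x - mean(x)) / sd(x)  # same formula as STANDARDIZE
  }
  df
}

set.seed(42)  # arbitrary seed, for reproducibility only
df <- data.frame(p1 = rpois(100, 5),
                 n1 = rnorm(100, mean = 10, sd = 3),
                 b1 = factor(rbinom(100, 1, 0.5)))
df_std <- standardize_cols(df, "n1")

# n1 now has mean 0 and sd 1; p1 still holds the original non-negative counts
c(mean = mean(df_std$n1), sd = sd(df_std$n1))
identical(df_std$p1, df$p1)
```

The resulting data frame could then be uploaded with `as.h2o(df_std)` and passed to `h2o.glrm` with `transform = "NONE"`, so that H2O does not standardize the Poisson column again. If you need reconstructions on the original scale, keep each standardized column's mean and standard deviation so the transformation can be undone.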