
I'm using a generalized low-rank model (GLRM) to impute missing values in a data set of sensor readings. I create and train the model with H2O:

glrm = H2OGeneralizedLowRankEstimator(k=10,
                                      loss="quadratic",
                                      gamma_x=0.5,
                                      gamma_y=0.5,
                                      max_iterations=2000,
                                      recover_svd=True,
                                      init="SVD",
                                      transform="standardize")
glrm.train(training_frame=train)

After the model is trained, both performance metrics it reports (MSE and RMSE) are NaN. Does anybody know why? At first I thought it could be caused by NaN entries in the data set, but I have already tried a complete data set and the same problem occurs. I need these metrics to run a grid search over some of the model parameters and select the best model.

Thank you very much,

Luísa Nogueira

  • It's hard to tell what could be the cause. Do you have some reproducible code with some dummy data you can share for others to replicate? If not, you can check the H2O logs to see if training was properly done. Do predictions give anything? Do any other metrics give values? – Neema Mashayekhi Apr 06 '21 at 20:19

1 Answer


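Below is the example from the H2O documentation. Getting NaN for MSE and RMSE is expected with GLRM, so arguably they should not appear in the output at all. Instead, check whether you get a value for Sum of Squared Error (Numeric), or use the value of the loss function you defined ("quadratic"), which is reported as the final objective value in the model summary.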

import h2o
from h2o.estimators import H2OGeneralizedLowRankEstimator
h2o.init()

# Import the USArrests dataset into H2O:
arrestsH2O = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/pca_test/USArrests.csv")

# Split the dataset into a train and valid set:
train, valid = arrestsH2O.split_frame(ratios=[.8], seed=1234)

# Build and train the model:
glrm_model = H2OGeneralizedLowRankEstimator(k=4,
                                            loss="quadratic",
                                            gamma_x=0.5,
                                            gamma_y=0.5,
                                            max_iterations=700,
                                            recover_svd=True,
                                            init="SVD",
                                            transform="standardize")
glrm_model.train(training_frame=train)

This returns NaN for both MSE and RMSE:

Model Details
=============
H2OGeneralizedLowRankEstimator :  Generalized Low Rank Modeling
Model Key:  GLRM_model_python_1617769810268_1

Model Summary:
    number_of_iterations    final_step_size    final_objective_value
0   58.0                    0.00005            8.250804e-31

ModelMetricsGLRM: glrm
** Reported on train data. **

MSE: NaN
RMSE: NaN
Sum of Squared Error (Numeric): 1.9833472629189004e-13
Misclassification Error (Categorical): 0.0
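For reference, the sketch below spells out what the two reported numbers are assumed to measure; it is plain Python, not H2O's implementation. It assumes that "Sum of Squared Error (Numeric)" is the sum of squared differences between the (transformed) data A and the low-rank reconstruction X·Y over the non-missing numeric entries, and that with quadratic loss and quadratic regularization on both factors the objective adds gamma_x·‖X‖² and gamma_y·‖Y‖² to that sum. The names `numeric_sse`, `quadratic_objective`, `rmse_from_sse` and the toy matrices are made up for illustration. In H2O the penalty functions are selected with regularization_x and regularization_y; as far as I know both default to "None", in which case gamma_x and gamma_y have no effect.

```python
import math

def reconstruct(X, Y):
    """Low-rank reconstruction X @ Y for nested lists (n x k times k x d)."""
    k, d = len(Y), len(Y[0])
    return [[sum(row[t] * Y[t][j] for t in range(k)) for j in range(d)] for row in X]

def numeric_sse(A, X, Y):
    """Sum of squared errors over the entries of A that are not NaN (i.e. observed)."""
    R = reconstruct(X, Y)
    return sum((a - r) ** 2
               for a_row, r_row in zip(A, R)
               for a, r in zip(a_row, r_row)
               if not math.isnan(a))

def sq_norm(M):
    """Squared Frobenius norm: sum of squares of all entries."""
    return sum(v * v for row in M for v in row)

def quadratic_objective(A, X, Y, gamma_x, gamma_y):
    """Assumed form: quadratic loss on observed entries plus quadratic penalties on X and Y."""
    return numeric_sse(A, X, Y) + gamma_x * sq_norm(X) + gamma_y * sq_norm(Y)

def rmse_from_sse(sse, n_observed):
    """RMSE-style summary: square root of the mean squared error per observed entry."""
    return math.sqrt(sse / n_observed)

# Toy 3 x 2 data with one missing entry, and a rank-1 (k = 1) factorization.
nan = float("nan")
A = [[1.0, 2.0],
     [2.0, nan],
     [3.0, 6.5]]
X = [[1.0], [2.0], [3.0]]  # n x k
Y = [[1.0, 2.0]]           # k x d

sse = numeric_sse(A, X, Y)
n_obs = sum(not math.isnan(a) for row in A for a in row)
print(sse)                                     # 0.25 (only the 6.5 vs 6.0 entry differs)
print(quadratic_objective(A, X, Y, 0.5, 0.5))  # 0.25 + 0.5*14 + 0.5*5 = 9.75
print(rmse_from_sse(sse, n_obs))
```

Dividing the SSE by the number of observed entries and taking the square root gives an RMSE-like number you can compute yourself while the reported RMSE is NaN; it is not guaranteed to match what H2O would report.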

  • Neema, thank you so much for answering; that is what I was afraid of. Does this mean I cannot use the H2O Grid (Hyperparameter) Search [link](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html)? Grid search picks the best combination of parameters according to some metric, which here would be MSE or RMSE. I'm trying to find the best rank (k) and the weights gamma_x and gamma_y. Do you have any suggestion on how to do this, or should I tune them by hand and compare metrics like the ones you mentioned, Sum of Squared Error (Numeric) and the loss function (objective)? – Luisa Nogueira Apr 07 '21 at 09:24
  • AFAIK, grid search can still be done. These are the hyperparameters that can be used: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#gbm-hyperparameters. Reporting MSE/RMSE as NaN is planned to be removed: https://h2oai.atlassian.net/browse/PUBDEV-8089 – Neema Mashayekhi Apr 07 '21 at 18:22
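If you end up tuning by hand, a manual search is a simple fallback: train one model per combination of k, gamma_x and gamma_y, score each with a number that is actually reported (for example Sum of Squared Error (Numeric)), and keep the lowest. The sketch below only shows the search loop. `manual_grid_search`, `grid` and `toy_score` are hypothetical names, and `toy_score` is a stand-in so the code runs without an H2O cluster; the commented H2O call, including the `num_err()` accessor, is an assumption to check against your H2O version.

```python
import itertools

def manual_grid_search(param_grid, score):
    """Try every combination in param_grid; return (best_params, best_score, all_results).

    Lower scores are treated as better (e.g. a sum of squared errors).
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((params, score(params)))
    best_params, best_score = min(results, key=lambda r: r[1])
    return best_params, best_score, results

# Hypothetical grid over the parameters mentioned in the question.
grid = {"k": [2, 4, 10], "gamma_x": [0.0, 0.5], "gamma_y": [0.0, 0.5]}

def toy_score(params):
    # Stand-in scorer with a known minimum at k=4, gamma_x=0, gamma_y=0.
    # With H2O it might look like this (assumption, not verified):
    #   model = H2OGeneralizedLowRankEstimator(loss="quadratic",
    #                                          transform="standardize", **params)
    #   model.train(training_frame=train)
    #   return model.model_performance().num_err()
    return abs(params["k"] - 4) + params["gamma_x"] + 2 * params["gamma_y"]

best, best_val, _ = manual_grid_search(grid, toy_score)
print(best, best_val)
```

Keep in mind that training error generally shrinks as k grows, so scoring on held-out data (a validation frame, or known entries you hide on purpose and then try to impute) gives a fairer comparison between ranks.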