H2O elastic net lambda search does not pick the lambda that minimizes validation deviance

Question

When tuning the elastic net lambda hyper-parameter on a validation sample using the lambda_search option, the algorithm may not pick the value of lambda from the grid that minimizes deviance on the validation sample. This also occurs when we set early_stopping = FALSE, i.e., when one would expect H2O to evaluate all values of lambda in the grid.

This can be checked by first tuning lambda with lambda_search = TRUE in h2o.glm(), then running a grid search over the same values of lambda with h2o.grid(), and comparing the selected lambda and the validation deviance from the two approaches. See the R code below.

The issue is closely related to the one pointed out here and mentioned here. What this question adds is the documentation that the value of lambda selected by lambda search need not be the one that minimizes validation deviance. That is, the problem can be more severe than H2O computing up to the best lambda and then exiting, as stated in the comments here. The issue occurred for me when tuning on one validation sample in a Tweedie GLM with log link; I am not sure how specific it is to this setting.

Based on these results, I would tend to always use grid search to determine lambda. Is this appropriate? Alternatively, is there some option in h2o.glm() that addresses the issue with lambda_search?

rm(list = ls())
library(h2o)
library(tweedie)
library(tidyverse)

# Configuration -----------------------------------------------------------
# DGP:
n = 1000
k = 10
phi = 1
const = 0
bet = seq(-1, 1, length.out = k)
power = 1.5

# algorithm
alpha = 0.5

# Generate some data ------------------------------------------------------
set.seed(42)

x = rnorm(n * k) %>% 
  matrix(nrow = n, dimnames = list(NULL, paste0("x", seq(1, k))))
mu = as.numeric(exp(const + x %*% bet))

dat = x %>% 
  as_tibble() %>% 
  mutate(mu = mu,
         y  = rtweedie(n, 
                       mu = mu,
                       phi = phi, 
                       power = power),
         id = row_number(),
         sample = case_when(
           id <= n / 2 ~ "train",
           TRUE ~ "valid"))

# Initialize H2O ----------------------------------------------------------
h2o.init()

df_h2o_train = dat %>% 
  filter(sample == "train") %>% 
  as.h2o()

df_h2o_valid = dat %>% 
  filter(sample == "valid") %>% 
  as.h2o()


# Tune lambda -------------------------------------------------------------
# 1. Lambda search
glm_warmstart = h2o.glm(
  x                      = paste0("x", seq(1, k)),
  y                      = "y",
  family                 = "tweedie",
  tweedie_variance_power = power,
  tweedie_link_power     = 0,
  training_frame         = df_h2o_train,
  validation_frame       = df_h2o_valid,
  alpha                  = alpha,
  lambda_search          = TRUE,
  early_stopping         = FALSE
)

lambda_warmstart = glm_warmstart@model$lambda_best 
print(lambda_warmstart) # 0.1501327

# 2. Grid search
hyper_params = list(lambda = glm_warmstart@model$scoring_history$lambda %>% 
                      h2o.asnumeric())

grid_search = h2o.grid("glm",
                       hyper_params           = hyper_params,
                       x                      = paste0("x", seq(1, k)),
                       y                      = "y",
                       family                 = "tweedie",
                       tweedie_variance_power = power,
                       tweedie_link_power     = 0,
                       training_frame         = df_h2o_train,
                       validation_frame       = df_h2o_valid,
                       alpha                  = alpha,
                       lambda_search          = FALSE)

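# Take the first row of the grid summary (this assumes the table is sorted
# best-first) and strip the first and last character of the lambda string
# before converting it to a number.
lambda_grid_search = grid_search@summary_table %>% 
  as_tibble() %>%
  head(1) %>% 
  pull(lambda) %>% 
  stringr::str_sub(2, -2) %>% 
  as.numeric()
print(lambda_grid_search) # 0.013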

glm_grid_search = h2o.glm(
  x                      = paste0("x", seq(1, k)),
  y                      = "y",
  family                 = "tweedie",
  tweedie_variance_power = power,
  tweedie_link_power     = 0,
  training_frame         = df_h2o_train,
  alpha                  = alpha,
  lambda                 = lambda_grid_search)

# Compare validation deviance ---------------------------------------------
dat %>% 
  filter(sample == "valid") %>% 
  mutate(pred_warmstart = as.vector(h2o.predict(glm_warmstart,
                                             newdata = df_h2o_valid)),
         pred_grid_search  = as.vector(h2o.predict(glm_grid_search,
                                             newdata = df_h2o_valid)),
         deviance_warmstart = tweedie.dev(y, pred_warmstart, power),
         deviance_grid_search = tweedie.dev(y, pred_grid_search, power)) %>% 
  summarise(
    mean_deviance_warmstart = mean(deviance_warmstart), # 1.16
    mean_deviance_grid_search = mean(deviance_grid_search) # 1.08
  )

# Close -------------------------------------------------------------------
h2o.shutdown(prompt = FALSE)
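
As a cross-check that does not depend on H2O or on the tweedie package, the comparison above can be sketched in base R. This is a minimal sketch rather than H2O's own logic: `tweedie_unit_deviance()` and `check_lambda_choice()` are hypothetical helper names introduced here, the deviance formula is the standard Tweedie unit deviance for 1 < power < 2 (intended to match what `tweedie.dev()` computes in that range), and the numbers in the toy call are made up purely to show the mechanics.

```r
# Tweedie unit deviance for 1 < p < 2 (hypothetical helper, base R only).
# d(y, mu) = 2 * [ y^(2-p) / ((1-p)(2-p)) - y * mu^(1-p) / (1-p) + mu^(2-p) / (2-p) ]
tweedie_unit_deviance <- function(y, mu, p) {
  stopifnot(p > 1, p < 2, all(y >= 0), all(mu > 0))
  2 * (y^(2 - p) / ((1 - p) * (2 - p)) -
         y * mu^(1 - p) / (1 - p) +
         mu^(2 - p) / (2 - p))
}

# Given the lambdas on a path and the validation deviance at each of them,
# report the minimizer and whether a reported lambda coincides with it
# (hypothetical helper).
check_lambda_choice <- function(lambdas, valid_deviance, lambda_reported) {
  stopifnot(length(lambdas) == length(valid_deviance), length(lambdas) > 0)
  i_min <- which.min(valid_deviance)
  # Match the reported lambda to the closest path entry, since printed
  # values may be rounded.
  i_rep <- which.min(abs(lambdas - lambda_reported))
  list(
    lambda_min_deviance   = lambdas[i_min],
    min_deviance          = valid_deviance[i_min],
    deviance_at_reported  = valid_deviance[i_rep],
    reported_is_minimizer = (i_min == i_rep)
  )
}

# Toy call with made-up numbers; these are NOT H2O output.
toy_lambdas  <- c(0.5, 0.2, 0.1, 0.05, 0.01)
toy_deviance <- c(1.40, 1.20, 1.10, 1.05, 1.08)
res <- check_lambda_choice(toy_lambdas, toy_deviance, lambda_reported = 0.2)
print(res$reported_is_minimizer)
```

With the objects from the script above, one would pass the `lambda` column of `glm_warmstart@model$scoring_history`, the validation deviance recorded alongside it (I assume the column is called `deviance_test`, which should be confirmed with `names()` on the scoring history), and `lambda_warmstart` as `lambda_reported`. If `reported_is_minimizer` comes back `FALSE`, lambda search did not return the deviance-minimizing lambda on its own path.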

Let me add here that I enjoy using H2O very much! – Matthias Schmidtblaicher, Jun 04 '19 at 08:29

0 Answers