With
gmm_model <- ml_gaussian_mixture(iris_tbl, Species ~ .)
you can get the log-likelihood as
gmm_model$summary$log_likelihood
which you can then use to compute BIC or AIC.
There may well be a way to get these directly, but if not, you can calculate BIC as
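(k-1 + k * p + k * p * (p+1) / 2) * log(n) - 2 * gmm_model$summary$log_likelihood
where n is the number of samples, k the number of clusters, and p the number of variables.
In the above, k-1 + k * p + k * p * (p+1) / 2
is the number of free parameters in a Gaussian mixture model with unrestricted covariance matrices: k-1 mixing weights, k * p means, and k * p * (p+1) / 2 covariance entries (each symmetric p x p covariance matrix has p * (p+1) / 2 free entries).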
Example:
library(sparklyr)
sc <- spark_connect(master = "local")
iris_tbl <- sdf_copy_to(sc, iris, name = "iris_tbl", overwrite = TRUE)
gmm_model <- ml_gaussian_mixture(iris_tbl, Species ~ .)
gmm_model$summary$log_likelihood
#[1] -294.1398
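Continuing from that log-likelihood, here is a minimal sketch of the BIC and AIC calculation in plain R. The helper names gmm_n_params, gmm_bic and gmm_aic are made up for illustration and are not part of sparklyr. The sketch uses BIC = (number of free parameters) * log(n) - 2 * log-likelihood and AIC = 2 * (number of free parameters) - 2 * log-likelihood, with full covariance matrices. The values k = 2 and p = 4 are assumptions (the default number of components, and the four numeric iris columns as features), so adjust them to match your model.

```r
# Hypothetical helpers (not part of sparklyr): information criteria for a
# Gaussian mixture with k components, p variables and full covariance matrices.
gmm_n_params <- function(k, p) {
  (k - 1) +              # mixing weights (they sum to 1)
    k * p +              # component means
    k * p * (p + 1) / 2  # free entries of the symmetric covariance matrices
}

gmm_bic <- function(log_lik, n, k, p) {
  gmm_n_params(k, p) * log(n) - 2 * log_lik
}

gmm_aic <- function(log_lik, k, p) {
  2 * gmm_n_params(k, p) - 2 * log_lik
}

# Usage with the value printed above. With a live connection you would pass
# gmm_model$summary$log_likelihood and sdf_nrow(iris_tbl) instead.
log_lik <- -294.1398
gmm_bic(log_lik, n = 150, k = 2, p = 4)  # assumes k = 2, p = 4
gmm_aic(log_lik, k = 2, p = 4)
```

Lower values are better for both criteria, so you can fit models for several values of k and keep the one with the smallest BIC.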