I want to know the significance of each coefficient of a logistic regression model fitted with the sparklyr function ml_logistic_regression. The code is as follows:
# data in R
library(MASS)
data(birthwt)
str(birthwt)
detach("package:MASS", unload=TRUE)
# Connection to Spark
library(sparklyr)
library(dplyr)
sc = spark_connect(master = "local")
# copy the data to Spark
birth_sc = copy_to(sc, birthwt, "birth_sc", overwrite = TRUE)
# Model
# create dummy variables for race (race_1, race_2, race_3)
birth_sc = ml_create_dummy_variables(birth_sc, "race")
model = ml_logistic_regression(birth_sc, low ~ lwt + race_2 + race_3)
The model I get is the following:
> model
Call: low ~ lwt + race_2 + race_3
Coefficients:
(Intercept)         lwt      race_2      race_3
 0.80575496 -0.01522311  1.08106617  0.48060322
With a regular R model, summary gives you the significance of the coefficients, but when I call it on this model I only get the same coefficients again:
> summary(model)
Call: ml_logistic_regression(birth_sc, low ~ lwt + race_2 + race_3)
Coefficients:
(Intercept)         lwt      race_2      race_3
 0.80575496 -0.01522311  1.08106617  0.48060322
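For comparison, this is the kind of output I am after. It is a minimal sketch in plain R (no Spark), assuming that glm with family = binomial is a fair local equivalent of the model above and that converting race to a factor stands in for the race_2 / race_3 dummy variables:

```r
library(MASS)  # only needed for the birthwt data set

data(birthwt)

# Assumption: let glm build the dummy variables from a factor,
# instead of using the race_2 / race_3 columns created in Spark
birthwt$race <- factor(birthwt$race)

# Local (non-Spark) logistic regression with the same predictors
fit <- glm(low ~ lwt + race, data = birthwt, family = binomial)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
coef_table <- coef(summary(fit))
print(coef_table)
```

The Pr(>|z|) column of coef_table is the per-coefficient significance that I cannot find in the sparklyr model object.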
How can I get the significance of each variable in the model?