I split a dataset into a training sample and a test sample. I then fit a logit model on the training data and use it to predict the outcome for the test sample. I can do this in two ways:
Using tidymodels (parsnip):
logit_mod <- logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(y ~ x + z, data = train)
res <- predict(logit_mod, new_data = test, type = "prob")
Or with base R's glm():
logit_mod <- glm(y ~ x + z, data = train, family = 'logit')
res <- predict(logit_mod, newdata = test, type = "response")
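For reference, a minimal reproducible sketch of the two approaches on simulated data (column names y, x, and z follow the snippets above; the simulated coefficients are arbitrary). Note that in the base-R call I write family = binomial, which is the standard spelling of a binomial model with the default logit link:

library(tidymodels)  # assumed loaded; provides logistic_reg(), set_engine(), fit(), predict()

set.seed(1)
n <- 200
train <- data.frame(x = rnorm(n), z = rnorm(n))
train$y <- factor(rbinom(n, 1, plogis(0.5 * train$x - 0.3 * train$z)))
test <- data.frame(x = rnorm(50), z = rnorm(50))

# parsnip / tidymodels fit
logit_mod <- logistic_reg() %>%
  set_mode("classification") %>%
  set_engine("glm") %>%
  fit(y ~ x + z, data = train)
res_tidy <- predict(logit_mod, new_data = test, type = "prob")

# base-R fit; binomial() uses the logit link by default
glm_mod <- glm(y ~ x + z, data = train, family = binomial)
res_glm <- predict(glm_mod, newdata = test, type = "response")

One detail to keep in mind when comparing the outputs: with a factor outcome, parsnip returns one .pred_* probability column per factor level, while predict.glm with type = "response" returns the probability of the second factor level only.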
The two methods give me different output (predicted probabilities of y), even though the model should be the same: extracting logit_mod[["fit"]] from the parsnip object gives me the same coefficients as the fit from glm.
Why does the second method give me different predicted probabilities?