I am trying to fit a logistic regression to a small data set (17k rows, 16 columns), but the fit was still running after 60+ minutes, so I killed it. Neither my CPU nor my RAM is maxed out; utilization just goes up once the fitting process starts. To rule out an egregious coding error, I tested the same code on a data set of 5 rows by 16 columns. It worked: I was able to get a summary and confints. Hence, there must be another issue.
The data set has a mixture of factor, int and numerical variables. I'd like to share its schema, but it contains sensitive, proprietary information.
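Since the schema cannot be shared, here is a minimal sketch of how the factor columns could be inspected without revealing the data. The data frame `toy` and its column names are hypothetical stand-ins for design_mat_final. The assumption being checked is that a factor with many levels is expanded into one dummy column per level, so 16 data columns may turn into far more coefficients.

```r
# Hypothetical stand-in for design_mat_final (the real data cannot be shared).
set.seed(1)
toy <- data.frame(
  label   = factor(sample(c(0, 1), 200, replace = TRUE)),
  id_like = factor(sample(sprintf("id%03d", 1:150), 200, replace = TRUE)),
  grp     = factor(sample(letters[1:3], 200, replace = TRUE)),
  x_int   = sample(1:10, 200, replace = TRUE),
  x_num   = rnorm(200)
)

# Number of levels per factor column; non-factor columns are reported as NA.
n_levels <- sapply(toy, function(col) {
  if (is.factor(col)) nlevels(col) else NA_integer_
})
print(n_levels)

# Number of coefficients glm() would have to estimate after dummy coding
# (intercept + one column per non-reference factor level + numeric columns).
n_coef <- ncol(model.matrix(label ~ ., data = toy))
print(n_coef)
```

If the same two checks on the real data show a factor with thousands of levels, the number of coefficients, rather than the number of rows, may be what makes the fit slow.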
I'm wondering if there are some solutions that can be suggested, or if the solutions posited in the half-decade old posts shared below are still relevant (I am trying those old solutions now).
The data set dimensions and the code:
> dim(design_mat_final)
[1] 16812 16
log_model <- glm(label ~ .,
                 family = binomial(link = 'logit'),
                 data = design_mat_final)
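To narrow down where the time goes between the 5-row test and the full 16812 rows, one option is to time the same call on growing row subsets. This is only a sketch on simulated data: `toy`, `sizes` and `timings` are hypothetical names, and the simulated columns are not meant to resemble the real ones.

```r
# Simulated stand-in data; the real design_mat_final cannot be shared.
set.seed(42)
n <- 2000
toy <- data.frame(
  x1  = rnorm(n),
  x2  = rnorm(n),
  grp = factor(sample(letters[1:4], n, replace = TRUE))
)
toy$label <- rbinom(n, 1, plogis(0.5 * toy$x1 - 0.25 * toy$x2))

# Fit the same model on random subsets of increasing size and record
# the elapsed wall-clock time of each fit.
sizes <- c(250, 500, 1000, 2000)
timings <- sapply(sizes, function(k) {
  idx <- sample(n, k)
  system.time(
    glm(label ~ ., family = binomial(link = "logit"), data = toy[idx, ])
  )[["elapsed"]]
})

print(data.frame(rows = sizes, seconds = timings))
```

Running the same loop against the real data with a few modest subset sizes should show whether the fitting time grows roughly in proportion to the row count or much faster than that.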
My session info:
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 dplyr_0.7.4 bit64_0.9-7 bit_1.1-12 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 utf8_1.1.3 crayon_1.3.4 assertthat_0.2.0 R6_2.2.2 magrittr_1.5
[7] pillar_1.1.0 cli_1.0.0 rlang_0.1.6 tools_3.4.3 glue_1.2.0 yaml_2.1.16
[13] compiler_3.4.3 pkgconfig_2.0.1 knitr_1.20 bindr_0.1 tibble_1.4.2
Related to this 5 year old post: How to speed up GLM estimation in r?
and relevant to this 6 year old CrossValidated post: https://stats.stackexchange.com/questions/26965/logistic-regression-is-slow
Update:
I tried speedglm and it did not have an appreciable effect.