I'm using the mlogit
package in R
to create a nested multinomial logit model of healthcare provider choice given choice data I have. The data look like this:
ID RES weight age wealth educ married urban partnerAge totalChildren survivingChildren anyANC
1.0 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.1 2468158 TRUE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.2 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
1.3 2468158 FALSE 0.2609153 29 Poor Primary 1 0 31 4 4 1
2.0 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.1 14233860 TRUE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.2 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
2.3 14233860 FALSE 0.2754970 19 Poorest Primary 1 0 30 1 1 1
outlier50Km optout alt spa mes dist bobs cobs Q fees chid educSec
1.0 0 -1 0 Home Home 0.000 0.0000000 0.000 0.00 0 1 0
1.1 0 -1 1 Health center Public 13.167 0.4898990 NA 0.64 0 1 0
1.2 0 -1 2 Health center Public 30.596 0.5202020 NA 0.56 0 1 0
1.3 0 -1 3 District hospital Public 41.164 0.7171717 0.825 0.88 0 1 0
2.0 0 -1 0 Home Home 0.000 0.0000000 0.000 0.00 0 2 0
2.1 0 -1 1 Health center Mission 14.756 0.7676768 NA 0.64 1 2 0
2.2 0 -1 2 Health center Public 41.817 0.3787879 NA 0.56 0 2 0
2.3 0 -1 3 District hospital Public 50.419 0.7171717 0.825 0.88 0 2 0
where spa, mes, dist, bobs, cobs, Q,
and fees
are characteristics of the provider and the remaining variables specific to the individual. These data are in long format, meaning each individual has four rows, reflecting her four choices (alt = 0:3
), with RES
being the response variable.
An un-nested model behaves appropriately
f.full <- RES ~ 0 + dist + Q + bobs + fees + spa | 0 + age + wealth + educSec + married + urban + totalChildren + survivingChildren
choice.ml.full <- mlogit(formula = f.full, data = data, weights = weight)
predict(choice.ml.full, data[1:8,])
0 1 2 3
[1,] 0.1124429 0.7739403 0.06893341 0.04468343
[2,] 0.4465272 0.3107375 0.11490317 0.12783210
By all measures of model fit, however, a nested model is better than an un-nested one. The nested model gives me coefficients appropriately:
ns2 <- mlogit(formula = f.full, nests = list(home = "0", useCare = c("1", "2", "3")), data = data, weight = weight, un.nest.el = TRUE)
summary(ns2)
Call:
mlogit(formula = f.full, data = data, weights = weight, nests = list(home = "0",
useCare = c("1", "2", "3")), un.nest.el = TRUE)
Frequencies of alternatives:
0 1 2 3
0.094378 0.614216 0.194327 0.097079
bfgs method
23 iterations, 0h:0m:13s
g'(-H)^-1g = 9.51E-07
gradient close to zero
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
dist -0.0336233 0.0040136 -8.3773 < 2.2e-16 ***
Q 0.1780058 0.0768181 2.3172 0.0204907 *
bobs -0.0695695 0.0505795 -1.3754 0.1689925
fees -0.8488132 0.1001928 -8.4718 < 2.2e-16 ***
etc...
But, I get the following error if I try to predict on a single individual:
predict(ns2, data[1:4,])
Error in apply(Gl, 1, sum) : dim(X) must have a positive length
and a different error if I try to predict on more than one individual:
predict(ns2, data[1:8,])
Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) :
Lapack routine dgesv: system is exactly singular: U[5,5] = 0
Any help would be vastly appreciated.