I am trying to estimate a nested logit model of company siting choices, with nests = countries
and alternatives = provinces
, based on a number of alternative-specific characteristics as well as some company-specific characteristics. I formatted my data to a "long" structure using:
data <- mlogit.data(DB, choice="Occurrence", shape="long", chid.var="IDP", varying=6:ncol(DB), alt.var="Prov")
Here's a sample of the data:
IDP Occurrence From Prov ToC Dist Price Yield
5p1.APY 5p1 FALSE Sao Paulo APY PY 0.0000000 0.3698913 0.0000000
5p1.BOQ 5p1 FALSE Sao Paulo BOQ PY 0.6495493 0.3698913 0.0000000
5p1.CHA 5p1 FALSE Sao Paulo CHA AR 0.7870593 0.4622464 0.4461496
5p1.COR 5p1 FALSE Sao Paulo COR AR 0.3747480 0.4622464 0.5536546
5p1.FOR 5p1 FALSE Sao Paulo FOR AR 0.6822188 0.4622464 0.4402772
5p1.JUY 5p1 FALSE Sao Paulo JUY AR 1.0000000 0.4622464 0.3617038
Note that I've reduced the table to a few variables for clarity but would normally use more.
The code I use for the nested logit is the following:
nests <- list(Bolivia="SCZ",Paraguay=c("PHY","BOQ","APY"),Argentina=c("CHA","COR","FOR","JUY","SAL","SFE","SDE"))
nml <- mlogit(Occurrence ~ DistComp + PriceComp + YieldComp, data=data, nests=nests, unscaled=T)
summary(nml)
When running this model, I get the following output:
> summary(nml)
Call:
mlogit(formula = Occurrence ~ DistComp + PriceComp + YieldComp,
data = data, nests = nests, unscaled = T)
Frequencies of alternatives:
APY BOQ CHA COR FOR JUY PHY
SAL SCZ SDE SFE
0.1000000 0.0666667 0.1333333 0.0250000 0.0750000 0.0083333 0.0083333
0.1166667 0.2583333 0.1750000 0.0333333
bfgs method
1 iterations, 0h:0m:0s
g'(-H)^-1g = 1E+10
last step couldn't find higher value
Coefficients :
Estimate Std. Error t-value Pr(>|t|)
BOQ:(intercept) -0.29923 NA NA NA
CHA:(intercept) -1.25406 NA NA NA
COR:(intercept) -1.76020 NA NA NA
FOR:(intercept) -1.97083 NA NA NA
JUY:(intercept) -4.14476 NA NA NA
PHY:(intercept) -2.63961 NA NA NA
SAL:(intercept) -1.72047 NA NA NA
SCZ:(intercept) -0.15714 NA NA NA
SDE:(intercept) -0.57449 NA NA NA
SFE:(intercept) -2.47345 NA NA NA
DistComp 2.44322 NA NA NA
PriceComp 2.45202 NA NA NA
YieldComp 3.15611 NA NA NA
iv.Bolivia 1.00000 NA NA NA
iv.Paraguay 1.00000 NA NA NA
iv.Argentina 1.00000 NA NA NA
Log-Likelihood: -221.84
McFadden R^2: 0.10453
Likelihood ratio test : chisq = 51.79 (p.value = 2.0552e-09)
I don't understand what causes the NAs in the output, considering that I prepared the data using mlogit.data()
. Any help on this would be greatly appreciated.
Best,
Yann