I am attempting to use the mlogit package in R to model a student's choice of college major at graduation, conditional on in-major GPA, log family income, and first chosen major. First chosen major is a factor variable that takes every value of majorcode except 6, which represents dropping out of school. For reference, here is sample data for three students:
studentid majorcode choice majorgpa faminc firstmajor
1001 1 0 0 9.2 5
1001 2 0 0 9.2 5
1001 3 0 1.9 9.2 5
1001 4 0 0 9.2 5
1001 5 1 3.4 9.2 5
1001 6 0 0 9.2 5
1006 1 1 2.7 10.7 1
1006 2 0 2 10.7 1
1006 3 0 2.8 10.7 1
1006 4 0 0 10.7 1
1006 5 0 3 10.7 1
1006 6 0 0 10.7 1
1019 1 0 0 9.6 5
1019 2 0 0 9.6 5
1019 3 0 0 9.6 5
1019 4 0 0 9.6 5
1019 5 1 3.2 9.6 5
1019 6 0 0 9.6 5
My issue comes when I try to run mlogit. Adding the first major factor variable causes the following error:
> mlogit(choice ~ majorgpa | 1 + faminc + firstmajor,
+ data=mydata,
+ reflevel=6)
Error in solve.default(H, g[!fixed]) :
system is computationally singular: reciprocal condition number = 1.04405e-16
I'm pretty sure this error occurs because my data does not have any students whose choice is major 3 but whose first major was major 4, which prevents identification of the coefficient for that level of the factor variable. However, asclogit in Stata is able to run the model and give me results if I use the following command:
asclogit choice majorgpa2, case(studentid) alt(majorcode) casevars(faminc i.firstmajor) base(6)
The estimates include a coefficient for the factor level that should not be identified (4.firstmajor under majorcode = 3), though its standard error is very large. I can't figure out how Stata could possibly have found a coefficient on this variable; normally I would have assumed Stata would drop the variable because of the empty cell. Could anyone shed light on the differences between the way R solves mlogit and Stata solves asclogit, or maximum likelihood in general, that might produce this weird issue?
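For reference, here is a minimal sketch of how the empty cell can be checked by cross-tabulating the chosen major against the first major. It uses only the three sample students shown above, so the names sample_long, chosen, cell_counts and empty_cells are illustrative; on the real data the same steps would be run on mydata, and the cell of interest is chosen = 3, firstmajor = 4.

```r
# Sample data from the post (three students, long format).
# The full data set is assumed to have the same layout.
sample_long <- data.frame(
  studentid  = rep(c(1001, 1006, 1019), each = 6),
  majorcode  = rep(1:6, times = 3),
  choice     = c(0, 0, 0, 0, 1, 0,
                 1, 0, 0, 0, 0, 0,
                 0, 0, 0, 0, 1, 0),
  firstmajor = rep(c(5, 1, 5), each = 6)
)

# Keep one row per student: the alternative actually chosen.
chosen <- sample_long[sample_long$choice == 1, ]

# Cross-tabulate chosen major (rows) against first major (columns).
# Fixing the factor levels makes empty cells show up as zeros
# instead of being dropped from the table.
cell_counts <- table(
  chosen     = factor(chosen$majorcode,  levels = 1:6),
  firstmajor = factor(chosen$firstmajor, levels = 1:5)
)
print(cell_counts)

# Row/column positions of every combination that never occurs.
empty_cells <- which(cell_counts == 0, arr.ind = TRUE)
```

With only three students almost every cell is empty, so this only illustrates the check. On the full data, a zero in cell_counts["3", "4"] would confirm the empty cell described above.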