
I am using the mlogit package in R to model each student's choice of college major at graduation as a function of in-major GPA (an alternative-specific variable), log family income, and first chosen major (both case-specific variables). First chosen major is a factor whose levels are all the values of majorcode except 6, which represents dropping out of school. For reference, here is sample data for three students:

studentid   majorcode   choice   majorgpa   faminc   firstmajor
1001           1         0         0        9.2      5
1001           2         0         0        9.2      5
1001           3         0         1.9      9.2      5
1001           4         0         0        9.2      5
1001           5         1         3.4      9.2      5
1001           6         0         0        9.2      5
1006           1         1         2.7      10.7     1
1006           2         0         2        10.7     1
1006           3         0         2.8      10.7     1
1006           4         0         0        10.7     1
1006           5         0         3        10.7     1
1006           6         0         0        10.7     1
1019           1         0         0        9.6      5
1019           2         0         0        9.6      5
1019           3         0         0        9.6      5
1019           4         0         0        9.6      5
1019           5         1         3.2      9.6      5
1019           6         0         0        9.6      5
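To look for empty (chosen major, first major) combinations, such as no student choosing major 3 after starting in major 4, one can cross-tabulate each student's chosen alternative against their first major. Below is a minimal sketch in plain Python (standard library only) run on the three sample students above. The tuple layout mirrors the sample columns, and the helper names `chosen_by_first` and `empty_cells` are hypothetical. With only three students almost every cell is empty, so the check is only meaningful on the full data.

```python
from collections import Counter
from itertools import product

# Sample rows from the question:
# (studentid, majorcode, choice, majorgpa, faminc, firstmajor)
ROWS = [
    (1001, 1, 0, 0.0, 9.2, 5), (1001, 2, 0, 0.0, 9.2, 5), (1001, 3, 0, 1.9, 9.2, 5),
    (1001, 4, 0, 0.0, 9.2, 5), (1001, 5, 1, 3.4, 9.2, 5), (1001, 6, 0, 0.0, 9.2, 5),
    (1006, 1, 1, 2.7, 10.7, 1), (1006, 2, 0, 2.0, 10.7, 1), (1006, 3, 0, 2.8, 10.7, 1),
    (1006, 4, 0, 0.0, 10.7, 1), (1006, 5, 0, 3.0, 10.7, 1), (1006, 6, 0, 0.0, 10.7, 1),
    (1019, 1, 0, 0.0, 9.6, 5), (1019, 2, 0, 0.0, 9.6, 5), (1019, 3, 0, 0.0, 9.6, 5),
    (1019, 4, 0, 0.0, 9.6, 5), (1019, 5, 1, 3.2, 9.6, 5), (1019, 6, 0, 0.0, 9.6, 5),
]


def chosen_by_first(rows):
    """Count students by (chosen majorcode, firstmajor), using only rows with choice == 1."""
    return Counter(
        (major, first)
        for _sid, major, choice, _gpa, _inc, first in rows
        if choice == 1
    )


def empty_cells(counts, alternatives, first_levels):
    """Return every (alternative, firstmajor level) pair with zero observed choices."""
    return [(a, f) for a, f in product(alternatives, first_levels) if counts[(a, f)] == 0]


counts = chosen_by_first(ROWS)
# Alternatives are majorcode 1-6; firstmajor levels are 1-5 (6 = dropout is not a level).
gaps = empty_cells(counts, alternatives=range(1, 7), first_levels=range(1, 6))
print(dict(counts))
print(len(gaps), "empty cells; (3, 4) empty:", (3, 4) in gaps)
```

On the real data, each zero count flags a combination of alternative and first-major level for which the corresponding coefficient may have no finite estimate.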

The problem appears when I run mlogit: adding the first-major factor variable causes the following error:

> mlogit(choice ~ majorgpa |  1 + faminc + firstmajor,
+   data=mydata,
+   reflevel=6)
Error in solve.default(H, g[!fixed]) : 
system is computationally singular: reciprocal condition number = 1.04405e-16
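For reference, the model in the call above can be written as the following conditional logit (a sketch, assuming mlogit's usual specification with a generic coefficient on majorgpa, alternative-specific intercepts and coefficients on the case variables, base alternative 6, and level 1 as the reference level of firstmajor under default treatment contrasts):

```latex
V_{ij} = \beta\,\mathrm{majorgpa}_{ij} + \alpha_j + \gamma_j\,\mathrm{faminc}_i
         + \sum_{k=2}^{5} \delta_{jk}\,\mathbf{1}[\mathrm{firstmajor}_i = k],
\qquad
P(\mathrm{choice}_i = j) = \frac{\exp(V_{ij})}{\sum_{l=1}^{6} \exp(V_{il})},
\qquad
\alpha_6 = \gamma_6 = \delta_{6k} = 0 .
```

In this notation the coefficient in question is δ_34. If no student with firstmajor = 4 chose major 3, then δ_34 enters those students' likelihood contributions only through the denominator, so the log-likelihood keeps increasing as δ_34 → −∞ and no finite maximizer exists for that coefficient.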

I'm fairly sure this error occurs because my data contain no students who chose major 3 and whose first major was major 4, so one of the firstmajor coefficients (level 4 in the major-3 equation) is not identified. However, Stata's asclogit runs the model and reports results when I use the following command:

asclogit choice majorgpa2, case(studentid) alt(majorcode) casevars(faminc i.firstmajor) base(6)

The output includes a coefficient for the term that should not be identified (4.firstmajor in the majorcode = 3 equation), albeit with a very large standard error. I can't figure out how Stata could have estimated a coefficient for this term; I would have expected Stata to drop it because of the empty cell. Could anyone shed light on the differences between how R's mlogit and Stata's asclogit maximize the likelihood (or on maximum likelihood estimation in general) that might produce this discrepancy?
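One possible mechanism, offered only as an assumption and not as a description of either package's internals, is quasi-complete separation. When a cell is empty, the log-likelihood keeps creeping upward as the affected coefficient heads toward −∞, while the Hessian entry for that coefficient shrinks toward zero. A Newton step that solves H · step = g (compare `solve.default(H, g[!fixed])` in the R error) then fails once H is numerically singular. An optimizer that instead stops when the log-likelihood changes by less than some tolerance would report a finite, large-magnitude coefficient with a huge standard error. The sketch below uses a hypothetical one-parameter binary logit in which every observation in the cell has outcome 0; `cell_loglik_and_hessian`, `n_obs`, and `a` are illustrative names, not part of mlogit or Stata.

```python
import math


def cell_loglik_and_hessian(b, n_obs=10, a=0.0):
    """Log-likelihood and its second derivative in b for n_obs observations in an
    'empty cell': each has dummy d = 1 and outcome y = 0 under the hypothetical
    binary logit P(y = 1) = 1 / (1 + exp(-(a + b)))."""
    p = 1.0 / (1.0 + math.exp(-(a + b)))
    loglik = n_obs * math.log1p(-p)   # each observation contributes log(1 - p)
    hessian = -n_obs * p * (1.0 - p)  # d^2 loglik / db^2
    return loglik, hessian


for b in (-5.0, -10.0, -20.0, -30.0):
    ll, h = cell_loglik_and_hessian(b)
    se = 1.0 / math.sqrt(-h)  # naive standard error from the observed information
    print(f"b={b:6.1f}  loglik={ll:.3e}  hessian={h:.3e}  se={se:.3e}")
```

As b decreases, the log-likelihood rises toward 0 without reaching a maximum, the Hessian goes to zero, and the implied standard error explodes. This is consistent with, but does not prove, the pattern described in the question.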

Avery
  • I think it will be really difficult for anyone to respond to your question without a realistic example dataset. – Oct 12 '18 at 16:36
  • You can use Stata's command `dataex` to generate one. – Oct 12 '18 at 16:37
  • Thanks. I have edited the sample data to add more students, but note that to comply with my data use agreement I've changed numbers around. – Avery Oct 12 '18 at 16:50
  • I cannot replicate your problem with this data. – Oct 12 '18 at 16:52
  • Can you try mlogit in Stata and see what the results are? – Yan Song Oct 13 '18 at 11:02
