Matrix has unexpected missing cells when collecting z-statistics from mlogit

Question

I wish to run a series of multinomial logits (600ish per covariate of interest) and gather the z-statistics from each of these (I do not care about the order in which these are recorded).

These mlogits are run on a small piece of my data (sharing a group ID). The mlogits have a varying number of outcomes involved (n), and there will be (n - 1) z statistics to gather from each mlogit. Each mlogit takes the form: y = a + _b*x + \epsilon where y can take on between 2 and 9 values (in my data), although the mean is 3.7.

I believe the difficulty comes in pulling these z-stats out of the mlogit, as there is no way I know to directly call a matrix of z-stats. My solution is to construct the z-stats from the e(V) and e(b) matrices. For each iteration of the mlogit, I construct a matrix of z-stats; I then append this to the previous matrix of z-stats (thereby building a matrix of all of them calculated). Unfortunately, my code does not seem to do this properly.

The symptoms are as follows. The matrix mat_covariate includes many missing values (over half of the matrix values have been missing in the troubleshooting I have done). It also includes many zeroes (which are possible, but unlikely - especially at this rate, about 16%). As written, the code does not yet suppress the mlogits I run, and so I can go back and check what makes it into the matrix. At most one value from each mlogit is recorded, but these are often recorded multiple times. 40% of the mlogits had nothing recorded.

The relevant loop is below:

local counter = 1
forvalues i = 1/`times' {
    preserve
    keep if group_id==`i'
    foreach covariate in `covariates' {
        if `counter' == 1 {
            mlogit class `covariate'
            sum outcomes_n, meanonly
            local max = `r(max)'
            local max_minus = `max' - 1
            matrix mat_`covariate' = J(`max_minus',1,0)
            forvalues j = 1/`max_minus' {
                mat V = e(V)
                mat b = e(b)
                local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)

                matrix mat_`covariate'[`j',1] = `z'
            }
        }
        else {
            mlogit class `covariate'
            sum outcomes_n, meanonly
            local max `r(max)'
            local max_minus = `max' - 1
            matrix mat_`covariate'_temp = J(`max_minus',1,0)
            forvalues j = 1/`max_minus' {
                mat V = e(V)
                mat b = e(b)
                local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
                matrix mat_`covariate'_temp[`j',1] = `z'
                matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
            }
            matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
        }
    }
    local counter = `counter'+1
    restore
}

Some reasons for why I did some of the things in the loop. I believe these things work, but they are not my first instincts, and I am unclear why my first instinct does not work. If there's a simpler/more elegant way to solve them, that would be a nice bonus:

the main if/else (and the counter) is to solve the issue that I cannot define a matrix as a function of itself when it has not yet been defined.
I define a local for the max, and a separate one for the (max-1). The forvalues loop would not accept "1/(`max'-1) {" and I am unsure why.

I created some sample data that can be used to replicate this problem. Below is code for a .do file which sets up data, locals for the loop, the loop above, and demonstrates the symptoms by displaying the matrix in question:

clear all
version 14

//================== sample data: ================== 
set obs 500
set seed 12345

gen id = _n

gen group_id = .
replace group_id = 1 if id <= 50
replace group_id = 2 if id <= 100 & missing(group_id)
replace group_id = 3 if id <= 150 & missing(group_id)
replace group_id = 4 if id <= 200 & missing(group_id)
replace group_id = 5 if id <= 250 & missing(group_id)
replace group_id = 6 if id <= 325 & missing(group_id)
replace group_id = 7 if id <= 400 & missing(group_id)
replace group_id = 8 if id <= 500 & missing(group_id)

gen temp_subgroup_id = .
replace temp_subgroup_id = floor((3)*runiform() + 2) if group_id < 6
replace temp_subgroup_id = floor((4)*runiform() + 2) if group_id < 8 & missing(temp_subgroup_id)
replace temp_subgroup_id = floor((5)*runiform() + 2) if missing(temp_subgroup_id)

egen subgroup_id = group(group_id temp_subgroup_id)

bysort subgroup_id : gen subgroup_size = _N
bysort group_id subgroup_id : gen tag = (_n == 1)
bysort group_id : egen outcomes_n = total(tag)

gen binary_x = floor(2*runiform())


//================== locals: ================== 
local covariates binary_x
local times = 8
// times is equal to the number of group_ids

//================== loop in question: ================== 
local counter = 1
forvalues i = 1/`times' {
    preserve
    keep if group_id==`i'
    foreach covariate in `covariates' {
        if `counter' == 1 {
            mlogit subgroup_id `covariate'
            sum outcomes_n, meanonly
            local max = `r(max)'
            local max_minus = `max' - 1
            matrix mat_`covariate' = J(`max_minus',1,0)
            forvalues j = 1/`max_minus' {
                mat V = e(V)
                mat b = e(b)
                local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)

                matrix mat_`covariate'[`j',1] = `z'
            }
        }
        else {
            mlogit subgroup_id `covariate'
            sum outcomes_n, meanonly
            local max `r(max)'
            local max_minus = `max' - 1
            matrix mat_`covariate'_temp = J(`max_minus',1,0)
            forvalues j = 1/`max_minus' {
                mat V = e(V)
                mat b = e(b)
                local z = b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)
                matrix mat_`covariate'_temp[`j',1] = `z'
                matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
            }
            matrix mat_`covariate' = mat_`covariate' \ mat_`covariate'_temp
        }
    }
    local counter = `counter' + 1
    restore
}

//================== symptoms: ================== 
matrix list mat_binary_x

I'm trying to figure out what is wrong in my code, but have been unable to find the issue (although I've found some other smaller errors, but none that have had an impact on the main problem - I would be unsurprised if there are multiple bugs).

I've not tried to follow this in detail. But my strategy might be to set up at the outset a matrix with as many rows and as many columns as I might need and fill it with missings; and then loop over models and replace just as many values as are needed. But I wouldn't start there. Your different fits are for different groups, so I would start with `statsby`. — Nick Cox, Dec 13 '18 at 10:26
Pearly: Yes, I think you are correct. I switched rows/columns when I was calling e(b). I also thought that the left out variable in the mlogit was last, but it appears this is not the case. This means I am often dividing by zero and/or _beta is zero. I will spend some time with it to see if fixing these two problems solves my issue. Thank you! Nick: I was not familiar with -statsby-, and it looks like it could be a simpler way to do things. Thanks for pointing this out! — amquack, Dec 13 '18 at 13:59
I was able to fix the issue based on your advice Pearly, and the loop appears to be running smoothly. If you want to post that as an answer, I will accept it. — amquack, Dec 13 '18 at 20:40

score 3 · Accepted Answer · answered Dec 13 '18 at 22:12

Consider the simplest case when i == 1 and max_minus == 2:

preserve
keep if group_id == 1

summarize outcomes_n, meanonly           
local max = `r(max)'
local max_minus = `max' - 1

mlogit subgroup_id binary_x

matrix V = e(V)
matrix b = e(b)

This produces the following:

. matrix list V

symmetric V[6,6]
                       1:          1:          2:          2:          3:          3:
                                               o.          o.                        
                binary_x       _cons    binary_x       _cons    binary_x       _cons
  1:binary_x   .46111111
     1:_cons       -.225        .225
2:o.binary_x           0           0           0
   2:o._cons           0           0           0           0
  3:binary_x    .2111111  -.09999999           0           0   .47896825
     3:_cons  -.09999999   .09999999           0           0  -.24285714   .24285714


. matrix list b

b[1,6]
             1:          1:          2:          2:          3:          3:
                                     o.          o.                        
      binary_x       _cons    binary_x       _cons    binary_x       _cons
y1   .10536052  -.22314364           0           0   .23889194  -.35667502


. local j = `max_minus'

. display "z = `= b[1+2*(`j'-1),1] / ( V[1+2*(`j'-1),1+2*(`j'-1)] ) ^ (.5)'"
z = .

The value of z is missing because you are dividing the value of a row in the matrix e(b) that does not exist. In other words, your loops are not set up correctly and substitute incorrect values.

Matrix has unexpected missing cells when collecting z-statistics from mlogit

1 Answers1