
I want to replicate the results of the multinom() function with the optim() function in R, but optim() does not yield the same results. What is wrong?

First, I imported a public dataset as "ml".

require(foreign)
ml <- read.dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta")

The code to get the summary statistics of the "ml" data, and its output, are below:

with(ml, table(ses,prog))
with(ml, do.call(rbind,tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))

        prog
ses      general academic vocation
  low         16       19       12
  middle      20       44       31
  high         9       42        7

                M       SD
general  51.33333 9.397775
academic 56.25714 7.943343
vocation 46.76000 9.318754

The code to fit the multinomial logistic regression with the multinom() function, and its output, are below:

library(nnet)
ml$prog2 <- relevel(ml$prog, ref = "academic")
ml_pckg <- multinom(prog2 ~ write + ses, data = ml)
summary(ml_pckg)
Call:
multinom(formula = prog2 ~ write + ses, data = ml)

Coefficients:
         (Intercept)      write  sesmiddle    seshigh
general     2.852198 -0.0579287 -0.5332810 -1.1628226
vocation    5.218260 -0.1136037  0.2913859 -0.9826649

Std. Errors:
         (Intercept)      write sesmiddle   seshigh
general     1.166441 0.02141097 0.4437323 0.5142196
vocation    1.163552 0.02221996 0.4763739 0.5955665

Residual Deviance: 359.9635 
AIC: 375.9635 
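For later comparison with my own optimization, I believe the negative log-likelihood of this fitted model is half the residual deviance, so this is the value my hand-coded function should reach at the optimum:

ml_pckg$deviance / 2 #about 179.98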

The code to get the z statistics and the results are below:

z <- summary(ml_pckg)$coefficients/summary(ml_pckg)$standard.errors
z
         (Intercept)     write  sesmiddle   seshigh
general     2.445214 -2.705562 -1.2018081 -2.261334
vocation    4.484769 -5.112689  0.6116747 -1.649967
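
(Not central to my question, but for completeness, the corresponding two-sided p-values can be computed from these z statistics:)

p <- (1 - pnorm(abs(z), 0, 1)) * 2
p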

Next, I wrote code to replicate the results above. I generated dummy variables for the categorical dependent/independent variables as below:

ml$prog_academic <- ifelse(ml$prog == "academic", 1, 0)
ml$prog_general <- ifelse(ml$prog == "general", 1, 0)
ml$prog_vocational <- ifelse(ml$prog == "vocational", 1, 0)

ml$ses_low <- ifelse(ml$ses == "low", 1, 0)
ml$ses_middle <- ifelse(ml$ses == "middle", 1, 0)
ml$ses_high <- ifelse(ml$ses == "high", 1, 0)

I generated a vector of ones to multiply with the intercept and subsetted write, ses_middle, and ses_high as the explanatory variables; ses_low is the baseline here. I assigned these covariates to a new data frame named "X".

one <- as.data.frame(rep(1, 200))
covar <- ml[,c(7,19,20)]
X <- data.frame(one,covar) #200*4
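
As a cross-check on the manual dummy coding, I believe the same design matrix could also be built directly from the factor levels with model.matrix() (the name X_check is just mine):

X_check <- model.matrix(~ write + ses, data = ml) #columns: (Intercept), write, sesmiddle, seshigh
head(X_check)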

Next, I created another data frame of dependent variables named "Y" that consists of prog_general and prog_vocational. Here, prog_academic is the baseline.

Y <- ml[,16:17] #200*2
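
As a quick sanity check on the dummy coding, the column sums of the three prog dummies should reproduce the program counts in the frequency table at the top (105 academic, 45 general, 50 vocation):

colSums(ml[, c("prog_academic", "prog_general", "prog_vocational")])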

I set the initial values of the parameters close to the multinom() estimates so that the optimization function converges.

B_0 <- c(3, -0.1, -0.5, -1, 5, -0.1, 0.2, -1) #8*1 #initial value as vector
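
For reference, these eight values are reshaped row-wise into a 2*4 coefficient matrix inside the function below, with the first row for general and the second for vocation:

matrix(B_0, nrow = 2, ncol = 4, byrow = TRUE) #columns: intercept, write, ses_middle, ses_high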

Here, I referred to a document to find the likelihood of the multinomial logistic regression. The likelihood is in equation 31 on page 12 of that document. I found that the second part of the equation should be summed with respect to i as well.
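
As I understand it, the log-likelihood I am trying to code (with academic as the baseline category, n = 200 individuals, and j running over general and vocation) is:

$$\ell(B) = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J-1} y_{ij}\, x_i^{\top}\beta_j \;-\; \log\!\left(1 + \sum_{j=1}^{J-1} \exp\!\left(x_i^{\top}\beta_j\right)\right) \right]$$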

I generated a blank matrix "xb" to hold the linear predictor part (x_i'β_j) of the equation:

xb <- matrix(0, nrow=200, ncol=2) #200*2

I ran the code below all at once to get the results of the optimization.

mlogit <- function(B){
  B <- matrix(B, nrow=2, ncol=4, byrow=T) 
  for (i in 1:nrow(xb)){  #i is the dimension of individual: 200
    for (j in 1:ncol(xb)){  #j is the dimension of dependent variables -1 (categorical): 2
      xb[i,j] <- sum(X[i,]*B[j,]) #200*2
    }
  }
  
  exp <- exp(xb) #200*2
  sumexp <- rowSums(exp) #200*1
  sumexp <- as.numeric(sumexp)
  
  yxb <- Y*xb #200*2
  sumyxb <- sum(yxb)
  
  ll <-  sumyxb-sum(log(1+sumexp))
  -ll
}
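
For what it is worth, I believe the double loop only builds the linear predictor, so the same negative log-likelihood can be written more compactly with a matrix product (an equivalent sketch, not a fix; the name mlogit_vec is mine):

mlogit_vec <- function(B){
  B <- matrix(B, nrow = 2, ncol = 4, byrow = TRUE)
  xb <- as.matrix(X) %*% t(B) #200*2 linear predictor, replaces the loops
  ll <- sum(as.matrix(Y) * xb) - sum(log(1 + rowSums(exp(xb))))
  -ll
}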

mlogit_result <- optim(par = B_0, fn = mlogit)
mlogit_result

The results are below:


$par
[1]  0.05325004 -0.01417267 -0.64375499 -0.96137147  6.33471560 -0.86154161  0.92387035 -0.65728823

$value
[1] 103.7692

$counts
function gradient 
     353       NA 

$convergence
[1] 0

$message
NULL

If the results corresponded with those of the multinom() function, $par should be as below:

2.852198 -0.0579287 -0.5332810 -1.1628226  5.218260 -0.1136037  0.2913859 -0.9826649

I reviewed my code and the likelihood function again and again, but could not find anything wrong. I think maybe the initial parameters are set wrongly or the function I created has some problem.
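
One more thing I am unsure about is whether the default Nelder-Mead settings in optim() matter with eight parameters; something I have considered trying (but have not verified) is BFGS with a larger iteration limit and hessian = TRUE, which would also allow recovering standard errors from the inverted Hessian:

mlogit_bfgs <- optim(par = B_0, fn = mlogit, method = "BFGS",
                     control = list(maxit = 1000), hessian = TRUE)
sqrt(diag(solve(mlogit_bfgs$hessian))) #candidate standard errors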

Could anyone please give me any suggestions to deal with this problem?

Thank you!
