Predicted probabilities from multinomial models in R

Question

My main question is: what probabilities does the predict() function of mnlogit() return, and how do they differ from those of the packages nnet and mlogit?

Some background: I am trying to model the outcome from individual-specific variables only, as I don't know the alternatives my choice makers faced. For a given model, I can get the same predicted probabilities for each outcome from all three packages, but mnlogit gives several sets of probabilities, where the first set is similar to the ones given by the other packages. Looking at the mnlogit vignette, I understand that I can get individual-specific probabilities, but I did not think those were the ones I extracted (?), nor did I think that the model was specified to obtain them.

Look at the example below (not the most compact one, but the one I was working with when learning these functions); you can see that mnlogit gives several sets of probabilities.

library(data.table); library(stringr); library(nnet); library(mlogit); library(mnlogit)
data("ModeCanada", package = "mlogit")
bususers <- with(ModeCanada, case[choice == 1 & alt == "bus"])
ModeCanada <- subset(ModeCanada, !case %in% bususers)
ModeCanada <- subset(ModeCanada, nchoice == 4)
ModeCanada <- subset(ModeCanada, alt != "bus")
ModeCanada$alt <- ModeCanada$alt[drop = TRUE]
KoppWen00 <- mlogit.data(ModeCanada, shape='long', chid.var = 'case',
                         alt.var = 'alt', choice='choice',
                         drop.index=TRUE)

data("ModeCanada", package = "mlogit")
busUsers <- with(ModeCanada, case[choice == 1 & alt == "bus"])
Bhat <- subset(ModeCanada, !case %in% busUsers & alt != "bus" &
                     nchoice == 4)
Bhat$alt <- Bhat$alt[drop = TRUE]
head(ModeCanada)
Mode = data.table(ModeCanada)

# Some additional editing in order to make it more similar to the typical data sets I work with
Bhat2 = data.table(KoppWen00)
Bhat2[,Choice:=gsub("\\.","",str_sub(row.names(KoppWen00),5,-1))][,id:=as.character(as.numeric(str_sub(row.names(Bhat),1,4)))]
Bhat2 = Bhat2[choice=="TRUE"][,c("Choice","urban","income","id"),with=F]

# nnet package
ml.nn<- multinom(Choice ~ urban + income,
                 Bhat2)
tmp = data.table(cbind(Bhat2, predict(ml.nn, type="probs", newdata=Bhat2)))
# nnet predictions
tmp[urban=="0" & income==45 & Choice=="air"][1,c("Choice", "urban", "income" , "air","car","train"),with=F]

# mlogit package
ml <- mlogit(Choice ~ 1| urban + income,shape="wide",
                Bhat2)
pml = data.table(cbind(Bhat2, predict(ml,mlogit.data(Bhat2, shape="wide", choice="Choice"))))
# mlogit predictions
unique(pml[Choice=="air" & urban=="0" & income==45 ][,c("Choice", "urban", "income" , "air","car","train"),with=F])

# mnlogit package
mln.MC <- mnlogit(Choice ~ 1| urban + income, mlogit.data(Bhat2,choice = "Choice",shape="wide"))
preddata = data.table(cbind(mlogit.data(Bhat2,choice = "Choice",shape="wide"), predict(mln.MC)))
# mnlogit predictions, returns several probabilities for each outcome
preddata[Choice==TRUE & urban=="0" & income==45 & alt == "air"]

PS: feel free to add the tag "mnlogit"!

alexwhitworth · Answer 1 · 2015-11-25T23:58:46.043

I'm going to use a simpler example than yours, but the idea is the same:

library(mnlogit)
data(Fish, package = "mnlogit")
fm <- formula(mode ~ price | income | catch)
fit <- mnlogit(fm, Fish, choiceVar="alt", ncores = 2)
p <- predict(fit)

R> head(p)
             beach      boat   charter       pier
1.beach 0.09299770 0.5011740 0.3114002 0.09442818
2.beach 0.09151069 0.2749292 0.4537956 0.17976449
3.beach 0.01410359 0.4567631 0.5125571 0.01657626
4.beach 0.17065867 0.1947959 0.2643696 0.37017583
5.beach 0.02858216 0.4763721 0.4543225 0.04072325
6.beach 0.01029792 0.5572462 0.4216448 0.01081103

R> summary(apply(p,1,sum))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1       1       1       1       1       1

As you can see, the output probabilities from predict.mnlogit are exactly what you'd expect: they are the probabilities that the predicted observation belongs to the specified class. That is P(Y_i = y_j | X_i) where j = 1,2,...,k for k specific classes. As noted in the comments below, the probabilities are also conditional on the model. So, a more complete notation is P(Y_i = y_j | X_i, \theta), where \theta represents the estimated parameters of your model.
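To make the notation concrete, here is a minimal base-R sketch of how P(Y_i = y_j | X_i, \theta) is computed in a multinomial logit with one individual-specific covariate. The coefficient values in `theta` and the helper name `mnl_probs` are made up for illustration; they are not estimates from the Fish data or from any model in this thread.

```r
# Hypothetical coefficients (NOT estimated from any data set in this thread).
# Rows: non-reference classes; columns: intercept and income slope.
theta <- rbind(
  boat    = c(0.5,  0.10),
  charter = c(1.0,  0.05),
  pier    = c(0.8, -0.10)
)
colnames(theta) <- c("(Intercept)", "income")

# P(Y = j | x, theta) for a multinomial logit with "beach" as reference class.
mnl_probs <- function(x, theta, ref = "beach") {
  X   <- cbind(1, x)                  # add the intercept column
  eta <- cbind(0, X %*% t(theta))     # linear predictors; reference class fixed at 0
  colnames(eta) <- c(ref, rownames(theta))
  expeta <- exp(eta)
  expeta / rowSums(expeta)            # normalise across classes (softmax)
}

income <- c(2, 2, 7)                  # the first two individuals share the same income
p <- mnl_probs(income, theta)

print(round(p, 3))
print(rowSums(p))                     # one sum per individual
print(all.equal(p[1, ], p[2, ]))      # same x and same theta: compare the two rows
```

With only individual-specific covariates, the probabilities depend on the observation solely through x, so two individuals with the same x get the same row.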

In this case, the probabilities for observation 1 are 9% for beach, 50% for boat, 31% for charter, and 9% for pier. Any classification method you choose (nnet, mlogit, etc.) should give its predicted probabilities the same interpretation, and that interpretation does not depend on the dataset.

As you can also see, the predicted probabilities across all possible classes sum to 1 for each observation.
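Both properties can be checked on a fitted model that uses only case-specific covariates. The sketch below uses nnet::multinom on the built-in iris data purely as a stand-in (it is not the Fish or ModeCanada data, and the `newdata` values are hypothetical). It says nothing about how mnlogit lays out its own output.

```r
library(nnet)  # recommended package, shipped with standard R installations

# Stand-in model: three classes, one case-specific predictor.
fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# Hypothetical new observations; the first two have identical covariates.
newdata <- data.frame(Sepal.Length = c(5.8, 5.8, 6.5))
p <- predict(fit, newdata = newdata, type = "probs")

print(p)
print(rowSums(p))                  # each row is a full probability distribution
print(all.equal(p[1, ], p[2, ]))   # identical covariates: compare the two rows
```

If the rows for identical covariates differ in your own output, a reasonable first check is whether the prediction matrix and the data it was bound to have the same number of rows.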

But now you have included individual (income), alternative (price) and individual-alternative (catch) specific variables. I ONLY have individual-specific variables, i.e. formula(mode ~ 1 | income | 1). Please look at the Bhat2 object to see what my data typically looks like. — ego_, Nov 25 '15 at 21:02
What difference does that make? You asked what the interpretation of the predicted probabilities is. I answered that. Yes, it goes without saying that the predicted probabilities are conditional on your model.... As I called out in my answer, I simply used the data in `example(mnlogit::predict.mnlogit)` since that was easier to understand than reading through your complex subsetting — alexwhitworth, Nov 25 '15 at 23:53
I thought that the model I specified did not condition on anything besides e.g. income, so for a given income it should return the same predicted probabilities for the outcomes on all rows/observations, but it doesn't. So I wondered whether there is some 'hidden' conditioning done by mnlogit that is not done by nnet or mlogit. — ego_, Nov 26 '15 at 07:07
