4

As McFadden (1978) showed, if the number of alternatives in a multinomial logit model is so large that computation becomes impossible, it is still feasible to obtain consistent estimates by randomly subsetting the alternatives, so that the estimated probabilities for each individual are based on the chosen alternative and C other randomly selected alternatives. In this case, the size of the subset of alternatives is C+1 for each individual.

My question is about the implementation of this algorithm in R. Is it already embedded in any multinomial logit package? If not - which seems likely based on what I know so far - how would one go about including the procedure in pre-existing packages without recoding extensively?

Effa

3 Answers

5

Not sure whether the question is more about doing the sampling of alternatives or the estimation of MNL models after sampling of alternatives. To my knowledge, there are no R packages that do sampling of alternatives (the former) so far, but the latter is possible with existing packages such as mlogit. I believe the reason is that the sampling process varies depending on how your data is organized, but it is relatively easy to do with a bit of your own code. Below is code adapted from what I used for this paper.

library(tidyverse)
# create artificial data
set.seed(6)
# data frame of chooser id and chosen alt_id
id_alt <- data.frame(
  id = 1:1000,
  # one independent draw per chooser (size = 1 would be recycled to every row)
  alt_chosen = sample(1:30, 1000, replace = TRUE)
)
# data frame for universal choice set, with an alt-specific attribute (alt_x2)
alts <- data.frame(
  alt_id = 1:30,
  alt_x2 = runif(30)
)

# conduct sampling of 9 non-chosen alternatives
id_alt <- id_alt %>% 
  mutate(.alts_all =list(alts$alt_id),
         # use weights to avoid including chosen alternative in sample
         .alts_wtg = map2(.alts_all, alt_chosen, ~ifelse(.x==.y, 0, 1)),
         .alts_nonch = map2(.alts_all, .alts_wtg, ~sample(.x, size=9, prob=.y)),
         # combine chosen & sampled non-chosen alts
         alt_id = map2(alt_chosen, .alts_nonch, c)
  ) 

# unnest the data frame above into long format,
# with one row per combination of chooser id and alt_id
id_alt_lf <- id_alt %>% 
  select(-starts_with(".")) %>%
  unnest(alt_id)

# join long format df with alts to get alt-specific attributes
id_alt_lf <- id_alt_lf %>% 
  left_join(alts, by="alt_id") %>% 
  mutate(chosen=ifelse(alt_chosen==alt_id, 1, 0))

library(mlogit)
# convert to mlogit data frame before estimating
id_alt_mldf <- mlogit.data(id_alt_lf, 
                           choice="chosen", 
                           chid.var="id", 
                           alt.var="alt_id", shape="long")
mlogit( chosen ~ 0 + alt_x2, id_alt_mldf) %>% 
  summary()

It is, of course, possible without using the purrr::map functions, by using apply variants or looping through each row of id_alt.
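As a rough sketch of that base R route, the sampling step could look like the following. The toy data and the name `id_alt_lf_base` are assumptions for illustration only; the block samples 9 non-chosen alternatives uniformly per chooser with `lapply()` and `setdiff()` instead of weights.

```r
set.seed(6)
# hypothetical toy inputs with the same structure as in the answer
id_alt <- data.frame(id = 1:1000,
                     alt_chosen = sample(1:30, 1000, replace = TRUE))
alts <- data.frame(alt_id = 1:30, alt_x2 = runif(30))

n_sampled <- 9
# for each chooser: the chosen alternative plus 9 sampled non-chosen ones
sampled <- lapply(seq_len(nrow(id_alt)), function(i) {
  ch <- id_alt$alt_chosen[i]
  nonch <- sample(setdiff(alts$alt_id, ch), n_sampled)
  data.frame(id = id_alt$id[i], alt_chosen = ch, alt_id = c(ch, nonch))
})

# stack into long format, attach alt-specific attributes, flag the chosen row
id_alt_lf_base <- do.call(rbind, sampled)
id_alt_lf_base <- merge(id_alt_lf_base, alts, by = "alt_id")
id_alt_lf_base$chosen <- as.integer(id_alt_lf_base$alt_chosen == id_alt_lf_base$alt_id)
id_alt_lf_base <- id_alt_lf_base[order(id_alt_lf_base$id, id_alt_lf_base$alt_id), ]
```

The resulting long data frame has 10 rows per chooser and should be usable in the same way as the tidyverse version.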

LmW.
3

Sampling of alternatives is not currently implemented in the mlogit package. As stated previously, the solution is to generate a data.frame with a subset of alternatives and then use mlogit (importantly, with a formula that has no intercepts). Note that mlogit can deal with unbalanced data, i.e. the number of alternatives doesn't have to be the same for all the choice situations.
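To see why estimating on the subset is reasonable, here is a small simulation sketch in base R. It does not use mlogit: the likelihood is written by hand, all names are hypothetical, and it assumes a single generic coefficient, no intercepts, and uniform sampling of the non-chosen alternatives. Under those assumptions the estimate from the sampled choice sets should land close to the one from the full choice sets.

```r
set.seed(1)
n_obs <- 2000      # choice situations
n_alt <- 50        # size of the full choice set
n_sampled <- 9     # non-chosen alternatives kept per choice situation
beta_true <- 1

# attribute matrix (rows: choice situations, columns: alternatives)
x <- matrix(rnorm(n_obs * n_alt), n_obs, n_alt)
# Gumbel errors via the inverse CDF, then choose the utility-maximising alternative
eps <- matrix(-log(-log(runif(n_obs * n_alt))), n_obs, n_alt)
chosen <- max.col(beta_true * x + eps, ties.method = "first")

# column 1 holds the chosen alternative, columns 2..10 the sampled non-chosen ones
subset_idx <- t(vapply(seq_len(n_obs), function(i) {
  c(chosen[i], sample(setdiff(seq_len(n_alt), chosen[i]), n_sampled))
}, integer(n_sampled + 1)))
x_sub <- matrix(x[cbind(rep(seq_len(n_obs), n_sampled + 1),
                        as.vector(subset_idx))], n_obs)

# negative log-likelihood of a logit model without intercepts
neg_loglik <- function(beta, x_mat, chosen_col) {
  v <- beta * x_mat
  v_chosen <- v[cbind(seq_len(nrow(v)), chosen_col)]
  -sum(v_chosen - log(rowSums(exp(v))))
}

beta_full <- optimize(neg_loglik, c(-5, 5),
                      x_mat = x, chosen_col = chosen)$minimum
beta_sampled <- optimize(neg_loglik, c(-5, 5),
                         x_mat = x_sub, chosen_col = rep(1L, n_obs))$minimum
print(c(full = beta_full, sampled = beta_sampled))
```

The sampled estimate is noisier than the full one because it uses less information, but both should be near `beta_true`.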

  • Yves - thanks for confirming. Good karma flying your way. – Technophobe01 Sep 12 '18 at 12:43
  • Yves, thanks for the mlogit package! What is the reason/rationale that we have to use the chid.var argument instead of the more intuitive `id.var` argument for unbalanced data? – LmW. Sep 28 '18 at 17:35
1

My recommendation would be to review the mlogit package.

Vignette:

the package has a set of example exercises that (in my opinion) are worth looking at:

You may also want to take a look at the gmnl package (I have not used it)

Question: What specific problem(s) are you trying to apply a multinomial logit model to? Suitably intrigued.

Aside from the above question, I hope the above points you in the right direction.

Technophobe01
  • Generally link-only answers are not great as they are not self-contained and can suffer from link rot. It would be great to see you expand this with a complete example, for example using a dataset included in one of these packages. – Thomas Aug 21 '18 at 23:09
  • 1
    @Thomas - Chuckle - in general, I agree with the heuristic to include a code example. My defense, in this case, is that I am referencing CRAN package documentation and linking to the `gmnl` paper and authors website. That and the request asks "Is it (random subsetting) already embedded in any multinomial logit package?", my answer is "Yes, I believe so", here are the links to said packages and documentation. I do however reserve the right to be completely wrong on all occasions :-) If I have time I'll update and provide a fuller example. – Technophobe01 Aug 22 '18 at 01:39
  • @Effa In support of Thomas's comments, are you looking for an example as well as a pointer to the packages? Does the answer address your immediate needs? – Technophobe01 Aug 22 '18 at 01:56
  • @DanY In support of Thomas's comments, are you looking for an example as well as a pointer to the packages? Does the answer address your immediate needs? – Technophobe01 Aug 22 '18 at 02:23
  • It seems like the answer is "Yes, with the mlogit package" and more specifically "by setting the `alt.var` and `chid.var` arguments". If @Effa agrees and marks this answer as accepted, I'll award the bounty. (FWIW, I was aware of the mlogit package but had never come across "unbalanced" choice occasions and therefore had never looked into how to implement a choice model with them.) – DanY Aug 22 '18 at 19:56
  • @DanY Great - happy to help. Note: Effa hasn't logged in since July 26th, not sure if they are still active. – Technophobe01 Aug 22 '18 at 20:20
  • So, I want to award the bounty to both answers here. This is my first bounty and thus also my first time trying to award one to multiple answers. Give me a minute to figure this out... – DanY Aug 22 '18 at 20:35