how to get the exact observations analyzed in multinom() in R, or how to make fitted() produce a fit on entire data including NAs

Question

My question is similar to another older question "How to get the number of observations included in a model created using the function multinom in R?" but what I want to look for are the exact observations, not the number of observations, analyzed in the model. Ultimately I want to have the original dataset combined with a new column of the predicted (fitted) probabilities. But let me use an example to illustrate my problem:

If my sample is 1000, some variables have NA values,and I fit a multinom() in R, and use fitted(), then find the length of fitted() is only 870, which means 130 obs are excluded when the model is estimated. Now, the fitted() only generates a 870*1 (i.e. one column) of numbers (probabilities), there is no way for me to know which observation does each probability number corresponds to. I think there're two ways to solve this:

Find out which observations are excluded and delete them in the raw data before estimating the model.
Try to let the fitted() produce a 1000*1 matrix with 130 elements being NA.

I don't know the answer to either one. Any advice would be appreciated. The ultimate goal is to be able to append the fitted probabilities to the original dataset (as a new column) so I can draw inferences. Thanks.

Neal Fultz · Accepted Answer · 2016-01-13T18:27:44.693

from ?multinom in nnet:

   model: logical. If true, the model frame is saved as component
          'model' of the returned object.

so call multinom(..., model=TRUE), and the model frame will be in the result.

EDIT:

Following the example in ?multinom:

options(contrasts = c("contr.treatment", "contr.poly"))
library(MASS)
example(birthwt)
bwt.mu <- multinom(low ~ ., bwt, model=TRUE)

Viewing the model frame inside the object:

>     str(bwt.mu$model)
'data.frame':   189 obs. of  9 variables:
 $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "white","black",..: 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: logi  FALSE FALSE TRUE TRUE TRUE FALSE ...
 $ ptd  : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
 $ ht   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ ui   : logi  TRUE FALSE FALSE TRUE TRUE FALSE ...
 $ ftv  : Factor w/ 3 levels "0","1","2+": 1 3 2 3 1 1 2 2 2 1 ...
 - attr(*, "terms")=Classes 'terms', 'formula' length 3 low ~ age + lwt + race + smoke + ptd + ht + ui + ftv
  .. ..- attr(*, "variables")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
  .. ..- attr(*, "factors")= int [1:9, 1:8] 0 1 0 0 0 0 0 0 0 0 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:9] "low" "age" "lwt" "race" ...
  .. .. .. ..$ : chr [1:8] "age" "lwt" "race" "smoke" ...
  .. ..- attr(*, "term.labels")= chr [1:8] "age" "lwt" "race" "smoke" ...
  .. ..- attr(*, "order")= int [1:8] 1 1 1 1 1 1 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(low, age, lwt, race, smoke, ptd, ht, ui, ftv)
  .. ..- attr(*, "dataClasses")= Named chr [1:9] "factor" "numeric" "numeric" "factor" ...
  .. .. ..- attr(*, "names")= chr [1:9] "low" "age" "lwt" "race" ...

does that mean the entire data (e.g. 870 observations in the above example) is going to be included in the results? If so, how can I extract these observations from the results? I think the second question is what I was asking. Should I just use summary(multinom())$model? Thanks. — Jenny Shu, Jan 13 '16 at 15:32
The model frame contains only the rows and columns actually used in fitting the model. See my edit above for an example of what one looks like. — Neal Fultz, Jan 13 '16 at 18:28
Thanks. If I want to use this model to predict on new data, say if I use predict(bwt.mu, newdata = data2, type = 'probs" — Jenny Shu, Jan 14 '16 at 00:04

how to get the exact observations analyzed in multinom() in R, or how to make fitted() produce a fit on entire data including NAs

1 Answers1