
I am trying to run a multinomial logistic regression with the function mlogit from the package of the same name. I would like to ignore NA values whose presence always depends on the value of another predictor (i.e. their presence is not meaningful, as it is a consequence of the structure of the data frame).

Suppose that I want to model the relationship between academic level, the device used for reading, the website from which books were downloaded and several other variables. My data frame would look like this:

 > ReadingChoice <- read.table("C:\\Users\\34648\\Documents\\ReadingChoice.txt", sep="\t", header=TRUE)
 > head(ReadingChoice)
    Device TimeforReading AcademicLevel DownloadedFrom
1    Ebook          Night          High         Amazon
2    Ebook        Morning          High        Genesis
3     Book          Night        Medium           <NA>
4 Computer           Noon           Low        Genesis
5     Book        Morning           Low           <NA>

The NAs in the predictor DownloadedFrom always depend on the predictor Device: they occur in the rows where Device is Book. When I run the function, I obtain this:

> ReadingChoice <- mlogit.data(ReadingChoice, shape="wide", choice="TimeforReading")
> LogisticRegression <- mlogit(TimeforReading ~ 1 | Device + AcademicLevel + DownloadedFrom, data=ReadingChoice, reflevel=1)
Error in solve.default(H, g[!fixed]) : 
  Lapack routine dgesv: system is exactly singular: U[7,7] = 0
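The five rows from head() are enough to see a likely cause of this error. The R sketch below rebuilds only those rows (the full data set is imaginary) and assumes mlogit drops incomplete rows the way R's default na.action (na.omit) does. Because every NA in DownloadedFrom sits in a Book row, deleting incomplete rows removes all Book rows, and the remaining Device dummies then add up exactly to the intercept. That perfect collinearity is consistent with the "system is exactly singular" message, although the check here uses base R's model.matrix() rather than mlogit's internals.

```r
# Toy copy of the head() output shown above (only these rows are known).
ReadingChoice <- data.frame(
  Device         = c("Ebook", "Ebook", "Book", "Computer", "Book"),
  TimeforReading = c("Night", "Morning", "Night", "Noon", "Morning"),
  AcademicLevel  = c("High", "High", "Medium", "Low", "Low"),
  DownloadedFrom = c("Amazon", "Genesis", NA, "Genesis", NA),
  stringsAsFactors = TRUE
)

# Cross-tabulate Device against missingness of DownloadedFrom:
# every missing source belongs to a "Book" row.
missing_source <- is.na(ReadingChoice$DownloadedFrom)
print(table(Device = ReadingChoice$Device, MissingSource = missing_source))

# na.omit() deletes whole rows, so all "Book" rows vanish with their NAs.
# The "Book" level is kept but now has zero observations.
complete_rows <- na.omit(ReadingChoice)
print(table(complete_rows$Device))

# "Book" is the reference level, so with no Book rows left the intercept
# column equals DeviceComputer + DeviceEbook: the design is rank deficient.
X <- model.matrix(~ Device + DownloadedFrom, data = complete_rows)
print(all(X[, "(Intercept)"] == X[, "DeviceComputer"] + X[, "DeviceEbook"]))
```

In other words, na.action cannot help here: any rule that drops the NA rows also drops the only information about paper books.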

How should I run the mlogit function? The aim is to make the algorithm skip those values while keeping the rows that contain them. Setting the na.action argument (e.g. to na.omit) does not improve things. A similar question was posted about a year ago, but this one is slightly different.

Thank you in advance!

  • Can you post some data for reproducibility? At first glance I would impute those NA values to something like "paper" and rename the "DownloadedFrom" column to something more general like "TextSource". The fact that a text is a paper book might make a difference in your analysis. We also need some data to reproduce the error you're getting: it relates to the underlying linear algebra that mlogit uses to fit the regression, so it seems kind of strange in this context. – bstrain May 18 '20 at 19:03
  • I concur. It's not really possible to ignore certain cells when it comes to any kind of multivariate modelling problem, which is why imputation exists. The best thing you can do here is probably just put "Not downloaded" or something as a specific value. – MattB May 18 '20 at 21:12
  • @bstrain, I admire your efficiency! Well, the data frame is imaginary: I made it up just to illustrate my point. The idea is that I'd like the function to ignore those values but still use the rest of the data in that row, if that's possible. To put it differently, a predictor has two possible values, but only one of them (let's say A) can be analysed with regard to all the other predictors: the other (i.e. B) will always have NAs. Is there a way to do it, or is it better to change the data frame? – Jorge Agulló May 18 '20 at 21:30
  • Thanks a lot for answering, @MattB! I've checked that a usual solution in cases like these is precisely what you've posted: adding a new value to that predictor to avoid NAs. However, I don't know what consequences that decision has for the results. Wouldn't it be a little messier than, for example, analysing each type in a separate data frame? – Jorge Agulló May 18 '20 at 21:36
  • 1
    Yes, and that depends entirely on your use case. If it works for you to analyse books and ebooks completely separately then that can certainly be neater. Sometimes though that won't be desirable, and then you're stuck with some kind of imputation. – MattB May 18 '20 at 21:54
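The options raised in the comments can be compared on the same toy rows. This is a base R sketch that has not been run against mlogit itself; the column names Source and DeviceSource, the level NotDownloaded and the group labels paper/digital are hypothetical. Option A (a separate level for the NAs) keeps every row, but if the NAs occur exactly in the Book rows, the new dummy duplicates the Book indicator and the design stays singular. Option B is a different technique, not suggested in the thread: merging Device and DownloadedFrom into one factor whose levels are only the combinations that can occur. Option C is the separate analysis MattB mentions.

```r
# Only the two columns that matter for the NA pattern.
ReadingChoice <- data.frame(
  Device         = c("Ebook", "Ebook", "Book", "Computer", "Book"),
  DownloadedFrom = c("Amazon", "Genesis", NA, "Genesis", NA),
  stringsAsFactors = TRUE
)

# Option A: recode NA to its own (hypothetical) level "NotDownloaded".
source_chr <- as.character(ReadingChoice$DownloadedFrom)
source_chr[is.na(source_chr)] <- "NotDownloaded"
ReadingChoice$Source <- factor(source_chr)

# All rows are kept, but the NotDownloaded dummy equals the Book indicator,
# which is a linear combination of the intercept and the other Device dummies.
X_a <- model.matrix(~ Device + Source, data = ReadingChoice)
print(all(X_a[, "SourceNotDownloaded"] == (ReadingChoice$Device == "Book")))

# Option B (not from the thread): one combined factor, e.g. "Ebook_Amazon".
# Book rows get plain "Book", so no NA and no duplicated dummy remain.
ReadingChoice$DeviceSource <- factor(ifelse(
  ReadingChoice$Device == "Book",
  "Book",
  paste(ReadingChoice$Device, ReadingChoice$DownloadedFrom, sep = "_")
))
print(levels(ReadingChoice$DeviceSource))
X_b <- model.matrix(~ DeviceSource, data = ReadingChoice)
print(qr(X_b)$rank == ncol(X_b))  # full column rank

# Option C: analyse paper books and digital reading in separate data frames
# ("paper"/"digital" are hypothetical labels).
by_type <- split(ReadingChoice,
                 ifelse(ReadingChoice$Device == "Book", "paper", "digital"))
print(sapply(by_type, nrow))
```

With Option B, the mlogit formula would presumably become TimeforReading ~ 1 | DeviceSource + AcademicLevel (untested), so each device/source combination gets coefficients relative to Book instead of Device and DownloadedFrom competing for the same information.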
