I am trying to run a multinomial regression on my dataset to see the impact of Channel and Touchpoint on Choice, with Price and Device as controls, but unfortunately I receive an error message.

The first few lines of my data.frame after already running mlogit.data on it look like this:

ORDER_ID    PRODUCT_ID    DEVICE    PRICE    TOUCHPOINT    CHANNEL    1_or_2    CHOICE    chid    alt
123         566           laptop    99       paid          offline    1         TRUE      33      1
123         566           laptop    99       paid          offline    1         FALSE     33      2
123         534           phone     56       paid          offline    2         FALSE     45      1
123         534           phone     56       paid          offline    2         TRUE      45      2
124         876           laptop    85       unpaid        online     1         TRUE      111     1
124         876           laptop    85       unpaid        online     1         FALSE     111     2

The code I am trying to run is:

Choice_mlg <- mlogit(Choice_A_or_B ~ 1 | Channel + Touchpoint + Price + Device, 
                                    data = ml_choice_1, reflevel = 1, na.action = na.exclude)

What I then receive is the following error message:

Error in solve.default(H, g[!fixed]) : Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Could anyone help on what I am doing wrong here?

Thank you and best

steffiabc

1 Answer

The error means that the Hessian matrix is singular, i.e. its determinant is equal to zero. A singular matrix cannot be inverted, so you cannot obtain the variance-covariance matrix, which is equal to the negative inverse of the Hessian.
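To make the error concrete, here is a minimal base R sketch with a made-up 2x2 matrix standing in for the Hessian (the values of H and g are purely illustrative, not taken from your model). Its second row is a multiple of the first, so the determinant is zero and solve() fails in the same way as in your output.

```r
# Hypothetical 2x2 "Hessian": the second row is twice the first,
# so the matrix is singular (determinant zero).
H <- matrix(c(-2, -4,
              -4, -8), nrow = 2, byrow = TRUE)
g <- c(1, 1)  # hypothetical gradient vector

det_H <- det(H)

# solve() raises an error on a singular system; tryCatch captures the message
# instead of stopping the script.
msg <- tryCatch({
  solve(H, g)
  "solved"
}, error = function(e) conditionMessage(e))

print(det_H)
print(msg)
```

In your case the message points at U[6,6], which suggests that the problem shows up at the sixth pivot of the factorisation, i.e. one of your parameters cannot be separated from the others.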

Looking at your model and your data, there might be a couple of things causing it. You have not provided an MWE (minimal working example), so I can only go off the information that you have provided.

  1. Look at your alt variable: it takes the values 1 and 2, but within each chid the two rows are otherwise identical (same PRODUCT_ID, DEVICE, PRICE, TOUCHPOINT and CHANNEL). This means that there is absolutely no variation between alternatives to explain choice. This may lead to a computationally singular Hessian (another reason would be very strong correlations between alternatives).
  2. Your choice variable is called Choice_A_or_B. It is not part of the data you have shown, so it is hard to tell, but in long format it should take the values TRUE/FALSE. It should be TRUE for the chosen alternative and FALSE for all non-chosen alternatives in each choice occasion. Look at your CHOICE variable, which looks to be the correct one to use here.
  3. Looking at the ORDER_ID variable, it is the same for two different chid values. Does that mean that the same customer bought two items?
  4. Is there a reason why you have put Channel, Touchpoint, Price and Device in the second part of the formula (after the |), where mlogit estimates alternative-specific coefficients for them? This does not matter for the example above (see point 1), but should be carefully considered in your final model.
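Points 1 and 2 can be checked with a few lines of base R. The sketch below retypes the six rows from your question by hand (only a subset of the columns) and assumes that CHOICE is the choice indicator; the object names d, covariates, varies_within_chid and chosen_per_chid are my own.

```r
# The six rows shown in the question, retyped by hand (subset of columns).
d <- data.frame(
  DEVICE     = c("laptop", "laptop", "phone", "phone", "laptop", "laptop"),
  PRICE      = c(99, 99, 56, 56, 85, 85),
  TOUCHPOINT = c("paid", "paid", "paid", "paid", "unpaid", "unpaid"),
  CHANNEL    = c("offline", "offline", "offline", "offline", "online", "online"),
  CHOICE     = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  chid       = c(33, 33, 45, 45, 111, 111),
  alt        = c(1, 2, 1, 2, 1, 2)
)

covariates <- c("DEVICE", "PRICE", "TOUCHPOINT", "CHANNEL")

# Point 1: for each covariate, TRUE if it differs between the alternatives
# in at least one choice occasion (chid).
varies_within_chid <- sapply(covariates, function(v) {
  any(tapply(d[[v]], d$chid, function(x) length(unique(x)) > 1))
})

# Point 2: number of chosen alternatives per choice occasion.
# In long format this should be exactly 1 for every chid.
chosen_per_chid <- tapply(d$CHOICE, d$chid, sum)

print(varies_within_chid)
print(chosen_per_chid)
```

On these six rows none of the covariates varies within a chid, which is the situation described in point 1, while CHOICE has exactly one TRUE per chid, as required in point 2. Running the same checks on your full data should show whether this holds beyond the first few lines.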

In general, when you set up your data (in long format), you want one row per alternative, with indices for individual, choice occasion and alternative.
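As an illustration of that layout, here is a small base R sketch with entirely made-up values: two individuals, two choice occasions each, and two alternatives per occasion. The column names id, occasion, chid, alt, price and choice are hypothetical and only meant to show the structure, not your actual data.

```r
# Hypothetical design: 2 individuals x 2 choice occasions x 2 alternatives.
# expand.grid varies the first argument fastest, so rows come in A/B pairs.
long <- expand.grid(alt = c("A", "B"), occasion = 1:2, id = 1:2)

# A unique index for each choice occasion across individuals.
long$chid <- (long$id - 1) * 2 + long$occasion

# Made-up attribute that differs between the two alternatives in each occasion.
long$price <- c(99, 89, 56, 60, 85, 80, 70, 75)

# Exactly one chosen alternative per choice occasion.
long$choice <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# Sanity check: every (chid, alt) combination should appear exactly once.
rows_per_cell <- table(long$chid, long$alt)

print(long)
print(rows_per_cell)
```

The point of the sketch is that price differs between rows A and B within each chid, which is the kind of between-alternative variation your six example rows lack.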

edsandorf