There are some linked questions, but I really cannot make sense of them. I am new to statistics, R, the mlogit package and also to Stack Overflow, so I will try to ask my question as precisely as possible. Here is [a link to the data](https://docs.google.com/spreadsheets/d/1IvN6ZgCgDERu3Mn4AglZMjicoXnFQQHc9GhAhbrpFRI/edit?usp=sharing). I have a data set from a discrete choice experiment with a dependent variable "choice" with two levels (yes/no) and 4 independent variables with 3 levels each.

I am trying to estimate the model with mlogit, but I have run into real problems and my supervisor is not able to help. In my data set the values for each variable are 1, 2 or 3 (1 for brand 1, 2 for brand 2, etc.).

    library(readr)
    library(mlogit)
    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv")
    t1 <- mlogit.data(data = t1, choice = "choice", shape = "long", alt.levels = paste("pos", 1:4), id.var = "id")

To run the estimation I use the following function:

    m1 <- mlogit(choice ~ 0 + Brand + Features + Valence + Volume, data = t1)
    summary(m1)

I got this outcome (model 1 estimates) and noticed that RStudio interpreted my data set variables as integers. As the variables are 3 different brands, 3 different features and 3 different categories of valence and volume (low, med and high), I would like to include estimates for the individual levels. I therefore tried to load them into RStudio as characters using this call:

    library(readr)
    t1 <- read_csv("~/Dokumente/UvA/Thesis/R/t1.csv",
                   col_types = cols(Brand = col_character(),
                                    Features = col_character(),
                                    Valence = col_character(),
                                    Volume = col_character()))

If I run the same mlogit function now, I get an error:

    Error in solve.default(H, g[!fixed]) : system is computationally singular: reciprocal condition number = 3.11303e-18

When I use characters for the different levels (e.g. brand names instead of 1, 2, 3; see data sheet 2, "t2"), I have the same singularity problem. a) Does the outcome make any sense if I use the numbers in the first data set? b) How can I include my values as characters to estimate the attribute levels?

I hope someone can help me, because I am really confused and new to all of this. I am most certainly making a very basic or stupid mistake.

Cheers

Konrad

1 Answer

There are several issues. The first issue is that you have one choice value labeled as "10", but you say it should have only two levels.

    library(readxl)
    library(dplyr)

    t1 <- read_excel("~/Downloads/Data mlogit.xlsx", sheet = 1) %>% as.data.frame
    t1$choice %>% table

       0    1   10 
    2770  925    1 
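The same kind of check can be run over every column at once, which may also surface unexpected codes in the attribute columns. The following is only a minimal sketch: the data frame `toy` is a hypothetical stand-in for the spreadsheet, with stray values planted on purpose, and only its column names are borrowed from the question.

```r
# Hypothetical toy data (NOT the real spreadsheet); the stray
# values 10 (choice) and 0 (Brand) are planted for illustration.
toy <- data.frame(
  choice = c(0, 1, 0, 10, 1, 0),
  Brand  = c(1, 2, 3, 0, 1, 2),
  Volume = c(1, 2, 3, 1, 2, 3)
)

# Tabulate each column so that unexpected codes stand out
level_counts <- lapply(toy, table)
level_counts
```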

Assuming that the "10" is just mislabeled, you should also not be running a multinomial logit, which only applies if you have more than two levels. Instead, you should be running a standard logistic regression or similar. Example:

    # Correct the mislabeled sample
    t1$choice[t1$choice == 10] <- 1

    # Convert every column to a factor
    t1[] <- lapply(t1, factor)

    # Fit an unpenalized (lambda = 0) logistic regression without an intercept
    library(glmnet)

    y <- t1$choice
    t1d <- dplyr::select(t1, Brand, Features, Valence, Volume)
    # Expand the factors into a dummy-coded design matrix
    t1d <- model.matrix(~ . - 1, t1d)
    fit <- glmnet(t1d, y, family = "binomial", intercept = FALSE, lambda = 0, alpha = 0)
    coefficients(fit)

    (Intercept)  .        
    Brand0      -2.0328103
    Brand1      -0.4518273
    Brand2      -1.4383109
    Brand3      -1.4903840
    Features1   -0.5857877
    Features2    0.2900501
    Features3    0.2717443
    Valence1     1.4788752
    Valence2    -0.1585652
    Valence3    -1.9390001
    Volume1     -0.6920187
    Volume2     -0.1013821
    Volume3      0.7010679

There are lots of ways to run logistic regression in R; I tend to use the glmnet package.
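As an illustration of one other route, here is a minimal sketch using base R's `glm` with `family = binomial`. The data frame `toy` is simulated and purely hypothetical; only its column names mirror the question's. Unlike the glmnet call, this sketch keeps the intercept and relies on R's default treatment coding, so each coefficient is read relative to the first level of its factor and the numbers are not directly comparable to the output above.

```r
set.seed(1)
n <- 300

# Simulated stand-in data (an assumption, NOT the real experiment):
# four 3-level attributes stored as factors
toy <- data.frame(
  Brand    = factor(sample(1:3, n, replace = TRUE)),
  Features = factor(sample(1:3, n, replace = TRUE)),
  Valence  = factor(sample(1:3, n, replace = TRUE)),
  Volume   = factor(sample(1:3, n, replace = TRUE))
)

# Binary outcome whose probability depends (arbitrarily) on Valence
toy$choice <- rbinom(n, 1, plogis(-1 + 0.8 * (toy$Valence == "1")))

# Logistic regression: one intercept plus two dummies per factor
fit_glm <- glm(choice ~ Brand + Features + Valence + Volume,
               data = toy, family = binomial)
summary(fit_glm)
```

Because the attributes are factors, `glm` estimates a separate coefficient for each non-reference level instead of a single slope per attribute.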

thc
  • I'm following a paper that uses a multinomial logit for something similar, and Chapman and McDonnell Feit (2015) (R for Marketing Research and Analytics) also say that linear models cannot be used for a choice-based conjoint experiment. The mlogit package is also explicitly designed to analyse choice-based scenarios, and all the examples have a two-level dependent variable: [link](https://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.pdf) – Konrad May 25 '18 at 08:33
  • The mlogit model IS a generalized linear model. It's a generalization of the "logit" model, also known as logistic regression. Mlogit -- multiple outcomes; Logit -- binary outcomes. If you were to use an mlogit model on a binary outcome, it would not be accepted for publication. – thc May 25 '18 at 20:20
  • The very first example in the vignette has 3 levels, not 2. – thc May 25 '18 at 20:22