I am currently working on a behavior modelling project that involves estimating a multinomial logit model. After searching online, I came across the mnlogit package, which seems well suited to my problem.
The problem I am trying to model can be described as follows: A customer is offered 5 products, from which he picks 1 or decides not to pick any. These products differ by price and delivery time. The prices and delivery times of these products are fixed across all customers. So a customer chooses among 6 alternatives: 1, 2, 3, 4, 5 and 0. Alternatives 1 to 5 represent products 1 to 5, while alternative 0 represents the option of not picking any product. Products 1, 2 and 5 cost $1, and products 3 and 4 cost $2. Alternative 0, on the other hand, costs 0.
To simulate customers' decisions, I generated 7 parameters myself. I defined 'Price' as an alternative-independent variable, meaning that price has the same weight in the utility of every alternative. In addition, I defined 'Alternative' as an alternative-specific variable, which yields another 6 parameters. My goal was to simulate the attractiveness of a product due to its delivery time, since each alternative has a fixed delivery time. I calculated the utility of a product using the following expression:
product_utility = (B_alternative[ alternativeNum ] * alternativeNum) + (B_price * productPrice)
where B_alternative is the vector of alternative-specific parameters, [0, 0.6, 0.5, 0.45, 0.3, 0.3], indexed by alternative number (so B_alternative[0] is the parameter for alternative 0), and B_price is the price parameter, -0.5.
So the utilities I calculated are: 0.00 ; 0.10 ; 0.50 ; 0.35 ; 0.20 ; 1.00 , where the first number is the utility of alternative 0 and the last is the utility of product 5.
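For reference, the utility calculation can be reproduced with a short R sketch. The variable names here (alt_num, b_alternative, b_price, price, utility) are illustrative and not taken from my actual script; the values are the ones stated above.

```r
# Alternatives 0..5; R vectors are 1-based, so position i + 1 holds alternative i.
alt_num       <- 0:5
b_alternative <- c(0, 0.6, 0.5, 0.45, 0.3, 0.3)  # alternative-specific parameters
b_price       <- -0.5                            # generic price parameter
price         <- c(0, 1, 1, 2, 2, 1)             # price of alternatives 0..5

# product_utility = B_alternative[n] * n + B_price * price[n]
utility <- b_alternative * alt_num + b_price * price
names(utility) <- alt_num
print(round(utility, 2))
```

The printed values should match the utilities listed above, in the order of alternatives 0 to 5.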
After calculating these utilities, I calculated the probability of a customer choosing the nth-product with the following expression:
Pn = exp(Un) / sum(exp(U))
where 'sum(exp(U))' is the sum of the exponentiated utilities over all alternatives.
The calculated probabilities (which add up to 1) were: 0.1097376 ; 0.1212788 ; 0.1809268 ; 0.1557251 ; 0.1340338 ; 0.2982978 , for alternatives 0 to 5 respectively.
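The probability calculation can be checked with a minimal R sketch. It assumes the utilities listed earlier; the variable names are illustrative only.

```r
# Utilities of alternatives 0..5, as stated in the question.
utility <- c(0.00, 0.10, 0.50, 0.35, 0.20, 1.00)

# Multinomial logit choice probabilities: Pn = exp(Un) / sum(exp(U))
prob <- exp(utility) / sum(exp(utility))
names(prob) <- 0:5

print(round(prob, 7))
print(sum(prob))  # the probabilities should add up to 1
```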
Using these probabilities and a random function, I generated a 'Mode' column in my table, representing the customer's choice.
Finally, following the documentation I found on CRAN, I wrote this code to estimate the model:
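As an illustration of this step, here is one way such a table could be generated in long format (one row per customer and alternative). This is only a sketch under assumptions: the sample size, the seed, the use of sample(), and the 0/1 coding of MODE are hypothetical choices, and the column names are simply those used in the estimation code.

```r
set.seed(42)          # arbitrary seed, for reproducibility only
n_customers <- 1000   # hypothetical sample size

# Alternatives in the same order as alt.levels in the estimation code.
alts  <- c(1, 2, 3, 4, 5, 0)
price <- c(1, 1, 2, 2, 1, 0)
prob  <- c(0.1212788, 0.1809268, 0.1557251, 0.1340338, 0.2982978, 0.1097376)

# Draw one chosen alternative per customer according to the probabilities.
chosen <- sample(alts, size = n_customers, replace = TRUE, prob = prob)

# Long format: six consecutive rows per customer, one per alternative.
artificialData <- data.frame(
  CUSTOMER_ID = rep(seq_len(n_customers), each = length(alts)),
  ALT         = rep(alts, times = n_customers),
  PRICE       = rep(price, times = n_customers)
)

# MODE is 1 on the row of the chosen alternative and 0 elsewhere.
artificialData$MODE <- as.integer(
  artificialData$ALT == rep(chosen, each = length(alts))
)

print(head(artificialData, 12))
```

With a large enough sample, the share of customers choosing each alternative should be close to the probabilities above.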
artificialData <- read.csv(PathToData, sep = ";")
# Define the model formula
fm <- formula(MODE ~ PRICE - 1 | 1 | ALT)
# Convert to mlogit long-format data
TestData <- mlogit::mlogit.data(artificialData,
                                choice = "MODE", shape = "long",
                                alt.levels = c(1, 2, 3, 4, 5, 0),
                                id.var = "CUSTOMER_ID")
# Estimate the multinomial logit model
fit <- mnlogit::mnlogit(fm, TestData)
print(summary(fit))
However, no matter what parameters I set, I always get one of these two messages:
Error in solve.default(hessian, gradient, tol = 1e-24) : Lapack routine dgesv: system is exactly singular: U[7,7] = 0
or
In sqrt(diag(vcov(object))) : NaNs produced