I want to model car purchase choice using a nested logit approach. The data I am currently using is hypothetical, since I want to be sure how to handle it before running the actual questionnaire.
The data contains 1,000 hypothetical choice observations on 568 car models that are nested within 35 manufacturers.
I have read the mlogit package vignette about Data management, model description and testing, and learned about the wide and long data formats. The examples for both formats store the whole choice set in the dataset, as in this example of a long-format dataset:
case alt choice dist cost ivt ovt freq income urban noalt
1 1 train 0 83 28.25 50 66 4 45 0 2
2 1 car 1 83 15.77 61 0 0 45 0 2
3 2 train 0 83 28.25 50 66 4 25 0 2
4 2 car 1 83 15.77 61 0 0 25 0 2
5 3 train 0 83 28.25 50 66 4 70 0 2
6 3 car 1 83 15.77 61 0 0 70 0 2
The chosen alternative is indicated by the choice column.
On the other hand, mine looks like this:
case brand model length <other_var>
1 1 Mazda CX-30 GT 4395 ...
2 2 Mercedes~ GLS-Class~ 5130 ...
3 3 Maserati Ghibli S ~ 4971 ...
...
I think my dataset above is neither long nor wide, since each row holds only the car that a person chose: I don't store each person's choice set as rows, and I have no choice variable acting as a choice indicator.
My questions are:
- Do I really need to reformat my data into long or wide format in order to estimate the model?
- If yes, how do I do that? If I choose the long format, for example, I imagine I would have 568 rows for each person, so 1,000 * 568 = 568,000 rows in total.
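To make the second question concrete, here is a minimal base R sketch of the expansion I have in mind. The data frames `chosen` and `cars`, their column values, and the tiny sizes (3 people, 4 models) are hypothetical stand-ins for my real data, and the sketch assumes every person faces the full set of models as their choice set.

```r
# Hypothetical toy data: one row per person, holding only the chosen model
# (stand-in for the 1,000 observations).
chosen <- data.frame(
  case  = 1:3,
  model = c("A1", "B1", "A2"),
  stringsAsFactors = FALSE
)

# Hypothetical alternative-level attributes: one row per car model
# (stand-in for the 568 models nested in 35 manufacturers).
cars <- data.frame(
  model  = c("A1", "A2", "B1", "B2"),
  brand  = c("A", "A", "B", "B"),
  length = c(4395, 4500, 5130, 4971),
  stringsAsFactors = FALSE
)

# Cartesian product: pair every case with every alternative.
# merge() with by = NULL returns the cross join of the two data frames.
long <- merge(data.frame(case = chosen$case), cars, by = NULL)

# Flag the row whose model matches the one this person actually chose.
long$choice <- long$model == chosen$model[match(long$case, chosen$case)]

# Sort so that each person's alternatives sit together, as in the vignette.
long <- long[order(long$case, long$model), ]
rownames(long) <- NULL

nrow(long)  # cases * alternatives: 12 here, 568,000 for my real data
```

If this is the right shape, I assume the result would then go through mlogit's data-preparation step (mlogit.data() or dfidx(), depending on the package version) with brand defining the nests, but I have not verified that part.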
Thank you so much.