Discrete Choice Analysis with Python
Generally, there are two formats for representing regression data:
- long format
- wide format
Long format has one row for each alternative available to a respondent, plus a Y column that is 1 for the chosen alternative and 0 otherwise. Wide format has only one row per person (survey respondent): the Y columns hold the features of the selected option, and the X columns hold the features of all the product alternatives.
Example Long
person  answer  Y  ~  x1     x2
1       1       0     green  large
1       1       1     red    large
1       2       1     green  small
...
Example Wide
y1     y2     ~  x11    x12    x21  x22
green  large     green  large  red  large
green  small     green  small  red  small
...
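To make the two layouts concrete, here is a minimal pandas sketch of reshaping long data into wide data. The column names (`alt`, `chosen`, `color`, `size`) and the values are made up for illustration, and the outcome is stored as the index of the chosen alternative rather than as its features, which is only one possible way to encode the wide format.

```python
import pandas as pd

# Hypothetical long-format data: one row per (person, alternative).
long_df = pd.DataFrame({
    "person": [1, 1, 2, 2],
    "alt":    [1, 2, 1, 2],
    "chosen": [0, 1, 1, 0],
    "color":  ["green", "red", "green", "red"],
    "size":   ["large", "large", "small", "small"],
})

# Spread the attributes of every alternative across columns: one row per person.
attrs = long_df.pivot(index="person", columns="alt", values=["color", "size"])
attrs.columns = [f"{name}_{alt}" for name, alt in attrs.columns]

# The chosen alternative per person becomes a single outcome column.
choice = (
    long_df.loc[long_df["chosen"] == 1]
    .set_index("person")["alt"]
    .rename("choice")
)

wide_df = attrs.join(choice).reset_index()
print(wide_df)
```

Each person ends up with one row holding the attributes of both alternatives plus a `choice` column, so the information in the long table is preserved.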
- Is my description correct?
- Does statsmodels' MNLogit use the wide format described here?