How to appropriately predict/classify using smf.mnlogit.predict()?

Question

I'm curious if someone can help me understand an effective way of getting a single vector of predictions from .predict() for a multinomial logistic regression fitted with statsmodels.formula.api.mnlogit():

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({'age': np.random.choice([20, 30, 23, 40, 39, 27], size = 100, replace = True), 
                  'animal': np.random.choice(['dog', 'parrot', 'cat', 'turtle'], size = 100, 
                                             replace = True)})

mod = smf.mnlogit('animal ~ age', df).fit()

predictions = mod.predict(df['age'])
print(predictions)

This outputs a matrix with 100 rows and 4 columns.

I'm guessing each column is the predicted probability that the row belongs to one of the four outcome categories from df['animal'].unique()?

If so, I'm guessing I'd write a function that assigns each row to one of the four outcome categories by picking the column with the highest probability in the .predict() output?

`predictions.argmax(1)` will give you the index of the choice with the highest predicted probability. or maybe `np.asarray(predictions).argmax(1)` — Josef, Sep 21 '20 at 03:12
Thank you @Josef - `np.asarray(predictions).argmax(1)` works well! — sunsets_in_august, Sep 21 '20 at 05:13
@Josef Do you happen to know how I can find out which dependent values correspond to each of the indices? For example, index 0 = 'dog', etc.? — sunsets_in_august, Sep 21 '20 at 05:21
That should be in `mod.model.data.ynames` but also in some of the returns, e.g. summary and `mod.params` DataFrame as index. — Josef, Sep 21 '20 at 12:09
