Can someone help me understand an effective way to get a single vector of predicted classes from .predict() for a multinomial logistic regression fitted with statsmodels.formula.api.mnlogit()? Here is a small example:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# Toy data: 100 randomly drawn ages and animal labels
df = pd.DataFrame({
    'age': np.random.choice([20, 30, 23, 40, 39, 27], size=100, replace=True),
    'animal': np.random.choice(['dog', 'parrot', 'cat', 'turtle'], size=100, replace=True),
})
# Fit a multinomial logit of animal on age, then predict on the same data
mod = smf.mnlogit('animal ~ age', df).fit()
predictions = mod.predict(df['age'])
print(predictions)
This outputs a matrix of 100 rows x 4 columns:
[image: .predict() outputs]
I'm guessing each row holds the predicted probabilities that the observation belongs to each of the four outcome categories from df.animal.unique()?
I'm guessing I'd then write a function that assigns each observation to one of the four DV categories by taking the highest probability in its row of the .predict() output?
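To make that last idea concrete, here is a minimal sketch of the "highest probability per row" step. It does not call statsmodels: `probs` is a hypothetical stand-in for the .predict() output with made-up values, and the mapping from column position to label assumes the columns follow the sorted order of the category names, which I have not confirmed for mnlogit and which should be checked against the fitted model.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the .predict() output: one row per observation,
# one column per outcome category, each row summing to 1 (values are made up).
probs = pd.DataFrame(
    [[0.10, 0.60, 0.20, 0.10],
     [0.40, 0.30, 0.20, 0.10],
     [0.25, 0.25, 0.15, 0.35]]
)

# ASSUMPTION: column i corresponds to the i-th label in sorted order.
labels = np.array(sorted(['dog', 'parrot', 'cat', 'turtle']))

# Column position of the largest probability in each row, mapped to its label.
best = probs.to_numpy().argmax(axis=1)
predicted = pd.Series(labels[best], index=probs.index, name='predicted_animal')

print(predicted.tolist())
```

If the column-order assumption holds, the same two lines applied to `predictions` would give one predicted label per row of df. Note that argmax breaks ties by taking the first maximum.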