I would like to use a multinomial logistic regression to get win probabilities for each of the 5 horses that participate in any given race, using each horse's previous average speed.
RACE_ID  H1_SPEED   H2_SPEED   H3_SPEED   H4_SPEED   H5_SPEED   WINNING_HORSE
1        40.482081  44.199627  42.034929  39.004813  43.830139  5
2        39.482081  42.199627  41.034929  41.004813  40.830139  4
I am stuck on how to handle the independent variables for each horse, given that any of the 5 horses' average speeds can be placed in any of H1_SPEED through H5_SPEED.
For each race I can put any of the 5 horses under H1_SPEED, so there is no real relationship between H1_SPEED from RACE_ID 1 and H1_SPEED from RACE_ID 2 other than the arbitrary position I selected.
Would there be any difference if the dataset looked like the one below?
- For RACE_ID 1 I swapped H3_SPEED and H5_SPEED and changed WINNING_HORSE from 5 to 3
- For RACE_ID 2 I swapped H4_SPEED and H1_SPEED and changed WINNING_HORSE from 4 to 1
RACE_ID  H1_SPEED   H2_SPEED   H3_SPEED   H4_SPEED   H5_SPEED   WINNING_HORSE
1        40.482081  44.199627  43.830139  39.004813  42.034929  3
2        41.004813  42.199627  41.034929  39.482081  40.830139  1
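To make the relabeling concrete, here is a minimal Python sketch that applies the two swaps to the first table and reproduces the second one. The helper name `swap_horses` and the dictionary layout are just something I made up for illustration, not part of any library.

```python
# Original data: race id -> (speeds for slots H1..H5, winning slot).
races = {
    1: ([40.482081, 44.199627, 42.034929, 39.004813, 43.830139], 5),
    2: ([39.482081, 42.199627, 41.034929, 41.004813, 40.830139], 4),
}


def swap_horses(speeds, winner, a, b):
    """Swap slots a and b (1-based) and relabel the winner to match.

    Hypothetical helper, used only to illustrate the relabeling.
    """
    new_speeds = list(speeds)  # copy so the original race is left untouched
    new_speeds[a - 1], new_speeds[b - 1] = new_speeds[b - 1], new_speeds[a - 1]
    if winner == a:
        winner = b
    elif winner == b:
        winner = a
    return new_speeds, winner


# Race 1: swap H3 and H5. Race 2: swap H4 and H1.
relabeled = {
    1: swap_horses(*races[1], 3, 5),
    2: swap_horses(*races[2], 4, 1),
}

for race_id, (speeds, winner) in relabeled.items():
    # The winning horse's speed is the same before and after relabeling.
    old_speeds, old_winner = races[race_id]
    assert speeds[winner - 1] == old_speeds[old_winner - 1]
    print(race_id, *speeds, winner)
```

Both versions describe exactly the same races (the same set of speeds and the same winning horse's speed); only the slot labels differ, which is the source of my concern.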
Is this an issue? If so, how should it be handled? What if I wanted to add more independent features per horse?