I am trying to create and validate a predictive model on a fictitious dataset, using Python with sklearn and following this tutorial.
The dataset contains information about baseball pitcher throws, and these are the most important fields:
- Result (whether the player was successful/unsuccessful in throwing a strike)
- Direction (whether it was a High, Medium, or Low throw)
- Other fields like speed of ball, player stats, etc.
Based on the different fields, the model will attempt to predict what direction (the Direction field) a pitcher should throw in order to get a strike.
In the tutorial I am following (linked above), this is an example call to the function that generates the model, in this case for logistic regression (though any of the other classification techniques listed could be used):
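from sklearn.linear_model import LogisticRegression

outcome_var = 'Direction'
model = LogisticRegression()
predictor_var = ['Result', <insert other fields here>]
# classification_model is the helper function defined in the tutorial
classification_model(model, df, predictor_var, outcome_var)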
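For context, I have not copied the tutorial's helper here. The sketch below is only my assumption of roughly what such a function does (fit the model, then report training accuracy and a cross-validated score); it is a stand-in, not the tutorial's actual code. The `cv` parameter and the one-hot encoding via `pd.get_dummies` are my own additions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def classification_model(model, data, predictors, outcome, cv=3):
    """Hypothetical stand-in for the tutorial helper (an assumption, not its real code)."""
    # One-hot encode text columns such as Result; numeric columns pass through unchanged.
    X = pd.get_dummies(data[predictors])
    y = data[outcome]

    # Cross-validated accuracy, computed on clones of the model before the final fit.
    cv_scores = cross_val_score(model, X, y, cv=cv)

    # Fit on all rows and measure accuracy on the same rows (an optimistic number).
    model.fit(X, y)
    train_accuracy = model.score(X, y)

    print(f"Training accuracy: {train_accuracy:.3f}")
    print(f"Cross-validation score: {cv_scores.mean():.3f}")
    return train_accuracy, cv_scores.mean()
```

Called this way, Result is just another input column, so nothing tells the model that an unsuccessful row is an example of a Direction to avoid.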
How can I tell the model about the negative impact (and importance) of the Result field?
Basically, if Result is "Successful", the model should learn to choose the same Direction (High/Medium/Low) when faced with the same scenario. However, if Result is "Unsuccessful", the model should learn to choose a different Direction from the one in the sample, because it was not a good choice (regardless of the other fields).
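To make the intended behaviour concrete, here is a minimal sketch of one way I could imagine expressing it. It is not from the tutorial. It swaps the roles of the two fields: Result becomes the outcome, Direction becomes a predictor, and the recommended Direction is whichever candidate gets the highest predicted probability of success. The data values, the `Speed` column, and the helper names `encode` and `best_direction` are all made up for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "Direction": ["High", "High", "High", "Medium", "Medium", "Medium", "Low", "Low", "Low"],
    "Speed": [90, 92, 86, 91, 87, 89, 85, 88, 93],
    "Result": ["Successful", "Unsuccessful", "Successful",
               "Unsuccessful", "Unsuccessful", "Successful",
               "Successful", "Successful", "Unsuccessful"],
})

DIRECTIONS = ["High", "Medium", "Low"]


def encode(frame):
    """One-hot encode Direction in a fixed column order and keep Speed as-is."""
    dummies = pd.get_dummies(frame["Direction"]).reindex(columns=DIRECTIONS, fill_value=0)
    return pd.concat([dummies.astype(int), frame[["Speed"]]], axis=1)


# Outcome is success (1) or failure (0), so unsuccessful rows count against their Direction.
X = encode(df)
y = (df["Result"] == "Successful").astype(int)
model = LogisticRegression(max_iter=1000).fit(X, y)


def best_direction(speed):
    """Score every candidate Direction for this scenario and return the most promising one."""
    candidates = pd.DataFrame({"Direction": DIRECTIONS, "Speed": [speed] * len(DIRECTIONS)})
    success_prob = model.predict_proba(encode(candidates))[:, 1]
    return DIRECTIONS[int(success_prob.argmax())]


print(best_direction(90))
```

With this framing both successful and unsuccessful rows are used for training, and the "choose a different Direction" behaviour comes from comparing predicted success probabilities instead of being coded as a rule.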
How can I tell the model how to use the Result field in order to make decisions? I can include more details (or code) if needed. Thanks!