I'm building a contextual bandit to recommend actions to users on our website. I want to prevent certain actions from being shown based on the user's context.
For example, if a user has already signed up, I don't want it to recommend "sign up" to them. This is how I create the model:
from vowpalwabbit import pyvw
model = pyvw.vw(f'--cb_explore_adf -q PA --quiet --epsilon {EPSILON}')
Here is an example of the input data:
shared |Page pageViewCount:6 videoViewCount:3 language:es user_nation:US page_section:news time_on_site:8.632452878867632 is_signed_up:0 is_subscribed:1 has_downloaded_app:1 favorites_last_updated:56.7385141116986
|Action a=sign_up
|Action a=subscribe_mktg_comms
|Action a=recommend_content
|Action a=favorites
|Action a=download_app
|Action a=do_nothing
|Action a=survey
I keep reading that I should set the probability of such actions to 0, but I'm a bit confused: for training, I see that the action:cost:probability label goes on the chosen arm, but I don't see where I would put a probability of 0 in the input data.
I have also read that I could remove the disallowed actions from the example instead, but I'm not sure whether that would affect training, since the arm indexes would then be different.
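To make the "remove the actions" option concrete, here is a minimal sketch of how the example text could be built with ineligible arms filtered out. The helper names (`eligible_actions`, `to_adf_example`) and the `HIDE_WHEN_SET` rules are hypothetical and only illustrate the idea; the one thing taken from VW is the text format, where the label goes on the line of the chosen action and holds a cost (the negated reward) rather than a reward. As far as I understand, with `--cb_explore_adf` each arm is described by its own features (`a=...`), so the leading index in the label is only a placeholder and the line position of an arm may differ between examples.

```python
# Hypothetical helpers: they only build VW text examples, they do not call VW.
ALL_ACTIONS = [
    "sign_up", "subscribe_mktg_comms", "recommend_content",
    "favorites", "download_app", "do_nothing", "survey",
]

# Assumed eligibility rules: action -> context flag that hides it when set.
HIDE_WHEN_SET = {
    "sign_up": "is_signed_up",
    "subscribe_mktg_comms": "is_subscribed",
    "download_app": "has_downloaded_app",
}


def eligible_actions(context):
    """Return the actions that are still allowed for this user context."""
    allowed = []
    for action in ALL_ACTIONS:
        flag = HIDE_WHEN_SET.get(action)
        if flag is not None and context.get(flag):
            continue  # e.g. skip "sign_up" when is_signed_up is 1
        allowed.append(action)
    return allowed


def to_adf_example(shared_line, actions, chosen=None, cost=None, prob=None):
    """Build a multi-line ADF example; label the chosen arm if one is given."""
    lines = [shared_line]
    for action in actions:
        line = f"|Action a={action}"
        if action == chosen:
            # Label sits on the chosen arm's line: index:cost:probability.
            # The index 0 is a placeholder here (assumption stated above).
            line = f"0:{cost}:{prob} {line}"
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    context = {"is_signed_up": 1, "is_subscribed": 0, "has_downloaded_app": 0}
    shared = "shared |Page pageViewCount:6 is_signed_up:1"
    arms = eligible_actions(context)
    print(to_adf_example(shared, arms))  # unlabeled, for prediction
    print()
    print(to_adf_example(shared, arms, chosen="survey", cost=-1.0, prob=0.25))
```

The same filtered list would be used both when asking for a prediction and when writing the labeled training example, so a hidden arm never appears in either. The recorded probability should be the one the policy reported for the chosen arm among the arms that were actually offered.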