How to limit certain actions from Vowpal Wabbit Contextual Bandit based on context

Question

I'm working on a contextual bandit for recommending actions to a user on our website. I want to prevent certain actions from being shown based on the user's context.

For example, if a user has already signed up, I don't want it to recommend "sign up" to them.

from vowpalwabbit import pyvw

model = pyvw.vw(f'--cb_explore_adf -q PA --quiet --epsilon {EPSILON}')

Here is an example of the input data:

shared |Page pageViewCount:6 videoViewCount:3 language:es user_nation:US page_section:news time_on_site:8.632452878867632 is_signed_up:0 is_subscribed:1 has_downloaded_app:1 favorites_last_updated:56.7385141116986 
|Action a=sign_up 
|Action a=subscribe_mktg_comms 
|Action a=recommend_content 
|Action a=favorites 
|Action a=download_app 
|Action a=do_nothing 
|Action a=survey

I keep reading that I should set the probability to 0, but I'm a bit confused: for training, I see that we need to put action:reward:probability on the chosen arm, but I don't see where to put it in the input data.

I have also read that I could remove the actions, but I'm not sure whether this would affect the training data, since the arm indexes would then be different.

score 0 · Answer 1 · answered May 08 '23 at 08:34

When you train, you pass a list of potential actions with each example, so for each row (user/context) you can pass only the actions you want to allow.

Then, on the inference side, you pass only the relevant actions:

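def get_action(vw, context, actions):
    # `actions` should already contain only the actions allowed for this context.
    vw_text_example = to_vw_example_format(context, actions)
    # With --cb_explore_adf, predict returns one probability per action passed in.
    pmf = vw.predict(vw_text_example)
    # The sampled index is a position in `actions`, not in a global action list.
    chosen_action_index, prob = sample_custom_pmf(pmf)
    return actions[chosen_action_index], prob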
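The snippet above relies on two helpers, to_vw_example_format and sample_custom_pmf, that it does not define. Below is a minimal sketch of what they might look like, together with a hypothetical allowed_actions filter. The helper bodies, the EXCLUSION_RULES mapping, and the choice of context flags are assumptions for illustration, not the answerer's actual code. The sketch only handles numeric context features.

```python
import random

ALL_ACTIONS = ["sign_up", "subscribe_mktg_comms", "recommend_content",
               "favorites", "download_app", "do_nothing", "survey"]

# Hypothetical rules: action -> context flag that makes the action ineligible.
EXCLUSION_RULES = {
    "sign_up": "is_signed_up",
    "subscribe_mktg_comms": "is_subscribed",
    "download_app": "has_downloaded_app",
}


def allowed_actions(context, actions=ALL_ACTIONS):
    """Return the actions whose exclusion flag is not set in the context."""
    allowed = []
    for action in actions:
        flag = EXCLUSION_RULES.get(action)
        if flag is not None and context.get(flag):
            continue  # e.g. skip "sign_up" when is_signed_up is 1
        allowed.append(action)
    return allowed


def to_vw_example_format(context, actions):
    """Build a multi-line ADF example (numeric context features only)."""
    shared = " ".join(f"{name}:{value}" for name, value in context.items())
    lines = [f"shared |Page {shared}"]
    lines += [f"|Action a={action}" for action in actions]
    return "\n".join(lines)


def sample_custom_pmf(pmf, rng=random):
    """Sample an index from a list of probabilities; return (index, prob)."""
    total = sum(pmf)
    draw = rng.random() * total
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if draw < cumulative:
            return index, prob / total
    # Fallback for floating-point rounding at the upper edge.
    return len(pmf) - 1, pmf[-1] / total


context = {"pageViewCount": 6, "is_signed_up": 1,
           "is_subscribed": 0, "has_downloaded_app": 1}
actions = allowed_actions(context)
vw_text_example = to_vw_example_format(context, actions)
```

The resulting actions list is what would be passed to get_action, so the index returned by sample_custom_pmf maps back into the filtered list.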

Regarding your concern: "I already read to remove the actions but I'm not sure if this would affect the training data since the arm indexes would be different then."

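It shouldn't affect it, provided you retrain and, for each example, give the list of potential actions along with the action that was chosen. With --cb_explore_adf, each action is described by its own features (a=sign_up and so on), so an action's index is only its position within that one example.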
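To show where the label goes in the input data: in the multi-line ADF format, the label is written at the start of the line of the action that was chosen, and the other action lines stay unlabeled. Here is a minimal sketch, assuming the usual VW contextual bandit label action:cost:probability. Note that VW expects a cost, so a reward would be negated, and the leading action index is conventionally written as 0 in ADF examples. The function name and the sample values are hypothetical.

```python
def to_vw_training_example(shared_features, actions, chosen_action,
                           cost, probability):
    """Build an ADF training example with the label on the chosen action."""
    if chosen_action not in actions:
        raise ValueError("chosen_action must be one of the offered actions")
    lines = [f"shared |Page {shared_features}"]
    for action in actions:
        # Only the chosen action's line carries the cost and probability.
        label = f"0:{cost}:{probability} " if action == chosen_action else ""
        lines.append(f"{label}|Action a={action}")
    return "\n".join(lines)


# A signed-up user: "sign_up" is simply not offered in this example.
example = to_vw_training_example(
    "pageViewCount:6 is_signed_up:1",
    ["recommend_content", "favorites", "do_nothing"],
    chosen_action="favorites",
    cost=-1.0,          # negated reward (assumed reward of 1)
    probability=0.4,    # probability with which the action was chosen
)
print(example)
```

The probability should be the one the policy reported for that action when it was served, that is, the value taken from the pmf over the filtered action list.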

Another thing to consider is how you format your data. In `shared |Page pageViewCount:6 videoViewCount:3 language:es user_nation:US page_section:news time_on_site:8.632452878867632 is_signed_up:0 is_subscribed:1 has_downloaded_app:1 favorites_last_updated:56.7385141116986`, numeric features should use a colon, like pageView:5, but string features should use an equals sign, like country=US. - Aviel, May 08 '23 at 08:37
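As a rough illustration of that formatting rule, the sketch below writes numeric features as name:value and string features as name=value. The function names are hypothetical, and the sketch assumes string values contain no spaces, colons, or pipe characters, which would need escaping or replacing in VW's text format.

```python
def format_feature(name, value):
    """Format one feature: name:value for numbers, name=value for strings."""
    if isinstance(value, bool):
        return f"{name}:{int(value)}"   # write booleans as 0/1
    if isinstance(value, (int, float)):
        return f"{name}:{value}"
    return f"{name}={value}"


def format_shared_line(namespace, features):
    """Build the shared line of an ADF example from a feature dict."""
    formatted = " ".join(format_feature(k, v) for k, v in features.items())
    return f"shared |{namespace} {formatted}"


shared_line = format_shared_line("Page", {
    "pageViewCount": 6,
    "language": "es",
    "user_nation": "US",
    "page_section": "news",
    "time_on_site": 8.5,
    "is_signed_up": False,
})
print(shared_line)
```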
