Suppose I have a context vector x of length 5 whose entries are sampled uniformly at random between 0 and 1. In Python I can write this as:
import numpy as np
x = np.random.uniform(0,1,5)
First, I want to model a reward function that depends on the context vector. Suppose the reward is either 0 or 1. What is the best way to model this in a simulation?
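To make the question concrete, here is a minimal sketch of one common option: a Bernoulli reward whose success probability is a logistic function of the context. The weight vector `theta`, the `sigmoid` helper and the fixed seed are assumptions made for illustration, not part of the setup above, and other link functions would work just as well.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is reproducible

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

d = 5
x = rng.uniform(0, 1, d)      # context vector, as in the snippet above
theta = rng.normal(0, 1, d)   # hypothetical hidden weights of the reward model

p = sigmoid(theta @ x)        # success probability for this context
reward = rng.binomial(1, p)   # binary reward: 1 with probability p, else 0
```

With this form the simulator knows `theta` but the learner does not, so the learner only ever sees `x` and the 0/1 reward.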
Next, suppose I have 100 different users, and for each user the reward depends on the context in a different way. If I model the reward as a Bernoulli random variable, I could give each user a different mean, but that mean would then not depend on the context. I want the success probability to vary with both the user and the context, and I am not sure how to model that. What is the best way to model context-dependent rewards for a set of 100 users?
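One possible way to set this up, sketched under the assumption that each user has their own hidden weight vector and that the reward probability is logistic in the context, is shown below. The names `Theta` and `sample_reward` are hypothetical, and drawing the weights from a standard normal is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is reproducible

n_users, d = 100, 5
# Assumption: one hidden weight vector per user, one row per user.
Theta = rng.normal(0, 1, size=(n_users, d))

def sample_reward(user, x):
    """Return (reward, p) for a given user index and context vector x."""
    p = 1.0 / (1.0 + np.exp(-(Theta[user] @ x)))  # user-specific probability
    return int(rng.binomial(1, p)), float(p)

x = rng.uniform(0, 1, d)                 # one shared context vector
probs = 1.0 / (1.0 + np.exp(-(Theta @ x)))  # probabilities for all users
rewards = rng.binomial(1, probs)         # one 0/1 reward per user
```

Here the same context `x` gives a different Bernoulli mean for each user, and each user's mean changes as `x` changes.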