Suppose I have a context vector x of length 5 whose entries I sample uniformly at random between 0 and 1. In Python I can write this as

import numpy as np
x = np.random.uniform(0,1,5) 

First, I want to model a reward function that depends on the context vector. Suppose the reward is either 0 or 1. What is the best way to model this in a simulation?
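One common way to make a 0/1 reward depend on the context is to draw it from a Bernoulli distribution whose success probability is a function of x. The sketch below assumes a logistic link, p = sigmoid(theta · x + bias); the weight vector `theta`, the `bias` term, and the helper names `sigmoid` and `sample_reward` are hypothetical choices for illustration, not something given in the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the run is reproducible


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def sample_reward(x, theta, bias=0.0, rng=rng):
    """Draw a 0/1 reward whose success probability depends on the context x.

    theta and bias are assumed (hypothetical) parameters of the simulated
    reward model; the logistic link is one modelling choice among many.
    """
    p = sigmoid(x @ theta + bias)   # reward probability for this context
    reward = int(rng.random() < p)  # Bernoulli(p) draw
    return reward, p


x = rng.uniform(0, 1, 5)         # context vector, as in the question
theta = rng.normal(0.0, 1.0, 5)  # hypothetical "true" weights of the simulator
reward, p = sample_reward(x, theta)
```

Any other function that maps x to a value in [0, 1] could replace the logistic link; the Bernoulli draw stays the same.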

Next, let's say I have 100 different users, and the way the reward depends on the context is different for each of them. If I model the reward as a Bernoulli random variable, I can give each user a different mean, but I want that mean to also vary with the context, and I am not sure how to do that. What is the best way to model the reward across different contexts for a set of 100 users?
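One possible way to set this up, sketched here under assumptions, is to give each user their own weight vector so that the same context yields a different Bernoulli mean per user. The matrix `Theta` (one row per user), the standard-normal distribution its rows are drawn from, and the helper names `reward_probs` and `sample_rewards` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the run is reproducible
n_users, dim = 100, 5

# Hypothetical per-user weights: row u describes how user u's reward
# probability reacts to each of the 5 context features.
Theta = rng.normal(0.0, 1.0, size=(n_users, dim))


def reward_probs(x, Theta):
    """Reward probability of every user for one context x (logistic link assumed)."""
    return 1.0 / (1.0 + np.exp(-(Theta @ x)))


def sample_rewards(x, Theta, rng):
    """Draw one independent Bernoulli reward per user for the context x."""
    p = reward_probs(x, Theta)
    return (rng.random(p.shape) < p).astype(int)


x = rng.uniform(0, 1, dim)               # one context vector
probs = reward_probs(x, Theta)           # shape (100,), one mean per user
rewards = sample_rewards(x, Theta, rng)  # shape (100,), entries in {0, 1}
```

Drawing a new x and calling `sample_rewards` again gives each user a different mean for the new context, because the mean is recomputed from that user's row of `Theta`.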

user77005
  • Your question is very unclear. How does the "context vector of length 5" enter into the picture? The slight amount of code you've shown generates a single ***continuous*** Uniform(0,1) outcome, not a Bernoulli. How is the reward for a set of 100 users determined, is it the sum of the rewards of the individual users? Last but not least, the [how-to-ask](https://stackoverflow.com/help/on-topic) guidance explicitly states that asking for outside resources is off-topic (see item #4). – pjs Feb 13 '19 at 14:49
  • I have updated the description. Does it add clarity to my question? I can add more details if needed. – user77005 Feb 19 '19 at 09:54
  • Yes, please do so; any code you already have should be posted. You say the reward should be either 0 or 1, but under what conditions? What must the context vector look like? Do you want the reward to be random? – ohlr Feb 19 '19 at 10:02
