
I am working on contextual bandits in TF-Agents, using the LinearUCB agent and the linear Thompson sampling agent.

I can get the actions, but I'm not sure how to get the distribution over actions out of the agents for a given time step.

I know LinearUCB is deterministic and therefore has no distribution, but I couldn't get a distribution out of Thompson sampling either, even with linearthompsonsamplingagent.policy.distribution(timestep). It reports that the distribution is deterministic, and the log_probability is blank. Can someone explain how to get the action distribution out of it?
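A likely reason for the deterministic result: Thompson sampling draws a random reward estimate for each arm internally and then takes the argmax, so the distribution object the policy hands back can only wrap the chosen action as a point mass. The quantity you probably want, P(arm a is chosen | context x), generally has no closed form for more than two arms, but it can be estimated by Monte Carlo. The simplest empirical route is to call policy.action(time_step) many times on the same time step and count how often each action comes back, assuming each call redraws its samples. The pure-Python sketch below does the same thing from first principles. It assumes the textbook linear Thompson sampling posterior, theta_a ~ N(A_a^{-1} b_a, alpha^2 A_a^{-1}) per arm. Under that posterior the sampled reward x^T theta_a is a univariate normal, so only its mean and standard deviation are needed. The names solve, reward_posteriors and ts_action_probs, and the (A_a, b_a) arm representation, are hypothetical and are not TF-Agents APIs. Check the TF-Agents source for how your version actually parameterizes the posterior.

```python
import math
import random


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting (pure Python)."""
    n = len(b)
    # Augmented matrix [a | b]; built as copies so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def reward_posteriors(arms, x, alpha=1.0):
    """Per-arm (mean, std) of the sampled reward x^T theta_a.

    Hypothetical arm format: (A_a, b_a) with theta_a ~ N(A_a^{-1} b_a, alpha^2 A_a^{-1}),
    hence x^T theta_a ~ N(x^T A_a^{-1} b_a, alpha^2 * x^T A_a^{-1} x).
    """
    out = []
    for a_mat, b_vec in arms:
        mean = dot(x, solve(a_mat, b_vec))
        var = dot(x, solve(a_mat, x))
        out.append((mean, alpha * math.sqrt(var)))
    return out


def ts_action_probs(arms, x, alpha=1.0, num_samples=20000, seed=0):
    """Monte Carlo estimate of P(Thompson sampling picks arm a | context x)."""
    rng = random.Random(seed)
    params = reward_posteriors(arms, x, alpha)
    counts = [0] * len(params)
    for _ in range(num_samples):
        # One Thompson sampling step: draw a reward per arm, then take the argmax.
        sampled = [rng.gauss(mu, sd) for mu, sd in params]
        counts[max(range(len(sampled)), key=sampled.__getitem__)] += 1
    return [c / num_samples for c in counts]


arms = [
    ([[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0]),  # arm 0: reward mean 0.5, std 1.0 at x
    ([[5.0, 0.0], [0.0, 5.0]], [0.0, 2.0]),  # arm 1: reward mean 0.4, std ~0.63 at x
]
print(ts_action_probs(arms, [1.0, 1.0]))
```

With two arms the estimate can be checked against the exact value Phi((mu_0 - mu_1) / sqrt(sd_0^2 + sd_1^2)), which is about 0.53 for arm 0 in the example above. The log-probability of an action is then just the log of its estimated frequency. Depending on the TF-Agents version, the agent's emit_policy_info argument may also expose the per-arm predicted reward means in the policy step's info. Verify that against the version you have installed.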

tjt
  • We're also facing a similar problem using the tf-agents library for Lin-UCB. Just checking if you were able to find any workaround for this. Did any other policy help with providing a distribution? – sreeraag Nov 22 '22 at 12:40
