I'm working on a reinforcement learning project, and I'm trying to get a tensor that holds the expected reward of each chosen action. I have a long tensor of chosen actions of size batch, with values of either zero or one (the two possible actions). I also have a tensor of expected rewards for each action of size batch * action_size, and I want a tensor of size batch.
For example, if the batch size were 4, I would have
action = tensor([1,0,0,1])
expectedReward = tensor([[3,7],[5,9],[-1,12],[0,1]])
and what I want is
rewardForActions = tensor([7,5,-1,1])
I thought this would answer my question, but it's not the same at all: that solution would give me a 4*4 tensor, selecting from each row 4 times instead of once.
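To make the shape problem concrete, here is a minimal sketch. It assumes the other solution amounts to indexing the action dimension with the whole action tensor, which is a guess about that approach rather than a quote from it; the name wrong is only for illustration.

```python
import torch

action = torch.tensor([1, 0, 0, 1])
expectedReward = torch.tensor([[3, 7], [5, 9], [-1, 12], [0, 1]])

# Assumed form of the other approach: index the columns with the full
# action tensor. Every row is indexed with all 4 actions, so the result
# has shape batch x batch instead of batch.
wrong = expectedReward[:, action]
print(wrong.shape)  # torch.Size([4, 4])

# The values I actually want sit on the diagonal of that 4*4 result.
print(torch.diagonal(wrong).tolist())  # [7, 5, -1, 1]
```

Taking the diagonal would technically work, but it builds batch * batch entries just to keep batch of them.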
Any ideas?
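One possible approach, sketched on the example tensors above, is torch.gather along the action dimension. This assumes action is an int64 (long) tensor of shape batch, as described in the question; the second variant using torch.arange is an equivalent alternative, and the name viaIndexing is hypothetical.

```python
import torch

action = torch.tensor([1, 0, 0, 1])  # int64 by default, as gather requires
expectedReward = torch.tensor([[3, 7], [5, 9], [-1, 12], [0, 1]])

# gather needs an index with the same number of dimensions as the input,
# so reshape action from (batch,) to (batch, 1), pick one column per row,
# then drop the extra dimension again.
rewardForActions = expectedReward.gather(1, action.unsqueeze(1)).squeeze(1)
print(rewardForActions.tolist())  # [7, 5, -1, 1]

# Equivalent alternative: pair each row index with its chosen action.
rows = torch.arange(expectedReward.size(0))
viaIndexing = expectedReward[rows, action]
print(viaIndexing.tolist())  # [7, 5, -1, 1]
```

Both variants select exactly one entry per row, so the result has size batch rather than batch * batch.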