
I'm working on a reinforcement learning project, and I need a tensor holding the expected reward of the action that was chosen for each sample. I have a long tensor of chosen actions of shape (batch,), where each value is either 0 or 1 (the two possible actions). I also have a tensor of expected rewards for every action, of shape (batch, action_size), and I want a tensor of shape (batch,).

For example, if batch size was 4, then I have

action = tensor([1,0,0,1])
expectedReward = tensor([[3,7],[5,9],[-1,12],[0,1]])

and what I want is

rewardForActions = tensor([7,5,-1,1])

I thought an existing answer that uses torch.index_select() would solve this, but it doesn't do the same thing: with that solution I end up with a 4*4 tensor, because it selects from each row 4 times instead of once.

Any ideas?

  • The solution with torch.index_select() actually gives the vector I want, except it's along the diagonal of the output 4*4 tensor, instead of as a vector. – Joseph Summerhays Mar 29 '20 at 02:02

1 Answer


You could do

rewardForActions = expectedReward.index_select(1, action).diagonal()
# tensor([ 7,  5, -1,  1])
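
Note that index_select(1, action) first builds a batch * batch intermediate (4*4 here) and diagonal() then keeps only one entry per row, so memory grows quadratically with the batch size. As a rough sketch of two alternatives that pick one entry per row directly, assuming action is an int64 tensor as in the question (the names intermediate, viaGather and viaIndexing are made up for illustration):

```python
import torch

action = torch.tensor([1, 0, 0, 1])  # int64 by default
expectedReward = torch.tensor([[3, 7], [5, 9], [-1, 12], [0, 1]])

# index_select picks columns action[0], ..., action[3] from EVERY row,
# which is why the result has shape (batch, batch).
intermediate = expectedReward.index_select(1, action)
print(intermediate.shape)  # torch.Size([4, 4])

# gather: for row i, take the column given by action[i].
# The index must have the same number of dimensions as the input,
# hence the unsqueeze/squeeze pair.
viaGather = expectedReward.gather(1, action.unsqueeze(1)).squeeze(1)
print(viaGather)  # tensor([ 7,  5, -1,  1])

# Advanced indexing: pair each row index with its chosen column.
rows = torch.arange(expectedReward.size(0))
viaIndexing = expectedReward[rows, action]
print(viaIndexing)  # tensor([ 7,  5, -1,  1])
```

Both variants should give the same result as the diagonal() approach for this example, without the batch * batch intermediate.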
Steven Fontanella