I’m a newbie in DQN and try to understand its coding. I am trying the code below as epsilon greedy action selection but I am not sure how it works
if sample > eps_threshold:
with torch.no_grad():
# t.max(1) will return largest column value of each row.
# second column on max result is index of where max element was
# found, so we pick action with the larger expected reward.
return policy_net(state).max(1)[1].view(1, 1)
else:
return torch.tensor([[random.randrange(n_actions)]], device=device, dtype=torch.long)
Could you please let me know what are indices in max(1)[1] and what is view(1, 1) and it’s indices. Also why “with torch.no_grad():” has been used