I'm working with the Cooperative Push Block environment ( https://github.com/Unity-Technologi...nvironment-Examples.md#cooperative-push-block ), exported as a build so I can drive it through the Python API, on the latest stable ML-Agents release. The problem is that I never receive any reward, positive or negative; it is always 0. If I export the Single Push Block environment instead, the rewards come through correctly. Below is the code I'm using, taken from the Colab example linked in https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API.md
decision_steps, terminal_steps = env.get_steps(behavior_name)
if tracked_agent in decision_steps:  # the agent requested a decision this step
    episode_rewards += decision_steps[tracked_agent].reward
print('REWARD', decision_steps.reward)  # always an array of zeros
# Each decision_steps[tracked_agent].reward is also always 0
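For context, this is roughly how I load the exported build and step it, following the Python API doc (the build path is just a placeholder for mine, and I'm feeding random actions while testing):

from mlagents_envs.environment import UnityEnvironment

# "CoopPushBlock" is a placeholder for the path of my exported build
env = UnityEnvironment(file_name="CoopPushBlock", seed=1, no_graphics=True)
env.reset()

behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

decision_steps, terminal_steps = env.get_steps(behavior_name)
tracked_agent = decision_steps.agent_id[0]
episode_rewards = 0

# send random actions for every agent that requested a decision, then advance the sim
env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
env.step()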
According to the docs I should be seeing the existential penalty (-0.0001 per step) and a positive reward of +1, +2, or +3 depending on the size of the block pushed into the goal. But even when the agents randomly push a block in, the reward I read is still 0.
The docs say the reward is given as a "group reward". I don't know whether that implies I need to change the code above.
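For example, I don't know whether the group reward has to be read from a separate field. Looking at mlagents_envs, DecisionStep and TerminalStep seem to expose a group_reward attribute in addition to reward, so I'm guessing (but I'm not sure) that the accumulation should look more like this:

if tracked_agent in decision_steps:
    step = decision_steps[tracked_agent]
    # individual reward plus (I assume?) the shared group reward
    episode_rewards += step.reward + step.group_reward
if tracked_agent in terminal_steps:
    step = terminal_steps[tracked_agent]
    episode_rewards += step.reward + step.group_reward

If that's wrong, what is the correct way to read the group reward from the Python API?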