I am using the 3DBall example environment, but I am getting some strange results that I can't explain. My code so far is just a for loop that prints the reward and fills in the required action inputs with random values. However, when I run it, a negative reward is never shown, and occasionally there are no decision steps at all. That would make sense on its own, but shouldn't the environment just keep simulating until there is a decision step? Any help would be greatly appreciated, since other than the documentation there are few to no resources out there for this.
import random

import numpy
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment()
env.reset()
behavior_names = env.behavior_specs
for step in range(50):
    arr = []
    behavior_names = env.behavior_specs
    for name in behavior_names:
        print(name)
    # get_steps returns a pair; index 0 holds the decision steps
    DecisionSteps = env.get_steps("3DBall?team=0")
    print(DecisionSteps[0].reward, len(DecisionSteps[0].reward))
    # For some reason action_mask is False when DecisionSteps[0].reward
    # is empty, and None when it is not
    print(DecisionSteps[0].action_mask)
    # Build one row of two random action values per agent requesting a decision
    for agent in range(len(DecisionSteps[0])):
        arr.append([])
        for b in range(2):
            arr[-1].append(random.uniform(-10, 10))
    if len(DecisionSteps[0]) != 0:
        env.set_actions("3DBall?team=0", numpy.array(arr))
        env.step()
    else:
        env.step()
env.close()
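
To make the random-input part easier to check on its own, here is a minimal sketch of it without Unity. It assumes two continuous action values per agent in the range -10 to 10, as in the loop above. The helper name `random_actions` and its parameters are made up for illustration and are not part of ML-Agents.

```python
import numpy as np


def random_actions(n_agents, action_size=2, low=-10.0, high=10.0, rng=None):
    """Return an (n_agents, action_size) float32 array of uniform random values.

    Hypothetical helper for illustration only; not an ML-Agents function.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One row per agent, one column per action value
    values = rng.uniform(low, high, size=(n_agents, action_size))
    return values.astype(np.float32)


# Three agents is an arbitrary number chosen for this example
actions = random_actions(3)
print(actions.shape)

# With no agents requesting a decision, the array keeps its second dimension
print(random_actions(0).shape)
```

Building the array in one call means that an empty step still gives shape (0, 2), whereas `numpy.array([])` on an empty list has shape (0,).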