Decisions with ML-Agents in Unity

Question

I am learning to use Unity and ML-Agents. I prepared a training environment in which an agent must recycle objects by type of material (metal, plastic, etc.).

For example: when the agent collides with a plastic bottle and detects that it has a Plastic tag, the object is physically moved to the space reserved for plastic and the agent receives a +1 reward. Metal objects are handled the same way. If the agent collides with a human (Human tag), it receives a punishment of -2.

So far everything works well: once trained with reinforcement learning, the agent correctly fulfills its purpose.

The problem is that internally the code only consists of lines like:

- if the agent collides with a human, it is punished (-2)
- if the agent collides with a wall, it is punished (-1)
- if the agent collides with a plastic object, the object moves to the space for plastic (+1)
- if the agent collides with a metal object, the object moves to the space for metal (+1)

The question: how could I give the agent a "chance" to misclassify an object and learn from that mistake later? Currently it simply follows the "if" statements.

I thought about using a random parameter: when the agent collides with an object, the object is randomly assigned to a recycle point, and if that turns out to be the wrong recycle point, the agent is punished. But can the agent learn from that? It would mean that the agent has "control" over the random parameter during its learning and training.
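One way to picture this idea is to make the bin choice the agent's own action, instead of drawing the recycle point at random inside the environment code, and to reward only the outcome. The sketch below is plain Python, not Unity or ML-Agents code, and everything in it is an assumption made for illustration: the `glass` material, the -1 penalty for a wrong bin, the `train` function, and the `epsilon` and `lr` values. The randomness sits in the exploration step, so the agent does make mistakes early on and can learn from them.

```python
import random

MATERIALS = ["plastic", "metal", "glass"]  # assumed material types
BINS = MATERIALS                            # one recycle point per material


def train(steps, epsilon=0.2, lr=0.1, seed=0):
    """Learn which bin to use for each material from rewards alone."""
    rng = random.Random(seed)
    # value[material][bin]: running estimate of the reward for that choice
    value = {m: {b: 0.0 for b in BINS} for m in MATERIALS}
    mistakes = 0
    for _ in range(steps):
        material = rng.choice(MATERIALS)  # tag of the object the agent hit
        if rng.random() < epsilon:
            chosen = rng.choice(BINS)     # explore: try a random bin
        else:
            # exploit: pick the bin with the best estimate so far
            chosen = max(BINS, key=lambda b: value[material][b])
        # Assumed reward scheme: +1 for the right bin, -1 for the wrong one.
        reward = 1.0 if chosen == material else -1.0
        if reward < 0:
            mistakes += 1
        # Move the estimate a small step towards the observed reward.
        value[material][chosen] += lr * (reward - value[material][chosen])
    # Greedy policy after training: best-known bin for each material.
    policy = {m: max(BINS, key=lambda b: value[m][b]) for m in MATERIALS}
    return policy, mistakes


policy, mistakes = train(steps=3000)
print(policy)
print("mistakes during training:", mistakes)
```

In ML-Agents terms this would roughly correspond to exposing the bin choice as a discrete action and rewarding the agent depending on whether the chosen bin matches the object's tag, rather than moving the object automatically in the collision handler.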

I would appreciate any kind of idea or suggestion.

This question is about [tag:c#], not [tag:unityscript]. – Ruzihm, Jan 26 '21 at 21:38

score 0 · Answer 1 · answered Jun 12 '20 at 19:34

A well-trained agent is exactly what ends up basically just following if statements. One possible solution is to cut off its training early and use that model. That way it won't be perfect, but it will have a general idea of what to do.
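As a rough illustration of the early cut-off idea, the toy sketch below trains a stochastic sorting policy for different numbers of steps and reports how often it would pick the right bin. It is plain Python rather than ML-Agents code, and the `glass` material, the -1 penalty for a wrong bin, the learning rate, and the `train` and `accuracy` helpers are all assumptions made for the example. The update rule is a simple softmax preference update, swapped in here for whatever trainer the real project uses.

```python
import math
import random

MATERIALS = ["plastic", "metal", "glass"]  # assumed material types


def softmax(prefs):
    """Turn a list of preferences into probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps, lr=0.2, seed=0):
    """Train a stochastic sorting policy for a fixed number of steps."""
    rng = random.Random(seed)
    n = len(MATERIALS)
    # prefs[i][j]: preference for sending material i to bin j
    prefs = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)                         # material of the object hit
        probs = softmax(prefs[i])
        j = rng.choices(range(n), weights=probs)[0]  # bin sampled from the policy
        reward = 1.0 if j == i else -1.0             # assumed reward scheme
        for k in range(n):
            # Raise the chosen bin's preference after a reward,
            # lower it after a penalty.
            grad = (1.0 if k == j else 0.0) - probs[k]
            prefs[i][k] += lr * reward * grad
    return prefs


def accuracy(prefs):
    """Average probability of picking the correct bin."""
    return sum(softmax(row)[i] for i, row in enumerate(prefs)) / len(prefs)


for steps in (0, 50, 5000):
    print(steps, round(accuracy(train(steps)), 3))
```

With no training the policy picks bins uniformly at random. A model stopped after a few dozen steps already prefers the right bins but still misclassifies fairly often, which is the "general idea, not perfect" behaviour described above.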

Rocket_noob
