
I am trying to use the TF-Agents library (tf_agents) with a custom environment to build an agent that does the following:

  • The agent serves a number of users (10 users) by giving them resources (12 resources).

  • It receives an input (observation / state) that is a 2D matrix of 0s and 1s; let's say it has 10 rows (one per user) and 3 columns. If any column in a row has the value 1, a resource must be allocated to the user at that row.

Example of observation (state) matrix:

[
[0, 0, 0],
[0, 1, 0],
[1, 0, 0],
...
]
  • This means, for example, that the user at row[1] (user 1) needs a resource.

  • The output is a 2D matrix of 0s and 1s as well, with 12 rows (number of resources) and 10 columns (number of users).

  • If resource 0 is allocated to, say, user 1, then the first row of the output matrix has a 1 in its second column (row[0][1]).

  • Each resource can be allocated to only one user at a time, and each user can be given only one resource at a time. This means each row contains at most one 1, and each column contains at most one 1 across the whole matrix. For example, the first column of the first row and the first column of the second row cannot both be 1; at most one entry in that column may be 1 and all the others must be 0.

Example of the action matrix:

[
[0, 1, 0, 0, ...],
[0, 0, 0, 0, ...],
[1, 0, 0, 0, ...],
[0, 0, 0, 1, ...],
...
]
  • In the example above, the first row has a 1 in the second column (row[0][1]), which means that the second user, i.e. user 1, was allocated resource 0. Checking the input matrix confirms that user 1 (row[1]) did ask for a resource.

  • If a user didn't ask for a resource, then that user shouldn't be given any resources.
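To make the rules above concrete, here is a minimal NumPy sketch that checks whether an action matrix is consistent with an observation matrix. The names NUM_USERS, NUM_RESOURCES, NUM_COLUMNS, requesting_users and is_valid_action are made up for illustration and are not part of TF-Agents. The sketch assumes that a user "asks" whenever any column in their row is 1, and it does not check whether every requesting user actually receives a resource.

```python
import numpy as np

NUM_USERS = 10      # rows of the observation, columns of the action
NUM_RESOURCES = 12  # rows of the action
NUM_COLUMNS = 3     # columns of the observation


def requesting_users(observation):
    """Return a boolean vector that is True for users whose row contains a 1."""
    return np.asarray(observation).any(axis=1)


def is_valid_action(observation, action):
    """Check an action matrix against the allocation rules described above."""
    observation = np.asarray(observation)
    action = np.asarray(action)
    if observation.shape != (NUM_USERS, NUM_COLUMNS):
        return False
    if action.shape != (NUM_RESOURCES, NUM_USERS):
        return False
    # Every entry must be 0 or 1.
    if not np.isin(action, (0, 1)).all():
        return False
    # Each resource (row) goes to at most one user.
    if (action.sum(axis=1) > 1).any():
        return False
    # Each user (column) receives at most one resource.
    per_user = action.sum(axis=0)
    if (per_user > 1).any():
        return False
    # Users who did not ask must not receive anything.
    asked = requesting_users(observation)
    if (per_user[~asked] > 0).any():
        return False
    return True


# Usage: users 1 and 2 ask for a resource; resource 0 goes to user 1.
observation = np.zeros((NUM_USERS, NUM_COLUMNS), dtype=np.int32)
observation[1, 1] = 1
observation[2, 0] = 1
action = np.zeros((NUM_RESOURCES, NUM_USERS), dtype=np.int32)
action[0, 1] = 1
print(is_valid_action(observation, action))
```

A check like this could be used inside the environment's step logic to penalise or reject invalid allocations, but whether that is the right design depends on how the agent is trained.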

I have no idea how to define the observation_spec and action_spec for this custom environment in TF-Agents.

I am also not sure whether TF-Agents is the best approach.

I am new to deep learning, and building the neural network for this custom environment from scratch is very confusing.

Ness