Can we get 'good' values of predefined constants in a cost function using reinforcement learning?

Question

I am new to reinforcement learning and I know the basic theory behind it. However, I could not map the problem to the existing frameworks. The problem is as follows:

Given an environment with resources: X, Y, and Z
Given a set of items I, each with (x, y, z, r), where x, y, and z are the resources required to serve the item and r is the reward the agent receives if the item is served, with (X, Y, Z) >> (x, y, z)
To select the items from the set to serve, I am using a cost function f = ax + by + cz, where a, b, and c are predefined constants.
The items are prioritized for selection based on the ratio r/f
Objective: select items to serve so that the total reward (the sum of r over all selected items) is maximized, subject to each item's requirements x, y, and z and the available resources X, Y, and Z
Problem: how to tune the values of a, b, and c, so that total reward is maximized?
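
The setup described above can be sketched in Python as follows. This is only an illustration: the function name `total_reward`, the tuple layout `(x, y, z, r)`, and the rule that an item which no longer fits is skipped (rather than ending the selection) are assumptions, not details given in the question. It also assumes a, b, c and the resource requirements are positive, so that f > 0.

```python
from typing import List, Tuple

Item = Tuple[float, float, float, float]  # (x, y, z, r), hypothetical layout


def total_reward(items: List[Item], capacity: Tuple[float, float, float],
                 a: float, b: float, c: float) -> float:
    """Serve items in descending r/f order, where f = a*x + b*y + c*z."""
    X, Y, Z = capacity
    ranked = sorted(
        items,
        key=lambda it: it[3] / (a * it[0] + b * it[1] + c * it[2]),
        reverse=True,
    )
    reward = 0.0
    for x, y, z, r in ranked:
        # Assumption: an item that does not fit is skipped and we keep going.
        if x <= X and y <= Y and z <= Z:
            X, Y, Z = X - x, Y - y, Z - z
            reward += r
    return reward


example_items = [(2, 1, 1, 10), (1, 2, 1, 8), (3, 3, 3, 9)]
print(total_reward(example_items, (3, 3, 3), a=1.0, b=1.0, c=1.0))
```

With this function, tuning a, b, and c amounts to finding the arguments that make `total_reward` as large as possible.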

Could you please advise on the following?

a) whether I can use reinforcement learning to find 'good' values for the constants a, b, and c

b) If YES, how can I do that?

c) If NO, any suggestions for appropriate solution approaches?

Thank you.

score 0 · Accepted Answer · answered Aug 20 '21 at 18:48


What you're looking to do is a hyperparameter sweep, not an RL problem. This is at least how I interpret your post.

To do a sweep you have a few options: grid search, random search, or more advanced search methods such as the Asynchronous Successive Halving Algorithm (ASHA). For the same number of trials, grid search is generally worse at finding an optimum than random search, and ASHA is more resource-efficient than random search because it stops unpromising trials early.
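
As a rough illustration of what a random search over a, b, and c could look like without any library, here is a minimal sketch. The names `total_reward` and `random_search`, the sampling range, the trial count, and the skip-if-it-does-not-fit selection rule are all assumptions made for the example, not part of the question.

```python
import random


def total_reward(items, capacity, a, b, c):
    """Serve items in descending r/f order, with f = a*x + b*y + c*z."""
    X, Y, Z = capacity
    ranked = sorted(
        items,
        key=lambda it: it[3] / (a * it[0] + b * it[1] + c * it[2]),
        reverse=True,
    )
    reward = 0.0
    for x, y, z, r in ranked:
        if x <= X and y <= Y and z <= Z:  # assumption: skip items that do not fit
            X, Y, Z = X - x, Y - y, Z - z
            reward += r
    return reward


def random_search(items, capacity, n_trials=200, seed=0):
    """Sample (a, b, c) uniformly and keep the best-scoring triple."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = tuple(rng.uniform(0.001, 1.0) for _ in range(3))
        score = total_reward(items, capacity, *params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


example_items = [(2, 1, 1, 10), (1, 2, 1, 8), (3, 3, 3, 9)]
print(random_search(example_items, (3, 3, 3), n_trials=50))
```

A tuning library replaces the loop in `random_search` with its own samplers and schedulers; the objective function stays the same.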

For an efficient sweep I suggest Ray Tune. There is a really great example of how to use Tune in the PyTorch documentation. It uses ASHA as a simple imported object, so you don't have to implement a distributed sweep yourself.

https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html

answered Aug 20 '21 at 18:48

tnfru


Yes, it worked for me. Thanks! I was wondering if I could use another criterion as follows: **0 < a, b, c < 1** and **a+b+c = 1**. I am using the following search option in RayTune: `config = { "a": tune.uniform(0.001, 1.0), "b": tune.uniform(0.001, 1.0), "c": tune.uniform(0.001, 1.0) }` – Samaresh Bera Aug 28 '21 at 07:31
@SamareshBera If it works please mark the answer as the accepted answer. I don't know how to make that constraint work in Ray so you should create a new question for it. – tnfru Aug 28 '21 at 12:25
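
One possible way to handle the a+b+c = 1 constraint from the comment above, sketched here without relying on any Ray-specific feature, is to keep sampling the three values independently and rescale them inside the objective before computing f. The helper name `normalize` is hypothetical, and whether this fits a given Ray Tune setup is an assumption rather than something confirmed in this thread.

```python
def normalize(a, b, c):
    """Rescale three positive weights so that they sum to 1."""
    total = a + b + c
    if total <= 0:
        raise ValueError("weights must be positive")
    return a / total, b / total, c / total


# Hypothetical sampled values, e.g. one draw from the search space in the comment.
config = {"a": 0.5, "b": 0.5, "c": 1.0}
a, b, c = normalize(config["a"], config["b"], config["c"])
print(a, b, c)
```

Since items are ranked by r/f and f = ax + by + cz, multiplying a, b, and c by the same positive factor leaves the ranking unchanged, so the rescaling does not exclude any ordering. Note that normalized uniform samples are not uniformly distributed over the set of valid (a, b, c) triples.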
