I am new to reinforcement learning. I know the basic theory, but I cannot work out how to map my problem onto an existing RL framework. The problem is as follows:
Given: an environment with three resource capacities X, Y, and Z.
Given: a set of items I, where each item is a tuple (x, y, z, r). Here x, y, and z are the amounts of each resource required to serve the item, and r is the reward the agent receives if the item is served. The capacities are much larger than any single item's requirements: (X, Y, Z) >> (x, y, z).
Selection: to choose which items to serve, I use a cost function f = ax + by + cz, where a, b, and c are predefined constants.
Items are prioritized for selection by the ratio r/f, with a higher ratio served first.
Objective: select items to serve so that the total reward (the sum of r over all selected items) is maximized, subject to the selected items' total x, y, and z not exceeding X, Y, and Z respectively.
Problem: how should I tune the values of a, b, and c so that the total reward is maximized?
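The selection rule described above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual implementation: the function name greedy_select, the tuple layout, the rule of skipping an item that no longer fits and continuing with the next one, and treating f <= 0 as infinite priority are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Item = Tuple[float, float, float, float]  # (x, y, z, r)


def greedy_select(
    items: List[Item],
    capacity: Tuple[float, float, float],
    weights: Tuple[float, float, float],
) -> Tuple[List[int], float]:
    """Serve items in decreasing r/f order while they still fit.

    Returns the indices of the served items and the total reward.
    """
    a, b, c = weights
    remaining_x, remaining_y, remaining_z = capacity

    def priority(item: Item) -> float:
        x, y, z, r = item
        f = a * x + b * y + c * z
        # Assumption: a non-positive cost gets the highest priority.
        return r / f if f > 0 else float("inf")

    # Highest r/f first; sorted() is stable, so ties keep input order.
    order = sorted(range(len(items)), key=lambda i: priority(items[i]), reverse=True)

    served: List[int] = []
    total_reward = 0.0
    for i in order:
        x, y, z, r = items[i]
        # Assumption: an item that does not fit is skipped, not a stop signal.
        if x <= remaining_x and y <= remaining_y and z <= remaining_z:
            remaining_x -= x
            remaining_y -= y
            remaining_z -= z
            served.append(i)
            total_reward += r
    return served, total_reward
```

With a function like this, the tuning question becomes: which (a, b, c) passed as weights gives the largest returned total reward for the item sets and capacities of interest? Different weights can change the serving order and therefore the total, which is what makes the constants worth tuning.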
Could you please advise on the following?
a) Can reinforcement learning be used to find good values for the constants a, b, and c?
b) If yes, how would I do that?
c) If not, what solution approaches would be more appropriate?
Thank you.