Let's say we have an algorithm that, given a dataset point, runs some analysis on it and returns the results. The algorithm has a user-defined parameter X that affects its run-time (the result is always the same for the same input point). We also already know that there is a relationship between a dataset point and the parameter X: for instance, if two dataset points are close to each other, the best value of X for them will also be the same.
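To make the setup concrete, here is a toy stand-in for what I mean (`analyze` and `run_analysis` are made-up placeholders, not my real code; only the run-time, not the result, depends on X):

```python
import time

def analyze(point, x):
    # Placeholder for the real analysis: the returned result depends
    # only on `point`, while the amount of work (and hence the
    # run-time) also depends on the parameter x.
    total = 0.0
    for _ in range(int(1e5 * (1 + x))):
        total += 1.0
    return sum(point)

def run_analysis(point, x):
    # Time one call so the execution time can be turned into a reward.
    start = time.perf_counter()
    result = analyze(point, x)
    return result, time.perf_counter() - start
```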
Can we say that in this example we have the following, and can therefore use Q-Learning to find the best parameter X for any given dataset point? (A rough sketch of this setup follows the list.)
- Initial state: the dataset point and the current value of X (X = 0 in the initial state)
- Terminal state: the dataset point and the current value of X (the value chosen by the action)
- Actions: the different values that X can take
- Reward: +1 if the execution time decreases, -1 if it increases, 0 if it stays the same
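Here is a rough sketch of how I picture that formulation, assuming the dataset points can be discretized into hashable keys and X only takes a few discrete values (`X_VALUES`, `choose_action`, and `update` are my own placeholder names):

```python
import random

X_VALUES = [0, 1, 2, 3]        # candidate values of X (the actions)
ALPHA, EPSILON = 0.1, 0.2      # learning rate and exploration rate

q_table = {}                   # (point_key, action index) -> Q-value

def choose_action(point_key):
    # Epsilon-greedy: explore randomly, otherwise pick the action
    # with the highest current Q-value for this point.
    if random.random() < EPSILON:
        return random.randrange(len(X_VALUES))
    return max(range(len(X_VALUES)),
               key=lambda a: q_table.get((point_key, a), 0.0))

def update(point_key, action, reward):
    # The episode ends right after choosing X, so the target is just
    # the reward: the discounted max-Q term over successor states
    # vanishes in a one-step episode.
    old = q_table.get((point_key, action), 0.0)
    q_table[(point_key, action)] = old + ALPHA * (reward - old)
```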
Is it correct to define the different input dataset points as episodes and the different values of X as the steps within each episode (where at each step an action is chosen either randomly or via the network, i.e. epsilon-greedy)? In that case, what would the input to the neural network be?
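My current guess is that the input would be the dataset point's features concatenated with the current value of X, with one output Q-value per candidate value of X; a minimal sketch in PyTorch (all sizes and names here are assumptions):

```python
import torch
import torch.nn as nn

POINT_DIM, N_ACTIONS = 8, 4    # assumed feature size / number of X values

# Q-network: input = [point features, current X],
# output = one Q-value per candidate value of X.
q_net = nn.Sequential(
    nn.Linear(POINT_DIM + 1, 64),
    nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

point = torch.randn(1, POINT_DIM)     # one dataset point (batch of 1)
current_x = torch.zeros(1, 1)         # X = 0 in the initial state
q_values = q_net(torch.cat([point, current_x], dim=1))
best_action = q_values.argmax(dim=1)  # greedy choice of the next X
```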
All of the examples and implementations I've seen so far contain many states, where each state depends on the previous one, so I'm confused by my scenario, in which there are only two states (initial and terminal).