
I'm training a deep Q-network to trade stocks. It has two possible actions: 0 = wait; 1 = buy a share if none is held, sell the held share if one is. As input it gets the price the stock was bought at, the current price of the stock, and the prices at the previous 5 time steps relative to the current price. So something like

[5.78, 5.93, -0.1, -0.2, -0.4, -0.5, -0.3]

The reward for a sale is simply the difference between the sale price and the purchase price. The reward for any other action is 0, though I've tried making it negative, among other things, without results.
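For concreteness, a minimal sketch of the state and reward construction described above (the names here are hypothetical, not my actual code):

    import numpy as np

    def make_state(prices, t, buy_price):
        # prices: np.ndarray of historical prices
        # State: [purchase price, current price,
        #         previous 5 prices relative to the current price]
        current = prices[t]
        history = prices[t - 5:t] - current
        return np.concatenate(([buy_price, current], history))

    def compute_reward(action, holding, buy_price, current_price):
        # Selling yields sale price minus purchase price; everything else is 0
        if action == 1 and holding:
            return current_price - buy_price
        return 0.0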

Simple, right? Unfortunately, the agent always converges on taking the "0" action, even when I magnify the reward for selling at a profit or try any number of other things. I'm really pulling my hair out; is there something obvious I've missed?

RichKat
    Could you please include your code in your question? It is difficult to answer in the abstract. – mozart_kv467 May 24 '20 at 15:27
  • What is the precise definition of your reward function? Also, adding some code could help. – a_guest May 24 '20 at 15:33
  • https://www.youtube.com/watch?v=6DGNZnfKYnU – Samwise May 24 '20 at 15:57
  • Added reward function definition. I could add code, but what pieces of code? There's a good few hundred lines of "relevant" code, I don't want to just copy-paste the whole thing and ask you to figure it out. – RichKat May 24 '20 at 16:18

1 Answer


Although something was probably also broken in the agent itself, the second agent I wrote exhibited similar behavior. I finally solved the issue by decreasing the learning rate; in the end it had to be about a thousand times lower than it originally was.
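For reference, the change amounts to something like the following (a PyTorch sketch with illustrative values, not the exact setup):

    import torch

    # Hypothetical network: 7 inputs (buy price, current price, 5 relative
    # prices) and 2 outputs (Q-values for wait and buy/sell)
    q_net = torch.nn.Sequential(
        torch.nn.Linear(7, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 2),
    )

    # The learning rate had to drop by roughly three orders of magnitude,
    # e.g. from 1e-3 down to 1e-6, before the agent stopped collapsing
    # onto the "wait" action
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-6)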

RichKat