
I have implemented a reinforcement learning model with a DQN approach that is supposed to make purchase decisions based on stock prices.

For training I use two stock price series: one with an upward trend and one with a downward trend. Each covers a period of one year (100,000 data points).

As the observation I use the price data of the last 1,000 data points.

For training I first collect 100 episodes (one episode is one complete run through a price series, with the series (upward or downward trend) chosen at random). Per episode I get about 1,000 actions (buy, sell, skip).

Then I train on the collected transitions with a batch size of 64.
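For reference, my collection and sampling setup looks roughly like this (a simplified sketch, not my actual code: the series are synthetic stand-ins, the policy is random, and names like `run_episode` are made up for illustration):

```python
import random
from collections import deque

import numpy as np

WINDOW = 1000      # observation: price data of the last 1,000 points
ACTIONS = 3        # 0 = buy, 1 = sell, 2 = skip
BATCH_SIZE = 64

def run_episode(prices, policy, buffer):
    """One complete run through a price series, storing (s, a, r, s') transitions."""
    for t in range(WINDOW, len(prices) - 1):
        obs = prices[t - WINDOW:t]
        action = policy(obs)
        # toy stand-in for the per-trade profit/loss reward
        reward = (prices[t + 1] - prices[t]) * (1 if action == 0 else -1 if action == 1 else 0)
        next_obs = prices[t - WINDOW + 1:t + 1]
        buffer.append((obs, action, reward, next_obs))

# two synthetic series standing in for the real data (much shorter here)
rng = np.random.default_rng(0)
up = np.cumsum(rng.normal(0.01, 1.0, 2000)) + 100
down = np.cumsum(rng.normal(-0.01, 1.0, 2000)) + 100

buffer = deque(maxlen=100_000)
random_policy = lambda obs: random.randrange(ACTIONS)

for episode in range(4):                  # 100 episodes in the real run
    prices = random.choice([up, down])    # series chosen at random per episode
    run_episode(prices, random_policy, buffer)

batch = random.sample(buffer, BATCH_SIZE)  # minibatch for one training step
```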

The problem is that the model specializes in one of the two price series and achieves a good reward there. On the other series, however, it performs very badly and I get a negative reward.

It seems that the model does not try to optimize the average profit across all episodes (upward and downward trend).

As the reward I simply take the money I make per trade, as profit or loss. The discount factor I have set to 1.0.
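In other words, the TD target I train against should correspond to something like this (a simplified sketch with toy numbers, not my actual code; `td_targets` is a made-up name):

```python
import numpy as np

GAMMA = 1.0  # my discount factor: future rewards are not discounted at all

def td_targets(rewards, next_q_values, dones):
    """Standard DQN target r + gamma * max_a' Q(s', a'), zeroed at episode end."""
    return rewards + GAMMA * next_q_values.max(axis=1) * (1.0 - dones)

# toy example: per-trade profit/loss as the reward
rewards = np.array([2.5, -1.0, 0.0])
next_q = np.array([[0.5, 1.0, 0.2],
                   [0.0, -0.3, 0.1],
                   [0.4, 0.4, 0.4]])
dones = np.array([0.0, 0.0, 1.0])  # last transition ends the episode
targets = td_targets(rewards, next_q, dones)  # values 3.5, -0.9, 0.0
```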

Does anyone have an idea what the problem could be?

