
I am developing a simple game to demonstrate Q-learning with linear function approximation.

In this game the state space is far too large to enumerate. I have to consider many factors, such as the player's position and speed and the positions of the enemies (there are 12 to 15 enemy objects). So I switched my algorithm from a lookup table to linear function approximation.

I chose around 20 to 22 features (a constant, the player's position, the player's speed, and the positions of all the enemies), and there is …

After implementing the algorithm, I got stuck on a problem.

The weight values overflow within a few seconds of starting the program. I found that I hadn't normalized the features or the weights.

Normalizing the feature values was easy because each feature has known bounds. However, normalizing only the features wasn't enough: the weights still overflow.

My question is: how do I normalize my weights?

Below is my code for normalizing the features.

    // f is the feature vector; each entry is min-max normalized
    f[0] = 1; // constant (bias) feature
    f[1] = this.getNormMinMax(this.player.x, 0, cc.winSize.width);
    f[2] = this.getNormMinMax(this.player.vel, -80, 80);

    // two features (x, y) per enemy object in pooList
    for (var i = 0; i < pooList.length; ++i)
    {
        f[3 + 2*i]     = this.getNormMinMax(pooList[i].x, 0, cc.winSize.width);
        f[3 + 2*i + 1] = this.getNormMinMax(pooList[i].y, 0, cc.winSize.height*3);
    }
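The `getNormMinMax` helper is not shown in the question. Below is a minimal standalone sketch, assuming it is plain min-max scaling to [0, 1]; the clamping step is an added assumption so that a value slightly outside the stated bounds cannot produce a feature outside [0, 1].

```javascript
// Hypothetical standalone version of getNormMinMax (the original is a method
// whose body is not shown). Assumes plain min-max scaling to [0, 1].
function getNormMinMax(value, min, max) {
    // Clamp first so an out-of-range reading stays inside [min, max].
    var clamped = Math.min(Math.max(value, min), max);
    return (clamped - min) / (max - min);
}

// Usage, assuming a 480-pixel-wide window and the velocity bounds above:
console.log(getNormMinMax(240, 0, 480));  // 0.5
console.log(getNormMinMax(-80, -80, 80)); // 0
console.log(getNormMinMax(100, -80, 80)); // 1 (clamped to the upper bound)
```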

And the code below updates the weights without any normalization.

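    // Q-learning update for the weights of the action taken, assuming
    // maxAction holds max over a' of Q(s', a') and this.updateQSA holds Q(s, a):
    // w_i <- w_i + learningRate * (reward + discountFactor * maxAction - Q(s, a)) * f_i
    for (var i = 0; i < this.featureSize; ++i)
    {
        var w = this.weightArray[this.doAction][i];
        this.weightArray[this.doAction][i] =
            w + this.learningRate * (this.reward + this.discountFactor * maxAction - this.updateQSA) * f[i];
    }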
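For reference, here is a self-contained sketch of the same step that also shows where `maxAction` and `updateQSA` would come from under linear function approximation: Q(s, a) is the dot product of that action's weight vector with the feature vector. The names `qValue`, `qLearningStep`, `alpha`, and `gamma` are hypothetical, and terminal states (where the bootstrap term should be dropped) are not handled.

```javascript
// Hypothetical sketch; names are illustrative, not from the original game code.

// Q(s, a) = w_a . f(s): dot product of the action's weights and the features.
function qValue(weights, action, features) {
    var sum = 0;
    for (var i = 0; i < features.length; ++i) {
        sum += weights[action][i] * features[i];
    }
    return sum;
}

// Applies w_a <- w_a + alpha * delta * f(s) for the action taken; returns delta.
function qLearningStep(weights, action, features, reward, nextFeatures, alpha, gamma) {
    // Plays the role of maxAction: the best Q-value from the next state.
    var maxNext = -Infinity;
    for (var a = 0; a < weights.length; ++a) {
        maxNext = Math.max(maxNext, qValue(weights, a, nextFeatures));
    }
    // TD error; qValue(weights, action, features) plays the role of updateQSA.
    var delta = reward + gamma * maxNext - qValue(weights, action, features);
    for (var i = 0; i < features.length; ++i) {
        weights[action][i] += alpha * delta * features[i];
    }
    return delta;
}

// Usage: 2 actions, 3 features, all weights starting at zero.
var weights = [[0, 0, 0], [0, 0, 0]];
var delta = qLearningStep(weights, 0, [1, 0.5, 0.25], 1, [1, 0.5, 0.5], 0.1, 0.9);
// delta is 1; weights[0] becomes [0.1, 0.05, 0.025]
```

One observation on scale: with 20 or more features that can each be close to 1, the squared feature norm can be around 20, so each update moves Q(s, a) roughly that many times further than it would with a single feature. Dividing the learning rate by the number of features (or by the squared feature norm) is a common heuristic to compensate, though it does not by itself guarantee convergence.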
Juho Sung
  • You have to be careful when using function approximation for Q-learning, because even with linear function approximation, off-policy learning is not guaranteed to converge, which could explain why your weights blow up – goh Oct 26 '17 at 08:56
  • @goh do you think using a DNN could guarantee better convergence in such cases? I experienced the same problem with a simple linear function approximation – AleB Jan 08 '20 at 14:18

1 Answer


It seems you're using linear regression without regularization, and some of your features are collinear. Try adding L1 or L2 regularization (use Ridge, Lasso or Elastic Net models).
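In an online update like the one in the question, L2 (ridge-style) regularization becomes a weight-decay term: each step also shrinks every weight toward zero. A minimal sketch, assuming `delta` is the TD error computed as in the question and `lambda` is a hypothetical regularization strength (with `lambda = 0` it reduces to the original update). Whether to exempt the constant feature f[0] from decay is a design choice left out here.

```javascript
// Hypothetical sketch: L2-regularized (weight-decay) version of the update.
// w: weights of the action taken; features: f(s); delta: TD error;
// alpha: learning rate; lambda: assumed L2 regularization strength.
function regularizedUpdate(w, features, delta, alpha, lambda) {
    for (var i = 0; i < w.length; ++i) {
        // Gradient of 0.5 * lambda * w_i^2 is lambda * w_i, so it pulls w_i toward 0.
        w[i] += alpha * (delta * features[i] - lambda * w[i]);
    }
    return w;
}

// Usage: with a zero TD error, each weight shrinks by a factor (1 - alpha * lambda).
var w = regularizedUpdate([1, 2], [1, 0], 0, 0.5, 0.1);
```

Regularization bounds how large the weights can grow for a given TD error, but, as the comment above notes, off-policy Q-learning with function approximation is still not guaranteed to converge.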

Mikhail Korobov