I am developing a simple game to demonstrate Q-learning with linear function approximation. (screenshot)
In this game the state space is effectively continuous: I have to account for many factors such as the player's position, the player's speed, and each enemy's position (there are 12–15 enemy objects). I ended up switching my algorithm from a Q-table to linear function approximation.
I settled on roughly 20–22 features (a constant bias term, the player's position, the player's speed, and every enemy's position).
After implementing that algorithm, I ran into a problem.
The weight values overflow within a few seconds of running the program. I realized I hadn't normalized the features or the weights.
Normalizing the feature values was easy because each feature has known bounds. However, normalizing only the features wasn't enough: the weights still overflow.
My question is: how do I normalize my weights?
Below is the code I use to normalize the features.
// f is the feature vector; f[0] is a constant bias term
f[0] = 1;
f[1] = this.getNormMinMax(this.player.x, 0, cc.winSize.width);
f[2] = this.getNormMinMax(this.player.vel, -80, 80);
// two features per enemy: x and y position
for (var i = 0; i < pooList.length; ++i)
{
    f[3 + 2*i]     = this.getNormMinMax(pooList[i].x, 0, cc.winSize.width);
    f[3 + 2*i + 1] = this.getNormMinMax(pooList[i].y, 0, cc.winSize.height * 3);
}
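For reference, here is a minimal sketch of what a `getNormMinMax` helper like the one called above might look like, assuming plain min-max scaling into [0, 1] (the original implementation isn't shown, so this is an assumption):

```javascript
// Hypothetical sketch of a min-max normalizer (not the original code):
// maps value from [min, max] into [0, 1], clamping out-of-range inputs
// so a feature can never leave its bounds.
function getNormMinMax(value, min, max) {
    var clamped = Math.min(Math.max(value, min), max);
    return (clamped - min) / (max - min);
}
```

With bounded features like this, any remaining blow-up has to come from the weight update itself rather than from the feature values.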
And the code below updates the weights without any normalization.
for (var i = 0; i < this.featureSize; ++i)
{
    var w = this.weightArray[this.doAction][i];
    // Q-learning update: w += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) * f[i]
    this.weightArray[this.doAction][i] =
        w + this.learningRate * (this.reward + this.discountFactor * maxAction - this.updateQSA) * f[i];
}