I am looking for a way to train a DNNClassifier (4 classes, 20 numeric features) from a data file of imbalanced samples that each carry a reward. Each class represents a game action, and the reward is the score of that action. The features are the given observations. So it looks like a Q-learning problem... but, as far as I know, Q-learning is an online training method that does not use a stored data file.
I tried to handle this with sample weights, using the following formula:
weight = ((reward-minreward)/(maxreward-minreward))*(totalsamples/classsamples)
With 180k samples the accuracy is poor; with 490k samples it reaches 83%, which is still not good enough.
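To make the formula concrete, here is a minimal NumPy sketch of how the weights can be computed. The function name `sample_weights` and the argument names `rewards`, `labels` and `num_classes` are hypothetical, chosen only for illustration; the guard for the case where all rewards are equal is an added assumption, since the formula itself would divide by zero there.

```python
import numpy as np

def sample_weights(rewards, labels, num_classes=4):
    """Per-sample weight = min-max normalised reward * inverse class frequency."""
    rewards = np.asarray(rewards, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)

    # (reward - minreward) / (maxreward - minreward)
    span = rewards.max() - rewards.min()
    if span == 0:
        # Assumption: if every reward is identical, treat them all as 1.0
        norm_reward = np.ones_like(rewards)
    else:
        norm_reward = (rewards - rewards.min()) / span

    # totalsamples / classsamples, looked up per sample
    class_counts = np.bincount(labels, minlength=num_classes)
    class_factor = len(labels) / class_counts[labels]

    return norm_reward * class_factor

# Tiny illustration: class 0 has three samples, class 1 has one
weights = sample_weights(rewards=[0, 5, 10, 10], labels=[0, 0, 0, 1])
print(weights)
```

One property worth noting: with min-max normalisation, every sample whose reward equals the minimum gets a weight of exactly 0, so it has no influence on training at all. If DNNClassifier here is `tf.estimator.DNNClassifier`, the resulting array would typically be supplied as an extra feature and referenced through the `weight_column` argument.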
So what is the best way to approach this:
- keep using weights as I did, but with more samples or a different formula
- use a Q-learning algorithm (but I don't know how to do that)
- use a Learning to Rank algorithm (I did not find any good, complete tutorial)
Thanks for any answers.