I want to build an RL agent that can decide whether a handwritten word was written by the legitimate user. The plan is as follows:

Suppose I have written a word 10 times and extracted some geometrical properties from each sample to use as features. I then train an RL agent to make its decision based on the differences between the geometrical properties of a new sample and those of the 10 reference samples. The agent receives a reward for a correct identification and zero or a negative reward for an incorrect one.

Am I going in the right direction, or am I missing something vital? Is it possible to train the agent with only 10 samples? As a new student of RL, I am also confused about its use cases: is it best suited to game playing and robotics problems, or is it also suitable for making predictions from training data?

EMT

1 Answer

Reinforcement learning is used for problems that unfold over time. If you were following the stroke of the pen over time to work out which way it was going, that would be more in reinforcement learning's wheelhouse. The time dimension (or, more generally, a sequence of states) is why it is used in games like StarCraft II.
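
One way to make this concrete, assuming the reward scheme from the question (1 for a correct decision, 0 otherwise) and a single decision per sample. The symbols $x$ (features), $y$ (true label), $a$ (decision) and $\pi$ (policy) are introduced here for illustration only:

$$J(\pi) = \mathbb{E}_{(x,y)}\Big[\sum_{a} \pi(a \mid x)\,\mathbf{1}[a = y]\Big] = \Pr(a = y)$$

Because each episode lasts one step and the decision does not influence which sample comes next, maximizing the expected reward is the same as maximizing classification accuracy, which is an ordinary supervised learning objective.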

You are talking about taking a picture of the written text and classifying it into a boolean (genuine or not). For that you are looking at convolutional neural networks (CNNs), which are well suited to images.

In the long run, though, your classifier may not be able to tell genuine from forged. With GANs (Generative Adversarial Networks), a generator can be trained against your discriminator until it works out the pattern the discriminator is looking for and fools it. Still, this sounds like a good homework problem.

Back2Basics
  • Actually, I am following the strokes of the pen over time. So I will collect time-series data and extract the speed, the variation of speed, and the coordinates of the pen strokes. In that case, is it possible to use reinforcement learning? I am stuck on the preliminary concept of how to set the reward and make predictions. I am also unsure whether it can be done with only 10 samples. – EMT Apr 30 '20 at 08:13
  • What I think will happen is that it will memorize those 10 samples and only those 10: a new sample will match only if it falls within the range they cover, and otherwise it won't. Be careful with small samples. – Back2Basics Apr 30 '20 at 09:04
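
The features mentioned in the comments above (speed and variation of speed from pen samples) and the small-sample concern can be sketched without RL at all. The following is a minimal illustration, not the asker's or the answerer's method: it assumes pen data arrives as `(t, x, y)` tuples, and the function names, the z-score distance, and the `margin` parameter are all hypothetical choices made for this example.

```python
import math
import statistics


def stroke_features(points):
    """Turn pen samples [(t, x, y), ...] (t strictly increasing) into
    (mean speed, standard deviation of speed, total path length)."""
    speeds = []
    length = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        length += d
        speeds.append(d / (t1 - t0))
    return (statistics.fmean(speeds), statistics.pstdev(speeds), length)


def distance_to_references(feat, refs):
    """Mean absolute z-score of feat relative to the reference feature vectors."""
    total = 0.0
    for i, value in enumerate(feat):
        column = [r[i] for r in refs]
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column) or 1.0  # guard against zero spread
        total += abs(value - mu) / sigma
    return total / len(feat)


def leave_one_out_threshold(refs, margin=1.5):
    """Assumed rule: score each genuine sample against the other references,
    then accept anything within margin times the worst genuine score."""
    scores = [
        distance_to_references(r, refs[:i] + refs[i + 1:])
        for i, r in enumerate(refs)
    ]
    return margin * max(scores)


def is_genuine(feat, refs, threshold):
    """Accept a new sample if it is no further from the references than the threshold."""
    return distance_to_references(feat, refs) <= threshold
```

With only 10 references, the threshold is set by how much those 10 differ from each other, so a genuine sample written in an unusual mood or posture can easily fall outside it. That is the memorization risk in concrete form, and collecting more reference samples is the most direct remedy.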