
Imagine a binary classification problem like sentiment analysis. Since we have the labels, can't we use the gap between the actual and predicted labels as the reward for RL?

I wish to try Reinforcement Learning for Classification Problems
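Concretely, here is a rough sketch of what I have in mind (made-up toy data, a logistic policy updated with plain REINFORCE; the reward is +1 when the sampled label matches the true label and -1 otherwise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for sentiment features/labels.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w, b, lr = np.zeros(2), 0.0, 0.1

def p_positive(x):
    """Policy: probability of predicting label 1 for input x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

for epoch in range(50):
    for x, label in zip(X, y):
        p = p_positive(x)
        action = int(rng.random() < p)              # sampled predicted label (the "action")
        reward = 1.0 if action == label else -1.0   # agreement between actual and predicted

        # REINFORCE: step along grad log pi(action | x), scaled by the reward.
        grad_z = (1.0 - p) if action == 1 else -p
        w += lr * reward * grad_z * x
        b += lr * reward * grad_z

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print("training accuracy:", (preds == y).mean())
```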

Anuj Gupta
    What is the point of using RL for classification problems? I mean, do you expect any improvement or advantage? As stated in this question, in general the performance should be worse (or more computationally expensive): https://stackoverflow.com/questions/44594007 – Pablo EM Jun 21 '17 at 07:14

1 Answer


Interesting thought! To my knowledge, it can be done.

  1. Imitation Learning - At a high level, you observe sample trajectories performed by an expert in the environment and use them to predict the policy for a given state configuration. I prefer Probabilistic Graphical Models for the prediction step since they give the model more interpretability. I have implemented a similar algorithm from this research paper: http://homes.soic.indiana.edu/natarasr/Papers/ijcai11_imitation_learning.pdf (a toy sketch follows after this list).

  2. Inverse Reinforcement Learning - A related method, developed by Andrew Ng at Stanford, for recovering the reward function from sample trajectories; the recovered reward function can then be used to characterise the desired behaviour. http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
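To make item 1 concrete, here is a toy behavioural-cloning sketch: the demonstration trajectories are flattened into (state, action) pairs and the policy is learned as a straight supervised prediction of the expert's action. The data is synthetic, and I substitute plain logistic regression for the probabilistic graphical model used in the paper, just to show the shape of the problem:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Pretend these came from expert trajectories: each row is a state,
# and each entry of `actions` is the action the expert took in that state.
states = rng.normal(size=(500, 4))
actions = (states @ np.array([1.0, -0.5, 0.2, 0.0]) > 0).astype(int)

# "Learning the policy" reduces to supervised prediction of the expert's action.
policy = LogisticRegression().fit(states, actions)

# The learned policy can now be queried for new states.
new_state = rng.normal(size=(1, 4))
print("predicted action:", policy.predict(new_state)[0])
print("action probabilities:", policy.predict_proba(new_state)[0])
```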

philipvr
vikky 2405