
I recently stumbled across this article, and I was wondering about the difference between the results you would get from a recurrent neural net, like the ones described above, and from a simple Markov chain.

I don't really understand the linear algebra happening under the hood in an RNN, but it seems that you are basically just designing a super convoluted way of building a statistical model of what the next letter is going to be, based on the previous letters, something that a Markov chain does very simply.
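To make that concrete, here is roughly what I mean; a minimal order-k character model in Python (the function names and the default k=3 are just my own illustration):

```python
from collections import defaultdict, Counter
import random

def train(text, k=3):
    # Map each k-character context to counts of the characters that follow it.
    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1
    return model

def generate(model, k, seed, length=200):
    # seed must be at least k characters long.
    out = seed
    for _ in range(length):
        counts = model.get(out[-k:])
        if not counts:
            break  # unseen context: nothing to sample from
        chars, weights = zip(*counts.items())
        out += random.choices(chars, weights=weights)[0]
    return out
```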

Why are RNNs interesting? Is it just because they are a more generalizable solution, or is there something happening that I am missing?


1 Answer


A Markov chain assumes the Markov property: it is "memoryless". The probability of the next symbol is computed from only the k previous symbols. In practice, k is limited to small values (say, 3 to 5), because the transition table grows exponentially with k. As a result, sentences generated by such a Markov model are very incoherent.
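To see the blow-up, a quick back-of-the-envelope check (assuming a 27-symbol alphabet of 26 letters plus space, which is my own illustrative choice):

```python
# Possible contexts for an order-k character model over a 27-symbol
# alphabet: 27**k of them, each needing its own row of transition counts.
for k in range(1, 8):
    print(k, 27 ** k)
# k = 3 gives ~2e4 contexts; k = 7 already gives ~1e10, far more
# contexts than any training corpus can cover.
```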

RNNs (e.g. with LSTM units), on the other hand, are not bound by the Markov property. Their rich internal state allows them to keep track of long-distance dependencies.
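For comparison, here is a minimal sketch of such a character-level LSTM in PyTorch (the class name and hyperparameters are arbitrary choices of mine, not the architecture from the blog post):

```python
import torch.nn as nn

class CharLSTM(nn.Module):
    """Predicts the next character from the whole history so far,
    carried in the LSTM state instead of a fixed k-character window."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: LongTensor of character ids, shape (batch, seq_len)
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state  # next-character logits + carried state
```

The hidden state is a fixed-size vector, so the memory cost stays constant no matter how long the history is, whereas the Markov transition table above grows exponentially with k.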

Karpathy's blog post shows C source code generated character by character by an RNN. The model impressively captures dependencies such as matching opening and closing brackets.
