Word prediction : neural net versus n-gram approach

Question

For example, if I want to predict the next word in a sentence, I can use a bigram approach and compute the probability of each word occurring given the previous word in the corpus.
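To make the bigram approach concrete, here is a minimal sketch. It assumes whitespace tokenization and plain, unsmoothed counts; the toy corpus and the function names (`train_bigram`, `next_word_probs`, `predict_next`) are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next | prev) from raw counts."""
    followers = counts.get(prev)
    if not followers:
        return {}  # previous word never seen: no estimate without smoothing
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()}


def predict_next(counts, prev):
    """Return the most probable next word, or None for an unseen word."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


# Hypothetical toy corpus, only for illustration.
corpus = "the cat sat on the mat and the cat slept".split()
bigram_model = train_bigram(corpus)
print(predict_next(bigram_model, "the"))
```

In this sketch, "training" is a single pass of counting and prediction is a dictionary lookup followed by a normalization.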

Alternatively, I could use a neural net to predict the next word. The training data consists of word pairs, where each pair contains the current word and the next word in the corpus. The input to the net is a vectorized representation of the current word, and the target output is a vectorized representation of the next word in the corpus.
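A rough sketch of the neural setup described above, in plain Python so it has no dependencies. It makes several assumptions that are not in the question: the input word is mapped to a small learned vector, the output is a softmax over the vocabulary with the next word as the target, and training is plain stochastic gradient descent on cross-entropy. The class name `NextWordNet`, the toy corpus, and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class NextWordNet:
    """Tiny net: word -> embedding (E) -> linear layer (W) -> softmax."""

    def __init__(self, tokens, dim=8, seed=0):
        rng = random.Random(seed)
        self.vocab = sorted(set(tokens))
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.dim = dim
        size = len(self.vocab)
        # Small random initial weights (an arbitrary choice for this sketch).
        self.E = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(size)]
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(dim)]

    def probs(self, word):
        """Forward pass: distribution over the next word given `word`."""
        h = self.E[self.index[word]]
        logits = [sum(h[k] * self.W[k][j] for k in range(self.dim))
                  for j in range(len(self.vocab))]
        return softmax(logits)

    def train(self, tokens, epochs=500, lr=0.1):
        """SGD on cross-entropy over (current word, next word) pairs."""
        pairs = list(zip(tokens, tokens[1:]))
        for _ in range(epochs):
            for current, nxt in pairs:
                h = self.E[self.index[current]]  # alias: updated in place
                grad = self.probs(current)       # d loss / d logits = p - onehot
                grad[self.index[nxt]] -= 1.0
                # Gradient w.r.t. the embedding, computed before W changes.
                grad_h = [sum(self.W[k][j] * grad[j] for j in range(len(grad)))
                          for k in range(self.dim)]
                for k in range(self.dim):
                    for j in range(len(grad)):
                        self.W[k][j] -= lr * h[k] * grad[j]
                    h[k] -= lr * grad_h[k]

    def predict(self, word):
        """Most probable next word under the trained net."""
        p = self.probs(word)
        return self.vocab[p.index(max(p))]


# Hypothetical toy corpus, only for illustration.
corpus = "the cat sat on the mat and the cat slept".split()
net = NextWordNet(corpus)
net.train(corpus)
print(net.predict("the"))
```

With a one-word context like this, the net is fitting the same conditional distribution that bigram counts estimate directly, so any advantage would have to come from longer contexts or from generalizing across similar words.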

I expect the neural net to perform better, but I'm not sure why.

When is it better to use a neural net rather than a classical approach, in this case an n-gram model? Apologies if this question is ambiguous.

Maybe the answer is trial and error: check which model runs faster and makes better predictions?

The neural net will perform better, as making a prediction is just a vector multiplication, whereas predicting with an n-gram model requires a probability calculation.

Something in the line of https://arxiv.org/abs/1606.07470 or https://arxiv.org/abs/1608.04631 ? — alvas, Sep 28 '16 at 16:54

score 3 · Accepted Answer · answered Sep 27 '16 at 19:53

The answer to your question depends on the specific data that you have. As you say, n-gram models estimate probabilities by counting how often each possible bigram is observed. This is a very efficient way to make use of the data, especially when you don't have a lot of text to train on. N-gram models can easily beat neural network models on small datasets.

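Neural networks have a few strengths that n-gram models don't have. They can leverage longer word histories, assuming the use of a recurrent neural network. They can also share parameters across similar n-grams, so what is learned from one context can carry over to similar contexts rather than each n-gram being counted separately.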
