I would recommend using Markov chains: https://en.wikipedia.org/wiki/Markov_chain
Here is a very trivial example for your reference.
Assume that you are going to analyze 1-grams.
Analyzed texts:
monkey eats banana
dog eats bone
Unigrams: monkey, eats, banana, dog, bone, BEGIN, END.
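A minimal Python sketch of collecting these states, assuming the two example sentences above (the variable names are just placeholders); BEGIN and END are artificial markers wrapped around every sentence:

```python
texts = ["monkey eats banana", "dog eats bone"]

# Collect the unigram states; BEGIN and END are added as artificial markers.
unigrams = {"BEGIN", "END"}
for sentence in texts:
    unigrams.update(sentence.split())

print(sorted(unigrams))
# ['BEGIN', 'END', 'banana', 'bone', 'dog', 'eats', 'monkey']
```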
Each sentence starts with BEGIN.
Two transitions are possible:
BEGIN->monkey
BEGIN->dog
This means there is a 50% chance that a sentence will begin with monkey and a 50% chance it will begin with dog.
After monkey there is a 100% chance of the transition monkey->eats (because there was no other monkey->* transition in the analyzed texts).
After eats there is a 50% chance of banana and a 50% chance of bone.
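A sketch of how these transition probabilities could be computed, again assuming the two example sentences (the names texts, counts and probs are just placeholders):

```python
from collections import Counter, defaultdict

texts = ["monkey eats banana", "dog eats bone"]

# Count every word -> next-word transition, padding each sentence
# with the artificial BEGIN and END markers.
counts = defaultdict(Counter)
for sentence in texts:
    words = ["BEGIN"] + sentence.split() + ["END"]
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1

# Turn counts into probabilities, e.g. BEGIN -> {monkey: 0.5, dog: 0.5}.
probs = {
    word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
    for word, followers in counts.items()
}

print(probs["BEGIN"])   # {'monkey': 0.5, 'dog': 0.5}
print(probs["monkey"])  # {'eats': 1.0}
print(probs["eats"])    # {'banana': 0.5, 'bone': 0.5}
```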
So in general, with this model we can generate the following sentences:
monkey eats banana
monkey eats bone
dog eats bone
dog eats banana
Each of those has a 25% chance of being produced.
Note that bone and banana always transition into END.
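Generating a sentence is then just a random walk over that table. A sketch, reusing the probs dictionary from the previous snippet:

```python
import random

def generate(probs):
    # Start at BEGIN, sample the next word from the current word's
    # transition probabilities, and stop once END is drawn.
    word = "BEGIN"
    sentence = []
    while True:
        choices = list(probs[word])
        weights = list(probs[word].values())
        nxt = random.choices(choices, weights=weights)[0]
        if nxt == "END":
            return " ".join(sentence)
        sentence.append(nxt)
        word = nxt

print(generate(probs))  # e.g. "dog eats banana"; each of the 4 sentences ~25%
```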
With bigrams you would just split each sentence into two-word states, e.g. monkey eats -> banana END (a quick sketch of that is below).
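A sketch of that splitting, following the non-overlapping two-word states of the example above (overlapping bigram states are also a common choice, but this mirrors the example):

```python
from collections import Counter, defaultdict

texts = ["monkey eats banana", "dog eats bone"]

# Split each sentence (plus END) into two-word chunks, so "monkey eats banana"
# becomes the states "monkey eats" -> "banana END".
bigram_counts = defaultdict(Counter)
for sentence in texts:
    words = sentence.split() + ["END"]
    states = [" ".join(words[i:i + 2]) for i in range(0, len(words), 2)]
    for current, following in zip(states, states[1:]):
        bigram_counts[current][following] += 1

print(dict(bigram_counts))
# {'monkey eats': Counter({'banana END': 1}), 'dog eats': Counter({'bone END': 1})}
```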
This is just a simplified big picture; I hope it helps.
Edit
As for the smoothing mentioned in the comments, go with Laplace (add-one) smoothing.
Assume that you saw each word one more time than you really did.
So, for example, now we will have:
eats bone (2)
eats banana (2)
eats chocolate (1)
eats dog (1)
Of course in this case we have a very small dataset, but for a bigger dataset you will get something like:
eats bone (104)
eats banana (1031)
eats chocolate (1)
eats dog (3)
...
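A minimal sketch of the add-one step for transitions out of eats, assuming hypothetical raw counts that produce the numbers above (and, for simplicity, a vocabulary of only these four words):

```python
# Hypothetical raw counts before smoothing: chocolate was never seen after "eats".
raw_counts = {"bone": 103, "banana": 1030, "chocolate": 0, "dog": 2}

# Pretend every vocabulary word was seen once more than it really was.
total = sum(raw_counts.values()) + len(raw_counts)
smoothed_probs = {word: (count + 1) / total for word, count in raw_counts.items()}

print(smoothed_probs)
# chocolate and dog now keep a small nonzero probability instead of 0
```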