I am trying to implement Markov chains and need to compute the probability of the previous word in each bigram. I have created a data frame of bigrams and tried both mutate and a for loop. In both cases every row ends up with the previous word of the first row only. The data frame is:
           freq       term
ball costs    1 ball costs
bat bal       1    bat bal
bat ball      1   bat ball
bread eggs    1 bread eggs
buy bread     1  buy bread
costs rupe    1 costs rupe
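To make this reproducible, here is a minimal sketch that rebuilds the six rows shown above. It assumes that term is a character column (not a factor) and that the row names simply mirror the term, as the printout suggests. The prob column that appears in the later output is left out because its computation is not shown here.

```r
# Rebuild the six rows shown above (the real data may contain more rows).
terms <- c("ball costs", "bat bal", "bat ball",
           "bread eggs", "buy bread", "costs rupe")

bigram <- data.frame(
  freq = rep(1, length(terms)),
  term = terms,
  row.names = terms,          # row names mirror the term, as in the printout
  stringsAsFactors = FALSE    # assumption: term is character, not factor
)

print(bigram)
```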
I wrote a function to get the previous word (the first word of a term):
getPrevious <- function(term) {
  b <- strsplit(term, split = " ")
  c <- unlist(b)
  c[1]
}
Calling it inside mutate populates every row with the previous word of the first row only:
mutate(bigram, x= getPrevious(term))
and I only seem to get
  freq       term   prob    x
1    1 ball costs 0.0625 ball
2    1    bat bal 0.0625 ball
3    1   bat ball 0.0625 ball
4    1 bread eggs 0.0625 ball
5    1  buy bread 0.0625 ball
6    1 costs rupe 0.0625 ball
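The behaviour can also be seen without mutate. The sketch below is only a check under stated assumptions: it reuses the first three terms from the data above as a plain character vector, and it uses vapply purely for comparison (it is not part of my original code). Passing the whole vector in one call gives back a single value, while calling the function once per element gives one value per term.

```r
getPrevious <- function(term) {
  b <- strsplit(term, split = " ")
  c <- unlist(b)
  c[1]
}

# First three terms from the data above, as a plain character vector.
terms <- c("ball costs", "bat bal", "bat ball")

# One call on the whole vector: returns a single string.
whole <- getPrevious(terms)

# One call per element (vapply used here only for comparison).
perRow <- vapply(terms, getPrevious, character(1), USE.NAMES = FALSE)

print(whole)    # "ball"
print(perRow)   # "ball" "bat" "bat"
```

So the per-element calls return what I expect, and the difference only shows up when the whole column is passed at once.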
I don't understand why the function does not pick the previous word from each term. A for loop gave the same result.
Where am I going wrong?
Thanks, Ganesh