
I am trying to implement Markov chains and need to compute the probability of the previous word. I have created a data frame and tried both mutate and a for loop. In both cases, for some reason, every row gets the previous word of the first element only. The data frame is:

             freq         term
ball costs      1   ball costs
bat bal         1      bat bal
bat ball        1     bat ball
bread eggs      1   bread eggs
buy bread       1    buy bread
costs rupe      1   costs rupe

I wrote a function to get the previous word:

getPrevious <- function(term)
{
    b <- strsplit(term,split=" ")
    c <- unlist(b)
    c[1]
}

I tried both mutate and a for loop. Both populate every row with the previous word of the first row only (see below):

   mutate(bigram, x= getPrevious(term))

and I only seem to get

      freq         term   prob    x
  1     1   ball costs 0.0625 ball
  2     1      bat bal 0.0625 ball
  3     1     bat ball 0.0625 ball
  4     1   bread eggs 0.0625 ball
  5     1    buy bread 0.0625 ball
  6     1   costs rupe 0.0625 ball

I don't understand why it is unable to pick the previous word from each term. I even tried a for loop to the same effect.

Where am I going wrong?

Thanks, Ganesh

Colonel Beauvel
Tinniam V. Ganesh

1 Answer


Try this function:

getPrevious <- function(term)
{
    sapply(strsplit(term,split=" "), head, 1)
}

Your version splits every element of the column into a list, flattens that list into one vector with unlist, and then takes the first element of that vector, "ball". Because the function returns a single value, mutate recycles it across all rows. What you need instead is to split each element of the column and take the first word of each element of the resulting list.
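
To make the difference concrete, here is a minimal sketch in base R that runs both versions on the bigrams from the question. The name getPreviousOriginal is only a label introduced here for the question's function, and the data frame is rebuilt by hand from the printed output, so treat both as assumptions rather than the asker's actual objects.

```r
# Bigrams copied from the question; rebuilt by hand for illustration
bigram <- data.frame(
  freq = rep(1, 6),
  term = c("ball costs", "bat bal", "bat ball",
           "bread eggs", "buy bread", "costs rupe"),
  stringsAsFactors = FALSE
)

# The question's version (renamed here): unlist() merges the words of
# all rows into one vector, so [1] yields a single value for the column
getPreviousOriginal <- function(term) {
  words <- unlist(strsplit(term, split = " "))
  words[1]
}

# The answer's version: keep the list and take the first word per element
getPrevious <- function(term) {
  sapply(strsplit(term, split = " "), head, 1)
}

# One value for six rows, which is then recycled when assigned to a column
print(getPreviousOriginal(bigram$term))

# One value per row
bigram$x <- getPrevious(bigram$term)
print(bigram)
```

With dplyr loaded, mutate(bigram, x = getPrevious(term)) should fill the column the same way, since the function now returns one value per row.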

Colonel Beauvel