I'm trying to apply a Markov chain to simple text generation. I found some code on the internet and adapted it to my data as follows:

library(markovchain)
library(tidyverse)
library(tidytext)
library(stringr)
#use readLines to read any text file
text <- readLines('example.txt')
#check few lines of our text file
head(text)

> head(text)
[1] "PRIDE AND PREJUDICE" ""                    "By Jane Austen"      ""                   
[5] ""                    ""                


#drop the empty lines
text <- text[nchar(text) > 0]
head(text)
#remove all punctuation from the text
text <- str_replace_all(text, "[[:punct:]]", "")
head(text)

> head(text)
[1] "PRIDE AND PREJUDICE"                                                   
[2] "By Jane Austen"                                                        
[3] "Chapter 1"                                                             
[4] "It is a truth universally acknowledged that a single man in possession"
[5] "of a good fortune must be in want of a wife"                           
[6] "However little known the feelings or views of such a man may be on his"

#splitting the text into terms
terms <- unlist(strsplit(text, ' '))
head(terms)

> head(terms)
[1] "PRIDE"     "AND"       "PREJUDICE" "By"        "Jane"      "Austen"

#fit a first-order Markov chain: each term is a state, and the transition
#probabilities are estimated from consecutive (current term, next term) pairs
fit <- markovchainFit(data = terms)

#plot(fit$estimate)
#paste(markovchainSequence(n=50, markovchain=fit$estimate), collapse=' ')
#s <- createSequenceMatrix(terms, sanitize=FALSE)
#fit2 <- fitHigherOrder(s)

#start with an empty vector to collect the generated lines
new1 <- NULL
#generate 1000 lines of 6 terms each; change the loop range or n as desired
for(i in 1:1000){
  new1 <- c(new1,
            paste(markovchainSequence(n=6, markovchain=fit$estimate), collapse=' '))
}

# Check out the first few lines 
head(new1)
# save the generated lines to a .txt file
write(new1, "new_example2.txt")

This creates a set of sentences, each with the desired number of terms. Can anyone please help me understand how markovchainSequence randomly chooses the first term of a sentence? Is there a way to supply the first term as user input and generate a sequence that follows it?

hareen tej
  • You claim you made changes to an example you found, but you don't use the `markovchain` package here, do you ? – Stéphane Laurent Apr 03 '18 at 08:53
  • I did load the markovchain package in the first line of the code, and I used two of its functions, markovchainFit and markovchainSequence, which wouldn't work otherwise. But I'm not sure I understood your question, as this is the first time I've come across it. – hareen tej Apr 03 '18 at 14:31
  • Ah sorry I didn't scroll down your code... stupid of me – Stéphane Laurent Apr 03 '18 at 15:17

1 Answer

I'm not sure I understand your question, but here is an example of using the markovchain package where the initial value is set explicitly.

# define the states
words <- c("hello", "how", "are", "you")

# define the transition matrix (each row sums to 1)
transitions <-  rbind(c(0.1, 0.2, 0.3, 0.4),
                      c(0.1, 0.2, 0.3, 0.4),
                      c(0.1, 0.2, 0.3, 0.4),
                      c(0.1, 0.2, 0.3, 0.4))
rownames(transitions) <- colnames(transitions) <- words

# define a markovchain object
library(markovchain)
markovChain <- new("markovchain", states=words, 
                   transitionMatrix = transitions)

# sample from the Markov chain
# initial value given by t0
markovchainSequence(10, markovChain, t0="how")
# example output (random, so it will differ between runs):
# "how" "hello" "are" "are" "are" "hello" "you" "are" "you" "you"
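
To apply this to a chain fitted from text, as in the question: in the versions of the package I have looked at, the t0 argument of markovchainSequence defaults to sample(markovchain@states, 1), so when you don't supply it the start state is drawn uniformly at random from the states, and include.t0 defaults to FALSE, so that start state is not part of the returned sequence. Below is a minimal sketch under those assumptions. generate_line is a hypothetical helper name, and the short terms vector is only a toy stand-in for the one built from the text file.

```r
library(markovchain)

# Toy stand-in for the `terms` vector from the question (hypothetical data).
# Every word here is followed by another word at least once, so every state
# has an outgoing transition.
terms <- c("it", "is", "a", "truth", "that", "a", "man", "is", "in",
           "want", "of", "a", "wife", "is", "a")
fit <- markovchainFit(data = terms)

# Hypothetical helper: generate one line that starts with a user-supplied word.
generate_line <- function(chain, first_word, n_words = 6) {
  # t0 has to be one of the chain's states, i.e. a word seen in the training text
  stopifnot(first_word %in% chain@states)
  # include.t0 = TRUE puts the start word at the front of the result,
  # so only n_words - 1 further states need to be simulated
  paste(markovchainSequence(n = n_words - 1, markovchain = chain,
                            t0 = first_word, include.t0 = TRUE),
        collapse = " ")
}

generate_line(fit$estimate, "truth")
```

The stopifnot check is there because a word that never occurs in the training text is not a state of the fitted chain, so it cannot be used as t0. Since the question strips punctuation but keeps case, "Pride" and "PRIDE" would be different states.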
Stéphane Laurent