I'm trying to apply a Markov chain algorithm to simple text generation. I found some code on the internet and adapted it to my data as follows:
library(markovchain)
library(tidyverse)
library(tidytext)
library(stringr)
#use readLines to read any text file
text <- readLines('example.txt')
#check the first few lines of the text file
head(text)
> head(text)
[1] "PRIDE AND PREJUDICE" "" "By Jane Austen" ""
[5] "" ""
#drop empty lines so that only lines containing text remain
text <- text[nchar(text) > 0]
head(text)
#remove all punctuation from the text
text <- str_replace_all(text, "[[:punct:]]", "")
head(text)
> head(text)
[1] "PRIDE AND PREJUDICE"
[2] "By Jane Austen"
[3] "Chapter 1"
[4] "It is a truth universally acknowledged that a single man in possession"
[5] "of a good fortune must be in want of a wife"
[6] "However little known the feelings or views of such a man may be on his"
#splitting the text into terms
terms <- unlist(strsplit(text, ' '))
head(terms)
> head(terms)
[1] "PRIDE" "AND" "PREJUDICE" "By" "Jane" "Austen"
#fit a Markov chain in which each term is a state, estimating the transitions from the current state to the next state
fit <- markovchainFit(data = terms)
#plot(fit$estimate)
#paste(markovchainSequence(n=50, markovchain=fit$estimate), collapse=' ')
#s <- createSequenceMatrix(terms, sanitize=FALSE)
#fit2 <- fitHigherOrder(s)
#create a new variable, initially empty
new1 <- NULL
# generate new lines; change the upper bound of i to produce more or fewer lines
for(i in 1:1000){
  new1 <- c(new1,
            paste(markovchainSequence(n=6, markovchain=fit$estimate), collapse=' '))
}
# Check out the first few lines
head(new1)
# save the generated lines to a .txt file
write(new1, "new_example2.txt")
This creates a set of sentences, each with the desired number of terms. Can anyone please help me understand how markovchainSequence randomly chooses the first term of a sentence? Is there a way to supply the first term as user input and generate a sequence that follows it?
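
For reference, the package documentation lists a t0 argument on markovchainSequence for the initial state, with the default sample(markovchain@states, 1), which would mean the first state is drawn uniformly at random from the fitted states. Below is a minimal sketch of passing a chosen first term, assuming that signature. The names toy_terms, toy_fit and start_term, and the toy data, are made up for illustration and are not part of the script above.

```r
library(markovchain)

# Toy corpus standing in for `terms` (hypothetical data, not the Austen text)
toy_terms <- c("a", "b", "a", "c", "a", "b", "c", "a")
toy_fit <- markovchainFit(data = toy_terms)

# User-supplied first term; it has to be one of the fitted states
start_term <- "a"
stopifnot(start_term %in% states(toy_fit$estimate))

# t0 sets the starting state; include.t0 = TRUE keeps it in the returned sequence
seq_from_start <- markovchainSequence(n = 5, markovchain = toy_fit$estimate,
                                      t0 = start_term, include.t0 = TRUE)
paste(seq_from_start, collapse = " ")
```

If this is right, replacing toy_fit$estimate with fit$estimate and start_term with a word that occurs in terms should generate a sentence that begins with that word.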