I want a single plot that shows the dispersion of a particular word across an arbitrary number of novels. Every novel has a different length (total number of words), so the x-axis would have to be the novels and the y-axis would have to be the length of each novel. Right now I can create a separate plot for every novel, but I want to have all of them together. Here's what I have so far:
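# Run this after the functions below have been defined
input.dir <- "corpus2"
files.v <- dir(input.dir, "\\.txt$")
# Results get their own names so they do not overwrite the functions
corpus.l <- corpus(files.v, input.dir)
tiempo.l <- tiempo(corpus.l)
noche <- palabra("día", corpus.l, tiempo.l)
dispersion(noche)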
#corpus
# Reads every file and returns a named list of lowercase word vectors
corpus <- function(files.v, input.dir){
  text.word.vector.l <- list()
  for (i in seq_along(files.v)){
    text.v <- scan(file.path(input.dir, files.v[i]), what="character", sep="\n")
    Encoding(text.v) <- "UTF-8"
    text.v <- paste(text.v, collapse=" ")
    text.lower.v <- tolower(text.v)
    # Split on non-word characters and drop the empty strings this leaves
    text.words.v <- unlist(strsplit(text.lower.v, "\\W"))
    text.words.v <- text.words.v[text.words.v != ""]
    text.word.vector.l[[files.v[i]]] <- text.words.v
  }
  return(text.word.vector.l)
}
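#tiempo
# Returns, for each novel, the word positions 1..n (the "time" axis)
tiempo <- function(argument1){
  tiempo.l <- list()
  for (i in seq_along(argument1)){
    # Name entries from the input list rather than the global files.v
    tiempo.l[[names(argument1)[i]]] <- seq_along(argument1[[i]])
  }
  return(tiempo.l)
}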
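#palabra
# Returns, for each novel, a vector with 1 where the keyword occurs and NA elsewhere
palabra <- function(keyword, argument1, argument2){
  hits.l <- list()
  for (i in seq_along(argument1)) {
    hits.v <- which(argument1[[i]] == keyword)
    hits.keyword.v <- rep(NA, length(argument2[[i]]))
    hits.keyword.v[hits.v] <- 1
    # Name entries from the input list rather than the global files.v
    hits.l[[names(argument1)[i]]] <- hits.keyword.v
  }
  return(hits.l)
}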
#dispersion
dispersion <- function(argument1){
  options(scipen=5)
  # Each pass through the loop starts a new plot, so every novel gets its own figure
  for (i in seq_along(argument1)) {
    plot(argument1[[i]], main="Dispersion plot",
         xlab="time", ylab="keyword", type="h", ylim=c(0,1), yaxt='n')
  }
}
How can I plot this together? Here's a picture of what I feel it should look like:
What I am trying to do is, more or less, put all of these plots together:
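For reference, here is a minimal sketch of one way the combined figure might be drawn in base R. It is an assumption about the desired layout, not a definitive solution: each novel gets its own horizontal band, word position runs along the x-axis, and a grey baseline shows where each novel ends. The function name `dispersion.all` and the toy data `toy.hits.l` are hypothetical; the input is assumed to be a named list shaped like the one `palabra()` returns (1 at a hit, NA elsewhere).

```r
# Hypothetical helper (not part of the original code): draws the keyword hits
# of every novel in one figure, one horizontal band per novel.
# Assumes hits.l is a named list of vectors with 1 at a hit and NA elsewhere.
dispersion.all <- function(hits.l) {
  n <- length(hits.l)
  lengths.v <- lengths(hits.l)  # number of words in each novel
  # Empty canvas: x spans the longest novel, y has one band per novel
  plot(1, type = "n", xlim = c(1, max(lengths.v)), ylim = c(0.5, n + 0.5),
       xlab = "time", ylab = "", yaxt = "n", main = "Dispersion plot")
  # Label each band with the name of its novel
  axis(2, at = seq_len(n), labels = names(hits.l), las = 1, cex.axis = 0.7)
  for (i in seq_len(n)) {
    pos <- which(!is.na(hits.l[[i]]))
    # Grey baseline from the first to the last word of this novel
    segments(1, i, lengths.v[i], i, col = "grey80")
    # One short vertical tick per occurrence of the keyword
    if (length(pos) > 0) segments(pos, i - 0.4, pos, i + 0.4)
  }
  invisible(lengths.v)
}

# Toy data standing in for the output of palabra(); the values are made up
toy.hits.l <- list(
  "novel_a.txt" = c(NA, 1, NA, NA, 1, NA, NA, NA, NA, 1),
  "novel_b.txt" = c(1, NA, NA, 1, NA, NA),
  "novel_c.txt" = c(NA, NA, NA, NA, NA, NA, NA, 1)
)
lengths.v <- dispersion.all(toy.hits.l)
```

With the real data this would be called as `dispersion.all(noche)`. This sketch puts the novels on the y-axis; swapping the coordinates passed to `segments()` would put them on the x-axis instead. Another option that keeps the original `dispersion()` almost unchanged might be to call `par(mfrow = c(length(argument1), 1))` before the loop, so the separate plots are stacked on one page.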