I'd like to plot how the proportion of a particular topic changes over time, but I'm having trouble isolating a single topic and plotting it against time, especially when several groups of documents need to be plotted separately (say two groups to compare, journals A and B). I've saved the dates associated with these journals in a list called dateConverter.
Here's what I have so far (with much thanks to @scoa):
library(tm); library(topicmodels);
txtfolder <- "~/path/to/documents/"
source <- DirSource(txtfolder)
myCorpus <- Corpus(source, readerControl=list(reader=readPlain))
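# Tag the first 10 documents as journal A and the rest as journal B
for (i in 1:10) {
  meta(myCorpus[[i]], tag = "origin") <- "A"
}
for (i in 11:length(myCorpus)) {
  meta(myCorpus[[i]], tag = "origin") <- "B"
}
# Combine the list of dates into one vector (do.call("c", ...) keeps the date class)
dates <- do.call("c", dateConverter)
# Attach each document's date as metadata
for (i in seq_along(myCorpus)) {
  meta(myCorpus[[i]], tag = "datetimestamp") <- dates[i]
}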
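# Keep words of 3+ characters (current tm uses wordLengths; minWordLength is the older name)
dtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(3, Inf)))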
n.topics <- 10
lda.model <- LDA(dtm, n.topics)
terms(lda.model,10)
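# posterior() returns a list; $topics is the documents x topics matrix of proportions,
# which data.frame() expands into columns topic.1 ... topic.10
df <- data.frame(id = names(topics(lda.model)),
                 topic = posterior(lda.model)$topics,
                 date = as.POSIXct(unlist(lapply(meta(myCorpus, type = "local", tag = "datetimestamp"), as.character))),
                 origin = unlist(meta(myCorpus, type = "local", tag = "origin")))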
How can I plot the proportion of a single topic over time, with journals A and B shown separately?
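To make the target concrete, here is a minimal sketch of the kind of plot I have in mind. It runs on simulated proportions rather than my real model output, so every number in it is made up. The data frame name `sim`, the `topic.1` ... `topic.10` column layout, the choice of topic 3, and the monthly averaging are all assumptions for illustration, and it uses base graphics only to avoid extra dependencies.

```r
# Simulated stand-in for the real data: 40 documents, 10 topics.
# All values are made up; only the column layout is meant to resemble the real df.
set.seed(42)
n.docs   <- 40
n.topics <- 10

# Random per-document topic proportions, each row normalised to sum to 1
props <- matrix(rgamma(n.docs * n.topics, shape = 1), nrow = n.docs)
props <- props / rowSums(props)
colnames(props) <- seq_len(n.topics)

# Ten monthly dates; each journal gets two documents per month
month.dates <- seq(as.POSIXct("2010-01-15", tz = "UTC"), by = "month", length.out = 10)

sim <- data.frame(
  id     = paste0("doc", seq_len(n.docs)),
  topic  = props,                                   # becomes topic.1 ... topic.10
  date   = rep(rep(month.dates, each = 2), times = 2),
  origin = rep(c("A", "B"), each = n.docs / 2)
)

# Isolate one topic (k is a hypothetical choice) alongside date and journal
k <- 3
one.topic <- data.frame(
  date       = sim$date,
  origin     = sim$origin,
  proportion = sim[[paste0("topic.", k)]]
)

# Average the proportion per journal and calendar month
one.topic$month <- format(one.topic$date, "%Y-%m")
monthly <- aggregate(proportion ~ origin + month, data = one.topic, FUN = mean)
monthly$month <- as.Date(paste0(monthly$month, "-01"))

# Empty frame first, then one line per journal
plot(monthly$month, monthly$proportion, type = "n",
     xlab = "Month", ylab = paste("Mean proportion of topic", k))
cols <- c(A = "steelblue", B = "firebrick")
for (g in names(cols)) {
  d <- monthly[monthly$origin == g, ]
  d <- d[order(d$month), ]
  lines(d$month, d$proportion, type = "b", col = cols[[g]])
}
legend("topright", legend = names(cols), col = cols, lty = 1, pch = 1)
```

With the real data, `sim` would be replaced by the data frame built from the model, and the monthly binning could be swapped for whatever time resolution the journals' dates support.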