I am relatively new to R.
I am able to create a correlation plot of terms using the following code:
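# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Rgraphviz")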
library(tm)
library(qdap)
library(qdapTools)
# creating corpus on variable that I want to create plot on
myCorpus <- Corpus(VectorSource(final$MH2))
dtm2 <- DocumentTermMatrix(myCorpus)
# correlation of terms plot
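term.freq <- sort(colSums(as.matrix(dtm2)), decreasing = TRUE) # total frequency of each term
freq.terms <- head(names(term.freq), 25) # top 25 terms; findFreqTerms(dtm2)[1:25] only returns the first 25 in column order
plot(dtm2, terms = freq.terms, corThreshold = 0.1, weighting = TRUE) # keep edges with correlation of at least 0.1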
However, this plot only handles single words, not phrases. For instance, "varicose" and "veins" should not be split up; the term should read "varicose veins". I am able to create a DTM that keeps phrases together, but I cannot get a plot of the phrase-based DTM, only of the single-word one. The following code produces my current plot:
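# biocLite() has been retired; current Bioconductor releases install through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("Rgraphviz")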
library(tm)
library(qdap)
library(qdapTools)
# create corpus with phrases kept together based off https://stackoverflow.com/questions/24038498/corpus-build-with-phrases
dat <- final[ , 3, drop = FALSE] # drop = FALSE keeps a one-column data frame instead of a vector
colnames(dat) <- "text"
# build document IDs of the form doc1, doc2, ..., docN
dat$docs <- paste0("doc", seq_len(nrow(dat)))
x <- sub_holder(", ", dat$text)
# create dtm here
MH_parsed <- apply_as_tm(t(wfm(x$unhold(gsub(" ", "~~", x$output)), dat$docs)),
weightTfIdf, to.qdap = FALSE)
# correlation of terms plot
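term.weight <- sort(colSums(as.matrix(MH_parsed)), decreasing = TRUE) # total tf-idf weight of each term
freq.terms <- head(names(term.weight), 25) # top 25 terms; findFreqTerms(MH_parsed)[1:25] only returns the first 25 in column order
plot(MH_parsed, terms = freq.terms, corThreshold = 0.1, weighting = TRUE) # keep edges with correlation of at least 0.1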
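To make clear what the `~~` substitution in the code above is meant to do, here is a minimal base R sketch of the same idea. It assumes the phrases are separated by ", " and uses made-up example strings; `raw` and `protect_phrases` are hypothetical names for illustration, not part of qdap or tm.

```r
# Hypothetical example strings; not taken from the real data.
raw <- c("varicose veins, back pain", "back pain, asthma")

# Replace the spaces inside each comma-separated phrase with a holder,
# so a whitespace tokenizer sees every phrase as one token.
protect_phrases <- function(x, sep = ", ", holder = "~~") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(
    parts,
    function(p) paste(gsub(" ", holder, p, fixed = TRUE), collapse = sep),
    character(1)
  )
}

protected <- protect_phrases(raw)
print(protected)
```

The separators between phrases are left alone, so only the spaces inside a phrase are replaced.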
How can I make a correlation plot that shows phrases instead of single words?
Thanks.