I have individual-level cause-of-death data (from the 19th century) and want to compare the frequencies between males and females, using either scatterplots or comparison word clouds. I have managed to do this with the following commands (shown here for the word cloud comparison):
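library(tm)         # Corpus, VectorSource, TermDocumentMatrix
library(wordcloud)  # comparison.cloud
# Assumes female and male are each a single string (one document per sex),
# since the matrix below is given exactly two column names
all = c(female, male)
corpus = Corpus(VectorSource(all))
tdm = TermDocumentMatrix(corpus)  # each document is broken into single words here
tdm = as.matrix(tdm)
colnames(tdm) = c("female", "male")
comparison.cloud(tdm, max.words=200, random.order=FALSE, rot.per=0, colors=c("indianred3","lightsteelblue3"), use.r.layout=FALSE, title.size=3)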
At some point during this process the causes of death are split into single words (they are still whole phrases when I read in the data). My question: is there a way to make word clouds or scatterplots that take into account that some causes of death consist of more than one word? For example, "verval", "van" and "krachten" do not mean much separately, but the full phrase "verval van krachten" is a very frequent cause of death with a proper meaning of its own.
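To make the goal concrete, here is a minimal sketch of the kind of matrix I am after, with whole phrases rather than single words as rows. It uses base R only and assumes, unlike my code above, that `female` and `male` are character vectors with one cause of death per individual. The data values and the name `phrase_counts` are made up for illustration.

```r
# Hypothetical example data: one cause of death per individual
female <- c("verval van krachten", "tering", "verval van krachten", "koorts")
male   <- c("verval van krachten", "koorts", "koorts", "verdronken")

# All distinct causes, kept as whole phrases
causes <- sort(unique(c(female, male)))

# Count each phrase per sex; factor() with shared levels keeps the rows aligned
phrase_counts <- cbind(
  female = as.vector(table(factor(female, levels = causes))),
  male   = as.vector(table(factor(male,   levels = causes)))
)
rownames(phrase_counts) <- causes
print(phrase_counts)

# A matrix of this shape (terms in rows, groups in columns) is what
# comparison.cloud() takes, so it might be plotted along these lines:
# wordcloud::comparison.cloud(phrase_counts, max.words = 200, random.order = FALSE)
```

The two columns of such a matrix could also be plotted against each other directly for the scatterplot version. What I do not know is whether the same result can be reached from within the `tm` workflow.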