
I am very new to R and I am trying to build an n-gram word cloud. However, my results always show unigrams (1-grams) instead of n-grams. I have searched for days for answers on the web and tried different methods... still the same result. Also, for some reason, I don't have the NGramTokenizer function that I see everyone else using, so I found another tokenizer function, which I am using here. I hope someone can help me out. Thanks in advance!

library(dplyr)
library(ggplot2)
library(tidytext)
library(wordcloud)
library(tm)
library(RTextTools)

library(readxl)
library(qdap)
library(RWeka)
library(tau)
library(quanteda)


rm(list = ls())

#setwd("C:\\RStatistics\\Data\\")

#allverbatims <-read_excel("RS_Verbatims2018.xlsx") #reads excel files
#selgroup <- subset(allverbatims, FastNPS=="Detractors")
#selcolumns <- selgroup[ ,3:8]

#sample data 
selcolumns <- c("this is a test","my test is not working","sample data here")

Comments <- Corpus(VectorSource(selcolumns))
CommentClean <- tm_map(Comments, removePunctuation)
CommentClean <- tm_map(CommentClean, content_transformer(tolower))
CommentClean <- tm_map(CommentClean, removeNumbers)
CommentClean <- tm_map(CommentClean, stripWhitespace)
CommentClean <- tm_map(CommentClean, removeWords, c(stopwords('english')))
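
#quick sanity check of the cleaned text (just a sketch; inspect() prints the corpus contents)
inspect(CommentClean)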

#create manual tokenizer using tau textcnt since NGramTokenizer is not available

tokenize_ngrams <- function(x, n = 2) {
  return(rownames(as.data.frame(unclass(textcnt(x, method = "string", n = n)))))
}

    #test tokenizer
    head(tokenize_ngrams(CommentClean))
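    #also checking the tokenizer on plain character input (a sketch; textcnt from
    #the tau package takes character data, which is what the TDM should hand it)
    tokenize_ngrams(c("this is a test", "my test is not working"))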

    td_mat <- TermDocumentMatrix(CommentClean, control = list(tokenize = tokenize_ngrams))

    inspect(td_mat) #should be bigrams, but the result is only unigrams

    matrix <- as.matrix(td_mat)
    sorted <- sort(rowSums(matrix), decreasing = TRUE)
    data_text <- data.frame(word = names(sorted), freq = sorted)

    set.seed(1234)
    wordcloud(words = data_text$word, freq = data_text$freq, min.freq = 5, max.words = 100, random.order = FALSE, rot.per = 0.1, colors = rainbow(30))
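
One thing I am not sure about is whether it matters that Corpus() gives me a SimpleCorpus here rather than a VCorpus. Below is a sketch of the variant I would try next: the same cleaning steps and the same tokenize_ngrams function, only built on VCorpus (this is just an assumption on my part, not something I have confirmed changes the unigram result):

    #sketch: same pipeline as above, but on a VCorpus instead of Corpus()
    CommentsV <- VCorpus(VectorSource(selcolumns))
    CommentsV <- tm_map(CommentsV, removePunctuation)
    CommentsV <- tm_map(CommentsV, content_transformer(tolower))
    CommentsV <- tm_map(CommentsV, removeNumbers)
    CommentsV <- tm_map(CommentsV, stripWhitespace)
    CommentsV <- tm_map(CommentsV, removeWords, stopwords('english'))

    td_mat_v <- TermDocumentMatrix(CommentsV, control = list(tokenize = tokenize_ngrams))
    inspect(td_mat_v)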
RdR
  • Welcome to SO! It's pretty hard to help without having your code working with your data, and in this case your data are missing. You'll get better answers if you share some of your real data, or some fake data that reproduces the issue with your code. However, if you prefer, [here](https://www.rdocumentation.org/packages/textmineR/versions/1.7.0/topics/NgramTokenizer) is the function you did not find. – s__ Dec 12 '18 at 22:49
  • Thanks for the reply, but I'm still not able to get the NgramTokenizer function. I installed textmineR and the function still doesn't show up. However, the tokenize_ngrams function works; it's the TDM that cannot create the n-gram data. – RdR Dec 12 '18 at 23:16

0 Answers