I am new to R and exploring text mining. With the steps below I got as far as stemming, but I also need to do POS (part-of-speech) tagging and extract text/theme patterns. The data I am using is customer verbatim comments. Most of the articles I have read do not explain how to POS-tag data held in a Corpus, and I could not find any details on pattern detection. How should I proceed? Any help would be greatly appreciated. Thanks in advance.

library(tm)         # Corpus, tm_map, TermDocumentMatrix
library(qdap)       # sent_detect_nlp
library(SnowballC)  # stemmer used by stemDocument

CSVfile = read.csv("Testfortextcsv.csv", stringsAsFactors = FALSE)
# Split the comments into sentences, one row per sentence
TestSplit = data.frame(Comment = sent_detect_nlp(CSVfile$Comment), stringsAsFactors = FALSE)
TestCorpus = Corpus(VectorSource(TestSplit$Comment))
# Wrap base functions in content_transformer() so the corpus keeps its document structure
TestCorpus = tm_map(TestCorpus, content_transformer(tolower))
TestCorpus = tm_map(TestCorpus, removePunctuation)
# "test" is lower-case because the text has already been lower-cased
TestCorpus = tm_map(TestCorpus, removeWords, c("test", stopwords("SMART"), stopwords("english")))
TestCorpus = tm_map(TestCorpus, stripWhitespace)
TestCorpus = tm_map(TestCorpus, stemDocument)
# Term-document matrix: terms in rows, documents in columns
dtm <- TermDocumentMatrix(TestCorpus)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)  # total frequency per term
d <- data.frame(word = names(v), freq = v)
head(d, 10)

I used the result to produce a word cloud, term associations, and a bar plot.


WordCloud
---------
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words = 200,
          random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))

Find Frequent Terms
-------------------
findFreqTerms(dtm, lowfreq = 15)

Find Association
----------------
findAssocs(dtm, terms = "account", corlimit = 0.3)

Bar Plot for frequencies
------------------------
barplot(d[1:10, ]$freq, las = 2, names.arg = d[1:10, ]$word, col = "lightblue", main = "Most frequent words", ylab = "Word frequencies")
drmonkeyninja
Pavan

1 Answer

The qdap package allows you to identify the part of speech of each word in a string:

library(qdap)
s1 <- "Hello World"
pos(s1)  # tags each word in s1 with its part of speech

You might also find other resources useful, such as the openNLP and RTextTools packages.
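As a rough illustration of the openNLP route mentioned above, the sketch below tags a single sentence and collects the result in an ordinary data frame. It assumes the NLP and openNLP packages (plus a working Java installation and the default English models) are available; the example sentence, the variable names, and the output file name are made up for illustration and are not taken from the question's data.

```r
library(NLP)      # as.String(), annotate()
library(openNLP)  # Maxent_* annotators (needs Java via rJava)

# Made-up example sentence, not taken from the question's data
txt <- as.String("The customer could not access the account.")

# Sentence and word annotations must exist before POS tags can be added,
# so the three annotators run as one pipeline in this order
ann <- NLP::annotate(txt, list(Maxent_Sent_Token_Annotator(),
                               Maxent_Word_Token_Annotator(),
                               Maxent_POS_Tag_Annotator()))

# Keep only the word-level annotations and pull out each word's POS tag
words <- subset(ann, type == "word")
tagged <- data.frame(
  word = txt[words],
  tag  = sapply(words$features, `[[`, "POS"),
  stringsAsFactors = FALSE
)

# A plain data frame can be written straight to CSV (hypothetical file name)
out_file <- file.path(tempdir(), "pos_tags.csv")
write.csv(tagged, out_file, row.names = FALSE)
```

To tag a whole column of comments, the same steps could be wrapped in a function and applied to each sentence before stop-word removal and stemming, since tagging generally works better on unmodified text.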

lawyeR
  • Thanks @lawyeR. I think it did the POS tagging, but I am not able to export the result to a CSV file. I get the following error: Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class "pos" to a data.frame – Pavan Sep 08 '15 at 16:10
  • Could someone please shed some light on text theme/pattern detection? Thanks in advance. – Pavan Sep 08 '15 at 16:12
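The coercion error in the first comment most likely arises because write.csv() tries to turn its argument into a data frame, and no as.data.frame() method exists for objects of class "pos". A minimal base-R sketch of the usual workaround is to export a data-frame component of the object rather than the object itself. The stand-in object below is hand-built with invented values; the component name POSfreq follows what the qdap documentation describes for pos(), but that is an assumption worth checking with str() on the real result.

```r
# Hypothetical stand-in for the result of qdap::pos(): an S3 list with a
# data-frame component. The counts and tag columns are invented.
fake_pos <- structure(
  list(
    text    = "Hello World",
    POSfreq = data.frame(wrd.cnt = 2, NN = 1, UH = 1)
  ),
  class = "pos"
)

# Coercing the whole object fails, as in the reported error
res <- try(as.data.frame(fake_pos), silent = TRUE)
failed <- inherits(res, "try-error")

# Exporting one data-frame component works
out <- tempfile(fileext = ".csv")
write.csv(fake_pos$POSfreq, out, row.names = FALSE)
exported <- read.csv(out)
```

With the real object, the equivalent call would be something like write.csv(pos(s1)$POSfreq, "pos_freq.csv"), again assuming that component is present.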