
I am new to R. I have a CSV file with 15,000 rows of text; each row belongs to one person. I want to run Latent Dirichlet Allocation (LDA) on it, but first I need to create a document-term matrix. However, I don't know how to make R treat each row as a separate document. Here is what I've done, but it doesn't look correct:

library(tm)  # provides Corpus, VectorSource, tm_map, DocumentTermMatrix

text <- read.csv("text.csv", stringsAsFactors = FALSE)
corpus <- Corpus(VectorSource(text))

corpus <- tm_map(corpus, content_transformer(removePunctuation))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)
dtm <- DocumentTermMatrix(corpus)

The current dtm doesn't look like it has a column for every term across all the documents; it feels like it only captured the words within each document.
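
For reference, one way to check whether each row really ended up as its own document is to look at the sizes of the objects built above (a minimal sketch that only assumes the `corpus` and `dtm` from the code in the question):

length(corpus)          # should be 15000 -- one document per CSV row
dim(dtm)                # rows = number of documents, columns = number of unique terms
inspect(dtm[1:5, 1:10]) # peek at the first few documents and terms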

I really appreciate your help.

  • Depends a lot on pre-processing and what packages you are using. I recently used LDA for the first time and [this](http://cpsievert.github.io/LDAvis/reviews/reviews.html) tutorial helped me through using it for the first time. – TBSRounder Mar 18 '16 at 14:42
  • Thanks Mark. I added all the preparations I did above. My main question is whether the document term matrix I created from the csv file is accurate. Thanks so much. – Monica Muller Mar 18 '16 at 14:53
  • you could try `readLines()` instead of `read.csv()` (a short sketch of this is below the comments) – C8H10N4O2 Mar 18 '16 at 14:58
  • I checked, it worked. – Madhu Sareen Sep 08 '17 at 07:56
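
Following the `readLines()` suggestion in the comments, here is a minimal sketch of building the corpus so that each line of the file becomes one document. It assumes `text.csv` holds one free-text field per row with no header; if the file has a header or quoted commas, it is safer to keep `read.csv()` and pass the text column (e.g. `text[[1]]`) to `VectorSource()`:

library(tm)

# read the file line by line; each element of the resulting character vector is one row
lines <- readLines("text.csv")

# VectorSource() treats each element of the vector as a separate document
corpus <- Corpus(VectorSource(lines))

dtm <- DocumentTermMatrix(corpus)
dim(dtm)  # rows = documents (one per line), columns = unique terms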

0 Answers