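    # library() attaches one package per call
    library(RTextTools)
    library(topicmodels)
    library(tm)

    # Build a term-frequency document-term matrix from the text in `data`
    matrix <- create_matrix(data, language = "english", removeNumbers = TRUE,
                            stemWords = TRUE, weighting = weightTf)
    matrix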
    #DocumentTermMatrix (documents: 3104, terms: 7699)
    #Non-/sparse entries: 28138/23869558
    #Sparsity           : 100%
    #Maximal term length: 19
    #Weighting          : term frequency (tf)

This is the metadata printed as output. Is there a command to see the words after stemming, or to inspect the non-sparse entries? `nrow(matrix)` and `ncol(matrix)` give me the size of the matrix, but I need more help to deconstruct it. I'm using the NYTimes dataset.

BlackSwan
  • Have you tried the tidytext package? You can easily see the non-sparse entries with it. Don't know about seeing the words being stemmed. – lawyeR Jun 25 '17 at 13:23
  • You can use `str(matrix)` to understand the structure of your matrix. It allows you to see that you can access the non-sparse entries with `matrix$dimnames$Terms`. – Scarabee Jun 26 '17 at 09:21
  • About the stemming: I did a few tests, and it seems that the words are NOT stemmed, despite the `stemWords = TRUE` argument. I think it's a bug. (Note that RTextTools is no longer actively maintained, as said on their homepage.) – Scarabee Jun 26 '17 at 09:22
  • @Scarabee Please can you tell me what tests they are? I'm afraid I might have to discard the package if there is no stemming. – BlackSwan Jun 26 '17 at 11:47
  • Run the example in the `create_matrix` help page with `stemWords=TRUE` instead of `FALSE`. In the resulting matrix you get one row for _attack_, and a distinct row for _attacks_. If the stemming worked properly these two rows should be merged. – Scarabee Jun 26 '17 at 12:03
  • @Scarabee Please tell me how safe it is to use RTextTools when words are not being stemmed? Should I use some other package? This is a trial run of the package and I was going to use LDA on an RTI applications dataset as a part of my project. How much do you think will the final output be affected if the words are not being stemmed? – BlackSwan Jun 26 '17 at 12:11
  • Stemming is not always necessary. It depends on your data and on what you want to do. One possibility is to stem your words "manually" before using `create_matrix`. For a more general answer, I refer you to [this post](https://stackoverflow.com/questions/42952476/impossible-to-see-results-of-rtexttoolstolower-text-in-document-term-matri/42953161#42953161) by smci. – Scarabee Jun 26 '17 at 12:23
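
The suggestions in the comments above (inspect the structure, check whether _attack_ and _attacks_ are merged, stem "manually" beforehand) can be sketched as follows. This is only an illustration under stated assumptions: it uses a three-sentence toy corpus rather than the NYTimes data, builds the matrix with `tm::DocumentTermMatrix` instead of `create_matrix`, and the helper `stem_text` is a hypothetical name introduced here. Since the printed output in the question shows that `create_matrix` returns a `DocumentTermMatrix`, the same accessors should apply to that object, but this has not been verified against RTextTools itself.

```r
library(tm)         # VCorpus, DocumentTermMatrix, Terms, inspect, findFreqTerms
library(SnowballC)  # wordStem

# Toy corpus standing in for the real data (hypothetical example text).
docs <- c("The attacks continued after the first attack",
          "Markets rallied as the attack ended",
          "Officials discuss markets and rallies")

# Hypothetical helper: lower-case, split on non-letters, stem each word,
# and glue the stems back into one string per document.
stem_text <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")
  vapply(words, function(w) {
    paste(wordStem(w[nzchar(w)], language = "english"), collapse = " ")
  }, character(1))
}

# Stem "manually" first, then build a term-frequency matrix.
corpus <- VCorpus(VectorSource(stem_text(docs)))
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTf))

# The terms that ended up in the matrix, i.e. the words after stemming.
Terms(dtm)

# A DocumentTermMatrix is stored as a sparse triplet matrix:
# i = document index, j = term index, v = count. Only non-zero cells are kept,
# so listing the triplets lists exactly the non-sparse entries.
entries <- data.frame(doc   = dtm$i,
                      term  = Terms(dtm)[dtm$j],
                      count = dtm$v)
entries[order(entries$doc, -entries$count), ]

# Other ways to look inside: a corner of the matrix, or the frequent terms.
inspect(dtm[1:2, 1:3])
findFreqTerms(dtm, lowfreq = 2)

# Stemming check in the spirit of the attack/attacks test from the comments.
c(attack = "attack" %in% Terms(dtm), attacks = "attacks" %in% Terms(dtm))
```

On a matrix of the size shown in the question, avoid `as.matrix()` on the whole object, since that expands every zero cell; subsetting first, as in the `inspect()` call, keeps the output manageable.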
