
Everything worked fine until I updated RStudio. Since the update, DocumentTermMatrix in the 'tm' package behaves differently. I want to create a DTM whose terms are numbers. For instance, if I have a .csv with one column as shown below:

x
1.01
11.21
123.35
212.11

I want the column names in my term matrix to look like this:

1.01 11.21 123.35 212.11
1    0     0      0
0    1     0      0
0    0     1      0
0    0     0      1

But instead it looks like this:

123 212
0   0
0   0
1   0
0   1

Here's the code that used to work:

corpus = Corpus(VectorSource(x))
dtm = DocumentTermMatrix(corpus)
dtm_df = as.data.frame(as.matrix(dtm))

Thanks in advance

Will Ebert
  • What version do you have? I have version 1.0.136 and it seems to be working as you had hoped. – Lucy Mar 14 '17 at 02:18
  • I have 1.0.136 as well..... – Will Ebert Mar 14 '17 at 02:36
  • Actually, the resulting column names are `123 212`, not `1 11 123 212` as I mentioned previously @Lucy – Will Ebert Mar 14 '17 at 02:52
  • I have tried uninstalling and reinstalling everything and tried older versions (RStudio 0.99.489 and R 3.3.1), but I get the same results. I asked others to test it and it works for them. I also had someone do a fresh install of R, Rtools, and RStudio to test this, and they got the same results I did. I have no idea what has happened and would greatly appreciate help, as this is extremely urgent. – Will Ebert Mar 14 '17 at 17:55

1 Answer


From the 'tm' package maintainer Ingo Feinerer:

Here's the code that used to work:

corpus = Corpus(VectorSource(x))

Try VCorpus() instead of Corpus().
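A minimal sketch of that suggestion, assuming `x` holds the sample column from the question converted to character (the values below are copied from the question, not read from the original .csv). It only swaps Corpus() for VCorpus() and leaves the DocumentTermMatrix() defaults untouched:

```r
library(tm)

# Hypothetical stand-in for the .csv column shown in the question.
x <- as.character(c(1.01, 11.21, 123.35, 212.11))

# VCorpus() builds a volatile corpus of plain-text documents,
# one document per element of x.
corpus <- VCorpus(VectorSource(x))

# Same call as in the question, now applied to the VCorpus.
dtm <- DocumentTermMatrix(corpus)

# List the terms without converting the matrix to a dense form.
Terms(dtm)
inspect(dtm)
```

If the terms still come out truncated, the `control` argument of DocumentTermMatrix() (for example its `tokenize` and `wordLengths` entries) is the next place to look.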

dtm = DocumentTermMatrix(corpus)
dtm_df = as.data.frame(as.matrix(dtm))

That is highly inefficient (since as.matrix() generates a dense representation from the sparse term-document matrix).
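One way to avoid the dense conversion, sketched here under the assumption that the 'slam' and 'Matrix' packages are installed (neither is mentioned in the thread), is to work on the sparse triplet representation that a DocumentTermMatrix already stores. The vector `x` is again a hypothetical stand-in for the question's data:

```r
library(tm)

# Hypothetical stand-in for the .csv column shown in the question.
x <- as.character(c(1.01, 11.21, 123.35, 212.11))
dtm <- DocumentTermMatrix(VCorpus(VectorSource(x)))

# Per-term totals computed directly on the sparse matrix.
term_totals <- slam::col_sums(dtm)

# A DocumentTermMatrix is a simple triplet matrix (fields i, j, v),
# so it can be handed to Matrix::sparseMatrix() without densifying.
sparse <- Matrix::sparseMatrix(
  i = dtm$i, j = dtm$j, x = dtm$v,
  dims = c(dtm$nrow, dtm$ncol),
  dimnames = dimnames(dtm)
)
```

For four short documents the dense data frame is harmless; the difference only matters once the corpus has many documents and terms.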

Best regards, Ingo

Will Ebert