I have a data frame at the document-term-count level. Documents and terms are indexed by integers, and the counts are weighted, continuous scores (in case that is relevant). For example:

doc  term  count
1    2     2
1    5     3.1
2    2     0.4
3    5     5.9

I would like to convert this data frame to a document-term matrix (DTM) so that I can use functions that expect one, namely the "documents.compare" function of RNewsflow.

I have been trying to use "cast_dtm" like this:

dtm <- as.matrix(df) %>% cast_dtm(document, term, count)

where "df" is the data frame shown above, but I get the following error:

Error in UseMethod("ungroup") : no applicable method for 'ungroup' applied to an object of class "c('matrix', 'double', 'numeric')"
phiver
km5041

1 Answer

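You are almost there. Instead of "document", you need "doc" as input, since your column is named doc, not document. Also drop the as.matrix() call: cast_dtm expects a data frame, and passing it a matrix is what triggers the "ungroup" error. See the example below.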

library(tidytext)
library(dplyr)
dtm <- df %>% cast_dtm(document = doc, term = term, value = count)

data:

df <- structure(
  list(
    doc = c(1L, 1L, 2L, 3L),
    term = c(2L, 5L, 2L,5L),
    count = c(2, 3.1, 0.4, 5.9)),
  .Names = c("doc", "term", "count"),
  class = "data.frame",
  row.names = c(NA,-4L)
)
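
To check that the conversion did what you expect, you can inspect the result. This is only a minimal sketch: it assumes the tidytext, dplyr and tm packages are installed, rebuilds the same example data with data.frame(), and the name "m" is just an illustrative variable.

```r
library(tidytext)
library(dplyr)

# Same example data as above, built with data.frame() for brevity
df <- data.frame(
  doc   = c(1L, 1L, 2L, 3L),
  term  = c(2L, 5L, 2L, 5L),
  count = c(2, 3.1, 0.4, 5.9)
)

# Pass the data frame itself (not a matrix) and use the real column names
dtm <- df %>% cast_dtm(document = doc, term = term, value = count)

class(dtm)  # includes "DocumentTermMatrix"
dim(dtm)    # 3 documents x 2 terms

# Convert to a dense matrix to look at the weights
# (fine for a small example, avoid this on large corpora)
m <- as.matrix(dtm)
m["1", "5"]  # weight of term 5 in document 1
```

The document and term indices end up as character dimnames, so rows and columns are looked up with "1" and "5" rather than the integers 1 and 5.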
phiver