How can I use the text2vec package to create a tf-idf matrix with character n-gram features?

Dmitriy Selivanov
Kwiebes

1 Answer

How about:

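```r
library(text2vec)
data("movie_review")
# lowercase each review, then split it into single characters
it = itoken(movie_review$review, tolower, char_tokenizer)
# vocabulary of character 3-grams (the characters of each 3-gram are joined with "_")
v = create_vocabulary(it, ngram = c(3, 3), sep_ngram = "_")
# document-term matrix of raw character 3-gram counts
dtm = create_dtm(it, vectorizer = vocab_vectorizer(v))
```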
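The code above produces raw character 3-gram counts; the question asks for tf-idf weights. A minimal sketch of that last step, assuming text2vec 0.5 or later, where the `TfIdf` R6 model and the `fit_transform()` generic are available (check your installed version's documentation):

```r
library(text2vec)
data("movie_review")

# Same pipeline as above: lowercase, split into characters, count 3-grams
it = itoken(movie_review$review, tolower, char_tokenizer, progressbar = FALSE)
v = create_vocabulary(it, ngram = c(3, 3), sep_ngram = "_")
dtm = create_dtm(it, vectorizer = vocab_vectorizer(v))

# Re-weight the raw counts with TF-IDF
# (assumes text2vec >= 0.5, which provides TfIdf and fit_transform())
tfidf = TfIdf$new()
dtm_tfidf = fit_transform(dtm, tfidf)

# Same shape as dtm: one row per review, one column per character 3-gram
dim(dtm_tfidf)
```

Once fitted, the same `tfidf` object can be applied to a count matrix built from new documents with the same vectorizer via `transform(dtm_new, tfidf)` (`dtm_new` is a hypothetical name here), so the IDF weights learned from the training reviews are reused rather than recomputed.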

PS: In the future, please try to provide a reproducible example of what you tried in order to solve your problem.

Dmitriy Selivanov