Questions tagged [text2vec]

text2vec is an R package that provides a fast and memory-efficient framework for text mining applications. It covers vectorization, word embeddings, topic modelling and more.

The goal of text2vec is to provide tools to easily perform text mining in R at C++ speeds:

  1. Core parts written in C++
  2. Small memory footprint
  3. Concise, pipe-friendly API
  4. No need to load all data into RAM: process it in chunks
  5. Easy vertical scaling across multiple cores and threads

See the development page on GitHub.

111 questions
0 votes, 0 answers

Error in glove_event$fit_transform in text2vec package

While experimenting with word embeddings using the text2vec package in R, the following error is thrown: embd_dim <- 5; glove_event <- GlobalVectors$new(rank = embd_dim, x_max = 10, learning_rate = 0.01, alpha = 0.95, lambda = 0.005); wrd_embd_event <-…
0 votes, 1 answer

text2vec's vocab_vectorizer output is the function itself

I am trying to run through text2vec's example on this page. However, whenever I try to see what the vocab_vectorizer function returned, it's just an output of the function itself. In all my years of R coding, I've never seen this before, but it…
maloneypatr
0 votes, 1 answer

R: error inherits(x, "matrix") || inherits(x, "Matrix") is not TRUE when trying to calculate cosine similarity with tf-idf

I have a corpus filled with 5 different books (all .txt files). I want to calculate the cosine similarity between these books, so I can tell how similar they are with one another. Following is my…
Jimmy
0 votes, 1 answer

R text2vec; rsparse::GloVe$new() GlobalVectors$new() Env Set/Not Set

Problem: R GloVe environment using library(text2vec). Set environment with code execution of rsparse::GloVe$new(), BUT, not set with code execution of GlobalVectors$new(). Then ran wv_main = glove$fit_transform(tcm...), error: Error at…
manager_matt
0 votes, 1 answer

Return several objects from a shiny server function in R for plotting an LDAvis plot first

The code below is the one I am using for plotting an LDA plot using text2vec inside topic_model function in a shiny app. input$date is a checkboxGroupInput selection, input$data works perfectly fine for a DT::renderDataTable output & topic_model…
MelaniaCB
0 votes, 1 answer

I can't create tf-idf matrix for my test data using text2vec

I'm following this tutorial and doing it the same way I did for the training set, but it keeps saying the same thing. Does anyone know what's wrong with this? > # Construct sample document-term matrix with the initial vectorizer > sample.it <- itoken(rawsample$Abstract,…
MelaniaCB
0 votes, 1 answer

Why are distances in text2vec's RWMD module between 1 and -1?

From what I understand, the dist2 RWMD feature of the great text2vec package calculates distances between matrices as cosine distances. Wouldn't that mean 1 - (cosine similarity)? If cosine similarity runs between 0 and 1, then shouldn't that result…
0 votes, 1 answer

Error while finding the number of topics for a Latent Dirichlet Allocation model using the ldatuning library

This is the outcome error and I can tell this is because there is at least one document without some term, but I don't get why and how I can solve it. prep_fun = function(x) { x %>% str_to_lower %>% #make text lower…
MelaniaCB
0 votes, 1 answer

Perplexity issues using text2vec

I am using text2vec on 230k docs, as I always mention. I am trying to find the best topic number for my document term matrix by using perplexity. When I use it one by one it works perfectly fine, but when I try to use a loop to get it for a range…
MelaniaCB
0 votes, 0 answers

Where can I find the coherence function in R?

Excuse me for being basic, but I want to use the 'coherence' function I found on this link to evaluate my latent dirichlet allocation topics and it isn't working with text2vec and I can't tell which library it is in, if it isn't that…
MelaniaCB
0 votes, 1 answer

Why does text2vec show more files than actually exist?

I am testing text2vec. There are only 2 files under a dir (1.txt, 2.txt, of very small size, about 20 k each). I wanted to test their similarity. I do not know why it says 54 documents. > library(stringr) > library(NLP) > library(tm) > …
Dylan
0 votes, 1 answer

convert R matrix to text2vec dtm

I have an R matrix mat and I want to perform LDA on it. When I run lda_model$fit_transform(mat, n_iter = 20), I get an error: Error in super$check_convert_input(x) : don't know how to deal with input of class 'matrix'. Is there an easy way to…
tomaz
0 votes, 1 answer

hash vectorizer in R text2vec package with stopwords removal option

I am using the R text2vec package to create a document-term matrix. Here is my code: library(lime) library(text2vec) # load data data(train_sentences, package = "lime") # tokens <- train_sentences$text %>% word_tokenizer it <- itoken(tokens,…
Sam S.
0 votes, 1 answer

Using text2vec in R - Error: no package called ‘futile.options’

I successfully installed text2vec in R, but when I try to load it with library(text2vec), I'm getting an error: Error: package or namespace load failed for ‘text2vec’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): there is no…
0 votes, 1 answer

using text2vec for multilabel classification

I want to know if the text2vec package can be used for multilabel classification, like Python's BinaryRelevance in skmultilearn.problem_transform. I'm currently referring to the pipeline documented at: http://text2vec.org/vectorization.html
savi