Questions tagged [text2vec]

text2vec is an R package that provides a fast, memory-efficient framework for text mining in R: vectorization, word embeddings, topic modelling, and more.

The goal of text2vec is to provide tools that make text mining in R easy, at C++ speed:

  1. Core parts written in C++
  2. Small memory footprint
  3. Concise, pipe-friendly API
  4. No need to load all data into RAM: process it in chunks
  5. Easy vertical scaling with multiple cores and threads

See the development page on GitHub.

111 questions
0
votes
1 answer

Glove Word Mover Similarity

I want to calculate text similarity using relaxed word movers distance. I have two different datasets (corpus). See below. A <- data.frame(name = c( "X-ray right leg arteries", "consultation of gynecologist", "x-ray leg arteries", "x-ray leg…
john
0
votes
1 answer

Word Mover Distance Similarity in R

I want to calculate text similarity using relaxed word movers distance. I have two different datasets (corpus). See below. A <- data.frame(name = c( "X-ray right leg arteries", "consultation of gynecologist", "x-ray leg arteries", "x-ray leg…
john
0
votes
1 answer

GloVe word embedding model parameters using text2vec in R, and display training output (epochs) after every n iterations

I am using text2vec package in R for training word embedding (Glove Model) as: library(text2vec) library(tm) prep_fun = tolower tok_fun = word_tokenizer tokens = docs %>% # docs: a collection of text documents prep_fun %>% tok_fun it =…
Sam S.
0
votes
1 answer

Create Co-occurrence matrix with bigrams

I am looking to create a co-occurrence matrix with bigrams instead of unigrams from a single string. I am referring to the following…
NinjaR
0
votes
2 answers

looping to tokenize using text2vec

Edited to shorten and provide sample data. I have text data consisting of 8 questions asked of a number of participants twice. I want to use text2vec to compare the similarity of their responses to these questions at the two points in time…
Will Hauser
0
votes
1 answer

Get LDAvis json from text2vec

Given a document term matrix dtm, text2vec provides a nice integration with the LDAvis package. However, I want to embed this visualisation into a markdown document. The LDAvis package has methods such as createJSON, which would allow me to do this,…
TMrtSmith
0
votes
1 answer

In the R text2vec package, how can the topics generated by an LDA model be assigned to the related documents?

I implemented an LDA model using the text2vec package in R, but I am wondering how to assign each document to the topics. Below is my…
manjari
0
votes
2 answers

R - Installation of text2vec Ubuntu VM

I'm trying to install text2vec on an AWS EC2 Free-tier Ubuntu VM. I get this error message: > install.packages(c("text2vec"), type = "source") Installing package into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) trying URL…
Christopher Costello
0
votes
1 answer

How to get IDF Vector with text2vec

Is it possible to extract not just the transformed TF-IDF term-document matrix, but also the IDF vector that was used for this transformation, with the latest version of text2vec (0.5.1)? Thank you!
Tobi
0
votes
1 answer

How can I create a tf-idf matrix with character n-gram features?

How can I use the text2vec package to create a tf-idf matrix with character n-gram features?
Kwiebes
0
votes
1 answer

Error creating vocabulary from big text file on disk

I am trying to run the example from https://cran.r-project.org/web/packages/text2vec/vignettes/files-multicore.html but with my own file "text": 3.7 GB of plain text, built from a Wikipedia XML dump with a Perl script from here -…
0
votes
1 answer

Normalized topic document probabilities text2vec R

I am trying to find out the topic document probabilities after running the lda model using text2vec package in R. Following commands generate the model: lda_model <- LDA$new(n_topics = n_topics, doc_topic_prior = 0.1, topic_word_prior =…
ds_newbie
0
votes
1 answer

Matching documents with text2vec -- scaling problems

I am having a few issues with scaling a text matching program. I am using text2vec which provides very good and fast results. The main problem I am having is manipulating a large matrix which is returned by the text2vec::sim2() function. First,…
markthekoala
0
votes
1 answer

R: text2vec DTM's document count does not match the original number of documents

I am a student who uses text2vec very often. Until last year, I used it without any problems. But today, when I build the DTM using the parallel function, the number of documents in the DTM does not match the original number of documents. The DTM's…
유승환
0
votes
1 answer

Implement Arora 2017 in Text2vec

I am trying to replicate Arora 2017 (https://github.com/PrincetonML/SIF / https://openreview.net/forum?id=SyK00v5xx) using text2vec. The authors compute sentence embeddings by averaging word embeddings and subtracting the first principal component.…
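The two steps named in this excerpt (average the word embeddings, then subtract the first principal component) can be sketched in base R. This is a minimal illustration only, not text2vec code and not the authors' implementation: the `word_vectors` values and the tokenized `sentences` are made up, a plain unweighted mean is used where the paper applies frequency-based weights, and taking the first principal component from an uncentered SVD is an assumption of this sketch.

```r
# Toy word vectors (hypothetical values), one row per word.
word_vectors <- rbind(
  xray     = c(1.0, 0.2, 0.0),
  leg      = c(0.1, 1.0, 0.3),
  arteries = c(0.0, 0.8, 1.0),
  consult  = c(0.9, 0.0, 0.4)
)

# Toy tokenized sentences (hypothetical).
sentences <- list(
  c("xray", "leg", "arteries"),
  c("xray", "leg"),
  c("consult", "xray")
)

# Step 1: average the word vectors of each sentence
# (unweighted here; the paper uses frequency-based weights).
sentence_avg <- t(vapply(
  sentences,
  function(tokens) colMeans(word_vectors[tokens, , drop = FALSE]),
  numeric(ncol(word_vectors))
))

# Step 2: first principal direction of the sentence matrix,
# taken as the first right singular vector (uncentered; an assumption).
u <- svd(sentence_avg)$v[, 1]

# Step 3: subtract each sentence's projection onto that direction.
sentence_emb <- sentence_avg - (sentence_avg %*% u) %*% t(u)

print(round(sentence_emb, 4))
```

With real data, `word_vectors` would be replaced by trained embeddings (for example from a GloVe model), and the resulting `sentence_emb` rows could then be compared with a cosine similarity.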