Questions tagged [text2vec]

text2vec is an R package that provides a fast and memory-efficient framework for text mining applications: vectorization, word embeddings, topic modelling and more.

The goal of text2vec is to provide tools to easily perform text mining in R with C++ speeds:

Core parts written in C++
Small memory footprint
Concise, pipe friendly API
No need to load all data into RAM - process it in chunks
Easy vertical scaling with multiple cores and threads

See the development page on GitHub.

111 questions

1 answer

Join doc_topic_distr with DTM raw data using doc_id

I want to try some kind of prediction stuff similar to this one: https://www.quora.com/How-do-I-use-LDA-Latent-Dirichlet-Allocation-for-document-classification-preferably-with-solutions-that-can-be-implemented-in-R I think that I will have to merge…

r tm text2vec

asked Aug 29 '19 at 09:30

Flocke Haus

1 answer

Why do fit_transform and transform produce different results?

I was playing around with LDA in the text2vec package and was confused about why fit_transform and transform gave different results when using the same data. The documentation states that transform applies the learned model to new data but the result is a lot…

r nlp lda text2vec

asked Jul 16 '19 at 18:28

George Hall

1 answer

Read GloVe pre-trained embeddings into R, as a matrix

Working in R. I know the pre-trained GloVe embeddings (e.g., "glove.6B.50d.txt") can be found here: https://nlp.stanford.edu/projects/glove/. However, I've had zero luck reading this text file into R so that the product is the word embedding matrix…

r nlp word-embedding text2vec glove

asked May 10 '19 at 11:33

Drew

2 answers

Why is LSA in text2vec producing different results every time?

I was using latent semantic analysis in the text2vec package to generate word vectors and using transform to fit new data when I noticed something odd: the spaces do not line up when trained on the same data. There appears to be some…

r quanteda lsa text2vec

asked Feb 13 '19 at 03:10

user3554004

0 answers

Relaxed Word Mover's Distance in R

I am using Relaxed Word Mover's Distance in the package text2vec to compute the distance between documents, so as to identify the most similar document for each target document. Word vectors are compiled using FastText available in the package…

python r gensim wmd text2vec

asked Dec 06 '18 at 09:43

TMC

1 answer

Using GloVe's pretrained glove.6B.50.txt as a basis for word embeddings in R

I'm trying to convert textual data into vectors using GloVe in R. My plan was to average the word vectors of a sentence, but I can't seem to get to the word vectorization stage. I've downloaded the glove.6b.50.txt file and its parent zip file from:…

r word-embedding text2vec glove

asked Nov 17 '18 at 05:18

Travasaurus

1 answer

How to represent each word occurrence as a separate tcm vector in R?

I am looking for an efficient way to create a term co-occurrence matrix for (each) target word in a corpus, such that each occurrence of the word would constitute its own vector (row) in a tcm, where the columns are the context words (i.e., a…

r sparse-matrix quanteda tidytext text2vec

asked Oct 23 '18 at 17:00

user3554004

1 answer

LDA topic model using R text2vec package and LDAvis in shinyApp

Here is the code for LDA topic modelling with the R text2vec package: library(text2vec) tokens = docs$text %>% # docs$text: a collection of text documents word_tokenizer it = itoken(tokens, ids = docs$id, progressbar = FALSE) v =…

r shiny visualization topic-modeling text2vec

asked Sep 11 '18 at 04:58

Sam S.

2 answers

R function with reference to argument without evaluating it

islands1<-islands #a named num (vector) data.frame(island_col=names(islands1), number_col=islands1,row.names=NULL) This creates a dataframe consisting of two columns, the first contains the names from the named vector and is called "island_col", the…

r function indexing text2vec

asked Jul 19 '18 at 22:30

Will Hauser

1 answer

how to train a lasso with both text and numeric variables?

Consider this modified classic example: library(dplyr) library(tibble) dtrain <- data_frame(text = c("Chinese Beijing Chinese", "Chinese Chinese Shanghai", "France", …

r classification tm text2vec

asked Jun 16 '18 at 13:56

ℕʘʘḆḽḘ

0 answers

How to use a built classifier (based on word embeddings) on new data for sentiment analysis?

So I used the text2vec R package to build word vectorizations for feature selection. I did that according to Dmitriy Selivanov's page http://text2vec.org/vectorization.html, which explains how to properly use text2vec before building a…

r sentiment-analysis text2vec

asked Apr 30 '18 at 12:42

Lucinho91

0 answers

How to create svm plot with document term matrix from text2vec package in R?

I'm using the text2vec package to create a vocabulary document term matrix as described here: http://text2vec.org/vectorization.html#vectorization In particular, I am using SVM from the e1071 package. I made a similar vocabulary term document matrix…

r text2vec

asked Apr 18 '18 at 15:39

Kwiebes

0 answers

get word vectors for each document

I stumbled upon the text2vec package, which implements word embeddings in R. I have been experimenting with it successfully. However, I have been trying to implement word vectors for each document, exactly as I found in H2O (Python) here…

r text2vec

asked Mar 03 '18 at 17:10

Shoaibkhanz

2 answers

How do I include stopwords(terms) in text2vec

In the text2vec package, I am using the create_vocabulary function. For example, my text is "This book is very good", and suppose I am not using stopwords and am using an ngram of 1L to 3L. So the vocab terms will be This, book, is, very, good, This book,..... book is…

r text-mining text2vec

asked Feb 22 '18 at 09:39

tej kiran

1 answer

ngrams using hash_vectorizer in text2vec

I was trying to create ngrams using the hash_vectorizer function in text2vec, when I noticed that it doesn't change the dimensions of my dtm with changing values. h_vectorizer = hash_vectorizer(hash_size = 2 ^ 14, ngram = c(2L, 10L)) dtm_train =…

r hash text-mining text2vec

asked Dec 14 '17 at 14:38

Akhil
