Questions tagged [text2vec]

text2vec is an R package that provides a fast and memory-efficient framework for text mining applications within R: vectorization, word embeddings, topic modelling and more.

The goal of text2vec is to provide tools for easily performing text mining in R at C++ speed:

Core parts written in C++
Small memory footprint
Concise, pipe-friendly API
No need to load all data into RAM: process it in chunks
Easy vertical scaling with multiple cores and threads

See the development page on GitHub.

111 questions

1 answer

text2vec - Do topics' words update with new data?

I'm currently performing topic modelling using LDA from the text2vec package. I managed to create a DTM and then apply LDA and its fit_transform method with n_topics=50. While looking at the top words from each topic, a question popped into my…

text2vec

asked Nov 27 '17 at 22:51

Samuel Kožuch

1 answer

tokenizing a list doesn't work with UTF8

I extract some data from Oracle DB to do some text mining. My data is UTF8 and vocab can't handle it.…

r encoding utf-8 text2vec

asked Sep 19 '17 at 06:33

parvij

1 answer

LDA$new model constructor text2vec R package error: Error in .subset2(public_bind_env, "initialize")(...) : unused argument (...)

The error is: > lda_model = LDA$new(n_topics = 3, vocabulary = vocab, doc_topic_prior = 0.1, topic_word_prior = 0.01) Error in .subset2(public_bind_env, "initialize")(...) : unused argument (vocabulary = list(term = c("normal", "bobo", "lixo",…

r nlp text-mining lda text2vec

asked Aug 29 '17 at 01:10

Alexandre Peres

1 answer

Lemmatization using a txt file with lemmas in R

I would like to use external txt file with Polish lemmas structured as follows: (source for lemmas for many other languages http://www.lexiconista.com/datasets/lemmatization/) Abadan Abadanem Abadan Abadanie Abadan Abadanowi Abadan …

r text-mining tm quanteda text2vec

asked Aug 18 '17 at 18:02

Jacek Kotowski

2 answers

Why do I get two different performances when creating a Jaccard similarity matrix using two sparse matrices that seem to be the same?

I'm confounded by a strange performance issue when I try to create a Jaccard similarity matrix using sim2() from the text2vec package. I have a sparse matrix [210,000 x 500] for which I'd like to obtain a Jaccard similarity matrix as mentioned above. When…

r sparse-matrix similarity text2vec

asked Jun 23 '17 at 08:36

Ankhnesmerira

1 answer

R: how to add numeric variables to a sparse matrix?

Consider the following example library(text2vec) library(glmnet) library(dplyr) dataframe <- data_frame(id = c(1,2,3,4), text = c("this is a test", "this is another",'hello','what???'), value =…

r machine-learning r-caret text-classification text2vec

asked Jun 08 '17 at 00:36

ℕʘʘḆḽḘ

1 answer

Can the text2vec package split Chinese sentences?

How do I set up itoken in text2vec for splitting Chinese sentences? The example is for English! There are existing Chinese word segmentation packages, such as jieba. However, I want to use text2vec to do text clustering and an LDA model. In addition, how to do text…

r text-mining text2vec

asked May 04 '17 at 08:21

cindy

1 answer

How to get topic probability table from text2vec LDA

The LDA topic modeling in the text2vec package is amazing. It is indeed much faster than topicmodels. However, I don't know how to get the probability that each document belongs to each topic, as in the example below: V1 V2 V3 V4 1 0.001025237…

r lda text2vec

asked Nov 27 '16 at 06:15

Lucia

1 answer

Write a text2vec dtm to a file (csv or svmlight)

I came across the text2vec package today and it's exactly what I need for a particular problem. However, I haven't been able to figure out how to export a dtm created with text2vec to some kind of output file. My ultimate goal is to generate…

r sparse-matrix svmlight text2vec

asked Nov 27 '16 at 02:32

Dave Kincaid

2 answers

text2vec: Iterate over the vocabulary after using function create_vocabulary

Using the text2vec package, I created a vocabulary. vocab = create_vocabulary(it_0, ngram = c(2L, 2L)) vocab looks something like this > vocab Number of docs: 120 0 stopwords: ... ngram_min = 2; ngram_max = 2 Vocabulary: terms…

r text-analysis text2vec

asked Nov 26 '16 at 06:28

Hardik Gupta

1 answer

text2vec in R - Transform new data?

There is documentation on creating a DTM (document term matrix) for the text2vec package, for example the following where a TFIDF weighting is applied after building the matrix: data("movie_review") N <- 1000 it <- itoken(movie_review$review[1:N],…

r text-mining text2vec

asked Aug 26 '16 at 20:45

B_Miner

0 answers

Including a covariate in a word embedding model in R using text2vec and quanteda packages

I am trying to build a word embedding model in R with the following code: library(quanteda) library(text2vec) fcm_ <- fcm(tokens, context = "window", count = "weighted", weights = 1 / (1:5), tri = TRUE) glove <- GlobalVectors$new(rank = 50, x_max…

r nlp word-embedding quanteda text2vec

asked Jan 26 '23 at 13:12

Iamembarassed123

1 answer

How can I hide messages in R markdown when "message=FALSE" doesn't work

I am using R Markdown and text2vec and would like to suppress the messages that come from running the function glove$fit_transform(). I've tried message=FALSE and warning=FALSE, as well as a number of hacky attempts to fix the problem, but to no…

r printing r-markdown text2vec

asked Apr 29 '22 at 19:59

generic

1 answer

How can I solve my problems with the installation of the text2vec package?

When I try to install the R package text2vec, I get the following error. It says it cannot open a certain shared object file. > install.packages("text2vec") Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared object…

r installation text2vec

asked Aug 30 '21 at 14:34

Nina van Bruggen

0 answers

Viewing saved LDAvis plot from directory in browser

I created an LDAvis figure using the text2vec package in R. I tried but failed to save it to my local directory as the fully interactive webpage that it is. I get either a blank page in my browser or a static version when I save the figure with the following…

r text2vec pyldavis

asked Sep 08 '20 at 22:44

nigus21