Questions tagged [text2vec]

text2vec is an R package that provides a fast and memory-efficient framework for text mining applications within R: vectorization, word embeddings, topic modelling and more.

The goal of text2vec is to provide tools to easily perform text mining in R at C++ speeds:

Core parts written in C++
Small memory footprint
Concise, pipe-friendly API
No need to load all data into RAM: process it in chunks
Easy vertical scaling with multiple cores and threads

See the development page on GitHub.

111 questions

votes

0 answers

Convert dgeMatrix for downstream tasks

I am trying to cluster sentence embeddings based on Glove model from text2vec. I generated the embeddings using the glove model like so (I create the iterator, vocab etc in the standard way). # create document term matrix dtm = create_dtm(it,…

text2vec

asked Jan 04 '18 at 10:53

user2300301

votes

1 answer

error running glmnet on 2 combined DTMs (via cBind) in text2vec

I created a tf-idf DTM and an n-gram based DTM in text2vec, using the same dataset. Now, I am able to run glmnet on each of them separately, but when I combine these 2 DTMs via cBind, glmnet gives me an error: Error in validObject(.Object)…

r text2vec

asked Dec 14 '17 at 16:59

Akhil

votes

1 answer

Sparse matrix in CSC format dgCMatrix in LiblineaR occurs error [R]

dtm_train_tfidf is a sparse matrix in CSC format (dgCMatrix). I am using the function LiblineaR, which is supposed to accept sparse matrices. However, when I use the sparse matrix dtm_train_tfidf, the following error occurs: library(LiblineaR) …

r liblinear text2vec

asked Nov 28 '17 at 15:52

toumperlekis

votes

1 answer

I have done TF-IDF and want to implement models in caret package [R]

I have implemented the TF-IDF algorithm that is explained in this link: https://cran.r-project.org/web/packages/text2vec/vignettes/text-vectorization.html#tf-idf So, the classifier is implemented like this: glmnet_classifier = cv.glmnet(x =…

r r-caret text2vec

asked Nov 27 '17 at 21:05

toumperlekis

votes

1 answer

How to use prepare_analogy_questions and check_analogy_accuracy functions in text2vec package?

Following code: library(text2vec) text8_file = "text8" if (!file.exists(text8_file)) { download.file("http://mattmahoney.net/dc/text8.zip", "text8.zip") unzip ("text8.zip", files = "text8") } wiki = readLines(text8_file, n = 1, warn = FALSE) #…

text2vec

asked Nov 14 '17 at 14:10

семен антонов

votes

1 answer

Text preprocessing and topic modelling using text2vec package

I have a large number of documents and I want to do topic modelling using text2vec and LDA (Gibbs sampling). The steps I need are (in order): Removing numbers and symbols from the text library(stringr) docs$text <-…

r tm topic-modeling synonym text2vec

asked Oct 20 '17 at 04:54

Sam S.

votes

1 answer

In R text2vec package - LDA model can show the topic distribution for each tokens in document?

library (text2vec) library (parallel) library (doParallel) N <- parallel::detectCores() cl <- makeCluster (N) registerDoParallel (cl) Ky_young <- read.csv("./Ky_young.csv") IT <- itoken_parallel (Ky_young$TEXTInfo, ids …

r lda topic-modeling text2vec

asked Sep 11 '17 at 06:15

유승환

votes

1 answer

The compatibility between text2vec and RHadoop

At present, we are using text2vec to process a large dataset on AWS EC2 (single instance). The text data will get bigger and bigger in the future, so we may try an RHadoop (MapReduce) architecture, and we don't know whether there is compatibility between text2vec and…

text2vec

asked Aug 13 '17 at 03:02

Zheng Lu

votes

1 answer

TM, Quanteda, text2vec. Get strings on the left of term in wordlist according to regex pattern

I would like to analyse a big folder of texts for the presence of names, addresses and telephone numbers in several languages. These will usually be preceded by a word such as "Address", "telephone number", "name", "company", "hospital", "deliverer". I…

r tm quanteda text2vec

asked Jul 31 '17 at 08:19

Jacek Kotowski

votes

2 answers

How to produce document term matrix in text2vector only from stored list of words

What is the syntax in text2vec to vectorize texts and produce a DTM with only the indicated list of words? How can I vectorize and produce a document-term matrix only on the indicated features? And if the features do not appear in the text the variable should…

r text-mining text2vec

asked Jul 28 '17 at 12:34

Jacek Kotowski

votes

1 answer

Text2Vec classification with caret - Naive Bayes warning message

Please see the question listed here for more context. I am attempting to use a document-term matrix, built using text2vec, to train a naive Bayes (nb) model using the caret package. However, I get this warning message: Warning message: In eval(xpr,…

r r-caret naivebayes text2vec

asked Jul 16 '17 at 13:04

UbuntuNewbie

votes

1 answer

Text2Vec classification with caret SVM warning message

I am working on a text classification problem with the text2vec package and caret. I am using text2vec to build a document-term matrix before building different models with caret. The goal is to identify string similarity between two strings, using…

r svm r-caret text2vec

asked Jul 16 '17 at 11:30

UbuntuNewbie

votes

0 answers

text2vec tfidf fails in R with odd message

I encountered an odd issue when I try to use tf-idf on my corpus. Here is my code: prep_fun <- function(x) { x %>% # make text lower case str_to_lower %>% # remove non-alphanumeric symbols str_replace_all("<.*?>", " ")…

r text-mining text2vec

asked Mar 27 '17 at 15:13

Zakkery

votes

1 answer

Plotting the effect of document pruning on text corpus in R text2vec

Is it possible to check how many documents remain in the corpus after applying prune_vocabulary in the text2vec package? Here is an example for getting a dataset in and pruning vocabulary library(text2vec) library(data.table) library(tm) #Load…

r nlp text2vec

asked Mar 06 '17 at 18:59

sriramn


votes

1 answer

Compute unweighted bag-of-words based TCM using text2vec in R?

I am trying to compute a term-term co-occurrence matrix (or TCM) from a corpus using the text2vec package in R (since it has a nice parallel backend). I followed this tutorial, but while inspecting some toy examples, I noticed the create_tcm…

r nlp n-gram text2vec

asked Oct 30 '16 at 15:22

user3554004
