Questions tagged [text2vec]

text2vec is an R package that provides a fast and memory-efficient framework for text mining applications: vectorization, word embeddings, topic modelling, and more.

The goal of text2vec is to provide tools that make text mining in R easy while running at C++ speeds:

  1. Core parts written in C++
  2. Small memory footprint
  3. Concise, pipe-friendly API
  4. No need to load all data into RAM; it can be processed in chunks
  5. Easy vertical scaling across multiple cores and threads
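
As a rough illustration of the chunked, iterator-based workflow described in the list above, here is a minimal sketch that builds a document-term matrix. The sample documents and the `n_chunks = 2` setting are made up for illustration, and the calls assume a reasonably recent text2vec release (older versions exposed a different set of functions).

```r
library(text2vec)

# Toy corpus (hypothetical data, only for illustration).
docs <- c(
  "text mining in R",
  "fast text vectorization",
  "word embeddings and topic models"
)

# itoken() wraps the documents in an iterator that yields tokenized chunks,
# so the whole corpus never has to be tokenized in memory at once.
it <- itoken(
  docs,
  preprocessor = tolower,
  tokenizer = word_tokenizer,
  n_chunks = 2,
  progressbar = FALSE
)

# First pass over the iterator: collect the vocabulary.
vocab <- create_vocabulary(it)

# Second pass: map each document onto the vocabulary to get a sparse DTM.
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(it, vectorizer)

dim(dtm)
```

The result is a sparse matrix with one row per document and one column per vocabulary term.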

See the development page on GitHub.

111 questions
0
votes
1 answer

In text2vec package in R, could not find function "create_vocab_corpus"

I was trying to understand the text2vec package from http://dsnotes.com/articles/text2vec but got stuck at the following step: "Now we can construct DTM. Again, since all functions related to corpus construction have streaming API, we have to create iterator…
Saurabh Yadav
-1
votes
1 answer

text2vec document similarity code returns two values

I am learning to assess text similarity between documents. Going through the text2vec tutorial (http://text2vec.org/similarity.html) on the topic, I noticed that the code returns two values for similarity. Here is the tail end of the code in the…
-1
votes
1 answer

Combine two words in a corpus with R

So here is my code ny <- read.csv2("nyt.csv", sep = "\t", header = T) ny_texte <- as.vector(ny) iterator <- itoken(ny_texte, preprocessor=tolower, tokenizer=word_tokenizer, …
-1
votes
1 answer

Text Similarity - Cosine - Control

I would like to ask if anybody could check my code, because it has been behaving strangely: it went from not working and giving me errors to suddenly working without my changing anything. The code is at the bottom. Background: So my goal is to calculate text…
-1
votes
1 answer

How to convert text fields into numeric/vector space for an SVM in RStudio?

I am attempting to train a Support Vector Machine to aid in the detection of similarity between strings. My training data consists of two text fields and a third field that contains 0 or 1 to indicate similarity. This last field was calculated with…
UbuntuNewbie
-2
votes
0 answers

How to make embedding models sensitive to numbers?

I have a set of data, but it is presented in the form of logs such as v0.1.1, v0.2.3, and when I try it with a pretrained text2vec model I find it hard to pinpoint the exact version number or update date, seeing as it seems to be insensitive to the…
Omnis