Here is my code:
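# Read the tab-separated file of articles (keep text as character, not factor)
ny <- read.csv2("nyt.csv", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
# Take the article text (first column) as a character vector, one element per article;
# as.vector() on a data frame returns a list, not a character vector
ny_texte <- ny[[1]]
iterator <- itoken(ny_texte,
                   preprocessor = tolower,
                   tokenizer = word_tokenizer,
                   progressbar = FALSE)
vocabulary <- create_vocabulary(iterator)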
My .csv contains articles from the New York Times. I would like multi-word expressions such as "new york", "south africa" and "ellis island" to appear as single terms in the vocabulary, instead of only separate tokens like "new", "york", etc.
How can I do this?
Thank you!
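One possible approach, sketched under assumptions: text2vec's `create_vocabulary()` accepts an `ngram` argument, so asking for unigrams and bigrams produces terms joined with `sep_ngram` (by default `"_"`), such as `new_york`. The three sentences below are a made-up stand-in for the NYT text column, and the `term_count >= 2` cut-off is an arbitrary threshold for illustration only:

```r
library(text2vec)

# Hypothetical toy documents standing in for the article texts (ny[[1]])
docs <- c(
  "Governor Cuomo took the oath in New York",
  "New York and South Africa signed an agreement",
  "Visitors to Ellis Island arrive from New York"
)

it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Count unigrams and bigrams; bigram terms are joined with "_", e.g. "new_york"
vocab <- create_vocabulary(it, ngram = c(1L, 2L), sep_ngram = "_")

# Keep only bigrams seen at least twice (illustrative threshold, not a rule)
is_bigram <- grepl("_", vocab$term, fixed = TRUE)
bigrams <- vocab[is_bigram & vocab$term_count >= 2, ]
print(bigrams)
```

With this approach "new" and "york" stay in the vocabulary next to "new_york", and every bigram in the corpus is counted, so on the full set of articles some pruning (for example with `prune_vocabulary()`) is usually needed.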
For more context, I'm using these libraries:
library(text2vec)
library(stopwords)
library(tm)
library(dplyr)
library(readr)
Here is an example of my results:
ny[1]
1 " LEAD Governor Cuomo with possible Presidential campaign waiting the wings took the oath office New Year Eve for second term New York chief executive LEAD Governor Cuomo with possible Presidential campaign waiting the wings ...
vocabulary
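If the goal is to replace "new" "york" with a single token, rather than adding bigrams next to the unigrams, text2vec also provides a `Collocations` model that learns phrases from co-occurrence statistics and rewrites the token stream. A minimal sketch on the same kind of toy data; the thresholds `collocation_count_min = 2` and `pmi_min = 0` are assumptions chosen only so a tiny example can detect anything, and would need to be much larger on the real articles:

```r
library(text2vec)

# Hypothetical toy documents standing in for the article texts (ny[[1]])
docs <- c(
  "Governor Cuomo took the oath in New York",
  "New York and South Africa signed an agreement",
  "Visitors to Ellis Island arrive from New York"
)

it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Learn collocations; thresholds are illustrative and far too low for a real corpus
model <- Collocations$new(collocation_count_min = 2, pmi_min = 0)
model$fit(it, n_iter = 1)

# Statistics for the phrases the model kept
print(model$collocation_stat)

# Re-tokenize so detected phrases become single tokens joined with "_"
it_phrases <- model$transform(it)
vocab_phrases <- create_vocabulary(it_phrases)
print(vocab_phrases)
```

Raising `n_iter` lets the model merge already detected phrases again, so longer names (three or more words) can be learned; inspecting `collocation_stat` is a reasonable way to tune the thresholds before building the final vocabulary.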