I am looking to split a string into character n-grams of length 3, e.g. HelloWorld would become "Hel", "ell", "llo", "loW", etc. How would I achieve this using R?
In Python it would take a loop using the range function, e.g. [myString[i:i+3] for i in range(len(myString) - 2)]
Is there a neat way to loop through the letters of a string using stringr (or another suitable function/package) to tokenize the word into a vector?
e.g.
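library(dplyr)  # provides the %>% pipe
dfWords <- c("HelloWorld", "GoodbyeMoon", "HolaSun") %>%
  data.frame(stringsAsFactors = FALSE)  # keep Text as character, not factor
names(dfWords)[1] <- "Text"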
I would like to generate a new column which would contain a vector of the tokenized Text variable (preferably using dplyr). This can then be split later into new columns.
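
To make the desired result concrete, here is a minimal sketch of the kind of output I am after. It assumes base R's substring() and a helper I have named char_ngrams purely for illustration (it is not a stringr or dplyr function), and it stores the n-grams in a list column. There may well be a neater way.

```r
library(dplyr)

# Hypothetical helper: all overlapping n-character substrings of one string.
# Returns an empty character vector if the string is shorter than n.
char_ngrams <- function(x, n = 3) {
  if (nchar(x) < n) return(character(0))
  starts <- seq_len(nchar(x) - n + 1)
  substring(x, starts, starts + n - 1)
}

dfWords <- data.frame(
  Text = c("HelloWorld", "GoodbyeMoon", "HolaSun"),
  stringsAsFactors = FALSE
)

# One character vector of n-grams per row, kept as a list column
dfWords <- dfWords %>%
  mutate(Ngrams = lapply(Text, char_ngrams, n = 3))

dfWords$Ngrams[[1]]
# "Hel" "ell" "llo" "loW" "oWo" "Wor" "orl" "rld"
```

A list column is used because each word yields a different number of n-grams, so the vectors cannot be stored directly as equal-length columns without padding.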