I would like to replace specific tokens in an object of class tokens from the quanteda package, but I cannot replicate a standard approach that works well with stringr.
The objective is to replace every token of the form "XXXof" with two tokens of the form c("XXX", "of").
Please have a look at the minimal example below:
suppressPackageStartupMessages(library(quanteda))
library(stringr)
text = "It was a beautiful day down to the coastof California."
# I would solve this with stringr as follows:
text_stringr = str_replace( text, "(^.*?)(of)", "\\1 \\2" )
text_stringr
#> [1] "It was a beautiful day down to the coast of California."
# I fail to find a similar solution with quanteda that works on objects of class tokens
tok = tokens( text )
# I want to replace "coastof" with the two tokens "coast" and "of"
tokens_replace( tok, "(^.*?)(of)", "\\1 \\2", valuetype = "regex" )
#> Tokens consisting of 1 document.
#> text1 :
#> [1] "It" "was" "a" "beautiful" "day"
#> [6] "down" "to" "the" "\\1 \\2" "California"
#> [11] "."
Any workaround?
Created on 2021-03-16 by the reprex package (v1.0.0)
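One possible workaround, offered as a sketch rather than a definitive answer: the output above suggests that tokens_replace() inserts the replacement as literal text and does not expand backreferences. Assuming it accepts lists of equal length for pattern and replacement (as in the multi-word example in its documentation), the substitution can be computed on the token types with stringr and then mapped back onto the tokens. The anchored regex "^(.+)(of)$" and the names pat, keys, vals, and tok_fixed are my own choices for illustration and are not part of the original attempt.

```r
suppressPackageStartupMessages(library(quanteda))
library(stringr)

text <- "It was a beautiful day down to the coastof California."
tok <- tokens(text)

# Assumed pattern: a token that ends in "of" with at least one character before it.
# It would also split real words such as "proof", so narrow it for real data.
pat <- "^(.+)(of)$"

# Unique token types that match the pattern (here only "coastof")
keys <- str_subset(types(tok), pat)

# Expand the backreferences with stringr, then split each result into separate tokens
vals <- str_split(str_replace(keys, pat, "\\1 \\2"), " ")

# Map each matching type to its multi-token replacement (exact, case-sensitive match)
tok_fixed <- tokens_replace(tok, as.list(keys), vals,
                            valuetype = "fixed", case_insensitive = FALSE)
as.character(tok_fixed)
```

Because the regex is applied to the unique types rather than to every token occurrence, this should stay cheap on large corpora, and the result is still a tokens object with the same documents.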