There is a strange behavior of stringr that is really annoying me. Without any warning, stringr changes the encoding of some strings that contain non-ASCII characters, in my case ø, å, æ, é and some others. If you str_trim a character vector, the elements with non-ASCII letters are converted to a different Encoding.
library(stringr)
letter1 <- readline('Gimme an ASCII character!') # try q or a
letter2 <- readline('Gimme a non-ASCII character!') # try ø or é
Letters <- c(letter1, letter2)
Encoding(Letters) # 'unknown' for both elements
Encoding(str_trim(Letters)) # mixed 'unknown' and 'UTF-8'
This is a problem because I use data.table for (fast) merges of big tables, data.table does not support mixed encodings, and I could not find a way to get back to a uniform encoding.
Any work-around?
EDIT: I thought I could fall back on the base functions, but they do not preserve the encoding either. paste preserves it, but sub, for instance, does not.
Encoding(paste(' ', Letters)) # 'unknown'
Encoding(str_c(' ', Letters)) # mixed
Encoding(sub('^ +', '', paste(' ', Letters))) # mixed
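For anyone who wants to reproduce this without typing at the prompt, here is a non-interactive sketch of the base R check. The variable names (ascii, non_ascii, before, after_paste, after_sub) are mine, and stripping the encoding mark by hand is only my assumption about how to mimic the unmarked string that readline gave me. It also assumes a UTF-8 locale. I have not annotated the results, since they may depend on the locale and R version.

```r
# Non-interactive sketch of the base R check (no stringr needed).
ascii <- "q"
non_ascii <- "\u00f8"              # ø, written as an escape so the script stays ASCII

# Assumption: drop the encoding mark to mimic the unmarked string from readline().
Encoding(non_ascii) <- "unknown"

Letters <- c(ascii, non_ascii)

# Record the declared encoding before and after each operation.
before      <- Encoding(Letters)
after_paste <- Encoding(paste(" ", Letters))
after_sub   <- Encoding(sub("^ +", "", paste(" ", Letters)))

print(before)
print(after_paste)
print(after_sub)
```

Comparing the three printed vectors shows which operation, if any, changes the declared encoding on a given setup.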