I have a string called 'str' that I get from loading an RDS file.
The string contains French accented characters that display correctly in the RStudio console. However, when I use the ngram package on this string, the accented characters are not displayed correctly.
If I define the same accented string directly in R, it works fine (see 'str2' in the code below).
How can I solve this, for example by forcing a new encoding on my original string?
library(ngram)
str # console displays "crédit hypothécaire en juillet"
ng <- ngram(str, n = 2, sep = " ")
get.phrasetable(ng)
#                ngrams freq      prop
# 1     hypothécaire en    1 0.3333333
# 2 crédit hypothécaire    1 0.3333333
# 3          en juillet    1 0.3333333
str2 <- "crédit hypothécaire en juillet"
ng2 <- ngram(str2, n = 2, sep = " ")
get.phrasetable(ng2)
#                ngrams freq      prop
# 1     hypothécaire en    1 0.3333333
# 2 crédit hypothécaire    1 0.3333333
# 3          en juillet    1 0.3333333
EDIT:
The suggested link (handling special characters, e.g. accents, in R) did not solve my issue through its accepted answer, so this is not a duplicate question. It did provide some clues, though; see the answer below.
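For anyone wanting to check the encoding side of this, here is a minimal sketch in base R of inspecting and re-encoding a string. It assumes the string loaded from the RDS file is marked with a non-UTF-8 encoding such as latin1, which I have not confirmed. The variable 's_latin1' is a hypothetical stand-in for 'str', built here so that the example runs on its own.

```r
# Hypothetical stand-in for the string loaded from the RDS file:
# the same text, converted to Latin-1 bytes and marked as latin1.
s_utf8_literal <- "cr\u00e9dit hypoth\u00e9caire en juillet"
s_latin1 <- iconv(s_utf8_literal, from = "UTF-8", to = "latin1")

# Inspect the encoding R has declared for the string.
print(Encoding(s_latin1))

# Re-encode to UTF-8 based on the declared encoding.
s_utf8 <- enc2utf8(s_latin1)
print(Encoding(s_utf8))

# Equivalent explicit conversion when the source encoding is known.
s_utf8_explicit <- iconv(s_latin1, from = "latin1", to = "UTF-8")

# The raw bytes differ between the two encodings even though
# the console shows the same text for both.
print(charToRaw(s_latin1))
print(charToRaw(s_utf8))
```

The re-encoded string could then be passed to ngram() exactly as in the code above. Whether this is enough depends on what Encoding(str) actually reports for the original string.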