I am running a simple text-embedding task with the textEmbed() function from the R text package (r-text):
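# Clear the workspace and set UTF-8 locale environment variables
rm(list = ls())
Sys.setenv(LANG = "C.UTF-8", LC_ALL = "C.UTF-8")
library(text)
# Embed one sentence using layers 23 and 24 of roberta-large
temp <- textEmbed("I'm trying to do so good and I keep messing up my life. I hate it so much.",
                  model = "roberta-large", layers = 23:24, dim_name = FALSE)
# Inspect the token-level embeddings for the first text
View(temp[["tokens"]][["texts"]][[1]])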
In the result, the "tokens" column contains unexpected strings such as "Ġ", "<s>", "</s>", and "<pad>". In addition, some of the embedding rows contain only NA values instead of numbers.
Could anyone help me understand why this happens?
I have not yet tried anything to fix it.