I have a dataset with strings (data$text) containing the names of emojis instead of the actual emojis (e.g., FACE_WITH_TEARS_OF_JOY). Now I'm trying to replace each emoji name with the actual emoji. The names and emojis are stored in a separate dataset that works as a "dictionary" (emojis$name and emojis$emoji).
So this is my dataset:
data <- structure(list(text = c("blabla HUGGING_FACE PARTY_POPPER", "bla FACE_WITH_TEARS_OF_JOY bla FACE_WITH_TEARS_OF_JOY", "PARTY_POPPER")), class = "data.frame", row.names = c(NA, -3L))
looking like:
text
1 blabla HUGGING_FACE PARTY_POPPER
2 bla FACE_WITH_TEARS_OF_JOY bla FACE_WITH_TEARS_OF_JOY
3 PARTY_POPPER
Note that the emoji names are just part of the text. The rest of the text should remain unchanged.
And this is my "dictionary":
emojis <- structure(list(name = c("FACE_WITH_TEARS_OF_JOY", "HUGGING_FACE",
"PARTY_POPPER"), emoji = c("\U0001f602", "\U0001f917", "\U0001f389"
)), class = "data.frame", row.names = c(NA, -3L))
looking like:
name emoji
1 FACE_WITH_TEARS_OF_JOY \U0001f602
2 HUGGING_FACE \U0001f917
3 PARTY_POPPER \U0001f389
For a single emoji this code works:
data$text <- gsub("FACE_WITH_TEARS_OF_JOY", "\U0001f602", data$text)
The result is:
text
1 blabla HUGGING_FACE PARTY_POPPER
2 bla \U0001f602 bla \U0001f602
3 PARTY_POPPER
However, I want to replace the other emoji names as well. The result should be:
text
1 blabla \U0001f917 \U0001f389
2 bla \U0001f602 bla \U0001f602
3 \U0001f389
As there are thousands of emojis, I need something like:
data$text <- gsub(emojis$name, emojis$emoji, data$text)
This doesn't work (warning: "argument 'pattern' has length > 1 and only the first element will be used"), and I couldn't find a solution myself.
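To make the desired behaviour concrete, here is a minimal sketch that assumes a plain loop over the dictionary rows is acceptable. gsub() is vectorised over the text but not over the pattern, so it is called once per name, with fixed = TRUE so that each name is matched literally rather than as a regular expression. Processing longer names first is my own precaution in case one emoji name is contained in another; I haven't checked how well this scales to thousands of entries.

```r
# Sample data and dictionary, equivalent to the structures shown above
data <- data.frame(
  text = c("blabla HUGGING_FACE PARTY_POPPER",
           "bla FACE_WITH_TEARS_OF_JOY bla FACE_WITH_TEARS_OF_JOY",
           "PARTY_POPPER"),
  stringsAsFactors = FALSE
)
emojis <- data.frame(
  name  = c("FACE_WITH_TEARS_OF_JOY", "HUGGING_FACE", "PARTY_POPPER"),
  emoji = c("\U0001f602", "\U0001f917", "\U0001f389"),
  stringsAsFactors = FALSE
)

# Assumption: handle longer names first, so that a short name which is a
# substring of a longer one cannot break the longer match
ord <- order(nchar(emojis$name), decreasing = TRUE)

# One gsub() call per dictionary entry; each call handles all rows of data$text
for (i in ord) {
  data$text <- gsub(emojis$name[i], emojis$emoji[i], data$text, fixed = TRUE)
}

print(data)
```

Each pass replaces every occurrence of one name across all rows, so the text around the emoji names is left untouched.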
Thanks in advance!