I have a character vector in which some values are Vietnamese text encoded in UTF-8.
> so_wrong
[1] "Thiết bị & dịch vụ" "Quản lý"
[3] "Hãng" "Thời tiết"
[5] "Lý do khác" "Tàu bay về muộn"
[7] "Kỹ thuật" "Thương mại"
[9] "Khai thác" "Quản lý, điều hành bay"
[11] " "
I want to remove the elements that match another vector, which contains the last two values: "Quản lý, điều hành bay" and " ". But R does not recognize them.
> any(so_wrong == " ")
[1] FALSE
> any(so_wrong == "Quản lý, điều hành bay")
[1] FALSE
...even though the values typed into these commands are exactly the values in the vector (I copy-pasted them in). This, on the other hand, works:
> any(so_wrong == so_wrong[11])
[1] TRUE
What is the problem, and how can I solve it or work around it?
EDIT: The encoding of each element:
> Encoding(so_wrong)
[1] "UTF-8" "UTF-8" "latin1" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
[8] "UTF-8" "latin1" "UTF-8" "UTF-8"
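Since printing can hide differences between look-alike characters, one way to inspect a mismatch like this is to compare code points and raw bytes instead of the printed text. The following is only a minimal sketch: `stored` is a hypothetical stand-in for `so_wrong[11]`, and the no-break space in it is an assumption chosen for illustration, not a confirmed diagnosis. `same_code_points` is likewise a made-up helper name.

```r
# Hypothetical stand-in for so_wrong[11]: a no-break space (U+00A0).
# This value is an assumption for illustration only.
stored <- "\u00a0"
typed  <- " "   # ordinary space (U+0020), as typed at the console

# Code points expose differences that the printed output hides.
utf8ToInt(enc2utf8(stored))   # 160
utf8ToInt(enc2utf8(typed))    # 32

# Raw bytes of the UTF-8 representation.
charToRaw(enc2utf8(stored))   # c2 a0
charToRaw(enc2utf8(typed))    # 20

# Hypothetical helper: TRUE only if both strings have identical code points.
same_code_points <- function(a, b) {
  identical(utf8ToInt(enc2utf8(a)), utf8ToInt(enc2utf8(b)))
}

same_code_points(stored, typed)   # FALSE
```

Running the same calls on `so_wrong[11]` and on the pasted `" "` would show whether the two really contain the same characters.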
EDIT: I saved the vector to a CSV file and pushed it here