
I have a vector (x) of countries; one of them is Cote d'Ivoire.

x <- c("c\u00f4te", "côte")

When I investigated x, I realized that the two "côte" strings are not the same:

showNonASCII(x)
1: cte
2: cte

iconv(x, to="ASCII//TRANSLIT")
[1] "cA?te" "cote"

Encoding(x)
[1] "UTF-8"  "latin1"

I would like to unify the encodings so that both elements of x are latin1 and equal to each other.

Cœur
Mohamed

1 Answer


This problem does not seem to arise on macOS (R 3.5.0, macOS High Sierra 10.13.6).

x <- c("c\u00f4te", "côte")

# check if both are equal
x[1] == x[2]

[1] TRUE
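Note that a TRUE from == does not by itself show that the two strings share an encoding: according to ?Comparison, character strings with different marked encodings are translated to UTF-8 before they are compared. A minimal sketch for inspecting the difference directly, assuming a vector built like the one in the question (the latin1 element is created with iconv() here so the example behaves the same in any locale):

```r
# Hypothetical reconstruction of the question's vector: the first element is
# UTF-8 (from the \u escape), the second is converted to latin1 explicitly.
x <- c("c\u00f4te", iconv("c\u00f4te", from = "UTF-8", to = "latin1"))

Encoding(x)           # declared encoding of each element: "UTF-8" "latin1"
x[1] == x[2]          # TRUE: both are translated to UTF-8 before comparing
lapply(x, charToRaw)  # stored bytes differ: "ô" is c3 b4 in UTF-8, f4 in latin1
```

So the strings compare equal while still being stored differently, which is what showNonASCII() and iconv() react to.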

# try to extract the word; if the two strings differ, only one match should be returned
library(stringr)
str_extract_all(x, "côte")

[[1]]
[1] "côte"

[[2]]
[1] "côte"

The problem might be related to the native encoding used by R on Windows, which is typically latin1/CP1252 rather than UTF-8.
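To get both elements into a single encoding regardless of platform, one approach is to normalise first with enc2utf8(), which takes each element's declared encoding into account, and only then call iconv(). According to ?iconv, iconv() ignores declared encodings and treats every element as if it were in the from encoding, which would explain the "cA?te" result in the question: the UTF-8 bytes of "ô" were read as latin1. A minimal sketch, assuming a mixed UTF-8/latin1 vector like the one in the question (the vector construction is hypothetical):

```r
# Hypothetical mixed vector, as in the question: one UTF-8, one latin1 element.
x <- c("c\u00f4te", iconv("c\u00f4te", from = "UTF-8", to = "latin1"))

# enc2utf8() honours each element's declared encoding, so the result is
# uniformly UTF-8.
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)      # "UTF-8" "UTF-8"

# With every element now in one known encoding, iconv() can convert the whole
# vector to latin1 (the result is marked latin1 because mark = TRUE by default).
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")
Encoding(x_latin1)    # "latin1" "latin1"
identical(charToRaw(x_latin1[1]), charToRaw(x_latin1[2]))  # TRUE: same bytes
```

UTF-8 is usually the more portable target; the final iconv() step is only needed if latin1 is specifically required.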

Suhas Hegde