I just noticed that match, charmatch and grepl behave differently on the same input, and I wonder whether that is intentional. As explained here, it might not be possible to supply a good reproducible example, because the encoding is crucial:

a <- c("a", "b", "ä", "ü", "ö")
print(a)
# [1] "a" "b" "ä" "ü" "ö"
print(Encoding(a))
# [1] "unknown" "unknown" "latin1"  "latin1"  "latin1"
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8" 

match("ä", a)
# [1] NA # This is what I did not expect...
charmatch("ä", a)
# [1] 3 # ok
grepl("ä", a)
# [1] FALSE FALSE  TRUE FALSE FALSE # ok

The match documentation only states:

Character strings will be compared as byte sequences if any input is marked as "bytes" (see Encoding).
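To see which bytes and declared encodings are actually being compared, the strings can be built from explicit code points and bytes, so the example does not depend on how the script or console is encoded. This is only a diagnostic sketch: the names a_utf8, a_latin1, tbl and needle are made up for illustration, and whether normalising both arguments with enc2utf8 avoids the NA on the affected setup is an assumption, not something confirmed in this thread.

```r
# Build "ä" twice: once from its Unicode code point, once from the Latin-1 byte.
a_utf8 <- "\u00e4"                     # marked as UTF-8 by the \u escape
a_latin1 <- rawToChar(as.raw(0xe4))    # single byte 0xe4
Encoding(a_latin1) <- "latin1"         # declare how that byte should be read

# Same character, different declared encodings and byte sequences.
print(Encoding(c(a_utf8, a_latin1)))
print(charToRaw(a_utf8))
print(charToRaw(a_latin1))

# Normalise both the needle and the table to UTF-8 before matching,
# so that both sides carry the same bytes and the same encoding mark.
tbl <- enc2utf8(c("a", "b", a_utf8))
needle <- enc2utf8(a_latin1)
print(charToRaw(needle))
print(match(needle, tbl))
```

If the last call returns the expected position while the original call does not, the difference lies in how the mixed encodings are handled rather than in the data itself.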

Comment on encoding: I also tried the following, ASCII-only, without any problem:

a <- c("a", "b", "c", "d", "e")
print(a)
# [1] "a" "b" "c" "d" "e"
print(Encoding(a))
# [1] "unknown" "unknown" "unknown" "unknown" "unknown"
match("a", a)
# [1] 1 # ok!!!
charmatch("a", a)
# [1] 1 # ok
grepl("a", a)
# [1]  TRUE FALSE FALSE FALSE FALSE # ok

Edit:

Sys.getlocale()
[1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"
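Because strings marked "unknown" are interpreted in the session's native encoding, it may help to confirm what R considers that encoding to be. A small sketch using base R's l10n_info() and enc2native(); the variable names info and native are only illustrative, and the printed values will differ between machines:

```r
# Ask R what it knows about the current locale's character set.
info <- l10n_info()
print(info[["UTF-8"]])     # TRUE if the session runs in a UTF-8 locale
print(info[["Latin-1"]])   # TRUE if the session runs in a Latin-1 locale

# Convert a known UTF-8 string to the native encoding and inspect the result.
native <- enc2native("\u00e4")
print(Encoding(native))
print(charToRaw(native))
```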

Edit 2:

I just realized that the effect only appears after converting the vector to UTF-8:

a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8" 

I changed the code above.

Christoph
  • The first `match()` works for me, so it's a bit weird: `> match("ä", a)` gives `[1] 3` – Mbr Mbr Jul 11 '17 at 08:05
  • @MbrMbr What is `Encoding(a)` in your case? – Christoph Jul 11 '17 at 08:06
  • Same result as yours – Mbr Mbr Jul 11 '17 at 08:08
  • @MbrMbr See my edit. Any difference there? Do you have any other idea? – Christoph Jul 11 '17 at 08:13
  • Yeah, it's a bit different since I'm not in the same country: `"LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"`, and I do not know why `match` returns NA in your case :/ – Mbr Mbr Jul 11 '17 at 08:16
  • @MbrMbr Can you reproduce it now? See my edit... – Christoph Jul 11 '17 at 09:01
  • https://gyazo.com/05fcc400f9195239af1737dcd6cbe902 – Mbr Mbr Jul 11 '17 at 09:04
  • Given your locale you’re on Windows: it’s known that R has trouble with encodings on Windows. Unfortunately I don’t know *why* your particular code doesn’t work, or how to fix it. As @MbrMbr noted, this works as expected (on non-Windows systems, although given the font in the screenshot it looks as if Mbr Mbr is also using Windows?!). – Konrad Rudolph Jul 11 '17 at 09:09
  • @KonradRudolph Yeah, I'm using Windows. – Mbr Mbr Jul 11 '17 at 09:11

0 Answers