I just realized that `match`, `charmatch` and `grepl` behave differently here, and I wonder whether that is intentional. As explained here, it may not be possible to supply a good reproducible example, because the encoding is crucial:

```r
a <- c("a", "b", "ä", "ü", "ö")
print(a)
# [1] "a" "b" "ä" "ü" "ö"
print(Encoding(a))
# [1] "unknown" "unknown" "latin1" "latin1" "latin1"
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8" "UTF-8" "UTF-8"
match("ä", a)
# [1] NA   # this is what I did not expect
charmatch("ä", a)
# [1] 3    # ok
grepl("ä", a)
# [1] FALSE FALSE TRUE FALSE FALSE   # ok
```

The `match` documentation only states:

> Character strings will be compared as byte sequences if any input is marked as `"bytes"` (see `Encoding`).

None of my inputs is marked as `"bytes"`, so I expected `match` to find `"ä"`.
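To illustrate why "compared as byte sequences" would matter here, a minimal sketch comparing the bytes of the same character in Latin-1 and in UTF-8. It builds `"ä"` from its raw Latin-1 byte and sets the encoding mark by hand (an assumption made only so the example does not depend on the script's or the session's encoding); it does not by itself reproduce the `NA` from `match`:

```r
# Build "ä" from its Latin-1 byte (0xE4) and mark it as Latin-1, so the
# example does not depend on the encoding of the script or the session
x_latin1 <- rawToChar(as.raw(0xe4))
Encoding(x_latin1) <- "latin1"

# Translate the same character to UTF-8, as enc2utf8(a) does above
x_utf8 <- enc2utf8(x_latin1)

Encoding(c(x_latin1, x_utf8))
# [1] "latin1" "UTF-8"
charToRaw(x_latin1)  # one byte in Latin-1
# [1] e4
charToRaw(x_utf8)    # two bytes in UTF-8
# [1] c3 a4

# Byte-wise the two strings differ ...
identical(charToRaw(x_latin1), charToRaw(x_utf8))
# [1] FALSE
# ... but `==` translates both to UTF-8 before comparing
x_latin1 == x_utf8
# [1] TRUE
```

So the same visible character is stored as different bytes depending on its encoding mark, which is exactly the situation after `enc2utf8(a)` when the lookup value is still a Latin-1 literal.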

Regarding encoding: with plain ASCII characters, everything works as expected:

```r
a <- c("a", "b", "c", "d", "e")
print(a)
# [1] "a" "b" "c" "d" "e"
print(Encoding(a))
# [1] "unknown" "unknown" "unknown" "unknown" "unknown"
match("a", a)
# [1] 1   # ok
charmatch("a", a)
# [1] 1   # ok
grepl("a", a)
# [1] TRUE FALSE FALSE FALSE FALSE   # ok
```

Edit: my locale is

```r
Sys.getlocale()
# [1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"
```

Edit 2: I just realized that the effect only appears after converting `a` to UTF-8:

```r
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8" "UTF-8" "UTF-8"
```

I have updated the code above accordingly.
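
If the immediate goal is a reliable lookup, one possible workaround (a sketch, not an explanation of why `match` behaves differently from `charmatch` and `grepl`) is to give the lookup value the same encoding as the table before calling `match`. Here `x` is a hypothetical stand-in for the Latin-1 literal `"ä"` typed in a CP1252 session; it and the table are built from raw bytes and `\u` escapes so the sketch does not depend on the source file's encoding:

```r
# Table with the umlauts marked as UTF-8, as after enc2utf8(a) in the question
a <- enc2utf8(c("a", "b", "\u00e4", "\u00fc", "\u00f6"))

# Hypothetical lookup value: "ä" as a Latin-1 marked string
x <- rawToChar(as.raw(0xe4))
Encoding(x) <- "latin1"

# Convert the lookup value to UTF-8 as well, so both sides carry
# identical bytes and the same encoding mark
x_utf8 <- enc2utf8(x)
Encoding(x_utf8)
match(x_utf8, a)
```

Normalizing both arguments with the same function (`enc2utf8` here) keeps the comparison independent of how each string was originally created.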