Difference between match, charmatch and grepl

Question

I just noticed that match, charmatch and grepl behave differently on the same input, and I wonder whether that is intentional. As explained here, it might not be possible to supply a good reproducible example, because the encoding is crucial:

a <- c("a", "b", "ä", "ü", "ö")
print(a)
# [1] "a" "b" "ä" "ü" "ö"
print(Encoding(a))
# [1] "unknown" "unknown" "latin1"  "latin1"  "latin1"
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8" 

match("ä", a)
# [1] NA # This is what I did not expect...
charmatch("ä", a)
# [1] 3 # ok
grepl("ä", a)
# [1] FALSE FALSE  TRUE FALSE FALSE # ok

The match documentation only states:

Character strings will be compared as byte sequences if any input is marked as "bytes" (see Encoding).

Comment on encoding: I also tried the following, which works without problems:

a <- c("a", "b", "c", "d", "e")
print(a)
# [1] "a" "b" "c" "d" "e"
print(Encoding(a))
# [1] "unknown" "unknown" "unknown" "unknown" "unknown"
match("a", a)
# [1] 1 # ok!!!
charmatch("a", a)
# [1] 1 # ok
grepl("a", a)
# [1]  TRUE FALSE FALSE FALSE FALSE # ok

Edit:

Sys.getlocale()
# [1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"

Edit 2:

I just realized that the effect only appears after converting to UTF-8:

a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8"

I changed the code above.
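
One way to check whether the strings being compared really hold the same character is to look at their declared encoding and their raw bytes. The following is a minimal sketch and not part of the session above: the names `x_utf8` and `x_latin1` are made up for illustration, and the character is built from a Unicode escape so that the result does not depend on how the source file is saved.

```r
# Illustrative only: x_utf8 and x_latin1 are hypothetical names.
x_utf8   <- "\u00e4"                                      # "ä", marked as UTF-8
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")  # same character as latin1

print(Encoding(c(x_utf8, x_latin1)))
print(charToRaw(x_utf8))    # two bytes: c3 a4
print(charToRaw(x_latin1))  # one byte:  e4

# Look up the latin1 version in the UTF-8 version.
print(match(x_latin1, x_utf8))

# For the real data, compare the bytes above with
# charToRaw(a[3]) and charToRaw("ä") as typed in the session.
```

If the bytes of `a[3]` or of the typed "ä" match neither pattern, the literal may not have been read in the encoding that R assumed.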

The first `match()` works for me, so it's a bit weird: `> match("ä", a)` gives `[1] 3` – Mbr Mbr, Jul 11 '17 at 08:05
@MbrMbr See my edit. Any difference there? Do you have any other idea? – Christoph, Jul 11 '17 at 08:13
Yeah, it's a bit different since I'm not in the same country: `"LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"` and I do not know why `match` returns NA in your case :/ – Mbr Mbr, Jul 11 '17 at 08:16
Given your locale you’re on Windows: it’s known that R has trouble with encodings on Windows. Unfortunately I don’t know *why* your particular code doesn’t work, or how to fix it. As @MbrMbr noted, this works as expected (on non-Windows systems, although given the font in the screenshot it looks as if Mbr Mbr is also using Windows?!). – Konrad Rudolph, Jul 11 '17 at 09:09
