You could use a workaround. Here's one:
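# Count only the ASCII letters and digits in a string
dia.count <- function(string) {
  y <- unlist(strsplit(string, ''))              # split into single characters
  length(grep('[A-Za-z0-9]', y, value = TRUE))   # keep and count the matches
}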
dia.count(x)
[1] 4
Methods that deal directly with character encoding are preferable; this is, again, only a workaround. In the general case, there may be packages or combinations of functions that address the issue more comprehensively.
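As a rough sketch of what a more direct approach could look like in base R, one option is to strip Unicode combining marks before counting. The helper name count_base_chars is hypothetical, and the input is assumed to be the string from the question written with an explicit escape for the combining ring below (U+0325). It also assumes R's PCRE build supports Unicode properties, which is the usual case.

```r
# Hypothetical helper: count characters after dropping Unicode combining marks.
count_base_chars <- function(s) {
  # \p{M} matches any combining mark; perl = TRUE is needed for \p{...} classes
  stripped <- gsub("\\p{M}", "", s, perl = TRUE)
  nchar(stripped, type = "chars")
}

x <- "n\u0325ala"    # assumed input: "n" + combining ring below, then "ala"
nchar(x)             # 5: the combining mark is counted as its own character
count_base_chars(x)  # 4
```

Unlike the letter-only filters, this keeps digits, punctuation and non-ASCII letters in the count and removes only the marks themselves.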
Update
Here is another workaround, provided in a comment:
nchar(gsub('[^A-Za-z]+', '', x))
[1] 4
The dia.count function counts the uppercase and lowercase letters and digits in the string. The added script does the opposite: it removes every character that is not a letter, uppercase or lowercase. Credit: @akrun
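One caveat with the removal approach is worth illustrating: sub() replaces only the first match, whereas gsub() replaces every match, so the two differ as soon as a string carries more than one diacritic. The input x2 below is a made-up example with two combining marks, not the string from the question.

```r
# Hypothetical input: "n" + ring below, "a" + acute accent, then "la"
x2 <- "n\u0325a\u0301la"

nchar(x2)                           # 6: both combining marks are counted
nchar(sub("[^A-Za-z]+", "", x2))    # 5: only the first mark is removed
nchar(gsub("[^A-Za-z]+", "", x2))   # 4: every non-letter is removed
```

Both variants also discard digits, spaces and precomposed accented letters, so they are only safe when the expected content is plain ASCII letters plus combining marks.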
The best I could find in the stringi package is stri_enc_toascii, which gives:
stri_enc_toascii(x)
[1] "n\032ala"
Given that output, stripping out everything but letters gives the desired count.
nchar(gsub('[^A-Za-z]', '', stri_enc_toascii(x)))
[1] 4
A nice balance between a general answer and a quick script is found in the comments:
nchar(iconv("n̥ala", to="ASCII", sub=""))
[1] 4
This uses the base R function iconv, which converts the string for you. Credit: @Molx
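If you want to reuse the iconv approach, it could be wrapped in a small function along these lines. The name ascii_count is hypothetical, and from = "UTF-8" is an assumption about how the input is encoded. Keep in mind that iconv behaviour depends on the platform's iconv implementation, so results for non-convertible characters can vary.

```r
# Hypothetical wrapper around the iconv() workaround.
ascii_count <- function(s) {
  # Assumes UTF-8 input; sub = "" drops bytes that cannot be converted to ASCII
  ascii <- iconv(s, from = "UTF-8", to = "ASCII", sub = "")
  nchar(ascii)
}

ascii_count("n\u0325ala")  # the combining ring below is dropped before counting
ascii_count("caf\u00e9")   # precomposed "é" has no ASCII form either
```

With sub = "", a precomposed letter such as "é" is typically removed entirely rather than reduced to "e", so this count can come out lower than the number of visible characters.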