How to detect if a string has UTF-8 characters in it?

Question

Possible Duplicate:
How do I detect non-ASCII characters in a string?

I have an array representing a US-ASCII transliteration table, like this one: http://www.geopostcodes.com/encoding#az

If the string contains one of those characters, I replace it with its ASCII equivalent (with strtr).

Because the array is huge, I want to load it into a variable and transliterate the string only if the string actually contains this type of UTF-8 character.

Is there a decently fast way to find this out?

score 3 · Accepted Answer · answered Aug 11 '12 at 21:50

There is no real way to do this. However, if you don't need any codepoints above ASCII 127 (so no "extended ASCII" like éáÿ), you can check whether any byte has its high bit set, i.e. a value above 127:

for (var i = 0; i < text.length; i++)
    if (ord(text[i]) > 127)
        // Unicode/UTF-8 character!
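For PHP specifically, the same byte check could be sketched roughly as follows. This is not from the original answer: the helper name `has_non_ascii` and the three-entry `$table` are made up for illustration, and the sketch assumes the input is a UTF-8 byte string and that the source file itself is saved as UTF-8. It only tells you that some byte lies outside 7-bit ASCII, not that the string is valid UTF-8.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// returns true when $text contains at least one byte outside 7-bit ASCII.
function has_non_ascii(string $text): bool
{
    // Without the /u modifier, PCRE matches byte by byte, so this pattern
    // finds any byte in the range 0x80 to 0xFF.
    return preg_match('/[^\x00-\x7F]/', $text) === 1;
}

// Hypothetical stand-in for the large transliteration table in the question.
$table = ['é' => 'e', 'á' => 'a', 'ÿ' => 'y'];

$input = 'café';

// Only call strtr when there is something that might need replacing.
$output = has_non_ascii($input) ? strtr($input, $table) : $input;

echo $output, "\n";
```

In a real script the table would only be loaded inside the branch where `has_non_ascii` returns true, which is the saving the question asks about.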

Luc

5,339
2
48
48

-1, but surely that is not PHP; e.g. PHP would use $i. If you wrote pseudocode then you should have said so, and pseudocode (or some other language) is a bad answer to a PHP question. http://www.w3schools.com/php/php_looping_for.asp Apparently PHP does have an ord function, but the PHP manual seems to say that ord has issues with UTF-8 (http://php.net/manual/en/function.ord.php), which you haven't addressed or mentioned. – barlop Jun 25 '16 at 09:51
@barlop It's not much work to convert this to PHP: `for($i=0;$i<strlen($text);$i++){ if(ord($text[$i])>127){ /* unicode/utf8 character! */ } }`. As for `ord(utf8)` issues, the documentation itself doesn't mention it, only the comments claim you need to use something else. I was just posting what worked for me, of course things might be different in different cases. As I said in the post, this is for when "you don't need any codepoints above ASCII 127". – Luc Jun 25 '16 at 16:45
