How do I determine if a string is Unicode text or contains any binary data?
- Using ctype_print will only work if you 100% expect the string to only be ASCII; mine will contain Unicode.
- preg_match('~[^\x20-\x7E\t\r\n]~', $str) > 0 only covers a limited range of Unicode.
- strpos($string, "\0") === FALSE implies all binary data must have a NUL byte.
- mb_detect_encoding detects strings as UTF-8 even if all characters are exclusively binary.
- mb_check_encoding detects strings as UTF-8 even if all characters are exclusively binary.
- strlen($string) != strlen(utf8_decode($string)) can only detect if a string is not ASCII.
One possible approach: detect whether any character has a code point that is beyond the Unicode range. However, I don't know how binary data is represented or whether that idea is applicable. Nor could I find anything on returning a character's numeric code point (e.g. ! is U+0021).
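
A rough sketch of what such a check might look like, assuming PHP 7.2+ with the mbstring extension. The names looks_like_text and code_point are hypothetical helpers, not library functions, and the decision to treat control characters other than tab, LF and CR as a sign of binary data is an assumption, not a rule:

```php
<?php
// Hypothetical helper: heuristically decide whether $s is UTF-8 text
// rather than arbitrary binary data.
function looks_like_text(string $s): bool
{
    // Step 1: the bytes must form valid UTF-8 (rejects stray bytes such as 0xFF).
    if (!mb_check_encoding($s, 'UTF-8')) {
        return false;
    }
    // Step 2 (assumption): control characters other than tab, LF and CR are
    // valid UTF-8 but rarely occur in text, so treat them as binary.
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $s) === 0;
}

// Hypothetical helper: format the code point of the first character in $char.
function code_point(string $char): string
{
    // mb_ord() returns the code point as an integer (available since PHP 7.2).
    return sprintf('U+%04X', mb_ord($char, 'UTF-8'));
}
```

With these definitions, code_point('!') gives "U+0021", and looks_like_text("abc\0def") is false because of the NUL byte even though the bytes are valid UTF-8.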