PHP mb_strpos of UTF-8 encoded Thai strings

Question

First, what I need: I simply want to get the substring starting at the first '<' character.

<?php
  mb_internal_encoding("UTF-8");
  $s  = iconv('UTF-8', 'cp874', "เรา <l>AB");
  $lt = mb_strpos($s, "<");
  $newString = mb_substr($s, $lt, 99999);
?>

mb_strpos seemed to be the problem, so I tried to "low level debug" it.

Before someone complains: $s came from a UTF-8 DB read. If I simply print it, it works and contains the same characters that you see here. Also, if I "hexprint" it, it matches this one.
But the above code simply doesn't give me the right position.
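One way to check which encoding a string is actually in before handing it to the mb_* functions is mb_check_encoding. This is only a minimal sketch, assuming the mbstring extension is loaded and the file is saved as UTF-8. The cp874 bytes are written out by hand (E0 C3 D2 for the three Thai letters, as in the hex output in this question) so the snippet does not depend on iconv:

```php
<?php
// UTF-8 text, as it would arrive from a UTF-8 database column.
$utf8 = "เรา <l>AB";

// The same text as cp874 bytes (one byte per Thai letter), written by hand.
$cp874 = "\xE0\xC3\xD2 <l>AB";

// mb_check_encoding() reports whether the bytes are valid in the given encoding.
$utf8IsValid  = mb_check_encoding($utf8, 'UTF-8');
$cp874IsValid = mb_check_encoding($cp874, 'UTF-8');

var_dump($utf8IsValid);   // bool(true)
var_dump($cp874IsValid);  // bool(false): E0 must be followed by continuation bytes
```

If the second check fails for the string being searched, the string is no longer UTF-8 at that point, whatever mb_internal_encoding is set to.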

mb_internal_encoding("UTF-8");
// "s" originally came from a DB
// but the hexprint is EXACTLY the same so...
$s  = iconv('UTF-8', 'cp874', "เรา <l>AB");
$lt = mb_strpos($s, "<");
$os = utf8_to_hex($s);
echo "lt=$lt, [$os]<br>";
$newString = mb_substr($s, $lt, 99999);
echo "New string: [".utf8_to_hex($newString)."]";

This is the output:

lt=4, [E0C3D23C6C3E4142]
New string: [3E4142]

How can lt be 4? Shouldn't it be 3? Then, with lt being 4, mb_strpos is "correct" in its own wrongness, but that behavior messes up all my substring calculations.
Is there a better way to do it? It's driving me mad.
Again: I simply need a SUBSTRING of a UTF-8 string UNTIL (not including) the first '<' character (or the opposite, a substring FROM the first '<' until the end).
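For reference, the two operations described here can be sketched as follows. This assumes the string is still UTF-8 when it is searched (that is, no iconv to cp874 beforehand) and that the file itself is saved as UTF-8. The variable names $before and $from are just for illustration:

```php
<?php
// Assumption: $s is UTF-8 here, not converted to cp874.
$s = "เรา <l>AB";

// Character index of the first '<'. Passing the encoding explicitly avoids
// relying on whatever mb_internal_encoding() happens to be.
$lt = mb_strpos($s, '<', 0, 'UTF-8');

if ($lt === false) {
    // No '<' present: everything counts as "before", nothing as "from".
    $before = $s;
    $from   = '';
} else {
    $before = mb_substr($s, 0, $lt, 'UTF-8');     // up to, not including, '<'
    $from   = mb_substr($s, $lt, null, 'UTF-8');  // from '<' to the end
}

echo "lt=$lt\n";
echo "before=[$before]\n";
echo "from=[$from]\n";
```

With this sample string, three Thai letters plus one space precede the '<', so the character index comes out as 4 and $from is "<l>AB".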

If needed, here is the utf8_to_hex function, which I grabbed from SO:

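// Hex-dump helper. Bytes >= 0x80 are treated as UTF-8 lead bytes and $i is
// advanced inside the branches, so not every byte ends up in the output.
function utf8_to_hex($string) {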
  $hex = '';
  for ($i = 0; $i < strlen($string); $i++) {
      $ord = ord($string[$i]);
      if ($ord < 128) {
          $hex .= dechex($ord);
      } else if ($ord < 224) {
          $hex .= substr(dechex($ord), -2);
          $i++;
      } else if ($ord < 240) {
          $hex .= substr(dechex($ord), -2);
          $i++;
          $hex .= substr(dechex(ord($string[$i])), -2);
      } else {
          $hex .= substr(dechex($ord), -2);
          $i++;
          $hex .= substr(dechex(ord($string[$i])), -2);
          $i++;
          $hex .= substr(dechex(ord($string[$i])), -2);
      }
  }
  return strtoupper($hex);
}
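As a cross-check on the helper above, PHP's built-in bin2hex prints every byte of a string without interpreting it as any encoding. A small sketch, where the wrapper name bytes_to_hex is hypothetical and the cp874 bytes are written by hand rather than produced by iconv:

```php
<?php
// Hypothetical helper: uppercase hex of every byte, no encoding assumptions.
function bytes_to_hex(string $s): string {
    return strtoupper(bin2hex($s));
}

// "เรา <l>AB" as cp874 bytes: E0 C3 D2, then a space (20), then "<l>AB".
$cp874 = "\xE0\xC3\xD2 <l>AB";

echo bytes_to_hex($cp874), "\n"; // E0C3D2203C6C3E4142
```

Unlike the output shown in the question, this dump includes the 20 for the space between the Thai letters and the '<'.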
