If a text contains German umlauts (ä, ö, ü), preg_match_all with PREG_OFFSET_CAPTURE returns offsets that look wrong: each umlaut before a match seems to increase the offset by 1.
I need the position of each word because the words will be replaced by other strings. With this tool, https://regex101.com/r/UosqVD/2, the same pattern works and the matches have the correct start values.
$pattern = "~\b\w+\b~u";
$text = "Käthe würde gerne wählen.";
if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[0] as $m) {
        // $m[0] is the matched word, $m[1] is the offset reported by PHP
        echo $m[0]."; ".$m[1]."; ".mb_strlen($m[0], "utf-8")."<br />";
    }
}
Output (Text; Start; Length):
Käthe; 0; 5
würde; 7; 5
gerne; 14; 5
wählen; 20; 6
Counted in characters, I would expect the start values 0, 6, 12 and 18.
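The numbers above are consistent with PREG_OFFSET_CAPTURE reporting offsets in bytes rather than characters: in UTF-8, ä, ö and ü take two bytes each. A minimal sketch of one possible conversion, assuming the text is valid UTF-8 and the mbstring extension is available; byteToCharOffset is a hypothetical helper name, not a PHP built-in:

```php
<?php
// Hypothetical helper (not part of PHP): converts a byte offset, as reported
// by PREG_OFFSET_CAPTURE, into a character offset in a UTF-8 string.
function byteToCharOffset(string $text, int $byteOffset): int
{
    // substr() cuts by bytes; mb_strlen() then counts the characters in that prefix.
    return mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
}

$pattern = '~\b\w+\b~u';
$text = 'Käthe würde gerne wählen.';
$rows = [];

if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
    foreach ($matches[0] as [$word, $byteOffset]) {
        // Collect word, character-based start, and character-based length.
        $rows[] = [$word, byteToCharOffset($text, $byteOffset), mb_strlen($word, 'UTF-8')];
    }
}

foreach ($rows as [$word, $start, $length]) {
    echo "$word; $start; $length\n";
}
```

If the replacement itself is done with byte-based functions such as substr_replace(), the original byte offsets (together with strlen() for the match length) can probably be used directly, and no conversion is needed.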