I need to allow the 'okina character through the following code:

<?php $char = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\xFF]/u', '', $char); ?>

I've tried, with no luck, to find a clear explanation of how the ranges work here and how to pick out one letter from a range. How do I do that?

Thanks

awolfey

2 Answers


Your regular expression only removes code points up to U+00FF (with the /u modifier, \x80-\xFF means U+0080 through U+00FF), so it can't possibly be removing the ʻokina, which is U+02BB, a two-byte character in UTF-8. To rule the pattern out, I propose you try this regex:

<?php $char = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/u', '', $char); ?>

This allows all characters in \x80-\xFF through; see if that solves your problem.

If it does, you are probably confusing the ʻokina with the left single quotation mark (0x91 in the Windows-1252 Western character set).
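
A quick way to check this is to run the question's original pattern against both characters. The following is only a sketch: the sample strings are made up for illustration, and the "\u{...}" string escapes assume PHP 7 or later.

```php
<?php
// Pattern copied unchanged from the question.
$pattern = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\xFF]/u';

// Illustrative sample strings (assumptions, not from the question).
$okina = "Hawai\u{02BB}i"; // U+02BB, UTF-8 bytes CA BB
$c1    = "Hawai\u{0091}i"; // U+0091, a control character inside U+0080-U+00FF

// U+02BB lies outside U+0080-U+00FF, so the string passes through unchanged.
$okinaResult = preg_replace($pattern, '', $okina);
var_dump($okinaResult === $okina);

// U+0091 lies inside the range, so it is stripped.
$c1Result = preg_replace($pattern, '', $c1);
var_dump($c1Result === 'Hawaii');

// A raw Windows-1252 byte 0x91 is not valid UTF-8; with /u the call fails
// and preg_replace returns null instead of a string.
$rawResult = preg_replace($pattern, '', "Hawai\x91i");
var_dump($rawResult === null);
```

If the input is not valid UTF-8, the null return value in the last case is easy to mistake for the character being "blocked", so it is worth checking preg_last_error() when the result looks empty.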

keyboardSmasher

The character you refer to is MODIFIER LETTER TURNED COMMA (U+02BB), which is described in the Unicode standard as “used in Hawaiʻian orthography as ʻokina (glottal stop)”. It might be argued that it is the most correct ʻokina, but it is surely not the only character used for that purpose. Quite often the ʻokina is written as a left single quotation mark ‘ (U+2018), a right single quotation mark ’ (U+2019), an ASCII apostrophe ' (U+0027), or a grave accent ` (U+0060).
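
To find out which of these characters a given string actually contains, something like the following might help. It is a sketch only: okinaCandidates is a hypothetical helper name, the sample text is invented, and mb_ord assumes the mbstring extension on PHP 7.2 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): report the code point
// of every ʻokina-like character found in a UTF-8 string.
function okinaCandidates(string $text): array
{
    // Candidates listed above: U+02BB, U+2018, U+2019, U+0027, U+0060.
    preg_match_all('/[\x{02BB}\x{2018}\x{2019}\x{0027}\x{0060}]/u', $text, $matches);

    return array_map(
        function (string $ch): string {
            // mb_ord returns the Unicode code point of the character.
            return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
        },
        $matches[0]
    );
}

// Invented sample text with three different spellings.
$found = okinaCandidates("Hawai\u{02BB}i, Hawai\u{2018}i, Hawai'i");
print_r($found); // U+02BB, U+2018, U+0027 in that order
```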

As for U+02BB itself, it can be written in a PHP regexp as \x{02bb} (with the /u modifier). For an explanation of the notation, see “How do I replace characters not in range [0x5E10, 0x7F35] with '*' in PHP?”
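
For the "pick one letter out of a range" part of the question, here are two possible approaches, shown as a sketch. The stricter range U+0080-U+2FFF is an assumption chosen purely for illustration (the question's own range never reaches U+02BB), and the sample input is invented.

```php
<?php
// Hypothetical stricter filter: strip U+0080-U+2FFF but let U+02BB through.

// Option 1: a negative lookahead rejects U+02BB before the class is tried.
$lookahead = '/(?!\x{02BB})[\x{0080}-\x{2FFF}]/u';

// Option 2: split the range into two ranges that skip U+02BB.
$split = '/[\x{0080}-\x{02BA}\x{02BC}-\x{2FFF}]/u';

// Invented input: ʻokina (U+02BB), é (U+00E9), curly quotes (U+2018, U+2019).
$input = "Hawai\u{02BB}i caf\u{00E9} \u{2018}quoted\u{2019}";

$a = preg_replace($lookahead, '', $input);
$b = preg_replace($split, '', $input);

// Both patterns keep the ʻokina and drop é and the curly quotes.
var_dump($a === "Hawai\u{02BB}i caf quoted");
var_dump($a === $b);
```

The split-range form avoids the lookahead and is easy to read for a single exception; the lookahead form scales better when several code points need to be exempted.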

Community
Jukka K. Korpela