
I have the following encoded Hebrew strings in an old DB:

éçìéó àú ùîåàì æåñîï äòåáã á÷áåöä îòì 50 ùðä

The ASP code that is being used to decode this string is the following:

function Get_RightHebrew(ByVal sText)
    Dim i
    Dim sRightText

    if isNull(sText) then
        sRightText = ""
    else
        For i = 1 To Len(sText)
            If (AscW(Mid(sText, i, 1)) >= 1488 And AscW(Mid(sText, i, 1)) <= 1514) Then
                sRightText = sRightText & Chr(AscW(Mid(sText, i, 1)) - 1264)
            else
                sRightText = sRightText & Mid(sText, i, 1)
            End If
        Next
    end if

    Get_RightHebrew = sRightText

End Function

I'm looking for an equivalent PHP function to convert the string to correct UTF-8

STF
lior r
  • I can not convert your code from ASP to PHP but you can use mb_convert_encoding() function of PHP. You need to save your PHP file as UTF-8 without BOM. – Koray Küpe May 23 '17 at 13:07
  • A BOM is superfluous with UTF-8 anyway, it is used for text editors to hint for Unicode encoding. – Code4R7 May 23 '17 at 13:25
  • @KorayKüpe CP1255 is not supported: http://php.net/manual/en/mbstring.supported-encodings.php – Alex Blex May 23 '17 at 13:28
  • @Code4R7 Then give iconv("utf-8", "cp1255", $value); a try. – Koray Küpe May 23 '17 at 13:30
  • @Koray Küpe, you mean at Alex Blex ;) Because ICU is the _de facto standard_ from the Unicode Consortium, I'd skip all other functions for transcoding. Although `iconv` does come in handy for transliteration. – Code4R7 May 23 '17 at 13:35
  • @Code4R7 Sorry for wrong mention. :) – Koray Küpe May 23 '17 at 13:37
  • iconv(): Detected an illegal character in input string – lior r May 23 '17 at 16:10
  • @liorr, is that the string you have in the database, or a string you see in your db client? It doesn't look like Hebrew to me. Could you update the question with the result of `bin2hex` for the string as you get it from the db? To avoid wrong transcoding, it is essential to fetch the value with the PHP db driver and pass it to the function directly rather than copy-pasting the string. – Alex Blex May 23 '17 at 21:23
  • Check out Kul-Tigin's reply, that did the trick – lior r May 25 '17 at 13:14

1 Answer


You've got a CP1255-encoded string that was decoded as CP1252 (Windows Latin-1), so you can get your Hebrew text back by undoing the wrong decoding and then redoing it with the right charset.

# mis-decoded string
$str = "éçìéó àú ùîåàì æåñîï äòåáã á÷áåöä îòì 50 ùðä";

# convert to CP1252 from UTF-8
$str = iconv("UTF-8", "CP1252", $str);

# convert to UTF-8 by claiming $str is encoded with CP1255
$str = iconv("CP1255", "UTF-8", $str);

echo $str;

Here's the test I made online: https://3v4l.org/7taaN

I would have liked to share an example that uses the mb_* functions instead of iconv, but mbstring does not support CP1255. Using the charset ISO-8859-8 with mb_* is an option, but since it is a subset of CP1255, data loss is likely.
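If iconv is not available, the letter range can also be remapped by hand. The following is only a minimal sketch, not part of the tested answer above: `fix_hebrew_mojibake` is a hypothetical name, and it assumes the input is valid UTF-8 and that only the Hebrew letters alef to tav (CP1255 bytes 0xE0 to 0xFA) need repairing. Vowel points and other CP1255-specific characters are left untouched. It reuses the 1264 offset from the ASP function in the question, applied in the opposite direction, and relies on `mb_ord()` and `mb_chr()`, which need the mbstring extension and PHP 7.2 or later.

```php
<?php
// Hypothetical helper (not from the original answer): repairs Hebrew letters
// that were stored as CP1255 bytes but read back as CP1252, without iconv.
function fix_hebrew_mojibake(string $text): string
{
    // U+00E0..U+00FA is what CP1252 yields for the bytes 0xE0..0xFA, the
    // range CP1255 uses for the letters alef..tav (U+05D0..U+05EA).
    $fixed = preg_replace_callback(
        '/[\x{E0}-\x{FA}]/u',
        static function (array $m): string {
            // Same offset as in the ASP function: 1488 - 224 = 1264.
            return mb_chr(mb_ord($m[0], 'UTF-8') + 1264, 'UTF-8');
        },
        $text
    );

    // preg_replace_callback() returns null if $text is not valid UTF-8;
    // in that case the input is handed back unchanged.
    return $fixed ?? $text;
}

// The last word of the sample string, written with escapes so the
// encoding of this source file does not matter.
echo fix_hebrew_mojibake("50 \u{F9}\u{F0}\u{E4}"), PHP_EOL;
```

Digits, spaces and other ASCII characters pass through as they are, so the whole sample string can be fed to the function in one call. For anything beyond plain letters, the iconv round trip above is the safer choice.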

Kul-Tigin
  • Nice catch! How did you find out that it was cp1252 decoded? I'm also curious about why you prefer mb_* / iconv_* functions ? – Code4R7 May 24 '17 at 07:42
  • @Code4R7 From experience in fact. It's a common mistake made in ASP. When you don't specify CodePage it's often CP1252 by default, but the ` – Kul-Tigin May 24 '17 at 07:56
  • Thank you for sharing, I see now why you prefer `mb_*` functions over `iconv`. Personally I like `Intl` somewhat better, then you don't have to configure/overwrite the in/internal/out encodings before use. And after all, when Unicode is used the application/site is likely to be international like the rest of the 'net, and Intl offers all kinds of extras like the [IntlCalendar](http://php.net/manual/en/class.intlcalendar.php). – Code4R7 May 24 '17 at 08:16
  • @Code4R7 It's very promising but lacks good documentation. I will keep it in mind, thanks. – Kul-Tigin May 24 '17 at 08:32
  • Most of the documentation is at http://userguide.icu-project.org . Lack of good documentation is a problem indeed, it happens all the time when the real fun begins with PHP. When the going gets tough, the tough get going. – Code4R7 May 24 '17 at 08:53
  • YOU ARE A GENIUS! – lior r May 25 '17 at 13:06