
Hey guys, I was hoping you could help me out.

I am required to make a website coded in PHP + CodeIgniter work with the UTF-16 charset.

To convert it, I changed the database.php settings to:

$db['default']['char_set'] = 'utf16';
$db['default']['dbcollat'] = 'utf16_unicode_ci';

I changed the config.php setting to:

$config['charset'] = 'UTF-16';

That seemed to solve the problem I had when outputting data; however, I now have a new problem.

My form validation checks have started failing, particularly the length check.

For example, when debugging I found that mb_strlen reported the length of admin@admin.com as 7.

Note that it was working properly before the charset change! The problem started only after it.


Update: it turns out that mb_strlen($str,'utf-8') gives the correct answer, meaning that I am getting UTF-8 encoded strings from the form.
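
A minimal sketch of that check, assuming the submitted value is the plain ASCII address from this question. It compares mb_strlen() with an explicit 'UTF-8' argument against the call without one, which falls back to whatever mb_internal_encoding() reports on the server. The variable names are only for illustration.

```php
<?php
// Assumption: the form delivers the address as UTF-8 bytes (ASCII is a subset of UTF-8).
$str = 'admin@admin.com';

// With an explicit encoding the result does not depend on global settings.
$explicit = mb_strlen($str, 'UTF-8');

// Without the second argument, mb_strlen() uses the internal encoding.
$internal = mb_internal_encoding();
$implicit = mb_strlen($str);

echo "explicit UTF-8: {$explicit}\n";          // explicit UTF-8: 15
echo "internal ({$internal}): {$implicit}\n";  // depends on the internal encoding
```

If the second line does not print 15, the internal encoding does not match the encoding of the incoming data.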


To debug this, I changed the min_length function from

public function min_length($str, $val)
{
    if (preg_match("/[^0-9]/", $val))
    {
        return FALSE;
    }

    if (function_exists('mb_strlen'))
    {
        return (mb_strlen($str) < $val) ? FALSE : TRUE;
    }

    return (strlen($str) < $val) ? FALSE : TRUE;
}

to this:

public function min_length($str, $val)
{
    if (preg_match("/[^0-9]/", $val))
    {
        return FALSE;
    }

    if (function_exists('mb_strlen'))
    {
        echo $str,"<br/>";
        echo mb_strlen($str),"<br/>";
        echo $val;die();
        return (mb_strlen($str) < $val) ? FALSE : TRUE;
    }

    return (strlen($str) < $val) ? FALSE : TRUE;
}

I get the following output:

admin@admin.com
7
8

i.e. it is reporting the length of admin@admin.com as 7!

Ahmed-Anas
  • What version of PHP are you using? This works: http://codepad.viper-7.com/ntzVAL – Baba Oct 21 '12 at 14:20
  • Yes, that works; it shows the correct length of 15. I am using PHP version 5.4.4. Also, the form validation was working correctly before; the problem started after changing the charset. – Ahmed-Anas Oct 21 '12 at 14:22

2 Answers


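Use this:

   $utf16_string = mb_convert_encoding($string, 'UTF-16', 'UTF-8');

   echo mb_strlen($utf16_string, 'UTF-16');

Only after converting the string to UTF-16 will a length check against UTF-16 work properly. Use mb_strlen() with an explicit encoding rather than strlen(), because strlen() counts bytes, not characters.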
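
As a rough illustration of why the choice of length function matters after such a conversion, the sketch below converts the address from this thread and measures it both ways. It assumes the input is UTF-8 and uses the explicit 'UTF-16BE' name so that no byte order mark is involved; mb_convert_encoding() is used for the conversion.

```php
<?php
// Assumption: $string holds UTF-8 text, as the form data in the question does.
$string = 'admin@admin.com';

// Explicit big-endian variant, so no byte order mark is involved.
$utf16 = mb_convert_encoding($string, 'UTF-16BE', 'UTF-8');

echo strlen($utf16), "\n";                 // 30: bytes, two per character here
echo mb_strlen($utf16, 'UTF-16BE'), "\n";  // 15: characters
```

strlen() always counts bytes, so it only matches the character count for single-byte text.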

  • Here's the problem: the incoming data is in UTF-8 format, i.e. "mb_strlen($str,'utf-8')" gives the correct result. – Ahmed-Anas Oct 21 '12 at 14:48
  • `UTF-8` is different from `UTF-16`. UTF-8 would definitely give you 15, because it would read the string as `admin@admin.com` – Baba Oct 21 '12 at 14:53
  • Maybe you should try putting "AddDefaultCharset UTF-16" in the .htaccess file in the root of your application? – Rustam Kichinsky Oct 21 '12 at 14:53

Yes, it's correct, because when the same bytes are read as UTF-16, every two bytes become one character:

 "admin@admin.com" = 摡業䁮摡業⹮潣
                        ^---------- the same bytes read as UTF-16

And

 mb_strlen('admin@admin.com') // 15;
 mb_strlen('摡業䁮摡業⹮潣') // 7;
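
The pairing of bytes can be reproduced with a short sketch. It assumes a little-endian reading (the characters shown above correspond to that byte order) and drops the 15th byte, which has no partner to form a complete 2-byte unit.

```php
<?php
$str = 'admin@admin.com';   // 15 bytes of ASCII/UTF-8

// Keep an even number of bytes so every UTF-16 code unit is complete;
// the trailing "m" has no partner byte.
$pairs = substr($str, 0, 14);

// Reinterpret the bytes as little-endian UTF-16, then convert to UTF-8 for display.
$misread = mb_convert_encoding($pairs, 'UTF-8', 'UTF-16LE');

echo $misread, "\n";
echo mb_strlen($misread, 'UTF-8'), "\n";   // 7
```

Fourteen bytes make seven 2-byte units, which is where the length of 7 comes from.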
Baba
  • Yes, you are right. After doing further reading I found out that you should add "header('Content-Type: text/html; charset=utf-16');" to the index.php file. I did, and after that everything was displayed in Chinese-like characters. So how do I fix this? – Ahmed-Anas Oct 21 '12 at 14:54
  • lol ... I was thinking you must be a Chinese kung fu guy and that's why you need `UTF-16` ... my advice is to convert only emails or database entries to it ... :) – Baba Oct 21 '12 at 14:55
  • lol... What do you mean by converting emails? I have already converted my database to utf16_unicode_ci, and right now the data I am getting is from a form. – Ahmed-Anas Oct 21 '12 at 14:57
  • OK, convert it to UTF-8 during validation or string length counting. – Baba Oct 21 '12 at 14:59
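
One way to act on that last suggestion is to count characters with an explicit encoding during validation. The function below is a hypothetical standalone variant of the min_length() rule from the question, not CodeIgniter's own code, and it assumes the form input arrives as UTF-8.

```php
<?php
// Hypothetical standalone variant of min_length(); assumes UTF-8 form input.
function min_length_utf8($str, $val)
{
    // Reject a non-numeric minimum, as the original rule does.
    if (preg_match('/[^0-9]/', (string) $val)) {
        return false;
    }

    // Count characters with an explicit encoding instead of the internal default.
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8') >= (int) $val;
    }

    // Fallback without mbstring: counts bytes, which is only exact for ASCII.
    return strlen($str) >= (int) $val;
}

var_dump(min_length_utf8('admin@admin.com', 8)); // bool(true)
var_dump(min_length_utf8('admin', 8));           // bool(false)
```

Passing the encoding explicitly keeps the validation result independent of the charset configured for output or the database.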