
I have a PHP file that reads UTF-8 (Malayalam) words from a MySQL database, encodes them as JSON, and displays the result in a browser. The database itself is stored in UTF-8. When I display the words without converting them to JSON, they appear correctly as Malayalam. When I convert them using json_encode, the Malayalam words show up as unknown characters, which I think are some ASCII representation. Here is my PHP file with the code I used:

<html>
 <head>
  <meta charset="utf-8">
 </head>
 <body>
  <?php
   error_reporting(E_ALL); 
   ini_set('display_errors', 1);
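   $con=mysqli_connect("localhost","username","password","db_name"); 
   // mysqli_connect_errno() takes no arguments
   if (mysqli_connect_errno()) 
   { 
      // Stop here: the queries below cannot run without a connection
      exit("Failed to connect to MySQL: " . mysqli_connect_error()); 
   } 
   $con->set_charset("utf8");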

   $cresult = mysqli_query($con,"SELECT * FROM leaders"); 
   $rows = array();
   while($r = mysqli_fetch_assoc($cresult)) {
      $rows[] = $r["name"];
      //This displays the names correctly in malayalam like this: പോള്‍ ജോസഫ്‌ 
      // etc in the browser
      //echo ($r["name"]);
   }
   $encoded= json_encode(array('Android' => $rows));
   //Converting to json displays the names as weird characters like this: 
   //  \u0d2a\u0d3f.\u0d35\u0d3f.\u0d2a\u0d4b\u0d33\u0d4d\u200d
   echo ($encoded);
   mysqli_close($con);
  ?> 
 </body>
</html>

How do I get the Malayalam text correctly as JSON? I need JSON because this data is sent to my client side (Android) to be displayed in my app. Please correct me if I'm on the wrong track.

Dave Cousineau
njnjnj

1 Answer


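JSON fully supports Unicode (or, more precisely, the standard requires parsers to support it). What you are seeing comes from PHP's encoder: by default, json_encode writes every non-ASCII character as a \uXXXX escape. Passing the JSON_UNESCAPED_UNICODE flag (available since PHP 5.4) makes it emit the UTF-8 characters literally instead.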
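As a rough sketch of the difference, the snippet below encodes one sample name both ways. The `$rows` value is a hypothetical stand-in for the database rows (it spells out the last code points from the question's output), and the "\u{...}" string syntax assumes PHP 7 or later.

```php
<?php
// Hypothetical sample standing in for the names fetched from the database.
// "\u{...}" builds the string from code points (requires PHP 7+).
$rows = ["\u{0D2A}\u{0D4B}\u{0D33}\u{0D4D}\u{200D}"];

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode(array('Android' => $rows));

// JSON_UNESCAPED_UNICODE keeps the UTF-8 characters as they are.
$literal = json_encode(array('Android' => $rows), JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";
echo $literal, "\n";
```

Both strings are valid JSON and decode to the same data, so which one to send is mostly a matter of readability and payload size.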

To quote from an answer to this Stack Overflow question:

Some frameworks, including PHP's implementation of JSON, always do the safe numeric encodings on the encoder side. This is intended for maximum compatibility with buggy/limited transport mechanisms and the like. However, this should not be interpreted as an indication that JSON decoders have problems with UTF-8.

Those "unknown characters" that you are referring to are actually Unicode escape sequences. They keep the output pure ASCII so that it survives any transport, and every conforming JSON parser converts them back to the original characters. Similar escape sequences are used in CSS files for displaying Unicode characters (see the CSS content property).
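To check that the escapes really do carry the original text, one can decode them again. This is only an illustrative sketch: the `$json` literal reuses the tail of the escaped output from the question, and `$expected` rebuilds the same text from code points (PHP 7+ syntax).

```php
<?php
// The escaped JSON as it travels over the wire.
// Single quotes keep the backslashes literal in PHP.
$json = '{"Android":["\u0d2a\u0d4b\u0d33\u0d4d\u200d"]}';

// Decoding turns each \uXXXX escape back into the real character (as UTF-8).
$data = json_decode($json, true);
$name = $data['Android'][0];

// The same text written directly as code points, for comparison.
$expected = "\u{0D2A}\u{0D4B}\u{0D33}\u{0D4D}\u{200D}";

var_dump($name === $expected);
echo $name, "\n";
```

Any standards-compliant JSON parser on the client should perform the same conversion when it reads the response.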

If you want to display this in your client-side app (I'm going to assume you're using Java), then I'll refer you to this question.

tl;dr: There is nothing wrong with your JSON file. Those encodings are there to help the parser.

Community
Frank the skank