
I want to store UTF-8 text in a database. I have Hindi text (Unicode) and I want to store it in a MySQL database using PHP, after converting it to HTML character entities. For example, let's say someone enters a bullet (•) character into a text box. When saving that data, should it be converted to an HTML entity such as `&bull;`?

Suppose I have the text मेरा भारत महान and I want to store it in the database converted to HTML entities. How can I do that? I tried the `htmlentities` function, but it doesn't work satisfactorily for me.

Quentin
Rahul Singh
  • Why do you want to convert to entities? What is wrong with just using UTF-8? – Quentin Nov 25 '11 at 10:08
  • Please define "not satisfactorily". Also seconded: what's wrong with UTF-8? – deceze Nov 25 '11 at 10:09
  • `मेरा भारत महान` is stored as-is in the database; I want to convert it into HTML entity format. – Rahul Singh Nov 25 '11 at 10:11
  • @Rahul but *why*? This doesn't make any sense at all. What problem is leading you to do this? This sounds like a really dumb idea. – Pekka Nov 25 '11 at 10:18

2 Answers

The `&...;` thingies are called HTML entities. PHP has a function that creates numeric ones, `mb_encode_numericentity`, which is part of the Multibyte String (mbstring) extension:

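```php
<?php
$string   = 'मेरा भारत महान';
$encoding = 'UTF-8';

// Convert every character from U+0000 to U+FFFF into a decimal entity.
$convmap = array(0, 0xffff, 0, 0xffff);
$encoded = mb_encode_numericentity($string, $convmap, $encoding);

echo $encoded;
// &#2350;&#2375;&#2352;&#2366;&#32;&#2349;&#2366;&#2352;&#2340;&#32;&#2350;&#2361;&#2366;&#2344;
```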

However, you need to know the encoding of your string. Here I've assumed UTF-8; if your string uses a different encoding, adjust the function's `$encoding` parameter and the `$convmap` array accordingly.
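The `array(0, 0xffff, 0, 0xffff)` map above also covers ASCII, so spaces and Latin letters are turned into entities too (a space becomes `&#32;`). A minimal sketch, assuming UTF-8 input and that only non-ASCII characters should become entities, starts the range at `0x80` instead:

```php
<?php
$string   = 'मेरा भारत महान';
$encoding = 'UTF-8';

// One quadruple: start code point, end code point, offset, mask.
// Starting at 0x80 leaves ASCII (letters, digits, spaces) untouched.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$encoded = mb_encode_numericentity($string, $convmap, $encoding);

echo $encoded, "\n";
// &#2350;&#2375;&#2352;&#2366; &#2349;&#2366;&#2352;&#2340; &#2350;&#2361;&#2366;&#2344;
```

Extending the end of the range to `0x10FFFF` also covers characters outside the Basic Multilingual Plane, which the `0xffff` map skips.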

Also, don't store it that way in your database. Store the text as-is and convert it for output after you have retrieved it from the database.
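A sketch of that approach, assuming a hypothetical helper `toNumericEntities()` and a string that stands in for a value fetched unchanged from the database (no real DB call is made). `mb_decode_numericentity()` shows that, for this input, the conversion is reversible, so nothing is gained by storing the encoded form:

```php
<?php
// Hypothetical helper: entity-encodes text at output time only, never before storage.
function toNumericEntities(string $text): string
{
    // Non-ASCII code points only; ASCII passes through unchanged.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    // Escape &, <, > and quotes for HTML first, then turn non-ASCII characters into &#N; entities.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

// Stand-in for a value fetched unchanged from the database (no real DB call here).
$fromDb = 'मेरा भारत महान';

$html = toNumericEntities($fromDb);
echo $html, "\n";

// This input has no HTML special characters, so decoding the entities restores it exactly.
$decoded = mb_decode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
var_dump($decoded === $fromDb); // bool(true)
```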

Similar Question: Convert (doublebyte) string to Hex

hakre
  • It is also worth mentioning that doing this is probably a very dumb idea in general. (I don't mean your answer, which is fine, but the very idea of converting multibyte characters into HTML entities - my bet is the OP is having charset problems that should be fixed at their core) – Pekka Nov 25 '11 at 10:19
  • Yes, I added a note that the text should not be stored as HTML entities in the database. That would be one encoding too many. – hakre Nov 25 '11 at 10:20
  • @Pekka: I don't have such a problem. I can even view my data in the DB as-is rather than as garbled characters. – Rahul Singh Nov 25 '11 at 10:26
  • @Rahul why do you want to do this then? – Pekka Nov 25 '11 at 10:30
  • @Pekka: Just to learn how I can achieve the task :D. hakre: what does this `convmap` define? Please explain: is it a character range? – Rahul Singh Nov 25 '11 at 10:32
  • @Rahul Using HTML entities is *no longer necessary* if you are using UTF-8 everywhere in your application. But it's your call – Pekka Nov 25 '11 at 10:33
  • I did, and I changed my DB to UTF-8, so I won't have any problems in the future. I only asked here from a learning point of view. Thanks for helping! – Rahul Singh Nov 25 '11 at 10:36
  • @RahulSingh: `convmap` specifies which characters of the charset are converted into numeric entities. In the example, every character from U+0000 to U+FFFF is converted. The numbers in the example are written in hexadecimal. – hakre Nov 25 '11 at 12:51
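To make the `convmap` explanation concrete: it is a flat list of quadruples `[start, end, offset, mask]`, and a character whose code point lies in `start..end` is written as `&#` followed by `(code + offset) & mask` and `;`. A minimal sketch that restricts the map to the Devanagari block (U+0900 to U+097F, a range chosen here for illustration) so Latin text passes through unchanged:

```php
<?php
// Devanagari block only: U+0900..U+097F, no offset, 16-bit mask.
$convmap = [0x0900, 0x097F, 0, 0xFFFF];

$mixed   = 'India: भारत';
$encoded = mb_encode_numericentity($mixed, $convmap, 'UTF-8');

echo $encoded, "\n"; // India: &#2349;&#2366;&#2352;&#2340;
```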

`htmlentities` has a charset parameter. Try: `htmlentities($text, ENT_COMPAT, "UTF-8")`
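A quick check is worth running before relying on this: even with the charset set to UTF-8, `htmlentities()` only replaces characters that have a *named* HTML entity. The bullet from the question becomes `&bull;`, but Devanagari letters have no named entities and come back unchanged, which may be why `htmlentities` seemed unsatisfactory for the Hindi text:

```php
<?php
$bullet = htmlentities('•', ENT_COMPAT, 'UTF-8');
$hindi  = htmlentities('मेरा भारत महान', ENT_COMPAT, 'UTF-8');

echo $bullet, "\n"; // &bull;
echo $hindi, "\n";  // मेरा भारत महान (no named entities exist for these letters)
```

For Hindi text, a numeric-entity function such as `mb_encode_numericentity` is the tool that actually produces entities.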

Daniel Fekete