2

I'm trying to store text (with emoji) From an iPhone Client App on a MySQL database with Erlang. (Into a varchar column)

I used to do it with a socket connection server done with C++ and mysqlpp, it was working great. (It is the exact same database, So I can assume that the issue is not coming from the database)

However, I decided to pass everything on Erlang for scalability reasons, and since, I am unable to store and retrieve correctly emojis.

I'm using emysql to communicate with my database.

When I'm storing, I'm sending this list to the database :

[240,159,152,130]

When I'm retrieving, here what I get :

<<195,176,194,159,194,152,194,130>>

There is some similarities obviously, we can see 159, 152 and 130 on both lines, but no 240. I do not know where 195, 176 and 194 come from.

I though about changing the emysql encoding when creating the connection pool.

 emysql:add_pool(my_db, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", utf8)

But I can seems to find the proper atom for utf32 encoding. (The interesting thing is that I have not set any encoding on C++ and mysqlpp, it worked out of the box).

I have made some test...

storing from C++, retrieving from C++ (Works fine)
storing from Erlang, retrieving from Erlang (Does not work)
storing from Erlang, retrieving from C++ (Does not work)
storing from C++, retrieving from Erlang (Does not work)

One more information, I'm using prepared statement on Erlang, while I'm not on C++

Any help would be appreciated.

AS requested, here the query for storing data :

UPDATE Table SET c=? WHERE id=?

Quite simple really...

TheSquad
  • 7,385
  • 8
  • 40
  • 79
  • show us the query you are sending to the Database from the Erlang side. Show us exactly what you intend to write to the database. `VARCHAR2` i would assume holds a string, yet you are sending an erlang List !!!! show us more. – Muzaaya Joshua Mar 12 '12 at 16:07
  • I'm pretty sure that Strings in Erlang are Lists (With printable characters)... isn't it ? – TheSquad Mar 12 '12 at 16:49
  • Strings in Erlang can be either lists or binaries or a combination of both. – rvirding Mar 12 '12 at 23:19
  • But, what he was sending: `[240,159,152,130]` is not at all a list of printable characters, unfortunately :) – Muzaaya Joshua Mar 13 '12 at 05:32
  • I have set the encoding language of the SQL connection to latin1, since everything works fine. – TheSquad Oct 18 '12 at 15:18

1 Answers1

1

It is all about utf-8 encoding. In Erlang a list of characters, in your case [240,159,152,130], aren't normally encoded but are the unicode code points. When you retrieved the data you got a binary containing with utf-8 encoding bytes of your characters. Exactly where this encoding occurred I don't know. From the erlang shell:

10> Bin = <<195,176,194,159,194,152,194,130>>.   
<<195,176,194,159,194,152,194,130>>
11> <<M/utf8,N/utf8,O/utf8,P/utf8,R/binary>> = Bin.
<<195,176,194,159,194,152,194,130>>
12> [M,N,O,P].
[240,159,152,130]

Handling unicode in erlang is pretty simple, characters in lists are usually the unicode code points and are very rarely encoded, while storing them in binaries means you have to encode them in some way, as binaries are just arrays of bytes. The default encoding is utf-8. In the module unicode there are functions for converting between unicode lists and binaries.

rvirding
  • 20,848
  • 2
  • 37
  • 56
  • This is clearly an encoding issue, is there a way to determine weather the String is latin1 or unicode encoded? – TheSquad Mar 13 '12 at 02:10
  • No unfortunately, but just make sure that String is not encoded. Although that may be difficult with data sent into you. But if you receive it as a binary, which you did, then you will know that it is encoded, it cannot be otherwise. – rvirding Mar 14 '12 at 22:12