0

I Have wrote a question which got a right answer here about emysql encoding. The answer pinpoint another question...

I'm trying to store iPhone emojis into a database...

When I do :

Query = io_lib:format("UPDATE Users SET c=\"~s\" WHERE id=~B", [C, Id]),
emysql:execute(mydb, Query).

Everything works fine...

But with:

emysql:prepare(update_c, <<"UPDATE Users SET c=? WHERE id=?">>),
emysql:execute(mydb, update_c, [C, Id]).

I'm retrieving Mojibake. EDITED TO USE THE CORRECT TERM

I'm connecting with :

 emysql:add_pool(my_db, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", latin1)

Unfortunately, I cannot use utf8 because of the previous software that used the database and stored emoji's that way, If I do use utf8, it will work with the new system, but not with rows inserted by the old one.

EDIT:

I really would really like to use prepared statement, that would prevent SQL injection effectively.

Community
  • 1
  • 1
TheSquad
  • 7,385
  • 8
  • 40
  • 79
  • Can you qualify "gibberish", e.g are you receiving Mojibake or just errors? – zetavolt Mar 14 '12 at 21:42
  • gibberish = Mojibake, yes... but only with prepared statement, right now, I'm using non-prepared statement, with mysql_util:encode() to be sure there is no injection possible, but I really would like to use prepared statement. – TheSquad Mar 14 '12 at 23:21

2 Answers2

2

Edit: should be fixed in 253b7f94f9b04526e6868d7b693e6e9ee41de374. Thanks for feedback. https://github.com/Eonblast/Emysql/commit/253b7f94f9b04526e6868d7b693e6e9ee41de374


I believe it's an error in Emysql and I think I fixed it. Still working out the unit tests so it all makes sense. I'll let you know when it's posted to github.

I opened an issue for this: https://github.com/Eonblast/Emysql/issues/24

Essentially, you are tricking the driver and the database because you open the connection with latin-1 but the database is utf-8. Then you trip over the automatic conversion.

Still, I think you are right that the driver should respect that you set the connection to latin-1 and not do the magic of automatic conversion to utf-8. If you read issue #14 at Eonblast/Emysql at github you'll find I always suspected automatic conversion was a bad idea.

However, just from the fact that the unit tests for the conversions are now blowing up by the factor of four (and pose some rather uninteresting but mind boggling fringe issues I can't get my head around) I think tricking the database the way you do is likewise a bad idea. If you can, you should clean this up rather than rely on the mechanics in-between to hold. There are multiple levels in MySQL where conversions occur. As you know you can set the connection, the database, also the table to a character set. It's a great way to produce bugs. Can you describe why you could not? Because you have no control and must act blind to encoding? I'd like to know if there is a real case where you can't live without this hack.

Regardless, your complaint about the setting of the connection to latin-1 probably showed the way to eliminate all or most of the guessing in the character conversions in Emysql. That's very much appreciated and I hope I'll have a solution for you later today.

Henning

Henning
  • 96
  • 3
  • Thanks Henning, I have found also this "bug-feature" I'll keep an eye on github. Concerning the fact that I use latin-1 connection for an utf8, I know this is a bad idea. Unfortunately I have no choice, I picked up the project and redone it from scratch with Erlang, but I can't change the database right now. But I will slowly clean it up by migrating "on prod" to something more clusterable, like Cassandra, or maybe even Mnesia. Thanks for confirming the issue ! +1 – TheSquad Mar 22 '12 at 16:42
0

Just convert you table to UTF-8:

ALTER TABLE Users CONVERT TO CHARACTER SET utf8;

Then you can use utf-8 with new data and the old will have been converted to UTF-8 aswell.

barsju
  • 4,408
  • 1
  • 19
  • 24
  • Users table is already in utf8 character set,but that's not really the issue here... My question is about why the prepared statement is using utf8 since I explicitily asked emysql to connect to the database with latin1 characters set. – TheSquad Mar 19 '12 at 12:50
  • Well if I understand erlang correctly, strings are ISO8859-1 (latin) encoded while binaries are UTF-8. So: "Hi" is latin but <<"Hi">> is utf-8. What happens if you instruct emysql to connect using utf-8? – barsju Mar 19 '12 at 13:43
  • If I use UTF8 for connecting to database, I can Add/retrieve correctly with the new software, the only thing is I can't retrieve correctly Data already on the database (I get Mojibake Data) which were inserted with the previous software. And unfortunately I need to retrieve old Data correctly. – TheSquad Mar 19 '12 at 17:02
  • Actually, I'm wondering why it works correctly with non-prepared statements, although I use emysql_util:encode() with it in order to avoid SQL Injections, BUT with prepared statement, It does not insert correctly in latin1 – TheSquad Mar 19 '12 at 17:06