1

Up until now, I have just used ISO-8859-1 for my mySQL data, but now we have started getting Russian orders, which cause the characters to show up as ? (not great when trying to print an invoice!)

So, I have been looking into converting the table into UTF8. My table structure is pretty simple:

DROP TABLE IF EXISTS `glinks_AdminSystemEBay`;
CREATE TABLE IF NOT EXISTS `glinks_AdminSystemEBay` (
  `ebay_transaction_id` bigint(20) NOT NULL DEFAULT '0',
  `paypal_trans_id_fk` varchar(200) DEFAULT NULL,
  `payer_email` varchar(200) DEFAULT NULL,
  `date_paid` varchar(200) DEFAULT NULL,
  `shipping_paid` varchar(200) DEFAULT NULL,
  `address` varchar(200) DEFAULT NULL,
  `product_id_purchased` int(11) DEFAULT NULL,
  `payment_amount` float DEFAULT NULL,
  `total_amount` float DEFAULT NULL,
  `been_added_to_system` int(11) DEFAULT NULL,
  `sale_from` varchar(25) DEFAULT NULL,
  `product_id` bigint(20) DEFAULT NULL,
  `currency` varchar(10) DEFAULT NULL,
  `paypal_fee` float DEFAULT NULL,
  `units_sold` int(11) DEFAULT NULL,
  `ebay_fees` float DEFAULT NULL,
  `item_name` longtext,
  `been_emailed` tinyint(4) NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

enter image description here

When importing into it, I still got broken characters. I then came across this post:

http://www.shawnolson.net/a/946/unicode_data_with_php_5_and_mysql_41.html

I was not aware of the SET NAMES 'utf8' being needed before. This fixed that "saving" part (running it before the query)

So now, you can see how it saves:

enter image description here

The problem I'm having now, is getting the data back out properly!

As a test, I am simply grabbing the record from the DB, and then doing a dumper:

print $IN->header;
use Data::Dumper;
print Dumper($invoice->{address});

All I get is:

$VAR1 = 'Matveenkova ,??????? ??????, ?. 21, ????. 2, ??. 90, ??????, 117208, Russian Federation';

I'm really baffled as to what I'm doing wrong! Can anyone shed some light?

UPDATE: Ok, so it actually looks like this issue is coming from our PDF generator (DOMPDF). It's fine in the HTML version:

enter image description here

...but the PDF version is broken:

enter image description here

I'll file a bug report with them, and see if they can help :)

Andrew Newby
  • 4,941
  • 6
  • 40
  • 81

2 Answers2

2

If you're using DBD::mysql then do

$dbh->{'mysql_enable_utf8'} = 1;

This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off.

When set, a data retrieved from a textual column type (char, varchar, etc) will have the UTF-8 flag turned on if necessary. This enables character semantics on that string. You will also need to ensure that your database / table / column is configured to use UTF8. See for more information the chapter on character set support in the MySQL manual: http://dev.mysql.com/doc/refman/5.7/en/charset.html

Additionally, turning on this flag tells MySQL that incoming data should be treated as UTF-8. This will only take effect if used as part of the call to connect(). If you turn the flag on after connecting, you will need to issue the command SET NAMES utf8 to get the same effect.

Chankey Pathak
  • 21,187
  • 12
  • 85
  • 133
  • Thanks for the reply. I'm actually using this: `$CONN{$conn_key} = DBI->connect($dsn, $self->{connect}->{login}, $self->{connect}->{password}, { RaiseError => $self->{connect}->{RaiseError}, PrintError => $self->{connect}->{PrintError}, AutoCommit => 1, mysql_enable_utf8 => 1 })` . It doesn't seem to work. I wonder if its worth giving the "proper" DBI module a go, instead of using this 3rd party one. – Andrew Newby Oct 13 '16 at 08:52
  • You are already passing `mysql_enable_utf8`. That looks correct. It could be that Data::Dumper is not handling it properly? I'm not sure. Try this: `use utf8; use Data::Dumper::AutoEncode; print eDumper($invoice->{address});` – Chankey Pathak Oct 13 '16 at 09:02
  • 1
    Thanks. Interestingly, I added some debug in and it looks like the address IS being setup correctly: ADDRESS: Matveenkova, Сумской, д. 21, корп. 2, кв. 90, Москва, xxxx, Russian Federation` . The problem seems to be with the actual template parser. I'll keep digging# – Andrew Newby Oct 13 '16 at 09:06
  • thanks for going through this with me. It seems to be an issue with DOMPDF: http://stackoverflow.com/questions/990181/dompdf-problem-with-cyrillic-characters . I'll do some digging, to see what solution will work :) – Andrew Newby Oct 13 '16 at 09:18
  • Sure. Good luck! Make sure you post your solution here once you find it. – Chankey Pathak Oct 13 '16 at 09:20
  • 1
    I got it :) https://github.com/dompdf/dompdf/issues/698 . I'll write up a proper answer in a minute – Andrew Newby Oct 13 '16 at 09:22
  • @AndrewNewby: Please tell us what this third-party database module is that you're using. – Borodin Oct 13 '16 at 11:32
  • @Borodin - its GT::SQL, and is part of a CMS system we use. It wasn't actually that that caused the issue though :) – Andrew Newby Oct 13 '16 at 11:58
1

Ok, so after quite a lot of debugging (thanks Chankey), I finally pinned it down.

The issues wasn't with the database at all - it was with the PDF creator I was using. Basically, I was doing:

  • Save data from eBay API into DB
  • Select database from DB
  • Create a HTML template of the invoice
  • Create a PDF version (based on the HTML template), using DOMPDF

After a bit of looking around, I came across some other posts of people who were having issues with Russian in DOMPDF. I found:

DOMPDF problem with Cyrillic characters

The final solution, was to make sure I was using the latest version of DOMPDF, and then in the HTML page, I had to make sure I as using the *DejaVu * font, as this supports Russian.

html { font-family : DejaVu Sans, Helvetica, sans-serif; overflow: auto; }

Phew, glad thats sorted! :)

Community
  • 1
  • 1
Andrew Newby
  • 4,941
  • 6
  • 40
  • 81