1

I have an XML file in UTF-8 encoded. I use Perl to parse the file (using XML::Simple module). The parsed code I want to put into a MySQL table what encoded utf8 too (exactly utf8_generic_ci). Everything is good, but two characters go wrong (tipically ő and ű with their uppercase pairs ŐŰ).

Here is my perl code:

use strict;
use warning;
use utf8;
use XML::Simple;
use DBI;

my $db = DBI->connect("dbi:mysql:dbname=$dbname;host=$host;port=$port",
         $user, $passwd, {mysql_enable_utf8 => 1}) || die $DBI::errstr;

my $ref = XMLin("file.xml");

for ( my $i = 0; $i < scalar(@{$ref->{"PRODUCTS"}}); $i++ ) {
    my $name = $ref->{"PRODUCTS"}[$i]->{"NAME"};
    # some changes on the $name, for example removing whitespaces, etc.
    $db->do("INSERT INTO products (productname) VALUES ('".$name."');");
}

And here is my mysql table structure (SHOW CREATE TABLE products; output):

| products | CREATE TABLE `products` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `productname` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I think everything is in UTF-8. If I see the source XML file there is good the őű characters. But after mysql insert they gone wrong. And the other accents are good.

Any idea what is the problem?

netdjw
  • 5,419
  • 21
  • 88
  • 162

1 Answers1

1

You have a typo in your code, it should be:

{ mysql_enable_utf8 => 1 }  

Also, you should bind the SQL params:

$db->do("INSERT INTO products (productname) VALUES (?)", undef, $name);

And finally, this SO question may helps you

Community
  • 1
  • 1
Miguel Prz
  • 13,718
  • 29
  • 42
  • Although you usually want `mysql_enable_utf8` enabled when retrieving text strings from a database, you usually would not want it enabled when retrieving an XML document. XML parsers almost always expect bytes as input rather than character strings. – Grant McLean Nov 09 '15 at 02:00