3

I'm a novice perl programmer trying to use DBI to write a buffer of text that contains an email with umlauts and other non-ASCII characters to a joomla database and having a problem.

DBD::mysql::st execute failed: Incorrect string value: '\xD6sterl...' for column `lsv5webstage`.`xuxgc_content`.`fulltext` at row 1 at /home/alerts/scripts_linstage/AdvisoryTest.pm line 373.

I'm not familiar enough with how encoding works to fully understand what the problem is. This is a fedora29 system with mariadb-10.3.12 and joomla-3.9.

Apparently the '\xD6' is an O with an umlaut in "Sebastian �sterlund". I read something about utf8 not being able to handle 4-char, but I don't fully understand.

I found the following reference online which talks about changing the encoding type from utf8 to utf8mb4, but the tables all appear to already be using that encoding:

> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR 
Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

I'm not sure it's helpful, but this is the insert statement I'm using in my perl code:

    my $sql                 = <<EOF;
    INSERT INTO xuxgc_content (title, alias, introtext, `fulltext`, state, catid, created, created_by, created_by_alias, modified, modified_by, checked_out, checked_out_time, publish_up, publish_down, images, urls, attribs, version, ordering, metakey, metadesc, metadata, access, hits, language)
    VALUES ($title, "$title_alias", $introText, $fullText, $state, $catid, $created, $created_by, $created_by_alias, $modified, $modified_by, $checked_out, $checked_out_time, $publish_up, $publish_down, $images, $urls, $attribs, $version, $ordering, $metakey, $metadesc, $metadata, $access, $hits, $language);
    EOF

    my $sth = $dbh->prepare($sql);
    $sth->execute();
    db_disconnect($dbh);

The $fullText variable is populated from a buffer that contains the body of the email. I'm running it through quote() before performing the INSERT.

$fullText       = $dbh->quote($fullText);

I also tried using "SET NAMES utf8mb4;INSERT INTO Mytable ...;" and it just didn't like the format.

Here's the full function that's used to connect to the database:

sub db_connect () {
  my %DB        = (
    'host'  => 'myhost',
    'db'    => 'mydb',
    'user'  => 'myuser',
    'pass'  => 'mypass',
  );

  return DBI->connect("DBI:mysql:database=$DB{'db'};host=$DB{'host'}", $DB{'user'}, $DB{'pass'}, { mysql_enable_utf8mb4 => 1 });
 }

I don't recall having this problem in the past, and this script has been in use for quite a while.

Alex Regan
  • 477
  • 5
  • 16
  • I'm not familiar with Perl, but it should be either `\u00D6` if Perl supports 16-bit escapes, or `\xC3\x96` for the two bytes that make up the UTF-8 encoding. – Mr Lister Jun 19 '19 at 09:01
  • Are you using DBD::MariaDB? Where does the $fullText variable get populated from? – Grinnz Jun 19 '19 at 14:52
  • Note that while your connection and the database might use utf8mb4 by default, you should also check the table definition as the column itself has the setting on what character set is ultimately used for storage. – Grinnz Jun 19 '19 at 14:53
  • You would use `\N{U+00D6}` to represent the character in a string by the way. Not the UTF-8 encoding (that is the job of the DBI driver and MariaDB). – Grinnz Jun 19 '19 at 15:46
  • make sure you mysql connection setup contains this flag set `mysql_enable_utf8` or even better `mysql_enable_utf8mb4` – Ossip Jun 19 '19 at 16:54
  • I've updated the post to include a little info on the use of quote() prior to performing the INSERT and the connect() function, including the use of "mysql_enable_utf8mb4 => 1" – Alex Regan Jun 19 '19 at 21:23
  • ok. I agree with Rick's suggestion to first check if you got bogus characters in the database already. also very important: use `?` syntax to safely insert data, so you write `?` for value in sql string and pass the values as parameters to `execute` – Ossip Jun 20 '19 at 17:24

1 Answers1

1

D6 is hex for Ö in CHARACTER SET latin1 (and several others).

You have declared that your client uses UTF-8 (utf8mb4) encoding, so it spit at you.

Please provide SELECT HEX(col), col ... to see if the D6 got into the database (hence an insert problem) or something else (possibly a fetch/display problem).

Also, you have not quoted your $fulltext string, so you are likely to get all sorts of syntax errors.

Please don't blindly put strings into INSERT statements, but escape them as you put them in.

There may be some useful Perl hint in this:

    use utf8;
    use open ':std', ':encoding(UTF-8)';
my $dbh = DBI->connect("dbi:mysql:".$dsn, $user, $password, {
   PrintError => 0,
   RaiseError => 1,
   mysql_enable_utf8 => 1,  # Switch to UTF-8 for communication and decode.
});
# or {mysql_enable_utf8mb4 => 1} if using utf8mb4

And look for techniques for binding/quoting/escaping.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Would you expand on "SELECT HEX(col), col ..." ? Also, I thought I was escaping the $fulltext variable with the backticks? I also believe I'm using mysql_enable_utf8mb4, although I'm not sure. I'll update the post to include the full DBI->connect() function. – Alex Regan Jun 19 '19 at 21:06
  • @AlexRegan - Backtics are uses around column _names_. The contents needs things like quotes escaped so you can make a quoted string! – Rick James Jun 20 '19 at 00:39