1

I am trying to insert UTF-8 characters into a MySQL table using Lucee and having no luck.

  • I have the MySQL table set to use utf8mb4_unicode_ci but have also tried utf8mb4_bin.
  • I have tried Apache both with and without "AddDefaultCharset UTF-8" enabled.
  • The connection string to the database includes "characterEncoding=UTF-8"
  • Lucee config includes UTF-8 in the charset settings.

The code I am running is as follows...

<cfset textValue = ' Person Raising Hand'>
<cfdump var="#textValue#">
<cfquery name="insert">
INSERT INTO TEST_UTF8 (TestText)
VALUES ('#textValue#');
</cfquery>

The dump works fine and displays as it should but the insert returns the following error...

"Incorrect string value: '\xF0\x9F\x99\x8B P...' for column 'TestText' at row 1"

I have tried inserting the same string via phpmyadmin and it goes in fine suggesting that the MySQL config is okay.

James A Mohler
  • 11,060
  • 15
  • 46
  • 72
Jay2001
  • 59
  • 7
  • This is probably an issue with the Connector/J rather than Lucee itself. Which version of the connector are you using (ie the MySQL datasource extension)? Some more details which may help: https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-charsets.html – CfSimplicity Jun 02 '19 at 20:28
  • The connector version is "MySQL Connector Java mysql-connector-java-5.1.38 ( Revision: fe541c166cec739c74cc727c5da96c1028b4834a ) (JDBC 4.0)" – Jay2001 Jun 02 '19 at 21:42

2 Answers2

0

That 4-byte Emoji needs MySQL's utf8mb4; utf8 will not do. To the outside world, utf8mb4 is called UTF-8 (with a dash)`.

The column in the table needs to say CHARACTER SET utf8mb4 and the connection needs to say it also:

Add ?useUnicode=yes&characterEncoding=UTF-8 to the JDBC URL

<%@ page language="java" pageEncoding="UTF-8"%>
<%@ page contentType="text/html;charset=UTF-8" %>

compileJava.options.encoding = 'UTF-8'

<form method="post" action="/your/url/" accept-charset="UTF-8">

To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • The connection string already contains "useUnicode=true&characterEncoding=UTF-8" and the table column (and table) I have tried with "utf8mb4_unicode_ci", "utf8mb4_unicode_520_ci", and "utf8mb4_bin", all with the same results. – Jay2001 Jun 02 '19 at 22:05
0

Okay, with a little more help and reading, I needed to add "character_set_server=utf8mb4" to the MySQL config file at "/etc/mysql/my.conf". Looks like this is an issue caused by an older version of the connector.

From MySQL docs (thanks to CfSimplicity for leading me to the page)...

For Connector/J 8.0.12 and earlier: In order to use the utf8mb4 character set for the connection, the server MUST be configured with character_set_server=utf8mb4; if that is not the case, when UTF-8 is used for characterEncoding in the connection string, it will map to the MySQL character set name utf8, which is an alias for utf8mb3.

Jay2001
  • 59
  • 7
  • I think you could have avoided changing your server config by just upgrading the connector to 8.0.13 or above (although going from 5.x to 8.x would need testing as the class name changed). – CfSimplicity Jun 03 '19 at 08:23