0

Using ODBC driver, ibm-iaccess-1.1.0.10-1.0.amd64, on Debian Linux. When I perform a php sql query and validate the response, I see some of the data returned is not valid utf-8.

mb_check_encoding returns 'false' and the character on screen in a chrome browser is the diamond question mark character.

This happens on a few CHAR field types.

if(!mb_check_encoding($row["field"])) {
 ... exit with utf-8 error
}

I can work around this by converting from UTF-8 to UTF-8:

mb_convert_encoding($row["field"], 'UTF-8', 'UTF-8');

/etc/odbc.ini:

[as400]
Description = iSeries Access ODBC Driver
Driver = iSeries Access ODBC Driver
system = as400
Naming = 0
DefaultLibraries = *usrlibl
DefaultPkgLibrary = QGPL
DefaultPackage = A/DEFAULT(IBM),2,0,1,0,512
ConnectionType = 0
CommitMode = 1
ExtendedDynamic = 1
AllowDataCompression = 1
AllowUnsupportedChar = 0
ForceTranslation = 1
Trace = 0
Charset = UTF-8

/etc/odbcinst.ini:

Description=IBM i Access for Linux 64-bit ODBC Driver
Driver=/usr/lib/libcwbodbc.so
Setup=/usr/lib/libcwbodbcs.so
fileusage=1
dontdlclose=1

Additionally, does IBM document any of the odbc parameters? It seems like this is blindly trying parameters until something works.

mao
  • 11,321
  • 2
  • 13
  • 29
Ted Scheckler
  • 1,389
  • 4
  • 16
  • 34
  • What is the value of the LANG variable on Linux for the PHP session (i.e. before you launch php), or what is the system-default locale on Linux?. Also, for the table concerned is its encoding also utf-8? Have you selected the HEX() codes for the column(s) concerned to verify the encoding? – mao May 29 '19 at 15:41
  • 'locale' outputs: LANG=en_US.UTF-8 LANGUAGE= I'm not sure how to select HEX() codes to verify and I wouldn't know how to check if the table concerned has utf-8 encoding. – Ted Scheckler May 29 '19 at 16:01
  • to show the hex values in a column that displays incorrectly, use `select HEX(colname) from ...` where colname is whatever the column name is. To show the column encoding, query the catalog for example `select ccsid from qsys.syscolumns where table_name=? and table_owner=? and column_name=?` (substitute your values for each ?) – mao May 29 '19 at 16:16
  • ccsid = 37 for this field and the HEX value of the field is "4140404040404040404040404040404040404040404040404040404040404040404040" – Ted Scheckler May 29 '19 at 19:36
  • So that column has EBCDIC encoding (ccdis 37), and value 0x41 = 'A' and 0x40='@' in that encoding. I expect that renders correctly. If you select the columnName, hex(columnname) do the characters render correctly? The idea is to find a column-value that renders incorrectly, then check its encoding and compare how it gets rendered. Sometimes columns have bad data relative to the encoding. If any column has ccsid 1200(utf-16) or ccsid 1208 (utf-8), or ccsid 13488 (ucs-2) then check if these columns are impacted. – mao May 30 '19 at 09:14
  • That is definitely the hex value of that invalid field: Query: select HEX(CSMADDR1) as fixed_csmaddr1, CSMADDR1 from CSTMSP fetch first 1 rows only results: Array ( [FIXED_CSMADDR1] => 4140404040404040404040404040404040404040404040404040404040404040404040 [CSMADDR1] => � ) ... image: https://i.imgur.com/QKr9NOP.png – Ted Scheckler May 30 '19 at 13:22
  • Well, 0x41 is 'A' in ASCII and related encodings (like UTF-8) but is *not* generally used in EBCDIC. (It's listed in the character tables as "required space" but I have never seen it in actual data.) Do you know what the *intended* value is for that field? Can you confirm with someone, or inspect the value on the originating server, with its native tools? If I had to take a wild guess, I'd say it's supposed to be the letter 'A' (which *did* convert to UTF-8 on your end), and the leftover 0x40 bytes are EBCDIC spaces (left untranslated). – John Y May 30 '19 at 13:39
  • It shows blank on the green screen. Both of these fields appear blank, but only the CSMADDR1 has this weird character: Array ( [FIXED_CSMADDR1] => 4140404040404040404040404040404040404040404040404040404040404040404040 [CSMADDR1] => � [FIXED_CSMADDR2] => 4040404040404040404040404040404040404040404040404040404040404040404040 [CSMADDR2] => ) – Ted Scheckler May 30 '19 at 15:27

1 Answers1

0

A single byte install of OS/400 always returns Latin1 charset via ODBC to Linux. This is from my experience with very old versions (V4). I digged into documentation a few months ago and it seems to provide UTF-8 only with a DBCS-Install.

I can't say if this is still true for current versions of i.

PoC
  • 521
  • 3
  • 13