PostgreSQL: How to store non-compliant (UTF8) raw bytes as bytea using GnuCOBOL?

Question

The application that I use stores a 32 bit mask (1's and 0's) as a 4 character field in database by converting it to hex. (Its a pretty old application so can't change this).

This used to work well with Oracle & DB2 (UTF8 encoding), however with PostgreSQL (UTF8 encoding) when I try to insert the value using a COBOL program it produces below error:

ERROR: invalid byte sequence for encoding "SJIS": 0xa0

Binary = 01101000001001110000110010100000

Hex=0x68270CA0

Database encoding & table definition

diginst=> \encoding
UTF8
diginst=> \d tab_1
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
 code   | character(5) |           | not null |
 mask   | bytea        |           |          |

COBOL Program

   IDENTIFICATION              DIVISION.
   PROGRAM-ID.                 ENCODE.
   DATE-WRITTEN.               2013-06-28.
   DATA                        DIVISION.
   WORKING-STORAGE             SECTION.

       EXEC SQL BEGIN DECLARE SECTION END-EXEC.
   01  DBNAME                PIC  X(30) VALUE SPACE.
   01  SOC-REC-VARS.
       05  D-CODE             PIC X(5).
       05  D-MASK             PIC X(4).
       EXEC SQL END DECLARE SECTION END-EXEC.

       EXEC SQL INCLUDE SQLCA END-EXEC.

   PROCEDURE                   DIVISION.
   MAIN-RTN.
       MOVE  "/@diginst"         TO   DBNAME.
       EXEC SQL
           CONNECT :DBNAME
       END-EXEC.
       IF  SQLCODE NOT = ZERO DISPLAY "ERROR CONNECTING".
       MOVE "00001" TO D-CODE.
       MOVE X"68270CA0" TO D-MASK.

       EXEC SQL
         INSERT INTO TAB_1
         (CODE,
          MASK)
         VALUES(:D-CODE,
                :D-MASK)
       END-EXEC.
       IF SQLCODE = ZERO DISPLAY "INSERT SUCCESSFUL"
       ELSE DISPLAY "INSERT FAILED " SQLERRMC
            GO TO EXIT-0.
       EXEC SQL
          SELECT CODE,MASK
                 INTO :D-CODE, :D-MASK FROM TAB_1
       END-EXEC.
       IF SQLCODE = ZERO DISPLAY "SELECT SUCCESSFUL"
       ELSE DISPLAY "SELECT FAILED " SQLERRMC.
   EXIT-0.
       STOP RUN.

Not sure why it picks the encoding as SJIS even though the client_encoding and server_encoding is UTF8. However, even if it were UTF8 (I manually did SET CLIENT_ENCODING TO 'UTF8' in the above code) 0xA0 is not a valid character in UTF8 character set.

From PostgreSQL documentation on bytea

In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.

I am not sure what am I doing wrong in this case since bytea is supposed to work (as per the documentation).

Would also like to know why is the program picking up default encoding as SJIS when it should be UTF8. I also tried setting the environment variable PGCLIENTENCODING=UTF8 but it still gives the same error as SJIS.

*Didn't think to ask a separate question for this since I believe this too is related to the main problem.

Update: after a bit more digging around, DB2 stores raw bytes (even when db encoding doesn't support it) by adding the clause FOR BIT DATA in create table statement as-

create table tab_db2 (key_part char(5) not null, raw_data char (100) for bit data);

In Oracle, the character set used is AL32UTF8 which supports more characters (supplementary characters) then UTF8.

There is no equivalent character set in PostgreSQL for AL32UTF8 so I have resort to using LATIN1 for the moment until I find something else.

What type did you use in Oracle and DB2? What was the *actual* table schema? Both databases have explicit type support for Unicode strings, eg `nvarchar2`. They *don't* need a server encoding to translate `char` to UTF8. The application used a hack that only worked with those specific types. You can't expect it to work the same way in any other database. What were the types? — Panagiotis Kanavos, Apr 24 '18 at 13:21
@PanagiotisKanavos DB2 has the definition `CHAR (4) FOR BIT DATA` and in Oracle the `NLS_CHARACTERSET` is AL32UTF8. Not really sure what is the difference between AL32UTF8 and UTF8. Is there an equivalent to AL32UTF8 in PostgreSQL? — Ankit Jain, Apr 24 '18 at 13:33
It looks like you've correctly diagnosed your problem - it seems Postgres is validating the string is valid in the character set before converting it to binary. If you were able to change the COBOL to say `05 D-MASK SQL TYPE is BINARY(4)` or possibly `05 D-MASK S9(9) BINARY` this may solve it - not a cobol expert though. — Ben, May 05 '18 at 16:59

PostgreSQL: How to store non-compliant (UTF8) raw bytes as bytea using GnuCOBOL?

0 Answers0