The application that I use stores a 32 bit mask (1's and 0's) as a 4 character field in database by converting it to hex. (Its a pretty old application so can't change this).
This used to work well with Oracle & DB2 (UTF8 encoding), however with PostgreSQL (UTF8 encoding) when I try to insert the value using a COBOL program it produces below error:
ERROR: invalid byte sequence for encoding "SJIS": 0xa0
Binary = 01101000001001110000110010100000
Hex=0x68270CA0
Database encoding & table definition
diginst=> \encoding
UTF8
diginst=> \d tab_1
Column | Type | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
code | character(5) | | not null |
mask | bytea | | |
COBOL Program
IDENTIFICATION DIVISION.
PROGRAM-ID. ENCODE.
DATE-WRITTEN. 2013-06-28.
DATA DIVISION.
WORKING-STORAGE SECTION.
EXEC SQL BEGIN DECLARE SECTION END-EXEC.
01 DBNAME PIC X(30) VALUE SPACE.
01 SOC-REC-VARS.
05 D-CODE PIC X(5).
05 D-MASK PIC X(4).
EXEC SQL END DECLARE SECTION END-EXEC.
EXEC SQL INCLUDE SQLCA END-EXEC.
PROCEDURE DIVISION.
MAIN-RTN.
MOVE "/@diginst" TO DBNAME.
EXEC SQL
CONNECT :DBNAME
END-EXEC.
IF SQLCODE NOT = ZERO DISPLAY "ERROR CONNECTING".
MOVE "00001" TO D-CODE.
MOVE X"68270CA0" TO D-MASK.
EXEC SQL
INSERT INTO TAB_1
(CODE,
MASK)
VALUES(:D-CODE,
:D-MASK)
END-EXEC.
IF SQLCODE = ZERO DISPLAY "INSERT SUCCESSFUL"
ELSE DISPLAY "INSERT FAILED " SQLERRMC
GO TO EXIT-0.
EXEC SQL
SELECT CODE,MASK
INTO :D-CODE, :D-MASK FROM TAB_1
END-EXEC.
IF SQLCODE = ZERO DISPLAY "SELECT SUCCESSFUL"
ELSE DISPLAY "SELECT FAILED " SQLERRMC.
EXIT-0.
STOP RUN.
Not sure why it picks the encoding as SJIS even though the client_encoding and server_encoding is UTF8. However, even if it were UTF8 (I manually did SET CLIENT_ENCODING TO 'UTF8' in the above code) 0xA0 is not a valid character in UTF8 character set.
From PostgreSQL documentation on bytea
In short, binary strings are appropriate for storing data that the programmer thinks of as “raw bytes”, whereas character strings are appropriate for storing text.
I am not sure what am I doing wrong in this case since bytea
is supposed to work (as per the documentation).
Would also like to know why is the program picking up default encoding as SJIS
when it should be UTF8
. I also tried setting the environment variable PGCLIENTENCODING=UTF8
but it still gives the same error as SJIS.
*Didn't think to ask a separate question for this since I believe this too is related to the main problem.
Update: after a bit more digging around, DB2 stores raw bytes (even when db encoding doesn't support it) by adding the clause FOR BIT DATA
in create table statement as-
create table tab_db2 (key_part char(5) not null, raw_data char (100) for bit data);
In Oracle, the character set used is AL32UTF8
which supports more characters (supplementary characters) then UTF8
.
There is no equivalent character set in PostgreSQL for AL32UTF8
so I have resort to using LATIN1
for the moment until I find something else.