What are the different EBCDIC code pages for traditional chinese?

Question

All I know is code page 937 for EBCDIC traditional chinese. CP937

The reason that I am asking is that I have an EBCDIC file that contains traditional chinese characters but have boxes and question marks when I load them to Oracle using ZHT16DBCS (code page 937) character set.

Additional note: I can't process the file because it's in EBCDIC and contains traditional chinese characters. I would like to convert it to UTF-8 so I can read using my script (Perl) and extract to several files and load the files to several DB tables.

I load this using sqlldr and a control file but I set characterset first. load CHARACTER SET ZHT16DBCS — criz, Mar 10 '17 at 06:11
Are you sure your file is encoded as CP937? Where did you get the information that Oracle Character set `ZHT16DBCS` matches IBM codepage CP937? — Wernfried Domscheit, Mar 10 '17 at 06:51
Thanks. I added a response to your post below following this link where I got it. https://docs.oracle.com/cd/B19306_01/gateways.102/b16217/a_nls.htm — criz, Mar 10 '17 at 08:39

score 0 · Answer 1 · answered Mar 10 '17 at 07:48

0

I don't know whether IBM codepage CP937 matches Oracle character set ZHT16DBCS, resp. if Oracle support CP937 at all.

My recommendation would be to make a conversion to a more common character set, e.g. UTF-8. You can use for example native2ascii which is part of Java SE.

Would be like this:

native2ascii your_input_file native_file.txt -encoding x-IBM937    
native2ascii native_file.txt your_input_file_in_utf8 -reverse -encoding UTF-8

or using pipe to make it in one step:

cat your_input_file | native2ascii -encoding x-IBM937 | native2ascii -reverse -encoding UTF-8 > your_input_file_in_utf8

Then, before you run sqlldr set NLS_LANG accordingly:

SET NLS_LANG=.AL32UTF8 (Windows)

export NLS_LANG=.AL32UTF8 (Unix)

Just a note, territory and language are not relevant, so you can skip them if you like (as I did).

Of course, in your control file must you must set

CHARACTERSET AL32UTF8

Another note, "UTF-8" encoding means Oracle character set "AL32UTF8". There is also a Oracle character set named "UTF8", however this is CESU-8. For European languages they can be considered as equal, however for Chinese they might be different.

answered Mar 10 '17 at 07:48

Wernfried Domscheit

54,457
9
76
110

Thanks but I got it from oracle docs when I was searching for character sets with EBCDIC and traditional chinese. https://docs.oracle.com/cd/B19306_01/gateways.102/b16217/a_nls.htm #M 937 = ZHT16DBCS shift # Traditional Chinese multi-byte EBCDIC – criz Mar 10 '17 at 08:35
That's when I tried ZHT16DBCS and assumed it was 937 – criz Mar 10 '17 at 08:37
Yes, I also found this document but do you have an IBM DRDA server? I don't know whether "Oracle Transparent Gateway for DRDA" applies also to "SQL*Loader" - I don't think so. – Wernfried Domscheit Mar 10 '17 at 08:45
No I don't think we have. Could you please explain why I need to know about oracle DRDA server and its purpose? – criz Mar 10 '17 at 12:50
I have noted your suggestion using Java SE to convert the file to utf8 but sad to say that don't have Java installed in AIX 6.1. Will have install manually. – criz Mar 10 '17 at 12:59

What are the different EBCDIC code pages for traditional chinese?

1 Answers1