How to write data created in charset UTF8 to a file in charset Shift-JIS without losing characters

Question

I am creating a file from data queried from a database. The situation is as follows:

Database: Oracle with charset UTF8
Application server: Resin with charset UTF8
Application framework: NTT Intra-Mart (a Japanese framework based on Rhino that uses JavaScript as the server-side language)

Need: query data from Oracle and create a file in charset [Shift-JIS]. The file is an intermediate file that one system exports and transfers by FTP to another system, which imports it. The file must use fixed byte ranges so that the destination server can locate each field, e.g.

byte 1-10: [user address]
byte 11-20: [user name]

When I create the file in UTF8, all characters appear to be written correctly. When I write the data in charset [SJIS], however, some full-width characters become a half-width question mark [?]. This shortens the field's byte width, so the data can no longer be read correctly. For example, when [user address] contains 1－10－1, the file contains 1?10?1:

byte 1-10: [user address] (expected, but in the current file the user address occupies only bytes 1-8)
byte 11-20: [user name]

Could you please give me some advice?

Shift-JIS encoding cannot represent all code points that can be represented by UTF-8. UTF-8 can represent the entire Unicode range of code points. — Mark Tolonen, Jul 06 '20 at 06:06
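One way to find such characters before they turn into [?] is to encode with a strict encoder that reports unmappable characters instead of substituting them. The following is only a minimal sketch in plain Java; the class name StrictEncode and the helper encodeStrict are hypothetical names chosen for illustration, and it assumes a JDK that ships the Shift_JIS charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class StrictEncode {
    // Hypothetical helper: encodes text with the named charset and throws
    // CharacterCodingException instead of silently writing '?' for
    // characters the charset cannot represent.
    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // "1－10－1" written with U+FF0D (FULLWIDTH HYPHEN-MINUS) escapes
        String address = "1\uFF0D10\uFF0D1";
        try {
            byte[] bytes = encodeStrict(address, "Shift_JIS");
            System.out.println("encoded " + bytes.length + " bytes");
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode as Shift_JIS: " + e);
        }
    }
}
```

Failing early like this keeps a shortened field from reaching the destination system unnoticed. Whether to reject the record or substitute another character is a decision for the application.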

score 0 · Answer 1 · answered Jul 14 '20 at 14:37

You will have to use charset name Windows-31J rather than Shift-JIS.

The data 1－10－1 was probably typed with Microsoft IME, which uses U+FF0D (FULLWIDTH HYPHEN-MINUS) to represent the character －.

In the JavaVM's Shift-JIS to Unicode mapping, U+FF0D is not mapped to any character (there, 0x817C corresponds to U+2212 MINUS SIGN instead). So you get ? when you convert － from the JVM internal representation (UTF-16) to Shift-JIS with charset Shift-JIS.
In the JavaVM's Windows-31J to Unicode mapping, U+FF0D is mapped to 0x817C. So you get － when you convert － from the JVM internal representation (UTF-16) to Shift-JIS with charset Windows-31J.
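The difference can be checked with a few lines of plain Java. This is a rough sketch, not the Intra-Mart API: the class FixedWidthSjis and the helper toFixedWidth are hypothetical, and padding with ASCII spaces is an assumption about what the destination system expects. The Java names for the two charsets are Shift_JIS and windows-31j.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class FixedWidthSjis {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");
    static final Charset WINDOWS_31J = Charset.forName("windows-31j");

    // Hypothetical helper: encodes a value and right-pads it with ASCII
    // spaces (an assumed padding rule) to exactly `width` bytes.
    static byte[] toFixedWidth(String value, int width, Charset charset) {
        byte[] encoded = value.getBytes(charset);
        if (encoded.length > width) {
            throw new IllegalArgumentException(
                    "value needs " + encoded.length + " bytes, limit is " + width);
        }
        byte[] field = new byte[width];
        Arrays.fill(field, (byte) ' ');
        System.arraycopy(encoded, 0, field, 0, encoded.length);
        return field;
    }

    public static void main(String[] args) {
        // "1－10－1" written with U+FF0D (FULLWIDTH HYPHEN-MINUS) escapes
        String address = "1\uFF0D10\uFF0D1";

        // String.getBytes replaces unmappable characters with '?' (1 byte each)
        System.out.println("Shift_JIS:   " + address.getBytes(SHIFT_JIS).length + " bytes");
        // windows-31j encodes U+FF0D as the two bytes 0x81 0x7C
        System.out.println("windows-31j: " + address.getBytes(WINDOWS_31J).length + " bytes");

        byte[] field = toFixedWidth(address, 10, WINDOWS_31J);
        System.out.println("padded field: " + field.length + " bytes");
    }
}
```

With the mappings described above, the Shift_JIS result should be 6 bytes (1?10?1) and the windows-31j result 8 bytes, which the helper then pads to the 10-byte field. Measuring the width in encoded bytes rather than in characters is what keeps the byte ranges aligned.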

Thank you for your advice. I have talked with the SE, and we have now abandoned reading the file by specified byte positions and are using a CSV file instead. — auwind, Jul 18 '20 at 04:05
