I am receiving a file in Shift_JIS encoding. It contains Japanese text, with shift-in (SI) and shift-out (SO) control characters at the beginning and end of each multi-byte string.
I need to convert this file to UTF-8 and remove the SI and SO characters from the UTF-8 output. What is the best way to do this? Should I remove them before or after the UTF-8 conversion, and how do I remove them? Thanks in advance.
My Java code is below:
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ShiftJisToUtf8 {
    public static void main(String[] args) throws Exception {
        String inFilePath = "src\\encoding\\input\\dfd02.PGP_dec";
        String filePath = "src\\encoding\\output\\";
        String utf8FileNm = "utf8-out.txt";
        String charsetName = "x-SJIS_0213";
        String outFilePath = filePath + charsetName + "-" + utf8FileNm;
        // try-with-resources closes both streams even if decoding or writing fails
        try (Reader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(inFilePath), charsetName));
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(outFilePath), StandardCharsets.UTF_8))) {
            // Copy decoded characters in chunks instead of one char at a time
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
            System.out.println("Finished writing the input file in UTF-8 format");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
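
One possible way to drop the SI and SO characters is to filter them out after decoding, while the text is being copied to the UTF-8 writer. The sketch below is only an illustration, not a definitive answer: the class name SiSoStripper and its two methods are hypothetical, and it assumes the decoder passes the SO (0x0E) and SI (0x0F) bytes through as U+000E and U+000F, which is worth verifying against the real input file. Since 0x0E and 0x0F fall outside the Shift_JIS lead-byte and trail-byte ranges, filtering after decoding should not touch any part of a double-byte character.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original code.
public class SiSoStripper {
    static final char SHIFT_OUT = 0x0E; // SO control character
    static final char SHIFT_IN = 0x0F;  // SI control character

    // Returns a copy of the text with every SO and SI character removed.
    public static String stripShiftControls(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != SHIFT_OUT && c != SHIFT_IN) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Decodes the input with sourceCharset, skips SO/SI, and writes UTF-8.
    // Assumption: the decoder maps bytes 0x0E/0x0F to U+000E/U+000F.
    public static void convert(InputStream in, Charset sourceCharset, OutputStream out)
            throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, sourceCharset));
             Writer writer = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c != SHIFT_OUT && c != SHIFT_IN) {
                    writer.write(c);
                }
            }
        }
    }
}
```

With the file paths from the question, this could be called as SiSoStripper.convert(new FileInputStream(inFilePath), Charset.forName("x-SJIS_0213"), new FileOutputStream(outFilePath)). Filtering on decoded characters keeps the logic independent of the byte-level encoding, at the cost of relying on the decoder's handling of control bytes.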