I am receiving a file in Shift JIS encoding. It contains Japanese characters, with shift-in (SI) and shift-out (SO) control characters at the beginning and end of each multi-byte string.

My requirement is to convert this file to UTF-8 and remove the SI and SO characters from the UTF-8 output. What is the best way to do this? Should I remove them before or after the UTF-8 conversion, and how do I remove them? Thanks in advance.

My Java code is below:

import java.io.*;
import java.nio.charset.StandardCharsets;

// Class name is a placeholder; the original post showed only the main method.
public class ShiftJisToUtf8 {
    public static void main(String[] args) throws IOException {
        String inFilePath = "src\\encoding\\input\\dfd02.PGP_dec";
        String filePath = "src\\encoding\\output\\";
        String utf8FileNm = "utf8-out.txt";
        String charsetName = "x-SJIS_0213";

        // try-with-resources closes the reader and writer even if an exception is thrown
        try (Reader reader = new BufferedReader(
                 new InputStreamReader(new FileInputStream(inFilePath), charsetName));
             Writer writer = new OutputStreamWriter(
                 new FileOutputStream(filePath + charsetName + "-" + utf8FileNm),
                 StandardCharsets.UTF_8)) {
            // Copy decoded characters in chunks instead of one char at a time
            char[] buffer = new char[8192];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
            System.out.println("Finished writing the input file in UTF-8 format");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
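
For the "before or after" part, one option is to strip the characters after decoding, while the text is still a Java String and before it is written out as UTF-8. The sketch below assumes the decoder passes SO (byte 0x0E) and SI (byte 0x0F) through as the control characters U+000E and U+000F, which I have not verified for x-SJIS_0213. Shift JIS trail bytes start at 0x40, so these two byte values should not occur inside a double-byte character. The class and method names (StripShiftChars, stripShiftChars) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original post.
public class StripShiftChars {

    // Removes SO (U+000E) and SI (U+000F) from text that has already been decoded.
    static String stripShiftChars(String text) {
        return text.replaceAll("[\\u000E\\u000F]", "");
    }

    public static void main(String[] args) {
        // Sample decoded text: ASCII, then SO + three kanji + SI, then ASCII.
        String decoded = "ABC\u000E\u65E5\u672C\u8A9E\u000FDEF";
        String cleaned = stripShiftChars(decoded);

        // Encode the cleaned text as UTF-8, as the writer in the question would.
        byte[] utf8 = cleaned.getBytes(StandardCharsets.UTF_8);
        System.out.println(cleaned.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

In the conversion code, the same call could be applied to each decoded chunk before it is passed to writer.write, since each of the two characters is a single char and cannot be split across chunk boundaries.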

user1447718
