I have an input file in ANSI (Unix) format, and I convert it to UTF-8.
Before the conversion, the input file contains a special character like this:
»
After converting to UTF-8, it becomes:
û
When I process the file as is, without converting it to UTF-8, all special characters disappear and some data is lost as well. When I process the file after converting it to UTF-8, all the data appears, but the output file contains the same altered special character that I get after the conversion.
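To narrow down where the character changes, it may help to look at the raw bytes of the input before any decoding takes place. The following is a minimal sketch, not part of my actual job: the class name `NonAsciiByteDump` and the method `describeNonAscii` are made up for illustration, and it assumes the file is small enough to read into memory. In ISO-8859-1 the character » is the single byte BB, so a different value at that position would suggest the file is not really ISO-8859-1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class NonAsciiByteDump {
    // Lists every byte above 0x7F as "offset:HEX", separated by spaces.
    static String describeNonAscii(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b > 0x7F) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(i).append(':').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question; pass another path as args[0].
        String path = args.length > 0 ? args[0] : "inputtextfile.txt";
        System.out.println(describeNonAscii(Files.readAllBytes(Paths.get(path))));
    }
}
```

Running this against the original file and comparing the reported value with the code page tables of the candidate encodings should indicate which charset name to pass to the reader.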
ANSI to UTF-8 conversion (this could be wrong, please correct me if I have made a mistake somewhere):
import java.io.*;

// Read the input as ISO-8859-1 (my assumption for the "ANSI" file).
FileInputStream fis = new FileInputStream("inputtextfile.txt");
InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
Reader in = new BufferedReader(isr);
// Write the output as UTF-8.
FileOutputStream fos = new FileOutputStream("outputfile.txt");
OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
Writer out = new BufferedWriter(osw);
int ch;
out.write("\uFEFF"); // byte order mark at the start of the output
while ((ch = in.read()) > -1) {
    out.write(ch);
}
out.close();
in.close();
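As a sanity check of the conversion step itself, the same decode-then-encode idea can be tried in memory on a single byte. This is only a sketch under the assumption that the source really is ISO-8859-1; the class name `TranscodeCheck` and the method `transcode` are hypothetical names, not part of my job or of Talend.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeCheck {
    // Decodes the bytes with the source charset, then re-encodes them with the target charset.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String text = new String(input, source);
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xBB}; // the character \u00BB as a single ISO-8859-1 byte
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Print the resulting UTF-8 bytes in hex.
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

If the source byte is BB and the charset is ISO-8859-1, the UTF-8 result is the two bytes C2 BB, which decode back to the same character. If the character still looks different afterwards, the likely causes are a wrong source charset or a later step that reads the UTF-8 file with a different encoding.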
After this, I process the file further to produce the final output. I'm using the Talend ETL tool (a Java-based ETL tool) to create the final output from the generated UTF-8 file.
What I want is to process the file so that the output contains the same special characters as the input file.
I'm using Java 1.8 for the whole process. I'm quite stuck here and have never dealt with special characters before.
Any suggestions would be helpful.