
I am trying to write an Urdu string, غزل, to a file as shown below:

class UnicodeCheck {
    public static void main(String[] args) {
        try {
            // Fully qualified names are used because the snippet has no import lines
            java.io.File f = new java.io.File("C:/Users/user/Desktop/unicodecheck.txt");
            java.io.FileWriter writer = new java.io.FileWriter(f);
            writer.write("غزل");
            writer.close();
        } catch (Exception exc) {
            exc.printStackTrace();
        }
    }
}

When I try to compile the above program, I get this error:

UnicodeCheck.java:1: illegal character: \187
class UnicodeCheck {
 ^
UnicodeCheck.java:1: illegal character: \191
class UnicodeCheck {
  ^
2 errors

I do not understand this error. Why do I get it, and how can I fix it?

Peter O.
Suhail Gupta
  • Choose the UTF-8 charset when saving the code in the .java file. – KV Prajapati Oct 11 '12 at 04:15
  • @Jayan Do you realize that you changed the meaning of the whole question? – Suhail Gupta Oct 11 '12 at 08:32
  • @Suhail Gupta: Sorry, I have fixed it with a better title. Essentially, a file with Unicode content is a different problem. When that file is Java source code, the fix lies in the editor, for example saving with a different encoding. – Jayan Oct 11 '12 at 08:44
  • @Jayan What are you doing? What is a Java source file? Do you even understand what I am asking? – Suhail Gupta Oct 11 '12 at 11:24
  • @Jayan Now please do not make any more edits to the question. – Suhail Gupta Oct 11 '12 at 11:35
  • As I understand it, you are writing a Java source file that contains some Unicode strings. The error comes from the compiler not being able to interpret the BOM (see Sumit's answer). It is not a runtime error. – Jayan Oct 11 '12 at 11:59

2 Answers


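The characters at the beginning of the file come from the byte order mark (BOM) that some text editors insert at the start of a file. The values 187 and 191 in the error message are 0xBB and 0xBF, the second and third bytes of the UTF-8 BOM (EF BB BF). The Java compiler, however, does not accept source files with a BOM. You have two options: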

  1. Use a text editor that can save files as Unicode without a BOM, such as Notepad++.
  2. Use only ASCII characters in the source code. Where you need Unicode characters, use \uXXXX escape codes. The JDK comes with a utility program called native2ascii that converts "native" text into this encoding. For example,

    writer.write("غزل");
    

    would be converted into

    writer.write("\u063a\u0632\u0644");
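
If native2ascii is not at hand, the same conversion can be approximated in a few lines of Java. This is only a minimal sketch: the class name UnicodeEscaper and the method escape are hypothetical, and it simply replaces every char above 127 with a four-digit lowercase hex escape, which may differ from native2ascii in edge cases.

```java
public class UnicodeEscaper {
    /** Replaces every non-ASCII char with a backslash-u escape of four hex digits. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);                                   // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c));   // e.g. U+063A becomes the 6 chars \ u 0 6 3 a
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is itself written with escapes so this file stays pure ASCII.
        System.out.println(escape("writer.write(\"\u063a\u0632\u0644\");"));
    }
}
```

Because the sketch contains only ASCII characters, it compiles regardless of how the editor encodes the file.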
    
Joni

It depends on the charset used by the text editor in which you edit the Java source file. Try setting it to UTF-8.
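
To check whether the editor actually wrote a BOM, one option is to look at the first three bytes of the file. The following is a rough sketch, not part of the original answer: the class name BomStripper and its methods are hypothetical, it handles only the UTF-8 BOM (EF BB BF), and it overwrites the file in place, so try it on a copy first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomStripper {
    /** Returns true if the bytes begin with the UTF-8 BOM (EF BB BF). */
    public static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Returns the bytes without a leading UTF-8 BOM; input without a BOM is returned as-is. */
    public static byte[] stripUtf8Bom(byte[] bytes) {
        return hasUtf8Bom(bytes) ? Arrays.copyOfRange(bytes, 3, bytes.length) : bytes;
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);   // path of the source file to check
        byte[] original = Files.readAllBytes(source);
        byte[] cleaned = stripUtf8Bom(original);
        if (cleaned.length != original.length) {
            Files.write(source, cleaned);   // rewrite the file without the BOM
            System.out.println("Removed UTF-8 BOM from " + source);
        } else {
            System.out.println("No UTF-8 BOM found in " + source);
        }
    }
}
```

It could be run as `java BomStripper UnicodeCheck.java`. Once the file is saved as UTF-8 without a BOM, compiling with `javac -encoding UTF-8 UnicodeCheck.java` tells the compiler which charset to expect instead of relying on the platform default.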

pb2q
neo571