
I am trying to compile a Maven Java project whose files are saved as UTF-8 with a BOM, but I am getting an illegal character error from the BOM character, despite having both project.build.sourceEncoding and the encoding of the maven-compiler-plugin set to UTF-8.

Am I missing an additional setting? Can I even get this to compile without removing the BOM (I am not allowed to make any changes to the source, but I can modify the POM)?


The error:

java: C:\code\main\src\test\java\net\initech\finance\FinanceTest.java:1: illegal character: \65279

The property:

<properties>
    ...
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    ...
</properties>

The plugin:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
Sled
  • Possible duplicate of [Compiling (javac) a UTF8 encoded Java source code with a BOM](http://stackoverflow.com/questions/9811382/compiling-javac-a-utf8-encoded-java-source-code-with-a-bom) or [How to compile a java source file which is encoded as “UTF-8”?](http://stackoverflow.com/questions/1726174/how-to-compile-a-java-source-file-which-is-encoded-as-utf-8). (It's not actually a Maven issue: Maven is successfully telling the compiler to interpret the source-code as UTF-8 -- otherwise it wouldn't even know that the illegal character is \65279.) – ruakh Jul 09 '13 at 18:28
  • But Maven has been set to UTF-8 so why isn't it getting passed to `javac` correctly? This is a Maven issue unless it's a `javac` bug. Not a duplicate because I am using Maven to do it and have set Maven to tell `javac` to treat the code as UTF-8. – Sled Jul 09 '13 at 18:35
  • The fact that you're having Maven tell the compiler that the source-code is UTF-8, instead of telling it so yourself, doesn't change anything. The problem is that your source-code, when interpreted as UTF-8, is not valid Java source-code. (It's only valid Java source-code when interpreted as UTF-8-with-BOM, which isn't an encoding that `javac` supports.) – ruakh Jul 09 '13 at 18:39
  • So the actual issue is that `javac` doesn't support `UTF-8`, which optionally allows for a BOM. Also, the other question has only one answer, which is neither accepted nor explained, so it's hard to call this a duplicate. – Sled Jul 09 '13 at 18:42
  • The Unicode standard really only says that a BOM is (grudgingly) allowed in UTF-8 as an encoding signature, when there's no higher-level protocol specifying the encoding. Obviously that's not the case here, since you're explicitly telling `javac` that the encoding is UTF-8. – ruakh Jul 09 '13 at 18:53

4 Answers


If it's UTF-8 and not UTF-16, the BOM serves no purpose at all. Why is it put in there? Also, only Java is complaining here -- not Maven.

Check out the related bug report JDK-4508058: "UTF-8 encoding does not recognize initial BOM".
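
To illustrate the behaviour that bug report describes, here is a minimal sketch. The class and method names are hypothetical and not taken from the report. It shows that Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF, which is 65279 in decimal, the same value that appears in the compiler error.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class BomDecodeDemo {
    // The UTF-8 encoding of U+FEFF (the byte order mark).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns the BOM bytes followed by the UTF-8 bytes of the given text. */
    static byte[] withBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    /** Decodes the bytes as UTF-8 and returns the first code point. */
    static int firstCodePoint(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return decoded.codePointAt(0);
    }

    public static void main(String[] args) {
        byte[] source = withBom("class A {}");
        // The decoder does not drop the BOM, so this prints 65279.
        System.out.println(firstCodePoint(source));
    }
}
```

The decoded text therefore starts with a character that is not legal at the start of a Java source file, even though the bytes are valid UTF-8.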

Hut8
  • 1) Having a BOM helps an editor know that the text is to be treated as UTF-8 and not CP-1252. 2) While it's Java complaining, Maven should have informed it to treat the source as UTF-8 and thus not have this error. 3) The error is not related, since this is `javac` and not an attempt to read a file through the Java API. – Sled Jul 09 '13 at 18:40
  • Re: "the BOM serves no purpose at all. Why is it put in there?": This quasi-convention was introduced by Microsoft, in its Notepad text editor, as a way to distinguish UTF-8 from other encodings. (Obviously that's not the original purpose that BOMs were intended for, but it's not totally crazy.) – ruakh Jul 09 '13 at 18:41

1. Close your project.

2. Open your file in Notepad++ and switch its encoding to 'UTF-8 without BOM'.

3. Reopen your project.
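
If there are many files, the same conversion could be scripted instead of done by hand. The following is only a sketch: `BomStripper` and its methods are hypothetical names, and it rewrites each file in place, so it assumes you are allowed to change the files (or are working on a copy of them).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; not part of Maven, javac, or Notepad++.
class BomStripper {
    /** True if the data starts with the UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
    }

    /** Returns the data without a leading UTF-8 BOM, or the same array if none is present. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites the file only if it starts with a BOM; returns true if it was changed. */
    static boolean stripFile(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (!hasUtf8Bom(data)) {
            return false;
        }
        Files.write(file, stripUtf8Bom(data));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as the path of one source file.
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println((stripFile(file) ? "stripped: " : "unchanged: ") + file);
        }
    }
}
```

It works on raw bytes rather than decoded text, so the rest of the file is left byte-for-byte identical.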

JasonChiu

This happened to me after I opened a Java class using Notepad, which changed the encoding of the file.

A simple trick I did is this:

  1. Open your class in Android Studio/Eclipse
  2. Press Ctrl + A and then Ctrl + X to cut all of the code
  3. Delete the now-empty class
  4. Create a new class with the same name
  5. Press Ctrl + V to paste the code
  6. Done

＝=, take a look at the difference between the equal signs. The right one is used for normal English typing, but the longer left one is usually used for Chinese languages.

Sheldon