Looking through the JCA 1.7 specification, the only hint I could find is in one of its examples of the Resource Adapter Deployment Descriptor (Chapter 13: Message Inflow, p. 13-50):

[Image: JCA DD example showing UTF-8 encoding]

The example uses UTF-8 encoding, but nothing says whether that was just a choice made for the illustration or a mandatory restriction on the file's character encoding.

I'm asking this because I'm writing a Java program to read one of these files and FindBugs™ is giving me this message:

DM_DEFAULT_ENCODING: Reliance on default encoding Found a call to a method which will perform a byte to String (or String to byte) conversion, and will assume that the default platform encoding is suitable. This will cause the application behaviour to vary between platforms. Use an alternative API and specify a charset name or Charset object explicitly.

Line 4 in this Java code snippet is where character encoding will be specified:

01.  byte[] contents = new byte[1024];
02.  int bytesRead = 0;
03.  while ((bytesRead = bin.read(contents)) != -1)
04.     result.append(new String(contents, 0, bytesRead));

So, is it possible to specify the expected encoding of this file in this case or not?

M. A. Kishawy
  • UTF-8 is good. Can you please show the code where you get the FindBugs warning? – barfuin May 12 '15 at 14:20
  • @Thomas It is not about UTF-8 being good or bad :) It's more about if the user can specify other character encoding or not. I provided an example code for your convenience. – M. A. Kishawy May 12 '15 at 14:52
  • Your FindBugs warning should go away if you use `new String(contents, 0, bytesRead, StandardCharsets.UTF_8)`. You should specify the same charset in the XML header and in the code. I am not aware of a limitation that says you *must* use UTF-8. – barfuin May 12 '15 at 15:30
  • @Thomas I don't have control over the XML file encoding because the file is provided to me by the end user. Is there a way to figure out the file's character encoding before reading it? – M. A. Kishawy May 12 '15 at 19:52
  • If people send you data, you should really agree on a format. But well, the XML declaration header (`<?xml ... encoding="..."?>`) is where the user tells you what encoding he/she chose. If that gives you an error when reading the file, then the file is broken. – barfuin May 12 '15 at 21:45
  • Why do you need to read the XML as a byte stream and convert it to a `String` manually? Why not use the ready-made DOM/SAX parsers? They will switch the encoding automatically according to the XML file header. – Tagir Valeev May 13 '15 at 17:16
  • @TagirValeev I was thinking about that, but wouldn't this add overhead? Especially since I won't be using any of the DOM/SAX features. – M. A. Kishawy May 13 '15 at 17:19
  • So could you explain in more detail what you are doing with the resulting XML string later? If you are not using DOM/SAX, then... parsing via regular expressions? – Tagir Valeev May 13 '15 at 17:29
  • @TagirValeev I'm doing nothing with the String. I'm just returning it to the end user as is. What do you mean by parsing via regular expressions? – M. A. Kishawy May 13 '15 at 17:37

2 Answers

From what I have seen, most people use UTF-8 encoding for their ra.xml. However, there is no restriction against using another encoding. So if your parsing expects UTF-8 only, the result might not be what you expect.

So you either need to account for this in your code when you read the file as plain text, or read it as an XML file and save yourself the headache. I don't think the difference in performance will be an issue, because ra.xml files do not usually grow to gigabytes. The ones I've seen so far average a few megabytes at most.
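As a rough sketch of the "read it as an XML file" option, assuming the standard JAXP DOM parser that ships with the JDK: the parser reads the encoding from the XML declaration on its own, so no charset appears in the calling code. The class name `ParseRaXml`, the `parse` helper, and the sample descriptor content are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of any JCA API.
public class ParseRaXml {

    // Parses the descriptor from raw bytes; the parser chooses the charset
    // based on the XML declaration instead of the platform default.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Apply the JDK's secure-processing limits to user-supplied files.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newDocumentBuilder().parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: a descriptor deliberately encoded as ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<connector><description>caf\u00e9</description></connector>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(new ByteArrayInputStream(bytes));
        System.out.println(doc.getXmlEncoding());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the input is handed over as an `InputStream` rather than a `String`, no byte-to-character conversion happens outside the parser, which is the conversion FindBugs complains about.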

For the FindBugs issue, you just need to specify the encoding explicitly, for example UTF-8. Otherwise you will be using the JVM default, which is determined during virtual-machine startup and typically depends on the locale and charset of the underlying operating system. Although relying on the default is not recommended here, if that is what you want, then request the default encoding explicitly. This gets rid of the FindBugs warning.

So your code would look something like this:

    byte[] contents = new byte[1024];
    int bytesRead = 0;
    while ((bytesRead = bin.read(contents)) != -1)
        result.append(new String(contents, 0, bytesRead, Charset.defaultCharset()));
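
One caveat with decoding each buffer separately: in a multi-byte encoding such as UTF-8, a character can be split across two reads and then decodes incorrectly. A minimal alternative sketch, assuming you already know which charset to expect, wraps the stream in an `InputStreamReader` with an explicit `Charset`. The class name `ExplicitCharsetRead` and the `readAll` helper are hypothetical, and UTF-8 is only the example choice here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
public class ExplicitCharsetRead {

    // Reads the whole stream as text using the charset named by the caller.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder result = new StringBuilder();
        // The reader keeps decoder state between reads, so a multi-byte
        // character split across two buffer fills is still decoded correctly.
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) {
                result.append(buffer, 0, charsRead);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample content containing a non-ASCII character.
        byte[] bytes = "<description>caf\u00e9</description>"
                .getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(text.length());
    }
}
```

Passing the charset as a parameter keeps the decision visible at the call site, which is what the FindBugs rule asks for.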
Marko

FindBugs just warns you that you're relying on the default system encoding, so if your application is launched by another user in another country, you might get unexpected results. It's better to specify explicitly which encoding you want to use.

In your case the actual encoding should be extracted from the XML file itself. There are several ways to get it. One method is to use XMLStreamReader as described in this answer.
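
A minimal sketch of the XMLStreamReader approach, assuming the StAX implementation bundled with the JDK: `getCharacterEncodingScheme()` returns the encoding named in the XML declaration, or `null` if none is declared. The class name `DeclaredEncoding` and the helper method are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class for illustration only.
public class DeclaredEncoding {

    // Returns the encoding named in the XML declaration,
    // or null when the declaration does not name one.
    static String declaredEncoding(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            return reader.getCharacterEncodingScheme();
        } finally {
            // Frees parser resources; it does not close the underlying stream.
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample descriptor declaring a non-UTF-8 encoding.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><connector/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(new ByteArrayInputStream(xml)));
    }
}
```

The returned name can then be turned into a `Charset` with `Charset.forName(...)` and used to decode the file. Note that the stream has been partly consumed at that point, so it needs to be reopened or buffered before reading it again.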

Tagir Valeev