
I'm having a problem with file name encoding in Java.

I have a Java program that saves an input stream to a file with a given file name. The code snippet looks like this:

// strFileName is a String and inStream an InputStream, both defined elsewhere
File out = new File(strFileName);
Files.copy(inStream, out.toPath());

It works fine on Windows unless the file name contains special characters such as Ö. With such characters in the file name, the saved file shows a garbled name on Windows.

I understand that the JVM option -Dfile.encoding=UTF-8 fixes this issue, but I would rather have a solution in my code than ask all my users to change their JVM options.

While debugging the program, I can see that the file name string always shows the correct character, so I guess the problem is not with the internal encoding.
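One way to double-check this observation is to dump the code points of the string together with the encodings the JVM reports. The sketch below is only an illustration: the class name `FileNameDiagnostics`, the helper `codePoints`, and the sample name are hypothetical, and `sun.jnu.encoding` is an implementation-specific property that may be absent on some JVMs.

```java
import java.nio.charset.Charset;

public class FileNameDiagnostics {

    // Renders each code point of the name as U+XXXX so it can be
    // compared against the characters that are expected to be there.
    static String codePoints(String name) {
        StringBuilder sb = new StringBuilder();
        name.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample name; the \u escape keeps the literal
        // independent of the encoding of the source file itself.
        String strFileName = "\u00D6.txt";

        System.out.println("code points:      " + codePoints(strFileName));
        System.out.println("default charset:  " + Charset.defaultCharset());
        System.out.println("file.encoding:    " + System.getProperty("file.encoding"));
        // Implementation-specific (OpenJDK/Oracle); may print null elsewhere.
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
    }
}
```

If the first line shows U+00D6 for Ö, the string itself is intact and the remaining lines show which encodings the JVM is working with on that machine.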

Could someone please explain what is going wrong behind the scenes? And is there a way to avoid this problem programmatically? I tried getting the bytes from the string and changing the encoding, but that doesn't work.

Thanks.

John.D
  • Does setting the `-Dfile.encoding` option actually solve the problem? Because that is used when reading from files, not for the file names themselves. Where do you get the file name string from? – Thilo Feb 21 '17 at 08:17
  • It seems that parameter does in fact also affect how file names are encoded: http://stackoverflow.com/questions/9196950/setting-file-name-encoding – Thilo Feb 21 '17 at 08:19
  • @Thilo Yes, the JVM option works for this problem. The problem was found when I tested with a Swedish file name. – John.D Feb 21 '17 at 08:26
  • What do you assign to `strFileName`? Is it input from a web application, read from a file on disk, entered through a Swing or JavaFX UI, or what? You say that debugging shows the correct character. How did you debug it? Can you provide a [MCVE]? What is the default encoding on this machine? – erickson Feb 22 '17 at 07:24

1 Answer


Using the URLEncoder class would work:

String name = URLEncoder.encode("fileName#", "UTF-8");
File output = new File(name);
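
A slightly fuller sketch of this idea, including the decode step needed to get the original name back, might look like the following. The class and method names are hypothetical. Note that the file on disk carries the percent-encoded name, not the original characters.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodedFileName {

    // Percent-encodes the name so it contains only ASCII characters.
    static String encode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encode(), e.g. when listing a directory for display.
    static String decode(String stored) {
        try {
            return URLDecoder.decode(stored, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample name; \u00D6 is Ö, written as an escape so the
        // result does not depend on the encoding of the source file.
        String original = "\u00D6.txt";
        String stored = encode(original);
        File output = new File(stored);          // only builds the path, creates nothing

        System.out.println(output.getName());    // prints %C3%96.txt
        System.out.println(decode(stored).equals(original)); // prints true
    }
}
```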
bra_racing
  • Really? That seems unlikely. – Thilo Feb 21 '17 at 11:48
  • It is an approximation; it requires a decode operation to get the special chars back – bra_racing Feb 21 '17 at 12:14
  • Do you have a link that explains how the Windows filesystem API has special treatment for URL encoding? Also note that the question clearly states that the issue can be resolved by just setting `file.encoding` to match the character set that the OS expects to see for file names. – Thilo Feb 21 '17 at 12:23
  • You are right, but he says he wants a solution that achieves that behaviour without changing JVM options. I think my approach could satisfy that premise, couldn't it? – bra_racing Feb 21 '17 at 12:52
  • Are you sure this is the full approach? URLEncoder converts the Ö character into %E8%84%B0 and I only get a %E8%84%B0.txt file. I want to use the string as the file name and expect to have Ö.txt saved in my file system. – John.D Feb 22 '17 at 02:30
  • With my solution you get the file name as you say, and if you want the special chars you can recover them with the decode operation. This could be useful if you want to list the contents of a directory, for example – bra_racing Feb 22 '17 at 06:36