
I'm trying to write an application that reads all files in a folder and its subfolders.

The problem is file names with special characters like 'ä', 'ü' and 'ö'. Those are read as '��'.
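This is one way two '��' characters per umlaut can arise. It is an illustration, not a confirmed diagnosis. On Linux, a name like "ä.txt" is typically stored as UTF-8 bytes, and 'ä' takes two bytes there (0xC3 0xA4). If those bytes are decoded with an ASCII charset, each non-ASCII byte becomes a replacement character (U+FFFD). The JVM's own native file-name decoding may differ in detail. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes a name as UTF-8 (as it would be stored on disk) and decodes it with the wrong charset.
    static String decodeAsAscii(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);  // a-umlaut becomes 0xC3 0xA4
        return new String(utf8, StandardCharsets.US_ASCII);    // each non-ASCII byte -> U+FFFD
    }

    public static void main(String[] args) {
        String name = "\u00e4.txt"; // "a-umlaut.txt", written as an escape so the source encoding doesn't matter
        System.out.println(name.getBytes(StandardCharsets.UTF_8).length); // 6 bytes for 5 characters
        // Two U+FFFD characters followed by ".txt"; how they render depends on the console.
        System.out.println(decodeAsAscii(name));
    }
}
```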

I develop the program in Eclipse Neon.2 Release (4.6.2) on Arch Linux.

I already set Eclipse to UTF-8 encoding. My LANG is nds_DE.UTF-8.
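Eclipse's encoding setting and the shell's LANG are not necessarily what the running JVM uses for file names. The diagnostic below prints the values the JVM actually picked up. The class and method names are hypothetical. `sun.jnu.encoding` is an internal JDK property and may be absent on some JVMs. If it reports an ASCII charset (for example ANSI_X3.4-1968) instead of UTF-8, non-ASCII file names cannot be decoded correctly. One possible cause is that the JVM could not use the configured locale, but this is an assumption and was not confirmed in the thread.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the settings that decide how this JVM encodes text and file names.
    static Map<String, String> encodingReport() {
        Map<String, String> report = new LinkedHashMap<>();
        // Locale variables as seen by this JVM process (an IDE launcher may pass a different environment).
        report.put("LANG", System.getenv("LANG"));
        report.put("LC_ALL", System.getenv("LC_ALL"));
        // Default charset for file contents and byte/char streams.
        report.put("file.encoding", System.getProperty("file.encoding"));
        report.put("defaultCharset", Charset.defaultCharset().name());
        // JDK-internal property used when converting file names to and from native bytes (may be null).
        report.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        return report;
    }

    public static void main(String[] args) {
        encodingReport().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```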

I also tried multiple ways to read the file names (MyFile.listFiles(), DirectoryStream, FileUtils.listFiles from Apache Commons IO).
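For comparison, here is a minimal sketch of the DirectoryStream approach mentioned above. The class and method names are hypothetical. This version is not expected to fix anything on its own. As far as I know, on Linux both java.io.File and java.nio decode file names with the same JVM-level encoding. That fits the asker's later comment that switching to java.nio didn't help.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NioListing {
    // Recursively collects all non-directory entries below 'dir'.
    // Files.isDirectory follows symbolic links, so a link cycle would recurse indefinitely.
    static List<Path> listFilesRecursively(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    result.addAll(listFilesRecursively(p));
                } else {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listFilesRecursively(root)) {
            System.out.println(p.getFileName());
        }
    }
}
```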

I know that Java and Eclipse can handle these special characters, because when they appear in a text file, or when I just print them to the console, they are displayed correctly.

Does anyone have an idea what I can try, or why these characters are a problem when reading file names?

Thank you

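```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FileCollector { // hypothetical wrapper; Entry is the asker's own class (not shown)
    List<Entry> readFilesInPath(String path) {
        List<Entry> entries = new ArrayList<>();
        File[] files = new File(path).listFiles();
        if (files == null) return entries; // not a directory, or it could not be read
        for (File f : files) {
            System.out.println(f); // the garbled names show up here
            if (f.isDirectory()) {
                entries.addAll(readFilesInPath(f.getPath()));
            } else {
                entries.add(new Entry(f.getName(), f.getParent()));
            }
        }
        return entries;
    }
}
```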
Myrkjartan
  • Show the code that's giving you wrong results. – Kayaman Mar 08 '17 at 13:31
  • Which OS do you use? – Jens Mar 08 '17 at 13:32
  • The trouble is not your UTF settings but your file system. Not every file system can handle all special characters. – pepan Mar 08 '17 at 13:33
  • Related or duplicate: http://stackoverflow.com/questions/14171565/java-read-write-unicode-utf-8-filenames-not-contents (using java.nio might solve the problem) – Aaron Mar 08 '17 at 13:42
  • I run Arch Linux (4.9.11-1, x86_64). – Myrkjartan Mar 08 '17 at 13:43
  • I already tried java.nio. It didn't help – Myrkjartan Mar 08 '17 at 13:44
  • And that is why you shouldn't use special characters in file names :-) – Sean Patrick Floyd Mar 08 '17 at 13:47
  • @SeanPatrickFloyd Meh, we're in 2017; it would be nice if the ASCII restrictions were dropped at some point. I don't care that much since my language is encoded in ISO-8859-1, but for those whose language doesn't share a single symbol with ASCII, it must be annoying. – Aaron Mar 08 '17 at 13:52
  • Yes, I know that. But most people don't (or don't care). That's why I need to address this issue. ;-) – Myrkjartan Mar 08 '17 at 13:53
  • In C#, I recently found out that encoding 1252 is great on Windows machines. Maybe you are having a similar problem. I now read all files with: string tempString = File.ReadAllText(fileName, System.Text.Encoding.GetEncoding(1252)); (this is C# code) – SedJ601 Mar 08 '17 at 14:17

1 Answer


OK, after a lot of research and frustration with my system variables (which didn't do any good), I found another question with a solution to my problem: "java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters when using national characters".

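So I need to use the VM argument -Dsun.jnu.encoding=UTF-8. This property sets the encoding the JVM uses to convert file names between the operating system's bytes and Java strings. In Eclipse, it goes under Run Configurations > Arguments > VM arguments.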
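To check that the flag took effect, a check along these lines may help. The class and method names are hypothetical. It reads `sun.jnu.encoding`, which is an internal, unsupported JDK property, so overriding it on the command line may not be honored by every JDK version. The check then prints each file name with its code points. This makes it easy to see whether an umlaut arrives as U+00E4 (correct) or as replacement characters.

```java
import java.io.File;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class JnuEncodingCheck {
    // True if the JVM reports UTF-8 as its file-name encoding.
    // sun.jnu.encoding is JDK-internal and may be null on some JVMs.
    static boolean fileNamesAreUtf8() {
        String jnu = System.getProperty("sun.jnu.encoding");
        return jnu != null
                && Charset.isSupported(jnu)
                && StandardCharsets.UTF_8.equals(Charset.forName(jnu));
    }

    // Renders a string as code points, e.g. "U+00E4 U+0062", so mangled names are easy to spot.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        if (!fileNamesAreUtf8()) {
            System.err.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding")
                    + "; non-ASCII file names may be decoded incorrectly.");
        }
        File[] files = new File(args.length > 0 ? args[0] : ".").listFiles();
        if (files == null) {
            return; // not a directory, or it could not be read
        }
        for (File f : files) {
            System.out.println(f.getName() + "  " + codePoints(f.getName()));
        }
    }
}
```

Run it once with the flag and once without. If the output switches from U+FFFD (or '?') to U+00E4 for the same file, the file-name encoding was the culprit.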

Thanks for the interest and help.

Myrkjartan
  • How did you set the VM argument in Tomcat? – Marci-man Oct 30 '18 at 16:40
  • I set the argument in Eclipse, following this manual: https://www.cse.wustl.edu/~cosgroved/courses/cse231/f16/javaagent/ – Myrkjartan Nov 20 '18 at 12:57
  • Hey, I wanted to know how to do it in Tomcat, not in Eclipse. – Marci-man Nov 20 '18 at 13:12
  • Sorry. The way you wrote the question suggested that you thought I had set the argument in Tomcat ("how did you....") and wanted to know how that is done. I have never used Tomcat, but the first page that comes up when you google the question may help: https://community.atlassian.com/t5/Bamboo-questions/How-to-add-vm-args-to-tomcat-deployment/qaq-p/442109 – Myrkjartan Nov 21 '18 at 14:21
  • Thanks for your condescending reply. I have googled this for weeks. – Marci-man Nov 22 '18 at 08:59
  • You are right. I shouldn't make assumptions. Since I found that site right away, I assumed you hadn't looked. Did that link answer your question? – Myrkjartan Nov 22 '18 at 10:07