My program is supposed to read an entire file. It works, but some weird characters appear at the start when I print the contents to the console:

    try {
        String name = null;
        JFileChooser fc = new JFileChooser();
        int approve = fc.showOpenDialog(null);
        if (approve == JFileChooser.APPROVE_OPTION) {
            name = fc.getSelectedFile().getAbsolutePath();
        }
        File file = new File(name);
        FileReader fr = new FileReader(file);
        BufferedReader br = new BufferedReader(fr);
        StringBuilder sb = new StringBuilder();
        String data;
        // Join all lines of the file with a single space
        while ((data = br.readLine()) != null) {
            sb.append(data).append(" ");
        }
        br.close();
        String readFile = sb.toString();
        System.out.println(readFile);
    } catch (Exception e) {
        JOptionPane.showMessageDialog(null, "Error occurred", "Error", JOptionPane.ERROR_MESSAGE);
    }

The console output looks like this:

ï»¿test 01.01.2018 tets test 12.03.2019

The file I selected (an HTML file) does not contain the characters ï»¿, so where do they come from?

Thomas Fritsch

1 Answer

Your file starts with a UTF-8 BOM (Byte Order Mark).

As the Wikipedia page on the byte order mark shows, the UTF-8 BOM (the bytes EF BB BF) looks exactly like what you described, ï»¿, when the file is read with a non-UTF-8 encoding such as Windows-1252 or ISO-8859-1.
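To see the effect in isolation, here is a minimal sketch that decodes the three BOM bytes with two different charsets. The class and method names (`BomDemo`, `decodeAsLatin1`, `decodeAsUtf8`) are made up for illustration, and ISO-8859-1 stands in for whatever non-UTF-8 default your platform uses.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {

    // The three bytes that make up the UTF-8 BOM.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Decodes each byte as one ISO-8859-1 character.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Decodes the bytes as UTF-8.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three separate characters: U+00EF, U+00BB, U+00BF
        System.out.println(decodeAsLatin1(UTF8_BOM).length());
        // A single character: U+FEFF
        System.out.println(decodeAsUtf8(UTF8_BOM).length());
    }
}
```

With a single-byte charset every BOM byte becomes its own visible character, which is where the three stray characters come from.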

Either change the code to read the file as UTF-8, or save the file in an encoding other than UTF-8.
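For the first option, one possible approach is to replace `FileReader`, which falls back to the platform default charset when none is given, with an `InputStreamReader` that names the charset explicitly. The sketch below assumes the file really is UTF-8; `Utf8ReadSketch` and `readUtf8` are hypothetical names, and the path is taken from the command line rather than from a `JFileChooser` to keep the example short.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {

    // Reads every line from the stream as UTF-8 and joins the lines
    // with a trailing space, mirroring the loop in the question.
    static String readUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append(" ");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path chosen by the user.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readUtf8(in));
        }
    }
}
```

Reading as UTF-8 turns the three stray characters into a single U+FEFF character at the start of the text; it does not remove it.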

If you stick with UTF-8, note that Java does not strip a UTF-8 BOM automatically: the decoder passes it through as the character U+FEFF at the start of the text, so you have to check for it and remove it yourself. Better yet, change the program that created the file so that it does not write a BOM. Some text editors write one by default, but you can usually configure them not to.
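Removing the BOM by hand can be as simple as checking the first character after decoding. This is a sketch under the assumption that the text was already read as UTF-8; `BomStripper` and `stripBom` are illustrative names, not part of any library.

```java
public class BomStripper {

    // After UTF-8 decoding, the BOM is the single character U+FEFF.
    static final char BOM = '\uFEFF';

    // Returns the text without a leading BOM; other input is returned unchanged.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String decoded = BOM + "test 01.01.2018";
        System.out.println(stripBom(decoded));
    }
}
```

Only the first character is checked, since a BOM is meaningful only at the very start of the file.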

Some text editors can also remove the BOM or change the encoding; Notepad++, for example, can do both.

Andreas