
I am using the Marathi WordNet, which includes text documents containing Marathi words.

I want to read these Marathi documents in my Java code. I tried using BufferedReader and FileReader, but it did not work. This is the code I tried:

FileReader fr = new FileReader("onto_txt");
BufferedReader br = new BufferedReader(fr);
String line = br.readLine();
while (line != null) {
    System.out.println(line);
    line = br.readLine();
}
fr.close();
br.close();
Joe Taras

1 Answer


FileReader is an old utility class that uses the platform's default encoding.

Assuming the file is encoded in UTF-8, it is better to specify the encoding explicitly.

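import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Inside a method that declares "throws IOException":
try (BufferedReader br = new BufferedReader(new InputStreamReader(
        new FileInputStream("C:/xyz/onto_txt"), StandardCharsets.UTF_8))) {

    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        // Dump the UTF-8 bytes to show the line really holds non-ASCII text
        System.out.println(Arrays.toString(line.getBytes(StandardCharsets.UTF_8)));

        line = br.readLine();
    }
} // Closing br also closes the InputStreamReader and the FileInputStream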
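For comparison, here is a minimal sketch of shorter ways to get the same UTF-8 decoding, assuming Java 11 or later for the first option (FileReader gained constructors taking a Charset in Java 11; the Files methods exist since Java 7). The class and method names are made up for illustration, and the path is the one used in the answer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadAlternatives {

    // Option 1 (Java 11+): FileReader itself accepts a Charset.
    static List<String> withFileReader(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new FileReader(path.toFile(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Option 2 (Java 7+): NIO reader, still reading line by line.
    static List<String> withNioReader(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Option 3 (Java 7+): read the whole file at once; fine for files that fit in memory.
    static List<String> withReadAllLines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("C:/xyz/onto_txt"); // path from the answer above
        for (String line : withReadAllLines(path)) {
            System.out.println(line);
        }
    }
}
```

One practical difference: the Files methods throw an exception (MalformedInputException) when the bytes are not valid UTF-8, whereas InputStreamReader and FileReader silently substitute U+FFFD. That makes a wrong encoding assumption easier to notice.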

Printing with System.out converts the line back to the platform encoding, which may not be able to represent Devanagari (the script Marathi is written in); hence the dump of every single byte. It is not very readable, but it shows that where a ? is displayed on the preceding line, the line really contains Unicode characters (each Devanagari character becomes three bytes, the first being -32, i.e. 0xE0).
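A more readable diagnostic than raw bytes is to print each code point as U+XXXX. The sketch below does that and, assuming Java 10 or later (for the PrintStream constructor that takes a Charset), also wraps System.out so output is encoded as UTF-8. The class name and sample word are illustrative only; whether the glyphs actually appear still depends on the console's own encoding and font.

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class CodePointDump {

    // Formats every code point of s as U+XXXX, separated by spaces.
    static String toCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Encode output as UTF-8 instead of the platform default (Java 10+ constructor).
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        // The word "Marathi" written in Devanagari, given as escapes to avoid source-encoding issues.
        String line = "\u092E\u0930\u093E\u0920\u0940";
        out.println(line);
        out.println(toCodePoints(line));
    }
}
```

Code points can be looked up directly in the Unicode Devanagari block (U+0900 to U+097F), which makes it easy to confirm the file was decoded correctly even when the console shows ?.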

Internally, a Java String holds Unicode text (as UTF-16), so it can contain any text. You can therefore process line as desired inside the while loop.
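As one illustration of processing line, here is a small sketch that splits it on whitespace and keeps only tokens containing Devanagari letters. It assumes the words are whitespace-separated; the actual layout of the Marathi WordNet files (field separators, IDs, and so on) is not described here, so the regex would need adapting. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MarathiTokens {

    // True if the token contains at least one code point from the Devanagari script.
    static boolean hasDevanagari(String token) {
        return token.codePoints().anyMatch(cp ->
                Character.UnicodeScript.of(cp) == Character.UnicodeScript.DEVANAGARI);
    }

    // Splits a line on runs of whitespace and keeps only tokens with Devanagari text.
    // Assumption: words are whitespace-separated; adjust the regex to the real file format.
    static List<String> devanagariTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty() && hasDevanagari(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed Latin, digits and two Marathi words, written as escapes.
        String line = "abc \u092E\u0930\u093E\u0920\u0940 123 \u0928\u092E\u0938\u094D\u0915\u093E\u0930";
        System.out.println(devanagariTokens(line).size() + " Marathi tokens");
    }
}
```

Checking the Unicode script rather than a hard-coded character range keeps the test correct for vowel signs and the virama, which are separate code points inside a single Marathi word.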

Joop Eggen