I'm trying to use the Scanner class to parse text files, but it turns out that if a file contains Russian words, the scanner can't read it at all: scanner.hasNextLine() returns false on its very first call. Is this normal behavior for the Scanner class? Can I do something to fix the problem?
Use one of the overloaded constructors that accept a charset name, and provide an appropriate charset that supports Cyrillic characters. – Sotirios Delimanolis Feb 03 '14 at 16:18
@SotiriosDelimanolis Do you want to make that an answer? – Duncan Jones Feb 03 '14 at 16:20
@Duncan Nah, take it away. I haven't verified anything. – Sotirios Delimanolis Feb 03 '14 at 16:22
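A minimal sketch of what the first comment suggests, using the Scanner(File, String charsetName) constructor. It assumes the file is encoded in either UTF-8 or windows-1251 (a common Cyrillic encoding on Windows); the class name CharsetScannerDemo and the helper readLines are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CharsetScannerDemo {

    // Reads every line of the file, decoding its bytes with the named charset
    // (assumed here to be something like "UTF-8" or "windows-1251").
    static List<String> readLines(File file, String charsetName) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // Scanner(File, String) decodes with the given charset instead of the platform default
        try (Scanner sc = new Scanner(file, charsetName)) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Hypothetical usage: java CharsetScannerDemo russian.txt windows-1251
        for (String line : readLines(new File(args[0]), args[1])) {
            System.out.println(line);
        }
    }
}
```

If the charset name does not match how the file was actually saved, the loop may print nothing at all or print garbled text, so it is worth trying the encoding the file was really written in.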
1 Answer
To read text in a different encoding, use the Scanner constructor that takes an additional charset-name parameter. For example, if the file containing Russian text is encoded in UTF-8, try something like this:
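import java.io.FileInputStream;
import java.util.Scanner;

String path = ... // full path of the file
// FileInputStream throws FileNotFoundException, so the enclosing method must catch or declare it
try (Scanner sc = new Scanner(new FileInputStream(path), "UTF-8")) {
    // read the file line by line
    while (sc.hasNextLine()) {
        // read one line
        String s = sc.nextLine();
        // do something with the line
        System.out.println(s);
    }
}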
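To confirm that an encoding mismatch is what makes hasNextLine() return false on the first call, a sketch like the following may help. It relies on the fact that Scanner does not throw when its underlying reader fails; it records the exception, and ioException() returns it. The class ScannerEncodingDiagnosis and the method firstLineError are hypothetical, and the generated test file (Russian text saved as windows-1251 bytes) is an assumed example of a non-UTF-8 file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerEncodingDiagnosis {

    // Returns the IOException the Scanner hit while looking for the first line,
    // or null if it read without errors.
    static IOException firstLineError(File file, String charsetName) throws FileNotFoundException {
        try (Scanner sc = new Scanner(file, charsetName)) {
            boolean hasLine = sc.hasNextLine();
            System.out.println(charsetName + ": hasNextLine() = " + hasLine);
            // Scanner swallows read errors instead of throwing; ioException() exposes the last one
            return sc.ioException();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file: the Russian words "Privet, mir" stored as windows-1251 bytes
        File file = File.createTempFile("russian", ".txt");
        file.deleteOnExit();
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442, \u043c\u0438\u0440";
        Files.write(file.toPath(), text.getBytes(Charset.forName("windows-1251")));

        // Mismatched charset: the Scanner stops silently; print the hidden cause
        System.out.println("UTF-8 -> " + firstLineError(file, "UTF-8"));
        // Matching charset: no hidden error is expected
        System.out.println("windows-1251 -> " + firstLineError(file, "windows-1251"));
    }
}
```

With the mismatched charset the first call should report a java.nio.charset.MalformedInputException; with the matching charset the line should be read normally.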

Lenin