I'm trying to read the HTML source of a web page from Java, but when the page contains characters like ã, á, à, õ, I get � instead.

I've tried applying java.nio.charset.Charset.encode to each line after reading it, but the result is the same.

My code is:

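import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URLConnection;

URLConnection connection = ...;
// BufferedReader needs a Reader, so the stream is wrapped in an InputStreamReader.
// No charset is given here, so the bytes are decoded with the JVM's default charset.
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String s = null;

while ((s = reader.readLine()) != null) {
  // got new source line...
}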

The site I'm trying to read is this one (PT-BR).

durron597

1 Answer

According to the meta tag, the charset on that page is ISO-8859-1. Try using:

Scanner scanner = new Scanner(connection.getInputStream(), "ISO-8859-1");
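
If you would rather keep the BufferedReader loop from the question, the same idea should work by passing the charset to an InputStreamReader. The following is only a sketch: the class name CharsetReadDemo and the helper readLines are made up for illustration, and a ByteArrayInputStream stands in for connection.getInputStream() so the example runs without a network connection. It decodes the same ISO-8859-1 bytes twice, once with the right charset and once as UTF-8, to show where the � comes from.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question or answer.
class CharsetReadDemo {

    // Reads all lines from the stream, decoding the bytes with the given charset.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "ação" written with Unicode escapes so the source file encoding does not matter.
        String original = "a\u00e7\u00e3o";

        // Simulate a server that sends the page as ISO-8859-1 bytes.
        byte[] pageBytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the page actually uses restores the text.
        String good = readLines(new ByteArrayInputStream(pageBytes),
                                StandardCharsets.ISO_8859_1).get(0);

        // Decoding the same bytes as UTF-8 hits malformed sequences,
        // which the reader replaces with U+FFFD (the � character).
        String bad = readLines(new ByteArrayInputStream(pageBytes),
                               StandardCharsets.UTF_8).get(0);

        System.out.println("ISO-8859-1 round trip matches: " + good.equals(original));
        System.out.println("UTF-8 decode contains U+FFFD: " + bad.contains("\uFFFD"));
    }
}
```

With a real connection you would pass connection.getInputStream() as the first argument. Note that this happens while decoding, which is why calling Charset.encode on the lines afterwards cannot help: by then the original bytes have already been replaced.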
Aurand