
I want to fetch content from a website that uses Polish letters (e.g. ś, ć, ę) by opening a connection with HttpURLConnection. I set the InputStreamReader charset to UTF-8, but that didn't help.

This is the class responsible for the connection:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MyConnection {

    URL url;
    HttpURLConnection conn;

    public MyConnection() {
    }

    public void setConnection(URL url) {
        this.url = url;
    }

    public void connect() throws IOException {
        conn = (HttpURLConnection) url.openConnection();
    }

    public String getContent() throws IOException {
        String data = "";
        String tmp;
        BufferedReader rd = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"));
        while ((tmp = rd.readLine()) != null) {
            data += tmp + "\n";
        }
        rd.close();
        return data;
    }
}

On the website it looks like this:

270 Słabowski
270 Skubiszyński
270 Orzyłowski
270 Mołdrzyk
270 Łagodzki
270 Lęcznar

but my application reads it like this:

Skubiszy�ski
Orzy�owski
Mo�drzyk
�agodzki
L�cznar

where � is a square (but it's not displayed here).

conn.getContentEncoding() returns null

When I save the output to a file, it looks the same as in the console.

Could you tell me how I can change my code so that it works properly?

Sylwek
  • If you call `conn.getContentEncoding()` what is the return value? – Dev Oct 08 '13 at 19:43
  • _but my applications reads it like that_ Are you simply printing `data`? – Sotirios Delimanolis Oct 08 '13 at 19:47
  • Are you displaying the strings to the console in Windows? Try redirecting to a file and open in another editor. – Luis Oct 08 '13 at 19:47
  • `conn.getContentEncoding()` returns `null`. Yes, I'm simply printing `data` to the console. I will try saving it to a file. EDIT: in the file it looks the same as in the console. – Sylwek Oct 08 '13 at 19:51
  • do you know that the data you are consuming is encoded in UTF-8? also, are you specifying the charset when saving to the file (and opening it in an editor which can handle the charset)? – jtahlborn Oct 08 '13 at 20:12
  • That was encoded in `iso-8859-2`. I didn't check that. Thank you for asking about type of encoding in website. You solved my problem – Sylwek Oct 08 '13 at 20:14
  • @SylwekDerkacz I'm sorry, I asked you for the wrong header, `Content-Type` is what holds (optionally) the charset encoding, not Content-Encoding. Glad you solved your problem. – Dev Oct 08 '13 at 20:22
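
To make the resolution from the comments concrete, here is a minimal sketch that reads the charset from the `Content-Type` header instead of hard-coding UTF-8. It assumes the server actually declares the charset in that header (a page may declare it only in an HTML `<meta>` tag, which this sketch does not inspect), so it falls back to ISO-8859-2, the encoding reported for this particular site. The class name `CharsetAwareReader` and its helper methods are hypothetical names chosen for illustration; they are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper class, not part of the original MyConnection.
class CharsetAwareReader {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=iso-8859-2". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream line by line, decoding bytes with the given charset.
    public static String read(InputStream in, Charset charset) throws IOException {
        StringBuilder data = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = rd.readLine()) != null) {
                data.append(line).append('\n');
            }
        }
        return data.toString();
    }

    // Assumption: ISO-8859-2 is used as the fallback only because that is
    // what this particular site turned out to use.
    public static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        Charset charset = charsetFrom(conn.getContentType(), Charset.forName("ISO-8859-2"));
        return read(conn.getInputStream(), charset);
    }
}
```

This also explains the symptom in the question: in ISO-8859-2 each Polish letter is a single byte above 0x7F, and such a byte on its own is not a valid UTF-8 sequence, so the UTF-8 decoder substitutes the replacement character � for it. When printing the result, the console or file writer needs a charset that can represent these letters as well.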

0 Answers