
I have a String representing a URL, and I need to get its HTML source code. The problem is that I can't find a way to read it in the correct encoding: letters like à è ì ò ù are not decoded properly and come through as "??".

What's the best way? I came across lots of solutions, but none of them seems to work.

Here's my code:

private String getHtml(String url, String idSession) throws IOException 
{
    URL urlToCall   = null;
    String html     = "";

    try 
    {
        urlToCall = new URL(url); 
    } 
    catch (Exception e) 
    {
        e.printStackTrace();
        return "";
    }

    HttpURLConnection conn = (HttpURLConnection) urlToCall.openConnection();
    conn.setRequestProperty("cookie", "JSESSIONID=" + idSession);
    conn.setDoOutput(false);
    conn.setReadTimeout(200 * 1000);
    conn.setConnectTimeout(200 * 1000);

    ByteArrayOutputStream output = new ByteArrayOutputStream();
    InputStream openStream = conn.getInputStream();
    byte[] buffer = new byte[1024];
    int size = 0;
    while ((size = openStream.read(buffer)) != -1) {
        output.write(buffer, 0, size);
    }

    html = output.toString("utf-8");
    return html;
}
durron597
bs_

1 Answer

Try jsoup, which decodes the stream with the charset you pass in:

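import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

String url = "http://www.hamzaalayed.com/";
// Decode the response as UTF-8; the URL doubles as the base URI for relative links.
Document document = Jsoup.parse(new URL(url).openStream(), "utf-8", url);
Element paragraph = document.select("p").first(); // null if the page has no <p>

// Print the direct text nodes of the first <p> element.
for (Node node : paragraph.childNodes()) {
    if (node instanceof TextNode) {
        System.out.println(((TextNode) node).text().trim());
    }
}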
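
If adding a dependency is not an option, one possible approach with plain HttpURLConnection is to stop hard-coding "utf-8" and decode with the charset the server announces in its Content-Type header. The following is only a minimal sketch under that assumption: the class name HtmlFetcher and the helper charsetFromContentType are hypothetical names introduced for illustration, and UTF-8 is an assumed fallback for when the header carries no usable charset. It does not look at a charset declared inside the HTML itself (a meta tag).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class HtmlFetcher {

    // Hypothetical helper: pulls the charset parameter out of a Content-Type
    // value such as "text/html; charset=ISO-8859-1". Returns the fallback when
    // the header is missing, has no charset, or names an unknown charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // malformed or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Same idea as the question's getHtml, but the bytes are decoded with the
    // charset announced by the server (assumed fallback: UTF-8).
    static String getHtml(String url, String idSession) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Cookie", "JSESSIONID=" + idSession);
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try (InputStream in = conn.getInputStream()) {
            byte[] buffer = new byte[1024];
            int size;
            while ((size = in.read(buffer)) != -1) {
                output.write(buffer, 0, size);
            }
            Charset charset =
                    charsetFromContentType(conn.getContentType(), StandardCharsets.UTF_8);
            return new String(output.toByteArray(), charset);
        } finally {
            conn.disconnect();
        }
    }
}
```

If the returned string looks right in a debugger but still prints as "??", the cause may be the console's output encoding rather than the download, so it is worth checking both.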
Hamza Alayed