1

I'm currently using HttpURLConnection to load a remote web page and present it to my clients (by copying the connection's InputStream to the HTTP response's OutputStream). The HTML loads correctly, but the images are missing. How can I fix this?

Thanks

Bozho
Fagoter

2 Answers

3

You need to rewrite the HTML so that all resource URLs on the intranet domain are proxied as well. For example, all of the following resource references in the HTML

<base href="http://intranet.com/" />
<script src="http://intranet.com/script.js"></script>
<link href="http://intranet.com/style.css" />
<img src="http://intranet.com/image.png" />
<a href="http://intranet.com/page.html">link</a>

should be rewritten so that they go through your proxy servlet instead, e.g.

<base href="http://example.com/proxy/" />
<script src="http://example.com/proxy/script.js"></script>
<link href="http://example.com/proxy/style.css" />
<img src="http://example.com/proxy/image.png" />
<a href="http://example.com/proxy/page.html">link</a>

An HTML parser like Jsoup is extremely helpful for this. You can do the following in your proxy servlet, which I assume is mapped to a URL pattern of /proxy/*.

String intranetURL = "http://intranet.com";
String internetURL = "http://example.com/proxy";

if (request.getRequestURI().endsWith(".html")) { // A HTML page is requested.
    Document document = Jsoup.connect(intranetURL + request.getPathInfo()).get();

    // String#replace() matches literally; replaceFirst() would treat the dots in the URL as regex wildcards.
    for (Element element : document.select("[href]")) {
        element.attr("href", element.absUrl("href").replace(intranetURL, internetURL));
    }

    for (Element element : document.select("[src]")) {
        element.attr("src", element.absUrl("src").replace(intranetURL, internetURL));
    }

    response.setContentType("text/html;charset=UTF-8");
    response.setCharacterEncoding("UTF-8");
    response.getWriter().write(document.html());
}
else { // Other resources like images, etc.
    URLConnection connection = new URL(intranetURL + request.getPathInfo()).openConnection();

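    for (Map.Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        if (header.getKey() == null) {
            continue; // HttpURLConnection exposes the status line under a null key; it is not a real header.
        }
        for (String value : header.getValue()) {
            response.addHeader(header.getKey(), value);
        }
    }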

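    try (InputStream input = connection.getInputStream()) {
        OutputStream output = response.getOutputStream();
        // Now just copy input to output.
        byte[] buffer = new byte[8192];
        int length;
        while ((length = input.read(buffer)) != -1) {
            output.write(buffer, 0, length);
        }
    }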
}
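
The URL rewriting step in the servlet above boils down to swapping one prefix for another. As a rough sketch (the class name ProxyUrlRewriter and the helper toProxyUrl are made up for illustration and are not part of Jsoup or the servlet API), it could be isolated in a small method that leaves URLs on other hosts untouched and does not involve regular expressions at all:

```java
class ProxyUrlRewriter {

    // Hypothetical helper: swaps the intranet prefix for the proxy prefix.
    // URLs that do not start with the intranet prefix are returned unchanged.
    static String toProxyUrl(String absoluteUrl, String intranetURL, String internetURL) {
        if (!absoluteUrl.startsWith(intranetURL)) {
            return absoluteUrl; // External resource or empty URL.
        }
        String rest = absoluteUrl.substring(intranetURL.length());
        // Only rewrite when the prefix ends at a URL boundary, so that
        // "http://intranet.com.other.org" is not mistaken for the intranet host.
        if (!rest.isEmpty() && "/?#".indexOf(rest.charAt(0)) < 0) {
            return absoluteUrl;
        }
        return internetURL + rest;
    }

    public static void main(String[] args) {
        String intranetURL = "http://intranet.com";
        String internetURL = "http://example.com/proxy";

        // prints "http://example.com/proxy/image.png"
        System.out.println(toProxyUrl("http://intranet.com/image.png", intranetURL, internetURL));
        // prints "http://cdn.example.org/lib.js" (left as is)
        System.out.println(toProxyUrl("http://cdn.example.org/lib.js", intranetURL, internetURL));
    }
}
```

In the loops above it would be used as element.attr("src", toProxyUrl(element.absUrl("src"), intranetURL, internetURL)), and likewise for href.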
BalusC
  • Yeah, this really makes sense... it's weird that there are no tools to do this out of the box – Fagoter Jun 08 '11 at 20:15
1

You have to make a separate request for each image. That's what browsers do as well.

Bozho
  • Bozho, that won't help him (w/ the images I mean)... since the image sources have to be redirected as well. – bestsss Jun 08 '11 at 15:11
  • I think HtmlUnit can extract the image urls from a web page and let you make new requests for each of them. – Bozho Jun 08 '11 at 15:36
  • That's clear, but as far as I understand the intent, the OP wants the clients to be able to request them. In order to do so, they have to parse the html and make specialized requests for the said images. It's not an easy task. – bestsss Jun 08 '11 at 16:21
  • I'm not sure how HtmlUnit is useful if those requests have to come from the client. – BalusC Jun 08 '11 at 18:26
  • Sorry, I didn't get that he is loading the page on behalf of the clients. – Bozho Jun 08 '11 at 21:35