
I have the following code:

public BufferedImage urlToImage(String imageUrl) throws MalformedURLException, IOException {
    URL url = new URL(imageUrl);
    BufferedImage image = ImageIO.read(url);
    return image;
}

That is supposed to return an image from a given URL.

I tested with these two randomly chosen URLs:

The first one works fine, but the second gives a 403 error:

Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.earthtimes.org/newsimage/osteoderms-storing-minerals-helped-huge-dinosaurs-survive_3011.jpg
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at javax.imageio.ImageIO.read(ImageIO.java:1367)

What could be the cause of the error? Thanks.

Andrew Thompson
Majid Laissi

1 Answer


The ImageIO.read(URL) method opens a URL connection with pretty much all default settings, including the User-Agent header (which defaults to Java/ followed by the version of the JVM you are running on, e.g. Java/1.7.0_17). Apparently, the site you listed expects a more 'standard' UA. Testing with a straight telnet connection:

Request sent by ImageIO.read(url):

GET /newsimage/osteoderms-storing-minerals-helped-huge-dinosaurs-survive_3011.jpg HTTP/1.1
User-Agent: Java/1.7.0_17
Host: www.earthtimes.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

Response code is 404 (for me at least), with a default text/html page being returned.

Request sent by 'standard' browser:

GET /newsimage/osteoderms-storing-minerals-helped-huge-dinosaurs-survive_3011.jpg HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
Host: www.earthtimes.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

Response code is 200, with the image data.
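
If you would rather reproduce this comparison from Java than from telnet, something like the sketch below should do it. The class name UserAgentProbe and the method statusFor are made up for illustration, and the shortened browser string in main is just a placeholder; the only assumption is that the server decides based on the User-Agent header alone.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper (not part of any library): compares status codes per User-Agent.
public class UserAgentProbe {

    // Sends a GET and returns the HTTP status code.
    // When userAgent is null, the JVM default (Java/<version>) is sent.
    static int statusFor(String urlStr, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlStr).openConnection();
        if (userAgent != null) {
            connection.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() returns 4xx/5xx codes instead of throwing,
            // unlike getInputStream().
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java UserAgentProbe <url>");
            return;
        }
        System.out.println("default UA: " + statusFor(args[0], null));
        // Placeholder browser-like value; any string the server accepts will do.
        System.out.println("browser UA: " + statusFor(args[0], "Mozilla/5.0 (X11; Linux x86_64)"));
    }
}
```

If the two printed codes differ for the same URL, the server is most likely filtering on the User-Agent.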

The following simple fix makes your code a little longer, but works around the problem by setting a more 'standard' UA:

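import java.awt.image.BufferedImage;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.imageio.ImageIO;

final String urlStr = "http://www.earthtimes.org/newsimage/osteoderms-storing-minerals-helped-huge-dinosaurs-survive_3011.jpg";
final URL url = new URL(urlStr);
final HttpURLConnection connection = (HttpURLConnection) url.openConnection();
// Replace the default Java/<version> User-Agent with a browser-like one.
connection.setRequestProperty(
    "User-Agent",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31");
// Read from the configured connection instead of passing the URL to ImageIO.
final BufferedImage image = ImageIO.read(connection.getInputStream());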
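
Folded back into the urlToImage method from the question, the same idea might look like the sketch below. The class name ImageFetcher, the 10-second timeouts, and the explicit status and null checks are my own additions rather than anything the fix strictly needs; treat them as assumptions to adjust.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.imageio.ImageIO;

// Hypothetical wrapper class; only urlToImage comes from the question.
public class ImageFetcher {

    // Browser-like User-Agent copied from the snippet above.
    static final String USER_AGENT =
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.31 "
        + "(KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31";

    public static BufferedImage urlToImage(String imageUrl) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(imageUrl).openConnection();
        connection.setRequestProperty("User-Agent", USER_AGENT);
        connection.setConnectTimeout(10_000); // assumed value, in milliseconds
        connection.setReadTimeout(10_000);    // assumed value, in milliseconds
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Server returned HTTP " + status + " for " + imageUrl);
            }
            // ImageIO.read(InputStream) does not close the stream, so close it here.
            try (InputStream in = connection.getInputStream()) {
                BufferedImage image = ImageIO.read(in);
                if (image == null) {
                    // ImageIO.read returns null when no registered reader can decode the data.
                    throw new IOException("Response is not a decodable image: " + imageUrl);
                }
                return image;
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

Checking the status code first gives a clearer error message than the exception thrown from getInputStream(), and the null check catches the case where the server answers 200 with an HTML page instead of an image.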
Perception
  • Thank you very much. A side question: does it mean the server refuses non-standard UAs to prevent its content from being used outside a browser, or is this simply the server's default behavior? – Majid Laissi Apr 24 '13 at 11:44
  • Actually, most server implementations do not blacklist UAs by default. More than likely the administrators of this particular website went out of their way to ban Java agents. – Perception Apr 24 '13 at 11:52
  • I'm concerned about copyright problems. So I take it that if the admin blacklists Java agents, it might mean their images cannot be used outside of a browser. (In my app the user can provide a URL as a profile image, and I store said image so that the profile image is still available if the URL stops working.) I'm not sure whether I'll have to prevent the user from using the image or just warn them about possible copyright infringement. – Majid Laissi Apr 24 '13 at 21:11
  • I am pretty sure that copyright law applies irrespective of whether a website accepts one particular UA over another. But IANAL, so I definitely cannot give advice in that regard. – Perception Apr 24 '13 at 21:17