
From one day to the next I started getting a java.io.FileNotFoundException when opening an InputStream from a URL object.

This happens ONLY with some URLs of the site I'm scanning with JSoup. The URL is correct: when I paste it into the browser, I get the expected page.

This is frustrating, since I have not made any changes to the code. Did the site detect my page scanning and block my requests?

The code:

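    public Document getDocument( String source ) {
        Document doc = null;
        try {
            URL newUrl = new URL( source );
            // openStream() throws FileNotFoundException when the server answers 404 or 410.
            // try-with-resources closes the stream once parsing is done.
            try ( InputStream is = newUrl.openStream() ) {
                // The unused Jsoup.connect( source ) / conn.timeout( 0 ) calls were dropped:
                // the document is read from the stream, so they had no effect.
                doc = Jsoup.parse( is, "CP1252", source );
            }
        }
        catch ( IOException ioe ) {
            ioe.printStackTrace();
        }
        return doc;
    }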

The error:

    java.io.FileNotFoundException: http://www.ratebeer.com/breweries/antigua-barbuda/0/9/
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at java.net.URL.openStream(Unknown Source)
        at beerparser.web.RateBeerParser.getDocument(RateBeerParser.java:389)

Line 389 is the InputStream object assignment.

nobody
  • When I try your URL, it returns nothing. Are you sure your browser isn't showing a cached copy (try CTRL + F5 on the page to be sure) of a page that no longer works? – Julien May 19 '14 at 15:23
  • I've found that this page seems to work: http://www.ratebeer.com/breweries/british-columbia/53/39/. Can you try this URL to confirm that your code still works? – Julien May 19 '14 at 15:26
  • Urm. It seems that links with "/0/" in the URL aren't working. Yup, it was the cache. At the moment every URL with "/0/" returns a blank page. Bummer. Thanks Julien. I'm changing the query string to work around the issue for now. – PatrickBateman1981 May 19 '14 at 16:03

2 Answers


You could try:

    // url and charset are Strings you supply, e.g. the page address and "CP1252"
    URLConnection connection = new URL(url).openConnection();
    connection.setRequestProperty("Accept-Charset", charset);
    InputStream response = connection.getInputStream();

Danix

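When the server answers with an error response code, keep in mind that you cannot read the returned body with getInputStream(). For 404 and 410 responses it throws the FileNotFoundException shown in the question.

In that case you have to read the returned body with getErrorStream() on the HttpURLConnection.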
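
As a rough illustration of the point above, here is a minimal sketch that checks the status code first and then picks the stream to read. The class name ErrorBodyReader, the Result holder and the fetch method are made up for this example, and decoding the body as UTF-8 is an assumption (the question's code uses CP1252, so adjust as needed).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original code.
public class ErrorBodyReader {

    /** Status code plus whatever body the server sent back. */
    public static final class Result {
        public final int status;
        public final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    public static Result fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        try {
            // getResponseCode() performs the request without throwing on 4xx/5xx.
            int status = conn.getResponseCode();

            // For error statuses the body is only available through getErrorStream();
            // getInputStream() would throw (FileNotFoundException for 404 and 410).
            InputStream raw = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (raw == null) {
                // getErrorStream() returns null when the server sent no error body.
                return new Result(status, "");
            }

            try (InputStream in = raw) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: the body is UTF-8 text.
                return new Result(status, new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With something like this, getDocument could log the status returned for the failing "/0/" URLs and skip them instead of catching the exception.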

Teixi