I'm validating links in Java by requesting them and checking the response codes. From my code I get error response codes (403 or 404), but in the browser I get a 200 status code when I inspect the network activity. Here's my code that gets the response code. [I do basic validation on the URLs beforehand, such as lowercasing them.]

static int getResponseCode(String link) throws IOException {
    URL url = new URL(link);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    return http.getResponseCode();
}

For a link like http://science.sciencemag.org/content/220/4599/868, I get a 403 status when I run this code, but in the browser (Chrome) I get a 200 status. The curl command below also returns a 200 status code.

curl -Is http://science.sciencemag.org/content/220/4599/868
  • That website probably doesn't like bots, and checks whether incoming requests are made by known browsers. That may sound stupid, but it's a rather common attitude. – kumesana Jan 08 '19 at 13:45
  • Do check all request headers and response headers in your browser, then see what's different in your code. Also make sure to read the body of the response after you get 403 or 404; it may have additional information. – Aleks G Jan 08 '19 at 13:50
  • Your code is correct, but it seems something is wrong with the URL you are trying to test. I am afraid the HttpURLConnection class is somehow not able to get the right URL address. – cralfaro Jan 08 '19 at 15:20
  • @kumesana Is there any way to overcome that? Also, if I use curl, I get the correct status code. – yash.agarwal Jan 09 '19 at 10:08
  • Can you post the code snippet you used to make the request? – shakhawat Jan 09 '19 at 10:09

1 Answer

The only way to overcome that is to work out which request properties the site checks and send the same ones a browser would.

I did this analysis for you, and it turns out this website requires an Accept header that resembles the one sent by an actual browser. By default Java sends a valid Accept header, but not one that looks like a browser's.

You just need to change your program like so:

static int getResponseCode(String link) throws IOException {
  URL url = new URL(link);
  HttpURLConnection http = (HttpURLConnection) url.openConnection();
  http.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
  return http.getResponseCode();
}

(Or any other value that an actual browser uses)
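If you also want to see why the server refuses a request, as suggested in the comments, you can read the error body when the status is 4xx or 5xx. Below is a rough sketch of how that could look together with the Accept header fix. The class name LinkChecker, the helper names open, describe and readAll, and the 5 second timeouts are my own assumptions for illustration and not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
class LinkChecker {
    // Accept value taken from the answer above; any value a real browser sends should work.
    static final String BROWSER_ACCEPT =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Creates the connection object and sets the header. No network traffic happens here.
    static HttpURLConnection open(String link) throws IOException {
        HttpURLConnection http = (HttpURLConnection) new URL(link).openConnection();
        http.setRequestProperty("Accept", BROWSER_ACCEPT);
        http.setConnectTimeout(5000); // assumed timeout, adjust as needed
        http.setReadTimeout(5000);    // assumed timeout, adjust as needed
        return http;
    }

    // Returns the status code, followed by the error body if the server sent one.
    static String describe(String link) throws IOException {
        HttpURLConnection http = open(link);
        try {
            int code = http.getResponseCode();
            if (code < 400) {
                return String.valueOf(code);
            }
            // getErrorStream() may return null when there is no error body.
            try (InputStream err = http.getErrorStream()) {
                return code + " " + readAll(err);
            }
        } finally {
            http.disconnect();
        }
    }

    // Reads a stream fully as UTF-8 text; a null stream yields an empty string.
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LinkChecker <url>");
            return;
        }
        System.out.println(describe(args[0]));
    }
}
```

Running it with the link from the question as the argument prints the status code, plus whatever explanation the server put in the error body, which is usually the quickest hint about which header it is unhappy with.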

kumesana