1

I am trying out a simple program that reads the HTML content from a given URL. The URL I am using in this case doesn't require any cookie, username, or password, but I still get a `java.io.IOException: Server returned HTTP response code: 403` error. Can anyone tell me what I am doing wrong here? (I know there are similar questions on SO, but they didn't help.)

    import java.io.*;
    import java.net.*;

    public class urlcont {
        public static void main(String[] args) {
            try {
                URL u = new URL("http://www.amnesty.org/");
                URLConnection uc = u.openConnection();
                // Present a browser-like User-Agent instead of the default Java one
                uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
                uc.connect();
                File f = new File("C:\\Users\\kausta\\Desktop\\urlcont.txt");
                // try-with-resources closes both streams, so the file is flushed to disk
                try (InputStream in = uc.getInputStream();
                     OutputStream s = new FileOutputStream(f)) {
                    int b;
                    while ((b = in.read()) != -1) {
                        s.write(b);
                    }
                }
            } catch (MalformedURLException e) {
                System.err.println(e);
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }

SexyBeast

3 Answers

3

If you can fetch the URL in a browser but not via Java, that suggests to me that the site is blocking programmatic access to the page by filtering on the User-Agent header. Try setting the User-Agent on your connection so that your code appears, to the web server, to be a web browser.

See this thread for help on that: What is the proper way of setting headers in a URLConnection?
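As a rough illustration of the suggestion above, here is a minimal sketch that sets the header before the request is sent. The class name `UserAgentSketch`, the helper names, and the `BROWSER_UA` value are made up for this example; whether a given User-Agent string gets past a particular server's filter is an assumption that only a real request can confirm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UserAgentSketch {
    // Hypothetical browser-like value; any string the server accepts would do.
    static final String BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    /** Prepares an HTTP connection carrying the given User-Agent. Nothing is sent yet. */
    static HttpURLConnection prepare(String url, String userAgent) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            // Headers must be set before the connection is actually opened.
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends the request and copies the body to a file; fails on any status other than 200. */
    static void download(String url, Path target) throws IOException {
        HttpURLConnection conn = prepare(url, BROWSER_UA);
        try {
            int status = conn.getResponseCode(); // this call sends the request
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling `UserAgentSketch.download("http://www.amnesty.org/", Path.of("urlcont.txt"))` would then write the page to a local file, provided the server accepts the request.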

jeffmurphy
  • I tried the code and it works for me. I only changed the local file path, and I now have the full HTML in the file. – lbalazscs Jan 11 '13 at 16:43
1

There is a permission problem:

A web server may return a 403 Forbidden HTTP status code in response to a request from a client for a web page or resource, to indicate that the server refuses to allow the requested action.
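To see what the server actually says, it can help to read the status code and the error body rather than letting `getInputStream()` throw. The following is only a sketch: the class `StatusSketch` and its method names are invented for this example, and the explanatory strings are informal summaries rather than official wording.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

final class StatusSketch {
    /** Maps a few common status codes to an informal explanation. */
    static String explain(int status) {
        switch (status) {
            case HttpURLConnection.HTTP_OK:
                return "200 OK: the body can be read";
            case HttpURLConnection.HTTP_UNAUTHORIZED:
                return "401 Unauthorized: credentials are missing or were rejected";
            case HttpURLConnection.HTTP_FORBIDDEN:
                return "403 Forbidden: the server understood the request but refuses it";
            case HttpURLConnection.HTTP_NOT_FOUND:
                return "404 Not Found: nothing exists at this URL";
            default:
                return status + ": not covered by this sketch";
        }
    }

    /** Returns the body the server sent along with an error status, or "" if there is none. */
    static String errorBody(HttpURLConnection conn) throws IOException {
        try (InputStream err = conn.getErrorStream()) { // null when no error body is available
            if (err == null) {
                return "";
            }
            return new String(err.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }
}
```

With an `HttpURLConnection` named `conn`, printing `StatusSketch.explain(conn.getResponseCode())` followed by `StatusSketch.errorBody(conn)` often shows why the request was refused, because many servers include a short explanation in the error page.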

Panciz
  • Why is the server blocking me? I can open it jolly well in my browser. Do I need to send some browser headers? – SexyBeast Jan 11 '13 at 14:49
  • Nope, you are just using the wrong protocol: you are fetching the file directly, but as far as I understand you only want the HTML output. Look up `http-request`. The server doesn't want to show you the source code, but it will give the browser the result, since the browser requests only the output and not the file itself. – Vogel612 Jan 11 '13 at 14:51
0

You are not doing anything "wrong". The server you are trying to access is blocking your request because you are not allowed to access the file.

HTTP error 403 means Forbidden --> the remote server blocks the request.

Check whether you need to provide authentication to access the document you want, and if so, send it with the request ;)
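If the server does turn out to require credentials, one common scheme is HTTP Basic authentication, where the user name and password are sent Base64-encoded in an `Authorization` header. The sketch below assumes that scheme; the class `BasicAuthSketch` and the credentials are placeholders, and a server that uses a login form or tokens would need a different approach.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    /** Builds the value of an Authorization header for HTTP Basic authentication. */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

The header would be attached before connecting, for example `uc.setRequestProperty("Authorization", BasicAuthSketch.basicAuthHeader("user", "pass"))`. Note that missing credentials are usually answered with 401 rather than 403, so this only helps when the server really expects Basic authentication.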

Vogel612