
I wrote a Java program like the one I saw here: How to read the https page content using java? For some sites, however, the code does not work.

I get the error: Server returned HTTP response code: 403 for URL: https://research.investors.com/stock-quotes/nyse-sailpoint-tech-holdings-sail.htm

It works for url = "https://maven.apache.org/guides/mini/guide-repository-ssl.html";

Can someone help me?

alin

3 Answers


403 Forbidden The request contained valid data and was understood by the server, but the server is refusing action. This may be due to the user not having the necessary permissions for a resource or needing an account of some sort, or attempting a prohibited action (e.g. creating a duplicate record where only one is allowed). This code is also typically used if the request provided authentication by answering the WWW-Authenticate header field challenge, but the server did not accept that authentication. The request should not be repeated.

So the website you want to scrape has probably restricted requests like yours (i.e., requests that were not made from a browser).

As a workaround, you can try Selenium, which drives a real browser.

Apollo917

The 403 HTTP status stands for "Forbidden"; most likely investors.com inspects your request headers and denies access to the resource.

Try modifying the request headers, using a User-Agent value that the site accepts.
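A minimal sketch of that idea with `HttpURLConnection` (the User-Agent string below is just an example of a browser-like value, not one the site is guaranteed to accept):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentDemo {
    public static void main(String[] args) throws Exception {
        // openConnection() only creates the connection object; no request is sent yet
        HttpURLConnection con = (HttpURLConnection) new URL(
                "https://research.investors.com/stock-quotes/nyse-sailpoint-tech-holdings-sail.htm")
                .openConnection();
        // Present the request as coming from a browser (example value)
        con.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36");
        System.out.println(con.getRequestProperty("User-Agent"));
    }
}
```

Calling `con.getInputStream()` afterwards would actually send the request with that header attached.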

OscarRyz

OK, I solved it. I used con.setRequestProperty to set the "User-Agent", "Accept", "Content-Type", and "Accept-Language" headers.
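For reference, a sketch of that fix, assuming browser-like header values (the exact strings are illustrative, not necessarily the ones I used; reading the response body then works as in the linked question):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class Fetch403Fix {
    public static void main(String[] args) throws Exception {
        URL url = new URL(
                "https://research.investors.com/stock-quotes/nyse-sailpoint-tech-holdings-sail.htm");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        // Browser-like request headers; the values are examples
        con.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36");
        con.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        con.setRequestProperty("Accept-Language", "en-US,en;q=0.9");
        con.setRequestProperty("Content-Type", "text/html"); // rarely needed on a GET, but listed above
        // No request has been sent yet; con.getInputStream() would perform it
        // and return the page content to read line by line.
        System.out.println(con.getRequestProperty("Accept-Language"));
    }
}
```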

Thank you.

alin