0

I am trying to fetch data from a webpage using urllib2. The page is visible in the browser, but from the script I keep getting HTTPError: HTTP Error 403: Forbidden.

I also tried mimicking a browser request by changing the User-Agent string, but with no success.

Any ideas on this?

zubinmehta
  • Does the site require authentication? If yes, how are users being tracked? Does the site use cookies to track authenticated users? If yes, you need to send a cookie along with your HTTP request. – Darin Dimitrov Dec 28 '10 at 12:40
  • Can you give some more details about the website and the code you are using to access it? It may not be an issue with the User-Agent. – Senthil Kumaran Dec 28 '10 at 12:40
  • @Darin - no authentication required. Cookies, I will have to check. This is the URL of the page I am trying to fetch: http://www.nseindia.com/content/fo/fo_underlyinglist.htm – zubinmehta Dec 28 '10 at 12:43

3 Answers

2

I tried with Tamper Data and Firefox, sending only the User-Agent, and I got a 403. Try adding other headers:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

I tried this, and it should work.
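For reference, here is a minimal sketch of attaching those headers to a request. The names URL, BROWSER_HEADERS and build_request are mine, chosen for illustration; the URL is the one from the question's comments. The import falls back to urllib.request, where Python 3 keeps the same Request class. Keep-Alive and Connection are left out because, as far as I can tell, urllib2 sends its own Connection: close anyway.

```python
try:
    from urllib2 import Request  # Python 2, as used in the question
except ImportError:
    from urllib.request import Request  # Python 3 home of the same class

# URL taken from the question's comments.
URL = "http://www.nseindia.com/content/fo/fo_underlyinglist.htm"

# Browser-like headers copied from the list above (illustrative name).
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-us,en;q=0.5",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying the given headers; nothing is sent yet."""
    return Request(url, headers=headers)
```

Passing the result to urlopen, for example urlopen(build_request(URL)).read(), would then send the request with these headers. Whether the site still accepts that combination is something you would have to check yourself.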

pythonFoo
1

The site is checking your User-Agent; just set it to Internet Explorer:

request.add_header('User-Agent', 'Internet Explorer')

I confirmed that this works with wget: you get a 403 unless you set your user agent to Internet Explorer.

ismail
0

:) I am trying to get quotes from NSE too! As pythonFoo says, you need additional headers. However, Accept alone is sufficient. The User-Agent can say python (stay true!).
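Since the answers here disagree on which header the site actually checks, a quick way to find out is to try each header set and compare the status codes. This is only a sketch: probe, CANDIDATES and the opener parameter are names I made up, and the header values are the ones quoted in this thread.

```python
try:
    from urllib2 import Request, urlopen, HTTPError  # Python 2
except ImportError:
    from urllib.request import Request, urlopen  # Python 3
    from urllib.error import HTTPError

# Candidate header sets taken from the answers in this thread.
CANDIDATES = {
    "accept only": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    },
    "user-agent only": {"User-Agent": "Internet Explorer"},
}


def probe(url, candidates, opener=urlopen):
    """Return {label: HTTP status code} for each candidate header set."""
    results = {}
    for label, headers in candidates.items():
        try:
            response = opener(Request(url, headers=headers))
            results[label] = response.getcode()
            response.close()
        except HTTPError as err:
            # A 403 (or any other HTTP error) ends up here; record its code.
            results[label] = err.code
    return results
```

Calling probe("http://www.nseindia.com/content/fo/fo_underlyinglist.htm", CANDIDATES) should return a dict mapping each label to the status code it got, so a 403 next to one label and a 200 next to another tells you which header matters. Network failures other than HTTP errors (for example a DNS failure) are not caught here and will raise.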

Samvid