ruby Nokogiri requests 403 Forbidden

Question

Hi, I'm using the Nokogiri gem to scrape gem details from ruby-toolbox:

Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name"))

but I get the error: "403 Forbidden"

Can anyone tell me why I am getting this error?

Thanks in advance

score 7 · Accepted Answer · answered Jul 15 '14 at 13:39

Try changing your User-Agent:

Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'firefox'))

www.ruby-toolbox.com doesn't seem to accept 'Ruby', the default User-Agent sent by open-uri, as an agent.
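On newer Ruby versions, where `Kernel#open` no longer opens URLs, the same idea can be sketched with `URI.open`. The helper names `request_headers` and `fetch_page` and the User-Agent string below are illustrative assumptions, not part of the original answer, and there is no guarantee that a given site accepts that particular string.

```ruby
require 'open-uri'

# Assumed, illustrative User-Agent string; any browser-like value may work.
DEFAULT_AGENT = 'Mozilla/5.0 (compatible; ExampleScraper/1.0)'

# Hypothetical helper: builds the request headers handed to URI.open.
def request_headers(user_agent = DEFAULT_AGENT)
  { 'User-Agent' => user_agent }
end

# Hypothetical helper: returns the response body as a String, or nil when
# the server answers with an HTTP error such as 403 Forbidden.
def fetch_page(url, user_agent = DEFAULT_AGENT)
  URI.open(url, request_headers(user_agent), &:read)
rescue OpenURI::HTTPError => e
  warn "Request to #{url} failed: #{e.message}"
  nil
end
```

Calling `fetch_page('https://www.ruby-toolbox.com/categories/by_name')` performs a real network request; a non-nil result can then be passed to `Nokogiri::HTML`.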

answered Jul 15 '14 at 13:39

Oliver Zeyen

score 1 · Answer 2 · answered Jul 15 '14 at 14:09

As mentioned, the User-Agent has to be changed. In addition, you have to disable SSL certificate verification, because the request would otherwise raise an error as well.

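require 'nokogiri'
require 'open-uri'
require 'openssl'

url = 'https://www.ruby-toolbox.com/categories/by_name'
# URI.open replaces Kernel#open for URLs (required on Ruby 3.0+).
# VERIFY_NONE turns off certificate checks and is insecure; use it for testing only.
content = URI.open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE, 'User-Agent' => 'opera')
doc = Nokogiri::HTML(content)
doc.xpath('//div[@id="teaser"]//h2/text()').to_s
# "All Categories by name"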

answered Jul 15 '14 at 14:09

Daniël Knippers

It would be good for you to explain why disabling verification works, and why it's there in the first place, and what problems turning it off can cause. SSL without verification is crippled. – the Tin Man Jul 15 '14 at 20:12
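To make the concern in this comment concrete: certificate verification checks that the server's certificate matches the host and is signed by a trusted authority, so turning it off leaves the connection open to man-in-the-middle interception. open-uri accepts `ssl_verify_mode` and `ssl_ca_cert` options, so one alternative is to keep verification on and point it at a CA bundle. This is only a sketch: `secure_options` and the bundle path are hypothetical, and whether it resolves the error depends on why verification failed on the machine in question.

```ruby
require 'open-uri'
require 'openssl'

# Hypothetical helper: builds open-uri options that keep certificate
# verification enabled. ca_file is an optional path to a CA bundle (assumed).
def secure_options(user_agent, ca_file = nil)
  options = {
    'User-Agent' => user_agent,
    ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER
  }
  options[:ssl_ca_cert] = ca_file if ca_file
  options
end

# Usage (performs a network request):
#   URI.open(url, secure_options('firefox'), &:read)
```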

score 0 · Answer 3 · edited May 23 '17 at 12:23

This seems to be an OpenURI issue. Try this:

Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'ruby'))

edited May 23 '17 at 12:23

Community

answered Jul 15 '14 at 10:40

dax

But it still shows the error "OpenURI::HTTPError: 403 Forbidden". – Siva KB Jul 15 '14 at 10:57
Ah, how about changing https to http? – dax Jul 15 '14 at 11:46

score 0 · Answer 4 · answered May 31 '20 at 16:40

I spent about an hour trying solutions for a 403 Forbidden, including tinkering with the User-Agent argument, as in Nokogiri::HTML(open("http://www.something.com", "User-Agent" => "Safari")), looking into proxies, and other things.

But the whole time there was nothing wrong with my code. The website I had been scraping had subtly changed its URL, and the old URL my script was still visiting was forbidden.

I hope this may save someone else some time.
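One way to catch this kind of problem early is to look at the raw status code and any Location header before touching the scraping code. The sketch below uses Net::HTTP for that; `hint_for`, `probe`, the hint wording, and the User-Agent string are all hypothetical, and some servers answer HEAD requests differently from GET.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: maps a status code string to a debugging hint.
def hint_for(code)
  case code
  when '301', '302', '303', '307', '308'
    'moved: compare the Location header with your URL'
  when '403'
    'forbidden: check the URL itself, then the User-Agent'
  when '404'
    'not found: the page may have been renamed or removed'
  else
    'no hint'
  end
end

# Hypothetical helper: sends a HEAD request without following redirects and
# returns [status code, Location header or nil].
def probe(url, user_agent = 'Mozilla/5.0 (compatible; ExampleScraper/1.0)')
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_head(uri.request_uri, { 'User-Agent' => user_agent })
  end
  [response.code, response['location']]
end
```

For example, `code, location = probe(url)` followed by `puts hint_for(code)` performs one network request and prints a hint for the returned status.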
