Hi, I use the Nokogiri gem to scrape the gem details from ruby-toolbox:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name"))
but I get the error: "403 Forbidden"
Can anyone tell me why I am getting this error?
Thanks in advance
Try to change your user-agent:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'firefox'))
www.ruby-toolbox.com doesn't seem to accept 'ruby' as an agent.
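To see what the server actually answers while trying different agents, something along these lines may help. It is only a sketch: `request_headers` and `fetch_body` are hypothetical helper names, the agent string is an arbitrary example, and it assumes a Ruby version where `URI.open` is available (2.5 or later).

```ruby
require 'open-uri'

# Hypothetical helper: builds the header hash handed to URI.open.
# open-uri sends string keys as HTTP request headers.
def request_headers(user_agent = 'Mozilla/5.0 (compatible; example-scraper)')
  { 'User-Agent' => user_agent }
end

# Hypothetical helper: returns the page body, or nil after printing
# the HTTP status when the server rejects the request.
def fetch_body(url, user_agent: 'Mozilla/5.0 (compatible; example-scraper)')
  URI.open(url, request_headers(user_agent)).read
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, message] pair such as ["403", "Forbidden"]
  warn "Request failed: #{e.io.status.join(' ')}"
  nil
end
```

Calling `fetch_body('https://www.ruby-toolbox.com/categories/by_name', user_agent: 'firefox')` would then either return the HTML or print the status line, which makes it easier to tell whether the agent string is really what the server objects to.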
As mentioned, the user agent has to be changed. In addition, you have to disable SSL certificate verification, because it raises an error as well.
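require 'nokogiri'
require 'open-uri'
require 'openssl'
url = 'https://www.ruby-toolbox.com/categories/by_name'
# URI.open is the open-uri entry point on Ruby 2.5+; plain open(url) no longer handles URLs on Ruby 3.0+.
# VERIFY_NONE turns off certificate checking, so use it only for sites you trust.
content = URI.open(url, ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE, 'User-Agent' => 'opera')
doc = Nokogiri::HTML(content)
doc.xpath('//div[@id="teaser"]//h2/text()').to_s
# "All Categories by name"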
This seems to be an OpenURI issue. Try this:
Nokogiri::HTML(open("https://www.ruby-toolbox.com/categories/by_name", 'User-Agent' => 'ruby'))
I spent about an hour trying solutions for a 403 Forbidden, including tinkering with the User-Agent argument to Nokogiri::HTML(open("www.something.com", "User-Agent" => "Safari")), looking into proxies, and other things.
But the whole time there was nothing wrong with my code. The website I was scraping had subtly changed its URL, and the URL my script was still visiting was forbidden.
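A quick way to rule this out before touching headers or proxies is to look at the raw status code and any Location header for the URL the script uses. The following is a rough sketch, not a definitive diagnostic: `status_hint` and `check_url` are hypothetical names, and the hint texts are my own guesses at likely causes.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: maps an HTTP status code to a rough hint.
def status_hint(code)
  case code.to_i
  when 200..299 then 'ok'
  when 300..399 then 'redirected: the page may have moved, check the Location header'
  when 403 then 'forbidden: verify the URL in a browser before changing headers'
  when 404 then 'not found: the URL has probably changed'
  else 'unexpected status'
  end
end

# Hypothetical helper: requests the URL without following redirects
# and returns the status code, the Location header (if any) and a hint.
def check_url(url, user_agent: 'Mozilla/5.0')
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri, 'User-Agent' => user_agent)
  end
  { code: response.code, location: response['location'], hint: status_hint(response.code) }
end
```

Running `check_url` on both the old URL and the one currently shown in the browser address bar should make a changed address stand out quickly.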
I hope this may save someone else some time.