I'm trying to send a request to Pixabay.

Here's my code:

import requests

url = 'https://pixabay.com'
header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'
}

req = requests.get(url, headers=header)

print(req.status_code)
print(req.headers)
print(req.text)

It doesn't work: the request returns a 403 error. How can I make it work?

isopach
buttercrab

1 Answer

Pixabay has Cloudflare security that requires you to solve a captcha if you connect from a blacklisted IP.
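
To tell a Cloudflare block apart from an ordinary 403 sent by the site itself, you can look at the response headers. The following is only a rough heuristic, and it assumes that the block comes back as a 403 or 503 with a Server header of "cloudflare"; the function name is made up for this example.

```python
def looks_like_cloudflare_block(status_code, headers):
    """Heuristic: True if the response looks like a Cloudflare block page.

    Assumption (not taken from the question): the block is served with
    status 403 or 503 and a "Server: cloudflare" header.
    """
    server = ''
    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == 'server':
            server = value.lower()
    return status_code in (403, 503) and 'cloudflare' in server
```

With the code from the question you would call it as looks_like_cloudflare_block(req.status_code, req.headers).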

To get past this, first connect with a browser, then copy the request headers and cookies into your Python script. This works for me, but you have to replace values such as __cfduid (your Cloudflare fingerprint) with your own in order to access the website. Also check that your User-Agent matches the one your browser sends.

import requests

url = 'https://pixabay.com/'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3835.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Upgrade-Insecure-Requests': '1',
    'Host': 'pixabay.com'
}

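# Cookies copied from a browser session after solving the captcha.
# Replace every <redacted> value with the one from your own browser.
cookie = {
    '__cfduid': '<redacted>',
    'cf_clearance': '<redacted>',
    'anonymous_user_id': '<redacted>',
    '_sp_ses.aded': '*',
    '_sp_id.aded': '<redacted>',
    'is_human': '1',
    'client_width': '1540'
}

req = requests.get(url, headers=header, cookies=cookie)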

print(req.status_code)
print(req.headers)
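
If you copy the cookies from the browser's developer tools as one raw Cookie header string, typing them into a dict by hand is error-prone. Here is a minimal sketch of a helper that does the split for you. The name cookie_header_to_dict is hypothetical, and the values in the usage note are placeholders, not real cookies.

```python
def cookie_header_to_dict(raw_header):
    """Turn a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in raw_header.split(';'):
        part = part.strip()
        # Skip empty fragments and anything that is not a name=value pair.
        if not part or '=' not in part:
            continue
        # Split on the first '=' only, since values may contain '='.
        name, _, value = part.partition('=')
        cookies[name.strip()] = value.strip()
    return cookies
```

For example, cookie_header_to_dict('cf_clearance=abc; is_human=1') gives a dict you can pass as cookies= to requests.get in the code above.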
isopach
  • I tried to connect via my browser, and it turns out that your IP might have been blacklisted. The Cloudflare security requires you to solve a captcha before allowing connection to the site. – isopach Feb 12 '20 at 09:31
  • @isopach While I agree with your code and the Cloudflare explanation, I don't think it is about a blacklisted IP: mine has not been blacklisted, yet his code still doesn't work for me. Thumbs up on your solution, though. – maestro.inc Feb 12 '20 at 09:45
  • @isopach Thanks a lot. However, I couldn't figure out what cf_clearance is. – buttercrab Feb 12 '20 at 10:47
  • @isopach Also, I can connect normally in my browser, but when I try this code, it sends me a captcha. – buttercrab Feb 12 '20 at 10:58
  • @jaeyongsung You have to solve the captcha in your browser, then copy the cookies containing your cloudflare ID and captcha clearance. If you can't find a cookie, just exclude it and see if it works. – isopach Feb 12 '20 at 20:22
  • @isopach Oh, I couldn't figure it out, but I was able to solve this problem by using Selenium with chromedriver. Thank you for your help. – buttercrab Feb 13 '20 at 12:31
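
The Selenium route from the last comment can also be combined with requests: once the page has loaded in the Selenium-driven browser, driver.get_cookies() returns a list of dicts with "name" and "value" keys, which can be flattened into the dict that requests expects. This is only a sketch of that conversion step; the helper name is made up, and nothing in the thread confirms that the asker did it this way.

```python
def selenium_cookies_to_dict(cookie_list):
    """Flatten Selenium's list of cookie dicts into {name: value}.

    cookie_list is expected to look like the result of
    driver.get_cookies(): a list of dicts with 'name' and 'value' keys.
    """
    return {cookie['name']: cookie['value'] for cookie in cookie_list}
```

The result can then be passed as cookies= to requests.get, together with the same User-Agent the Selenium browser used.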