
I am trying to resolve a DOI like this:

import requests

url = 'https://dx.doi.org/10.3847/1538-4357/aafd31'
r1 = requests.get(url)
actual_url = r1.url

But the requests.get call actually takes on the order of tens of seconds, up to 5 minutes (it varies)! I tried stream=True and verify=False, but neither really helps.
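For reference, here is a sketch of what I am doing, with a timeout added so a slow resolver fails fast instead of hanging (the 10-second value and the doi_url helper are just mine, not anything standard):

```python
import requests

def doi_url(doi):
    # doi.org is the canonical resolver; dx.doi.org also works
    return 'https://doi.org/' + doi

def resolve_doi(doi, timeout=10):
    # Follow redirects to the article landing page.
    # timeout makes a slow resolver raise requests.exceptions.Timeout
    # instead of blocking for minutes.
    r = requests.get(doi_url(doi), timeout=timeout)
    return r.url

# usage (hits the network):
# resolve_doi('10.3847/1538-4357/aafd31')
```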

John Smith
  • Sounds like an issue with that site or the server it's on – SuperStew Feb 11 '20 at 14:48
  • What do you get when you ping this site? What if you use a proxy? Depending on the site and your findings, it could be that they are slowing you down on purpose – Chrisvdberge Feb 11 '20 at 14:52
  • ^This. Maybe the site doesn't like being scraped? Try changing the user agent sent with the request. – h4z3 Feb 11 '20 at 14:54

3 Answers


Try:

import urllib.request

response = urllib.request.urlopen('https://dx.doi.org/10.3847/1538-4357/aafd31')
html = response.read()  # bytes of the landing page
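If you need the final URL after the redirects rather than the page contents, geturl() should give you that. A sketch (resolve_doi is just a helper name I made up, and the 30-second timeout is arbitrary):

```python
import urllib.request

def resolve_doi(doi, timeout=30):
    # Follow the DOI redirect chain and return the final landing-page URL
    url = 'https://dx.doi.org/' + doi
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()

# usage (hits the network):
# resolve_doi('10.3847/1538-4357/aafd31')
```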
GSBYBF

It seems they are slowing you down on purpose. Try setting a valid user agent. The code below runs quickly for me:

import requests
url = 'https://dx.doi.org/10.3847/1538-4357/aafd31'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}

req = requests.get(url, headers=headers)

print(req.text)

If you are making multiple requests, make sure to throttle them, and possibly rotate between several user agents at random.
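Something like this, for example (polite_get, the user-agent list, and the one-second delay are just illustrative choices, not anything the site documents):

```python
import random
import time
import requests

# a small pool of realistic desktop user agents to rotate through
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/13.1 Safari/605.1.15',
]

def polite_get(url, min_delay=1.0):
    # pause between requests and pick a user agent at random,
    # so the server is less likely to throttle you
    time.sleep(min_delay)
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, timeout=30)

# usage (hits the network):
# r = polite_get('https://dx.doi.org/10.3847/1538-4357/aafd31')
```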

Chrisvdberge

I had the same problem. My solution was to create a new environment with a more recent Python version.