I have collected links to wanted persons from the Interpol website. There are about 10k links. Scraping them one by one takes hours, so I am looking for a way to do it asynchronously with grequests.
This is the preview of my links list:
final_links[:20]
['https://www.interpol.int/notice/search/wanted/2009-19572',
'https://www.interpol.int/notice/search/wanted/2015-74196',
'https://www.interpol.int/notice/search/wanted/2014-37667',
'https://www.interpol.int/notice/search/wanted/2011-30019',
'https://www.interpol.int/notice/search/wanted/2009-34171',
'https://www.interpol.int/notice/search/wanted/2012-334072',
'https://www.interpol.int/notice/search/wanted/2012-334068',
'https://www.interpol.int/notice/search/wanted/2012-334070',
'https://www.interpol.int/notice/search/wanted/2013-26064',
'https://www.interpol.int/notice/search/wanted/2013-2528',
'https://www.interpol.int/notice/search/wanted/2014-32597',
'https://www.interpol.int/notice/search/wanted/2013-23413',
'https://www.interpol.int/notice/search/wanted/2010-42146',
'https://www.interpol.int/notice/search/wanted/2015-30555',
'https://www.interpol.int/notice/search/wanted/2013-2514',
'https://www.interpol.int/notice/search/wanted/2010-53288',
'https://www.interpol.int/notice/search/wanted/2015-58805',
'https://www.interpol.int/notice/search/wanted/2015-58807',
'https://www.interpol.int/notice/search/wanted/2015-58803',
'https://www.interpol.int/notice/search/wanted/2015-62307']
For now I am just trying to get a response from each link:
import grequests
unsent_request = (grequests.get(url) for url in final_links)
results = grequests.map(unsent_request)
The first few results are 200 responses, but then most of them (though not all) are 403. Is it just that the Interpol server doesn't allow this, or am I doing something wrong (am I too greedy? :))? When I go through the links one by one with requests, it works fine.
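If the 403s are the server pushing back on too many simultaneous requests (an assumption, I have not confirmed it), one thing to try is sending the links in small batches with a pause in between. Below is a minimal sketch of that idea. The names `batched`, `fetch_in_batches` and `send_batch`, as well as the batch size and pause values, are hypothetical and only for illustration. The commented wiring at the bottom relies on the `size` argument of `grequests.map`, which caps how many requests run at once.

```python
import time


def batched(urls, batch_size):
    """Yield consecutive slices of `urls`, each at most `batch_size` long."""
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


def fetch_in_batches(urls, send_batch, batch_size=50, pause=1.0):
    """Send `urls` in small batches, sleeping `pause` seconds between batches.

    `send_batch` is any callable that takes a list of URLs and returns a
    list of responses for them (hypothetical helper, see wiring below).
    """
    responses = []
    for index, batch in enumerate(batched(urls, batch_size)):
        if index:  # no need to wait before the very first batch
            time.sleep(pause)
        responses.extend(send_batch(batch))
    return responses


# Possible wiring with grequests (requires grequests to be installed):
#   import grequests
#
#   def send_batch(batch):
#       pending = (grequests.get(url, timeout=10) for url in batch)
#       return grequests.map(pending, size=5)  # at most 5 requests in flight
#
#   results = fetch_in_batches(final_links, send_batch)
```

With this, `results` would line up with `final_links`, so any entries that still come back as 403 (or as `None` when a request failed) could be collected and retried later with a longer pause.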