
I wrote a small script for checking proxies:

async def proxy_check(session, proxy):
    global good_proxies
    proxy_str = f'http://{proxy}'
    async with semaphore:
        try:
            async with session.get(host, proxy=proxy_str, timeout=10) as r:
                if r.status == 200:
                    resp = await r.json()
                    if resp['ip'] == proxy:
                        good_proxies.append(proxy)
                        proxies.remove(proxy)
        except Exception:
            logging.exception(proxy)
            proxies.remove(proxy)


async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for proxy in proxies:
            tasks.append(asyncio.create_task(proxy_check(session, proxy)))
        await asyncio.gather(*tasks)  

But when I run it, I get one of these errors:

aiohttp.http_exceptions.BadHttpMessage: 400, message='invalid constant string'
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid constant string'
concurrent.futures._base.TimeoutError

There are almost 20,000 proxies in my list, and the script fails to connect through any of them. Not a single proxy works in this script.

But if you do this:

proxy = {'http': f'http://{proxy}'}
r = requests.get(url, proxies=proxy)

Then everything works, and a lot of the proxies work. What am I doing wrong?

Felix Quehl
kshnkvn
  • Are you sure that your protocol prefix (http) is correct for the proxy you try to use? Maybe you need to use https? – Felix Quehl Sep 15 '19 at 17:06
  • @FelixQuehl but it works with requests. Also as far as I know aiohttp does not support https – kshnkvn Sep 15 '19 at 17:08
  • Okay, you are right. As far as I can see, the only thing that differs between these two requests is that you are using a session object. Do you pass any arguments to the session object's constructor? – Felix Quehl Sep 15 '19 at 17:24
  • @FelixQuehl no. I run code exactly as you see it – kshnkvn Sep 15 '19 at 17:25
  • Could you please rewrite your code so that your collection "proxies" is not altered during the iteration, and test again? Maybe you have some kind of race condition there. – Felix Quehl Sep 15 '19 at 17:31
  • @FelixQuehl omg you are right! Write an answer and i mark it. Ty! – kshnkvn Sep 15 '19 at 17:38
  • I wrote a short summary and posted it as an answer. Let me know if you are able to fix your code ;) Otherwise I will take another look at it. – Felix Quehl Sep 15 '19 at 18:00

1 Answer


The collection proxies is iterated in your main function, and its elements are processed concurrently by multiple tasks. That is fine so far, but inside the processing function you alter the collection you are iterating over. This results in a race condition that can corrupt the collection while it is being iterated.
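To illustrate the hazard in isolation, here is a minimal synchronous sketch (not the asyncio code from the question; the list contents are made up for the demo). Removing elements from a list inside a loop over that same list makes the loop skip items:

```python
items = [1, 2, 3, 4]
seen = []

for x in items:
    seen.append(x)
    # Removing from the list being iterated shifts the remaining
    # elements left, so the iterator steps over the next one.
    items.remove(x)

print(seen)   # only every other element was visited
print(items)  # the skipped elements are still in the list
```

Half of the elements are never visited, even though no exception is raised.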

  1. You should never alter a collection while you are iterating over it.
  2. If you have code altering a shared resource concurrently, you need to use mutual exclusion to make it thread-safe. You could use a "Lock" in Python 3.7.
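One way to follow the first rule is to have each check return its result and build the list of good proxies afterwards, so nothing shared is modified while tasks are running. The sketch below is only an illustration under assumptions: `check`, `filter_proxies`, and `is_alive` are hypothetical names, and the network request is replaced by a stub so the example runs without aiohttp or real proxies.

```python
import asyncio


async def check(proxy, is_alive):
    # Hypothetical stand-in for the real request; sleep(0) only yields control.
    await asyncio.sleep(0)
    return proxy, is_alive(proxy)


async def filter_proxies(proxies, is_alive, limit=100):
    semaphore = asyncio.Semaphore(limit)

    async def guarded(proxy):
        # Limit how many checks run at the same time.
        async with semaphore:
            return await check(proxy, is_alive)

    # Build the tasks from a snapshot; the input collection is never modified.
    results = await asyncio.gather(*(guarded(p) for p in tuple(proxies)))
    return [proxy for proxy, ok in results if ok]


proxies = ["10.0.0.1:80", "10.0.0.2:8080", "10.0.0.3:80"]
# Assumed rule for the demo only: proxies on port 80 count as working.
good = asyncio.run(filter_proxies(proxies, lambda p: p.endswith(":80")))
print(good)
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the good proxies come back in their original order, and the `proxies` list is left untouched.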
Mayur
Felix Quehl