I have a Python application that uses eventlet green threads (a pool of 1000) to make HTTP connections. Whenever the client fires more than 1000 parallel requests, ETIMEDOUT occurs. Can anyone help me figure out the possible reason?

Avv

1 Answer

The most likely reason in this case is DNS server request throttling. You can easily check whether that's the case by eliminating DNS resolution: request http://{ip-address}/path, and don't forget to add the proper Host: header. If you do web crawling, these steps are not optional; you absolutely must:

  • control concurrency automatically (without human action) based on an aggregate (e.g. average) of execution time. This applies at every level independently: back off concurrent DNS requests if DNS responses get slower; back off TCP concurrency if response speed (body size / time) drops; back off overall request concurrency if your CPU is overloaded. Don't request more than you can process.
  • retry on temporary failures, increasing the wait-before-retry period each time (search for "backoff algorithm"). How do you decide whether an error is temporary? Mostly research, trial and error.
  • run a local DNS server, and find and configure many upstreams for it.
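
To illustrate the first two points, here is a minimal sketch of retry with exponential backoff and of a concurrency limit driven by average latency. The names `backoff_delays`, `retry`, `AdaptiveLimit`, and all default values are made up for this example, and the halve-on-slow / add-one-on-fast rule is just one simple policy, not something eventlet provides.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Yield wait times that grow exponentially, capped at `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def retry(func, is_temporary, delays=None, sleep=time.sleep, jitter=random.random):
    """Call func(); on a temporary error wait (longer each time) and retry."""
    delays = backoff_delays() if delays is None else delays
    for delay in delays:
        try:
            return func()
        except Exception as exc:
            if not is_temporary(exc):
                raise  # permanent error: give up immediately
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * jitter())
    return func()  # final attempt; any error propagates to the caller


class AdaptiveLimit:
    """Hypothetical controller: halve the limit when average latency is
    above target, otherwise grow it by one."""

    def __init__(self, limit=100, min_limit=1, max_limit=1000, target=1.0, alpha=0.2):
        self.limit = limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target = target  # acceptable average time per request, seconds
        self.alpha = alpha    # weight of the newest sample in the moving average
        self.average = None

    def record(self, elapsed):
        """Feed one request duration; return the updated concurrency limit."""
        if self.average is None:
            self.average = elapsed
        else:
            self.average = (1 - self.alpha) * self.average + self.alpha * elapsed
        if self.average > self.target:
            self.limit = max(self.min_limit, self.limit // 2)
        else:
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

Under eventlet you would pass `sleep=eventlet.sleep` so that waiting yields to other green threads, and keep a separate `AdaptiveLimit` per level (DNS, TCP, overall), feeding each one the timing it is supposed to react to.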

The next common problem with high concurrency that you'll likely face is the OS limit on the number of open connections and file descriptors. Search for sysctl somaxconn and ulimit nofile to fix those.
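
For the file descriptor part, a process can inspect and raise its own soft limit from Python on Unix-like systems. A minimal sketch using the standard `resource` module; the helper name `raise_nofile_limit` and the value 4096 are assumptions for illustration. Every open socket takes one descriptor, so a pool of 1000 green threads can get close to a soft limit of 1024, which is a common default.

```python
import resource


def raise_nofile_limit(wanted=4096):
    """Raise the soft RLIMIT_NOFILE towards `wanted`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit can never exceed the hard limit without extra privileges.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        return soft  # the OS refused; keep the current limit
    return wanted
```

Call it once at startup, before creating the green pool. It is the in-process equivalent of `ulimit -n`; raising the hard limit still has to be done in system configuration.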

temoto
  • I'm already using http://{ip-address}:{port}/path -X {method} to make the request. – Avv Mar 24 '17 at 07:42