I have a service running on a local server, written using the Python threading library. Think of it as a kind of web crawler. It uses 50 threads. I want to deploy it on the Amazon Web Services cloud and scale it up so that it uses more threads.
Simply put, I have two queues: Qinput with URLs and Qoutput with page content. The threads pick URLs from Qinput, fetch the content of the web page, and put it into Qoutput.
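For reference, here is a minimal sketch of the pattern I'm describing (the fetching via urllib.request, the None sentinel, and the worker function are placeholders, not my actual code):

    import threading
    import queue
    import urllib.request

    NUM_THREADS = 50  # currently 50; the question is how far this can be raised

    Qinput = queue.Queue()   # URLs to crawl
    Qoutput = queue.Queue()  # fetched page content

    def worker():
        # Each thread repeatedly takes a URL, fetches it, and stores the result.
        while True:
            url = Qinput.get()
            if url is None:          # sentinel: no more work
                Qinput.task_done()
                break
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    Qoutput.put((url, resp.read()))
            except Exception as exc:
                Qoutput.put((url, exc))
            finally:
                Qinput.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()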
Question: is it enough to simply increase the number of threads to, say, 500, 5,000 or 50,000, and AWS + Python will handle it? Should I expect the service to run seamlessly, or are there some "standard" design pitfalls that I should be aware of when porting a multithreaded service to AWS?
I am aware of the Global Interpreter Lock, although it should not be an issue here, as the main task of the threads is to make calls outside the interpreter while crawling / scraping pages.