
My understanding of the background of this question:

  • The GIL limits Python to one thread executing Python code at a time.
  • Because of the GIL, multithreading long calculations is not useful.
  • Threading can still be useful for I/O operations.

Therefore my question is:

How would the GIL affect the downloading of a requested webpage? Would making parallel webpage requests be a good use of Python threading? Because downloading a webpage is an I/O operation, would this mean that threading is useful?

I would imagine that one thread would make a request > another thread would get passed control at some point and make its own request > another thread would get passed control, etc. And then data would start streaming in, but how would this be handled? Would downloads get interrupted? I suppose I am lacking a low-level understanding of how responses are handled by the OS and the Python interpreter.

ADJenks

1 Answer


The GIL won't hurt you here.

For I/O-bound tasks (like downloading webpages), the GIL is not a problem. Python releases the GIL while a thread is blocked waiting on I/O, which means the threads' requests can all be in flight at the same time. Processing the downloaded pages is where the GIL can hurt you, because that work runs Python code and only one thread can do that at a time.
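As a rough illustration of the above, here is a minimal sketch of parallel downloads with a thread pool from the standard library. The names `download` and `fetch_all`, the 10 second timeout, and the default of 8 workers are my own choices for the example, not anything from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def download(url, timeout=10):
    """Return the body of `url` as bytes.

    The thread spends most of its time blocked on the socket, and the
    GIL is released while it waits.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=download, max_workers=8):
    """Run `fetch` on every URL in a thread pool and return {url: result}.

    `fetch` is a parameter only so that a different downloader can be
    swapped in; it is a hypothetical hook for this sketch.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each
        # URL with its own result.
        return dict(zip(urls, pool.map(fetch, urls)))
```

You would call it with something like `pages = fetch_all(list_of_urls)`. If any single download raises, the exception is re-raised when its result is read, so real code would probably want a try/except inside `fetch`.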

You're right about the general rule of thumb: you can do I/O and the GIL doesn't hurt you, but with processor-bound tasks, you should try to use multiprocessing instead.

For more info about the GIL, you can check out David Beazley's talk.

Cody Piersall
  • I actually read that talk. I just don't have a strong computer science background so I wanted to confirm my understanding with the community. Thank you very much, I will multithread my requests. I originally tried to use multiprocessing with the "Pool" function, but the requests were so small that it wasn't worth the effort of starting up a new process for each of them. – ADJenks Aug 27 '15 at 18:43
  • However I would also like to understand how the OS and the interpreter manage to download multiple files simultaneously while not locking the GIL. This is not clear to me. – ADJenks Aug 27 '15 at 18:47
  • @adjenks The GIL is short for "Global Interpreter Lock". It is a single lock inside the Python interpreter. When a thread is executing Python code, it "acquires" the lock so that other threads cannot change things that should not be changed at that point in the program. Threads release the lock whenever they block on I/O, and while it is released another thread can acquire it and run Python code, so several threads can be waiting on their I/O at the same time. The OS doesn't know anything about the GIL, and it uses fancier things than I understand to get lots of I/O done at a time. Sorry I can't tell you more about the OS! – Cody Piersall Aug 27 '15 at 19:18
  • No problem. You've helped plenty! – ADJenks Aug 27 '15 at 20:45
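For anyone who wants to see the release-while-waiting behaviour from the comments for themselves, here is a small sketch. It uses `time.sleep` as a stand-in for a blocking socket read (that substitution is an assumption of the example; both release the GIL while they wait), and the names `blocking_wait` and `run_in_threads` are made up for illustration.

```python
import threading
import time


def blocking_wait(seconds):
    # Stand-in for a blocking socket read: time.sleep releases the GIL
    # while it waits, so other threads are free to run.
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Start n_threads blocking waits and return the total wall-clock time."""
    threads = [
        threading.Thread(target=blocking_wait, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Calling `run_in_threads(5, 0.2)` should take roughly 0.2 seconds rather than 1 second, because the five waits overlap instead of running one after another.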