
From the Python threading documentation:

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

Now I have a thread worker like this:

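from queue import Empty  # Python 3; on Python 2 use: from Queue import Empty

def worker(queue):
    # Drain the queue until it is empty, then let the thread exit.
    while True:
        try:
            url = queue.get(False)  # non-blocking get; raises Empty once drained
        except Empty:
            break
        try:
            w = Wappalyzer(url)  # my own class, defined elsewhere
            w.analyze()
        finally:
            queue.task_done()  # always mark the item done so queue.join() can return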

Here w.analyze() does two things:

  1. Scrapes the URL using the requests library
  2. Analyzes the scraped HTML using the PyV8 JavaScript library

As far as I know, 1 is I/O bound and 2 is CPU bound.

Does that mean the GIL applies to step 2 and my program won't work properly?

PrivateUser
  • I would check an assumption here. I think `requests` is CPU bound, or at least it locks its thread until the request completes. For a callback-capable library, I would check out [requests-futures](https://github.com/ross/requests-futures). – huu May 09 '14 at 21:42
  • That's incorrect. `requests` (and `urllib`, `httplib2`, etc) are all very much I/O bound. `threading` speeds all of them up. – roippi May 09 '14 at 21:46
  • @Huu Just because something locks the thread doesn't mean it's CPU bound. If you put `sleep(1000)` into a thread the thread will be blocked for some time, but it won't do any work and will release the GIL in between. Same goes for any other kind of IO request. – Voo May 09 '14 at 23:03
  • Does that mean that even if a particular python function is I/O bound, but the author of the function/module did not block(release the GIL), then a multi-threading solution will not have any performance gain? Or is it not possible for an I/O bound program to not release the GIL, how does that work? – LSM Apr 22 '23 at 14:37
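
The point about blocking calls releasing the GIL can be checked with a small sketch. It assumes `time.sleep` is a fair stand-in for a blocking network read; `blocked_task` and `run_in_threads` are hypothetical names used only for this illustration.

```python
import threading
import time

def blocked_task(seconds):
    # time.sleep blocks this thread but releases the GIL while waiting,
    # so other threads are free to run in the meantime.
    time.sleep(seconds)

def run_in_threads(n_threads, seconds):
    # Start n_threads blocked tasks and return the total wall-clock time.
    threads = [threading.Thread(target=blocked_task, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_in_threads(4, 0.2)
print(f"4 threads sleeping 0.2s each took {elapsed:.2f}s in total")
```

If the waits overlap, the total should be close to 0.2 seconds rather than the 0.8 seconds a sequential run would need. A blocking call that held the GIL for its whole duration would not show this overlap.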

1 Answer


The GIL description does not say anything about correctness, only about efficiency.

If step 2 is CPU bound, you will not be able to get multi-core performance out of threading for that step, but your program will still work correctly.
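
As a rough illustration of the correctness point, the sketch below runs CPU-bound work in several threads and compares the results with a plain sequential run. `cpu_bound_analyze` is a hypothetical stand-in for step 2, not the asker's actual analysis code.

```python
import threading
from queue import Queue, Empty

def cpu_bound_analyze(n):
    # Hypothetical stand-in for the CPU-heavy analysis in step 2.
    return sum(i * i for i in range(n))

def worker(tasks, results):
    # Same draining pattern as the worker in the question.
    while True:
        try:
            n = tasks.get(False)
        except Empty:
            break
        try:
            results.put((n, cpu_bound_analyze(n)))
        finally:
            tasks.task_done()

inputs = [10_000, 20_000, 30_000, 40_000]
tasks, results = Queue(), Queue()
for n in inputs:
    tasks.put(n)

threads = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for t in threads:
    t.start()
tasks.join()
for t in threads:
    t.join()

threaded = dict(results.get() for _ in inputs)
sequential = {n: cpu_bound_analyze(n) for n in inputs}
print(threaded == sequential)  # True
```

The threaded results match the sequential ones. What the GIL costs here is speed: the four computations take turns on one core instead of running on four.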

If you care about CPU parallelism, you should use Python's multiprocessing module.
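
A minimal sketch of that approach, assuming the analysis can be written as a top-level function whose input and output are picklable. `analyze_html` and `analyze_all` are hypothetical names, and the checksum is only a placeholder for the real CPU-bound work.

```python
from multiprocessing import Pool

def analyze_html(html):
    # Hypothetical stand-in for the CPU-bound analysis (step 2).
    return sum(ord(ch) for ch in html) % 1000

def analyze_all(pages, processes=2):
    # Each worker process has its own interpreter and its own GIL,
    # so the analyses can run on separate cores.
    with Pool(processes=processes) as pool:
        return pool.map(analyze_html, pages)

if __name__ == "__main__":
    # The guard is needed on platforms that start workers by
    # re-importing this module.
    pages = ["<html>a</html>", "<html>b</html>", "<html>c</html>"]
    print(analyze_all(pages))
```

The downloads in step 1 could still run in threads, with only the fetched HTML handed to the pool for analysis.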

merlin2011