I have been working with a very simple bit of code, but the behavior is very strange. I am trying to send a request to a webpage using requests.get, but if the request takes longer than a few seconds, I would like to kill the process. I am following the approach from the accepted answer here, changing the function body to include the request. My code is below:
import multiprocessing as mp
import time
import requests

def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    _r.put(_rs)

q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)    # give the request three seconds
p.terminate()    # then kill the worker process
p.join()
try:
    result = q.get(False)
    print(result)
except:
    print('failed')
The code above simply hangs when I run it. However, when I run
requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
independently, a response is returned in under two seconds. The main code should therefore print the page's HTML, but instead it just stalls. Oddly, when I put an infinite loop in get_page:
def get_page(_r):
    while True:
        pass
    _r.put('You will not see this')
the process is indeed terminated after three seconds. Therefore, I am certain the behavior has to do with requests. How could this be? I discovered a similar question here, but I am not using async. Could the issue have to do with monkey patching, since I am using requests along with time and multiprocessing? Any suggestions or comments would be appreciated. Thank you!
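For reference, the standalone check I describe above is something along these lines (the perf_counter timing is just illustrative, not part of my actual script):

import time
import requests

url = 'https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas'
start = time.perf_counter()
# this is the same call that returns in under two seconds when run on its own
html = requests.get(url).text
print(f'{len(html)} characters fetched in {time.perf_counter() - start:.2f} seconds')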
I am using:
Python 3.7.0
requests 2.21.0
Edit: @Hitobat pointed out that the timeout parameter of requests can be used instead. This does indeed work; however, I would still appreciate any other ideas about why requests is failing with multiprocessing.
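For completeness, the working timeout version looks roughly like this (the three-second value and the exception handling are my own choices):

import requests

url = 'https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas'
try:
    # timeout=3 makes requests raise an exception if the server is too slow to respond
    result = requests.get(url, timeout=3).text
    print(result)
except requests.exceptions.Timeout:
    print('failed')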