
We have a queue of jobs and workers process these jobs one at a time. Each job requires us to format some data and issue an HTTP POST request, with the data as the request payload.

How can we have each worker issue these HTTP POST requests asynchronously in a single-threaded, non-blocking manner? We don't care about the response from the request -- all we want is for the request to execute as soon as possible and then for the worker to immediately move onto the next job.

We have explored using gevent and the grequests library (see Why does gevent.spawn not execute the parameterized function until a call to Greenlet.join?). Our worker code looks something like this:

import gevent
import requests

def execute_task(worker, job):

    # url and params are built from the job elsewhere in our worker code
    print "About to spawn request"
    greenlet = gevent.spawn(requests.post, url, params=params)

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()

The first print statement executes, but the second and third print statements never get printed and the url is never hit.

How can we get these asynchronous requests to execute?

  • There is a standard lib called [asyncore](http://docs.python.org/2/library/asyncore.html), but it may be too low-level for your use case. – lucasg Apr 03 '13 at 07:29
  • I'd have to agree with @georgesl on this one; asyncore would be a great place to migrate because it will give you better flexibility over your application for later development. Also, `http://stackoverflow.com/questions/15753901/python-asyncore-client-socket-can-not-determaine-connection-status/15754244#15754244` is a good start and an example of how it can be used (see the answer to my question). If not, you'd have to actually do it in multiple processes; even the "sub" libraries of Python will most likely thread it for you if they can send requests in parallel. That's the thing about multi-processing. – Torxed Apr 03 '13 at 07:44
  • Your gevent code looks okay (and a quick test tells me it works just fine; I use gevent 1.0b3). I guess it depends on the context in which `execute_task` is called. – robertklep Apr 03 '13 at 07:58
  • May I ask if you really need `gevent`? It's always a calculated risk to use non-standard libraries, since they might be version-dependent, require more development in later releases, or lack functions later on, while standard libraries don't change :) Just a thought now that I've read your comment about versions etc. – Torxed Apr 03 '13 at 08:04

4 Answers


1) make a Queue.Queue object

2) make as many "worker" threads as you like which loop and read from the Queue.Queue

3) feed the jobs onto the Queue.Queue

The worker threads will read jobs off the Queue.Queue in the order they are placed on it.

Here's an example that reads lines from a file and puts them on a Queue.Queue:

import sys
import urllib2
import urllib
from Queue import Queue
import threading

THEEND = "TERMINATION-NOW-THE-END"


#read from file into Queue.Queue asynchronously
class QueueFile(threading.Thread):
    def run(self):
        if not(isinstance(self.myq, Queue)):
            print "Queue not set to a Queue"
            sys.exit(1)
        h = open(self.f, 'r')
        for l in h:
            self.myq.put(l.strip())  # this will block if the queue is full
        self.myq.put(THEEND)

    def set_queue(self, q):
        self.myq = q

    def set_file(self, f):
        self.f = f

An idea of what a worker thread might be like (example only):

class myWorker(threading.Thread):
    def set_queue(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                data = self.q.get()     # blocks until a job is available
                if data == THEEND:      # sentinel from QueueFile: no more jobs
                    self.q.put(THEEND)  # pass the sentinel on to the next worker
                    break

                # data must be a dict or a sequence of pairs for urlencode
                req = urllib2.Request("http://192.168.1.10/url/path")
                req.add_data(urllib.urlencode(data))
                h1 = urllib2.urlopen(req, timeout=10)
                res = h1.read()
                assert(len(res) > 80)

            except urllib2.HTTPError, e:
                print e

            except urllib2.URLError, e:
                print e
                sys.exit()

To make the objects based on threading.Thread go, create an instance and then call start() on it.
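
For example, wiring the two classes above together might look like this (the input filename and the worker count of 4 are placeholder assumptions):

q = Queue(maxsize=100)   # bounded, so the file reader blocks if workers fall behind

qf = QueueFile()
qf.set_queue(q)
qf.set_file('jobs.txt')  # hypothetical input file
qf.start()

for _ in range(4):       # arbitrary worker count; tune to taste
    w = myWorker()
    w.set_queue(q)
    w.start()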


You'd have to run it in different threads or use the built-in asyncore library. Most libraries will utilize threading without you even knowing, or they will rely on asyncore, which is a standard part of Python.

Here's a combination of Threading and asyncore:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import asyncore, socket
from threading import *
from time import sleep
from os import _exit
from logger import *  # <- Non-standard library containing a log function
from config import *  # <- Non-standard library containing settings such as "server"

class logDispatcher(Thread, asyncore.dispatcher):
    def __init__(self, config=None):
        self.inbuffer = ''
        self.buffer = ''
        self.lockedbuffer = False
        self.is_writable = False

        self.is_connected = False

        self.exit = False
        self.initiated = False

        asyncore.dispatcher.__init__(self)
        Thread.__init__(self)

        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((server, server_port))
        except:
            log('Could not connect to ' + server, 'LOG_SOCK')
            return None

        self.start()

    def handle_connect_event(self):
        self.is_connected = True

    def handle_connect(self):
        self.is_connected = True
        log('Connected to ' + str(server), 'LOG_SOCK')

    def handle_close(self):
        self.is_connected = False
        self.close()

    def handle_read(self):
        data = self.recv(8192)
        while self.lockedbuffer:
            sleep(0.01)

        self.inbuffer += data


    def handle_write(self):
        while self.is_writable:
            sent = self.send(self.buffer)
            sleep(1)

            self.buffer = self.buffer[sent:]
            if len(self.buffer) <= 0:
                self.is_writable = False
            sleep(0.01)

    def _send(self, what):
        self.buffer += what + '\r\n'
        self.is_writable = True

    def run(self):
        self._send('GET / HTTP/1.1\r\n')

while 1:
    logDispatcher() # <- Instantiate one for each request.
    asyncore.loop(0.1)
    log('All threads are done, next loop in 10', 'CORE')
    sleep(10)

Or you could simply spawn a thread that does the job and then dies:

from threading import Thread

class worker(Thread):
    def __init__(self, host, postdata):
        Thread.__init__(self)
        self.host = host
        self.postdata = postdata
        self.start()

    def run(self):
        # Pseudo-code: open a socket to self.host and send the POST here
        sock.send(self.postdata)

for data in postDataObjects:  # postDataObjects: your list of payloads
    worker('example.com', data)

If you need to limit the number of threads (sending over 5k posts or so might get taxing on the system), just do a `while len(threading.enumerate()) > 1000: sleep(0.1)` in the spawning loop and let it wait for a few threads to die off, as sketched below.
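
A minimal sketch of that throttling loop, reusing the pseudo worker class and postDataObjects list from above (the cap of 1000 is arbitrary):

import threading
from time import sleep

MAX_THREADS = 1000  # arbitrary cap; tune to what your system can handle

for data in postDataObjects:
    # wait until enough worker threads have died off before spawning more
    while len(threading.enumerate()) > MAX_THREADS:
        sleep(0.1)
    worker('example.com', data)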


You may want to use the join method instead of sleeping and then checking the status; if you want to execute one request at a time, that will solve the problem. Modifying your code slightly to test, it seems to work fine:

import gevent
import requests

def execute_task(worker, job):

    print "About to spawn request"
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()
    print greenlet.get()

execute_task(None, None)

Gives the results:

About to spawn request
Request spawned, about to call sleep
Greenlet status:  True
<Response [200]>

Is there more going on in this Python process that could be blocking Gevent from running this greenlet?
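
If you literally want one request at a time, the join() variant mentioned above would look like this (same placeholder URL):

import gevent
import requests

def execute_task(worker, job):
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})
    greenlet.join()   # block until this one request has completed
    print greenlet.get()

execute_task(None, None)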


Wrap your url and params pairs in a list, then pop one pair at a time into the task pool (the task pool here either holds one task or is empty). Create threads that read from the task pool; when a thread takes a task and sends the request, pop another pair from your list (i.e. the list effectively acts as a queue). A rough sketch of that idea follows.
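
Here's a minimal sketch of that scheme, assuming the requests library is available; the (url, params) pairs and the thread count of 4 are placeholder data:

import threading
import requests
from Queue import Queue

tasks = Queue(maxsize=1)  # the "task pool": holds at most one task at a time

def sender():
    while True:
        url, params = tasks.get()
        try:
            requests.post(url, params=params)
        finally:
            tasks.task_done()

pairs = [('http://example.com', {'a': 1}), ('http://example.com', {'b': 2})]

for _ in range(4):  # arbitrary thread count
    t = threading.Thread(target=sender)
    t.daemon = True
    t.start()

for pair in pairs:   # pop one pair at a time into the pool
    tasks.put(pair)  # blocks until a thread has taken the previous one
tasks.join()         # wait for in-flight requests to finish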
